WO2024055320A1 - Gene sequencing method, apparatus and device, and medium - Google Patents

Gene sequencing method, apparatus and device, and medium

Info

Publication number
WO2024055320A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample
sequencing
sequence
barcode
gene
Prior art date
Application number
PCT/CN2022/119453
Inventor
江遥
卢昕
龚梅花
阚飙
梁鑫明
何继伟
李臻鹏
唐岳
林颖
王乐
蒋慧
黄勇
张黎
施建文
孙敬
喻志学
董涪
李倩
张希雯
饶俊华
黄顺楷
Original Assignee
中国疾病预防控制中心传染病预防控制所
武汉华大智造科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国疾病预防控制中心传染病预防控制所, 武汉华大智造科技有限公司
Priority to PCT/CN2022/119453
Publication of WO2024055320A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12M: APPARATUS FOR ENZYMOLOGY OR MICROBIOLOGY; APPARATUS FOR CULTURING MICROORGANISMS FOR PRODUCING BIOMASS, FOR GROWING CELLS OR FOR OBTAINING FERMENTATION OR METABOLIC PRODUCTS, i.e. BIOREACTORS OR FERMENTERS
    • C12M1/00: Apparatus for enzymology or microbiology
    • C12M1/34: Measuring or testing with condition measuring or sensing means, e.g. colony counters
    • C12M1/36: Apparatus for enzymology or microbiology including condition or time responsive control, e.g. automatically controlled fermentors
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing

Definitions

  • The invention belongs to the technical field of gene sequencing, and specifically relates to a gene sequencing method, device, equipment and medium.
  • Gene sequencing is a genetic testing technology that analyzes and determines the complete sequence of genes from blood or saliva in order to predict the likelihood of various diseases, individual behavioral characteristics and reasonable behavior.
  • Both single-end sequencing and paired-end sequencing take a long time.
  • For example, PE100 sequencing takes more than 24 hours,
  • and SE100 sequencing takes about 12 hours.
  • As a result, customers have to wait a long time to obtain the report results, and the customer experience is poor.
  • This application provides a gene sequencing method, device, equipment and medium to solve the technical problem in the prior art that barcode tag sequencing is placed at the end of the sequencing cycle, which results in a long wait for the report results and a poor customer experience.
  • The first aspect provides a gene sequencing method, including:
  • obtaining the gene sample to be detected and the preset read length, wherein the gene sample includes at least one sample, any sample includes at least one short sequence, and each short sequence includes the gene sequence to be detected and at most two barcode tags; in the case where a short sequence includes a barcode tag, the position of at least one barcode tag in the short sequence is located before the position of the gene sequence;
  • sending the intermediate stage sequencing result data of each short sequence in each sample to the target server, so that the target server performs data analysis on that data to obtain an intermediate stage detection report.
  • Determining the sample type to which each sample contained in the gene sample belongs includes:
  • if the gene sample includes one sample, determining that the sample belongs to the barcode-free single-sample type;
  • if the gene sample includes multiple samples, then for each sample:
  • if the short sequence of the sample includes one barcode tag, determining that the sample belongs to the single-barcode multi-sample type; if the short sequence of the sample includes two barcode tags located on the same strand, determining that the sample belongs to the multi-sample type with dual barcodes on a single strand; and if the short sequence of the sample includes two barcode tags located on two strands, determining that the sample belongs to the multi-sample type with dual barcodes on both strands.
  • The sequencing order corresponding to the barcode-free single-sample type is: sequence the gene sequence in each short sequence of the sample;
  • the sequencing order corresponding to the single-barcode multi-sample type is: sequence the barcode tag in each short sequence of the sample and, after the barcode tag sequencing is completed, sequence the gene sequence in each short sequence of the sample;
  • the sequencing order corresponding to the multi-sample type with dual barcodes on a single strand is: sequence the two barcode tags in each short sequence of the sample and, after the sequencing of the two barcode tags is completed, sequence the gene sequence in each short sequence of the sample;
  • the sequencing order corresponding to the multi-sample type with dual barcodes on both strands is: sequence the first barcode tag in each short sequence of the sample; after the sequencing of the first barcode tag is completed, sequence the gene sequence in each short sequence of the sample; and after the gene sequence sequencing is completed, sequence the second barcode tag in each short sequence of the sample.
  • The length of the barcode primer used when sequencing each short sequence of any sample of a multi-sample type is shorter than the length of the historical barcode primer, wherein the multi-sample types include the single-barcode multi-sample type, the multi-sample type with dual barcodes on a single strand, and the multi-sample type with dual barcodes on both strands.
  • the preset read length includes at least one read length.
  • Sending the intermediate stage sequencing result data of each short sequence in each sample to the target server includes:
  • sending the intermediate stage sequencing result data corresponding to each of the plurality of samples to the target server.
  • the gene sequencing method further includes:
  • the complete sequencing result data of each short sequence in each sample is sent to the target server, so that the target server performs data analysis on the complete sequencing result data of each short sequence in each sample and obtains a complete detection report.
  • The intermediate stage detection report includes the intermediate stage quality control results and the intermediate stage identification results of each sample, and the complete detection report includes the complete quality control results, complete identification results, complete assembly results and complete traceability results of each sample;
  • the intermediate stage quality control results and complete quality control results of a sample both reflect the short sequences in the sample whose quality is higher than the preset quality threshold; the intermediate stage identification results and complete identification results of a sample both reflect the pathogen concentration information of the sample;
  • the complete assembly result of a sample reflects the recombinant sample obtained by assembling all short sequences of the sample, and the complete traceability result of a sample reflects the subtype to which the sample belongs.
  • The preset read length includes a first read length and a second read length that is longer than the first read length.
  • The intermediate stage detection report under the first read length refers to the detection report obtained by analyzing the intermediate stage sequencing result data under the first read length for each short sequence of each sample; the intermediate stage detection report under the second read length and the complete detection report both refer to detection reports obtained by analyzing the sequencing result data, under the second read length, of the short sequences identified as non-host in each sample.
  • A second aspect provides a gene sequencing device including: a data acquisition module, a sample type determination module, a first sequencing module and a sequencing result data sending module;
  • the data acquisition module is used to obtain the gene sample to be detected and the preset read length, wherein the gene sample includes at least one sample, any sample includes at least one short sequence, and each short sequence includes the gene sequence to be detected.
  • the sample type determination module is used to determine the sample type to which each sample contained in the gene sample belongs;
  • the first sequencing module is used to sequence each short sequence of each sample in the gene sample according to the sequencing order corresponding to the sample type to which the sample belongs, and to obtain the intermediate stage sequencing result data of each short sequence in the sample when the gene sequence in each short sequence of the sample has been sequenced to the preset read length;
  • the sequencing result data sending module is used to send the intermediate stage sequencing result data of each short sequence in each sample to the target server, so that the target server can perform data analysis on that data and obtain an intermediate stage detection report.
  • A third aspect provides a gene sequencing device including a memory and a processor;
  • the memory is used to store programs;
  • the processor is used to execute the program to implement each step of the gene sequencing method described in any one of the above.
  • A fourth aspect provides a readable storage medium on which a computer program is stored.
  • When the computer program is executed by a processor, each step of the gene sequencing method described in any one of the above is implemented.
  • This application provides a gene sequencing method, device, equipment and medium.
  • When the short sequence of a sample includes a barcode tag, the position of at least one barcode tag in the short sequence is located before the position of the gene sequence.
  • Therefore, when each short sequence of the sample is sequenced according to the sequencing order corresponding to the sample type to which the sample belongs, at least one barcode tag at the front of each short sequence is sequenced first, and the gene sequence in each short sequence is sequenced afterwards.
  • As a result, even if the gene sample includes multiple samples, the target server can still generate an intermediate stage detection report from the intermediate stage sequencing result data obtained when sequencing reaches the preset read length.
  • Preliminary pathogen identification can thus be carried out before sequencing is fully completed, which speeds up detection, shortens the customer's waiting time, and provides a better customer experience.
  • Figure 1 is a schematic flow chart of a gene sequencing method provided by an embodiment of the present invention.
  • Figure 2 is a schematic diagram of a sequencing-while-analyzing application business process provided by an embodiment of the present application
  • Figure 3 is a schematic diagram of the time period for obtaining a three-stage test report through sample sequencing under multiple sample types
  • Figure 4 is a schematic structural diagram of a gene sequencing device provided by an embodiment of the present application.
  • Figure 5 is a hardware structural block diagram of a gene sequencing device provided by an embodiment of the present application.
  • In the prior art, the sample to be tested is first subjected to single-end (SE) or paired-end (PE) sequencing on a gene sequencer (such as Sequencer 200Plus), and the barcode tags of the sample to be tested are sequenced only at the end of the sequencing cycle. After the sequencing is completed, all sequencing data are uploaded to the server for data analysis, and a complete test report is obtained and provided to the customer.
  • this application provides a gene sequencing method, device, equipment and medium.
  • the gene sequencing method, device, equipment and medium can be applied to a gene sequencer.
  • the gene sequencing method may include:
  • Step S101: Obtain the gene sample to be detected and the preset read length.
  • The gene sample to be detected (that is, the genome to be detected) includes at least one sample.
  • Each sample contained in the gene sample is broken into long DNA fragments of relatively high molecular weight. The long DNA fragments are then distributed into reaction spaces carrying different tag sequences, and within each reaction space short sequences sharing the same barcode tag are prepared, so that the short sequences carrying barcode tags can be sequenced. That is, each sample includes at least one short sequence, and the at least one short sequence belongs to at least one long DNA fragment.
  • The gene sequence to be detected refers to the gene sequence that needs to be sequenced.
  • The barcode tag (barcode) can be used as the identity information of the gene sequence.
  • Each short sequence includes the gene sequence to be detected and at most two barcode tags.
  • In the case where a short sequence includes a barcode tag, the position of at least one barcode tag in the short sequence is located before the position of the gene sequence.
  • Each short sequence includes at most two barcode tags: if the gene sample includes one sample, there is no need to distinguish the sample by barcode tags, so each short sequence of the sample includes only the gene sequence to be detected and no barcode tag; if the gene sample includes multiple samples, they need to be distinguished by barcode tags, so each short sequence of each sample includes one or two barcode tags.
  • The position of at least one barcode tag in the short sequence is located before the position of the gene sequence in the short sequence. That is, if each short sequence of a sample includes one barcode tag, each short sequence of the sample consists of the barcode tag followed by the gene sequence; if each short sequence of a sample includes two barcode tags, at least one of the two barcode tags is located before the gene sequence in each short sequence. For example, if the two barcode tags are denoted barcode1 and barcode2, each short sequence of the sample can take the form BC1readBC2 or BC1BC2read, where BC represents a barcode and read represents the gene sequence.
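The two layouts named above, BC1readBC2 and BC1BC2read, can be illustrated at the string level. This is only a sketch: the function name `split_short_sequence` and the default barcode length of 10 bases are assumptions made for illustration and do not come from the application, and a real sequencer reads barcodes with dedicated primers rather than by slicing a string.

```python
def split_short_sequence(seq, layout, barcode_len=10):
    """Split a short sequence into (barcode1, barcode2, read) for a given layout.

    layout is "BC1readBC2" or "BC1BC2read" (names taken from the text);
    barcode_len is a hypothetical fixed barcode length.
    """
    if len(seq) <= 2 * barcode_len:
        raise ValueError("sequence too short for two barcodes and a read")
    if layout == "BC1BC2read":
        # Both barcodes come first, the gene sequence follows.
        bc1 = seq[:barcode_len]
        bc2 = seq[barcode_len:2 * barcode_len]
        read = seq[2 * barcode_len:]
    elif layout == "BC1readBC2":
        # One barcode before the gene sequence, one after it.
        bc1 = seq[:barcode_len]
        read = seq[barcode_len:-barcode_len]
        bc2 = seq[-barcode_len:]
    else:
        raise ValueError(f"unknown layout: {layout}")
    return bc1, bc2, read
```

In both layouts barcode1 is the leading element, which is what allows it to be read before the gene sequence.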
  • The above-mentioned preset read length refers to the number of sequencing cycles after which a report is generated.
  • The specific value of the preset read length can be determined according to the actual situation. For example, in the currently achievable scenario, for single-end (SE) sequencing the customer can choose any read length between 1 and 100 bp (base pairs). A preset read length of 40 bp indicates that this embodiment needs to generate an intermediate stage detection report when the gene sequence has been sequenced for 40 cycles.
  • The preset read length includes at least one read length. For example, preset read lengths of 40 bp and 80 bp indicate that this embodiment needs to generate a first intermediate stage detection report when the gene sequence has been sequenced for 40 cycles, and a second intermediate stage detection report when the gene sequence has been sequenced for 80 cycles.
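The relationship between preset read lengths and report generation can be sketched as a simple schedule. The function name `report_schedule` and the report labels are hypothetical; the default full read length of 100 cycles is an assumption based on the 1 to 100 bp range mentioned above.

```python
def report_schedule(preset_read_lengths, full_read_length=100):
    """Return (cycle, report_kind) pairs in the order the reports become due.

    preset_read_lengths: cycle counts at which intermediate reports are wanted.
    full_read_length: assumed total number of cycles for the complete report.
    """
    checkpoints = sorted(set(preset_read_lengths))
    for cycles in checkpoints:
        if not 1 <= cycles <= full_read_length:
            raise ValueError(
                f"preset read length {cycles} is outside 1..{full_read_length}"
            )
    # One intermediate report per preset read length, then the complete report.
    schedule = [(cycles, "intermediate") for cycles in checkpoints]
    schedule.append((full_read_length, "complete"))
    return schedule


schedule = report_schedule([40, 80])
```

With preset read lengths of 40 and 80, the schedule holds two intermediate reports followed by the complete report at the assumed final cycle.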
  • Step S102: Determine the sample type to which each sample included in the gene sample belongs.
  • Sample types include but are not limited to the following four types: the barcode-free single-sample type NoneBC, the single-barcode multi-sample type SingleBC, the multi-sample type BC1BC2read with dual barcodes on a single strand, and the multi-sample type BC1readBC2 with dual barcodes on both strands.
  • "Single sample" and "multiple samples" here refer to the number of samples contained in the gene sample. If the gene sample contains one sample, the sample belongs to the single-sample type. If the gene sample contains multiple samples, the multiple samples belong to a multi-sample type.
  • When the gene sample contains multiple samples, the samples need to be distinguished by barcode tags.
  • The short sequence of each sample can contain one barcode tag or two barcode tags.
  • If the short sequence of a sample contains one barcode tag, that is, the short sequence of the sample is obtained by splicing the barcode tag and the gene sequence in that order (the barcode tag comes first and the gene sequence follows), this step determines that the sample belongs to the single-barcode multi-sample type.
  • If the short sequence of a sample contains two barcode tags on the same strand, this step determines that the sample belongs to the multi-sample type with dual barcodes on a single strand. For example, if barcode1 and barcode2 are both on one strand of the gene sequence (barcode1 and barcode2 come first, and the gene data on that strand comes after), the sample is determined to belong to the multi-sample type with dual barcodes on a single strand.
  • If the short sequence of a sample contains two barcode tags on two strands, this step determines that the sample belongs to the multi-sample type with dual barcodes on both strands. For example, if barcode1 is at the start of the first strand (barcode1 comes first, and the gene data on the first strand comes after) and barcode2 is at the end of the second strand (the gene data on the second strand comes first, and barcode2 comes after), the sample is determined to belong to the multi-sample type with dual barcodes on both strands.
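The decision rules of step S102 can be summarized in a short sketch. The enum values reuse the type names from the text (NoneBC, SingleBC, BC1BC2read, BC1readBC2); the class name, function name and parameters are hypothetical and only illustrate the branching, not how an instrument would actually detect the barcode configuration.

```python
from enum import Enum


class SampleType(Enum):
    NONE_BC = "NoneBC"                        # single sample, no barcode
    SINGLE_BC = "SingleBC"                    # multiple samples, one barcode
    DUAL_BC_SINGLE_STRAND = "BC1BC2read"      # two barcodes on the same strand
    DUAL_BC_DOUBLE_STRAND = "BC1readBC2"      # two barcodes on two strands


def classify_sample(num_samples, num_barcodes, same_strand=None):
    """Return the sample type for one sample of a gene sample.

    num_samples: number of samples in the gene sample.
    num_barcodes: barcode tags per short sequence of this sample.
    same_strand: for two barcodes, whether both sit on the same strand.
    """
    if num_samples < 1:
        raise ValueError("a gene sample includes at least one sample")
    if num_samples == 1:
        # A single sample does not need to be distinguished by barcodes.
        return SampleType.NONE_BC
    if num_barcodes == 1:
        return SampleType.SINGLE_BC
    if num_barcodes == 2:
        if same_strand is None:
            raise ValueError("same_strand must be given for two barcodes")
        if same_strand:
            return SampleType.DUAL_BC_SINGLE_STRAND
        return SampleType.DUAL_BC_DOUBLE_STRAND
    raise ValueError("a multi-sample short sequence carries one or two barcodes")
```

For example, `classify_sample(3, 2, same_strand=False)` yields the BC1readBC2 type, in which the second barcode can only be read after the gene sequence.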
  • Step S103: For each sample in the gene sample, sequence each short sequence of the sample according to the sequencing order corresponding to the sample type to which the sample belongs; when the gene sequence in each short sequence of the sample has been sequenced to the preset read length, the intermediate stage sequencing result data of each short sequence in the sample is obtained.
  • If the gene sample includes one sample, the intermediate stage sequencing result data includes the intermediate stage sequencing result data of the gene sequence. If the gene sample includes multiple samples, the intermediate stage sequencing result data includes both the intermediate stage sequencing result data of the gene sequence and the barcode tag sequencing result data.
  • The multi-sample types include the single-barcode multi-sample type, the multi-sample type with dual barcodes on a single strand, and the multi-sample type with dual barcodes on both strands.
  • Step S104: Send the intermediate stage sequencing result data of each short sequence in each sample to the target server, so that the target server can perform data analysis on that data and obtain an intermediate stage detection report.
  • If the gene sample includes one sample, the target server directly performs data analysis on the intermediate stage sequencing result data of each short sequence in the sample to obtain an intermediate stage detection report.
  • If the gene sample includes multiple samples, the target server performs data analysis on the intermediate stage sequencing result data of the gene sequence of each short sequence in the multiple samples, based on the barcode tag sequencing result data of the multiple samples, and obtains an intermediate stage detection report.
  • The intermediate stage detection report includes the pathogen identification results of the reads obtained under the preset read length. Since sequencing is not yet complete, the pathogen identification results in the intermediate stage detection report are relatively coarse. However, if the preset read length is chosen appropriately, relatively accurate pathogen identification results can also be obtained. For example, when the preset read length is 40 bp, the preliminary identification results obtained by data analysis are basically consistent with the identification results from the complete read length (i.e. 100 bp) sequencing data, which indicates that the intermediate stage detection report obtained at a 40 bp read length is reasonably accurate.
  • This application provides a gene sequencing method.
  • When the short sequence of a sample includes a barcode tag, the position of at least one barcode tag in the short sequence is located before the position of the gene sequence.
  • Therefore, when each short sequence of the sample is sequenced according to the sequencing order corresponding to the sample type to which the sample belongs, at least one barcode tag at the front of each short sequence is sequenced first, and the gene sequence in each short sequence is sequenced afterwards.
  • As a result, even if the gene sample includes multiple samples, the target server can still generate an intermediate stage detection report from the intermediate stage sequencing result data obtained when sequencing reaches the preset read length.
  • Preliminary pathogen identification can thus be carried out before sequencing is complete, which speeds up detection, shortens the customer's waiting time, and provides a better customer experience.
  • With this method, pathogen identification results can be obtained within 10 hours, whereas current gene sequencing technology requires more than 24 hours to obtain accurate identification results, which makes for a poor customer experience.
  • The customer can set the preset read length according to their own needs. If the customer has ample time to wait for the pathogen identification results, one or more larger preset read lengths can be set so as to obtain more accurate pathogen identification results. If the customer does not have sufficient time, one or more smaller preset read lengths can be set so as to obtain preliminary, coarse pathogen identification results in a short time.
  • part of the data can be obtained for analysis in the intermediate sequencing stage, and a preliminary report can be obtained for customers to perform preliminary screening, which shortens the waiting time and improves customer experience.
  • The sequencing orders corresponding to the four sample types provided in step S102 are introduced below.
  • The sequencing order corresponding to each of the four sample types depends on the positions of the barcode tags and the gene sequence in the samples of that type.
  • The sequencing order corresponding to the barcode-free single-sample type is: sequence the gene sequence in each short sequence of the sample.
  • For this sample type, the single-end (SE) sequencing process includes: DNA nanoball (DNB) loading -> preloading (loading prime) -> loading (postloading) -> sequencing preprocessing (sequence prime) -> sequencing preprocessing cleanup -> read1 (first segment of the DNB) sequencing;
  • the paired-end (PE) sequencing process includes: DNB loading -> preloading -> loading -> sequencing preprocessing -> sequencing preprocessing cleanup -> read1 (first segment of the DNB) sequencing -> second-strand synthesis (PE synthesis) -> read2 (second segment of the DNB) sequencing.
  • The sequencing order corresponding to the single-barcode multi-sample type is: sequence the barcode tag in each short sequence of the sample and, after the barcode tag sequencing is completed, sequence the gene sequence in each short sequence of the sample.
  • For this sample type, the single-end sequencing process includes: DNB loading -> preloading -> loading -> barcode preprocessing (barcode prime) -> barcode sequencing -> sequencing preprocessing -> sequencing preprocessing cleanup -> read1 (first segment of the DNB) sequencing;
  • the paired-end sequencing process includes: DNB loading -> preloading -> loading -> barcode preprocessing -> barcode sequencing -> sequencing preprocessing -> sequencing preprocessing cleanup -> read1 (first segment of the DNB) sequencing -> second-strand synthesis -> read2 (second segment of the DNB) sequencing.
  • The specific processes of each of the above sub-processes are the same as those in the prior art, and will not be described again here.
  • The sequencing order corresponding to the multi-sample type with dual barcodes on a single strand is: sequence the two barcode tags in each short sequence of the sample one after the other and, after the sequencing of the two barcode tags is completed, sequence the gene sequence in each short sequence of the sample.
  • For this sample type, the single-end sequencing process includes: DNB loading -> preloading -> loading -> barcode1 preprocessing -> barcode1 sequencing -> barcode2 preprocessing -> barcode2 sequencing -> sequencing preprocessing -> sequencing preprocessing cleanup -> read1 (first segment of the DNB) sequencing;
  • the paired-end sequencing process includes: DNB loading -> preloading -> loading -> barcode1 preprocessing -> barcode1 sequencing -> barcode2 preprocessing -> barcode2 sequencing -> sequencing preprocessing -> sequencing preprocessing cleanup -> read1 (first segment of the DNB) sequencing -> second-strand synthesis -> read2 (second segment of the DNB) sequencing.
  • The sequencing order corresponding to the multi-sample type with dual barcodes on both strands is: sequence the first barcode tag in each short sequence of the sample; after the sequencing of the first barcode tag is completed, sequence the gene sequence in each short sequence of the sample; and after the gene sequence sequencing is completed, sequence the second barcode tag in each short sequence of the sample.
  • For this sample type, the single-end sequencing process includes: DNB loading -> preloading -> loading -> barcode1 preprocessing -> barcode1 sequencing -> sequencing preprocessing -> sequencing preprocessing cleanup -> read1 (first segment of the DNB) sequencing -> barcode2 preprocessing -> barcode2 sequencing;
  • the paired-end sequencing process includes: DNB loading -> preloading -> loading -> barcode1 preprocessing -> barcode1 sequencing -> sequencing preprocessing -> sequencing preprocessing cleanup -> read1 (first segment of the DNB) sequencing -> second-strand synthesis -> read2 (second segment of the DNB) sequencing -> barcode2 preprocessing -> barcode2 sequencing.
  • The specific processes of each of the above sub-processes are the same as those in the prior art, and will not be described again here.
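The step lists above follow one pattern: common loading steps, then any barcode read before the gene sequence, then the read steps, then any trailing barcode. A minimal sketch of that pattern is shown below. The step names are taken from the text; the function names and the use of plain strings to represent steps are assumptions for illustration only.

```python
BASE_STEPS = ["DNB loading", "preloading", "loading"]
READ1_STEPS = [
    "sequencing preprocessing",
    "sequencing preprocessing cleanup",
    "read1 sequencing",
]
PE_STEPS = ["second-strand synthesis", "read2 sequencing"]


def barcode_steps(name):
    """Return the preprocessing and sequencing steps for one barcode."""
    return [f"{name} preprocessing", f"{name} sequencing"]


def build_workflow(sample_type, paired_end=False):
    """Return the ordered step list for a sample type (SE by default)."""
    steps = list(BASE_STEPS)
    # Barcodes positioned before the gene sequence are read first.
    if sample_type == "SingleBC":
        steps += barcode_steps("barcode")
    elif sample_type in ("BC1BC2read", "BC1readBC2"):
        steps += barcode_steps("barcode1")
        if sample_type == "BC1BC2read":
            steps += barcode_steps("barcode2")
    elif sample_type != "NoneBC":
        raise ValueError(f"unknown sample type: {sample_type}")
    steps += READ1_STEPS
    if paired_end:
        steps += PE_STEPS
    # For BC1readBC2 the second barcode sits after the gene sequence.
    if sample_type == "BC1readBC2":
        steps += barcode_steps("barcode2")
    return steps
```

Calling `build_workflow("BC1readBC2", paired_end=True)` reproduces the paired-end list given for the dual-barcode, both-strands type, with barcode2 read last.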
  • This embodiment sequences the barcode first without affecting the sequencing quality.
  • Historical barcode primers refer to barcode primers used in currently existing gene sequencing technologies. Those skilled in the art should understand that the length of historical barcode primers is usually 32 bp.
  • The length of the barcode primers used when sequencing each short sequence of any sample of a multi-sample type can be 25 bp. It should be noted that 25 bp is only an example and does not limit this application.
  • In order to prevent the DNB structure from loosening when the barcode tag is sequenced first, xlinker, which stabilizes the nanoball structure, can be added before the barcode primer to stabilize the structure of the DNB.
  • After this optimization, the sequencing quality differs little from that of conventional sequencing.
  • When the gene sample includes multiple samples, the multiple samples need to be sequenced together, so the intermediate stage sequencing result data obtained for the short sequences of all samples is also mixed together and needs to be split according to the barcode tags.
  • Accordingly, the process of step S104 of "sending the intermediate stage sequencing result data of each short sequence in each sample to the target server" may include: splitting and classifying the intermediate stage sequencing result data of each short sequence in the multiple samples according to the barcode tags corresponding to the multiple samples, to obtain the intermediate stage sequencing result data corresponding to each of the multiple samples; and sending the intermediate stage sequencing result data corresponding to each of the multiple samples to the target server.
  • The above "splitting and classifying the intermediate stage sequencing result data of each short sequence in the multiple samples according to the barcode tags corresponding to the multiple samples" specifically means that the intermediate stage sequencing result data of the short sequences is split and classified according to the barcode tag corresponding to each sample, so that the intermediate stage sequencing result data of each sample is grouped together.
  • The intermediate stage sequencing result data of samples of the single-barcode multi-sample type can be split and classified according to the single barcode tag in the intermediate stage of sequencing; that of samples of the multi-sample type with dual barcodes on a single strand can be split and classified according to both barcode tags in the intermediate stage of sequencing; that of samples of the multi-sample type with dual barcodes on both strands can be split and classified in the intermediate stage of sequencing according to the first barcode tag (i.e. the barcode tag positioned before the gene sequence), and is split and classified according to both barcode tags at the end of sequencing.
  • this embodiment can continue sequencing while generating an intermediate stage detection report, so that a complete detection report can be obtained when the sequencing is completed.
  • the embodiments of the present application can also perform complete sequencing of each short sequence of each sample in the gene sample according to the sequencing order corresponding to the sample type to which the sample belongs, obtain the complete sequencing result data of each short sequence in the sample, and send the complete sequencing result data of each short sequence in each sample to the target server, so that the target server can perform data analysis on the complete sequencing result data of each short sequence in each sample and obtain a complete detection report.
  • in this embodiment, the process of "performing complete sequencing of each short sequence of the sample according to the sequencing order corresponding to the sample type to which the sample belongs, and obtaining complete sequencing result data of each short sequence in the sample" may include: when the sample type to which the sample belongs is the barcode-free single sample type, sequencing the gene sequence in each short sequence of the sample; when the sample type to which the sample belongs is the single barcode multi-sample type, sequencing the barcode tag in each short sequence of the sample and, after the sequencing of the barcode tag is completed, sequencing the gene sequence in each short sequence of the sample; when the sample type to which the sample belongs is the multi-sample type with double barcodes on a single strand, sequencing the two barcode tags in each short sequence of the sample separately and, after the sequencing of the two barcode tags is completed, sequencing the gene sequence in each short sequence of the sample; when the sample type to which the sample belongs is the multi-sample type with double barcodes on double strands, sequencing the first barcode tag in each short sequence of the sample, then, after the first barcode tag sequencing is completed, sequencing the gene sequence in each short sequence of the sample, and, after the gene sequence sequencing is completed, sequencing the second barcode tag in each short sequence of the sample.
  • At least one barcode tag can be placed at the front of the entire sequencing process.
  • data analysis can be started after obtaining part of the sequencing data, and the base information of the sequencing result data in the intermediate stage can be used.
  • the intermediate stage test report includes the intermediate stage quality control results and the intermediate stage identification results of each sample, and the complete test report includes the complete quality control results, complete identification results, complete assembly results and complete traceability results of each sample.
  • the intermediate stage quality control results and complete quality control results of a sample are both used to reflect the short sequences in the sample whose quality is higher than the preset quality threshold; the intermediate stage identification results and complete identification results of a sample are both used to reflect the pathogen concentration information of the sample; the complete assembly result of a sample is used to reflect the recombinant sample obtained by assembling all short sequences of the sample; and the complete traceability result of a sample is used to reflect the subtype to which the sample belongs.
  • the target server performs data analysis based on the sequencing result data of each short sequence in the sample, including the four stages of quality control, identification, assembly and traceability.
  • quality control refers to determining whether the quality of each short sequence is higher than the preset quality threshold, and screening out short sequences whose quality is lower than the preset quality threshold; identification refers to comparing the sequencing result data of each short sequence in the sample with a database of known pathogen sequences to determine the pathogen concentration information of the sample; assembly refers to splicing the sequencing result data of all short sequences contained in the sample into long sequence fragments; traceability refers to comparing the spliced long sequence fragments with samples of known subtypes in databases of different countries and regions to determine the subtype to which the sample belongs.
  • for both intermediate stage sequencing and complete sequencing, analysis can be performed according to the above four stages.
  • alternatively, only quality control and identification may be performed for intermediate stage sequencing to reduce the intermediate detection time.
  • the intermediate stage detection report under the first read length refers to the detection report obtained by analyzing the intermediate stage sequencing result data of each short sequence of each sample under the first read length; the intermediate stage detection report under the second read length and the complete detection report both refer to the detection report obtained by analyzing the intermediate stage sequencing result data, under the second read length, of the short sequences identified as non-host in each sample.
  • after the target server obtains the intermediate stage sequencing result data of each short sequence of each sample under the first read length, it can determine through analysis whether each short sequence of the sample is a pathogenic sequence, a host sequence (such as a human-derived sample sequence, an animal sample sequence, etc.) or an unidentified sequence; the detection report obtained by analysis at this time is the intermediate stage detection report under the first read length.
  • after the target server obtains the intermediate stage sequencing result data of each short sequence of each sample under the second read length, it can analyze the intermediate stage sequencing result data, under the second read length, of the short sequences identified as non-host in each sample, and obtain the intermediate stage detection report under the second read length.
  • non-host includes pathogenic and unidentified sequences.
  • the target server can analyze the intermediate stage sequencing result data of each short sequence of each sample at 40 bp to determine which short sequences in the sample are pathogenic sequences and which are host sequences or unidentified sequences (optionally, each short sequence has a corresponding sequence number, and the sequence number is used to mark which short sequences are pathogenic sequences and which are host sequences or unidentified sequences).
  • the intermediate stage sequencing result data of those short sequences identified as non-hosts in each sample can be analyzed under the second read length to improve analysis efficiency.
  • this embodiment can also perform supplementary analysis on the complete sequencing result data of those short sequences identified as hosts in each sample to make the analysis results more complete.
  • Figure 2 is a schematic diagram of a sequencing-while-analyzing application business process provided by an embodiment of the present application.
  • you can set the numbers of cycles at which reports are generated, that is, the preset read lengths (for example, 40 bp and 100 bp), then obtain the gene sample and determine the sample type to which each sample included in the gene sample belongs.
  • each short sequence of each sample is sequenced according to the sequencing order corresponding to the sample type.
  • in stage 1, the intermediate stage sequencing result data of each short sequence in each sample is obtained, and a 40-cycle report and fq file are generated from it (fq is the main result file generated by sequencing; its full name is FASTQ file, and it includes the sequencing result data and the corresponding quality values); sequencing of the gene sequence in each short sequence of each sample then continues.
  • in stage 2, the intermediate stage sequencing result data of each short sequence in each sample is obtained, and a 100-cycle report and fq file are generated from it; when sequencing is completed, the complete sequencing report and fq file are obtained.
  • the sequencing result data obtained in the above three stages can be uploaded to the target server for data analysis to obtain test reports corresponding to the three stages.
  • the complete test report has the highest accuracy, the second intermediate stage detection report obtained in stage 2 has the second highest accuracy, and the first intermediate stage detection report obtained in stage 1 has the lowest accuracy.
  • see FIG. 3 for a schematic diagram of the time needed to obtain the three-stage test reports when sequencing samples under the multi-sample types.
  • the report is issued in the 40th cycle of the sequencing process, and the time to obtain the intermediate stage detection report is 5.5 hours (h).
  • the report is issued in the 100th cycle, and the intermediate stage detection report is obtained.
  • complete PE100 sequencing takes 24.5 hours, after which the complete test report is obtained; when sequencing samples under the multi-sample type with double barcodes on a single strand, the times to obtain the three-stage test reports are, respectively, 6.5 hours for 40 cycles
  • Figure 4 is a schematic structural diagram of an embodiment of a gene sequencing device provided by an embodiment of the present application, corresponding to a gene sequencing method provided by an embodiment of the present application described in Figure 1.
  • in practical applications, the gene sequencing device described in this embodiment can be applied to a gene sequencer, and the device can include:
  • the data acquisition module 401 is used to obtain the gene sample to be detected and the preset read length, wherein the gene sample includes at least one sample, any sample includes at least one short sequence, and each short sequence includes the gene sequence to be detected and at most two barcode tags; where a short sequence includes a barcode tag, the position of at least one barcode tag in the short sequence is located before the position of the gene sequence.
  • the sample type determination module 402 is used to determine the sample type to which each sample contained in the genetic sample belongs.
  • the first sequencing module 403 is configured to sequence each short sequence of each sample in the gene sample according to the sequencing order corresponding to the sample type to which the sample belongs, until the gene sequence in each short sequence of the sample is sequenced to the preset read length, whereupon the intermediate stage sequencing result data of each short sequence in the sample is obtained.
  • the sequencing result data sending module 404 is used to send the intermediate stage sequencing result data of each short sequence in each sample to the target server, so that the target server can perform data analysis on the intermediate stage sequencing result data of each short sequence in each sample and obtain the intermediate stage detection report.
  • the above-mentioned sample type determination module 402 can be specifically used to: when the gene sample includes one sample, determine that the sample belongs to the barcode-free single sample type; when the gene sample includes multiple samples, for each of the multiple samples, determine that the sample belongs to the single barcode multi-sample type if its short sequence includes one barcode tag, to the multi-sample type with double barcodes on a single strand if its short sequence includes two barcode tags located on the same strand, and to the multi-sample type with double barcodes on double strands if its short sequence includes two barcode tags located on two strands.
  • the sequencing sequence corresponding to the above barcode-free single sample type is: sequencing the gene sequence in each short sequence of the sample under the barcode-free single sample type.
  • the sequencing order corresponding to the above single barcode multi-sample type is: sequence the barcode tag in each short sequence of the sample under the single barcode multi-sample type and, after the barcode tag sequencing is completed, sequence the gene sequence in each short sequence of the sample under the single barcode multi-sample type.
  • the sequencing order corresponding to the above multi-sample type with double barcodes on a single strand is: sequence the two barcode tags in each short sequence of the sample under this type separately and, after the sequencing of the two barcode tags is completed, sequence the gene sequence in each short sequence of the sample under this type.
  • the sequencing order corresponding to the above multi-sample type with double barcodes on double strands is: sequence the first barcode tag in each short sequence of the sample under this type; after the first barcode tag is sequenced, sequence the gene sequence in each short sequence of the sample under this type; after the gene sequence sequencing is completed, sequence the second barcode tag in each short sequence of the sample under this type.
  • the length of the barcode primer used when sequencing each short sequence of any sample under a multi-sample type is smaller than the length of the historical barcode primer, wherein the multi-sample types include the single barcode multi-sample type, the multi-sample type with double barcodes on a single strand and the multi-sample type with double barcodes on double strands.
  • the above-mentioned preset read length includes at least one read length.
  • the above-mentioned sequencing result data sending module 404 can specifically split and classify the intermediate stage sequencing result data of each short sequence in the multiple samples according to the barcode tags respectively corresponding to the multiple samples, to obtain the intermediate stage sequencing result data respectively corresponding to the multiple samples, and send the intermediate stage sequencing result data respectively corresponding to the multiple samples to the target server.
  • the gene sequencing device provided by the embodiment of the present application may further include: a second sequencing module and a complete sequencing result data sending module.
  • the second sequencing module is used to completely sequence each short sequence of each sample in the gene sample according to the sequencing order corresponding to the sample type to which the sample belongs, and obtain the complete sequencing result data of each short sequence in the sample.
  • the complete sequencing result data sending module is used to send the complete sequencing result data of each short sequence in each sample to the target server, so that the target server can perform data analysis on the complete sequencing result data of each short sequence in each sample and obtain the complete test report.
  • the intermediate stage test report includes the intermediate stage quality control results and the intermediate stage identification results of each sample, and the complete test report includes the complete quality control results, complete identification results, complete assembly results and complete traceability results of each sample.
  • the intermediate stage quality control results and complete quality control results of a sample are both used to reflect the short sequences in the sample whose quality is higher than the preset quality threshold; the intermediate stage identification results and complete identification results of a sample are both used to reflect the pathogen concentration information of the sample; the complete assembly result of a sample is used to reflect the recombinant sample obtained by assembling all short sequences of the sample; and the complete traceability result of a sample is used to reflect the subtype to which the sample belongs.
  • the intermediate stage detection report under the first read length refers to the detection report obtained by analyzing the intermediate stage sequencing result data of each short sequence of each sample under the first read length; the intermediate stage detection report under the second read length and the complete detection report both refer to the detection report obtained by analyzing the intermediate stage sequencing result data, under the second read length, of the short sequences identified as non-host in each sample.
  • FIG. 5 shows a hardware structure block diagram of a gene sequencing device.
  • the hardware structure of the gene sequencing device may include: at least one processor 501, at least one communication interface 502, at least one memory 503 and at least one communication bus 504;
  • the number of the processor 501, the communication interface 502, the memory 503, and the communication bus 504 is at least one, and the processor 501, the communication interface 502, and the memory 503 complete communication with each other through the communication bus 504;
  • the processor 501 may be a central processing unit (CPU), or an application specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention, etc.;
  • the memory 503 may include high-speed RAM memory, and may also include non-volatile memory, such as at least one disk memory;
  • the memory 503 stores a program
  • the processor 501 can call the program stored in the memory 503.
  • the program is used for:
  • obtaining the gene sample to be detected and the preset read length, wherein the gene sample includes at least one sample, any sample includes at least one short sequence, and each short sequence includes the gene sequence to be detected and at most two barcode tags; where a short sequence includes a barcode tag, the position of at least one barcode tag in the short sequence is located before the position of the gene sequence;
  • embodiments of the present application also provide a readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the above gene sequencing method is implemented.

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Organic Chemistry (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Biotechnology (AREA)
  • General Health & Medical Sciences (AREA)
  • Microbiology (AREA)
  • Genetics & Genomics (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Medicinal Chemistry (AREA)
  • Sustainable Development (AREA)
  • Biomedical Technology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Analytical Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Immunology (AREA)
  • Molecular Biology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

A gene sequencing method, apparatus and device, and a medium. The method comprises: acquiring a gene sample to be detected and a preset read length; determining a sample type to which each sample contained in the gene sample belongs; for each sample in the gene sample, sequencing each short sequence of the sample according to a sequencing sequence corresponding to the sample type to which the sample belongs until the gene sequence in each short sequence of the sample is sequenced to the preset read length, to obtain intermediate stage sequencing result data of each short sequence in the sample; and sending the intermediate stage sequencing result data of each short sequence in each sample to a target server, so that the target server performs data analysis on the intermediate stage sequencing result data of each short sequence in each sample to obtain an intermediate stage detection report.

Description

A gene sequencing method, device, equipment and medium
Technical field
The present invention belongs to the technical field of gene sequencing, and specifically relates to a gene sequencing method, device, equipment and medium.
Background
Gene sequencing is a new type of genetic testing technology that can analyze and determine the complete sequence of genes from blood or saliva, in order to predict the likelihood of developing various diseases as well as an individual's behavioral characteristics and whether that behavior is reasonable.
In the current sequencing workflow, both single-end and paired-end sequencing take a long time: for example, PE100 sequencing takes more than 24 hours and SE100 sequencing takes about 12 hours. As a result, customers wait a long time for report results, and the customer experience is poor.
Summary of the invention
This application provides a gene sequencing method, device, equipment and medium to solve the technical problem in the prior art that, because barcode tag sequencing is placed at the end of the sequencing cycles, obtaining report results takes a long time and the customer experience is poor.
To achieve the above objective, this application provides the following technical solutions:
In a first aspect, a gene sequencing method is provided, including:
obtaining a gene sample to be detected and a preset read length, wherein the gene sample includes at least one sample, any sample includes at least one short sequence, and each short sequence includes a gene sequence to be detected and at most two barcode tags; where a short sequence includes a barcode tag, the position of at least one barcode tag in the short sequence is located before the position of the gene sequence;
determining the sample type to which each sample contained in the gene sample belongs;
for each sample in the gene sample, sequencing each short sequence of the sample according to the sequencing order corresponding to the sample type to which the sample belongs, until the gene sequence in each short sequence of the sample is sequenced to the preset read length, to obtain intermediate stage sequencing result data of each short sequence in the sample;
sending the intermediate stage sequencing result data of each short sequence in each sample to a target server, so that the target server performs data analysis on the intermediate stage sequencing result data of each short sequence in each sample to obtain an intermediate stage detection report.
In a first possible implementation of the first aspect, determining the sample type to which each sample contained in the gene sample belongs includes:
where the gene sample includes one sample, determining that the sample belongs to the barcode-free single sample type;
where the gene sample includes multiple samples, for each of the multiple samples: if the short sequence of the sample includes one barcode tag, determining that the sample belongs to the single barcode multi-sample type; if the short sequence of the sample includes two barcode tags located on the same strand, determining that the sample belongs to the multi-sample type with double barcodes on a single strand; and if the short sequence of the sample includes two barcode tags located on two strands, determining that the sample belongs to the multi-sample type with double barcodes on double strands.
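The decision rule in this implementation can be sketched in a few lines of Python. This is only an illustrative sketch: the `Sample` record, its field names and the `classify` function are hypothetical and do not come from the application; only the four sample types and the conditions that select them follow the text above.

```python
from dataclasses import dataclass
from enum import Enum


class SampleType(Enum):
    NO_BARCODE_SINGLE = "barcode-free single sample"
    SINGLE_BARCODE_MULTI = "single barcode multi-sample"
    DUAL_BARCODE_SINGLE_STRAND = "double barcodes on a single strand"
    DUAL_BARCODE_DOUBLE_STRAND = "double barcodes on double strands"


@dataclass(frozen=True)
class Sample:
    # Hypothetical description of a sample's short-sequence layout.
    name: str
    barcode_count: int                    # barcode tags per short sequence (0, 1 or 2)
    barcodes_on_same_strand: bool = True  # only meaningful when barcode_count == 2


def classify(samples: list[Sample]) -> dict[str, SampleType]:
    """Map each sample name to its sample type, following the rule in the text."""
    if not samples:
        raise ValueError("the gene sample must contain at least one sample")
    if len(samples) == 1:
        # A gene sample containing one sample is the barcode-free single sample type.
        return {samples[0].name: SampleType.NO_BARCODE_SINGLE}
    result: dict[str, SampleType] = {}
    for s in samples:
        if s.barcode_count == 1:
            result[s.name] = SampleType.SINGLE_BARCODE_MULTI
        elif s.barcode_count == 2 and s.barcodes_on_same_strand:
            result[s.name] = SampleType.DUAL_BARCODE_SINGLE_STRAND
        elif s.barcode_count == 2:
            result[s.name] = SampleType.DUAL_BARCODE_DOUBLE_STRAND
        else:
            # Pooled samples without 1 or 2 barcode tags are not covered by the text.
            raise ValueError(f"unsupported barcode layout for sample {s.name!r}")
    return result
```

Raising an error for pooled samples that carry no barcode tag is a choice made for this sketch; the application does not say how such input should be handled.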
In a second possible implementation of the first aspect, the sequencing order corresponding to the barcode-free single sample type is: sequencing the gene sequence in each short sequence of the sample under the barcode-free single sample type;
the sequencing order corresponding to the single barcode multi-sample type is: sequencing the barcode tag in each short sequence of the sample under the single barcode multi-sample type and, after the barcode tag sequencing is completed, sequencing the gene sequence in each short sequence of the sample under the single barcode multi-sample type;
the sequencing order corresponding to the multi-sample type with double barcodes on a single strand is: sequencing the two barcode tags in each short sequence of the sample under this type separately and, after the sequencing of the two barcode tags is completed, sequencing the gene sequence in each short sequence of the sample under this type;
the sequencing order corresponding to the multi-sample type with double barcodes on double strands is: sequencing the first barcode tag in each short sequence of the sample under this type; after the sequencing of the first barcode tag is completed, sequencing the gene sequence in each short sequence of the sample under this type; and after the gene sequence sequencing is completed, sequencing the second barcode tag in each short sequence of the sample under this type.
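The four sequencing orders can be written as a small lookup table, which also makes it easy to see how many cycles elapse before the gene sequence reaches the preset read length. The sketch below is illustrative only: the dictionary keys, step names and the function `cycles_until_intermediate` are hypothetical, and the barcode length is passed in as a parameter because the text above does not fix it.

```python
# Order of sequencing steps per sample type, as described in the text.
SEQUENCING_ORDER = {
    "no_barcode_single": ["gene"],
    "single_barcode_multi": ["barcode1", "gene"],
    "dual_barcode_single_strand": ["barcode1", "barcode2", "gene"],
    "dual_barcode_double_strand": ["barcode1", "gene", "barcode2"],
}


def cycles_until_intermediate(sample_type: str, barcode_len: int,
                              preset_read_length: int) -> int:
    """Cycles needed until the gene sequence has been read to the preset length.

    Assumes one base per cycle and that every barcode tag has the same length.
    """
    cycles = 0
    for step in SEQUENCING_ORDER[sample_type]:
        if step == "gene":
            # Intermediate data is available once the gene read reaches the preset length.
            return cycles + preset_read_length
        cycles += barcode_len  # barcode tags sequenced before the gene sequence
    raise ValueError(f"no gene step defined for {sample_type!r}")
```

With a hypothetical 10-base barcode and a preset read length of 40, only the barcode tags placed before the gene sequence add to the wait; the second barcode of the double-strand type is read after the gene sequence and does not delay the intermediate data.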
In a third possible implementation of the first aspect, the length of the barcode primer used when sequencing each short sequence of any sample under a multi-sample type is smaller than the length of the historical barcode primer, wherein the multi-sample types include the single barcode multi-sample type, the multi-sample type with double barcodes on a single strand and the multi-sample type with double barcodes on double strands.
In a fourth possible implementation of the first aspect, the preset read length includes at least one read length.
In a fifth possible implementation of the first aspect, if the gene sample includes multiple samples, sending the intermediate stage sequencing result data of each short sequence in each sample to the target server includes:
splitting and classifying the intermediate stage sequencing result data of each short sequence in the multiple samples according to the barcode tags respectively corresponding to the multiple samples, to obtain the intermediate stage sequencing result data respectively corresponding to the multiple samples;
sending the intermediate stage sequencing result data respectively corresponding to the multiple samples to the target server.
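The splitting step can be illustrated with a minimal demultiplexing sketch. It assumes, purely for illustration, that each read is a plain string in which a fixed-length barcode tag precedes the gene sequence and that barcodes are matched exactly; the function name `demultiplex` and the `"undetermined"` bucket are hypothetical and not taken from the application.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_len):
    """Group reads by sample using the barcode tag at the start of each read.

    reads: iterable of base strings, barcode first, then the gene sequence.
    barcode_to_sample: dict mapping barcode string to sample name.
    Returns a dict mapping sample name to the list of gene-sequence parts.
    """
    by_sample = defaultdict(list)
    for read in reads:
        barcode, gene_part = read[:barcode_len], read[barcode_len:]
        # Exact-match lookup; reads with unknown barcodes are set aside.
        sample = barcode_to_sample.get(barcode, "undetermined")
        by_sample[sample].append(gene_part)
    return dict(by_sample)
```

Because the barcode tag is read before the gene sequence, this grouping can run on partially sequenced reads, which is what allows per-sample intermediate data to be sent before sequencing finishes.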
In a sixth possible implementation of the first aspect, the gene sequencing method further includes:
for each sample in the gene sample, performing complete sequencing of each short sequence of the sample according to the sequencing order corresponding to the sample type to which the sample belongs, to obtain complete sequencing result data of each short sequence in the sample;
sending the complete sequencing result data of each short sequence in each sample to the target server, so that the target server performs data analysis on the complete sequencing result data of each short sequence in each sample to obtain a complete detection report.
In a seventh possible implementation of the first aspect, the intermediate stage detection report includes the intermediate stage quality control results and intermediate stage identification results of each sample, and the complete detection report includes the complete quality control results, complete identification results, complete assembly results and complete traceability results of each sample;
wherein the intermediate stage quality control results and complete quality control results of a sample are both used to reflect the short sequences in the sample whose quality is higher than the preset quality threshold; the intermediate stage identification results and complete identification results of a sample are both used to reflect the pathogen concentration information of the sample; the complete assembly result of a sample is used to reflect the recombinant sample obtained by assembling all short sequences of the sample; and the complete traceability result of a sample is used to reflect the subtype to which the sample belongs.
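As a rough illustration of the quality control result, the sketch below keeps a short sequence only when its quality exceeds a preset threshold. The application does not state which quality metric is used, so this example assumes the mean Phred score of a FASTQ quality string with the common Phred+33 encoding; the function names and the default threshold of 20 are hypothetical.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    if not quality:
        raise ValueError("empty quality string")
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_qc(quality: str, threshold: float = 20.0) -> bool:
    """True when the read's mean quality is strictly above the preset threshold."""
    return mean_phred(quality) > threshold


def filter_reads(records, threshold: float = 20.0):
    """Keep (sequence, quality) pairs whose quality is above the threshold."""
    return [(seq, qual) for seq, qual in records if passes_qc(qual, threshold)]
```

The same filter could be applied unchanged to intermediate stage data and to complete data, since both quality control results are defined against the same kind of threshold.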
In an eighth possible implementation of the first aspect, if the preset read length includes a first read length and a second read length longer than the first read length, the intermediate stage detection report under the first read length refers to the detection report obtained by analyzing the intermediate stage sequencing result data of each short sequence of each sample under the first read length, and the intermediate stage detection report under the second read length and the complete detection report both refer to the detection report obtained by analyzing the intermediate stage sequencing result data, under the second read length, of the short sequences identified as non-host in each sample.
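The two-read-length scheme amounts to a filter: labels assigned at the first read length decide which short sequences are analyzed again at the second read length. The following sketch assumes that each short sequence carries a sequence number and that the first-stage analysis yields one of the labels `"pathogen"`, `"host"` or `"unidentified"`; the function names and data shapes are hypothetical.

```python
def select_non_host(first_stage_labels: dict[int, str]) -> set[int]:
    """Sequence numbers of short sequences not identified as host at the first read length."""
    # Non-host covers both pathogenic and unidentified sequences.
    return {seq_no for seq_no, label in first_stage_labels.items() if label != "host"}


def second_stage_input(reads_at_second_length: dict[int, str],
                       non_host_ids: set[int]) -> dict[int, str]:
    """Restrict the second-read-length data to the non-host short sequences."""
    return {seq_no: bases
            for seq_no, bases in reads_at_second_length.items()
            if seq_no in non_host_ids}
```

Dropping host reads before the second analysis reduces the amount of data analyzed at the longer read length, at the cost of deferring any analysis of the host reads.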
In a second aspect, a gene sequencing apparatus is provided, including: a data acquisition module, a sample type determination module, a first sequencing module and a sequencing result data sending module;
the data acquisition module is configured to acquire a gene sample to be detected and a preset read length, wherein the gene sample includes at least one sample, each sample includes at least one short sequence, and each short sequence includes a gene sequence to be detected and at most two barcode tags; where a short sequence includes barcode tags, at least one barcode tag in that short sequence is located before the gene sequence;
the sample type determination module is configured to determine the sample type to which each sample contained in the gene sample belongs;
the first sequencing module is configured to, for each sample in the gene sample, sequence each short sequence of the sample in the sequencing order corresponding to the sample type to which the sample belongs, and to obtain the intermediate stage sequencing result data of each short sequence in the sample once the gene sequence in each short sequence of the sample has been sequenced to the preset read length;
the sequencing result data sending module is configured to send the intermediate stage sequencing result data of each short sequence in each sample to a target server, so that the target server performs data analysis on the intermediate stage sequencing result data of each short sequence in each sample and obtains an intermediate stage detection report.
In a third aspect, a gene sequencing device is provided, including a memory and a processor;
the memory is configured to store a program;
the processor is configured to execute the program to implement the steps of the gene sequencing method described in any one of the above.
In a fourth aspect, a readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, the steps of the gene sequencing method described in any one of the above are implemented.
In summary, the present application provides a gene sequencing method, apparatus, device and medium. Where the short sequences of a sample include barcode tags, at least one barcode tag in each short sequence is located before the gene sequence. Therefore, when each short sequence of a sample is sequenced in the sequencing order corresponding to the sample type to which the sample belongs, if the short sequences include barcode tags, the at least one leading barcode tag in each short sequence is sequenced first, and the gene sequence in each short sequence is sequenced afterwards. As a result, even if the gene sample includes multiple samples, the target server can still generate an intermediate stage detection report from the intermediate stage sequencing result data obtained when sequencing reaches the preset read length. Preliminary pathogen identification can thus be carried out before sequencing is fully complete, which speeds up detection, shortens the customer's waiting time and improves the customer experience.
Description of Drawings
To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present application; those of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
Figure 1 is a schematic flow chart of a gene sequencing method provided by an embodiment of the present invention;
Figure 2 is a schematic diagram of a business process for a sequencing-while-analyzing application provided by an embodiment of the present application;
Figure 3 is a schematic diagram of the time periods in which three-stage detection reports are obtained when sequencing samples of multi-sample types;
Figure 4 is a schematic structural diagram of a gene sequencing apparatus provided by an embodiment of the present application;
Figure 5 is a block diagram of the hardware structure of a gene sequencing device provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the scope of protection of the present application.
Currently, in gene testing, the sample to be tested first undergoes single-end (SE) sequencing and paired-end (PE) sequencing on a gene sequencer (for example, the Sequencer 200Plus), and the barcode tags of the sample are sequenced only at the end of the sequencing cycles. After sequencing is complete, all sequencing data are uploaded to a server for data analysis, and a complete detection report is produced and provided to the customer.
However, because barcode tag sequencing is placed at the end of the sequencing cycles, data analysis can begin only after the entire sequencing run has finished. Customers therefore wait a long time for report results, and the customer experience is poor.
To solve the problems in the prior art, the present application provides a gene sequencing method, apparatus, device and medium. Optionally, the gene sequencing method, apparatus, device and medium can be applied to a gene sequencer. The gene sequencing method provided by the present application is first described in detail through the following embodiments.
Referring to Figure 1, which shows a schematic flow chart of the gene sequencing method provided by an embodiment of the present application, the gene sequencing method may include:
Step S101: acquire the gene sample to be detected and the preset read length.
In this step, the gene sample to be detected (that is, the genome to be detected) includes at least one sample.
It should be understood that, before a gene sample is sequenced, each sample it contains is fragmented into long DNA fragments of relatively high molecular weight. The long DNA fragments are then distributed into reaction spaces with different tag sequences, and within each reaction space short sequences carrying the same barcode tag are prepared, so that the barcode-tagged short sequences can be sequenced. In other words, each sample includes at least one short sequence, and the at least one short sequence belongs to at least one long DNA fragment.
Here, the gene sequence to be detected is the gene sequence that needs to be sequenced, and the barcode tag serves as the identity information of the gene sequence.
In this step, each short sequence includes the gene sequence to be detected and at most two barcode tags. Where a short sequence includes barcode tags, at least one barcode tag in that short sequence is located before the gene sequence.
That each short sequence includes at most two barcode tags means the following. If the gene sample includes a single sample, there is no need to distinguish samples by barcode tag, so each short sequence of that sample may include only the gene sequence to be detected and no barcode tag. If the gene sample includes multiple samples, they need to be distinguished by barcode tag, so each short sequence of each sample includes one or two barcode tags.
In this embodiment, where the short sequences of a sample include barcode tags, at least one barcode tag is located before the gene sequence within the short sequence. That is, if each short sequence of a sample includes one barcode tag, each short sequence consists of the barcode tag followed by the gene sequence. If each short sequence of a sample includes two barcode tags, at least one of the two is located before the gene sequence. For example, with the two barcode tags denoted barcode1 and barcode2, each short sequence of the sample may take the form BC1readBC2 or BC1BC2read, where BC stands for barcode and read stands for the gene sequence.
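The layouts described above can be illustrated with a minimal Python sketch that splits a short sequence into its barcode tags and gene sequence. The function name `split_short_sequence`, the layout label `BCread` for the single-barcode case, and the fixed barcode length `bc_len` are assumptions made for illustration only; the source does not specify barcode lengths or any data format.

```python
from typing import NamedTuple, Optional


class ShortSequence(NamedTuple):
    barcode1: Optional[str]  # leading barcode tag, if any
    barcode2: Optional[str]  # second barcode tag, if any
    read: str                # gene sequence to be detected


def split_short_sequence(seq: str, layout: str, bc_len: int = 10) -> ShortSequence:
    """Split a short sequence according to one of the layouts in the text.

    bc_len is a hypothetical fixed barcode length (assumption).
    """
    if bc_len <= 0:
        raise ValueError("bc_len must be positive")
    if layout == "read":            # no barcode tag (single sample)
        return ShortSequence(None, None, seq)
    if layout == "BCread":          # one barcode tag, then the gene sequence
        return ShortSequence(seq[:bc_len], None, seq[bc_len:])
    if layout == "BC1BC2read":      # both barcode tags before the gene sequence
        return ShortSequence(seq[:bc_len], seq[bc_len:2 * bc_len], seq[2 * bc_len:])
    if layout == "BC1readBC2":      # gene sequence between the two barcode tags
        return ShortSequence(seq[:bc_len], seq[-bc_len:], seq[bc_len:-bc_len])
    raise ValueError(f"unknown layout: {layout}")
```

In every barcoded layout the first tag is read from the start of the sequence, which is the property the method relies on to identify samples before the gene sequence has been fully sequenced.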
The preset read length refers to the number of sequencing cycles at which a report is generated. In this embodiment, the specific value of the preset read length can be determined according to the actual situation. For example, in currently achievable scenarios, under single-end (SE) sequencing the customer can define any read length from 1 to 100 bp. A preset read length of 40 bp (base pairs), for instance, means that this embodiment generates an intermediate stage detection report when the gene sequence has been sequenced for 40 cycles.
In an optional embodiment, the preset read length includes at least one read length. For example, preset read lengths of 40 bp and 80 bp mean that this embodiment generates a first intermediate stage detection report when the gene sequence has been sequenced for 40 cycles and a second intermediate stage detection report when it has been sequenced for 80 cycles.
It is worth noting that the longer the preset read length, the more accurate the pathogen identification (that is, the intermediate stage detection report).
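As a rough illustration of how preset read lengths map to reporting points, the following Python sketch lists the cycles at which a report would be produced. The function name `report_checkpoints` and the default of 100 total cycles are assumptions for illustration, taken from the 1 to 100 bp single-end example above rather than from any specified interface.

```python
from typing import Iterable, List, Tuple


def report_checkpoints(preset_read_lengths: Iterable[int],
                       total_cycles: int = 100) -> List[Tuple[int, str]]:
    """Return (cycle, report kind) pairs for one sequencing run.

    An intermediate report is due at each preset read length, and the
    complete report is due after the last cycle.
    """
    presets = sorted(set(preset_read_lengths))
    for p in presets:
        if not 1 <= p <= total_cycles:
            raise ValueError(f"preset read length {p} is outside 1..{total_cycles}")
    events = []
    for cycle in range(1, total_cycles + 1):
        if cycle in presets:
            events.append((cycle, "intermediate"))
    events.append((total_cycles, "complete"))
    return events
```

With preset read lengths of 40 and 80, this yields intermediate reports at cycles 40 and 80 and the complete report at cycle 100.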
Step S102: determine the sample type to which each sample contained in the gene sample belongs.
Optionally, the sample types include, but are not limited to, the following four: the barcode-free single-sample type NoneBC, the single-barcode multi-sample type SingleBC, the dual-barcode single-strand multi-sample type BC1BC2read, and the dual-barcode double-strand multi-sample type BC1readBC2.
It should be noted that "single-sample" and "multi-sample" here refer to the number of samples contained in the gene sample. If the gene sample contains one sample, that sample belongs to a single-sample type; if the gene sample contains multiple samples, all of those samples belong to a multi-sample type.
It can be understood that when the gene sample contains only one sample, no barcode tag is needed to distinguish it, so in this case the only sample belongs to the barcode-free single-sample type.
When the gene sample contains multiple samples, the samples need to be distinguished by barcode tag. In this case, the short sequences of each sample may contain either one or two barcode tags.
If the short sequences of a sample contain one barcode tag, that is, each short sequence is formed by concatenating the barcode tag and the gene sequence in that order (barcode tag first, gene sequence after), this step determines that the sample belongs to the single-barcode multi-sample type.
If the short sequences of a sample contain two barcode tags and both are located on the same strand, this step determines that the sample belongs to the dual-barcode single-strand multi-sample type. For example, if barcode1 and barcode2 are both on the first strand of the gene sequence (barcode1 and barcode2 first, followed by the gene data on the first strand), the sample is determined to belong to the dual-barcode single-strand multi-sample type.
If the short sequences of a sample contain two barcode tags and the two are located on two strands, this step determines that the sample belongs to the dual-barcode double-strand multi-sample type. For example, if barcode1 is on the first strand (barcode1 first, followed by the gene data on the first strand) and barcode2 is at the end of the second strand (the gene data on the second strand first, followed by barcode2), the sample is determined to belong to the dual-barcode double-strand multi-sample type.
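The decision rules of step S102 can be summarized in a short Python sketch. The enum `SampleType`, the function `classify_sample` and its two inputs (the number of barcode tags per short sequence and whether both tags sit on the same strand) are hypothetical names chosen for illustration; how the sequencer actually obtains this information is not stated in the source.

```python
from enum import Enum


class SampleType(Enum):
    NONE_BC = "NoneBC"            # barcode-free single-sample type
    SINGLE_BC = "SingleBC"        # single-barcode multi-sample type
    BC1_BC2_READ = "BC1BC2read"   # dual-barcode single-strand multi-sample type
    BC1_READ_BC2 = "BC1readBC2"   # dual-barcode double-strand multi-sample type


def classify_sample(num_barcodes: int, same_strand: bool = True) -> SampleType:
    """Map the barcode arrangement of a sample's short sequences to a sample type."""
    if num_barcodes == 0:
        return SampleType.NONE_BC
    if num_barcodes == 1:
        return SampleType.SINGLE_BC
    if num_barcodes == 2:
        # same_strand only matters when two barcode tags are present
        return SampleType.BC1_BC2_READ if same_strand else SampleType.BC1_READ_BC2
    raise ValueError("a short sequence includes at most two barcode tags")
```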
Step S103: for each sample in the gene sample, sequence each short sequence of the sample in the sequencing order corresponding to the sample type to which the sample belongs, and obtain the intermediate stage sequencing result data of each short sequence in the sample once the gene sequence in each short sequence of the sample has been sequenced to the preset read length.
The preceding steps have explained that "where the short sequences of a sample include barcode tags, at least one barcode tag is located before the gene sequence within the short sequence", and gene sequencing reads each item of data in a sample's short sequence in order. Therefore, when the samples are of a multi-sample type, the sequencing order corresponding to any multi-sample type first sequences at least one barcode tag in the short sequences of the samples of that type, and then sequences the gene sequences in those samples.
Therefore, if the gene sample includes one sample, the intermediate stage sequencing result data include the intermediate stage sequencing result data of the gene sequence; if the gene sample includes multiple samples, the intermediate stage sequencing result data include both the intermediate stage sequencing result data of the gene sequence and the barcode tag sequencing result data.
Here, the multi-sample types include the single-barcode multi-sample type, the dual-barcode single-strand multi-sample type and the dual-barcode double-strand multi-sample type.
Step S104: send the intermediate stage sequencing result data of each short sequence in each sample to the target server, so that the target server performs data analysis on the intermediate stage sequencing result data of each short sequence in each sample and obtains an intermediate stage detection report.
Specifically, if the gene sample includes one sample, the target server directly performs data analysis on the intermediate stage sequencing result data of each short sequence in that sample to obtain the intermediate stage detection report.
If the gene sample includes multiple samples, the target server uses the barcode tag sequencing result data of the multiple samples to perform data analysis on the intermediate stage sequencing result data of the gene sequence of each short sequence in the multiple samples, and obtains the intermediate stage detection report. Here, the intermediate stage detection report includes the pathogen identification results for the read sequences obtained at the preset read length. It can be understood that, because sequencing is not yet complete, the pathogen identification result in the intermediate stage detection report is relatively coarse. However, if the preset read length is chosen appropriately, a relatively accurate pathogen identification result can still be obtained. For example, with a preset read length of 40 bp, the preliminary identification result obtained from data analysis is essentially consistent with the identification result from the full read length (that is, 100 bp) sequencing data, which shows that the intermediate stage detection report obtained at a 40 bp read length is fairly accurate.
The present application provides a gene sequencing method. Where the short sequences of a sample include barcode tags, at least one barcode tag in each short sequence is located before the gene sequence. Therefore, when each short sequence of a sample is sequenced in the sequencing order corresponding to the sample type to which the sample belongs, if the short sequences include barcode tags, the at least one leading barcode tag in each short sequence is sequenced first, and the gene sequence in each short sequence is sequenced afterwards. As a result, even if the gene sample includes multiple samples, the target server can still generate an intermediate stage detection report from the intermediate stage sequencing result data obtained when sequencing reaches the preset read length. Preliminary pathogen identification can thus be carried out before sequencing is fully complete, which speeds up detection, shortens the customer's waiting time and improves the customer experience.
For example, in one possible scenario, a customer who has submitted a sequencing request may want the pathogen identification result within a fairly short time, for instance within 10 hours, whereas current gene sequencing technology needs more than 24 hours to deliver an accurate identification result, so the customer experience is poor.
With the gene sequencing method provided by the present application, the customer can set the preset read length according to their own needs. If the customer has ample time before the pathogen identification result is needed, one or more larger preset read lengths can be set to obtain a more accurate result. If time is short, one or more smaller preset read lengths can be set to obtain a preliminary, coarser result quickly.
By allowing a custom preset read length, this embodiment makes it possible to analyze partial data obtained at an intermediate stage of sequencing and to provide the customer with a preliminary report for initial screening, which shortens the waiting time and improves the customer experience.
In one possible implementation of the present application, the sequencing orders corresponding to the four sample types given in step S102 are described.
In this embodiment, the sequencing order corresponding to each of the four sample types depends on the positions of the barcode tags and the gene sequence in the samples of that type.
Specifically, the sequencing order corresponding to the barcode-free single-sample type is: sequence the gene sequence in each short sequence of the samples of the barcode-free single-sample type. The single-end (SE) sequencing workflow includes: DNA nanoball (DNB) loading (DNB loading) -> preloading (loading prime) -> loading (postloading) -> sequencing preprocessing (sequence prime) -> sequencing preprocessing cleanup (first) -> read1 (first DNB segment) sequencing (read1 sequencing). The paired-end (PE) sequencing workflow includes: DNB loading -> preloading -> loading -> sequencing preprocessing -> sequencing preprocessing cleanup -> read1 (first DNB segment) sequencing -> second-strand synthesis (PE synthesis) -> read2 (first DNB segment) sequencing. The specific procedures of these sub-processes are the same as in the prior art and are not repeated here.
Optionally, the sequencing order corresponding to the single-barcode multi-sample type is: sequence the barcode tag in each short sequence of the samples of the single-barcode multi-sample type and, after barcode tag sequencing is complete, sequence the gene sequence in each short sequence of those samples. The single-end sequencing workflow includes: DNB loading -> preloading -> loading -> barcode preprocessing (barcode prime) -> barcode sequencing -> sequencing preprocessing -> sequencing preprocessing cleanup -> read1 (first DNB segment) sequencing. The paired-end sequencing workflow includes: DNB loading -> preloading -> loading -> barcode preprocessing -> barcode sequencing -> sequencing preprocessing -> sequencing preprocessing cleanup -> read1 (first DNB segment) sequencing -> second-strand synthesis -> read2 (first DNB segment) sequencing. The specific procedures of these sub-processes are the same as in the prior art and are not repeated here.
The sequencing order corresponding to the dual-barcode single-strand multi-sample type is: sequence the two barcode tags in each short sequence of the samples of this type one after the other and, after both barcode tags have been sequenced, sequence the gene sequence in each short sequence of those samples. The single-end sequencing workflow includes: DNB loading -> preloading -> loading -> barcode1 preprocessing -> barcode1 sequencing -> barcode2 preprocessing -> barcode2 sequencing -> sequencing preprocessing -> sequencing preprocessing cleanup -> read1 (first DNB segment) sequencing. The paired-end sequencing workflow includes: DNB loading -> preloading -> loading -> barcode1 preprocessing -> barcode1 sequencing -> barcode2 preprocessing -> barcode2 sequencing -> sequencing preprocessing -> sequencing preprocessing cleanup -> read1 (first DNB segment) sequencing -> second-strand synthesis -> read2 (first DNB segment) sequencing. The specific procedures of these sub-processes are the same as in the prior art and are not repeated here.
The sequencing order corresponding to the dual-barcode double-strand multi-sample type is: sequence the first barcode tag in each short sequence of the samples of this type; after the first barcode tag has been sequenced, sequence the gene sequence in each short sequence of those samples; and after the gene sequence has been sequenced, sequence the second barcode tag in each short sequence of those samples. The single-end sequencing workflow includes: DNB loading -> preloading -> loading -> barcode1 preprocessing -> barcode1 sequencing -> sequencing preprocessing -> sequencing preprocessing cleanup -> read1 (first DNB segment) sequencing -> barcode2 preprocessing -> barcode2 sequencing. The paired-end sequencing workflow includes: DNB loading -> preloading -> loading -> barcode1 preprocessing -> barcode1 sequencing -> sequencing preprocessing -> sequencing preprocessing cleanup -> read1 (first DNB segment) sequencing -> second-strand synthesis -> read2 (first DNB segment) sequencing -> barcode2 preprocessing -> barcode2 sequencing. The specific procedures of these sub-processes are the same as in the prior art and are not repeated here.
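The eight workflows described for the four sample types share a common skeleton, which the following Python sketch makes explicit by building the step list from the sample type and the sequencing mode. The function name `sequencing_steps` and the step labels are illustrative assumptions; the sketch only orders the named steps and does not model any instrument behavior.

```python
from typing import List


def sequencing_steps(sample_type: str, paired_end: bool = False) -> List[str]:
    """Build the ordered workflow for one sample type (SE by default, PE if requested)."""
    steps = ["DNB loading", "preloading", "loading"]

    # Leading barcode tags are sequenced before the gene sequence.
    if sample_type == "SingleBC":
        steps += ["barcode preprocessing", "barcode sequencing"]
    elif sample_type == "BC1BC2read":
        steps += ["barcode1 preprocessing", "barcode1 sequencing",
                  "barcode2 preprocessing", "barcode2 sequencing"]
    elif sample_type == "BC1readBC2":
        steps += ["barcode1 preprocessing", "barcode1 sequencing"]
    elif sample_type != "NoneBC":
        raise ValueError(f"unknown sample type: {sample_type}")

    steps += ["sequencing preprocessing", "sequencing preprocessing cleanup",
              "read1 sequencing"]
    if paired_end:
        steps += ["second-strand synthesis", "read2 sequencing"]

    # Only the double-strand dual-barcode type sequences a barcode tag last.
    if sample_type == "BC1readBC2":
        steps += ["barcode2 preprocessing", "barcode2 sequencing"]
    return steps
```

For example, `sequencing_steps("BC1readBC2", paired_end=True)` places barcode1 sequencing before read1 sequencing and barcode2 sequencing at the very end, matching the order described for the dual-barcode double-strand type.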
In another possible implementation of the present application, it is considered that sequencing the barcode tag first occupies the primer needed for second-strand synthesis, which may affect sequencing quality. To sequence the barcode tag first without affecting sequencing quality, this embodiment may, when sequencing each short sequence of any sample of a multi-sample type, use a barcode primer that is shorter than the historical barcode primer, so that it can be eluted before second-strand synthesis.
Here, the historical barcode primer is the barcode primer used in existing gene sequencing technology. Those skilled in the art will understand that the historical barcode primer is usually 32 bp long.
Optionally, in the embodiments of the present application, the barcode primer used when sequencing each short sequence of any sample of a multi-sample type may be 25 bp long. It should be noted that 25 bp is only an example and does not limit the present application.
In yet another possible implementation of the present application, to prevent the DNB structure from loosening when the barcode tag is sequenced first, a nanoball structure stabilizer, xlinker, may be added before the barcode primer to stabilize the structure of the DNB. After this optimization, sequencing quality differs little from that of conventional sequencing.
In yet another possible implementation of the present application, when the gene sample contains multiple samples, the samples must be sequenced together, so the resulting intermediate stage sequencing result data of the short sequences of all samples are mixed together and need to be split by barcode tag.
因此,可选的,若基因样本中包括多个样本,则步骤S104“将每个样本中每个短序列的中间阶段测序结果数据发送至目标服务器”的过程可以包括:按照多个样本分别对应的条形码标签对多个样本中每个短序列的中间阶段测序结果数据进行拆分和归类,得到多个样本分别对应的中间阶段测序结果数据;将多个样本分别对应的中间阶段测序结果数据发送至目标服务器。Therefore, optionally, if the genetic sample includes multiple samples, the process of step S104 of "sending the intermediate stage sequencing result data of each short sequence in each sample to the target server" may include: corresponding to the multiple samples respectively. The barcode tags split and classify the intermediate stage sequencing result data of each short sequence in multiple samples to obtain the intermediate stage sequencing result data corresponding to multiple samples; the intermediate stage sequencing result data corresponding to multiple samples are obtained Sent to target server.
值得注意的是,上述“按照多个样本分别对应的条形码标签对多个样本中每个短序列的中间阶段测序结果数据进行拆分和归类”具体是指将各样本中短序列的中间阶段测序结果数据按照各个样本分别对应的条形码标签进行拆分和归类,以将各样本的中间阶段测序结果数据归类到一起。It is worth noting that the above "split and classify the intermediate stage sequencing result data of each short sequence in multiple samples according to the barcode tags corresponding to multiple samples" specifically refers to the intermediate stage of the short sequence in each sample. The sequencing result data is split and classified according to the barcode tags corresponding to each sample, so as to classify the intermediate stage sequencing result data of each sample together.
更具体的说,单条形码多样本类型下的样本的中间阶段测序结果数据可以在测序中间阶段按照单barcode标签进行拆分和归类;双条形码在单链的多样本类型下的样本的中间阶段测序结果数据可以在测序中间阶段按照双barcode标签进行拆分和归类;双条形码在双链的多样本类型下的样本的中间阶段测序结果数据可以在测序中间阶段按照第一个barcode(即位置在基因序列之前的barcode标签)进行拆分和归类(在测序结束时按照双barcode标签进行拆分和归类)。More specifically, the intermediate stage sequencing result data of samples under single-stranded multi-sample types can be split and classified according to single barcode tags in the intermediate stage of sequencing; the intermediate stage of double-barcode samples under single-stranded multi-sample types Sequencing result data can be split and classified according to double barcode labels in the intermediate stage of sequencing; double barcode sequencing result data in the intermediate stage of samples under double-stranded multi-sample types can be split and classified according to the first barcode (i.e. position) in the intermediate stage of sequencing. The barcode tag before the gene sequence) is split and classified (it is split and classified according to the double barcode tag at the end of sequencing).
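The splitting and classifying step described above can be sketched as follows. This is a minimal illustration only, not the applicant's implementation: the function name `demultiplex`, the tuple layout of a read, the exact-match lookup, and the "undetermined" bin are all assumptions made for the example.

```python
from collections import defaultdict

def demultiplex(reads, barcode_to_sample):
    """Group reads by sample using the barcode that has been sequenced so far.

    reads: iterable of (read_id, barcode, sequence) tuples (hypothetical layout).
    barcode_to_sample: dict mapping a barcode key to a sample name.
    Reads whose barcode is not in the table are placed in "undetermined".
    """
    by_sample = defaultdict(list)
    for read_id, barcode, sequence in reads:
        # Exact matching only; tolerance for barcode errors is not modeled here.
        sample = barcode_to_sample.get(barcode, "undetermined")
        by_sample[sample].append((read_id, sequence))
    return dict(by_sample)

reads = [
    ("r1", "ACGT", "TTGACC"),
    ("r2", "GGCA", "CCATGA"),
    ("r3", "ACGT", "GATTAC"),
    ("r4", "NNNN", "AAAAAA"),
]
table = {"ACGT": "sample_A", "GGCA": "sample_B"}
groups = demultiplex(reads, table)
```

For the double-barcode-on-single-strand type, the barcode key could be a pair such as `("ACGT", "TTAG")`; for the double-barcode-on-double-strand type, the intermediate stage would use only the first barcode as the key and the final split would use the pair.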
In yet another possible implementation of the present application, this embodiment may continue sequencing while the intermediate-stage detection report is being generated, so that a complete detection report can be obtained when sequencing is finished.
Specifically, the embodiments of the present application may further, for each sample in the gene sample, completely sequence each short sequence of the sample in the sequencing order corresponding to the sample type to which the sample belongs, obtain the complete sequencing result data of each short sequence in the sample, and send the complete sequencing result data of each short sequence in each sample to the target server, so that the target server performs data analysis on the complete sequencing result data of each short sequence in each sample and obtains a complete detection report.
With reference to the sequencing orders introduced in the foregoing embodiments, the process in this embodiment of "completely sequencing each short sequence of the sample in the sequencing order corresponding to the sample type to which the sample belongs, to obtain the complete sequencing result data of each short sequence in the sample" may include: when the sample belongs to the no-barcode single-sample type, sequencing the gene sequence in each short sequence of the sample; when the sample belongs to the single-barcode multi-sample type, sequencing the barcode tag in each short sequence of the sample and, after the barcode tag has been sequenced, sequencing the gene sequence in each short sequence of the sample; when the sample belongs to the double-barcode-on-single-strand multi-sample type, sequencing the two barcode tags in each short sequence of the sample separately and, after both barcode tags have been sequenced, sequencing the gene sequence in each short sequence of the sample; when the sample belongs to the double-barcode-on-double-strand multi-sample type, sequencing the first barcode tag in each short sequence of the sample, then, after the first barcode tag has been sequenced, sequencing the gene sequence in each short sequence of the sample, and, after the gene sequence has been sequenced, sequencing the second barcode tag in each short sequence of the sample.
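The four sequencing orders listed above can be summarized as a small lookup table. The sketch below is illustrative only; the type names, the segment labels `barcode1`, `barcode2`, and `gene`, and the helper `segments_before_gene` are hypothetical names chosen for the example.

```python
# Sequencing order per sample type, as described in the text.
# All keys and segment labels are hypothetical identifiers.
SEQUENCING_ORDER = {
    "no_barcode_single_sample": ["gene"],
    "single_barcode_multi_sample": ["barcode1", "gene"],
    "double_barcode_single_strand": ["barcode1", "barcode2", "gene"],
    "double_barcode_double_strand": ["barcode1", "gene", "barcode2"],
}

def segments_before_gene(sample_type):
    """Return the barcode segments that are read before the gene sequence.

    These are the barcodes available for splitting at the intermediate stage.
    """
    order = SEQUENCING_ORDER[sample_type]
    return order[: order.index("gene")]
```

Under these assumptions, `segments_before_gene("double_barcode_double_strand")` yields only the first barcode, which matches the statement that such samples are split by the first barcode at the intermediate stage and by both barcodes at the end.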
In summary, this embodiment may place the sequencing of at least one barcode tag at the front of the entire sequencing process. In this way, at an intermediate stage of sequencing, data analysis can start as soon as part of the sequencing data is available, and the base information in the intermediate-stage sequencing result data is used to produce an intermediate-stage detection report for preliminary identification and analysis. Sequencing continues in the meantime; after all sequencing is finished, data analysis is performed again on the complete sequencing result data to produce a complete detection report for accurate identification and analysis.
In yet another possible implementation of the present application, the intermediate-stage detection report includes the intermediate-stage quality control result and the intermediate-stage identification result of each sample, and the complete detection report includes the complete quality control result, complete identification result, complete assembly result, and complete traceability result of each sample.
The intermediate-stage quality control result and the complete quality control result of a sample both reflect the short sequences in the sample whose quality is higher than a preset quality threshold. The intermediate-stage identification result and the complete identification result of a sample both reflect the pathogen concentration information of the sample. The complete assembly result of a sample reflects the reassembled sample obtained by assembling all short sequences of the sample. The complete traceability result of a sample reflects the subtype to which the sample belongs.
Specifically, for each sample contained in the gene sample, the data analysis that the target server performs on the sequencing result data of each short sequence in the sample includes four stages: quality control, identification, assembly, and traceability.
Quality control means determining whether the quality of each short sequence is higher than the preset quality threshold and screening out the short sequences whose quality is lower than the threshold. Identification means comparing the sequencing result data of each short sequence in the sample with a database of known pathogen sequences to determine the pathogen concentration information of the sample. Assembly means splicing the sequencing result data of all short sequences contained in the sample into long sequence fragments. Traceability means comparing the spliced long sequence fragments with samples of known subtypes in the databases of different countries and regions to determine the subtype to which the sample belongs.
In this embodiment, both intermediate-stage sequencing and complete sequencing may be analyzed according to the above four stages. Preferably, only quality control and identification are performed for intermediate-stage sequencing, to reduce the intermediate detection time.
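As a rough illustration of the quality control stage and of the preferred stage selection, the sketch below filters reads by mean Phred quality and returns the stages to run for each kind of report. It is a sketch under stated assumptions: the text does not specify how quality is measured, so the mean Phred score, the offset of 33, the threshold of 20, and all function names are assumptions for the example.

```python
def mean_phred(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (assumes offset-33 encoding)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)

def quality_control(records, min_mean_q=20):
    """Keep (read_id, sequence, quality) records that meet the threshold.

    min_mean_q stands in for the preset quality threshold; 20 is an arbitrary example.
    """
    return [rec for rec in records if mean_phred(rec[2]) >= min_mean_q]

def stages_for(report_kind):
    """Stages to run: two for an intermediate report, all four for a complete one."""
    if report_kind == "intermediate":
        return ["quality_control", "identification"]
    return ["quality_control", "identification", "assembly", "traceability"]

records = [("a", "ACGT", "IIII"), ("b", "ACGT", "!!!!")]
kept = quality_control(records)
```

Here read `a` has a mean score of 40 and is kept, while read `b` has a mean score of 0 and is screened out.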
Optionally, because the analysis results at the intermediate stage may be inaccurate, and to avoid misleading customers with inaccurate results, the intermediate-stage detection report may show only the intermediate-stage quality control result and intermediate-stage identification result of each sample, while the complete detection report shows the complete quality control result, complete identification result, complete assembly result, and complete traceability result of each sample.
In yet another possible implementation of the present application, if the preset read length includes a first read length and a second read length that is longer than the first read length, the intermediate-stage detection report at the first read length is the detection report obtained by analyzing the intermediate-stage sequencing result data, at the first read length, of every short sequence of every sample. The intermediate-stage detection report at the second read length and the complete detection report are both detection reports obtained by analyzing the intermediate-stage sequencing result data, at the second read length, of the short sequences identified as non-host in each sample.
Specifically, after the target server obtains the intermediate-stage sequencing result data of each short sequence of each sample at the first read length, it can determine by analysis whether each short sequence of the sample is a pathogen sequence, a host sequence (for example, a human sample sequence or an animal sample sequence), or an unidentified sequence. The detection report obtained from this analysis is the intermediate-stage detection report at the first read length.
After the target server obtains the intermediate-stage sequencing result data of each short sequence of each sample at the second read length, it can analyze the intermediate-stage sequencing result data, at the second read length, of the short sequences previously identified as non-host in each sample, to obtain the intermediate-stage detection report at the second read length.
Here, non-host sequences include pathogen sequences and unidentified sequences.
For example, when sequencing reaches SE40, the target server can analyze the intermediate-stage sequencing result data of each short sequence of each sample at 40 bp to determine which short sequences in the sample are pathogen sequences and which are host sequences or unidentified sequences. (Optionally, each short sequence has a corresponding sequence number, and the sequence numbers are used to mark which short sequences are pathogen sequences and which are host sequences or unidentified sequences.) When sequencing later reaches SE100 and PE100, only the intermediate-stage sequencing result data, at the second read length, of the short sequences identified as non-host in each sample are analyzed, which improves analysis efficiency.
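The two-pass idea described above (label every short sequence at the first read length, then analyze only the non-host ones at the second read length) can be sketched as follows. This is a simplified illustration; the label strings `"pathogen"`, `"host"`, and `"unidentified"`, the dictionary layout keyed by sequence number, and the function names are assumptions for the example.

```python
def non_host_ids(labels_stage1):
    """Sequence numbers labeled pathogen or unidentified at the first read length."""
    return {rid for rid, label in labels_stage1.items() if label != "host"}

def select_for_second_pass(reads_stage2, labels_stage1):
    """Keep only the stage-2 reads whose stage-1 label was non-host.

    reads_stage2: dict mapping sequence number to the longer read.
    Reads with no stage-1 label are left out of the second pass.
    """
    keep = non_host_ids(labels_stage1)
    return {rid: seq for rid, seq in reads_stage2.items() if rid in keep}

labels_40bp = {"r1": "pathogen", "r2": "host", "r3": "unidentified"}
reads_100bp = {"r1": "ACGTACGT", "r2": "TTTTGGGG", "r3": "CCCCAAAA"}
second_pass = select_for_second_pass(reads_100bp, labels_40bp)
```

Because host reads are dropped before the second analysis, the amount of data analyzed at the second read length shrinks in proportion to the host fraction of the sample.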
Optionally, this embodiment may also perform a supplementary analysis on the complete sequencing result data of the short sequences identified as host in each sample, so that the analysis results are more complete.
To help those skilled in the art better understand the present application, refer to Figure 2, which is a schematic diagram of an application workflow for analyzing while sequencing provided by an embodiment of the present application. In this embodiment, the number of cycles at which reports are generated, that is, the preset read length, can be set, for example to 40 bp and 100 bp. The gene sample is then obtained, and the sample type to which each sample contained in the gene sample belongs is determined. Each short sequence of each sample is then sequenced in the sequencing order corresponding to the sample type to which the sample belongs.
When the gene sequence in each short sequence of each sample has been sequenced to 40 bp, the stage 1 intermediate-stage sequencing result data of each short sequence in each sample are obtained, and a 40-cycle report and fq file are generated from them (fq is the main result file produced by sequencing; its full name is FASTQ file, and it contains the sequencing result data and the corresponding quality values). When the gene sequence in each short sequence of each sample has been sequenced to 100 bp, the stage 2 intermediate-stage sequencing result data of each short sequence in each sample are obtained, and a 100-cycle report and fq file are generated from them. When sequencing is finished, the complete sequencing report and fq file are obtained.
The sequencing result data obtained in all three stages can be uploaded to the target server for data analysis, yielding a detection report for each of the three stages. Of these three reports, the complete detection report is the most accurate, the second intermediate-stage detection report obtained in stage 2 is the next most accurate, and the first intermediate-stage detection report obtained in stage 1 is the least accurate.
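To make the relationship between the staged fq files concrete, the sketch below emulates a 40-cycle snapshot by truncating longer FASTQ records. On the instrument, the stage files would come from the cycles sequenced so far rather than from truncation; the function name `snapshot_fastq` and the assumption of well-formed four-line records are hypothetical and chosen only for the example.

```python
def snapshot_fastq(text, cycles):
    """Return FASTQ text with every read and quality string cut to `cycles` bases.

    Assumes well-formed four-line records: header, sequence, '+', quality.
    """
    lines = text.strip().splitlines()
    out = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        # Sequence and quality are truncated together so they stay the same length.
        out.extend([header, seq[:cycles], plus, qual[:cycles]])
    return "\n".join(out) + "\n"

full = "@r1\nACGTACGT\n+\nIIIIHHHH\n@r2\nGGGGCCCC\n+\nHHHHIIII\n"
stage1 = snapshot_fastq(full, 4)
```

With real data the call would be `snapshot_fastq(text, 40)` for the stage 1 file and `snapshot_fastq(text, 100)` for the stage 2 file; four cycles are used here only to keep the example short.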
See Figure 3, a schematic diagram of the times at which the three staged detection reports are obtained when sequencing samples of the multi-sample types. For samples of the single-barcode multi-sample type, a report is issued at the 40th cycle of sequencing, and this intermediate-stage detection report is obtained after 5.5 hours (h); a report is issued at the 100th cycle, and this intermediate-stage detection report is obtained after 11 hours; with complete PE100 sequencing, the complete detection report is obtained after 24.5 hours. For samples of the double-barcode-on-single-strand multi-sample type (not shown in Figure 3), the three staged detection reports are obtained after 6.5 hours for 40 cycles, 12 hours for 100 cycles, and 25.5 hours for complete PE100 sequencing. For samples of the double-barcode-on-double-strand multi-sample type (same times as the single-barcode multi-sample type; not shown in Figure 3), the three staged detection reports are obtained after 5.5 hours for 40 cycles, 11 hours for 100 cycles, and 24.5 hours for complete PE100 sequencing.
For simplicity of description, each of the foregoing method embodiments is expressed as a series of combined actions. However, those skilled in the art should know that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in other orders or simultaneously. In addition, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present application.
Figure 4 is a schematic structural diagram of one embodiment of a gene sequencing apparatus provided by an embodiment of the present application. It corresponds to the gene sequencing method provided by the embodiment of the present application described with reference to Figure 1. In practice, the gene sequencing apparatus of this embodiment may be applied to the gene sequencer, and the apparatus may include:
A data acquisition module 401, configured to obtain a gene sample to be detected and a preset read length, where the gene sample includes at least one sample, any sample includes at least one short sequence, and each short sequence includes a gene sequence to be detected and at most two barcode tags; when a short sequence includes a barcode tag, at least one barcode tag in the short sequence is located before the gene sequence.
A sample type determination module 402, configured to determine the sample type to which each sample contained in the gene sample belongs.
A first sequencing module 403, configured to, for each sample in the gene sample, sequence each short sequence of the sample in the sequencing order corresponding to the sample type to which the sample belongs and, when the gene sequence in each short sequence of the sample has been sequenced to the preset read length, obtain the intermediate-stage sequencing result data of each short sequence in the sample.
A sequencing result data sending module 404, configured to send the intermediate-stage sequencing result data of each short sequence in each sample to the target server, so that the target server performs data analysis on the intermediate-stage sequencing result data of each short sequence in each sample and obtains an intermediate-stage detection report.
In a possible implementation, the sample type determination module 402 may be specifically configured to: when the gene sample includes one sample, determine that the sample belongs to the no-barcode single-sample type; when the gene sample includes multiple samples, for each of the multiple samples, determine that the sample belongs to the single-barcode multi-sample type if its short sequences include one barcode tag, determine that the sample belongs to the double-barcode-on-single-strand multi-sample type if its short sequences include two barcode tags located on the same strand, and determine that the sample belongs to the double-barcode-on-double-strand multi-sample type if its short sequences include two barcode tags located on two strands.
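The decision rule applied by the sample type determination module can be sketched as a single function. This is only an illustration of the rule as stated; the function name, the returned type names, and the representation of barcode positions as a tuple of strand labels are assumptions for the example.

```python
def determine_sample_type(num_samples, barcode_strands=()):
    """Classify a sample following the rule described in the text.

    num_samples: number of samples in the gene sample.
    barcode_strands: strand label of each barcode tag in a short sequence,
        for example (1,), (1, 1) or (1, 2) (hypothetical representation).
    """
    if num_samples == 1:
        return "no_barcode_single_sample"
    if len(barcode_strands) == 1:
        return "single_barcode_multi_sample"
    if len(barcode_strands) == 2:
        if barcode_strands[0] == barcode_strands[1]:
            return "double_barcode_single_strand"
        return "double_barcode_double_strand"
    # The text allows at most two barcode tags and expects one or two for pooled samples.
    raise ValueError("multi-sample input needs one or two barcode tags")
```

For instance, a pooled run whose short sequences carry one barcode on each strand would be passed as `determine_sample_type(8, (1, 2))`.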
In a possible implementation, the sequencing order corresponding to the no-barcode single-sample type is: sequencing the gene sequence in each short sequence of a sample of the no-barcode single-sample type.
In a possible implementation, the sequencing order corresponding to the single-barcode multi-sample type is: sequencing the barcode tag in each short sequence of a sample of the single-barcode multi-sample type and, after the barcode tag has been sequenced, sequencing the gene sequence in each short sequence of the sample.
In a possible implementation, the sequencing order corresponding to the double-barcode-on-single-strand multi-sample type is: sequencing the two barcode tags in each short sequence of a sample of the double-barcode-on-single-strand multi-sample type separately and, after both barcode tags have been sequenced, sequencing the gene sequence in each short sequence of the sample.
In a possible implementation, the sequencing order corresponding to the double-barcode-on-double-strand multi-sample type is: sequencing the first barcode tag in each short sequence of a sample of the double-barcode-on-double-strand multi-sample type; after the first barcode tag has been sequenced, sequencing the gene sequence in each short sequence of the sample; and, after the gene sequence has been sequenced, sequencing the second barcode tag in each short sequence of the sample.
In a possible implementation, the barcode primer used when sequencing each short sequence of any sample of a multi-sample type is shorter than the historical barcode primer, where the multi-sample types include the single-barcode multi-sample type, the double-barcode-on-single-strand multi-sample type, and the double-barcode-on-double-strand multi-sample type.
In a possible implementation, the preset read length includes at least one read length.
In a possible implementation, if the gene sample includes multiple samples, the sequencing result data sending module 404 may specifically split and classify the intermediate-stage sequencing result data of each short sequence in the multiple samples according to the barcode tags respectively corresponding to the multiple samples, obtain the intermediate-stage sequencing result data respectively corresponding to the multiple samples, and send the intermediate-stage sequencing result data respectively corresponding to the multiple samples to the target server.
In a possible implementation, the gene sequencing apparatus provided by the embodiment of the present application may further include a second sequencing module and a complete sequencing result data sending module.
The second sequencing module is configured to, for each sample in the gene sample, completely sequence each short sequence of the sample in the sequencing order corresponding to the sample type to which the sample belongs, to obtain the complete sequencing result data of each short sequence in the sample.
The complete sequencing result data sending module is configured to send the complete sequencing result data of each short sequence in each sample to the target server, so that the target server performs data analysis on the complete sequencing result data of each short sequence in each sample and obtains a complete detection report.
In a possible implementation, the intermediate-stage detection report includes the intermediate-stage quality control result and the intermediate-stage identification result of each sample, and the complete detection report includes the complete quality control result, complete identification result, complete assembly result, and complete traceability result of each sample.
The intermediate-stage quality control result and the complete quality control result of a sample both reflect the short sequences in the sample whose quality is higher than a preset quality threshold. The intermediate-stage identification result and the complete identification result of a sample both reflect the pathogen concentration information of the sample. The complete assembly result of a sample reflects the reassembled sample obtained by assembling all short sequences of the sample. The complete traceability result of a sample reflects the subtype to which the sample belongs.
In a possible implementation, if the preset read length includes a first read length and a second read length that is longer than the first read length, the intermediate-stage detection report at the first read length is the detection report obtained by analyzing the intermediate-stage sequencing result data, at the first read length, of every short sequence of every sample. The intermediate-stage detection report at the second read length and the complete detection report are both detection reports obtained by analyzing the intermediate-stage sequencing result data, at the second read length, of the short sequences identified as non-host in each sample.
An embodiment of the present application further provides a gene sequencing device. Optionally, Figure 5 shows a block diagram of the hardware structure of the gene sequencing device. Referring to Figure 5, the hardware structure of the gene sequencing device may include: at least one processor 501, at least one communication interface 502, at least one memory 503, and at least one communication bus 504;
In this embodiment of the present application, there is at least one each of the processor 501, the communication interface 502, the memory 503, and the communication bus 504, and the processor 501, the communication interface 502, and the memory 503 communicate with one another through the communication bus 504;
The processor 501 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention;
The memory 503 may include high-speed RAM, and may also include non-volatile memory, for example at least one disk memory;
The memory 503 stores a program, and the processor 501 can call the program stored in the memory 503. The program is used to:
obtain a gene sample to be detected and a preset read length, where the gene sample includes at least one sample, any sample includes at least one short sequence, and each short sequence includes a gene sequence to be detected and at most two barcode tags; when a short sequence includes a barcode tag, at least one barcode tag in the short sequence is located before the gene sequence;
determine the sample type to which each sample contained in the gene sample belongs;
for each sample in the gene sample, sequence each short sequence of the sample in the sequencing order corresponding to the sample type to which the sample belongs and, when the gene sequence in each short sequence of the sample has been sequenced to the preset read length, obtain the intermediate-stage sequencing result data of each short sequence in the sample;
send the intermediate-stage sequencing result data of each short sequence in each sample to the target server, so that the target server performs data analysis on the intermediate-stage sequencing result data of each short sequence in each sample and obtains an intermediate-stage detection report.
Optionally, for the refined and extended functions of the program, refer to the description above.
An embodiment of the present application further provides a readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the gene sequencing method described above is implemented.
Optionally, for the refined and extended functions of the program, refer to the description above.
Finally, it should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Furthermore, the terms "include", "comprise" and any other variants are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments; for parts that are the same or similar, the embodiments may be referred to one another.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (12)

  1. A gene sequencing method, characterized by comprising the following steps:
    obtaining a gene sample to be detected and a preset read length, wherein the gene sample includes at least one sample, each sample includes at least one short sequence, and each short sequence includes a gene sequence to be detected and at most two barcode tags, and, where a short sequence includes the barcode tag, at least one barcode tag in that short sequence is located before the gene sequence;
    determining the sample type to which each sample contained in the gene sample belongs;
    for each sample in the gene sample, sequencing each short sequence of the sample in the sequencing order corresponding to the sample type to which the sample belongs and, once the gene sequence in each short sequence of the sample has been sequenced to the preset read length, obtaining intermediate-stage sequencing result data for each short sequence in the sample;
    sending the intermediate-stage sequencing result data of each short sequence in each sample to a target server, so that the target server performs data analysis on that data and obtains an intermediate-stage detection report.
  2. The gene sequencing method according to claim 1, characterized in that determining the sample type to which each sample contained in the gene sample belongs comprises:
    where the gene sample includes one sample, determining that the sample belongs to a no-barcode single-sample type;
    where the gene sample includes multiple samples, for each of the multiple samples: if the short sequences of the sample include one barcode tag, determining that the sample belongs to a single-barcode multi-sample type; if the short sequences of the sample include two barcode tags located on the same strand, determining that the sample belongs to a dual-barcode, single-strand multi-sample type; and if the short sequences of the sample include two barcode tags located on two strands, determining that the sample belongs to a dual-barcode, double-strand multi-sample type.
  3. The gene sequencing method according to claim 2, characterized in that the sequencing order corresponding to the no-barcode single-sample type is: sequencing the gene sequence in each short sequence of a sample of the no-barcode single-sample type;
    the sequencing order corresponding to the single-barcode multi-sample type is: sequencing the barcode tag in each short sequence of a sample of the single-barcode multi-sample type and, after the barcode tag has been sequenced, sequencing the gene sequence in each short sequence of that sample;
    the sequencing order corresponding to the dual-barcode, single-strand multi-sample type is: sequencing each of the two barcode tags in each short sequence of a sample of the dual-barcode, single-strand multi-sample type and, after the two barcode tags have been sequenced, sequencing the gene sequence in each short sequence of that sample;
    the sequencing order corresponding to the dual-barcode, double-strand multi-sample type is: sequencing the first barcode tag in each short sequence of a sample of the dual-barcode, double-strand multi-sample type; after the first barcode tag has been sequenced, sequencing the gene sequence in each short sequence of that sample; and, after the gene sequence has been sequenced, sequencing the second barcode tag in each short sequence of that sample.
  4. The gene sequencing method according to claim 3, characterized in that the barcode primer used when sequencing each short sequence of any sample of a multi-sample type is shorter than a historical barcode primer, wherein the multi-sample types include the single-barcode multi-sample type, the dual-barcode, single-strand multi-sample type and the dual-barcode, double-strand multi-sample type.
  5. The gene sequencing method according to claim 1, characterized in that the preset read length includes at least one read length.
  6. The gene sequencing method according to claim 1, characterized in that, if the gene sample includes multiple samples, sending the intermediate-stage sequencing result data of each short sequence in each sample to the target server comprises:
    splitting and classifying the intermediate-stage sequencing result data of each short sequence in the multiple samples according to the barcode tags corresponding to the respective samples, to obtain the intermediate-stage sequencing result data corresponding to each of the multiple samples;
    sending the intermediate-stage sequencing result data corresponding to each of the multiple samples to the target server.
  7. The gene sequencing method according to claim 1, characterized by further comprising:
    for each sample in the gene sample, completely sequencing each short sequence of the sample in the sequencing order corresponding to the sample type to which the sample belongs, to obtain complete sequencing result data for each short sequence in the sample;
    sending the complete sequencing result data of each short sequence in each sample to the target server, so that the target server performs data analysis on that data and obtains a complete detection report.
  8. The gene sequencing method according to claim 7, characterized in that the intermediate-stage detection report includes an intermediate-stage quality control result and an intermediate-stage identification result for each sample, and the complete detection report includes a complete quality control result, a complete identification result, a complete assembly result and a complete traceability result for each sample;
    wherein the intermediate-stage quality control result and the complete quality control result of a sample both reflect the short sequences in the sample whose quality is higher than a preset quality threshold, the intermediate-stage identification result and the complete identification result of a sample both reflect pathogen concentration information of the sample, the complete assembly result of a sample reflects the reassembled sample obtained by assembling all short sequences of the sample, and the complete traceability result of a sample reflects the subtype to which the sample belongs.
  9. The gene sequencing method according to claim 7, characterized in that, if the preset read length includes a first read length and a second read length longer than the first read length, the intermediate-stage detection report at the first read length is a detection report obtained by analyzing the intermediate-stage sequencing result data, at the first read length, of every short sequence of each sample, and the intermediate-stage detection report at the second read length and the complete detection report are each a detection report obtained by analyzing the intermediate-stage sequencing result data, at the second read length, of the short sequences identified as non-host in each sample.
  10. A gene sequencing apparatus, characterized by comprising a data acquisition module, a sample type determination module, a first sequencing module and a sequencing result data sending module;
    the data acquisition module is configured to obtain a gene sample to be detected and a preset read length, wherein the gene sample includes at least one sample, each sample includes at least one short sequence, and each short sequence includes a gene sequence to be detected and at most two barcode tags, and, where a short sequence includes the barcode tag, at least one barcode tag in that short sequence is located before the gene sequence;
    the sample type determination module is configured to determine the sample type to which each sample contained in the gene sample belongs;
    the first sequencing module is configured to, for each sample in the gene sample, sequence each short sequence of the sample in the sequencing order corresponding to the sample type to which the sample belongs and, once the gene sequence in each short sequence of the sample has been sequenced to the preset read length, obtain intermediate-stage sequencing result data for each short sequence in the sample;
    the sequencing result data sending module is configured to send the intermediate-stage sequencing result data of each short sequence in each sample to a target server, so that the target server performs data analysis on that data and obtains an intermediate-stage detection report.
  11. A gene sequencing device, characterized by comprising a memory and a processor;
    the memory is configured to store a program;
    the processor is configured to execute the program to implement each step of the gene sequencing method according to any one of claims 1 to 9.
  12. A readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, each step of the gene sequencing method according to any one of claims 1 to 9 is implemented.
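The splitting and classifying step of claim 6 can be illustrated with a minimal sketch. The claims do not say how a read barcode is matched to a sample, so exact string matching is an assumption here, as are the function name `demultiplex`, the `(barcode, gene_sequence)` read representation and the separate bucket for reads whose barcode matches no sample.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group intermediate-stage reads by the sample their barcode belongs to.

    reads: iterable of (barcode, gene_sequence) pairs (hypothetical format).
    barcode_to_sample: dict mapping a barcode string to a sample name.
    Returns (reads grouped per sample, reads with an unknown barcode).
    """
    by_sample = defaultdict(list)
    unassigned = []
    for barcode, gene_sequence in reads:
        sample = barcode_to_sample.get(barcode)  # exact match only (assumption)
        if sample is None:
            unassigned.append(gene_sequence)
        else:
            by_sample[sample].append(gene_sequence)
    return dict(by_sample), unassigned
```

Each per-sample list could then be sent to the target server on its own, which is the second step of claim 6. A production demultiplexer would usually tolerate sequencing errors in the barcode; that is left out of this sketch.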
PCT/CN2022/119453 2022-09-16 2022-09-16 Gene sequencing method, apparatus and device, and medium WO2024055320A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/119453 WO2024055320A1 (en) 2022-09-16 2022-09-16 Gene sequencing method, apparatus and device, and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/119453 WO2024055320A1 (en) 2022-09-16 2022-09-16 Gene sequencing method, apparatus and device, and medium

Publications (1)

Publication Number Publication Date
WO2024055320A1 true WO2024055320A1 (en) 2024-03-21

Family

ID=90274016

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/119453 WO2024055320A1 (en) 2022-09-16 2022-09-16 Gene sequencing method, apparatus and device, and medium

Country Status (1)

Country Link
WO (1) WO2024055320A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017223366A1 (en) * 2016-06-23 2017-12-28 Accuragen Holdings Limited Cell-free nucleic acid standards and uses thereof
WO2018053362A1 (en) * 2016-09-15 2018-03-22 ArcherDX, Inc. Methods of nucleic acid sample preparation
WO2018094031A1 (en) * 2016-11-16 2018-05-24 Progenity, Inc. Multimodal assay for detecting nucleic acid aberrations
CN112410408A (en) * 2020-11-12 2021-02-26 江苏高美基因科技有限公司 Gene sequencing method, apparatus, device and computer readable storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DOUGLAS W FADROSH;BING MA;PAWEL GAJER;NAOMI SENGAMALAY;SANDRA OTT;REBECCA M BROTMAN;JACQUES RAVEL: "An improved dual-indexing approach for multiplexed 16S rRNA gene sequencing on the Illumina MiSeq platform", MICROBIOME, BIOMED CENTRAL LTD, LONDON, UK, vol. 2, no. 1, 24 February 2014 (2014-02-24), London, UK , pages 6, XP021179536, ISSN: 2049-2618, DOI: 10.1186/2049-2618-2-6 *
HENRIK STRANNEHEIM;MARTIN ENGVALL;KARIN NAESS;NICOLE LESKO;PONTUS LARSSON;MATS DAHLBERG;ROBIN ANDEER;ANNA WREDENBERG;CHRIS FREYER;: "Rapid pulsed whole genome sequencing for comprehensive acute diagnostics of inborn errors of metabolism", BMC GENOMICS, BIOMED CENTRAL LTD, LONDON, UK, vol. 15, no. 1, 11 December 2014 (2014-12-11), London, UK , pages 1090, XP021206674, ISSN: 1471-2164, DOI: 10.1186/1471-2164-15-1090 *
MARTIN KIRCHER, SUSANNA SAWYER, MATTHIAS MEYER: "Double indexing overcomes inaccuracies in multiplex sequencing on the Illumina platform", NUCLEIC ACIDS RESEARCH, OXFORD UNIVERSITY PRESS, GB, vol. 40, no. 1, 1 January 2012 (2012-01-01), GB , pages e3 - e3, XP055476349, ISSN: 0305-1048, DOI: 10.1093/nar/gkr771 *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22958510

Country of ref document: EP

Kind code of ref document: A1