AU2018391843A1

AU2018391843A1 - Sequencing data-based ITD mutation ratio detecting apparatus and method

Info

Publication number: AU2018391843A1
Application number: AU2018391843A
Authority: AU
Inventors: Ruilin JING; Dawei Li; Hailiang Wang; Juan Wang; Zhaoling Xuan
Original assignee: ANNOROAD GENE TECHNOLOGY BEIJING CO Ltd
Current assignee: Annoroad Gene Technology (beijing) Co Ltd
Priority date: 2017-12-21
Filing date: 2018-12-20
Publication date: 2020-08-06
Anticipated expiration: 2038-12-20
Also published as: AU2022218581B2; WO2019120254A1; NZ766350A; AU2018391843B2; AU2022218581A1; CN109943635A

Abstract

A sequencing data-based ITD mutation ratio detecting method. Said method comprises: acquiring sequencing data of a sample to be detected; extracting the ITD characteristics of said sample to be detected; and obtaining an ITD mutation ratio on the basis of an ITD characteristic coefficient and the ITD characteristics of said sample to be detected.

Description

SEQUENCING DATA-BASED ITD MUTATION RATIO DETECTING APPARATUS AND METHOD

TECHNICAL FIELD

The present invention relates to the field of gene mutation detection, and

particularly relates to a method, an apparatus and an electronic device for the

quantitative detection of ITD mutation based on sequencing data.

BACKGROUND

With the rapid development of next-generation sequencing technology and its increasing use in cancer research and clinical detection, we have reached a new level of understanding of the occurrence and development, clinical manifestations and pathogenesis of cancer. Numerous studies have shown that the occurrence of cancer is closely related to somatic mutations, and that these mutations often appear only in subclones of certain tumors. The study of subclonal mutations provides a new direction for assessing disease progression and prognostic stratification.

High-depth next-generation sequencing technology can detect subclonal mutations. By sequencing specific genome regions with target sequence capture sequencing, great depth can be achieved at low cost, so that data from a large number of samples can be accumulated; this provides favorable conditions for accurately estimating the distribution of the false positive rate at specific mutation sites in the genome.

The most common type of FLT3 gene mutation in acute myeloid leukemia (AML) is the internal tandem duplication (ITD), followed by the tyrosine kinase domain (TKD) mutation. FLT3-ITD mutation usually involves exons 14 and 15, or exons 11 and/or 12. AML patients are often found to have internal tandem duplications (ITDs) in the juxtamembrane (JM) region of FLT3, i.e., the insertion of several repeating oligonucleotide elements in an end-to-end order. These elements can be copies of a variable number of nucleotide bases, but the number of bases is usually a multiple of 3 so that the reading frame remains intact; the JM region is thereby extended while other domains are unaffected. These mutations play an important role in the pathogenesis of AML. The incidence rate of FLT3-ITD is about 24% in adult AML patients, 10%-15% in pediatric AML patients, and about 15% in secondary AML patients. The NCCN Clinical Practice Guidelines in Oncology: Acute Myeloid Leukemia (2016) pointed out that the prognosis of patients with normal karyotype carrying FLT3-ITD mutations is poor and that such patients should be stratified into the high-risk group. In addition, the section on the treatment of patients with relapsed/refractory AML mentions that, for patients with FLT3-ITD mutation, a demethylating drug (5-azacytidine or decitabine) together with sorafenib may be considered [NCCN-AML]. Therefore, quantitative detection of FLT3-ITD is essential for AML patients.

Currently, the predominant FLT3-ITD detection method is cDNA-based PCR amplification. The limitation of this approach is that only quantitative detection can be performed; the position and sequence information of the ITD mutation cannot be acquired at the same time. If sequence information is needed, Sanger sequencing must be performed, and a result can be obtained only when the allele frequency of the ITD mutation is above 10%. That is, for patients with an ITD mutation ratio below 10%, only the quantitative information can be obtained, and the sequence and position information cannot.

Another FLT3-ITD detection method is to use NGS sequencing and determine the ITD mutation with a bioinformatics algorithm (e.g., PINDEL). This method is based on high-depth target sequence capture sequencing and relies on the information from the captured target regions. However, since the capture process may lose the target region in which the ITD is located, any presently available NGS-based algorithm (PINDEL and the like) is suitable only for qualitative analysis, and accurate quantitative results cannot be obtained. In addition, this method has the following inherent limitations: 1. because the sequence content of an ITD template differs significantly from the normal genome, it strongly affects the capture process. An ITD present at a relatively high proportion may end up with only a limited number of sequencing reads supporting the mutation after target capture, which makes it hard to reach a correct conclusion; 2. also because of the inevitable bias in target capture, the resulting ITDs can only be measured qualitatively, and accurate quantitative results cannot be obtained. Moreover, the clinical detection field is currently inclined to use a single test on AML patients to obtain more detailed information, and NGS fully meets this requirement: only one blood test is needed to complete the detection of multiple mutation types, namely SNV, CNV, INDEL and ITD. Therefore, there is an urgent need to develop a new algorithm that enables the quantitative detection of FLT3-ITD on the NGS platform.

SUMMARY OF THE INVENTION

Technical problem to be solved by the present invention

In view of the aforementioned problems in the prior art, the present invention develops a method for the quantitative detection of ITD mutation based on sequencing data. Dual platforms (an NGS platform and a standard detection platform) are used to test a large number of collected ITD-positive samples: PCR amplification-capillary electrophoresis is used as the gold standard to obtain the quantitative ITD mutation results, and the lengths of the detected ITDs are used to find their matches in the NGS data. Then, through supervised machine learning, these samples are used as the training set, and a sequencing data-based model capable of directly predicting the quantitative ITD mutation outcome is finally obtained, so that the purpose of quantitatively detecting ITDs by next-generation sequencing is achieved. That is, based on a large number of samples with gold-standard ITD quantitative detection results, the present invention screens the characteristics related to ITD quantitative detection and performs model training, thereby obtaining a model by which the accurate ITD quantitative detection result of a sample to be tested can be determined using only the sequencing data and the ITD-related characteristics. That is, the present invention includes:

1. An apparatus for the quantitative detection of ITD mutation based on sequencing data, including:

a data acquisition module, which is configured to acquire the sequencing data of samples to be tested;

a data pre-processing module, which is connected to the data acquisition module, and is configured to extract the ITD characteristics of samples to be tested, wherein the ITD characteristics are the ITD characteristics of the whole region of a nucleotide sequence or the ITD characteristics of specific regions of a nucleotide sequence;

a quantification module, which is connected to the data pre-processing module, and is configured to obtain the ITD mutation allele frequency based on the ITD characteristic coefficients and the ITD characteristics of samples to be tested; and

a detection result output module, which is connected to the quantification module, and is configured to output the ITD mutation allele frequency as the quantitative detection result of the ITD mutation.

In the present invention, the sequencing data may be fastq data obtained by using existing software to convert the NGS data acquired with a high-throughput sequencer. The sequencing approach may be to first capture the target sequence in the exon region or other specified regions of the sample and then sequence it with a high-throughput sequencer (i.e., target sequence capture sequencing). Alternatively, whole-genome sequencing (WGS) of the sample is also feasible.

In the present invention, the ITD mutation may come from samples of any species in which ITD mutations usually occur. Preferably, samples are from mammals, more preferably humans. Specified regions are usually selected for ITD mutation detection, such as the internal tandem duplications (ITDs) of the juxtamembrane region of the Fms-like tyrosine kinase 3 (FLT3) gene in patients with acute myeloid leukemia (AML) (hg19, NCBI version 37; chr13:28608000-28608600).

In the present invention, the quantitative detection of a sample is the detection of the variant allele frequency (VAF) of the ITD mutation of the sample. The gold-standard ITD detection method is generally a method accepted by those skilled in the art and capable of accurately obtaining the VAF of the ITD mutation, the position of the ITD mutation, or the length of the ITD mutation (including the total length of the ITD and the length of the ITD repeats), etc. For example, the current gold standard for ITD quantitative detection may be specific amplification by polymerase chain reaction (PCR) in combination with capillary electrophoresis (CE) (PCR-CE) to detect the VAF of ITD mutations.

2. The apparatus according to item 1, wherein the quantification module includes

a quantitative model sub-module for obtaining an ITD mutation allele frequency based on the ITD characteristic coefficients and the ITD characteristics of samples to be tested, wherein the quantitative model set in the quantitative model sub-module is represented by the following equation (1),

$$\hat{y}(w, x) = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + \cdots + w_n x_n \quad (1)$$

in the equation (1), $\hat{y}(w, x)$ represents the ITD mutation allele frequency, $w_{0-n}$ represent the ITD characteristic coefficients, and $x_{0-n}$ represent the ITD characteristics.

3. The apparatus according to item 2, wherein the quantitative model is configured by a coefficient training sub-module, which is configured to acquire the ITD characteristic coefficients $w_{0-n}$, and wherein the coefficient training sub-module includes:

a detection result acquisition unit, which is configured to acquire first detection results of first test samples and second detection results of first test samples as a training set,

a machine learning unit, which is configured to use the first detection results of first test samples and the second detection results of first test samples as a training set, and obtain the ITD characteristic coefficients $w_{0-n}$ through machine learning of the training set,

wherein the first detection results are the high-throughput sequencing data, and the second detection results are the mutation allele frequency value obtained from the ITD standard detection, such as the gold-standard ITD detection method.

4. The apparatus according to item 2, wherein the quantitative model is configured by a coefficient training sub-module, which is connected to the data pre-processing module and is configured to acquire the ITD characteristic coefficients $w_{0-n}$, and wherein the coefficient training sub-module includes:

a detection result acquisition unit, which is configured to acquire first detection results of first test samples and second detection results of first test samples, and acquire first detection results of second test samples and second detection results of second test samples,

a machine learning unit, which is configured to use the first detection results of first test samples and the second detection results of first test samples as a training set, and obtain the ITD characteristic coefficients $w_{0-n}$ through machine learning of the training set,

a machine learning test unit, which is configured to perform tests using the first detection results of second test samples, and then compare the mutation allele frequency value calculated by the equation (1) with the second detection results of second test samples,

a test result assessment unit, which is configured to assess whether the result of the comparison meets the expectation,

a machine learning revise unit, which is configured to determine the values of the ITD characteristic coefficients $w_{0-n}$ when the comparison result meets the expectation, and, when the comparison result does not meet the expectation, to modify (such as increase, decrease or re-provide) the ITD characteristics $x_{0-n}$ adopted in the equation (1) and reset the ITD characteristic coefficients $w_{0-n}$,

wherein the first detection results are the high-throughput sequencing data, and the second detection results are the mutation allele frequency value obtained from the ITD standard detection, such as the gold-standard ITD detection method.

5. The apparatus according to item 3 or 4, wherein the assessment is to be made

whether the comparison result meets the expectation according to the following equation (2):

$$R^2 = 1 - \frac{\sum_{i=0}^{n_{\text{samples}}-1} (y_i - \hat{y}_i)^2}{\sum_{i=0}^{n_{\text{samples}}-1} (y_i - \bar{y})^2} \quad (2)$$

in the equation (2), $y_i$ represents the second detection results of second test samples, $\hat{y}_i$ represents the mutation allele frequency value calculated by equation (1), and $\bar{y}$ represents the mean value of the second detection results of the second test samples.

Specifically, the concrete assessment method includes: setting an expected value (set value) for $R^2$; the comparison result is concluded as meeting the requirement if $R^2$ is above the set value, and as not meeting the requirement if $R^2$ is less than the set value. A preferred set value is 0.9.

6. The apparatus according to any one of items 1-5, wherein the ITD

characteristic is selected from one or two or more of the following characteristics: the

position of the occurring ITD, the length of the ITD, the nucleotide sequence

characteristics of the ITD, the nucleotide sequence characteristics before and after the

position of the occurring ITD, and the nucleotide sequence characteristics of a

particular sequence.

In the apparatus of the present invention, the length of the ITD representing the ITD characteristic may include, but is not limited to, the total length of the ITD segment or the length of the repeating segment. Sequence characteristics representing a particular sequence may include, but are not limited to, sequence complexity. Nucleotide sequence characteristics representing the ITD characteristic may include, but are not limited to, sequence complexity, GC content, and the like. Sequence complexity can be evaluated by using BLAST software (with different parameters).

In the apparatus of the present invention, the number of ITD characteristics is not

particularly limited, and the number of ITD characteristics that may be selected is, for

example, 500 to 2000, and preferably, the number of the ITD characteristics is, for

example, about 1,500.

7. A method for the quantitative ITD mutation detection based on sequencing data, including:

acquiring the sequencing data of samples to be tested;

extracting ITD characteristics of the sequencing data of samples to be tested, wherein the ITD characteristics are the ITD characteristics of the whole region of the nucleic acid sequence or the ITD characteristics of specific regions of nucleic acid sequences;

quantitatively detecting the ITD mutation allele frequency of samples to be tested, and obtaining the ITD mutation allele frequency (the quantitative detection result) based on the ITD characteristic coefficients and the ITD characteristics of samples to be tested.

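8. The method according to item 7, wherein the quantitative detection step is performed by a quantitative model represented by the following equation (1),

$$\hat{y}(w, x) = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + \cdots + w_n x_n \quad (1)$$

in the equation (1), $\hat{y}(w, x)$ represents the ITD mutation allele frequency, $w_{0-n}$ represent the ITD characteristic coefficients, and $x_{0-n}$ represent the ITD characteristics.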

9. The method according to item 8, wherein the method for acquiring the ITD characteristic coefficients $w_{0-n}$ of the quantitative model includes:

acquiring first detection results of first test samples and second detection results of first test samples as a training set,

obtaining the ITD characteristic coefficients $w_{0-n}$ by the machine learning of the training set,

wherein the first detection result is the high-throughput sequencing data, and the second detection result is the mutation allele frequency value of the ITD standard detection, such as the ITD gold-standard detection method.

10. The method according to item 8, wherein the method for acquiring the ITD characteristic coefficients $w_{0-n}$ of the quantitative model includes:

acquiring first detection results of first test samples and second detection results of first test samples, and acquiring first detection results of second test samples and second detection results of second test samples,

using the first detection results of first test samples and the second detection results of first test samples as a training set, and obtaining the ITD characteristic coefficients $w_{0-n}$ by the machine learning of the training set,

using the first detection results of second test samples for testing, and comparing the mutation allele frequency value calculated by the equation (1) with the second detection results of second test samples to assess whether the comparison result meets the expectation,

if the comparison result meets the expectation, determining the ITD characteristic coefficients $w_{0-n}$; if the comparison result does not meet the expectation, modifying (such as increasing, decreasing or re-providing) the ITD characteristics $x_{0-n}$ adopted in the equation (1) and resetting the ITD characteristic coefficients $w_{0-n}$,

wherein the first detection result is the high-throughput sequencing data, and the second detection result is the mutation allele frequency value of the ITD standard detection, such as the ITD gold-standard detection method.

11. The method according to item 9 or 10, wherein the assessment is to be made whether the comparison result meets the expectation according to the following equation (2):

$$R^2 = 1 - \frac{\sum_{i=0}^{n_{\text{samples}}-1} (y_i - \hat{y}_i)^2}{\sum_{i=0}^{n_{\text{samples}}-1} (y_i - \bar{y})^2} \quad (2)$$

in the equation (2), $y_i$ represents the second detection results of the second test samples, $\hat{y}_i$ represents the mutation allele frequency value calculated by the equation (1), and $\bar{y}$ represents the mean value of the second detection results of second test samples.

12. The method according to any one of items 7-11, wherein the ITD characteristic is selected from one or two or more of the following characteristics: the position of the occurring ITD, the length of the ITD, the nucleotide sequence characteristics of the ITD, the nucleotide sequence characteristics before and after the position of the ITD, and the nucleotide sequence characteristics of a particular sequence.

In the method of the present invention, the length of the ITD representing the ITD characteristic may include, but is not limited to, the total length of the ITD segment or the length of the repeating segment. Sequence characteristics representing a particular sequence may include, but are not limited to, sequence complexity. Nucleotide sequence characteristics representing the ITD characteristic may include, but are not limited to, sequence complexity, GC content, and the like. Sequence complexity can be evaluated by using BLAST software (with different parameters).

In the method of the present invention, the number of ITD characteristics is not particularly limited, and the number of ITD characteristics that may be selected is, for example, about 1,500.

13. An electronic device, including:

a processor; and

a memory, in which computer program instructions are stored, wherein, when the computer program instructions are executed by the processor, the processor performs the method for the quantitative ITD mutation detection based on sequencing data according to any one of items 7-12.
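Items 6 and 12 name the kinds of ITD characteristics that can feed the quantitative model (position, lengths, GC content, sequence complexity). The following is a minimal sketch of how a few of them might be turned into numbers. It rests on assumptions not found in the specification: the function names and the order of the feature vector are hypothetical, and per-base Shannon entropy is used only as a simple stand-in for the BLAST-based complexity evaluation mentioned in the text.

```python
from collections import Counter
from math import log2
from typing import List


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence (0.0 for an empty one)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def shannon_entropy(seq: str) -> float:
    """Per-base Shannon entropy in bits; an assumed stand-in for sequence complexity."""
    seq = seq.upper()
    if not seq:
        return 0.0
    total = len(seq)
    return -sum((c / total) * log2(c / total) for c in Counter(seq).values())


def itd_features(position: int, repeat_unit: str, total_length: int,
                 upstream: str, downstream: str) -> List[float]:
    """Assemble a small, hypothetical feature vector for one ITD call."""
    return [
        float(position),               # position of the occurring ITD
        float(total_length),           # total length of the ITD segment
        float(len(repeat_unit)),       # length of the repeating segment
        gc_content(repeat_unit),       # GC content of the ITD sequence
        shannon_entropy(repeat_unit),  # complexity proxy of the ITD sequence
        gc_content(upstream),          # sequence before the ITD position
        gc_content(downstream),        # sequence after the ITD position
    ]
```

A real feature set would be much larger (the text mentions on the order of 1,500 characteristics); the point here is only that each characteristic reduces to one number per sample.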

BRIEF DESCRIPTION OF THE DRAWINGS

Various other advantages and benefits of the present application will become apparent to those skilled in the art by reading the detailed description in the preferred embodiments below. The drawings are only for the purpose of illustrating the preferred embodiments, and should not be considered as a limitation on this application.

Fig. 1 is a schematic diagram showing an apparatus for the quantitative ITD mutation detection based on sequencing data according to an embodiment of the present application;

Fig. 2 is a schematic diagram showing a quantification module in the apparatus for the quantitative ITD mutation detection based on sequencing data according to an embodiment of the present application;

Fig. 3 is a schematic diagram showing a coefficient training sub-module in the apparatus for the quantitative ITD mutation detection based on sequencing data according to an embodiment of the present application;

Fig. 4 is a schematic diagram showing a coefficient training sub-module in the apparatus for the quantitative ITD mutation detection based on sequencing data according to an embodiment of the present application;

Fig. 5 is a flow chart showing a method for the quantitative ITD mutation detection based on sequencing data according to an embodiment of the present application;

Fig. 6 is a schematic diagram showing an electronic device according to an embodiment of the present application;

Fig. 7 is a graph showing the detection result according to a preferred embodiment of the present application.

DETAILED DESCRIPTION OF THE INVENTION

The technical terms mentioned in the specification have the same meanings as those generally understood by those skilled in the art, and if there is a conflict, the definition in the present specification shall prevail. In general, the terms used in this specification have the following meanings.

Machine learning: machine learning is a branch of artificial intelligence. Research in artificial intelligence has followed a natural and clear path from a focus on "reasoning", to a focus on "knowledge", and then to a focus on "learning". Machine learning is thus one way to realize artificial intelligence, i.e., to solve problems in artificial intelligence by means of learning. In the past 30 years, machine learning has developed into an interdisciplinary subject involving many areas, such as probability theory, statistics, approximation theory, convex analysis, and computational complexity theory. Machine learning theory is primarily about designing and analyzing algorithms that allow computers to "learn" automatically. Machine learning algorithms are a class of algorithms that automatically analyze data to obtain patterns from it and use those patterns to make predictions on unknown data. Since the learning algorithms involve a large body of statistical theory, machine learning is closely related to inferential statistics and is also known as statistical learning theory. In terms of algorithm design, machine learning theory focuses on achievable, effective learning algorithms. Many inference problems are intractable, so part of machine learning research is to develop approximation algorithms that are easy to handle.

Machine learning has been widely used in areas such as data mining, computer vision, natural language processing, biometrics, search engines, medical diagnostics, detection of credit card fraud, securities market analysis, DNA sequencing, speech and handwriting recognition, strategy games and robotics.

Target sequence capture sequencing: probes specific to the genomic regions of interest are custom-designed and hybridized with genomic DNA on a sequence capture chip (or in solution); after the DNA segments of the target genomic regions are enriched, they are sequenced using next-generation sequencing technology.

ITD: internal tandem duplication.

Summary of the application

As mentioned above, there is currently a need to quantitatively detect the ITD mutation based solely on sequencing data, but existing software and algorithms cannot produce, from sequencing data alone, accurate quantitative detection results (ITD mutation allele frequencies), i.e., results that are close to or substantially consistent with those of the ITD standard detection method. The ITD standard detection method described in the present invention can be, for example, a gold-standard detection method such as PCR amplification-capillary electrophoresis, or another commonly recognized detection method capable of accurately obtaining the ITD mutation allele frequency. Sequencing as described herein generally refers to next-generation sequencing, i.e., NGS sequencing.

The existing methods for directly detecting the ITD mutation ratio by NGS sequencing, such as PINDEL, are based on high-depth target region capture sequencing and rely on the information from the captured target regions. Since the capture process may fail to capture the segments where the ITD occurs, accurate quantitative results cannot be obtained.

The inventors of the present application found that, by collecting the ITD-related characteristics in the sequencing data and the corresponding characteristic coefficients, the quantitative detection results of ITD mutations of samples to be tested can be made nearly consistent with the gold-standard detection results.

The quantitative ITD mutation detection of samples to be tested may employ, for example, the ITD-related characteristics and the corresponding characteristic coefficients described in the present application, and the ITD-related characteristics and corresponding characteristic coefficients may in turn be acquired by, for example, the machine learning method described herein.

Therefore, the basic idea of the present application is to solve the above technical problems by obtaining the ITD-related characteristics and the corresponding characteristic coefficients and using them to quantitatively determine the ITD mutations.

Specifically, the present application provides an apparatus, a method, and an electronic device for the quantitative ITD mutation detection based on sequencing data, in which the sequencing data of samples to be tested are first acquired, the ITD characteristics of the sequencing data of samples to be tested are then extracted, and the ITD mutation allele frequency of samples to be tested is obtained based on the ITD characteristic coefficients and the ITD characteristics of samples to be tested, wherein the ITD characteristics are the ITD characteristics of the whole region of a nucleic acid sequence or the ITD characteristics of a specific region of a nucleic acid sequence.

Herein, those skilled in the art can understand that the apparatus, method and electronic device for quantitatively detecting ITD mutation based on sequencing data provided by the present application can be used for the quantitative ITD mutation detection of various sequencing data, for example, whole genome sequencing data and target sequence capture sequencing data, as long as the sequencing method is one commonly used for detecting ITD mutation. Therefore, even though target sequence capture sequencing data are mainly described below as an example, the embodiments of the present application are not limited thereto.

After introduction of the basic principles of the present application, the exemplary embodiments according to the present application will be described in detail with reference to the accompanying drawings. It is apparent that the described embodiments are only a part of the embodiments of the present application, and are not intended to show all embodiments. It should be understood that the present application is not limited by the exemplary embodiments described herein.

Exemplary Apparatus

Fig. 1 is a schematic diagram showing an apparatus for the quantitative ITD mutation detection based on sequencing data according to an embodiment of the present application. As shown in Fig. 1, an apparatus 1708 for the quantitative ITD mutation detection based on sequencing data according to an embodiment of the present application includes:

a data acquisition module 100 for samples to be tested, which is configured to acquire the sequencing data of samples to be tested;

a data pre-processing module 200 for samples, which is connected to the data

acquisition module, and is configured to extract the ITD characteristics of samples to

be tested, wherein said ITD characteristics are the ITD characteristics of the whole region of a nucleic acid sequence or the ITD characteristics of the specific region of a

nucleic acid sequence;

a quantification module 300, which is connected to the data pre-processing module, and is configured to obtain ITD mutation allele frequency based on the ITD

characteristic coefficients and the ITD characteristics of samples to be tested; and

a detection result output module 400, which is connected to the quantification module, and is configured to output the ITD mutation allele frequency as a detection

result of ITD mutation allele frequency of samples to be tested.
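The chain of modules 100 to 400 above can be sketched as a small pipeline. This is an illustration under stated assumptions rather than the apparatus itself: the function names are hypothetical, the feature extractor and the quantifier are passed in as placeholders because the specification does not fix them at this point, and the FASTQ reader assumes plain four-line records.

```python
import io
from typing import Callable, Iterable, List, Sequence


def acquire_reads(handle: Iterable[str]) -> List[str]:
    """Module 100 (sketch): return the sequence line of each 4-line FASTQ record."""
    lines = [line.strip() for line in handle]
    return [lines[i + 1] for i in range(0, len(lines) - 3, 4)]


def run_apparatus(
    handle: Iterable[str],
    extract_features: Callable[[List[str]], Sequence[float]],
    quantify: Callable[[Sequence[float]], float],
) -> str:
    """Wire the four modules together in the order described in the text."""
    reads = acquire_reads(handle)        # module 100: data acquisition
    features = extract_features(reads)   # module 200: data pre-processing
    vaf = quantify(features)             # module 300: quantification
    return f"ITD mutation allele frequency: {vaf:.3f}"  # module 400: output


# Toy usage with placeholder callables (both are made up for illustration).
fastq = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCCAA\n+\nIIIIII\n")
report = run_apparatus(
    fastq,
    extract_features=lambda reads: [float(len(reads))],
    quantify=lambda feats: 0.1 * feats[0],
)
```

Passing the extractor and quantifier as arguments mirrors the way each module is only "connected to" the previous one: any of them can be replaced without touching the others.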

Particularly, in the embodiment of the present application shown in Fig. 2, the quantification module 300 further includes: a quantitative model sub-module 310 for obtaining the ITD mutation allele frequency based on the ITD characteristic coefficients and the ITD characteristics of samples to be tested, wherein the quantitative model set in the quantitative model sub-module is represented by the following equation (1),

$$\hat{y}(w, x) = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + \cdots + w_n x_n \quad (1)$$

in the equation (1), $\hat{y}(w, x)$ represents the ITD mutation allele frequency, $w_{0-n}$ represent the ITD characteristic coefficients, and $x_{0-n}$ represent the ITD characteristics.
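Equation (1) is a plain linear combination of the ITD characteristics plus an intercept. A minimal sketch of evaluating it follows; the function name and the example numbers are hypothetical, and the coefficients are assumed to be stored with the intercept first.

```python
from typing import Sequence


def predict_vaf(w: Sequence[float], x: Sequence[float]) -> float:
    """Evaluate equation (1): w[0] is the intercept w_0, w[1:] pair with x_1..x_n."""
    if len(w) != len(x) + 1:
        raise ValueError("expected one coefficient per characteristic plus an intercept")
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


# Made-up coefficients and characteristics, for illustration only.
w = [0.05, 0.002, 0.1]   # w_0, w_1, w_2
x = [30.0, 0.5]          # x_1 (e.g. an ITD length), x_2 (e.g. a GC content)
vaf = predict_vaf(w, x)  # 0.05 + 0.002 * 30.0 + 0.1 * 0.5, roughly 0.16
```

Nothing in the equation bounds the output to the range 0 to 1, so a deployment might want to check or clip the predicted allele frequency; the specification does not say whether it does.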

The quantitative model sub-module 310 is configured to acquire the ITD characteristic coefficients $w_{0-n}$, or to acquire the ITD characteristics $x_{0-n}$ and the ITD coefficients $w_{0-n}$ corresponding thereto. Particularly, in the embodiment of the present application shown in Fig. 2, the ITD characteristics, or the ITD characteristics and the ITD coefficients corresponding thereto, of the quantitative model sub-module 310 are configured by a coefficient training sub-module 320.

In the coefficient training sub-module 320, particularly in a preferred example of the present invention shown in Fig. 3, a data acquisition unit 321 for sequencing samples is configured to acquire the first detection results of first test samples and the second detection results of first test samples as a training set; and a machine learning unit 322 is configured to use the first detection results of first test samples and the second detection results of first test samples as a training set, and obtain the ITD characteristic coefficients $w_{0-n}$ by the machine learning of the training set, wherein the first detection result is high-throughput sequencing data, and the second detection result is the mutation allele frequency value of the ITD gold-standard detection method.

In still another preferred example of the present invention, as shown in Fig. 4, the coefficient training sub-module 320 includes: a data acquisition unit 323 for sequencing samples, which is configured to acquire the first detection results and the second detection results of first test samples, and to acquire the first detection results and the second detection results of second test samples; a machine learning unit 324, which is configured to use the first detection results and the second detection results of the first test samples as a training set, and to obtain the ITD characteristic coefficients $w_{0-n}$ by machine learning on the training set; a machine learning detection unit 325, which is configured to perform a test by using the first detection results of the second test samples, and to compare the mutation allele frequency value calculated by equation (1) with the second detection results of the second test samples; a test result assessment unit 326, which is configured to assess whether the comparison result meets the expectation; and a machine learning revise unit 327, which is configured to determine the values of the ITD characteristic coefficients $w_{0-n}$ when the comparison result meets the expectation, and, when the comparison result does not meet the expectation, to revise the ITD characteristics $x_{0-n}$ adopted in equation (1) (by re-providing or re-selecting them) and to determine the ITD characteristic coefficients $w_{0-n}$ again. Here, the first detection result is high-throughput sequencing data, and the second detection result is the mutation allele frequency value obtained by an ITD gold-standard detection method.
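The train, test and revise cycle performed by units 323 to 327 can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: ordinary least squares (via `numpy.linalg.lstsq`) stands in for the unspecified machine learning method, "revising" the ITD characteristics is assumed to mean trying the next candidate subset of feature columns, and the acceptance criterion is passed in as a callable. All function and parameter names are hypothetical.

```python
import numpy as np


def fit_coefficients(features, gold_standard):
    """Least-squares fit of w_0..w_n; w_0 is the intercept (OLS is an assumption)."""
    design = np.column_stack([np.ones(len(features)), features])
    coeffs, *_ = np.linalg.lstsq(design, gold_standard, rcond=None)
    return coeffs


def predict(coeffs, features):
    """Evaluate equation (1) for every sample: w_0 + sum_k w_k * x_k."""
    return coeffs[0] + features @ coeffs[1:]


def train_with_revision(x_train, y_train, x_test, y_test,
                        candidate_subsets, meets_expectation):
    """Fit on the first test samples, check on the second test samples,
    and move to the next candidate feature subset when the check fails."""
    for subset in candidate_subsets:
        cols = list(subset)
        coeffs = fit_coefficients(x_train[:, cols], y_train)
        predicted = predict(coeffs, x_test[:, cols])
        if meets_expectation(y_test, predicted):
            return cols, coeffs  # coefficients are fixed once the check passes
    return None  # no candidate subset met the expectation
```

Passing the criterion as a callable keeps the sketch independent of how "meeting the expectation" is defined in a given embodiment.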

As described above, the apparatus 1708 for detecting ITD mutation allele frequency based on sequencing data according to examples of the present application can be used in various terminal devices, such as servers for targeted capture sequencing, and the like. In one example, the apparatus 1708 according to the present example can be integrated into a terminal device as a software module and/or hardware module. For example, the apparatus 1708 may be a software module in an operating system of the terminal device, or may be an application program developed for the terminal device; of course, the apparatus 1708 may also be one of a number of hardware modules of the terminal device.

Alternatively, in another example, the apparatus 1708 for the quantitative ITD mutation detection based on sequencing data and the terminal device can be separate devices. In that case, the apparatus 1708 is connected to the terminal device via a wired and/or wireless network, and interaction information is transmitted in an agreed data format.

Exemplary Method

Fig. 5 is a flowchart showing a method for the quantitative ITD mutation detection based on sequencing data according to an embodiment of the present application. As shown in Fig. 5, the method includes:

S100, acquiring the sequencing data of samples to be tested;

S200, extracting ITD characteristics of the sequencing data of the samples to be tested, wherein the ITD characteristics are the ITD characteristics of the whole region of a nucleic acid sequence or the ITD characteristics of a specific region of a nucleic acid sequence; and

S300, quantitatively detecting the ITD mutation allele frequency of the samples to be tested, and obtaining the quantitative ITD mutation detection result of the samples to be tested based on the ITD characteristic coefficients and the ITD characteristics of the samples to be tested.
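Steps S100 to S300 can be illustrated with a short sketch. It assumes that the feature extractor of step S200 is supplied by the caller, because the source does not fix how the characteristics are computed from the reads, and that step S300 applies the linear model of equation (1). The function and parameter names are hypothetical.

```python
from typing import Callable, Sequence


def detect_itd_allele_frequency(
    sequencing_data: object,
    extract_characteristics: Callable[[object], Sequence[float]],
    coefficients: Sequence[float],
) -> float:
    """Run S200 and S300 on data already acquired in S100.

    coefficients holds w_0 (intercept) followed by one weight per characteristic.
    """
    # S200: turn the sequencing data into numeric ITD characteristics x_1..x_n.
    characteristics = extract_characteristics(sequencing_data)
    if len(coefficients) != len(characteristics) + 1:
        raise ValueError("expected one coefficient per characteristic plus w_0")
    # S300: evaluate equation (1) to obtain the ITD mutation allele frequency.
    weighted = sum(w * x for w, x in zip(coefficients[1:], characteristics))
    return coefficients[0] + weighted
```

For example, with characteristics `[2.0, 3.0]` and coefficients `[0.5, 1.0, 2.0]`, the function returns `0.5 + 1.0 * 2.0 + 2.0 * 3.0`.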

Exemplary Electronic Device

Hereinafter, an electronic device according to an embodiment of the present application will be described with reference to Fig. 6.

Fig. 6 illustrates a block diagram of an electronic device according to an embodiment of the present application. As shown in Fig. 6, an electronic device 10 includes one or more processors 11 and memory 12.

The processor 11 may be a central processing unit (CPU) or another form of processing unit with data processing capability and/or instruction executing capability, and may control other components in the electronic device 10 to perform desired functions.

The memory 12 may include one or more computer program products, which may include various forms of computer readable storage media, such as a volatile memory and/or a nonvolatile memory. The volatile memory may include, for example, a random access memory (RAM), and/or a cache, and the like. The nonvolatile memory may include, for example, a read only memory (ROM), a hard disk, a flash memory, and the like. One or more computer program instructions can be stored in the computer readable storage medium, and the processor 11 can execute the program instructions to realize the method for the quantitative ITD mutation detection based on sequencing data in each of the above embodiments according to the present application, and/or other desired functions. Various contents such as the above-described ITD characteristics, ITD characteristic coefficients, and the like can also be stored in the computer readable storage medium.

In one example, the electronic device 10 may also include an input apparatus 13 and an output apparatus 14 that are interconnected by a bus system and/or other forms of connection (not shown).

The input apparatus 13 can include, for example, a keyboard, a mouse, and the like.

The output apparatus 14 can output various kinds of information to the outside, such as the detection result of the quantitative ITD mutation detection and the like.

The output apparatus 14 can include, for example, a display, a speaker, a printer, a communication network and the remote output apparatus connected thereto, and the like.

Of course, for simplicity, only some components of the electronic device 10 related to the present application are shown in Fig. 6, and components such as the bus, the input/output interface, and the like are omitted. In addition, the electronic device 10 may also include any other suitable components depending on the concrete application.

Exemplary Computer Program Product and Computer Readable Storage Medium

In addition to the method and apparatus described above, an embodiment of the present application can also be a computer program product including computer program instructions which, when executed by a processor, cause the processor to perform the steps of the method for the quantitative ITD mutation detection based on sequencing data according to each of the embodiments of the present application described in the above section "Exemplary Method" of this specification.

As for the computer program product, any combination of one or more programming languages can be used for writing the program codes for performing the operations of embodiments of the present application. The programming languages include object-oriented programming languages, such as Java, C++, etc., and also include conventional procedural programming languages, such as the "C" language, or similar programming languages. The program codes can be executed entirely on the user's computing device, partially on the user's device, as a stand-alone software package, partially on the user's computing device while partially on a remote computing device, or entirely on a remote computing device or server.

Furthermore, an embodiment of the present application can also be a computer readable storage medium with computer program instructions stored therein which, when executed by a processor, cause the processor to perform the steps of the method for the quantitative detection of ITD mutation based on sequencing data according to each of the embodiments of the present application described in the above section "Exemplary Method" of this specification.

The computer readable storage medium can employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium can include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or element, or any combination of the above. More concrete examples (non-exhaustively listed) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read only memory (ROM), an erasable programmable read-only memory (EPROM, or flash memory), an optical fiber, a portable compact disk read only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.

Embodiments

Hereinafter, a concrete embodiment of the apparatus for the quantitative ITD mutation detection based on sequencing data according to the present application will be described with reference to Figs. 1-5, so as to explain the present invention in more detail; however, the present invention is not limited by these embodiments.

First, the targeted capture sequencing data and the PCR-CE ITD quantitative test results of the FLT3 gene ITD frequently-occurring region (chr13: 28608000-28608600) from 80 patients with acute myeloid leukemia (AML) are collected and divided equally into ten groups. Nine of the groups (the first test samples) are taken as the training set for establishing the quantitative model, and the coefficient training sub-module is used to obtain the ITD characteristics and the corresponding ITD characteristic coefficients for the present embodiment. The remaining group (the second test samples) is used to check whether the test result of the quantitative model meets the expectation. When the test result does not meet the expectation, the ITD characteristics adopted by the quantitative model are revised (increased, decreased, or re-provided), and the ITD characteristic coefficients are determined again.
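The grouping described above (ten equal groups, nine for training and one for testing) can be sketched as follows. This is an illustration only: the source does not state how samples were assigned to groups, so a seeded random shuffle is assumed, and the function names are hypothetical.

```python
import random


def split_into_groups(sample_ids, n_groups=10, seed=0):
    """Shuffle the sample IDs (assumed random assignment) and cut them into equal groups."""
    ids = list(sample_ids)
    if len(ids) % n_groups != 0:
        raise ValueError("samples cannot be divided equally into the requested groups")
    random.Random(seed).shuffle(ids)
    size = len(ids) // n_groups
    return [ids[i * size:(i + 1) * size] for i in range(n_groups)]


def train_test_from_groups(groups, test_index):
    """Use one group as the second test samples and the rest as the first test samples."""
    test = list(groups[test_index])
    train = [s for i, group in enumerate(groups) if i != test_index for s in group]
    return train, test
```

With 80 samples this yields ten groups of 8, so the training set holds 72 samples and the held-out group holds 8.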

Therefore, it can be understood that the quantification module of the embodiment includes a quantitative model sub-module, wherein the quantitative model set in the quantitative model sub-module is represented by the following equation:

$$\hat{y}(w, x) = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + \cdots + w_n x_n$$

wherein $\hat{y}(w, x)$ represents the ITD mutation allele frequency, $w_{0-n}$ represent the ITD characteristic coefficients, and $x_{0-n}$ represent the ITD characteristics.

In this embodiment, the quantitative model is configured by the coefficient training sub-module, and is used to acquire the ITD characteristic coefficients $w_{0-n}$ and the ITD characteristics $x_{0-n}$.

Particularly, in this embodiment, the coefficient training sub-module is configured with: a detection result acquisition unit, which is configured to acquire the first detection results and the second detection results of the first test samples, and to acquire the first detection results and the second detection results of the second test samples; a machine learning unit, which is configured to use the first detection results and the second detection results of the first test samples as a training set, and to obtain the ITD characteristic coefficients $w_{0-n}$ by machine learning on the training set; a machine learning test unit, which is configured to perform tests by using the first detection results of the second test samples, and to compare the mutation allele frequency value calculated by the quantitative model with the second detection results of the second test samples; a test result assessment unit, which is configured to assess whether the comparison result meets the expectation; and a machine learning revise unit, which is configured to determine the values of the ITD characteristic coefficients $w_{0-n}$ when the comparison result meets the expectation, and, when the comparison result does not meet the expectation, to revise the ITD characteristics $x_{0-n}$ adopted by the quantitative model and to determine the ITD characteristic coefficients $w_{0-n}$ again; wherein the first detection result is high-throughput sequencing data, and the second detection result is the mutation allele frequency value obtained by the ITD standard detection. Whether the comparison result meets the expectation is assessed by the following equation (2):

$$R^2 = 1 - \frac{\sum_{i=0}^{n_{\text{samples}}-1} (y_i - \hat{y}_i)^2}{\sum_{i=0}^{n_{\text{samples}}-1} (y_i - \bar{y})^2} \quad (2)$$

In equation (2), $y_i$ represents the second detection results of the second test samples, $\hat{y}_i$ represents the mutation allele frequency value calculated by equation (1), and $\bar{y}$ represents the mean value of the second detection results of the second test samples.
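Equation (2) is the coefficient of determination and can be computed directly from the gold-standard values and the model predictions. The following is a minimal sketch; the function name and the sample values in the note below are illustrative and are not data from the embodiment.

```python
from typing import Sequence


def r_squared(gold_standard: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination R^2 as written in equation (2)."""
    if len(gold_standard) != len(predicted) or len(gold_standard) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_gold = sum(gold_standard) / len(gold_standard)
    # Numerator: squared differences between gold-standard and predicted values.
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(gold_standard, predicted))
    # Denominator: squared differences between gold-standard values and their mean.
    ss_tot = sum((y - mean_gold) ** 2 for y in gold_standard)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all gold-standard values are equal")
    return 1.0 - ss_res / ss_tot
```

A perfect prediction gives a value of 1, and a model that always predicts the mean of the gold-standard values gives 0; for instance, `r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])` evaluates to `0.0`.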

The quantitative model, the ITD characteristics $x_{0-n}$ and the ITD characteristic coefficients $w_{0-n}$ of the present embodiment are obtained by the above apparatus and operation.

Then, the targeted capture sequencing data (in fastq format) and the PCR-CE ITD quantitative test results of the FLT3 gene ITD frequently-occurring region (chr13: 28608000-28608600) from 30 patients with acute myeloid leukemia (AML) are collected.

Next, the ITD characteristics of the sequencing data from the above samples are extracted by the data pre-processing module, wherein the ITD characteristics are the ITD characteristics of the target capture region. These ITD characteristics include, but are not limited to: the length of the insert segment (ITD mutant segment), the complexity of the insert segment, the number of sequencing reads supporting the insert segment, the position of the insert segment, and the sequencing depth of the insert segment.

Finally, the detection result of the ITD mutation allele frequency is obtained by the quantitative calculation module based on the ITD characteristic coefficients and the ITD characteristics of the samples to be tested. The detection result is output by an output module, as shown in the following example:

Chr13 28608251 21 TGAGATCATATTCATATTCTC INS 0.0809866666666666

In the example of the output detection result shown above, the first and second items give the absolute position of the ITD in the genome, the third item is the length of the ITD, the fourth item is the sequence of the ITD, the fifth item is the type of the ITD, and the final item is the quantitative detection result, that is, the ITD mutation allele frequency (the ITD quantitative result of this example is 8.09%).
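A record of this form can be read back into named fields as sketched below. The sketch assumes the six fields are separated by whitespace and that the underscore following the length in the extracted text is an artifact; the class and function names are hypothetical, and the field names are chosen here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItdRecord:
    chromosome: str
    position: int
    length: int
    sequence: str
    mutation_type: str
    allele_frequency: float


def parse_itd_record(line: str) -> ItdRecord:
    """Split one output line into its six fields (whitespace-separated is an assumption)."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    chromosome, position, length, sequence, mutation_type, frequency = fields
    return ItdRecord(chromosome, int(position), int(length),
                     sequence, mutation_type, float(frequency))
```

For the example record, the parsed length (21) equals the number of bases in the reported ITD sequence, which gives a simple consistency check on the output.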

By using the apparatus and method for the quantitative ITD mutation detection based on sequencing data according to the present embodiment, the quantitative detection results of all 30 samples are obtained, as shown in Fig. 7. In Fig. 7, the abscissa is the sample number, and the ordinate is the ITD mutation allele frequency value. The model prediction curve is the ITD mutation allele frequency value obtained by using the apparatus and method for detecting the ITD mutation allele frequency based on sequencing data according to the preferred embodiment; the NGS result curve is the ITD mutation allele frequency calculated directly without model training; and the gold standard curve is the quantitative detection result obtained by the PCR-CE method. The $R^2$ values of the model prediction curve and the NGS result curve, obtained by equation (2) of the present embodiment, are 0.9951 and 0.875, respectively. It can be seen that the ITD mutation allele frequency obtained by the apparatus and method for the quantitative ITD mutation detection based on sequencing data according to the preferred embodiment correlates more closely with the gold standard and has a higher degree of conformity.

The fundamentals of the present application have been described above in conjunction with particular embodiments. However, it should be noted that the benefits, advantages, effects, and the like mentioned in the present application are merely examples and not limitations, and are not considered to be required in each of the embodiments of the present application. In addition, the specific details of the above disclosure are only for the purpose of illustration and ease of understanding, and are not intended to limit the present application; the application is not required to be implemented by following these specific details.

INDUSTRIAL APPLICABILITY

According to the present invention, provided are an apparatus and a method capable of detecting an ITD mutation quantitatively while acquiring the ITD sequencing information.

Claims

1. An apparatus for the quantitative ITD mutation detection based on sequencing data, including:

a data acquisition module, which is configured to acquire sequencing data of samples to be tested;

a data pre-processing module, which is connected to the data acquisition module, and is configured to extract ITD characteristics of samples to be tested, wherein the ITD characteristics are ITD characteristics of the whole region of nucleotide sequences or ITD characteristics of specific regions of nucleotide sequences;

a quantification module, which is connected to the data pre-processing module, and is configured to obtain ITD mutation allele frequency based on the ITD characteristic coefficients and ITD characteristics of samples to be tested; and

a detection result output module, which is connected to the quantification module, and is configured to output the ITD mutation allele frequency as the quantitative detection result of the ITD mutation of samples to be tested.

2. The apparatus according to claim 1, wherein the quantification module includes a quantitative model sub-module for obtaining ITD mutation allele frequency based on the ITD characteristic coefficients and ITD characteristics of samples to be tested, wherein the quantitative model set in the quantitative model sub-module is represented by the following equation (1),

$$\hat{y}(w, x) = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + \cdots + w_n x_n \quad (1)$$

in the equation (1), $\hat{y}(w, x)$ represents ITD mutation allele frequency, $w_{0-n}$ represent ITD characteristic coefficients, and $x_{0-n}$ represent ITD characteristics.

3. The apparatus according to claim 2, wherein the quantitative model is configured by a coefficient training sub-module, and is configured to acquire the ITD characteristic coefficients $w_{0-n}$, and wherein the coefficient training sub-module includes:

a detection result acquisition unit, which is configured to acquire first detection results of first test samples and second detection results of first test samples as a training set,

a machine learning unit, which is configured to use the first detection results of first test samples and second detection results of first test samples as a training set, and obtain the ITD characteristic coefficients $w_{0-n}$ through machine learning of the training set,

wherein the first detection results are high-throughput sequencing data, and the second detection results are the mutation allele frequency value of the ITD standard detection.

4. The apparatus according to claim 2, wherein the quantitative model is configured by a coefficient training sub-module, and is connected to the data pre-processing module for acquiring the ITD characteristic coefficients $w_{0-n}$, and the coefficient training sub-module includes:

a detection result acquisition unit, which is configured to acquire first detection results of first test samples and second detection results of first test samples, and acquire first detection results of second test samples and second detection results of second test samples,

a machine learning unit, which is configured to use the first detection results of first test samples and second detection results of first test samples as a training set, and obtain the ITD characteristic coefficients $w_{0-n}$ by the machine learning of the training set,

a machine learning test unit, which is configured to perform tests using the first detection results of second test samples, and compare the mutation allele frequency value calculated by the equation (1) with the second detection results of second test samples,

a test result assessment unit, which is configured to assess whether the comparison result meets the expectation,

a machine learning revise unit, which is configured to determine the values of ITD characteristic coefficients $w_{0-n}$ when the comparison result meets the expectation, and to modify the ITD characteristics $x_{0-n}$ adopted in the equation (1) when the comparison result does not meet the expectation, and reset the ITD characteristic coefficients $w_{0-n}$,

wherein the first detection results are high-throughput sequencing data, and the second detection results are the mutation allele frequency of the ITD standard detection.

5. The apparatus according to claim 3 or 4, wherein the assessment is to be made whether the comparison result meets the expectation according to the following equation (2):

$$R^2 = 1 - \frac{\sum_{i=0}^{n_{\text{samples}}-1} (y_i - \hat{y}_i)^2}{\sum_{i=0}^{n_{\text{samples}}-1} (y_i - \bar{y})^2} \quad (2)$$

in the equation (2), $y_i$ represents the second detection results of second test samples, $\hat{y}_i$ represents the mutation allele frequency calculated by the equation (1), and $\bar{y}$ represents the mean value of second detection results of second test samples.

6. The apparatus according to any one of claims 1-5, wherein the ITD characteristic is selected from one or two or more of the following characteristics: the position of the occurring ITD, the length of the ITD, the nucleotide sequence characteristics of the ITD, the nucleotide sequence characteristics before and after the position of the occurring ITD, and the nucleotide sequence characteristics of a particular sequence.

7. A method for the quantitative ITD mutation detection based on sequencing data, including:

acquiring sequencing data of samples to be tested;

extracting ITD characteristics of sequencing data of samples to be tested, wherein the ITD characteristics are ITD characteristics of the whole region of the nucleic acid sequences or ITD characteristics of specific regions of nucleic acid sequences;

quantitatively detecting the ITD mutation allele frequency of samples to be tested, and obtaining the quantitative detection result of samples to be tested based on ITD characteristic coefficients and ITD characteristics of samples to be tested.

8. The method according to claim 7, wherein the quantitative detection step is performed by a quantitative model represented by the following equation (1),

$$\hat{y}(w, x) = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + \cdots + w_n x_n \quad (1)$$

in the equation (1), $\hat{y}(w, x)$ represents the ITD mutation allele frequency, $w_{0-n}$ represent the ITD characteristic coefficients, and $x_{0-n}$ represent the ITD characteristics.

9. The method according to claim 8, wherein the method for acquiring the ITD characteristic coefficients $w_{0-n}$ of the quantitative model includes:

obtaining the ITD characteristic coefficients $w_{0-n}$ by the machine learning of the training set,

wherein the first detection result is the high-throughput sequencing data, and the second detection result is the mutation allele frequency of the ITD standard detection.

10. The method according to claim 8, wherein the method for acquiring the ITD characteristic coefficients $w_{0-n}$ of the quantitative model includes: acquiring first detection results of first test samples and second detection results of first test samples, and acquiring first detection results of second test samples and second detection results of second test samples; using the first detection results of first test samples and second detection results of first test samples as a training set, and obtaining the ITD characteristic coefficients $w_{0-n}$ by the machine learning of the training set; using the first detection results of second test samples for testing, and comparing the mutation allele frequency value calculated by the equation (1) with the second detection results of second test samples to assess whether the comparison result meets the expectation; if the comparison result meets the expectation, determining the ITD characteristic coefficients $w_{0-n}$; if the comparison result does not meet the expectation, modifying the ITD characteristics $x_{0-n}$ adopted in the equation (1) and resetting the ITD characteristic coefficients $w_{0-n}$, wherein the first detection result is the high-throughput sequencing data, and the second detection result is the mutation allele frequency of the ITD standard detection.

11. The method according to claim 9 or 10, wherein the assessment is to be made whether the comparison result meets the expectation according to the following equation (2):

$$R^2 = 1 - \frac{\sum_{i=0}^{n_{\text{samples}}-1} (y_i - \hat{y}_i)^2}{\sum_{i=0}^{n_{\text{samples}}-1} (y_i - \bar{y})^2} \quad (2)$$

in the equation (2), $y_i$ represents the second detection results of second test samples, $\hat{y}_i$ represents the mutation allele frequency calculated by the equation (1), and $\bar{y}$ represents the mean value of the second detection results of second test samples.

12. The method according to any one of claims 7-11, wherein the ITD characteristic is selected from one or two or more of the following characteristics: the position of the occurring ITD, the length of the ITD, the nucleotide sequence characteristics of the ITD, the nucleotide sequence characteristics before and after the position of the occurring ITD, and the nucleotide sequence characteristics of particular sequences.

13. An electronic device, including:

a processor; and a memory, in which the computer program instructions are stored, and when the computer program instructions are executed by the processor, the method for the quantitative ITD mutation detection based on sequencing data according to any one of claims 7-12 is performed by the processor.