CN115700557A

CN115700557A - Method, device and storage medium for classifying nucleic acid samples

Info

Publication number: CN115700557A
Application number: CN202211276142.1A
Authority: CN
Inventors: 李响; 李旭; 王贺; 张志明; 任毅; 赵礡
Original assignee: Beijing Kayudi Biotechnology Co ltd
Current assignee: Beijing Kayudi Biotechnology Co ltd
Priority date: 2022-10-19
Filing date: 2022-10-19
Publication date: 2023-02-07

Abstract

The present invention provides a method, a device and a storage medium for classifying a nucleic acid sample. The method comprises the following steps: obtaining polymerase chain reaction (PCR) curve data for a nucleic acid sample, the PCR curve data comprising a series of signal sampling values corresponding to the PCR cycle numbers; inputting the series of signal sampling values into a convolutional neural network as a one-dimensional vector; extracting curve trend characteristics of the PCR curve data by using the convolutional neural network, and determining the curve type of the PCR curve data according to the curve trend characteristics; and providing a classification result of the nucleic acid sample according to the curve type. With this technique, the PCR curve data of nucleic acid samples can be interpreted automatically by the convolutional neural network, which improves the accuracy of nucleic acid result interpretation while greatly reducing the human resources it requires, and thereby effectively improves the efficiency and accuracy of nucleic acid sample testing.

Description

Method, device and storage medium for classifying nucleic acid sample

Technical Field

The present invention relates to nucleic acid sample testing or detection. More particularly, the present invention relates to a method, apparatus and storage medium for classifying nucleic acid samples.

Background

Nucleic acid detection is a technique for determining whether a patient is infected with a virus by detecting the deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) of an invading virus in a patient sample such as a respiratory tract, blood or stool sample, and is an important criterion for diagnosing novel coronavirus (COVID-19) infection.

At present, the most common method for detecting the novel coronavirus is quantitative polymerase chain reaction (PCR) nucleic acid detection, in which rapid amplification of DNA makes trace amounts of nucleic acid detectable. For example, a fluorescent dye or probe can be introduced into the reaction system, and the change in the amount of amplified product in each cycle of the PCR amplification reaction can be monitored in real time through the change in the fluorescence signal, so that a PCR curve can be drawn for quantitative analysis of the nucleic acid sample. However, interpretation of PCR curves currently depends heavily on manual work. When the volume of nucleic acid testing is large, manual interpretation is inefficient and, particularly during large-scale nucleic acid screening, cannot meet the timeliness requirements of epidemic prevention and control. Moreover, some complicated PCR curves are difficult to interpret manually, and misjudgments are hard to avoid when interpretation relies only on the personal experience and subjective judgment of inspectors, so high detection accuracy cannot be guaranteed. In general, no platform for efficient, automatic interpretation of PCR curves exists at present.

For the novel coronavirus, nucleic acid detection is an effective means of precise prevention and control: it can identify sources of infection as early as possible, control the spread of the epidemic at its source, and avoid the serious difficulties that arise when cases cannot be diagnosed and isolated in time. Under normalized epidemic prevention and control, nucleic acid detection capacity and efficiency are of great strategic significance and value. The contradiction between the rapidly increasing number of nucleic acid tests and the inefficient manual interpretation by inspectors is the pain point of nucleic acid detection and a problem the field has long sought to solve. When the number of nucleic acid tests exceeds the interpretation capacity of inspectors, nucleic acid test reports may be delayed, or inspectors working under overload may make careless mistakes, neither of which is conducive to effective prevention and control of the COVID-19 epidemic.

Therefore, there is an urgent need for an automated interpretation technique for nucleic acid detection results, which can save the human resources for interpretation of nucleic acid detection results and effectively improve the accuracy and efficiency of interpretation.

Disclosure of Invention

According to one aspect of the present invention, there is provided a method for classifying a nucleic acid sample, comprising: obtaining Polymerase Chain Reaction (PCR) curve data for the nucleic acid sample, the PCR curve data comprising a series of signal sample values corresponding to a number of PCR cycles; inputting the series of signal sample values as a one-dimensional vector to a convolutional neural network; extracting a curve trend characteristic of the PCR curve data by using the convolutional neural network, and determining a curve type of the PCR curve data according to the curve trend characteristic; and providing a classification result of the nucleic acid sample according to the curve type.

In some embodiments, the convolutional neural network comprises a convolutional layer, a pooling layer, and a fully-connected layer, wherein: the convolutional layer extracts local curve trend characteristics of each part of the PCR curve data through a moving scan of convolution kernels; the pooling layer downsamples the local curve trend characteristics to filter fluctuations in the PCR curve data; and the fully-connected layer integrates the downsampled local curve trend characteristics to provide a classification result of the curve type.

In some embodiments, the convolutional neural network comprises a first convolutional layer, a first activation function layer, a second convolutional layer, a second activation function layer, a pooling layer, and a fully-connected layer, and utilizing the convolutional neural network comprises: passing the one-dimensional vector to an input of the first convolutional layer; passing an output of the first convolutional layer to an input of the first activation function layer; passing an output of the first activation function layer to an input of the second convolutional layer; passing an output of the second convolutional layer to an input of the second activation function layer; passing an output of the second activation function layer to an input of the pooling layer; passing an output of the pooling layer to an input of the fully-connected layer; and taking the output of the fully-connected layer as the classification result of the curve type.
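As a non-limiting illustration, the layer order recited above can be sketched in plain Python for a single-channel signal. The ReLU activation, the max pooling, the single kernel per convolutional layer, and all function names and parameter values are assumptions made for this sketch only; they are not specified in the passage above.

```python
from typing import List, Sequence


def conv1d(x: Sequence[float], kernel: Sequence[float], bias: float = 0.0) -> List[float]:
    """Slide the kernel over x (stride 1, no padding) and return the feature map."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]


def relu(x: Sequence[float]) -> List[float]:
    """Assumed activation function: negative responses are set to zero."""
    return [max(0.0, v) for v in x]


def max_pool(x: Sequence[float], size: int) -> List[float]:
    """Assumed pooling: keep the maximum of each non-overlapping window."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def fully_connected(x: Sequence[float], weights: Sequence[Sequence[float]],
                    biases: Sequence[float]) -> List[float]:
    """One output score per curve type: weighted sum of the pooled features."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def forward(signal, kernel1, kernel2, pool_size, fc_weights, fc_biases):
    """conv -> activation -> conv -> activation -> pooling -> fully-connected."""
    h = relu(conv1d(signal, kernel1))
    h = relu(conv1d(h, kernel2))
    h = max_pool(h, pool_size)
    return fully_connected(h, fc_weights, fc_biases)
```

In a trained network the kernels and fully-connected weights would be learned from labeled PCR curves; here they are simply passed in by the caller.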

In some embodiments, the convolutional neural network further comprises a batch normalization layer, and passing the one-dimensional vector to the input of the first convolutional layer comprises: passing the one-dimensional vector to an input of the batch normalization layer; and passing an output of the batch normalization layer to the input of the first convolutional layer.
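As a non-limiting illustration, a batch normalization step placed before the first convolutional layer could look like the sketch below. It assumes a single input channel, statistics computed over every value in the batch, and hypothetical scale and shift parameters `gamma` and `beta`; none of these details is stated in the passage above.

```python
import math
from typing import List, Sequence


def batch_norm(batch: Sequence[Sequence[float]], gamma: float = 1.0,
               beta: float = 0.0, eps: float = 1e-5) -> List[List[float]]:
    """Normalize a batch of curves to zero mean and unit variance, then scale and shift."""
    values = [v for curve in batch for v in curve]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    scale = gamma / math.sqrt(var + eps)  # eps guards against division by zero
    return [[(v - mean) * scale + beta for v in curve] for curve in batch]
```

Normalizing the raw fluorescence values in this way puts curves recorded at different intensity scales on a comparable footing before the convolution kernels scan them.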

In some embodiments, the curve types include a positive trend type, a negative trend type, and a to-be-rechecked trend type, and the classification result of the nucleic acid sample includes a positive result, a negative result, and a to-be-rechecked result. The to-be-rechecked trend type further comprises a weak positive trend type, a weak negative trend type and an abnormal trend type, and the to-be-rechecked result further comprises a weak positive result, a weak negative result and an abnormal result.

In some embodiments, the convolutional neural network is trained using a stochastic gradient descent algorithm based on a training data sample set including a predetermined number of positive PCR curve data, negative PCR curve data, weak positive PCR curve data, weak negative PCR curve data, and abnormal PCR curve data.
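As a non-limiting illustration of the stochastic gradient descent update, the sketch below trains a plain linear softmax classifier rather than a convolutional neural network, so that the per-sample update stays visible in a few lines. The stand-in model, the cross-entropy loss, the learning rate, the epoch count and all names are assumptions of this sketch.

```python
import math
import random
from typing import List, Sequence, Tuple


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def scores(w, b, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def sgd_train(samples: Sequence[Tuple[Sequence[float], int]], n_classes: int,
              lr: float = 0.1, epochs: int = 50, seed: int = 0):
    """samples: (signal vector, class index) pairs; returns weights and biases."""
    rng = random.Random(seed)
    n_features = len(samples[0][0])
    w = [[0.0] * n_features for _ in range(n_classes)]
    b = [0.0] * n_classes
    order = list(samples)
    for _ in range(epochs):
        rng.shuffle(order)              # visit the samples in random order
        for x, y in order:              # one parameter update per sample
            p = softmax(scores(w, b, x))
            for c in range(n_classes):
                grad = p[c] - (1.0 if c == y else 0.0)  # d(cross-entropy)/d(score_c)
                for j in range(n_features):
                    w[c][j] -= lr * grad * x[j]
                b[c] -= lr * grad
    return w, b


def predict(w, b, x) -> int:
    s = scores(w, b, x)
    return s.index(max(s))
```

The same update pattern (shuffle, compute the loss gradient on one sample or a small batch, step against the gradient) would apply to the kernels and weights of a convolutional network, with the gradients obtained by backpropagation.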

In some embodiments, the method further comprises: determining whether the number of different classes of PCR curve data in the training data sample set is balanced; when the number is unbalanced, constructing new minority class PCR curve data based on the existing minority class PCR curve data and updating the training data sample set; and training the convolutional neural network based on the updated training data sample set.
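As a non-limiting illustration, the balance check and the construction of new minority class curves can be sketched as follows. The passage above does not say how new curves are constructed; this sketch assumes linear interpolation between two randomly chosen curves of the same class, and the `tolerance` threshold and all function names are likewise hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def is_balanced(labels: Sequence[str], tolerance: float = 0.2) -> bool:
    """Balanced if the smallest class is within `tolerance` of the largest one."""
    counts = Counter(labels)
    return min(counts.values()) >= (1.0 - tolerance) * max(counts.values())


def oversample(curves: Sequence[Sequence[float]], labels: Sequence[str],
               rng: random.Random) -> Tuple[List[List[float]], List[str]]:
    """Add synthetic curves until every class is as large as the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    new_curves = [list(c) for c in curves]
    new_labels = list(labels)
    for cls, n in counts.items():
        members = [c for c, l in zip(curves, labels) if l == cls]
        for _ in range(target - n):
            a, b = rng.choice(members), rng.choice(members)
            t = rng.random()
            # Assumed construction: a point on the segment between two real curves.
            new_curves.append([x + t * (y - x) for x, y in zip(a, b)])
            new_labels.append(cls)
    return new_curves, new_labels
```

The updated sample set returned by `oversample` would then be used for training in place of the original set.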

According to another aspect of the present invention, there is provided an apparatus for classifying a nucleic acid sample, comprising: a memory storing computer instructions and a processor. The instructions, when executed by the processor, cause the processor to perform a method for classifying a nucleic acid sample, comprising: obtaining Polymerase Chain Reaction (PCR) curve data for the nucleic acid sample, the PCR curve data comprising a series of signal sample values corresponding to a PCR cycle number; inputting the series of signal sampling values as a one-dimensional vector to a convolutional neural network; extracting a curve trend characteristic of the PCR curve data by using the convolutional neural network, and determining a curve type of the PCR curve data according to the curve trend characteristic; and providing a classification result of the nucleic acid sample according to the curve type.

According to yet another aspect of the invention, there is provided a non-transitory computer readable storage medium storing instructions for causing a processor to perform a method for classifying a nucleic acid sample, comprising: obtaining Polymerase Chain Reaction (PCR) curve data for the nucleic acid sample, the PCR curve data comprising a series of signal sample values corresponding to a number of PCR cycles; inputting the series of signal sampling values as a one-dimensional vector to a convolutional neural network; extracting a curve trend characteristic of the PCR curve data by using the convolutional neural network, and determining a curve type of the PCR curve data according to the curve trend characteristic; and providing a classification result of the nucleic acid sample according to the curve type.

With the method, device and storage medium for classifying nucleic acid samples provided by the present invention, the PCR curve data of a nucleic acid sample can be interpreted automatically by a convolutional neural network, which effectively reduces the human resources required to interpret nucleic acid detection results and improves interpretation efficiency. In addition, because the convolutional neural network learns the curve trends of the common types of PCR curves through deep learning, the technique judges a curve from its essential form. This avoids misjudgments caused by subjective factors or by inspectors' insufficient knowledge and experience, ensures the accuracy of nucleic acid result interpretation, and removes the need for mathematical derivation or curve fitting. In summary, the technique for classifying nucleic acid samples according to embodiments of the present invention can greatly reduce the human resources required for nucleic acid result interpretation while improving its accuracy, and can shorten the turnaround time of nucleic acid detection, so that reliable nucleic acid results are provided in time, particularly in scenarios such as large-scale nucleic acid screening. This is of great strategic significance and practical value for controlling the COVID-19 epidemic.

Drawings

These and/or other aspects and advantages of the present invention will become more apparent and more readily appreciated from the following detailed description of the embodiments of the invention, taken in conjunction with the accompanying drawings of which:

FIG. 1 shows a schematic of a PCR curve for a positive nucleic acid sample.

FIG. 2 shows a schematic of a PCR curve for a negative nucleic acid sample.

FIG. 3 shows a schematic of a PCR curve for a weakly positive nucleic acid sample.

FIG. 4 shows a schematic of a PCR curve for a weakly negative nucleic acid sample.

FIG. 5 shows a schematic of a PCR curve for an abnormal nucleic acid sample.

FIG. 6 shows a schematic diagram of the manual interpretation of the existing PCR curve.

FIG. 7 shows a flow diagram of a method of classifying a nucleic acid sample according to an embodiment of the invention.

FIG. 8 is a schematic diagram illustrating an example of a structure of a convolutional neural network in a method of classifying a nucleic acid sample according to an embodiment of the present invention.

FIG. 9 is a schematic diagram showing another example of the structure of a convolutional neural network in the method of classifying a nucleic acid sample according to the embodiment of the present invention.

FIG. 10 shows a schematic diagram of a training process of a convolutional neural network according to an embodiment of the present invention.

FIG. 11 is a schematic diagram illustrating an example of updating a training sample set in a method of classifying a nucleic acid sample according to an embodiment of the present invention.

FIG. 12 is a schematic diagram illustrating another example of updating a training sample set in a method of classifying a nucleic acid sample according to an embodiment of the present invention.

FIG. 13 is a diagram illustrating the accuracy of a convolutional neural network model for a set of training data samples in a method of classifying a nucleic acid sample according to an embodiment of the present invention.

FIG. 14 shows a schematic diagram of the accuracy of a convolutional neural network model for a test data sample set in a method of classifying a nucleic acid sample according to an embodiment of the present invention.

FIG. 15 shows a block diagram of an apparatus for classifying nucleic acid samples according to an embodiment of the present invention.

Detailed Description

In order that those skilled in the art will better understand the present invention, the following detailed description of the invention is provided in conjunction with the accompanying drawings and the detailed description of the invention.

First, the basic background and the main idea of the present invention for the automated interpretation of nucleic acid detection results will be briefly summarized. As described above, in the conventional detection method for performing quantitative analysis on a nucleic acid sample based on a PCR curve obtained after amplification, the detection process depends heavily on manual interpretation of the PCR curve, and some PCR curves are difficult to be manually interpreted, which cannot meet the requirement for efficient and accurate detection. The characteristics of PCR curves for several types of nucleic acid samples will be described below with reference to FIGS. 1-5.

FIG. 1 shows a schematic of a PCR curve for a positive nucleic acid sample. As shown in FIG. 1, the abscissa represents the number of cycles of the PCR reaction over time, and the ordinate represents the sampled value of the fluorescence signal intensity after the end of each cycle in the PCR reaction, which reflects the viral nucleic acid content after the end of each cycle. For positive nucleic acid samples, it can be seen that the PCR curve obtained after amplification exhibits a distinct sigmoidal amplification curve.

FIG. 2 shows a schematic of a PCR curve for a negative nucleic acid sample. As shown in FIG. 2, the sampled values of the fluorescence signal intensities exhibited a relatively flat line for the negative nucleic acid samples.

FIG. 3 shows a schematic of a PCR curve for a weakly positive nucleic acid sample. As shown in FIG. 3, for the weakly positive nucleic acid sample, the sampled value of the fluorescence signal intensity shows a preliminary S-shaped amplification curve.

FIG. 4 shows a schematic of a PCR curve for a weakly negative nucleic acid sample. As shown in FIG. 4, the sampled values of fluorescence signal intensities for the weak negative nucleic acid samples showed a relatively flat straight line overall, but had some fluctuation tendency compared to the negative nucleic acid samples.

FIG. 5 shows a schematic of a PCR curve for an abnormal nucleic acid sample. It can be understood that an abnormal nucleic acid sample may arise for various reasons, such as voltage problems, the solution ratio of the reagent kit, or contaminants introduced during operation, so that the resulting PCR curve is abnormal. As shown in FIG. 5, in one exemplary case, the curve has an abnormal upward trend during the first few cycles of the PCR reaction.

The characteristics of the PCR curve of several types of nucleic acid samples are described above, and the method for quantitative analysis of the PCR curve in the prior art will be described below in conjunction with the PCR curve of a positive nucleic acid sample.

FIG. 6 shows a schematic diagram of the existing manual interpretation of a PCR curve. As in FIG. 1, the abscissa of the PCR curve in FIG. 6 represents the cycle number, and the ordinate represents a series of sampled values of the fluorescence signal intensity. As shown in FIG. 6, the fluorescence signal intensity changes little during the first few cycles of the reaction and is close to a straight line, then increases exponentially, and stops increasing after a certain number of cycles; these phases correspond to the baseline period, the exponential amplification period, and the plateau period in FIG. 6, respectively. For quantitative analysis of the PCR curve, a fluorescence threshold is set at 10 times the standard deviation of the fluorescence signal intensity in the baseline period. The fluorescence threshold line intersects the PCR curve, and the abscissa of the intersection point is the Ct value, that is, the number of amplification cycles the reaction has undergone when the fluorescence signal intensity reaches the fluorescence threshold. The Ct value is related to the viral nucleic acid concentration of the sample under test: the higher the viral nucleic acid concentration, the smaller the Ct value. Whether the nucleic acid sample is positive can therefore be judged from the Ct value.
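As a non-limiting illustration of this prior-art calculation, the Ct value can be estimated from the sampled fluorescence values as sketched below. The sketch assumes that the first `baseline_cycles` samples form the baseline period, that the baseline mean is subtracted before the threshold is applied, and that the crossing point is located by linear interpolation between adjacent cycles; these choices and the function name are assumptions, not details given above.

```python
import statistics
from typing import Optional, Sequence


def ct_value(signal: Sequence[float], baseline_cycles: int = 10,
             multiplier: float = 10.0) -> Optional[float]:
    """Return the fractional cycle at which the curve crosses the threshold, or None."""
    baseline = signal[:baseline_cycles]
    base_mean = statistics.mean(baseline)
    # Threshold: 10 times the standard deviation of the baseline-period signal.
    threshold = multiplier * statistics.pstdev(baseline)
    corrected = [v - base_mean for v in signal]  # assumed baseline subtraction
    for i in range(1, len(corrected)):
        previous, current = corrected[i - 1], corrected[i]
        if previous < threshold <= current:
            fraction = (threshold - previous) / (current - previous)
            # Cycles are numbered from 1, so list index i - 1 is cycle i.
            return i + fraction
    return None  # no intersection: the curve has no Ct value
```

A curve that never rises above the threshold returns `None`, which corresponds to the "no Ct value" case used in manual interpretation.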

At present, no technology exists for automatically interpreting the PCR curve of a nucleic acid sample, and interpretation must be performed manually, curve by curve. In particular, the inspector needs thorough knowledge of, or rigorous training in, the PCR curves of each type of nucleic acid sample, and must interpret each curve by considering both whether it exhibits a sigmoidal amplification curve and its Ct value. For example, when the PCR curve shows no sigmoidal amplification curve and has no Ct value (i.e., no intersection with the threshold line), the nucleic acid sample can be judged negative; when the Ct value is less than or equal to the detection limit and the curve is sigmoidal, the nucleic acid sample can be judged positive; and when the PCR curve shows only a preliminary S-shaped amplification curve, the Ct value lies in a gray zone, or there are other doubtful conditions such as non-negligible curve fluctuation, the nucleic acid sample can be judged as needing to be rechecked. Therefore, in the existing manual interpretation method, the inspector must have rich experience with the common curve types and must perform mathematical calculations; the process is subject to subjective factors and is inefficient.
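As a non-limiting illustration, the manual decision rules described above can be written as a small function. The inputs `has_sigmoid` and `suspicious` stand for the inspector's visual judgments, and the default `detection_limit` of 38 cycles is a hypothetical placeholder; actual limits depend on the reagent kit and the applicable standard.

```python
from typing import Optional


def manual_interpretation(has_sigmoid: bool, ct: Optional[float],
                          detection_limit: float = 38.0,
                          suspicious: bool = False) -> str:
    """Encode the prior-art manual rules; returns one of three result labels."""
    if suspicious:
        # Preliminary S-shape, non-negligible fluctuation or other doubtful conditions.
        return "to-be-rechecked"
    if not has_sigmoid and ct is None:
        return "negative"
    if has_sigmoid and ct is not None and ct <= detection_limit:
        return "positive"
    # Everything else, including a Ct value beyond the limit (gray zone).
    return "to-be-rechecked"
```

The sketch makes the dependence on subjective inputs explicit: two of the three arguments are judgments made by eye.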

Although some methods exist for analyzing data curves, most are not suitable for analyzing PCR curves of nucleic acid samples and have low accuracy. For example, the R language is a tool for statistical analysis that can judge whether a PCR curve is positive or negative by curve fitting. However, some PCR curves are too complex to be fitted accurately, or cannot be fitted at all, and there is no R package designed for the PCR curve, so the accuracy of PCR curve classification for novel coronavirus nucleic acid detection using R packages is only about 60%. In addition, the R language is used only to fit the PCR curve mathematically and obtain an equation for each curve; it is not used to learn and predict what trend a PCR curve should exhibit, which makes it difficult to apply to the complex, high-accuracy task of novel coronavirus nucleic acid detection. Therefore, in the scenario of normalized nucleic acid detection, a technique for efficiently, accurately and automatically interpreting the PCR curve of a nucleic acid sample is needed to fill this gap in the field of nucleic acid detection.

In summary, the existing methods for performing the interpretation of the nucleic acid result according to the PCR curve have many disadvantages, including but not limited to:

(1) Manual interpretation of PCR curves is subjective, so different inspectors may give different interpretation results even for the same nucleic acid sample, and the reliability and credibility of the nucleic acid result cannot meet the desired standard.

(2) The standards of different testing institutions are not uniform; for example, some institutions apply very strict standards while others are relatively relaxed, so the classification results given by different institutions deviate from one another. Moreover, each testing institution must spend significant time and money training personnel before they can take up their posts. Whenever the detection standard is updated, inspectors must be retrained, and they may confuse different versions of the standard, which hinders their work.

(3) The nucleic acid detection result is entered manually after the inspector interprets the PCR curve, so there is a risk of data entry errors or even falsification of the test report, which causes unnecessary trouble for the tested person whose result was entered incorrectly and hinders epidemic control.

(4) When the number of nucleic acid samples is huge and inspectors are in short supply, inefficient manual interpretation of PCR curves easily leads to a backlog, so that tested persons cannot obtain their nucleic acid test reports in time, which causes serious inconvenience in travel, medical treatment, shopping and the like.

(5) The existing interpretation method requires the inspector not only to visually observe the trend of a PCR curve but also to determine its Ct value, which imposes a heavy workload. As a result, some nucleic acid samples may be reported as negative without actually being interpreted, whether intentionally or by accident. If a sample from a positive patient is among them and is not identified in time, the epidemic can easily spread and get out of control.

Therefore, various defects of the existing method relying on the PCR curve manual interpretation exist and need to be solved. In view of the above, the present invention provides a technique for performing automatic interpretation of nucleic acid results based on trend characteristics of PCR curves, which does not need to rely on mathematical derivation calculation and curve fitting of PCR curves. Various embodiments for classifying a nucleic acid sample according to embodiments of the present invention will be described below with reference to the accompanying drawings.

Example 1

FIG. 7 shows a flow diagram of a method of classifying a nucleic acid sample according to an embodiment of the invention. As shown in fig. 7, the method comprises the steps of:

Step S101: obtaining PCR curve data of the nucleic acid sample, where the PCR curve data includes a series of signal sampling values corresponding to the PCR cycle numbers. In the embodiment of the invention, the PCR curve data is obtained by measuring the fluorescence signal intensity at each cycle of the PCR reaction. The resulting series of fluorescence signal intensity sampling values reflects the presence and content of viral nucleic acid after each amplification cycle, so that the trend characteristics of the PCR curve can be extracted from it. The characteristics of several common types of PCR curves are shown in FIGS. 1-5 and are not repeated here.

Step S102: inputting the series of signal sampling values into a convolutional neural network as a one-dimensional vector. As described above, because the conventional manual interpretation of PCR curves is inefficient and the conventional R-language method is not suitable, the present invention proposes a convolutional-neural-network-based method for automatically interpreting a PCR curve, in which a multi-layer neural network trained by supervised learning learns the essential form of each type of PCR curve and uses it for classification prediction. Compared with manual interpretation that relies on Ct value calculation and with statistical analysis that relies on curve fitting, this method can learn and discriminate the essential form of the PCR curve more accurately, and is less affected by the subjective judgment and experience of inspectors, by complex curves, or by the difficulty of fitting in the presence of outliers. In the embodiment of the invention, the fluorescence signal intensity sampling values recorded as the PCR cycle number increases can be input into a one-dimensional convolutional neural network as a one-dimensional vector for automatic interpretation.

It can be understood that by adopting the convolutional neural network to extract the features, the Ct value calculation and the fitting process of the PCR curve can be omitted, thereby avoiding the susceptibility of the mathematical calculation process to the influence of data noise and the influence of possible differences of individual curves. In addition, in the embodiment of the invention, the sampling signal value obtained in the PCR reaction is input into the convolutional neural network as a one-dimensional vector, the drawing process of the PCR curve is omitted, and meanwhile, the data object processed by the convolutional neural network is the original data value of the PCR curve instead of the drawn PCR curve image, so that the data operation amount of the convolutional neural network is also reduced.

Step S103: extracting curve trend characteristics of the PCR curve data by using the convolutional neural network, and determining the curve type of the PCR curve data according to the curve trend characteristics. As mentioned above, by training on various PCR curves, the convolutional neural network learns what trend each type of curve should exhibit. For a PCR curve under test, the curve trend characteristics can therefore be extracted by the successive neural network layers and the curve type determined accordingly. In an embodiment of the present invention, the curve types may include a positive trend type, a negative trend type, and a to-be-rechecked trend type, similar to the manual interpretation method described above in connection with FIG. 6.

Step S104: providing a classification result of the nucleic acid sample according to the curve type. In this step, classification is performed automatically based on the curve type output by the convolutional neural network, rather than on the magnitude of the Ct value or on a fitted formula. In the embodiment of the present invention, corresponding to step S103, the classification result of the nucleic acid sample may include a positive result, a negative result, and a to-be-rechecked result. For example, the convolutional neural network may provide a confidence value for each of the positive trend type, the negative trend type, and the to-be-rechecked trend type, so that classification can be performed according to the relative magnitudes of the confidence values. The curve type with the highest confidence may be provided as the classification result, or all candidate results and their confidence values may be provided in descending order of confidence for reference. It is understood that the present invention may provide nucleic acid sample classification results in a variety of ways, and the classification result may be presented to the user in various forms, for example, video, audio, or text. For example, visual interaction can be built on the convolutional neural network model so that, when a user inputs PCR curve data to be tested, the nucleic acid detection result is presented in any of the manners described above.
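As a non-limiting illustration, turning the network outputs into ranked confidence values can be sketched as follows. The passage above does not state how confidence values are computed; this sketch assumes a softmax over the raw scores of the fully-connected layer, and the label strings and function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

CURVE_TYPES = ("positive", "negative", "to-be-rechecked")


def softmax(scores: Sequence[float]) -> List[float]:
    """Map raw scores to confidence values that sum to 1 (assumed normalization)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_curve_types(scores: Sequence[float],
                     labels: Sequence[str] = CURVE_TYPES) -> List[Tuple[str, float]]:
    """Return (curve type, confidence) pairs in descending order of confidence."""
    confidences = softmax(scores)
    return sorted(zip(labels, confidences), key=lambda pair: pair[1], reverse=True)
```

The first element of the returned list gives the single highest-confidence result, while the full list supports the alternative of reporting every candidate with its confidence.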

It should be noted that the positive, negative and to-be-rechecked results described above are the most common classification criteria in current nucleic acid detection practice: a positive result calls for emergency isolation and treatment; a negative result is cleared; and a to-be-rechecked result covers various uncertain and suspicious factors and can be confirmed only by sampling and testing again. Therefore, to satisfy the classification criteria of current practice, the classification results of nucleic acid samples may be set to these three classes. However, the classification provided by the embodiment of the present invention is not limited thereto, and finer-grained classification results may be provided. For example, the to-be-rechecked trend type of the PCR curve may include a weak positive trend type, a weak negative trend type, and an abnormal trend type. Accordingly, the to-be-rechecked result of the nucleic acid sample may include a weak positive result, a weak negative result, and an abnormal result.
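As a non-limiting illustration, the relationship between the five fine-grained types and the three classes used in current practice can be expressed as a lookup table. Summing the fine-grained confidence values to obtain three-class confidences is an assumption of this sketch, as are the label strings and the function name.

```python
from collections import defaultdict
from typing import Dict, Mapping

FINE_TO_COARSE = {
    "positive": "positive",
    "negative": "negative",
    "weak positive": "to-be-rechecked",
    "weak negative": "to-be-rechecked",
    "abnormal": "to-be-rechecked",
}


def to_three_classes(fine_confidences: Mapping[str, float]) -> Dict[str, float]:
    """Collapse five-class confidences into the three classes of current practice."""
    coarse = defaultdict(float)
    for fine_type, confidence in fine_confidences.items():
        coarse[FINE_TO_COARSE[fine_type]] += confidence  # assumed aggregation rule
    return dict(coarse)
```

With such a table, a single five-class model could in principle serve both reporting modes, although separately trained three-class and five-class models are equally possible.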

Specifically, because the standards in the novel coronavirus sample collection and detection technical guidelines are continuously developed and updated, the existing three-class standard may not meet subsequent finer-grained detection requirements. Moreover, in the current detection process, a to-be-rechecked result merely triggers resampling and its curve trend characteristics are not analyzed further, which wastes valuable large-scale nucleic acid sample data, leaves theoretical research on these curves relatively deficient, and hinders progress in nucleic acid detection practice. In view of this, the embodiment of the present invention may provide five classification results (positive, negative, weak positive, weak negative and abnormal), and a further refined curve trend morphology may be learned and predicted by the convolutional neural network. This helps medical workers perform in-depth theoretical analysis of the various PCR curves and provides important guidance for nucleic acid detection practice. Meanwhile, by dividing nucleic acid samples at a finer granularity, different handling priorities can be assigned to different types of detection results. For example, positive results are still handled urgently as the first priority; when resources are available, weak positive results can be handled as the second priority, weak negative results as the third priority, and abnormal results as the fourth priority; negative results can be handled last or not at all; and for the last three cases the corresponding handling scheme can be determined according to the situation. In this way, epidemic response tasks can be allocated and scheduled rationally, so that, especially when epidemic prevention manpower and material resources are short, limited resources are allocated to the most urgent cases.
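The priority ordering described above can be sketched as a simple lookup table. The following Python fragment is a minimal illustration only; the label strings, the numeric priority values and the function name `order_for_handling` are assumptions introduced here and are not part of the embodiment.

```python
# Hypothetical priority table following the ordering described above
# (1 = handled first). Label names are illustrative assumptions.
HANDLING_PRIORITY = {
    "positive": 1,
    "weak_positive": 2,
    "weak_negative": 3,
    "abnormal": 4,
    "negative": 5,
}


def order_for_handling(results):
    """Sort (sample_id, label) pairs so that the most urgent come first."""
    return sorted(results, key=lambda item: HANDLING_PRIORITY[item[1]])


queue = order_for_handling([
    ("S1", "negative"),
    ("S2", "weak_positive"),
    ("S3", "positive"),
    ("S4", "abnormal"),
])
print([sample_id for sample_id, _ in queue])
```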

It should be noted that experimental research shows that the three-class classification meets the classification standard of current practice and provides relatively higher classification accuracy, while the five-class classification provides more detailed results for theoretical analysis and practical guidance. Therefore, embodiments of the present invention may switch between the two classification criteria to meet the desired detection requirements; for example, the three-class result (positive, negative and to-be-rechecked) may be selected when higher detection accuracy is needed.

The above describes a method of extracting curve trend characteristics of PCR curve data using a convolutional neural network and performing nucleic acid sample classification based thereon. The structure of the convolutional neural network will be described below with reference to fig. 8 and 9. It should be noted that the structures of the convolutional neural networks shown in fig. 8 and fig. 9 are only a few schematic examples of the convolutional neural network used in the embodiment of the present invention, and the present invention is not limited thereto.

Fig. 8 is a schematic diagram showing one example of a structure of a convolutional neural network in the method of classifying a nucleic acid sample according to the embodiment of the present invention. As shown in fig. 8, the convolutional neural network of the embodiment of the present invention is a one-dimensional convolutional neural network, and includes at least a convolutional layer, a pooling layer, and a fully-connected layer.

Specifically, PCR curve data may be input as a one-dimensional vector to the convolutional layer, which extracts local curve trend features from each portion of the vector by sliding the convolution kernel over it. Just as human cognition of the outside world proceeds from local to global, the convolutional neural network of the embodiment of the invention first perceives local portions of the PCR curve in the convolutional layer and then, step by step, integrates them at the higher fully connected layers into a comprehensive view of the whole PCR curve, thereby obtaining the overall curve trend characteristic. In embodiments of the invention, the convolutional layer may provide the feature map it generates to the pooling layer as the local curve trend features.

The pooling layer may then down-sample the received local curve trend features to filter fluctuations in the PCR curve data. It will be appreciated that the optical detection system may inevitably introduce some fluctuations into the PCR curve while sampling the fluorescence signal, and noise may also enter the PCR curve for other reasons; such fluctuations do not reflect the trend characteristics of the PCR curve. In embodiments of the present invention, the pooling layer may therefore be used to remove this fluctuation noise without eliminating the fluctuation trends that are closely associated with the PCR curve type, since eliminating those would lose important information. An average pooling layer or a maximum pooling layer may be employed, with the maximum pooling layer being preferred for better convolutional neural network model performance. In addition, arranging the pooling layer reduces the data volume to be processed by the convolutional neural network and can improve its generalization capability.

Finally, the fully connected layer may integrate the down-sampled local curve trend features to provide a classification result of the curve types. In the embodiment of the invention, the fully connected layer plays the role of a classifier in the whole convolutional neural network; that is, after the data has passed through the preceding layers such as convolution and pooling, the result is identified and classified by the fully connected layer. As described above, the fully connected layer recognizes the whole PCR curve comprehensively at a higher level, so as to obtain the overall classification result of the PCR curve. As previously mentioned, the classification results provided by the convolutional neural network may follow the three-class or the five-class standard.

Fig. 9 is a schematic diagram showing another example of the structure of a convolutional neural network in the method of classifying a nucleic acid sample according to the embodiment of the present invention. As shown in fig. 9, the convolutional neural network according to the embodiment of the present invention includes at least a first convolutional layer, a first activation function layer, a second convolutional layer, a second activation function layer, a pooling layer, and a fully connected layer. Although convolutional neural networks have been applied in fields such as visual image processing and natural language processing, these existing neural network structures cannot be applied to PCR curve data for nucleic acid detection and cannot guarantee the high accuracy required for nucleic acid detection. Therefore, the convolutional neural network structure provided by the invention is designed specifically around the characteristics of PCR curve data, including the number and arrangement of the network layers and the parameters of each network layer. Specific parameters of the convolutional neural network will be described below.

For example, theoretical analysis of PCR curves shows that the curve type can be determined from the curve trend over several tens of cycles. The number of convolution layers can therefore be designed in light of the PCR curve trend characteristics. For example, in an embodiment of the present invention, each PCR curve may include only 30 fluorescence signal sampling points, which are input as a one-dimensional vector to the convolutional neural network in order to extract the overall trend of the PCR curve. Because the one-dimensional vector has relatively few data points, the embodiment of the present invention does not need many convolutional layers and can keep the structure of the convolutional neural network as simple as possible; preferably, two convolutional layers may be provided to achieve the desired model detection performance. The other parameters in the convolutional neural network structure can be determined through a model parameter tuning process so as to achieve the best classification prediction performance. For example, for the convolutional layers, parameters such as the number and size of convolution kernels and the step size need to be determined. The model parameter tuning process will be described in detail below.

By combining the characteristics of the PCR curve with model performance analysis, it is found that the number of convolution layers in the convolutional neural network and their convolution parameters have a great influence on model performance. Accordingly, the present invention proposes several combinations of model design parameters and evaluates the performance of each model on PCR curve data. By way of illustrative example, the evaluated configurations are as follows:

(1) One convolution layer with 5 convolution kernels of size 3 and a step size of 3.

(2) One convolution layer with 10 convolution kernels of size 3 and a step size of 3.

(3) Two convolution layers with 10 and 4 convolution kernels respectively, kernel sizes of 3 and 3, and step sizes of 1 and 1.

(4) Two convolution layers with 10 and 10 convolution kernels respectively, kernel sizes of 3 and 3, and step sizes of 1 and 1.

(5) Three convolution layers with 10, 4 and 4 convolution kernels respectively, kernel sizes of 3, 3 and 3, and step sizes of 1, 1 and 1.

Test evaluation shows that satisfactory performance is obtained under the model parameters in item (3), with which the preliminary classification accuracy reaches 94%.
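To compare the five configurations listed above, the size of the feature map each one produces for a 30-point input can be worked out. The sketch below assumes no padding, so that each layer outputs floor((n - k) / s) + 1 points; the padding scheme is not stated in this description, and the names `CONFIGS` and `feature_map_shape` are illustrative.

```python
def conv_output_length(n, kernel_size, stride):
    """Length of a 1-D convolution output, assuming no padding."""
    return (n - kernel_size) // stride + 1


# (kernel counts, kernel sizes, step sizes) for configurations (1) to (5).
CONFIGS = {
    1: ([5], [3], [3]),
    2: ([10], [3], [3]),
    3: ([10, 4], [3, 3], [1, 1]),
    4: ([10, 10], [3, 3], [1, 1]),
    5: ([10, 4, 4], [3, 3, 3], [1, 1, 1]),
}


def feature_map_shape(config, n=30):
    """Return (channels, length) after the last convolution layer."""
    kernels, sizes, strides = config
    for size, stride in zip(sizes, strides):
        n = conv_output_length(n, size, stride)
    return kernels[-1], n


for index, config in CONFIGS.items():
    print(index, feature_map_shape(config))
```

Under this assumption, the step size of 3 in configurations (1) and (2) shrinks the 30-point curve to 10 points in a single layer, while the step size of 1 in configurations (3) to (5) removes only two points per layer.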

In addition, by combining PCR curve characteristics with model performance analysis, it is also found that the type and parameters of the pooling layer in the convolutional neural network have a large influence on model performance. Accordingly, embodiments of the present invention perform experiments under different pooling layer parameters to evaluate the performance of the model. By way of illustrative example, the evaluated settings are as follows:

(1) The pooling layer type is average pooling, the kernel size is set to 3, and the step size is set to 2.

(2) The pooling layer type is average pooling, the kernel size is set to 5, and the step size is set to 2.

(3) The pooling layer type is maximum pooling, the kernel size is set to 3, and the step size is set to 2.

(4) The pooling layer type is maximum pooling, the kernel size is set to 5, and the step size is set to 2.

Correspondingly, considering together the pooling layer's filtering of fluctuation noise in the data, the degree to which the curve fluctuation trends of interest are lost, the generalization capability of the convolutional neural network and other factors, satisfactory performance is obtained under the model parameters in item (3).

Accordingly, a process utilizing a convolutional neural network may include the following steps.

First, the PCR curve data is passed as a one-dimensional vector to the input of the first convolution layer. As described above, as a preferred example, the first convolution layer may have 10 convolution kernels of size 3 with a step size of 1, but the present invention is not limited thereto. The one-dimensional vector of the original PCR curve data may have a relatively high dimension and often carries a large amount of information, so the key trend features need to be extracted by the convolutional layer before the simplified features are passed to the subsequent network for discrimination. A convolution kernel can also be viewed as a filter that is convolved with successive data windows of the original one-dimensional vector to obtain a new feature map. In a convolutional neural network, a convolution kernel performs the convolution calculation on local input data: after the data in one window has been processed, the window slides onward until all data has been covered. The number of convolution kernels determines the number of feature maps, that is, the output depth, and the step size determines how many sliding steps are needed to reach the edge of the data.
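The sliding-window calculation can be illustrated with a single kernel in plain Python. This is a minimal sketch: the toy curve values and the slope-detecting kernel are made up for illustration, no padding is assumed, and, as is usual in convolutional neural networks, the kernel is applied without flipping (strictly a cross-correlation). In a trained network the kernel weights are learned rather than chosen by hand.

```python
def conv1d(signal, kernel, stride=1):
    """Slide `kernel` over `signal` and return one feature map (no padding)."""
    k = len(kernel)
    return [
        sum(signal[start + i] * kernel[i] for i in range(k))
        for start in range(0, len(signal) - k + 1, stride)
    ]


# Toy curve: flat baseline followed by a rise (values are made up).
curve = [0.0, 0.0, 0.0, 1.0, 3.0, 6.0, 8.0, 9.0]
# Hypothetical kernel that responds to a local upward slope.
slope_kernel = [-1.0, 0.0, 1.0]

feature_map = conv1d(curve, slope_kernel)
print(feature_map)  # one value per window; larger where the curve rises faster
```

With 10 kernels, the same procedure would yield 10 such feature maps, one per kernel.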

The output of the first convolution layer is then passed to the input of the first activation function layer. As an illustrative example, the first activation function layer may use the rectified linear unit (ReLU) activation function. It should be noted that, in order to increase the nonlinearity of the convolutional neural network, an activation layer needs to be added to enhance its fitting capability. The activation function introduces nonlinear factors by applying a nonlinear mapping to the output of the convolutional layer, so that the convolutional neural network can be applied to many nonlinear models. For example, the ReLU function is a piecewise linear function that sets all negative values to 0 and leaves positive values unchanged; it has the advantages of simple calculation, fast convergence, and reduced overfitting.
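For reference, ReLU computes max(0, x) element by element. A minimal sketch (the function name `relu` and the sample values are illustrative):

```python
def relu(values):
    """Element-wise ReLU: negative inputs become 0, others pass through."""
    return [v if v > 0 else 0.0 for v in values]


activated = relu([-2.0, -0.5, 0.0, 0.5, 2.0])
print(activated)
```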

Thereafter, the output of the first activation function layer is passed to the input of the second convolution layer. As described above, as a preferred example, the number and size of convolution kernels of the second convolution layer may be set to 4 and 3, and the step size may be set to 1, but the present invention is not limited thereto. For the sliding convolution calculation of the second convolution layer, reference may be made to the description of the first convolution layer, which is not repeated herein.

Then, the output of the second convolution layer is passed to the input of the second activation function layer. As an illustrative example, the second activation function layer may also use the ReLU activation function. For details of the second activation function layer, reference may be made to the description of the first activation function layer, which is not repeated herein.

Thereafter, the output of the second activation function layer is passed to the input of the pooling layer. As described above, as a preferred example, the kernel size of the maximum pooling layer is set to 3, and the step size is set to 2, but the present invention is not limited thereto. In order to effectively reduce the calculation amount, a pooling layer can be adopted to simplify the result of the previous step, and only important information is reserved while unnecessary interference noise is removed, so that the calculation amount of a computer is reduced, and the calculation speed of the model is improved. In the pooling operation, a certain pooling region may be set according to the kernel size of the pooling layer, and then the pooling region may be converted into a corresponding value according to a certain rule, for example, a maximum value, an average value, etc. within the pooling region as a result of pooling. Preferably, in the embodiment of the present invention, the maximum pooling retains the maximum value in each pooled region, which is equivalent to retaining the optimal matching result of this region, so that important information is retained while simplifying and denoising.
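The pooling operation with a kernel size of 3 and a step size of 2 can be sketched as follows. The helper name `pool1d` and the sample feature map are assumptions made for illustration, and no padding is applied.

```python
def pool1d(values, kernel_size=3, stride=2, mode="max"):
    """Down-sample `values` by max or average pooling (no padding)."""
    windows = [
        values[start:start + kernel_size]
        for start in range(0, len(values) - kernel_size + 1, stride)
    ]
    if mode == "max":
        return [max(window) for window in windows]
    return [sum(window) / len(window) for window in windows]


feature_map = [0.0, 1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
max_pooled = pool1d(feature_map, mode="max")
avg_pooled = pool1d(feature_map, mode="avg")
print(max_pooled)
print(avg_pooled)
```

In this toy input the small dip at the fourth value disappears from the max-pooled output while the overall rise is kept, which is the kind of behaviour the paragraph above attributes to maximum pooling.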

Next, the output of the pooling layer is passed to the input of the fully connected layers. As an illustrative example, three fully connected layers can be arranged after the pooling layer for classification, wherein the first two fully connected layers may have 100 and 20 neurons respectively, and the last fully connected layer has 3 or 5 neurons, corresponding to the three-class or five-class standard for PCR curve trend types and nucleic acid sample classification. As mentioned above, the convolutional layers extract the local curve trend features of the PCR curve, and the fully connected layers reassemble these local features into global features through weight matrices for classification. For example, when three types of results (positive, negative and to-be-rechecked) need to be given, 3 neurons can be arranged in the last layer; when five classification results (positive, negative, weak positive, weak negative and abnormal) need to be given, 5 neurons can be arranged in the last layer.
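Putting the preferred parameters together, the data shape at each stage can be traced for a 30-point input. This is a sketch under the assumption that neither the convolution layers nor the pooling layer use padding, which this description does not state; the stage names and the function `trace_shapes` are illustrative.

```python
def out_len(n, kernel_size, stride):
    """Output length of a 1-D convolution or pooling step without padding."""
    return (n - kernel_size) // stride + 1


def trace_shapes(n_points=30, n_classes=3):
    """List (stage, shape) pairs for the preferred two-convolution network."""
    shapes = [("input", (1, n_points))]
    n = out_len(n_points, 3, 1)
    shapes.append(("conv1", (10, n)))      # 10 kernels, size 3, step 1
    n = out_len(n, 3, 1)
    shapes.append(("conv2", (4, n)))       # 4 kernels, size 3, step 1
    n = out_len(n, 3, 2)
    shapes.append(("max_pool", (4, n)))    # kernel 3, step 2
    shapes.append(("flatten", (4 * n,)))
    for name, width in (("fc1", 100), ("fc2", 20), ("fc3", n_classes)):
        shapes.append((name, (width,)))
    return shapes


for stage, shape in trace_shapes():
    print(stage, shape)
```

Under these assumptions the first fully connected layer receives 48 values. Passing `n_classes=5` changes only the width of the last layer.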

Finally, the output of the fully connected layers is taken as the curve type classification result. As previously described, the classification result of the nucleic acid sample may be provided on the basis of the confidence values with which the PCR curve is determined to belong to each curve trend type. For example, the curve type with the highest confidence may be used as the discrimination result, or the discrimination results and their confidence values may be provided in descending order of confidence for reference. In one example, the curve types may include a positive trend type, a negative trend type, and a to-be-rechecked trend type, and the classification result of the corresponding nucleic acid sample may include a positive result, a negative result, and a to-be-rechecked result. For another example, the curve types may include a positive trend type, a negative trend type, a weak positive trend type, a weak negative trend type, and an abnormal trend type, and the classification results of the respective nucleic acid samples may include a positive result, a negative result, a weak positive result, a weak negative result, and an abnormal result.
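One common way to turn the outputs of the last layer into confidence values is a softmax. This description does not say how the confidence values are computed, so the softmax, the label strings and the function names below are assumptions made for illustration only.

```python
import math

LABELS = ["positive", "negative", "to_be_rechecked"]


def softmax(logits):
    """Convert raw scores into values that are positive and sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_predictions(logits, labels=LABELS):
    """Return (label, confidence) pairs from highest to lowest confidence."""
    confidences = softmax(logits)
    return sorted(zip(labels, confidences), key=lambda p: p[1], reverse=True)


ranking = rank_predictions([2.0, 0.1, 1.2])  # made-up network outputs
for label, confidence in ranking:
    print(f"{label}: {confidence:.3f}")
```

The first entry of `ranking` corresponds to reporting only the most confident curve type, and the full list corresponds to reporting every type in descending order of confidence.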

The inventors have noted that, because the value ranges of different PCR curves differ greatly, inputting the PCR curve data directly into the convolutional neural network results in very low model accuracy. In view of this, in the embodiment of the present invention, a batch normalization layer (Batch Normalization) is added at the beginning of the convolutional neural network to normalize the input, so as to improve the accuracy of model identification. Still referring to fig. 9, the convolutional neural network according to an embodiment of the present invention further includes a batch normalization layer. Accordingly, passing the one-dimensional vector to the input of the first convolution layer may include: passing the one-dimensional vector to the input of the batch normalization layer; and passing the output of the batch normalization layer to the input of the first convolution layer.

As an illustrative example, the mean and standard deviation of the input one-dimensional vector may be calculated, and then the mean is subtracted from each element and the difference is divided by the standard deviation, giving the normalized one-dimensional vector that is input to the subsequent convolutional layer. In this way, the convolutional neural network can achieve the expected classification prediction performance even when the value ranges of the PCR curves differ greatly for various reasons.
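The normalization described above can be sketched in a few lines. The use of the population standard deviation and the handling of a perfectly flat curve are assumptions made here, since the description does not specify them; the function name `standardize` and the sample values are illustrative.

```python
import statistics


def standardize(curve):
    """Subtract the mean and divide by the standard deviation."""
    mean = statistics.fmean(curve)
    std = statistics.pstdev(curve)  # population standard deviation (assumed)
    if std == 0:
        # A flat curve carries no trend; return zeros to avoid dividing by 0.
        return [0.0 for _ in curve]
    return [(x - mean) / std for x in curve]


# Two made-up curves with the same shape but very different value ranges.
low = [1.0, 2.0, 4.0, 8.0]
high = [x * 1000 for x in low]

low_scaled = standardize(low)
high_scaled = standardize(high)
print(low_scaled)
print(high_scaled)
```

After normalization the two curves are numerically almost identical, so the network sees the shape of the curve rather than its absolute fluorescence level.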

According to the method for classifying nucleic acid samples of the embodiment, the PCR curve data of nucleic acid samples can be interpreted automatically by the convolutional neural network, which effectively reduces the human resources required to interpret nucleic acid detection results and improves interpretation efficiency. In addition, because the convolutional neural network learns the trends of the various common types of PCR curves through deep learning, the method judges each PCR curve by its essential shape. This avoids misjudgments caused by subjective factors or by inspectors' insufficient knowledge and experience, ensures the accuracy of nucleic acid result interpretation, and removes the need for mathematical derivation or curve fitting. In summary, the method for classifying a nucleic acid sample according to the embodiment of the present invention can greatly reduce the human resources required for nucleic acid result interpretation while improving its accuracy, and can further shorten the nucleic acid detection cycle, thereby providing reliable nucleic acid results in time, especially in scenarios such as large-scale nucleic acid screening.

For example, the technology for automatically interpreting PCR curve data of a nucleic acid sample provided by the present invention has significant beneficial technical effects, including but not limited to:

(1) The automatic interpretation is carried out through the convolutional neural network, so that the detection result is objective and fair, the given detection result can reflect the trend essence of a PCR curve better, and other interference noises are eliminated.

(2) The convolutional neural network model can provide a uniform discrimination standard, which avoids deviations in results caused by inspectors applying the standard with differing strictness and also lowers the requirements on inspectors' theoretical knowledge and practical experience. Meanwhile, when the inspection standard is updated, only the discrimination standard of the convolutional neural network needs to be updated before it is put into use, which avoids an overlong transition period between different versions of the standard.

(3) The PCR curve data of each input nucleic acid sample is interpreted through the convolutional neural network, so that the detection of each nucleic acid sample can be traced, the risk of wrong input of a detection result or falsification of the detection result can be avoided through the tracing process, and the samples can be taken to perform iterative training of the convolutional neural network, so that the model parameters of the convolutional neural network are continuously optimized and perfected.

(4) Automatic interpretation by the convolutional neural network can greatly improve detection efficiency, so that work does not pile up in the interpretation process and detection reports are not delayed, which also facilitates real-time statistical analysis of the epidemic situation.

(5) Automatic interpretation by the convolutional neural network allows the nucleic acid detection process to be managed intelligently, so that no nucleic acid sample is missed and no result is issued without interpretation, which avoids the spread of the epidemic caused by positive patients who were missed in detection or interpretation.

(6) Automatic interpretation by the convolutional neural network facilitates the nationwide adoption of nucleic acid detection standards across the industry, the mutual recognition of nucleic acid results among provinces, and the overall management and analysis of nucleic acid sample data and the epidemic situation at the national level.

Example 2

The above describes a process of nucleic acid classification using a convolutional neural network. The training process of the convolutional neural network according to the embodiment of the present invention will be described with reference to fig. 10 to 14.

FIG. 10 shows a schematic diagram of a training process of a convolutional neural network according to an embodiment of the present invention. In one embodiment of the invention, the convolutional neural network is obtained by training on a training data sample set using a stochastic gradient descent algorithm. As shown in fig. 10, the PCR curve data and its known label (e.g., positive, negative, to-be-rechecked, etc.) may be input to the convolutional neural network, which performs feature extraction and classification prediction on the PCR curve; the loss is then calculated from the known label and the prediction result, and the parameters of the convolutional neural network are updated accordingly. It should be noted that the present invention does not limit the specific manner in which the loss of the convolutional neural network is calculated. For example, in the training process, the neural network parameters can be updated with the goal of minimizing the loss of the convolutional neural network, and the neural network model finally used is determined in this way. Alternatively, the training may be considered complete after a predetermined number of iterations, at which point the neural network model to be finally used is determined.

Specifically, the set of training data samples includes a predetermined number of positive PCR curve data, negative PCR curve data, weak positive PCR curve data, weak negative PCR curve data, and abnormal PCR curve data. For example, original PCR curve sample data can be imported through an Excel file, where: 625 positive, 139 weak positive, 414 negative, 4 weak negative, 183 abnormal, total 1365.

In one example, in order to provide a convolutional neural network capable of outputting three types of detection results, namely a positive result, a negative result and a to-be-rechecked result, the weak positive, weak negative and abnormal cases can first be merged into one large to-be-rechecked class, which together with the positive and negative classes gives three training data sample labels. Then, for each PCR curve, the fluorescence sampling points corresponding to 30 cycles and the curve type are selected as the PCR curve data and sample label, and the original PCR curve sample data is divided into a training data sample set and a test data sample set at a ratio of 7:3. Next, using the data of the training data sample set, the loss function is minimized by a gradient descent method, so that the weight parameters in the convolutional neural network are adjusted backward layer by layer (backpropagation), and the accuracy of the network is improved through repeated iterative training.
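The label merging and the 7:3 split can be sketched as follows, using the class counts given above. The random shuffle, the fixed seed, the rounding of the split point and all names are assumptions, since this description does not state how the split was drawn; the curves themselves are omitted and only the labels are shown.

```python
import random

# Merge the five annotated types into the three training labels.
TO_THREE_CLASSES = {
    "positive": "positive",
    "negative": "negative",
    "weak_positive": "to_be_rechecked",
    "weak_negative": "to_be_rechecked",
    "abnormal": "to_be_rechecked",
}


def split_7_3(samples, seed=0):
    """Shuffle a copy of `samples` and split it at a 7:3 ratio."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 7 // 10
    return shuffled[:cut], shuffled[cut:]


# Class counts of the original PCR curve sample data.
counts = {"positive": 625, "weak_positive": 139, "negative": 414,
          "weak_negative": 4, "abnormal": 183}
labels = [TO_THREE_CLASSES[name] for name, n in counts.items() for _ in range(n)]

train_labels, test_labels = split_7_3(labels)
print(len(train_labels), len(test_labels), labels.count("to_be_rechecked"))
```

Even after merging, the to-be-rechecked class (326 curves) remains smaller than the positive class (625) and the negative class (414).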

However, the inventors have noted that the numbers of samples of different types in the training data sample set may be unbalanced. For example, in the real world, data corresponding to positive and negative results are relatively plentiful, while weak negative, weak positive and abnormal results are uncommon. In this case, if these PCR curve data are used directly as the training data sample set, the trained model is also biased, that is, its classification results lean toward the majority classes, so that misjudgments occur. In view of this, the method according to the embodiment of the present invention further includes a process of updating the training sample set. Fig. 11 is a schematic diagram illustrating an example of updating a training sample set in a method of classifying a nucleic acid sample according to an embodiment of the present invention. As shown in fig. 11, even after the weak positive, weak negative and abnormal cases are merged into one large to-be-rechecked class, the number of samples in that class is still significantly smaller than the number of positive or negative samples. In order to obtain a good model training effect, the small number of to-be-rechecked PCR curve samples can be analyzed, and new to-be-rechecked PCR curve samples constructed by artificial simulation can be added to the data set, so that the classes in the original data are no longer seriously unbalanced. Specifically, the update process of the training sample set may include the following steps.

First, it is determined whether the numbers of PCR curve data of the different classes in the training data sample set are balanced. As shown on the left side of fig. 11, the number of samples in the to-be-rechecked class is significantly smaller.

Second, when the numbers of PCR curve data of the different classes are unbalanced, new minority-class PCR curve data is constructed based on the existing minority-class PCR curve data, and the training data sample set is updated. In the embodiment of the present invention, the SMOTE (Synthetic Minority Over-sampling Technique) method may be adopted for balancing; for example, new minority-class samples are constructed by means of a k-nearest neighbor algorithm (KNN). The specific process of constructing new minority-class samples is not described herein. In addition, new PCR curve data can also be constructed for the remaining, next-smallest classes, so that the numbers of PCR curve data of all types are kept at a similar level. As shown on the right side of fig. 11, after balancing, the numbers of samples of each type may be substantially the same or of the same order of magnitude.
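Since the construction of new minority-class samples is not detailed here, the following is only a sketch of the general SMOTE idea: pick a minority sample, pick one of its k nearest minority neighbours, and interpolate between the two at a random position. The function name `smote_like`, the value of k, the seed and the toy curves are all assumptions.

```python
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create `n_new` synthetic samples by interpolating between neighbours.

    `minority` must contain at least two samples of equal length.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` (squared Euclidean distance).
        neighbours = sorted(
            (s for s in minority if s is not base),
            key=lambda s: sum((a - b) ** 2 for a, b in zip(base, s)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position between base (0) and neighbour (1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Four made-up minority-class curves of five points each.
minority = [
    [0.0, 0.1, 0.3, 0.6, 0.8],
    [0.0, 0.2, 0.4, 0.7, 0.9],
    [0.1, 0.1, 0.2, 0.5, 0.7],
    [0.0, 0.0, 0.3, 0.5, 0.6],
]
new_samples = smote_like(minority, n_new=6)
print(len(new_samples))
```

Each synthetic curve lies on the line segment between two real minority curves, so it stays within the range already covered by that class.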

Finally, the convolutional neural network is trained based on the updated training data sample set. In an embodiment of the present invention, the training of the convolutional neural network employs a stochastic gradient descent algorithm (SGD). Preferably, the learning rate in the training process is set to 0.04 and the momentum to 0.9, and the loss function can be a cross-entropy loss function.
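The training ingredients named above can be illustrated on a toy problem. The sketch shows the cross-entropy loss for a single sample and one common formulation of the momentum update (libraries differ in exactly where the learning rate is applied); the quadratic objective used to exercise the update is a stand-in and has nothing to do with PCR data.

```python
import math


def cross_entropy(probabilities, true_index):
    """Cross-entropy loss of one sample given predicted class probabilities."""
    return -math.log(probabilities[true_index])


def sgd_momentum_step(weights, grads, velocity, lr=0.04, momentum=0.9):
    """One SGD-with-momentum update; returns new weights and velocity."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


loss = cross_entropy([0.7, 0.2, 0.1], true_index=0)
print(round(loss, 4))

# Toy check: minimise f(w) = w^2, whose gradient is 2w.
w, v = [1.0], [0.0]
for _ in range(200):
    w, v = sgd_momentum_step(w, [2 * w[0]], v)
print(abs(w[0]) < 1e-3)
```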

In another example, in order to provide a convolutional neural network capable of outputting five types of detection results, namely positive, negative, weak positive, weak negative and abnormal results, each of these five cases may be treated as its own class, giving five training data sample labels in total. Then, similarly, for each PCR curve, the fluorescence sampling points corresponding to 30 cycles and the curve type are selected as the PCR curve data and sample label, and the original PCR curve sample data is divided into a training data sample set and a test data sample set at a ratio of 7:3. Next, a similar training method can be used, which is not described herein.

Also, in the example of providing the five types of detection results, the imbalance between the numbers of PCR curve data of different types is even more serious; therefore, in this example, the process of updating the training sample set may also be performed. Fig. 12 is a schematic diagram illustrating another example of updating a training sample set in a method of classifying a nucleic acid sample according to an embodiment of the present invention. As shown in fig. 12, in order to obtain a good model training effect, the PCR curve samples of the minority classes may be analyzed, and new to-be-rechecked PCR curve samples constructed through artificial simulation are added to the data set, so that the classes in the original data are no longer unbalanced. For the specific procedure, reference may be made to the processing described for fig. 11, which is not described in detail herein.

Next, a schematic diagram of the accuracy of the convolutional neural network is described with reference to fig. 13 and 14.

FIG. 13 is a diagram illustrating the accuracy of a convolutional neural network model for a set of training data samples in a method of classifying a nucleic acid sample according to an embodiment of the present invention. In the embodiment of the present invention, the training of the convolutional neural network may be ended after about 500 rounds of training, or when the prediction performance or the loss function value of the convolutional neural network reaches a predetermined threshold, and the model parameters at that point are taken as the trained model parameters. As shown in fig. 13, in the embodiment of the present invention, when the training of the convolutional neural network is finished, the accuracy of the model for the training data sample set is about 98%.

FIG. 14 is a schematic diagram of the accuracy of the convolutional neural network model on the test data sample set in a method of classifying a nucleic acid sample according to an embodiment of the present invention. As described above, the original PCR curve sample data are divided into a training data sample set and a test data sample set at a ratio of 7:3, and the model performance is tested and verified on the test data sample set after training is finished. As shown in FIG. 14, the classification accuracy of the convolutional neural network on the test data sample set is about 94%, and only 24 curves are misjudged. Of these, 13 and 6 curves fall into the two directions of confusion between the negative class and the to-be-rechecked class, and 4 and 1 curves fall into the two directions of confusion between the positive class and the to-be-rechecked class; no negative curve is judged positive and no positive curve is judged negative. The test results show that classifying the PCR curves of nucleic acid samples with the convolutional neural network gives good prediction performance: the negative, positive and to-be-rechecked classes are predicted with high accuracy, and the to-be-rechecked class can be further subdivided as required. In the embodiment of the present invention, the PCR curve data of the nucleic acid sample are interpreted automatically by the convolutional neural network. The implementation is simple and the classification accuracy is high, so the method can meet the application requirements of nucleic acid detection tasks, in particular large-scale nucleic acid screening, and has wide application and popularization value.
Therefore, in a routine nucleic acid testing scenario, the method provided by the present invention can interpret the PCR curves of nucleic acid samples efficiently, accurately and automatically. It fills a gap in the field of nucleic acid detection, has great strategic significance and practical value for controlling the COVID-19 epidemic, and achieves a remarkable technical effect.
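As an illustration of the evaluation procedure described above, the following minimal sketch divides labelled curves at a ratio of 7:3 and tallies the accuracy and the kinds of misjudgment on the test portion. The function names `split_7_3` and `evaluate`, the fixed shuffling seed, and the label strings are assumptions made for this example; the description does not state how the division is performed.

```python
import random
from collections import Counter


def split_7_3(samples, seed=0):
    """Shuffle the samples and divide them 7:3 into training and test sets.

    Shuffling with a fixed seed is an assumption of this sketch.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 7 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def evaluate(true_labels, predicted_labels):
    """Return the accuracy and a count of each (true, predicted) misjudgment."""
    pairs = list(zip(true_labels, predicted_labels))
    correct = sum(1 for truth, guess in pairs if truth == guess)
    confusions = Counter((truth, guess) for truth, guess in pairs
                         if truth != guess)
    return correct / len(pairs), confusions


def has_negative_positive_confusion(confusions):
    """Check for the most serious errors: negative judged positive or vice versa."""
    return (confusions[("negative", "positive")] > 0
            or confusions[("positive", "negative")] > 0)
```

A tally of this kind separates misjudgments that only send a sample to rechecking from direct confusion between the negative and positive classes, which is the distinction drawn in the discussion of FIG. 14.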

Example 3

According to another aspect of the present invention, there is provided an apparatus for classifying a nucleic acid sample. FIG. 15 shows a block diagram of an apparatus for classifying nucleic acid samples according to an embodiment of the present invention. As shown in FIG. 15, the apparatus 1000 includes a processor U1001 and a memory U1002.

The processor U1001 may be any device with processing capability that can carry out the functions of embodiments of the present invention, and may be, for example, a general purpose processor, a Digital Signal Processor (DSP), an ASIC, a Field Programmable Gate Array (FPGA) or other Programmable Logic Device (PLD), discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein.

The memory U1002 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) and/or cache memory, as well as other removable/non-removable, volatile/nonvolatile computer system memory, such as a hard disk drive, floppy disk, CD-ROM, DVD-ROM, or other optical storage media.

In this embodiment, computer program instructions are stored in the memory U1002, and the processor U1001 may execute the instructions stored in the memory U1002. When the computer program instructions are executed by the processor, the processor is caused to perform the method for classifying a nucleic acid sample of an embodiment of the present invention. The method for classifying the nucleic acid sample is substantially the same as that described above with respect to FIGS. 7-9 and, to avoid redundancy, is not described again here. Examples of the apparatus include a computer, a server, a workstation, and the like.

Based on the apparatus for classifying nucleic acid samples, an automatic COVID-19 virus nucleic acid interpretation system can be built. Specifically, the automatic interpretation system is based mainly on the convolutional neural network model and can be equipped with corresponding interactive hardware to provide visual interaction. When a user inputs PCR curve data to be detected into the system, the convolutional neural network performs curve trend feature extraction and classification prediction on the PCR curve, and the system automatically outputs the interpretation result of the PCR curve (such as positive, negative, weak positive, weak negative, or abnormal). The details are as described above and are not repeated here.
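The data flow of such an interpretation system can be sketched in plain Python, using the layer sizes recited in claim 5 (10 and 4 convolution kernels of size 3 with a stride of 1, max pooling with a kernel size of 3 and a stride of 2, and fully-connected layers of 100, 20 and 3 or 5 neurons). This is only an illustrative sketch under stated assumptions: the input length of 40 cycles, the absence of padding, the ReLU between fully-connected layers, the zero biases, the label order, and all function names are hypothetical; the batch normalization layer is omitted; and the weights are random rather than trained, so the returned label carries no diagnostic meaning.

```python
import random


def conv1d(channels, kernels, stride=1):
    """No-padding 1D convolution; kernels[o][i] acts on input channel i."""
    length, size = len(channels[0]), len(kernels[0][0])
    out = []
    for kernel_set in kernels:
        row = []
        for start in range(0, length - size + 1, stride):
            row.append(sum(w * x
                           for channel, kernel in zip(channels, kernel_set)
                           for w, x in zip(kernel, channel[start:start + size])))
        out.append(row)
    return out


def relu(channels):
    return [[max(0.0, v) for v in row] for row in channels]


def max_pool1d(channels, size=3, stride=2):
    return [[max(row[s:s + size])
             for s in range(0, len(row) - size + 1, stride)]
            for row in channels]


def dense(vector, weights):
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def build_model(input_length=40, n_classes=3, seed=0):
    """Create random (untrained) weights; input_length=40 is an assumption."""
    rng = random.Random(seed)

    def rand(n):
        return [rng.uniform(-0.1, 0.1) for _ in range(n)]

    conv1 = [[rand(3)] for _ in range(10)]                    # 1 -> 10 channels
    conv2 = [[rand(3) for _ in range(10)] for _ in range(4)]  # 10 -> 4 channels
    pooled = ((input_length - 2 - 2) - 3) // 2 + 1            # length after pooling
    sizes = [4 * pooled, 100, 20, n_classes]
    fcs = [[rand(sizes[i]) for _ in range(sizes[i + 1])] for i in range(3)]
    return conv1, conv2, fcs


def classify(curve, model,
             labels=("negative", "positive", "to be rechecked")):
    """Run one PCR curve (a one-dimensional vector) through the network."""
    conv1, conv2, fcs = model
    x = relu(conv1d([list(curve)], conv1))
    x = relu(conv1d(x, conv2))
    x = max_pool1d(x)
    v = [value for row in x for value in row]  # flatten the channels
    for index, weights in enumerate(fcs):
        v = dense(v, weights)
        if index < len(fcs) - 1:               # assumed ReLU between FC layers
            v = [max(0.0, a) for a in v]
    return labels[v.index(max(v))], v
```

With a 40-point input and no padding, the feature length goes from 40 to 38 after the first convolution, 36 after the second, and 17 after pooling, so the first fully-connected layer receives 4 × 17 = 68 values. A five-class variant would call `build_model(n_classes=5)` and pass five labels to `classify`.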

Example 4

The technique of classifying a nucleic acid sample according to the present invention may also be realized by providing a computer program product comprising program code implementing the method or apparatus, or by any storage medium having such a computer program product stored thereon.

The basic principles of the present invention have been described above with reference to specific embodiments. It should be noted, however, that the advantages, effects, and the like mentioned in the present invention are only examples and are not limiting, and they should not be regarded as necessarily possessed by every embodiment of the present invention. Furthermore, the specific details disclosed above are for the purpose of illustration and description only and are not intended to be limiting, since the invention is not limited to those specific details. In addition, features from one embodiment may be combined with features of another embodiment or embodiments to yield yet further embodiments.

The block diagrams of devices, apparatuses, equipment and systems involved in the present invention are illustrative examples only and are not intended to require or imply that they must be connected, arranged, or configured in the manner shown in the block diagrams. As will be appreciated by those skilled in the art, these devices, apparatuses, equipment and systems may be connected, arranged, or configured in any manner. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The words "or" and "and" as used herein mean, and are used interchangeably with, the term "and/or," unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".

Also, as used herein, "or" as used in a list of items beginning with "at least one" indicates a disjunctive list, such that, for example, a list of "at least one of A, B, or C" means A or B or C, or AB or AC or BC, or ABC (i.e., A and B and C). Furthermore, the word "exemplary" does not mean that the described example is preferred or better than other examples.

It should also be noted that in the apparatus and method of the present invention, the components or steps may be decomposed and/or recombined. Such decompositions and/or recombinations are to be considered equivalents of the present invention.

It will be understood by those of ordinary skill in the art that all or any portion of the methods and apparatus of the present invention may be implemented in hardware, firmware, software, or any combination thereof, in any computing device (including processors, storage media, etc.) or network of computing devices. The hardware may be implemented with a general purpose processor, a Digital Signal Processor (DSP), an ASIC, a Field Programmable Gate Array (FPGA) or other Programmable Logic Device (PLD), discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any commercially available processor, controller, microcontroller or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. The software may reside in any form of computer-readable tangible storage medium. By way of example, and not limitation, such computer-readable tangible storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other tangible medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Disk, as used herein, includes Compact Disk (CD), laser disk, optical disk, Digital Versatile Disk (DVD), floppy disk, and Blu-ray disk.

Various changes, substitutions, and alterations to the techniques described herein may be made without departing from the teachings as defined by the appended claims. Moreover, the scope of the present claims is not intended to be limited to the particular aspects of the process, machine, manufacture, composition of matter, means, methods and acts described above. Processes, machines, manufacture, compositions of matter, means, methods, or acts, presently existing or later to be developed, that perform substantially the same function or achieve substantially the same result as the corresponding aspects described herein may be utilized. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or acts.

The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the invention. Thus, the present invention is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The foregoing description has been presented for purposes of illustration and description. This description is not intended to limit embodiments of the invention to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims

1. A method for classifying a nucleic acid sample, comprising:

obtaining Polymerase Chain Reaction (PCR) curve data for the nucleic acid sample, the PCR curve data comprising a series of signal sample values corresponding to a number of PCR cycles;

inputting the series of signal sample values as a one-dimensional vector to a convolutional neural network;

extracting curve trend characteristics of the PCR curve data by using the convolutional neural network, and determining the curve type of the PCR curve data according to the curve trend characteristics; and

providing a classification result of the nucleic acid sample according to the curve type.

2. The method of claim 1, wherein the convolutional neural network comprises a convolutional layer, a pooling layer, and a fully-connected layer, and wherein:

the convolutional layer extracts local curve trend characteristics of each part of the PCR curve data through the moving scan of a convolution kernel;

the pooling layer is used for downsampling the local curve trend characteristics to filter fluctuations in the PCR curve data; and

the fully-connected layer is used for integrating the downsampled local curve trend characteristics so as to provide a classification result of the curve type.

3. The method of claim 1, wherein the convolutional neural network comprises a first convolutional layer, a first activation function layer, a second convolutional layer, a second activation function layer, a pooling layer, and a fully-connected layer, and the utilizing the convolutional neural network comprises:

passing the one-dimensional vector to an input of the first convolution layer;

passing an output of the first convolution layer to an input of the first activation function layer;

passing an output of the first activation function layer to an input of the second convolution layer;

passing an output of the second convolutional layer to an input of the second activation function layer;

passing an output of the second activation function layer to an input of the pooling layer;

passing an output of the pooling layer to an input of the fully-connected layer; and

taking the output of the fully-connected layer as a classification result of the curve type.

4. The method of claim 3, wherein the convolutional neural network further comprises a batch normalization layer, and

wherein passing the one-dimensional vector to the input of the first convolution layer comprises:

passing the one-dimensional vector to an input of the batch normalization layer; and

passing an output of the batch normalization layer to an input of the first convolution layer.

5. The method of claim 4, wherein,

the first convolution layer has 10 convolution kernels of size 3 and a stride of 1;

the first activation function layer uses a rectified linear unit (ReLU) activation function;

the second convolution layer has 4 convolution kernels of size 3 and a stride of 1;

the second activation function layer uses a rectified linear unit (ReLU) activation function;

the pooling layer is a max pooling layer with a kernel size of 3 and a stride of 2; and

the fully-connected layer comprises three fully-connected layers, the first two of which have 100 and 20 neurons respectively, and the last of which has 3 or 5 neurons.

6. The method of any one of claims 3-5, wherein:

the curve types comprise a positive trend type, a negative trend type and a to-be-rechecked trend type, and

the classification result of the nucleic acid sample comprises a positive result, a negative result and a to-be-rechecked result.

7. The method of claim 6, wherein:

the to-be-rechecked trend type further comprises a weak positive trend type, a weak negative trend type and an abnormal trend type, and

the to-be-rechecked result further comprises a weak positive result, a weak negative result and an abnormal result.

8. The method of claim 7, wherein the convolutional neural network is trained using a stochastic gradient descent algorithm based on a training data sample set comprising a predetermined number of positive PCR curve data, negative PCR curve data, weak positive PCR curve data, weak negative PCR curve data, and abnormal PCR curve data.

9. The method of claim 8, wherein the method further comprises:

determining whether the number of different classes of PCR curve data in the training data sample set is balanced;

when the number is unbalanced, constructing new minority class PCR curve data based on the existing minority class PCR curve data and updating the training data sample set; and

training the convolutional neural network based on the updated training data sample set.

10. An apparatus for classifying a nucleic acid sample, comprising:

a memory having computer instructions stored thereon; and

a processor,

wherein the instructions, when executed by the processor, cause the processor to perform the method of any of claims 1-9.

11. A non-transitory computer-readable storage medium storing instructions that cause a processor to perform the method of any one of claims 1-9.