WO2023123291A1 - Time sequence signal identification method and apparatus, and computer readable storage medium

Info

Publication number: WO2023123291A1
Application number: PCT/CN2021/143406
Authority: WO
Inventors: 颜旭; 黎宇翔; 章文蔚; 徐讯; 曾涛
Original assignee: 深圳华大生命科学研究院
Priority date: 2021-12-30
Filing date: 2021-12-30
Publication date: 2023-07-06

Abstract

The present application provides a time sequence signal identification method and apparatus, and a computer readable storage medium. The method comprises: obtaining a time sequence signal to be identified; converting said time sequence signal into a two-dimensional image; and determining an identification result according to the two-dimensional image, the identification result comprising at least one of the following: whether said time sequence signal comprises a target signal, the type of the target signal, and the position of the target signal in said time sequence signal. Because image identification technology is adopted, different types of target signals can be identified, and identification is not limited to target signals whose features are obvious and easy to identify. Compared with manual identification, the identification efficiency is relatively high.

Description

Time series signal identification method, device and computer-readable storage medium

Technical field

The present application relates to the field of sequencing, in particular, to a time series signal identification method, device, computer-readable storage medium, processor and system.

Background

Existing solutions for identifying time-series signals, that is, sequence signals such as nanopore electrical signals, generally follow traditional time-series data analysis: they use statistical analysis, manually extracted sequence features and the like to perform similarity calculation and threshold filtering, and thereby detect the target signal in the time-series signal.

The existing technology has significant limitations: it can only identify target signals whose features are obvious and easy to identify, its robustness is poor, and a separate method must be designed for each type of target signal, which is inefficient.

Summary of the invention

The main purpose of this application is to provide a time-series signal identification method, device, computer-readable storage medium, processor and system, to solve the problem that time-series signal identification methods in the prior art can only identify target signals with obvious characteristics.

To achieve the above object, according to one aspect of the present application, a time-series signal identification method is provided, including: acquiring a time-series signal to be identified; converting the time-series signal to be identified into a two-dimensional image; and determining a recognition result according to the two-dimensional image, the recognition result including at least one of the following: whether the time-series signal to be identified includes a target signal, the type of the target signal, and the position of the target signal in the time-series signal to be identified.

Optionally, determining the recognition result according to the two-dimensional image includes: constructing an artificial intelligence model, the artificial intelligence model being obtained through training with multiple sets of training data, each set of which includes a historical two-dimensional image corresponding to a historical time-series signal and a historical recognition result corresponding to that historical two-dimensional image; and inputting the two-dimensional image into the artificial intelligence model for calculation to obtain the recognition result.

Optionally, the artificial intelligence model includes a DBL module and/or a residual module, the DBL module includes a convolution layer, a batch normalization layer, and an activation layer, and the residual module includes the DBL module.

Optionally, determining the position of the target signal in the time-series signal to be identified includes: determining the position, in the two-dimensional image, of a sub-image corresponding to the target signal; and determining the position of the target signal in the time-series signal to be identified according to the position of the sub-image in the two-dimensional image.

Optionally, determining the position of the target signal in the time-series signal to be identified according to the position of the sub-image in the two-dimensional image includes: acquiring the width of the two-dimensional image; acquiring the pixel coordinates, in the two-dimensional image, of the sub-image corresponding to the target signal; acquiring the total length of the time-series signal; and determining the position of the target signal in the time-series signal to be identified according to the width of the two-dimensional image, the pixel coordinates and the total length of the time-series signal.

Optionally, before converting the time-series signal to be identified into a two-dimensional image, the method further includes: performing filtering processing on the time-series signal to be identified; and performing scaling processing on the time axis of the time-series signal to be identified.

Optionally, the time series signal is a sequencing time series.

According to one aspect of the present application, a time-series signal identification device is provided, including: an acquisition unit, configured to acquire a time-series signal to be identified; a conversion unit, configured to convert the time-series signal to be identified into a two-dimensional image; and a first determining unit, configured to determine a recognition result based on the two-dimensional image, the recognition result including at least one of the following: whether the time-series signal to be identified includes a target signal, the type of the target signal, and the position of the target signal in the time-series signal to be identified.

According to one aspect of the present application, a computer-readable storage medium is provided. The computer-readable storage medium includes a stored program, wherein, when the program runs, the device in which the computer-readable storage medium is located is controlled to execute any one of the methods described above.

According to one aspect of the present application, a system is provided, including a single-channel nanopore sequencing device, one or more processors, memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs include instructions for performing any one of the methods described above.

With the technical solution of the present application, the time-series signal to be identified is acquired and converted into a two-dimensional image, and the two-dimensional image is then used to identify whether the time-series signal includes the target signal, the type of the target signal, and the position of the target signal in the time-series signal to be identified. Because image recognition technology is used, different types of target signals can be identified, and identification is no longer limited to target signals with obvious, easily identified features. Compared with manual identification, the identification efficiency is higher.

Description of drawings

The accompanying drawings, which constitute a part of the present application, are provided for further understanding of the present application. The exemplary embodiments of the present application and their descriptions are used to explain the present application and do not constitute an improper limitation of it. In the drawings:

Fig. 1 shows a flow chart of a time series signal identification method according to an embodiment of the present application;

Fig. 2 shows a schematic diagram of a time series signal according to an embodiment of the present application;

Fig. 3 shows a preprocessed time series signal according to an embodiment of the present application;

Fig. 4 shows a schematic diagram of a two-dimensional image according to an embodiment of the present application;

Fig. 5 shows a schematic diagram of a target signal according to an embodiment of the present application;

Fig. 6 shows a target detection framework YOLOv3 according to an embodiment of the present application;

Fig. 7 shows a schematic diagram of a DBL module according to an embodiment of the present application;

Fig. 8 shows a schematic diagram of a residual module according to an embodiment of the present application;

Fig. 9 shows a schematic diagram of a time-series signal identification device according to an embodiment of the present application.

Detailed description

It should be noted that, in the case of no conflict, the embodiments in the present application and the features in the embodiments can be combined with each other. The present application will be described in detail below with reference to the accompanying drawings and embodiments.

To enable those skilled in the art to better understand the solution of the present application, the technical solution in the embodiments of the application is described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the application, not all of them. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments in this application, without creative effort, shall fall within the scope of protection of this application.

It should be noted that the terms "first" and "second" in the description and claims of the present application and in the above drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that data so used may be interchanged under appropriate circumstances for the embodiments of the application described herein. Furthermore, the terms "comprising" and "having", as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, product or device comprising a series of steps or elements is not necessarily limited to the steps or elements expressly listed, but may include other steps or elements that are not expressly listed or that are inherent to the process, method, product or device.

It will be understood that when an element such as a layer, film, region, or substrate is referred to as being "on" another element, it can be directly on the other element or intervening elements may also be present. Also, in the specification and claims, when an element is described as "connected" to another element, the element may be "directly connected" to the other element, or "connected" to the other element through a third element.

As introduced in the background, time-series signal identification methods in the prior art can only identify target signals with obvious characteristics. To solve this problem, the embodiments of the present application provide a time-series signal identification method, device, computer-readable storage medium, processor and system.

According to an embodiment of the present application, a time series signal identification method is provided.

Fig. 1 is a flowchart of a time series signal identification method according to an embodiment of the present application. As shown in Figure 1, the method includes the following steps:

Step S101, acquiring a time-series signal to be identified;

Step S102, converting the time-series signal to be identified into a two-dimensional image;

Step S103, determining a recognition result according to the two-dimensional image, the recognition result including at least one of the following: whether the time-series signal to be identified includes a target signal, the type of the target signal, and the position of the target signal in the time-series signal to be identified.

Specifically, the time-series signal may be a time-series electrical signal from nanopore sequencing. When a nucleic acid sequence passes through a nanopore in a nanopore sequencer, the sequencer generates the corresponding time-series electrical signal in sequence. The nucleic acid sequence may include one or more nucleic acid subsequences, each of which may include one or more nucleotides, and each nucleotide includes a nitrogenous base. Understandably, the position of the target signal in the time-series signal to be identified is the position of the nucleic acid subsequence corresponding to the target signal within the nucleic acid sequence corresponding to the time-series electrical signal.

Optionally, the above-mentioned time-series signal is one-dimensional time-series data, and the above-mentioned target signal is one-dimensional time-series data.

Specifically, the time series signal to be identified is shown in FIG. 2 , which includes 5 repeated target signal segments. It should be noted that the time series signal in FIG. 2 can be colored.

Specifically, after acquiring the time series signal to be identified, it is saved to a hard disk file.

In the above scheme, the time-series signal to be identified is acquired and converted into a two-dimensional image, and the two-dimensional image is then used to identify whether the time-series signal includes the target signal, the type of the target signal, and the position of the target signal in the time-series signal to be identified. Because image recognition technology is used, different types of target signals can be identified, and identification is no longer limited to target signals with obvious, easily identified features. Compared with manual identification, the identification efficiency is higher.

It should be noted that the steps shown in the flowcharts of the accompanying drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.

In one embodiment of the present application, determining the recognition result based on the two-dimensional image includes: constructing an artificial intelligence model, the artificial intelligence model being obtained through training with multiple sets of training data, each set of which includes a historical two-dimensional image corresponding to a historical time-series signal and a historical recognition result corresponding to that historical two-dimensional image; and inputting the two-dimensional image into the artificial intelligence model for calculation to obtain the recognition result. That is, by building an artificial intelligence model, the recognition result can be determined more accurately from the two-dimensional image: whether the time-series signal to be identified includes the target signal, the type of the target signal, and the position of the target signal in the time-series signal to be identified.

In a specific embodiment, the artificial intelligence model includes a DBL module and/or a residual module. The DBL module includes a convolution layer, a batch normalization layer and an activation layer, and the residual module includes the DBL module. More specifically, the residual module is obtained by passing the input through two DBL modules in series and adding the input to the result.
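The text does not specify at this point how the one-dimensional signal is drawn into the image. As a minimal sketch only, the following assumes a simple rasterization in which the sample index is mapped to an image column and the amplitude to an image row. The function name `signal_to_image`, the binary single-channel output and the default 400x100 size are illustrative assumptions, not the applicant's implementation.

```python
def signal_to_image(signal, width=400, height=100):
    """Rasterize a 1D signal into a height x width binary image (list of rows).

    Assumption: index -> column, amplitude -> row; a real pipeline might
    instead render a coloured line plot with a plotting library.
    """
    if not signal:
        raise ValueError("signal must not be empty")
    lo, hi = min(signal), max(signal)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat signal
    image = [[0] * width for _ in range(height)]
    n = len(signal)
    for i, value in enumerate(signal):
        col = min(i * width // n, width - 1)           # time axis -> x
        row = int((value - lo) / span * (height - 1))  # amplitude -> y
        image[height - 1 - row][col] = 1               # row 0 is the image top
    return image


if __name__ == "__main__":
    img = signal_to_image([0, 1, 2, 3], width=4, height=4)
    for r in img:
        print("".join("#" if p else "." for p in r))
```

Because the column index is proportional to the sample index, a horizontal pixel coordinate in the image can later be mapped back to a position in the original signal.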
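To illustrate the structure just described, here is a dependency-free sketch of a DBL unit (convolution, batch normalization, activation) and a residual unit that adds the input to the output of two DBL units. It is deliberately simplified: it works on a batch of single-channel 1D sequences, uses fixed odd-length kernels instead of learned weights, applies batch normalization without learned scale and shift, and assumes a Leaky ReLU slope of 0.1. All function names are hypothetical; a practical model would be built with a framework such as PyTorch or TensorFlow.

```python
import math


def conv1d(row, kernel):
    """Zero-padded 'same' convolution of one sequence (odd-length kernel)."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(row) + [0.0] * half
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(row))]


def batch_norm(batch, eps=1e-5):
    """Normalize one channel using statistics over the batch and all positions."""
    values = [v for row in batch for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    scale = 1.0 / math.sqrt(var + eps)
    return [[(v - mean) * scale for v in row] for row in batch]


def leaky_relu(batch, slope=0.1):
    return [[v if v > 0 else slope * v for v in row] for row in batch]


def dbl(batch, kernel):
    """DBL unit: convolution -> batch normalization -> Leaky ReLU."""
    return leaky_relu(batch_norm([conv1d(row, kernel) for row in batch]))


def residual_block(batch, kernel_a, kernel_b):
    """Residual unit: input + DBL(DBL(input))."""
    out = dbl(dbl(batch, kernel_a), kernel_b)
    return [[x + y for x, y in zip(r_in, r_out)]
            for r_in, r_out in zip(batch, out)]
```

The skip connection keeps the output the same shape as the input, which is why such units can be stacked N times inside a larger block.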

In an embodiment of the present application, determining the position of the target signal in the time-series signal to be identified includes: determining the position, in the two-dimensional image, of the sub-image corresponding to the target signal; and determining the position of the target signal in the time-series signal to be identified according to the position of the sub-image in the two-dimensional image. That is, the position of the target signal in the time-series signal to be identified can be determined from the position of the corresponding sub-image in the two-dimensional image. The position of the bases corresponding to the target signal in the original sequencing sequence can then be determined.

In an embodiment of the present application, determining the position of the target signal in the time-series signal to be identified according to the position of the sub-image in the two-dimensional image includes: acquiring the width of the two-dimensional image; acquiring the pixel coordinates, in the two-dimensional image, of the sub-image corresponding to the target signal; acquiring the total length of the time-series signal; and determining the position of the target signal in the time-series signal to be identified according to the width of the two-dimensional image, the pixel coordinates and the total length of the time-series signal.

Specifically, the position of the target signal in the two-dimensional image can be identified from the two-dimensional image, and the position of the target signal in the time-series signal can then be determined from the correspondence between the two-dimensional image and the time-series signal to be identified. For example, suppose the size of the two-dimensional image is 400x100, that is, the width of the two-dimensional image is 400, and after target detection the abscissa of the sub-image corresponding to the target signal is 150 pixels. If the sequence length of the time-series signal before conversion into a picture is 10000, the position of the target signal in the time-series signal is 10000*150/400 = 3750.

To make the artificial intelligence model's determinations accurate, in one embodiment of the present application the artificial intelligence model is a deep learning model, and constructing the artificial intelligence model includes: obtaining relevant parameters for model training, the relevant parameters including the optimizer, the learning rate and the number of training iterations; and training the artificial intelligence model with multiple sets of the training data, using the relevant parameters as the training settings. That is, by setting relevant parameters such as the optimizer, learning rate and number of training iterations, the accuracy and generalization of the trained artificial intelligence model are improved.
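The proportional mapping in this example can be written as a short helper. The function names and the use of integer (floor) division are assumptions made for illustration; the text only gives the ratio 10000*150/400.

```python
def pixel_to_signal_index(x_pixel, image_width, signal_length):
    """Map a horizontal pixel coordinate back to a sample index in the signal."""
    if image_width <= 0:
        raise ValueError("image_width must be positive")
    if not 0 <= x_pixel <= image_width:
        raise ValueError("x_pixel must lie within the image width")
    # position = signal_length * x_pixel / image_width, rounded down
    return signal_length * x_pixel // image_width


def box_to_signal_range(x_min, x_max, image_width, signal_length):
    """Map the left and right edges of a detected box to a (start, end) range."""
    return (pixel_to_signal_index(x_min, image_width, signal_length),
            pixel_to_signal_index(x_max, image_width, signal_length))


if __name__ == "__main__":
    print(pixel_to_signal_index(150, 400, 10000))  # 3750
```

Applying the same mapping to both horizontal edges of a detected box gives the start and end of the target signal in the original sequence.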

In a specific embodiment of the present application, the deep learning network is divided into three main modules: a preprocessing layer, a feature mapping and fusion layer, and a prediction output layer, as shown in Table 1.

In the preprocessing layer, three-channel image data is used as the input of the model. After data slicing and merging, the data is sent to a 3*3 convolutional layer, then passes through multiple serial convolution and bottleneck-layer module units, which feed the data into the pooling layer, and is finally output to the feature mapping and fusion layer. Each bottleneck-layer module unit is formed by splicing three convolutional layers and N residual network units, and each convolutional layer is followed by data normalization and a Leaky ReLU activation function. The output of the pooling layer, together with the outputs of the bottleneck-layer module units, gives feature data of three different sizes, which serve as the input of the feature fusion layer for the next processing step.

In the feature mapping and fusion layer, the feature data of three different scales output by the preprocessing layer are spliced with one another after a series of pooling, convolution and upsampling operations, and after further convolution, feature maps of three different scales are output to the prediction output layer. The convolution and data splicing in this layer serve two purposes: they let the model capture finer features when training on targets of different sizes, which supports classification and prediction for different targets, and they preserve the spatial information of the features, which helps locate the target accurately.

In the output layer, the three features output by the feature mapping and fusion layer are each processed by convolution, data normalization, an activation function and a further convolution, and the results are used as output feature vectors for classification prediction and coordinate calculation. Three types of loss are designed as the loss function of the model, covering whether a target is present, the class of the target, and the coordinates of the target. Whether a target is present and the class of the target are calculated with cross-entropy loss; the loss for the target's coordinates is calculated with GIoU, which measures the distance loss between the predicted bounding box and the ground-truth box.
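The text names GIoU for the coordinate loss but does not give a formula. The sketch below uses the standard definition of Generalized IoU for axis-aligned boxes, GIoU = IoU - (C - U)/C, where U is the union area and C is the area of the smallest box enclosing both, and takes the loss as 1 - GIoU. The (x1, y1, x2, y2) box layout, the function names and the 1 - GIoU form are assumptions, not details confirmed by the application.

```python
def giou(box_a, box_b):
    """Generalized IoU of two boxes given as (x1, y1, x2, y2), x1 < x2, y1 < y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    # Smallest axis-aligned box enclosing both inputs.
    enclosing = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    if union <= 0 or enclosing <= 0:
        raise ValueError("boxes must have positive area")
    return inter / union - (enclosing - union) / enclosing


def giou_loss(pred_box, true_box):
    """Loss in [0, 2]: 0 for a perfect match, larger as boxes move apart."""
    return 1.0 - giou(pred_box, true_box)
```

Unlike plain IoU, this value keeps changing when the boxes do not overlap at all, so a predicted box that misses the target still receives a useful training signal.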

Table 1 Deep learning network model table

Specifically, a deep learning model based on a convolutional neural network is built with PyTorch or TensorFlow, the generated training set is used to train the model, and the model file is saved.

In yet another embodiment of the present application, the historical recognition results are obtained by marking the historical two-dimensional images with a picture annotation tool, and building the artificial intelligence model includes: dividing multiple groups of the historical two-dimensional images and the historical recognition results corresponding to them into a training set and a test set; training the artificial intelligence model with the training set; and testing the artificial intelligence model with the test set. That is, a rich training set is used to train the model and obtain the artificial intelligence model, and the test set is then used to test its accuracy. If the accuracy does not meet the requirements, the training set is adjusted and training is performed again, until the accuracy of the trained artificial intelligence model is high.

Specifically, an image annotation tool is used to annotate the historical two-dimensional images: the target sequence signal can be selected with a rectangle, and a file in a format that the model can read, such as xml, is then generated to obtain the training data required by the model.
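The application does not specify the xml schema. As an illustration only, the following sketch writes one rectangle annotation in a layout modelled on the widely used Pascal VOC style (`annotation`, `size`, `object`, `bndbox`). The schema choice, the function name, the file name `signal_0001.png` and the label `target` are all assumptions.

```python
import xml.etree.ElementTree as ET


def build_annotation(filename, width, height, boxes):
    """Return an XML string describing labelled rectangles in one image.

    boxes: iterable of (label, (xmin, ymin, xmax, ymax)) in pixel coordinates.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    for label, coords in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), coords):
            ET.SubElement(bndbox, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical example: one target signal boxed in a 400x100 image.
    print(build_annotation("signal_0001.png", 400, 100,
                           [("target", (150, 10, 200, 90))]))
```

Each image then has one such file, pairing the historical two-dimensional image with its historical recognition result.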

In another embodiment of the present application, dividing multiple groups of the historical two-dimensional images and the historical recognition results corresponding to them into a training set and a test set includes: determining a division ratio; and, based on the division ratio, dividing the multiple groups of historical two-dimensional images and their corresponding historical recognition results into the training set and the test set. For example, 60% of the data is taken as the training set and 40% as the test set. Of course, in practical applications, those skilled in the art can select an appropriate division ratio according to actual needs.

In yet another embodiment of the present application, before converting the time-series signal to be identified into a two-dimensional image, the method further includes: performing filtering processing on the time-series signal to be identified; and performing scaling processing on the time axis of the time-series signal to be identified. That is, to make the artificial intelligence model's determinations accurate, the time-series signal to be identified is preprocessed first; the preprocessed sequencing sequence is shown in Fig. 3. It is then converted into a two-dimensional image, as shown in Fig. 4, and the recognition results are shown in Fig. 5. Specifically, the filtering process includes smoothing or denoising the time-series signal. Scaling the time axis of the time-series signal to be identified makes the target signal in the two-dimensional image easier to identify; specifically, the easier the target signal is to identify with the naked eye, the better.
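A ratio-based split such as the 60/40 example can be sketched as follows. Shuffling before the split and the fixed random seed are assumptions added for reproducibility; the text only specifies that a division ratio is chosen.

```python
import random


def split_dataset(samples, train_ratio=0.6, seed=0):
    """Shuffle (image, annotation) pairs and split them into train and test sets."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # local RNG, leaves global state alone
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    train, test = split_dataset(range(10), train_ratio=0.6)
    print(len(train), len(test))  # 6 4
```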

Specifically, a combination of a downsampling algorithm and a filtering algorithm may also be used to perform smoothing and denoising processing on the above-mentioned time series signal to be identified.
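The specific filtering and downsampling algorithms are not named. As one possible combination, the sketch below smooths with a centred moving average and then downsamples by averaging consecutive blocks, which also compresses the time axis. The choice of algorithms, the function names and the default window and factor are assumptions.

```python
def moving_average(signal, window=5):
    """Smooth a signal with a centred moving average (edges use shorter windows)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        chunk = signal[max(0, i - half):min(len(signal), i + half + 1)]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def downsample(signal, factor=2):
    """Shrink the time axis by averaging each block of `factor` samples."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    blocks = (signal[i:i + factor] for i in range(0, len(signal), factor))
    return [sum(block) / len(block) for block in blocks]


def preprocess(signal, window=5, factor=2):
    """Denoise first, then scale the time axis."""
    return downsample(moving_average(signal, window), factor)
```

If the signal is downsampled before rasterization, the downsampling factor must be taken into account when mapping image coordinates back to positions in the original signal.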

In a specific embodiment of the present application, the time-series signal is a sequencing time series. Sequencing time series include electrical-signal time series and optical-signal time series.

The embodiment of the present application also provides a time-series signal identification device. It should be noted that the time-series signal identification device in the embodiment of the present application can be used to implement the time-series signal identification method provided in the embodiment of the present application. The time series signal identification device provided by the embodiment of the present application is introduced below.

Fig. 9 is a schematic diagram of a time-series signal identification device according to an embodiment of the present application. As shown in Figure 9, the device includes:

An acquisition unit 10, configured to acquire a time series signal to be identified;

A conversion unit 20, configured to convert the above-mentioned time series signal to be identified into a two-dimensional image;

The first determination unit 30 is configured to determine a recognition result based on the above-mentioned two-dimensional image, and the above-mentioned recognition result includes at least one of the following: whether the above-mentioned time-series signal to be recognized includes a target signal, the type of the above-mentioned target signal, and whether the above-mentioned target signal is to be recognized. The position in the above time series signal of .

In the above solution, the acquisition unit acquires the time series signal to be identified, the conversion unit converts the time series signal to be identified into a two-dimensional image, and the first determination unit identifies whether the time series signal includes the target signal, the above target signal or not according to the two-dimensional image. The type of the target signal and the position of the above-mentioned target signal in the above-mentioned time series signal to be identified. Due to the use of image recognition technology, different types of target signals can be identified, and it is no longer limited to target signals with obvious features that are easy to identify. Compared with manual identification, the identification efficiency is higher.

In one embodiment of the present application, the first determining unit includes a building block and a computing module, and the building block is used to build an artificial intelligence model. The above-mentioned artificial intelligence model is obtained through training using multiple sets of training data. Among the above-mentioned multiple sets of training data Each set of training data includes: historical two-dimensional images corresponding to historical time series signals and historical recognition results corresponding to the above-mentioned historical two-dimensional images; the calculation module is used to input the above-mentioned two-dimensional images into the above-mentioned artificial intelligence model for calculation , to obtain the above recognition results. That is, by building an artificial intelligence model, the recognition result can be determined more accurately based on the two-dimensional image. That is, it is determined according to the two-dimensional image whether the time-series signal to be identified includes the target signal, the type of the target signal, and the position of the target signal in the time-series signal to be identified.

In an embodiment of the present application, the device further includes a second determination unit, the second determination unit is used to determine the position of the target signal in the time series signal to be identified, and the second determination unit includes a first determination module and a second determination module, the first determination module is used to determine the position of the sub-image corresponding to the above-mentioned target signal in the above-mentioned two-dimensional image; the second determination module is used to determine the position of the sub-image corresponding to the above-mentioned target signal in the above-mentioned two-dimensional image position, to determine the position of the target signal in the time series signal to be identified. That is, the position of the target signal in the time series signal to be identified can be determined according to the position of the sub-image corresponding to the target signal in the two-dimensional image. Further determine the position of the base corresponding to the target signal in the original sequencing sequence.

In an embodiment of the present application, the second determination module includes a second acquisition submodule, a third acquisition submodule, a fourth acquisition submodule, and a second determination submodule, and the second acquisition submodule is used to acquire the above-mentioned two-dimensional image width; the third acquisition sub-module is used to obtain the pixel coordinates of the sub-image corresponding to the above-mentioned target signal in the above-mentioned two-dimensional image; the fourth acquisition sub-module is used to obtain the total length of the above-mentioned time series signal; the second determination sub-module uses The position of the target signal in the time series signal to be identified is determined according to the width of the two-dimensional image, the pixel coordinates and the total length of the time series signal. Specifically, the position of the target signal in the two-dimensional image can be identified according to the two-dimensional image, and then the position of the target signal in the time-series signal can be determined according to the correspondence between the two-dimensional image and the time-series signal to be identified. For example, the size of the two-dimensional image is 400x100, that is, the width of the two-dimensional image is 400. After the target detection, the abscissa of the sub-image corresponding to the target signal is 150 pixels. At the same time, the sequence length of the time series signal before being converted into a picture is 10000, so the position of the target signal in the time series signal is 10000*150/400.

In order to obtain an accurate artificial intelligence model, in one embodiment of the present application, the artificial intelligence model is a deep learning model, and the construction module includes a first acquisition submodule and a first training submodule. The first acquisition submodule is used to obtain relevant parameters for model training, the relevant parameters including an optimizer, a learning rate and the number of training iterations; the first training submodule is used to train the artificial intelligence model with the multiple sets of training data, using the relevant parameters as the training settings. That is, by setting relevant parameters such as the optimizer, the learning rate and the number of training iterations, the trained artificial intelligence model achieves higher accuracy and better generalization.

In yet another embodiment of the present application, the historical recognition results are obtained by labeling the historical two-dimensional images with a picture annotation tool. The division submodule is used to divide the multiple groups of historical two-dimensional images and the historical recognition results corresponding to the historical two-dimensional images into a training set and a test set; the second training submodule is used to train the artificial intelligence model with the training set; the test submodule is used to test the artificial intelligence model with the test set. That is, a rich training set is used to train the model to obtain the artificial intelligence model, and the test set is then used to test the accuracy of the artificial intelligence model. If the accuracy does not meet the requirements, the training set is adjusted and training is performed again until the accuracy of the artificial intelligence model is sufficiently high.

In yet another embodiment of the present application, the division submodule includes a first determination submodule and a processing submodule. The first determination submodule is used to determine the division ratio; the processing submodule is used to divide, according to the division ratio, the multiple groups of historical two-dimensional images and the historical recognition results corresponding to the historical two-dimensional images into the training set and the test set. For example, 60% of the data is taken as the training set and 40% of the data as the test set. Of course, in practical applications, those skilled in the art can select an appropriate division ratio according to actual needs.
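
As a non-limiting illustration of the division described above, the following Python sketch randomly divides labeled image and annotation pairs into a training set and a test set according to a division ratio. The function name `split_dataset`, the fixed random seed and the example file names are assumptions made for illustration only.

```python
import random


def split_dataset(samples, train_ratio=0.6, seed=0):
    """Randomly divide labeled samples into a training set and a test set."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = list(samples)
    # A seeded generator keeps the division reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical (image, annotation) pairs standing in for the historical
# two-dimensional images and their historical recognition results.
pairs = [(f"img_{i}.png", f"img_{i}.xml") for i in range(10)]
train_set, test_set = split_dataset(pairs, train_ratio=0.6)
print(len(train_set), len(test_set))  # 6 4
```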

In yet another embodiment of the present application, the device further includes a filtering unit and a scaling unit. The filtering unit is used to filter the time series signal to be identified before the time series signal to be identified is converted into a two-dimensional image; the scaling unit is used to scale the time axis of the time series signal to be identified. That is, in order to obtain an accurate result from the artificial intelligence model, the time series signal to be identified is preprocessed first; the sequencing sequence after preprocessing is shown in Figure 3, the two-dimensional image obtained by conversion is shown in Figure 4, and the recognition results are shown in Figure 5. Specifically, the filtering process includes smoothing or denoising the time series signal. Scaling the time axis of the time series signal to be identified makes the target signal in the two-dimensional image easier to identify; specifically, the easier the target signal is to identify with the naked eye, the better.
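
As a non-limiting illustration of the preprocessing described above, the following Python sketch smooths a signal and then compresses its time axis. The application does not specify a particular filter or scaling method; the trailing moving average, the block averaging, and the function names `moving_average` and `rescale_time_axis` are assumptions made for illustration.

```python
def moving_average(signal, window=5):
    """Smooth a signal with a trailing moving average (one possible filter)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running = 0.0
    for i, value in enumerate(signal):
        running += value
        if i >= window:
            # Drop the sample that has just left the averaging window.
            running -= signal[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed


def rescale_time_axis(signal, target_length):
    """Compress the time axis to target_length points by block averaging."""
    n = len(signal)
    if n == 0 or target_length < 1:
        raise ValueError("signal must be non-empty and target_length positive")
    if target_length >= n:
        # This sketch only compresses; shorter signals are returned unchanged.
        return list(signal)
    scaled = []
    for k in range(target_length):
        start = k * n // target_length
        stop = max(start + 1, (k + 1) * n // target_length)
        block = signal[start:stop]
        scaled.append(sum(block) / len(block))
    return scaled


print(moving_average([1, 2, 3, 4], window=2))    # [1.0, 1.5, 2.5, 3.5]
print(rescale_time_axis([0, 2, 4, 6, 8, 10], 3))  # [1.0, 5.0, 9.0]
```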

The time series signal identification device includes a processor and a memory. The acquisition unit, the conversion unit and the first determination unit are all stored in the memory as program units, and the processor executes the program units stored in the memory to realize the corresponding functions.

The processor includes a kernel, and the kernel fetches corresponding program units from the memory. One or more kernels can be set, and accurate identification of time series signals can be achieved by adjusting kernel parameters.

The memory may include non-permanent memory in computer-readable media, in the form of random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.

An embodiment of the present invention provides a computer-readable storage medium. The computer-readable storage medium includes a stored program, wherein, when the program is running, the device where the computer-readable storage medium is located is controlled to execute the time series signal identification method.

An embodiment of the present invention provides a processor, the processor is used to run a program, wherein the time series signal identification method is executed when the program is running.

An embodiment of the present invention provides a system, including a single-channel nanopore sequencing device, one or more processors, memory, and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by the one or more processors, and the one or more programs include instructions for performing any one of the above-mentioned methods.

An embodiment of the present invention provides a device. The device includes a processor, a memory, and a program stored on the memory and operable on the processor. When the processor executes the program, at least the following steps are implemented:

Step S101, acquiring time series signals to be identified;

The devices herein may be servers, PCs, PADs, mobile phones, etc.

The present application also provides a computer program product, which, when executed on a data processing device, is adapted to execute a program initialized with at least the following method steps:

Step S101, acquiring time series signals to be identified;

Those skilled in the art should understand that the embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

The present application is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the present application. It should be understood that each procedure and/or block in the flowchart and/or block diagram, and any combination of procedures and/or blocks in the flowchart and/or block diagram, can be realized by computer program instructions. These computer program instructions may be provided to a general purpose computer, special purpose computer, embedded processor, or processor of other programmable data processing equipment to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing equipment produce an apparatus for realizing the functions specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, and the instruction means realize the functions specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions can also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

Memory may include non-permanent storage in computer readable media, in the form of random access memory (RAM) and/or nonvolatile memory such as read only memory (ROM) or flash RAM. The memory is an example of a computer readable medium.

Computer-readable media, including both permanent and non-permanent, removable and non-removable media, can store information by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media exclude transitory computer-readable media, such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises", "includes" or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus comprising a set of elements includes not only those elements but also other elements not expressly listed, or elements inherent in the process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising a ..." does not preclude the presence of additional identical elements in the process, method, article, or apparatus that includes the element.

Example

This embodiment relates to a specific time series signal recognition system, including a hardware environment and a software environment, as shown in Table 2.

Table 2 Hardware environment and software environment table

Since the algorithm module involves reading, writing and storing a large amount of data, as well as training and testing the model, a large number of data calculations and operations are required. Running in a hardware and software environment with a high-performance CPU or a GPU configuration can significantly improve efficiency and stability.

The system also includes a single-channel nanopore sequencing device and a PC. The nanopore sequencing device is used to collect target library data, and the collected signal data and other information are saved as files in h5 format and stored on the hard disk of the PC in real time.

This embodiment uses the target detection framework yolov3, and the network architecture is shown in Figure 6. The DBL module is composed of a convolutional layer (conv), a batch normalization layer (BN) and an activation function (Leaky ReLU), as shown in Figure 7. The residual module is obtained by passing the input through two DBL modules and adding the result to the input, as shown in Figure 8.

As shown in Figure 6, a 416*416*3 image is fed in at the input end, preliminary feature extraction is completed through the DBL module and 3 residual modules, further feature learning is then performed at multiple scales, and finally the coordinate information of the various possible prediction frames (that is, the coordinate information of the target to be detected in the picture) is output. The use of three outputs (y1/y2/y3) here is based on FPN (feature pyramid networks), and multiple scales are used to detect targets of different sizes. The finer the grid, the finer the object that can be detected. Finally, by setting a threshold and filtering according to the probability value of each prediction frame, the remaining most likely coordinate position information can be obtained. In terms of specific network construction, YOLOv3 can be implemented based on pytorch or tensorflow, and open source code can be used as a reference.
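
As a non-limiting illustration of the multi-scale outputs and the threshold filtering described above, the following Python sketch computes the grid size of each output scale and keeps only the prediction frames whose probability value reaches a threshold. The strides 32, 16 and 8 are those of the publicly described YOLOv3 design and are not stated in this application; the function names, the dictionary keys `box` and `score`, and the threshold value 0.5 are assumptions made for illustration. Non-maximum suppression, which open source YOLOv3 implementations typically apply as well, is omitted.

```python
def grid_sizes(input_size=416, strides=(32, 16, 8)):
    """Grid size of each output scale for a square input image."""
    return [input_size // stride for stride in strides]


def filter_predictions(predictions, threshold=0.5):
    """Keep prediction frames whose probability value reaches the threshold."""
    kept = [p for p in predictions if p["score"] >= threshold]
    # Most likely frames first.
    return sorted(kept, key=lambda p: p["score"], reverse=True)


print(grid_sizes())  # [13, 26, 52]

# Hypothetical prediction frames: (x_min, y_min, x_max, y_max) and a score.
candidates = [
    {"box": (140, 20, 180, 90), "score": 0.92},
    {"box": (300, 10, 320, 40), "score": 0.31},
    {"box": (40, 30, 70, 80), "score": 0.64},
]
for frame in filter_predictions(candidates, threshold=0.5):
    print(frame["box"], frame["score"])
```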

Data labeling and preprocessing: Before model training, the point signal data obtained from sample analysis needs to be converted into an image format, manually labeled, and passed to the deep learning model for training and testing. First, each piece of signal data is rendered as a picture of the same size and saved. Considering the size of the signal data, the data can be downsampled and filtered during picture generation while preserving the shape of the data. Then, tools such as roLabelImg (an open source image labeling tool) are used to label the type and relative coordinates of the classification signals contained in each picture, and all result files are saved.
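
As a non-limiting illustration of generating pictures of the same size from signal data, the following Python sketch rasterizes a signal into a fixed-size binary pixel grid. The application does not specify the rendering method; the nearest-sample downsampling, the amplitude normalization, the nested-list image representation and the function name `signal_to_image` are assumptions made for illustration.

```python
def signal_to_image(signal, width=400, height=100):
    """Rasterize a signal into a height x width grid of 0/1 pixels."""
    if not signal:
        raise ValueError("signal must not be empty")
    lowest, highest = min(signal), max(signal)
    span = (highest - lowest) or 1.0  # avoid dividing by zero on flat signals
    image = [[0] * width for _ in range(height)]
    n = len(signal)
    for x in range(width):
        # Downsample: pick the sample that corresponds to this pixel column.
        index = min(n - 1, x * n // width)
        # Normalize the amplitude to [0, 1]; row 0 is the top of the image.
        level = (signal[index] - lowest) / span
        y = (height - 1) - int(round(level * (height - 1)))
        image[y][x] = 1
    return image


# A hypothetical ramp signal of 10000 points rendered as a 400x100 picture.
picture = signal_to_image(list(range(10000)), width=400, height=100)
print(len(picture), len(picture[0]))  # 100 400
```

Every signal passed through such a function yields a picture of the same dimensions regardless of its length, which is what allows the pictures to be labeled and fed to a detector with a fixed input size.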

Model training and testing: After data labeling is completed, the data images used for training and the corresponding labeling results need to be divided into a training set and a test set. When producing the training set, random sampling with a user-defined ratio can be used to create the data set for deep learning training. Then, the model training parameters, such as the optimizer, the learning rate and the number of training iterations, are set according to the application scenario, and the model is trained and tested. If the test results are not ideal, the amount of data can be increased or the model training parameters modified to adjust the training, and the process is repeated until the model training results meet the required indicators.
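
As a non-limiting illustration of the iterative process described above, the following Python sketch trains with successive parameter settings until the test result meets a required indicator. The callables `train_fn` and `eval_fn`, the configuration keys and the stand-in scoring rule are hypothetical placeholders, not part of this application or of any particular framework.

```python
def train_until_acceptable(train_fn, eval_fn, configs, target_metric):
    """Try parameter settings in turn; stop once the test metric is met."""
    best = None
    for config in configs:
        model = train_fn(config)   # train with this optimizer, rate, iterations
        score = eval_fn(model)     # test on the held-out test set
        if best is None or score > best[1]:
            best = (config, score)
        if score >= target_metric:
            break                  # requirement met, stop adjusting
    return best


# Stand-in functions so the sketch runs without a deep learning framework.
def fake_train(config):
    return config


def fake_eval(model):
    # Pretend that more iterations give a higher accuracy (in percent).
    return model["iterations"] // 10


settings = [
    {"optimizer": "sgd", "learning_rate": 0.01, "iterations": 500},
    {"optimizer": "sgd", "learning_rate": 0.001, "iterations": 900},
    {"optimizer": "adam", "learning_rate": 0.001, "iterations": 1000},
]
chosen, accuracy = train_until_acceptable(fake_train, fake_eval, settings, 90)
print(chosen["iterations"], accuracy)  # 900 90
```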

Use of the model: After model training is completed, the model can be deployed and used in different scenarios. A two-dimensional image converted from the sequencing electrical signal of the present invention is shown in Figure 4. During sequencing, some custom special base sequence fragments are mixed with other fragments, and the special base fragments present special electrical signals during sequencing, as shown in Figure 4. Manually screening out the special target signals from all the sequencing electrical signals may take a lot of time and effort, so this model can be used to screen out one or more target signals that need to be analyzed from a large amount of chaotic signal data, as shown in Figure 5. In addition, the output of the model detection results includes the classification information and relative coordinates of the target signal, which also greatly assists further analysis of the signal.

From the above description, it can be seen that the above-mentioned embodiments of the present application have achieved the following technical effects:

1) The time series signal identification method of the present application obtains the time series signal to be identified, converts the time series signal to be identified into a two-dimensional image, and finally identifies, according to the two-dimensional image, whether the time series signal includes the target signal, the type of the target signal, and the position of the target signal in the time series signal to be identified. Because image recognition technology is used, different types of target signals can be identified, and identification is no longer limited to target signals with obvious, easily identified features. Compared with manual identification, the identification efficiency is higher.

2) In the time series signal identification device of the present application, the acquisition unit acquires the time series signal to be identified, the conversion unit converts the time series signal to be identified into a two-dimensional image, and the first determination unit identifies, according to the two-dimensional image, whether the time series signal includes the target signal, the type of the target signal, and the position of the target signal in the time series signal to be identified. Because image recognition technology is used, different types of target signals can be identified, and identification is no longer limited to target signals with obvious, easily identified features. Compared with manual identification, the identification efficiency is higher.

The above descriptions are only preferred embodiments of the present application, and are not intended to limit the present application. For those skilled in the art, there may be various modifications and changes in the present application. Any modifications, equivalent replacements, improvements, etc. made within the spirit and principles of this application shall be included within the protection scope of this application.

Claims

1. A method for identifying time series signals, comprising:

obtaining a time series signal to be identified;

converting the time series signal to be identified into a two-dimensional image;

determining a recognition result according to the two-dimensional image, the recognition result including at least one of the following: whether the time series signal to be identified includes a target signal, the type of the target signal, and the position of the target signal in the time series signal to be identified.

2. The method according to claim 1, wherein determining the recognition result according to the two-dimensional image comprises:

building an artificial intelligence model, wherein the artificial intelligence model is obtained through training using multiple sets of training data, each set of training data in the multiple sets of training data including: a historical two-dimensional image corresponding to a historical time series signal and the historical recognition result corresponding to the historical two-dimensional image;

inputting the two-dimensional image into the artificial intelligence model for calculation to obtain the recognition result.

3. The method according to claim 2, wherein the artificial intelligence model comprises a DBL module and/or a residual module, the DBL module comprises a convolutional layer, a batch normalization layer and an activation layer, and the residual module includes the DBL module.

4. The method according to claim 1, wherein determining the position of the target signal in the time series signal to be identified comprises:

determining the position of the sub-image corresponding to the target signal in the two-dimensional image;

determining the position of the target signal in the time series signal to be identified according to the position of the sub-image corresponding to the target signal in the two-dimensional image.

5. The method according to claim 4, wherein determining the position of the target signal in the time series signal to be identified according to the position of the sub-image corresponding to the target signal in the two-dimensional image comprises:

acquiring the width of the two-dimensional image;

acquiring pixel coordinates of the sub-image corresponding to the target signal in the two-dimensional image;

acquiring the total length of the time series signal;

determining the position of the target signal in the time series signal to be identified according to the width of the two-dimensional image, the pixel coordinates and the total length of the time series signal.

6. The method according to any one of claims 1 to 5, wherein before converting the time series signal to be identified into a two-dimensional image, the method further comprises:

performing filtering processing on the time series signal to be identified;

performing scaling processing on the time axis of the time series signal to be identified.

7. The method according to any one of claims 1 to 5, wherein the time series signal is a sequencing time series.

8. An identification device for a time series signal, characterized in that it comprises:

an acquisition unit, configured to acquire a time series signal to be identified;

a conversion unit, configured to convert the time series signal to be identified into a two-dimensional image;

a first determination unit, configured to determine a recognition result according to the two-dimensional image, the recognition result including at least one of the following: whether the time series signal to be identified includes a target signal, the type of the target signal, and the position of the target signal in the time series signal to be identified.

9. A computer-readable storage medium, characterized in that the computer-readable storage medium includes a stored program, wherein, when the program is running, the device where the computer-readable storage medium is located is controlled to execute the method according to any one of claims 1 to 7.

10. A system, characterized in that it includes a single-channel nanopore sequencing device, one or more processors, memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by the one or more processors, and the one or more programs include instructions for performing the method of any one of claims 1 to 7.