CN111243666B

CN111243666B - Nextflow-based automatic analysis method and system for circular ribonucleic acid

Info

Publication number: CN111243666B
Application number: CN202010024079.7A
Authority: CN
Inventors: 蔡宏民; 魏焯辉
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2020-01-08
Filing date: 2020-01-08
Publication date: 2023-04-07
Anticipated expiration: 2040-01-08
Also published as: CN111243666A

Abstract

The embodiment of the invention provides a Nextflow-based automatic analysis method and system for circular ribonucleic acid (circRNA). According to the embodiment of the invention, multiple circRNA analysis software tools are integrated through the Nextflow framework; the results produced by the individual tools are compared, de-duplicated and screened, and the analysis results of the different tools are combined into a final result, so that a more comprehensive and accurate circRNA prediction and analysis report can be obtained.

Description

Nextflow-based automatic analysis method and system for circular ribonucleic acid

Technical Field

The invention relates to the technical field of biological analysis and big data mining, in particular to a Nextflow-based automatic analysis method and system for circular ribonucleic acid.

Background

circRNA (circular ribonucleic acid) is a special class of circular small non-coding RNA and a recent research hotspot in the RNA field. Unlike traditional linear RNA, circRNA has a closed circular molecular structure, so it is not affected by RNA exonucleases, is not easily degraded, and is expressed more stably.

Research in recent years has shown that circRNA molecules are rich in microRNA (miRNA) binding sites, so circRNA can act as a miRNA sponge: by absorbing miRNA, it relieves the miRNA's inhibition of the corresponding target genes in the cell and raises the expression level of those target genes. This mechanism of action is known as the competing endogenous RNA mechanism. Through interaction with disease-related miRNAs, naturally occurring circular RNA molecules influence gene expression and play an important regulatory role in the occurrence and development of diseases, the growth and development of organisms, resistance to the external environment, and the like.

In order to search for circRNA more thoroughly, several circRNA prediction tools based on RNA sequencing data have been developed in recent years, including STAR-based CIRCexplorer2, BWA-based CIRI, MapSplice, Segemehl, and Bowtie2-based find_circ.

However, each of the software tools listed above has disadvantages in finding circRNA.

Both MapSplice and STAR-based CIRCexplorer2 have low false positive rates and can output credible circRNA lists, but because they rely on annotation files of known genes, they cannot discover circRNAs de novo.

Although Segemehl can find the most circRNAs, it runs for a long time and consumes a large amount of memory, which places certain demands on the hardware configuration; its false positive rate is also high, so the resulting circRNA list needs to be screened to some extent.

find_circ and CIRI have shorter running times and produce prediction results faster than the other software, but they detect fewer circRNAs. Each is limited by a single alignment algorithm and the reference genome, so both tools miss some circRNAs in their predictions.

Therefore, how to obtain a more comprehensive and accurate circRNA prediction and analysis report is a technical problem that urgently needs to be solved.

Disclosure of Invention

The invention aims to provide a Nextflow-based automatic analysis method and system for circular ribonucleic acid, which integrate multiple circular ribonucleic acid analysis software tools through the Nextflow framework, compare the results produced by the individual tools, and combine the analysis results of the different tools into a final result, thereby obtaining a more comprehensive and accurate circRNA prediction and analysis report.

In a first aspect, the embodiments of the present invention provide a method for automatically analyzing a circular ribonucleic acid based on Nextflow, comprising the following steps:

S1, performing quality control on input original gene data and a reference genome sequence, removing abnormal fragments with a quality score lower than a first set value and a GC content in the genome sequence higher than a second set value, and generating a quality control report; wherein fastp and MultiQC software are used to implement step S1;

S2, aligning the sequence fragments of the input sample to the reference genome sequence to confirm the specific position of each sequence on the genome; wherein STAR, BWA, Bowtie2 and Bowtie software are each used to implement step S2 independently;

S3, after the sequence fragments of the input sample are aligned to the reference genome sequence, confirming the sequence type and the number of the circular ribonucleic acids, and annotating the name of each circular ribonucleic acid and the position of the chromosome where it is located through an annotation file; wherein STAR-based CIRCexplorer2, BWA-based CIRI, Bowtie2-based MapSplice, Segemehl and Bowtie2-based find_circ software are each used to implement step S3 independently;

S4, merging and de-duplicating the sequence types and numbers of the circular ribonucleic acids respectively obtained by STAR-based CIRCexplorer2, BWA-based CIRI, Bowtie2-based MapSplice, Segemehl and Bowtie2-based find_circ software;

S5, analyzing and interpreting the sequence types and numbers of the merged and de-duplicated circular ribonucleic acids to generate a chart report for the original data;

wherein steps S1-S5 are all implemented in Nextflow.

Further, before the step S1, the Nextflow-based automated analysis method for circular ribonucleic acid further includes:

S0, building an alignment index file for the input genome sequence; wherein Bowtie, Bowtie2 and STAR software are each used to implement step S0 independently.

Further, the merging and de-duplication of the sequence types and numbers of the circular ribonucleic acids respectively obtained by STAR-based CIRCexplorer2, BWA-based CIRI, Bowtie2-based MapSplice, Segemehl and Bowtie2-based find_circ software is specifically:

merging the data of circular ribonucleic acids of the same type, and deleting the pre-merge data of that type; wherein the final number of circular ribonucleic acids of a type is the number obtained after merging, namely the average of the numbers of all circular ribonucleic acids of that type.

Further, rows are compared in sorted order: if the result of comparing the circular ribonucleic acids in row N-1 and row N-2 is on the same chromosome as the circular ribonucleic acid in row N, the difference between the base start position of that result and the base start position of the circular ribonucleic acid in row N is less than or equal to 5, and the difference between the base end position of that result and the base end position of the circular ribonucleic acid in row N is less than or equal to 5, then that result and the circular ribonucleic acid in row N are of the same type.

Further, the sorted rows are ordered by the position and number of each type of circular ribonucleic acid, and the sort columns are, in order: chromosome-base start position-base end position-number of circular ribonucleic acids of this type.

Further, the chart report includes position analysis information, length analysis information, number analysis information, and type analysis information of the circular ribonucleic acids.

Further, the Nextflow-based automated analysis method for the circular ribonucleic acid further comprises the following steps:

running an instruction to automatically execute the configuration of the software environment according to preset configuration steps;

automatically capturing the hardware configuration of the current server, and automatically modifying the parameters of the software according to the hardware configuration of the server.

In a second aspect, the embodiments of the present invention further provide a Nextflow-based circular ribonucleic acid automated analysis system, including:

the quality control module is used for performing quality control on the input original gene data and the reference genome sequence, removing abnormal fragments with a quality score lower than a first set value and a GC content in the genome sequence higher than a second set value, and generating a quality control report; wherein fastp and MultiQC software are used to realize the function of the quality control module;

the alignment module is used for aligning the sequence fragments of the input sample to the reference genome sequence so as to confirm the specific position of each sequence on the genome; wherein STAR, BWA, Bowtie2 and Bowtie software are each used to realize the function of the alignment module independently;

the quantitative module is used for confirming the sequence type and the number of the circular ribonucleic acids after the sequence fragments of the input sample are aligned to the reference genome sequence, and annotating the name of each circular ribonucleic acid and the position of the chromosome where it is located through an annotation file; wherein STAR-based CIRCexplorer2, BWA-based CIRI, Bowtie2-based MapSplice, Segemehl and Bowtie2-based find_circ software are each used to realize the function of the quantitative module independently;

the merging and de-duplication module is used for merging and de-duplicating the sequence types and numbers of the circular ribonucleic acids respectively obtained by STAR-based CIRCexplorer2, BWA-based CIRI, Bowtie2-based MapSplice, Segemehl and Bowtie2-based find_circ software;

the report generation module is used for analyzing and interpreting the sequence types and numbers of the merged and de-duplicated circular ribonucleic acids to generate a chart report for the original data;

wherein the quality control module, the alignment module, the quantitative module, the merging and de-duplication module and the report generation module are all implemented in Nextflow.

Further, rows are compared in sorted order: if the result of comparing the circular ribonucleic acids in row N-1 and row N-2 is on the same chromosome as the circular ribonucleic acid in row N, the difference between the base start position of that result and the base start position of the circular ribonucleic acid in row N is less than or equal to 5, and the difference between the base end position of that result and the base end position of the circular ribonucleic acid in row N is less than or equal to 5, then that result and the circular ribonucleic acid in row N are of the same type; wherein the sorted rows are ordered by the position and number of each type of circular ribonucleic acid, and the sort columns are, in order: chromosome-base start position-base end position-number of circular ribonucleic acids of this type.

According to the embodiment of the invention, multiple circular ribonucleic acid analysis software tools are integrated through the Nextflow framework, the results produced by the individual tools are compared, de-duplicated and screened, and the analysis results of the different tools are combined into the final result, so that a more comprehensive and accurate circRNA prediction and analysis report can be obtained.

Drawings

FIG. 1 is a schematic diagram showing the use of the tools involved in the Nextflow-based automated analysis method for circular ribonucleic acid provided in example 1;

FIG. 2 is a schematic structural diagram of an automated Nextflow-based analysis system for circular RNA provided in example 2.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The terms "comprises" and "comprising" indicate the presence of the described features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The term "and/or" refers to and includes any and all possible combinations of one or more of the associated listed items.

Nextflow is a reactive workflow framework and domain-specific programming language that simplifies the writing of data-intensive pipelines. Its design rests on the idea that the Linux platform is the common language of data science. Linux provides many simple but powerful command-line and scripting tools that, when chained together, simplify complex data manipulation. Nextflow extends this approach by adding the ability to define complex program interactions and high-level parallel computing environments based on the dataflow programming model.

circRNA (circular ribonucleic acid) is a special class of circular small non-coding RNA and a recent research hotspot in the RNA field. Unlike traditional linear RNA, circRNA has a closed circular molecular structure, so it is not affected by RNA exonucleases, is not easily degraded, and is expressed more stably.

Example 1:

Referring to FIG. 1, FIG. 1 is a schematic diagram of the tools used in steps S1-S5.

The embodiment of the invention provides a Nextflow-based automatic analysis method for circular ribonucleic acid, which comprises the steps S1-S5, wherein the steps S1-S5 are all in Nextflow.

S1, performing quality control on input original gene data and a reference genome sequence, removing abnormal fragments with a quality score lower than a first set value and a GC content in the genome sequence higher than a second set value, and generating a quality control report; wherein fastp and MultiQC software are used to implement step S1.

Here, GC content is the proportion of guanine and cytosine bases; the first set value is 0.4 and the second set value is 0.6. MultiQC depends on the analysis results of fastp and compiles comprehensive statistics from the fastp quality control results.
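As a rough illustration of the two thresholds, the following Python sketch computes the GC content of a fragment and flags abnormal fragments. The function names `gc_content` and `is_abnormal`, and the `quality` argument normalized to the range 0 to 1, are assumptions made for illustration. The sketch also assumes that a fragment is abnormal when either threshold is violated, which the wording above leaves open. In the embodiment the filtering is performed by fastp, not by code like this.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a sequence (0.0 if empty)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def is_abnormal(seq: str, quality: float,
                first_set_value: float = 0.4,
                second_set_value: float = 0.6) -> bool:
    """Flag a fragment whose quality score is below the first set value
    or whose GC content is above the second set value.

    ASSUMPTION: `quality` is a score normalized to [0, 1]; the source does
    not say how the quality score is scaled.
    """
    return quality < first_set_value or gc_content(seq) > second_set_value


print(gc_content("ATGC"))         # 0.5
print(is_abnormal("GGGC", 0.9))   # True: GC content is 1.0
```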

S2, aligning the sequence fragments of the input sample to the reference genome sequence to confirm the specific position of each sequence on the genome; wherein STAR, BWA, Bowtie2 and Bowtie software are each used to implement step S2 independently.

S3, after the sequence fragments of the input sample are aligned to the reference genome sequence, confirming the sequence type and the number of the circular ribonucleic acids, and annotating the name of each circular ribonucleic acid and the position of the chromosome where it is located through an annotation file; wherein STAR-based CIRCexplorer2, BWA-based CIRI, Bowtie2-based MapSplice, Segemehl and Bowtie2-based find_circ software are each used to implement step S3 independently.

Among them, STAR-based CIRCexplorer2 detects circRNA by borrowing the approach used for fusion-gene detection. The main process is as follows: first, the short reads that STAR cannot align are extracted and aligned to the genome using TopHat-Fusion. Reads that TopHat-Fusion aligns to non-linear candidate positions on the genome are potential head-to-tail junction reads. The precise donor and acceptor positions of these reads are then determined with the help of gene annotation. Finally, the circRNAs are annotated.

BWA-based CIRI relies on BWA, which is designed for aligning sequences to large genomes. Specifically, an index of the large reference genome is first built with the Burrows-Wheeler transform (BWT) compression algorithm, and the sequences are then aligned to the genome. CIRI is fast, accurate and memory-efficient.
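To give a feel for the transform behind the index, the sketch below is a naive Python version of the Burrows-Wheeler transform: it sorts all rotations of the text and keeps the last column. It is a toy for short strings only. The function name `bwt` and the `$` sentinel are illustrative choices, and this is not how BWA builds its index in practice.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform: sort all rotations, take the last column."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel
    # Quadratic in memory; real indexers use suffix arrays instead.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


print(bwt("banana"))  # annb$aa
```

Equal characters tend to cluster in the output, which is what makes the transformed text compressible and searchable.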

Segemehl is software that maps short sequence reads to a reference genome. It implements a matching strategy based on an enhanced suffix array (ESA). For each suffix of a sequence fragment, Segemehl aims to find the best-scoring seed. A seed may contain insertions, deletions and mismatches (collectively, differences). The number of differences allowed in a seed is user-controlled (parameter -D, --differences) and has a critical effect when the program runs.

MapSplice is a highly specific and sensitive transcriptome sequencing alignment algorithm published by Kai Wang et al. in Nucleic Acids Research in 2010. MapSplice does not depend on splice-site features or intron length, so it can better detect novel canonical and non-canonical splice sites. MapSplice strikes a good balance between the quality of alignment and the diversity of the sequences. The algorithm has two steps: tag alignment and splice inference.

find_circ is based on Bowtie2 alignment. The key step in predicting circular RNA from high-throughput sequencing data is to find junction sequences that cannot be aligned contiguously to the genome or transcriptome. To do this, the RNA sequences are first aligned to the genome, and the unaligned sequences are then collected. find_circ takes 20 bases from each end of these unaligned sequences and aligns them to the genome again (ensuring a unique alignment to the genome). Next, the GU/AG splice sites are determined from the short-sequence alignments to infer potential circular RNA sequences.
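The step of taking 20 bases from each end of an unaligned read can be sketched as follows. This is an illustrative Python fragment rather than code from find_circ itself. The function name `split_anchors` and the rule of skipping reads shorter than two anchors are assumptions.

```python
from typing import Optional, Tuple


def split_anchors(read: str, anchor_len: int = 20) -> Optional[Tuple[str, str]]:
    """Return the (head, tail) anchors of an unaligned read.

    Reads too short to give two non-overlapping anchors return None
    (ASSUMPTION: the source does not say how short reads are handled).
    """
    if len(read) < 2 * anchor_len:
        return None
    return read[:anchor_len], read[-anchor_len:]


read = "A" * 20 + "CCCCC" + "G" * 20
print(split_anchors(read))
```

Each anchor would then be aligned to the genome separately, and only pairs in which both anchors align uniquely would be carried forward.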

S4, merging and de-duplicating the sequence types and numbers of the circular ribonucleic acids respectively obtained by STAR-based CIRCexplorer2, BWA-based CIRI, Bowtie2-based MapSplice, Segemehl and Bowtie2-based find_circ software.

Step S4 can be implemented with a shell script and a Python script.

In the embodiment of the present invention, the sequence types and numbers of the circular ribonucleic acids obtained by STAR-based CIRCexplorer2, BWA-based CIRI, Bowtie2-based MapSplice, Segemehl and Bowtie2-based find_circ software are merged and de-duplicated, specifically:

Rows are compared in sorted order. If the result of comparing the circular ribonucleic acids in row N-1 and row N-2 is on the same chromosome as the circular ribonucleic acid in row N, the difference between the base start position of that result and the base start position of the circular ribonucleic acid in row N is less than or equal to 5, and the difference between the base end position of that result and the base end position of the circular ribonucleic acid in row N is less than or equal to 5, then that result and the circular ribonucleic acid in row N are of the same type. The sorted rows are ordered by the position and number of each type of circular ribonucleic acid, and the sort columns are, in order: chromosome-base start position-base end position-number of circular ribonucleic acids of this type. Chromosome, base start position, base end position and number are each arranged from small to large: rows are sorted first by chromosome; if the chromosomes are the same, by base start position; and so on.
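The sort order and the same-type test can be expressed compactly in Python. The sketch below is only an illustration. The `CircRecord` type, the `same_type` function and the numeric chromosome field are assumptions, and the three sample rows use the values of the worked example in this embodiment.

```python
from typing import NamedTuple


class CircRecord(NamedTuple):
    chrom: int     # chromosome, numeric here so that it sorts "small to large"
    start: int     # base start position
    end: int       # base end position
    count: float   # number of circRNAs of this type reported by one tool


def same_type(a: CircRecord, b: CircRecord, tol: int = 5) -> bool:
    """Same chromosome, and start and end positions each differ by at most `tol`."""
    return (a.chrom == b.chrom
            and abs(a.start - b.start) <= tol
            and abs(a.end - b.end) <= tol)


records = [CircRecord(1, 10, 100, 1), CircRecord(1, 8, 98, 2), CircRecord(1, 4, 94, 1)]
# Tuples compare field by field: chromosome, then start, then end, then count.
ordered = sorted(records)
print(ordered[0], same_type(ordered[0], ordered[1]))
```

Because a `NamedTuple` compares like a plain tuple, `sorted` needs no explicit key to apply the chromosome, start, end, number order.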

The data for circular ribonucleic acids of the same type are merged according to the following method:

if the circular ribonucleic acids in row N-1 and row N belong to the same type, the position of the row with the larger number is taken as the position of the current merged result, where the position comprises the chromosome, the base start position and the base end position;

if the numbers of circRNAs are the same, the position of the row with the longer circRNA is taken as the position of the current merged result;

if the lengths of the circRNAs are also the same, the position of the row with the smaller base start position is taken as the position of the current merged result.
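The three-level tie-break above maps naturally onto a composite sort key. The following Python sketch assumes records are plain `(chromosome, start, end, number)` tuples. The function name `pick_position` is hypothetical, and length is taken as `end - start`, which orders rows the same way as an inclusive length would.

```python
def pick_position(a, b):
    """Choose the merged position for two same-type records.

    Each record is (chromosome, start, end, number). Preference order:
    larger number, then longer circRNA, then smaller base start position.
    """
    def rank(rec):
        _chrom, start, end, number = rec
        return (number, end - start, -start)

    winner = max(a, b, key=rank)
    return winner[:3]


print(pick_position((1, 4, 94, 1), (1, 8, 98, 2)))    # (1, 8, 98)
print(pick_position((1, 4, 94, 1), (1, 8, 100, 1)))   # (1, 8, 100)
print(pick_position((1, 4, 94, 1), (1, 8, 98, 1)))    # (1, 4, 94)
```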

It should be understood that in the present embodiment, the five software tools (STAR-based CIRCexplorer2, BWA-based CIRI, Bowtie2-based MapSplice, Segemehl and Bowtie2-based find_circ) produce five different results for the same sample, and the final number of circular ribonucleic acids of a given type is determined by averaging the numbers of all circular ribonucleic acids of that type in these five results.

For ease of understanding, a brief example using three of the software tools follows.

For example: the STAR-based CIRCexplorer2 software, abbreviated as tool A, detects a certain type of circular RNA at bases 10 to 100 of chromosome 1, with a number of 1. In shorthand: A: 1-10-100-1

The BWA-based CIRI software, abbreviated as tool B, detects a certain type of circular RNA at bases 8 to 98 of chromosome 1, with a number of 2. In shorthand: B: 1-8-98-2

The Bowtie2-based MapSplice software, abbreviated as tool C, detects a certain type of circular RNA at bases 4 to 94 of chromosome 1, with a number of 1. In shorthand: C: 1-4-94-1

Sorting A, B and C according to the ordering principle above gives:

C: 1-4-94-1

B: 1-8-98-2

A: 1-10-100-1

The rows are compared in order. Since |4-8| = 4 ≤ 5 and |94-98| = 4 ≤ 5, C and B belong to the same type and need to be merged. C detected 1 circular ribonucleic acid of this type and B detected 2, so the position of row B is taken as the position of the current merge, namely B: 1-8-98, and the number is averaged, i.e., (1+2)/2 = 1.5. The result after comparison (also referred to as the current merged result) is therefore B: 1-8-98-1.5.

The current merged result is then compared with the next row in order, namely A: 1-10-100-1. Since |8-10| = 2 ≤ 5 and |98-100| = 2 ≤ 5, the current merged result and the circular ribonucleic acid detected by A are of the same type, so they are merged. The current merged result has the larger number (1.5 versus 1), so its position is kept, and the number is averaged again, i.e., (1.5+1)/2 = 1.25. The merged result is B: 1-8-98-1.25.
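The worked example can be reproduced with a short Python fold over the sorted rows. This is a sketch under stated assumptions, not the embodiment's actual script. Records are `(chromosome, start, end, number)` tuples, the function name `merge_sorted` is hypothetical, and the number is averaged pairwise between the current merged result and the next row, exactly as the example does.

```python
def merge_sorted(records, tol=5):
    """Sort rows, then fold neighbouring rows of the same type into one row."""
    merged = []
    for rec in sorted(records):
        if merged:
            cur = merged[-1]
            same = (cur[0] == rec[0]
                    and abs(cur[1] - rec[1]) <= tol
                    and abs(cur[2] - rec[2]) <= tol)
            if same:
                # Position: larger number, then longer circRNA, then smaller start.
                best = max(cur, rec, key=lambda r: (r[3], r[2] - r[1], -r[1]))
                # Number: mean of the current merged number and the new row,
                # as in the example: (1 + 2) / 2 = 1.5, then (1.5 + 1) / 2 = 1.25.
                merged[-1] = (best[0], best[1], best[2], (cur[3] + rec[3]) / 2)
                continue
        merged.append(rec)
    return merged


result = merge_sorted([(1, 10, 100, 1), (1, 8, 98, 2), (1, 4, 94, 1)])
print(result)  # [(1, 8, 98, 1.25)]
```

Note that a plain arithmetic mean over the three rows would give 4/3 rather than 1.25; the sketch follows the pairwise averaging of the worked example.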

And S5, analyzing and interpreting the sequence types and the number of the combined and de-duplicated circular ribonucleic acids to generate a chart report obtained by aiming at the original data.

In the embodiment of the invention, step S5 can be implemented with the R analysis software and corresponding analysis code.

In an embodiment of the invention, the chart report comprises information on position analysis of circRNA, information on length analysis of circRNA, information on quantity analysis of circRNA, and information on type analysis of circRNA.
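The embodiment uses R for the report. Purely as an illustration of the kind of summary that could feed such charts, the Python sketch below derives length and number statistics from merged `(chromosome, start, end, number)` rows. The function name `summarize` and the output keys are hypothetical, and the sample rows are made up for the example.

```python
from collections import Counter


def summarize(merged):
    """Summarize merged (chromosome, start, end, number) rows for a report."""
    lengths = [end - start for _chrom, start, end, _number in merged]
    per_chrom = Counter()
    for chrom, _start, _end, number in merged:
        per_chrom[chrom] += number
    return {
        "distinct_circrnas": len(merged),
        "total_number": sum(row[3] for row in merged),
        "mean_length": sum(lengths) / len(lengths) if lengths else 0.0,
        "number_per_chromosome": dict(per_chrom),
    }


# Illustrative rows only; not data from the source.
print(summarize([(1, 8, 98, 1.25), (2, 100, 300, 3)]))
```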

In addition, through the Nextflow framework, the embodiment of the invention can automatically connect the gene analysis software of the different steps and automatically process the software analysis results obtained in each step, thereby improving the analysis efficiency of the machine, reducing manual involvement, and improving both the efficiency and the accuracy of the analysis.

In a preferred embodiment, the Nextflow-based automated analysis method for circular ribonucleic acids further comprises:

Configuring the software environment normally requires manual operation and a complex sequence of steps. In the embodiment of the invention, these steps are written into an instruction in advance, so the user only needs to run the instruction and the system (software) automatically performs the downloading, installation and configuration operations. The user can run the instruction by clicking a button.

Because the appropriate values of some software parameters differ with the hardware configuration of the server, the software parameters need to be set accordingly; the method automatically captures the hardware configuration of the current server and modifies the software parameters according to it.
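One possible shape of such hardware-dependent parameter setting is sketched below in Python. The source does not state which parameters are adjusted or how, so the helper name `suggest_threads`, the `reserve` argument and the choice of thread count as the parameter are all assumptions.

```python
import os


def suggest_threads(reserve: int = 1) -> int:
    """Derive a thread count from the visible CPU cores, leaving `reserve` free."""
    cores = os.cpu_count() or 1   # os.cpu_count() may return None
    return max(1, cores - reserve)


# Hypothetical parameter set that would be passed on to the analysis tools.
params = {"threads": suggest_threads()}
print(params)
```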

Preferably, different software of the same step can run in parallel, so that the resource is utilized to the maximum extent, and the analysis time is saved as much as possible.
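Nextflow schedules independent processes in parallel on its own; the Python sketch below only illustrates the idea of running the tools of one step concurrently. The function name `run_in_parallel` and the placeholder callables are hypothetical and stand in for the real tool invocations.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(tasks):
    """Run independent callables concurrently and return {name: result}."""
    with ThreadPoolExecutor(max_workers=len(tasks) or 1) as pool:
        futures = {name: pool.submit(fn) for name, fn in tasks.items()}
        return {name: future.result() for name, future in futures.items()}


# Placeholder callables; a real pipeline would launch the aligners here.
results = run_in_parallel({
    "tool_a": lambda: "tool_a done",
    "tool_b": lambda: "tool_b done",
})
print(results)
```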

Example 2:

The embodiment of the invention also provides a Nextflow-based automatic analysis system for circular ribonucleic acid, which comprises:

a quality control module 11, configured to perform quality control on the input original gene data and the reference genome sequence, remove abnormal fragments with a quality score lower than a first set value and a GC content in the genome sequence higher than a second set value, and generate a quality control report; wherein fastp and MultiQC software are used to realize the function of the quality control module;

an alignment module 12, configured to align the sequence fragments of the input sample to the reference genome sequence to confirm the specific position of each sequence on the genome; wherein STAR, BWA, Bowtie2 and Bowtie software are each used to realize the function of the alignment module independently;

a quantitative module 13, configured to, after the sequence fragments of the input sample are aligned to the reference genome sequence, confirm the sequence type and number of the circular ribonucleic acids, and annotate the name of each circular ribonucleic acid and the position of the chromosome where it is located through an annotation file; wherein STAR-based CIRCexplorer2, BWA-based CIRI, Bowtie2-based MapSplice, Segemehl and Bowtie2-based find_circ software are each used to realize the function of the quantitative module independently;

a merging and de-duplication module 14, configured to merge and de-duplicate the sequence types and numbers of the circular ribonucleic acids respectively obtained by STAR-based CIRCexplorer2, BWA-based CIRI, Bowtie2-based MapSplice, Segemehl and Bowtie2-based find_circ software;

a report generation module 15, configured to analyze and interpret the sequence types and numbers of the merged and de-duplicated circular ribonucleic acids, and generate a chart report for the original data;

wherein the quality control module 11, the alignment module 12, the quantitative module 13, the merging and de-duplication module 14 and the report generation module 15 are all implemented in Nextflow.

In a preferred embodiment, the sequence types and numbers of the circular ribonucleic acids obtained by STAR-based CIRCexplorer2, BWA-based CIRI, Bowtie2-based MapSplice, Segemehl and Bowtie2-based find_circ software are merged and de-duplicated as follows:

Rows are compared in sorted order. If the result of comparing the circular ribonucleic acids in row N-1 and row N-2 is on the same chromosome as the circular ribonucleic acid in row N, the difference between the base start position of that result and the base start position of the circular ribonucleic acid in row N is less than or equal to 5, and the difference between the base end position of that result and the base end position of the circular ribonucleic acid in row N is less than or equal to 5, then that result and the circular ribonucleic acid in row N are of the same type; wherein the sorted rows are ordered by the position and number of each type of circular ribonucleic acid, and the sort columns are, in order: chromosome-base start position-base end position-number of circular ribonucleic acids of this type.

It should be noted that the system provided by this embodiment corresponds to the Nextflow-based automatic analysis method for circular ribonucleic acid of Example 1, and is therefore not described in further detail here.

While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims

1. An automatic analysis method of circular ribonucleic acid based on Nextflow, which is characterized by comprising the following steps:

wherein, the steps S1-S5 are all in Nextflow;

if the result of comparing the circular ribonucleic acids in row N-1 and row N-2 of the sorted order is on the same chromosome as the circular ribonucleic acid in row N, the difference between the base start position of that result and the base start position of the circular ribonucleic acid in row N is less than or equal to 5, and the difference between the base end position of that result and the base end position of the circular ribonucleic acid in row N is less than or equal to 5, then that result and the circular ribonucleic acid in row N are of the same type.

2. The Nextflow-based automated analysis method for circular ribonucleic acids according to claim 1, further comprising, before step S1:

3. The Nextflow-based automated analysis method for circular ribonucleic acids according to claim 1 or 2, characterized in that the sequence types and numbers of the circular ribonucleic acids obtained by STAR-based CIRCexplorer2, BWA-based CIRI, Bowtie2-based MapSplice, Segemehl and Bowtie2-based find_circ software are merged and de-duplicated, specifically:

4. The Nextflow-based automated analysis method for circular ribonucleic acids according to claim 3, characterized in that the sorted rows are ordered by the position and number of each type of circular ribonucleic acid, and the sort columns are, in order: chromosome-base start position-base end position-number of circular ribonucleic acids of this type.

5. The Nextflow-based automated analysis method for cyclic ribonucleic acids according to claim 4, wherein the chart report includes information on the location analysis of cyclic ribonucleic acids, information on the length analysis of cyclic ribonucleic acids, information on the number analysis of cyclic ribonucleic acids, and information on the type analysis of cyclic ribonucleic acids.

6. The Nextflow-based automated analysis method for circular ribonucleic acids according to claim 1, further comprising:

running an instruction to automatically execute the operation of configuring the software environment according to the preset configuration steps;

automatically capturing the hardware configuration of the current server, and modifying the parameters of the software according to the hardware configuration of the server.

7. An automated Nextflow-based circular ribonucleic acid analysis system, comprising:

the quantitative module is used for confirming the sequence type and the number of the circular ribonucleic acids after the sequence fragments of the input sample are aligned to the reference genome sequence, and annotating the name of each circular ribonucleic acid and the position of the chromosome where it is located through an annotation file; wherein STAR-based CIRCexplorer2, BWA-based CIRI, Bowtie2-based MapSplice, Segemehl and Bowtie2-based find_circ software are each used to realize the function of the quantitative module independently;

wherein the quality control module, the alignment module, the quantitative module, the merging and de-duplication module and the report generation module are all implemented in Nextflow;

8. The Nextflow-based automated analysis system for circular ribonucleic acids according to claim 7, characterized in that the sequence types and numbers of the circular ribonucleic acids obtained by STAR-based CIRCexplorer2, BWA-based CIRI, Bowtie2-based MapSplice, Segemehl and Bowtie2-based find_circ software are merged and de-duplicated, specifically:

9. The Nextflow-based automated analysis system for circular ribonucleic acids according to claim 8, characterized in that the sorted rows are ordered by the position and number of each type of circular ribonucleic acid, and the sort columns are, in order: chromosome-base start position-base end position-number of circular ribonucleic acids of this type.