CN113870950B - Identification system and identification method for key sRNA of rice infected by Pyricularia oryzae - Google Patents
- Publication number
- CN113870950B
- Authority
- CN
- China
- Prior art keywords
- rice
- srna
- data
- network
- rice blast
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
Abstract
The invention discloses a system and a method for identifying key sRNAs of the rice blast fungus (Pyricularia oryzae) during infection of rice. The identification system comprises: an input unit for inputting multi-omics data of Pyricularia oryzae and rice; a microprocessor connected to the input unit; a storage unit connected to the microprocessor; and a processing unit connected to the microprocessor, which processes the data and produces an identification result. The processing unit comprises: a preprocessing unit that acquires the multi-omics data from the storage unit and preprocesses it; a network construction unit that takes the preprocessed multi-omics data and builds a multi-omics layered heterogeneous interaction network of the rice blast fungus and rice; a pathogenic factor mining unit that takes the multi-omics layered heterogeneous interaction network as input and outputs a pathogenic sRNA regulatory network; and a key sRNA identification unit that receives the pathogenic sRNA regulatory network and identifies the key sRNAs of Pyricularia oryzae to obtain the identification result.
Description
Technical Field
The invention belongs to the technical field of bioinformatics, and in particular relates to a system and a method for identifying key sRNAs of the rice blast fungus during infection of rice.
Background
Rice blast, caused by Pyricularia oryzae, is a serious threat to the production of rice and other cereal crops worldwide. Although Pyricularia oryzae is a model fungus for the study of plant fungal diseases, current studies show that long-term field control of the fungus, whether by applying fungicides or by selecting disease-resistant rice varieties, is unstable. To find a durable and effective control method for rice blast, the key is to reveal the cellular mechanism by which the rice blast fungus infects rice.
With the advent of large amounts of transcriptomic, genomic, proteomic and metabolomic data related to fungus-plant interactions, researchers have begun to mine key factors in biological processes from these omics data, using computational biology methods to assist and guide biological experiments that reveal biomolecular interactions. Current multi-omics data integration methods have shown some success in prognosis analysis and classification of complex diseases such as cancers, but their use in research on fungus-plant interaction mechanisms is still at an early stage.
Deep learning now gives very effective results on prediction problems in a wide variety of fields. By constructing a multi-omics layered heterogeneous network, a deep learning method can extract effective implicit features from large-scale data and use those features to build an effective prediction model. Deep learning has already achieved major breakthroughs in bioinformatics. Applying deep learning methods to identify key pathogenic sRNAs from multi-omics data is therefore a new area of research.
At present, methods for identifying the key sRNAs of the rice blast fungus have the following problems:
(1) Although omics data for the rice blast fungus and rice are increasingly abundant, integrating and analyzing the data resources of the individual omics platforms remains difficult;
(2) How pathogenic sRNAs take part in the fungus-plant interaction process, and how they are regulated in relation to the pathogenic markers of each omics layer, is not known.
Disclosure of Invention
The invention aims to provide a system for identifying key sRNAs of the rice blast fungus during infection of rice, which can accurately analyze multi-omics data of the rice blast fungus and rice and identify key sRNAs related to the pathogenicity of the rice blast fungus.
The invention also provides a method for identifying key sRNAs of the rice blast fungus during infection of rice, which combines a pathogenic factor association network to mine cross-species key sRNAs from a statistical perspective, so that the key pathogenic sRNAs of the rice blast fungus can be identified more accurately.
The technical scheme provided by the invention is as follows:
A system for identifying key sRNAs of the rice blast fungus during infection of rice comprises:
an input unit for inputting multi-omics data of Pyricularia oryzae and rice;
a microprocessor connected to the input unit;
a storage unit connected to the microprocessor; and
a processing unit connected to the microprocessor, for processing the data and obtaining an identification result;
wherein the processing unit comprises:
a preprocessing unit that acquires the multi-omics data from the storage unit and performs preprocessing;
a network construction unit that acquires the preprocessed multi-omics data and processes it to obtain a multi-omics layered heterogeneous interaction network of the rice blast fungus and rice;
a pathogenic factor mining unit that takes the multi-omics layered heterogeneous interaction network as input and outputs a pathogenic sRNA regulation module; and
a key sRNA identification unit that receives the pathogenic sRNA regulation module and identifies the key sRNAs of the rice blast fungus to obtain the identification result.
Preferably, the identification system further comprises:
a first interface unit connected to the input unit, comprising a first USB interface, a first JTAG debug interface, a first Ethernet interface and a first RS-232 interface; and
a second interface unit connected to the preprocessor, comprising a second USB interface, a second JTAG debug interface, a second Ethernet interface and a second RS-232 interface;
wherein the second JTAG debug interface is connected to the first JTAG debug interface, the second Ethernet interface is connected to the first Ethernet interface, and the second RS-232 interface is connected to the first RS-232 interface.
A method for identifying key sRNAs of the rice blast fungus during infection of rice comprises the following steps:
Step one, collecting multi-omics data of the rice blast fungus and rice;
wherein the multi-omics data comprise genomic data, transcriptomic data, proteomic data and metabolomic data;
Step two, constructing a multi-omics layered heterogeneous interaction network of the rice blast fungus and rice from the multi-omics data;
Step three, performing differential expression analysis of rice blast fungus sRNA and rice mRNA on the multi-omics layered heterogeneous interaction network to obtain a cross-species pathogenic sRNA regulation module for the process of rice infection by the rice blast fungus;
Step four, identifying key sRNAs related to the pathogenicity of the rice blast fungus through the pathogenic sRNA regulation module.
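The four steps above can be read as a simple data pipeline. The following is a minimal sketch only: the function names, the dictionary layout of the input, and the ranking rule (counting differentially expressed targets per sRNA) are all illustrative assumptions, not the patented algorithms.

```python
def build_network(omics):
    # Step two (toy stand-in): one edge per (sRNA, mRNA) pair listed
    # in the hypothetical "transcriptomics" layer of the input.
    return [(srna, mrna)
            for srna, targets in omics["transcriptomics"].items()
            for mrna in targets]

def mine_module(network, differential):
    # Step three (toy stand-in): keep edges whose sRNA and mRNA are both
    # flagged as differentially expressed.
    return [(s, m) for s, m in network if s in differential and m in differential]

def rank_key_srna(module):
    # Step four (assumed rule): rank sRNAs by number of retained targets,
    # breaking ties alphabetically.
    counts = {}
    for srna, _ in module:
        counts[srna] = counts.get(srna, 0) + 1
    return sorted(counts, key=lambda s: (-counts[s], s))

def identify_key_srna(omics, differential):
    # Step one (data collection) is assumed to have produced `omics`.
    network = build_network(omics)
    module = mine_module(network, differential)
    return rank_key_srna(module)

omics = {"transcriptomics": {"sRNA1": ["mA", "mB"], "sRNA2": ["mA"], "sRNA3": ["mC"]}}
differential = {"sRNA1", "sRNA2", "mA", "mB"}
result = identify_key_srna(omics, differential)
```

Each stage only consumes the output of the previous one, which mirrors the unit-to-unit data flow of the processing unit.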
Preferably, in step two, constructing the multi-omics layered heterogeneous interaction network of the rice blast fungus and rice comprises the following steps:
Step 1, in genomics, establishing a rice blast fungus gene-rice gene interaction regulation network; in transcriptomics, establishing a rice blast fungus sRNA-rice mRNA interaction network; in proteomics, establishing a rice blast fungus protein-rice proteome data interaction network; in metabolomics, establishing a rice-rice blast fungus metabolomic relationship network;
Step 2, using a clustering dimension-reduction algorithm based on multi-omics data to longitudinally integrate the rice blast fungus gene-rice gene interaction regulation network, the rice blast fungus sRNA-rice mRNA interaction network, the rice blast fungus protein-rice proteome data interaction network and the rice-rice blast fungus metabolomic relationship network, thereby establishing a primary rice blast fungus-rice layered heterogeneous interaction network;
Step 3, expanding each of the four networks integrated into the primary rice blast fungus-rice layered heterogeneous interaction network through upstream and downstream omics databases to obtain a corresponding regulation network as a demonstration network; and optimizing the primary rice blast fungus-rice layered heterogeneous interaction network and refining the rice blast fungus-rice interaction knowledge database to obtain the rice blast fungus-rice multi-omics layered heterogeneous interaction network.
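As a rough illustration of what longitudinal integration produces, the sketch below merges per-layer edge lists into one heterogeneous graph whose nodes are tagged with their omics layer. It shows only the data structure; the clustering dimension-reduction algorithm is not reproduced, and the layer names, node names and `cross_links` argument are hypothetical.

```python
from collections import defaultdict

def integrate_layers(layer_edges, cross_links):
    """Merge per-layer networks into one layered heterogeneous graph.

    layer_edges: {layer_name: [(node_u, node_v), ...]} within-layer edges.
    cross_links: [((layer_a, node_u), (layer_b, node_v)), ...] edges that
                 connect nodes in different layers (assumed to be given).
    Returns an undirected adjacency dict keyed by (layer, node).
    """
    graph = defaultdict(set)
    for layer, edges in layer_edges.items():
        for u, v in edges:
            a, b = (layer, u), (layer, v)
            graph[a].add(b)
            graph[b].add(a)
    for a, b in cross_links:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)

layers = {
    "genomics": [("geneA", "geneB")],
    "transcriptomics": [("sRNA1", "mRNA_A")],
}
cross = [(("genomics", "geneA"), ("transcriptomics", "mRNA_A"))]
network = integrate_layers(layers, cross)
```

Keying nodes by `(layer, name)` keeps identically named entities in different omics layers distinct while still allowing cross-layer edges.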
Preferably, in step 1, establishing the rice blast fungus protein-rice proteome data interaction network comprises:
obtaining differentially expressed proteins from the proteomic data, inputting them into the STRING database, and constructing a rice PPI network and a rice blast fungus PPI network;
completing node selection and network visualization using a graph-based support vector machine method, and assigning label attributes to the nodes; and
screening the positively labeled nodes in the network and visualizing the proteomic data network to obtain the rice blast fungus-rice proteomic data interaction network.
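The final screening step can be sketched as taking the subgraph induced by positively labeled nodes. In this sketch the labels are assumed to have been produced already (for example by the graph-based support vector machine, which is not implemented here); the protein names and the +1/-1 label encoding are hypothetical.

```python
def screen_positive_nodes(edges, labels):
    """Keep only PPI edges whose two endpoints both carry a positive label.

    edges:  iterable of (protein_u, protein_v) pairs.
    labels: {protein: +1 or -1}; unlabeled proteins are treated as negative.
    """
    positive = {node for node, label in labels.items() if label > 0}
    return sorted((u, v) for u, v in edges if u in positive and v in positive)

ppi_edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
node_labels = {"P1": 1, "P2": 1, "P3": -1, "P4": 1}
kept = screen_positive_nodes(ppi_edges, node_labels)
```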
Preferably, in step three, the cross-species pathogenic sRNA regulation module for the process of rice infection by the rice blast fungus is obtained by establishing a cross-species pathogenic sRNA regulation network for that process;
wherein establishing the cross-species pathogenic sRNA regulation network comprises the following steps:
Step a, constructing a primary pathogenic regulation network using the OP-Cluster algorithm;
Step b, acquiring the integrated data in the multi-omics layered heterogeneous network, screening the pathogenicity-related multi-omics data, inputting them into iNMF multi-omics joint analysis, and pruning the primary pathogenic regulation network to obtain the cross-species pathogenic sRNA regulation network for the process of rice infection by the rice blast fungus.
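To give a feel for joint factorization of several omics matrices, here is a minimal pure-Python sketch of a joint non-negative matrix factorization with one shared factor `W` and one coefficient matrix per view, fitted by multiplicative updates. This is a simplification and an assumption: full integrative NMF formulations typically add dataset-specific terms, which are omitted here, and the patent does not disclose its exact update rules. All names and the toy data are hypothetical.

```python
import random

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(m, num, den, eps):
    # Element-wise multiplicative update: m * num / (den + eps).
    return [[v * n / (d + eps) for v, n, d in zip(rm, rn, rd)]
            for rm, rn, rd in zip(m, num, den)]

def sq_error(x, w, h):
    wh = matmul(w, h)
    return sum((p - q) ** 2 for rx, rw in zip(x, wh) for p, q in zip(rx, rw))

def joint_nmf(views, rank, iters=100, seed=0, eps=1e-9):
    """Approximate every view X_k (same rows) as W @ H_k with a shared W >= 0."""
    rng = random.Random(seed)
    n = len(views[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    hs = [[[rng.random() + 0.1 for _ in x[0]] for _ in range(rank)] for x in views]
    errors = [sum(sq_error(x, w, h) for x, h in zip(views, hs))]
    for _ in range(iters):
        wt = transpose(w)
        wtw = matmul(wt, w)
        # Update each view-specific H_k with W held fixed.
        hs = [scale(h, matmul(wt, x), matmul(wtw, h), eps) for x, h in zip(views, hs)]
        # Update the shared W using contributions summed over all views.
        num = [[0.0] * rank for _ in range(n)]
        gram = [[0.0] * rank for _ in range(rank)]
        for x, h in zip(views, hs):
            ht = transpose(h)
            num = add(num, matmul(x, ht))
            gram = add(gram, matmul(h, ht))
        w = scale(w, num, matmul(w, gram), eps)
        errors.append(sum(sq_error(x, w, h) for x, h in zip(views, hs)))
    return w, hs, errors

view1 = [[1, 2, 0], [2, 4, 0], [0, 0, 3]]   # toy omics layer 1
view2 = [[1, 0], [2, 0], [0, 5]]            # toy omics layer 2, same rows
w, hs, errors = joint_nmf([view1, view2], rank=2)
```

How the resulting factors are used to prune the primary network is not specified in the text; one plausible use is to keep only edges between entities that load on the same factor, but that is a guess.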
The beneficial effects of the invention are as follows:
(1) The identification system provided by the invention can accurately analyze the multi-omics data of the rice blast fungus and rice and identify key sRNAs related to the pathogenicity of the rice blast fungus.
(2) The identification method provided by the invention combines a pathogenic factor association network to mine cross-species key sRNAs from a statistical perspective, so that the key pathogenic sRNAs of the rice blast fungus can be identified more accurately.
Drawings
FIG. 1 is a schematic structural diagram of the system of the present invention for identifying key sRNAs of the rice blast fungus during infection of rice.
FIG. 2 is a schematic diagram of the key sRNA identifier model of the present invention.
FIG. 3 is a schematic circuit diagram of the key sRNA identifier of the present invention.
FIG. 4 is a flow chart of the method of the present invention for identifying key sRNAs of the rice blast fungus during infection of rice.
FIG. 5 is a flow chart of the transverse construction of the multi-omics heterogeneous network according to the present invention.
FIG. 6 is a flow chart of the local bidirectional clustering algorithm OP-Cluster and the global multi-omics joint analysis algorithm iNMF according to the present invention.
Detailed Description
The present invention is described in further detail below with reference to the drawings to enable those skilled in the art to practice the invention by referring to the description.
As shown in FIG. 1, the invention provides a system, based on multi-omics data integration, for identifying key sRNAs of the rice blast fungus during infection of rice. The upper computer 01 consists of an input unit 0111, a first interface unit 012 (USB interface 0121, JTAG debug interface 0122, Ethernet interface 0123 and RS-232 serial port 0124) and a display unit 0131, and works in coordination with the ARM9 microprocessor 023. The input unit 0111 is connected to the first interface unit 012 and handles input of the rice blast fungus and rice multi-omics data; the first interface unit 012 handles communication with the ARM9 microprocessor 023; the display unit 0131 is connected to the first interface unit 012 and displays the identification result for the key sRNAs of the rice blast fungus.
The key sRNA identifier consists of an ARM9 microprocessor 023, a second interface unit 021, a storage unit 022 and a processing unit 024. The second interface unit 021 comprises a USB interface 0211, a JTAG debug interface 0212, an Ethernet interface 0213 and an RS-232 serial port 0214. The USB interface 0211 can be connected to a USB flash disk to transfer the result data obtained by identifying the key sRNAs of the rice blast fungus, thereby expanding the storage unit; the JTAG debug interface 0212 is connected to the JTAG interface 0122 of the upper computer through a JTAG emulator (programmer) conversion device and is used for online debugging of the program; the Ethernet interface 0213 is connected to the Ethernet interface 0123 of the upper computer 01, and the RS-232 serial port 0214 is connected to the RS-232 serial port 0124 of the upper computer 01, each providing communication between the ARM9 microprocessor 023 and the upper computer 01.
The storage unit 022 includes a memory unit 0221, a cache unit 0222 and an external storage unit 0223. The memory unit 0221 is connected to the cache unit 0222 and stores the multi-omics data of the rice blast fungus and rice; the cache unit 0222 is connected to the memory unit 0221, the preprocessing unit 0241 and the network construction unit 0242, and stores the intermediate data produced while identifying the key sRNAs of the rice blast fungus; the external storage unit 0223 is connected to both the key sRNA identification unit 0244 and the RS-232 serial port 0214, stores the result data of the key sRNA identification, and transmits the result data back through the RS-232 serial port to the display unit of the upper computer for output display.
The processing unit 024 comprises a preprocessing unit 0241, a network construction unit 0242, a pathogenic factor mining unit 0243 and a key sRNA identification unit 0244. The network construction unit encapsulates a clustering dimension-reduction algorithm (Multi-omics Cluster-DR); the pathogenic factor mining unit encapsulates a multi-omics data comprehensive clustering (MODSC) algorithm; the key sRNA identification unit identifies the key pathogenic sRNAs of the rice blast fungus in the pathogenic factor network and clusters the identified sRNAs using a multi-rank-sum test method (MRST) on samples from the two species, providing support for subsequent research on rice blast fungus-rice interaction.
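The details of the multi-rank-sum test are not given in the text. As background, the sketch below implements a single two-sample rank-sum (Mann-Whitney U) test with a normal approximation, which is the standard building block such a method would presumably repeat over many sRNAs. The function name and example values are hypothetical, and no tie or continuity correction is applied to the variance.

```python
import math

def rank_sum_test(a, b):
    """Two-sided rank-sum test; returns (U statistic of `a`, z score, p value)."""
    pooled = sorted((value, group) for group, sample in enumerate((a, b))
                    for value in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the run of tied values and give each the average rank.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n1, n2 = len(a), len(b)
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_a - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return u, z, p

# Hypothetical expression values of one sRNA before and after infection.
before = [1.0, 2.0, 3.0]
after = [4.0, 5.0, 6.0]
u, z, p = rank_sum_test(before, after)
```

For samples this small an exact test would normally be preferred over the normal approximation; the approximation is used here only to keep the sketch short.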
In this embodiment, a general-purpose PC is used as the upper computer 01. Through the RS-232 serial port it connects to the key sRNA identifier device, which is built on a Samsung microprocessor with a 32-bit ARM920T core, and the two work together to complete the task of identifying the key sRNAs of the rice blast fungus.
The input unit 0111 and the display unit 0131 of the upper computer 01 are both implemented by the input and output devices of the PC.
Communication between the upper computer 01 and the ARM9 microprocessor 023 is carried over the Ethernet interface 0123 of the upper computer 01 and the Ethernet interface 0213 of the ARM9 microprocessor 023; the Ethernet interface uses the DM9000, a fully integrated, low-cost single-chip Fast Ethernet controller.
In addition, the JTAG debug interface 0122 of the upper computer 01 and the JTAG debug interface 0212 of the ARM9 microprocessor 023 are connected through a JTAG emulator (programmer) conversion device, so that the upper computer 01 can analyze in real time and monitor the execution of programs on the ARM9 microprocessor 023.
The USB interfaces are USB 3.0 interfaces, so that the identification result data for the key sRNAs of the rice blast fungus can be transferred to a USB flash disk through the USB interface 0121 of the upper computer 01 or the USB interface 0211 of the ARM9 microprocessor 023, thereby expanding the storage unit.
For the storage unit 022 of the ARM9 microprocessor 023 system, a 32 MB Hynix HY57V561620CT SDRAM is selected as the memory unit 0221, a 64 MB Samsung K9F1208UOM NAND flash as the cache unit 0222, and a 100 GB hard disk as the expansion external storage unit 0223.
Each unit in the processing unit 024 of the ARM9 microprocessor 023 is a deep learning algorithm for identifying the key sRNAs of the rice blast fungus, packaged on the ARM9 microprocessor; the computation uses a 32-bit arithmetic unit.
As shown in FIG. 2, a schematic diagram of the key sRNA identifier model, the connection relationships are as follows: the data input ports Vin of the USB interface 0211, the JTAG debug interface 0212, the Ethernet interface 0213 and the RS-232 serial port 0214 are each connected to the data output pins Vout1[0..7] of the ARM9 microprocessor 023, and their GND pins are each connected to the GND of the ARM9 microprocessor 023.
The data input port Vin of the memory unit 0221 is connected to the data output pins Vout1[0..7] of the ARM9 microprocessor 023, its data output port Vout is connected to the data input port Vin of the cache unit 0222, and its GND is connected to the GND of the ARM9 microprocessor 023. The data input port Vin of the cache unit 0222 is connected to the data output port Vout of the memory unit 0221; its data output port Vout is connected to the data input port Vin of the preprocessing unit, the data input port Vin of the network construction unit and the data input pins Vin1[0..7] of the ARM9 microprocessor 023; and its GND is connected to the GND of the ARM9 microprocessor 023. The data input port Vin of the external storage unit 0223 is connected to the data output port Vout of the key sRNA identification unit 0244, its data output port Vout is connected to the data input pins Vin1[0..7] of the ARM9 microprocessor 023, and its GND is connected to the GND of the ARM9 microprocessor 023.
The data input port Vin of the preprocessing unit 0241 is connected to the data output port Vout of the cache unit 0222; its data output port Vout is connected to the data input pins Vin1[0..7] of the ARM9 microprocessor 023 and to the data input port Vin of the network construction unit 0242; and its GND is connected to the GND of the ARM9 microprocessor 023. The data input port Vin of the network construction unit 0242 is connected to the data output port Vout of the preprocessing unit 0241 and to the data output port Vout of the cache unit, its data output port Vout is connected to the data input port Vin of the pathogenic factor mining unit 0243, and its GND is connected to the GND of the ARM9 microprocessor 023. The data input port Vin of the pathogenic factor mining unit 0243 is connected to the data output port Vout of the network construction unit 0242, its data output port Vout is connected to the data input port Vin of the key sRNA identification unit, and its GND is connected to the GND of the ARM9 microprocessor 023. The data input port Vin of the key sRNA identification unit 0244 is connected to the data output port Vout of the pathogenic factor mining unit 0243, its data output port Vout is connected to the data input port Vin of the external storage unit 0223, and its GND is connected to the GND of the ARM9 microprocessor 023.
As shown in FIG. 3, the circuit diagram connection relationship of the rice blast infection rice key sRNA identifier is as follows:
1) The F line is an address bus and is respectively connected with pins A0 to A17 of the ARM9 microprocessor, the external storage unit, the cache unit and the memory unit and is responsible for transmitting address information;
2) The A line is a data bus and is respectively connected with the D0-D15 pins of the ARM9 microprocessor, the preprocessing unit, the network construction unit, the pathogenic factor mining unit, the key sRNA identification unit, the internal storage unit, the cache unit and the external storage unit, and is responsible for transmitting various data information;
3) The B line is a control bus and is respectively connected with OE, WR, CS3, CS1, HBE and LBE pins of the ARM9 microprocessor, the external storage unit, the cache unit, the internal storage unit, the preprocessing unit, the network construction unit, the pathogenic factor mining unit and the key sRNA identification unit, and is responsible for transmitting control signals and time sequence signals;
4) The C line is a power supply positive line and is respectively connected with the ARM9 microprocessor, the external storage unit, the cache unit, the internal storage unit, the preprocessing unit, the network construction unit, the pathogenic factor mining unit, the key sRNA identification unit, the USB interface, the JTAG interface, the TRS232 interface and the DS Ethernet interface;
5) The D line is a power supply negative electrode line and is respectively connected with the ARM9 microprocessor, the external storage unit, the cache unit, the internal storage unit, the preprocessing unit, the network construction unit, the pathogenic factor mining unit, the key sRNA identification unit, the USB interface, the JTAG interface, the TRS232 interface and the DS Ethernet interface;
6) The E line is a general I/O line and is respectively connected with the ARM9 microprocessor, the USB interface, the JTAG interface and the TRS232 interface to be responsible for data transmission of input and output equipment;
7) The G line is a network interface line, is respectively connected with the ARM9 microprocessor and the DS Ethernet interface and is responsible for network transmission.
As shown in FIG. 4, the invention also provides a method, based on multi-omics data integration, for identifying the key sRNAs involved in the infection of rice by Pyricularia oryzae (rice blast fungus), which comprises the following steps:
(1) The multi-omics data of the rice blast fungus and rice are entered through input unit 0111 of the host computer, transferred over the RS-232 serial port to memory unit 0221 of the multi-omics-integration-based identifier for key sRNA of rice infected by rice blast fungus, and then read into cache unit 0222;
(2) The preprocessing unit 0241 reads the multi-omics data of the rice blast fungus and rice from the cache unit 0222, preprocesses them, and outputs the result to the network construction unit 0242;
(3) The network construction unit 0242 reads the preprocessed multi-omics data from the preprocessing unit 0241 and processes them to obtain the multi-omics layered heterogeneous interaction network;
Operations (2)-(3) are repeated until all the multi-omics data have been processed.
The process of establishing the multi-omics layered heterogeneous interaction network comprises two stages: transversely establishing a relation network within each omics layer, and longitudinally constructing the multi-omics layered heterogeneous network.
As shown in FIG. 5, the transverse networks of the multi-omics heterogeneous network are constructed as follows:
First, the omics data of each layer, before and after infection of rice by Pyricularia oryzae, are collected from multiple databases such as NCBI, miRBase, GenBank and eRice, and stored in a local database.
Then a transverse relation network is established within each omics layer:
① In the genome layer, the gene expression, copy number variation and DNA methylation data before and after infection are obtained from the genomic data in the local database and represented by the adjacency matrices X, D and C respectively. A gene network construction method based on relational data (Gene-relational Data Network) is used to visualize the genomic network and establish the gene-gene interaction regulation network.
② In the transcriptome layer, differentially expressed sRNAs are obtained from the transcriptomic data in the local database. The target genes of these sRNAs are predicted with the starBase database and the core targets are selected, finally yielding the sRNA-mRNA interaction network.
③ In the proteome layer, differentially expressed proteins are obtained from the proteomic data in the local database and submitted to the STRING database, and a network construction tool is used to build the rice PPI network and the rice blast fungus PPI network. Node selection and network visualization are then completed with a graph-based support vector machine (SVM) method.
The positive-label nodes in the network are screened and the proteomic data network is visualized, establishing the rice blast fungus-rice proteomic data interaction network.
④ In the metabolome layer, differentially expressed metabolomic data are obtained from the metabolomic data in the local database and reduced in dimension by PCA; KEGG metabolic pathway analysis is then performed separately on the rice and rice blast fungus metabolomic data to establish the metabolomic relation network.
Next, the multi-omics layered heterogeneous network is constructed longitudinally: a clustering dimension-reduction algorithm based on multi-omics data (Multi-omics Cluster-DR) is applied to the per-layer networks to establish the multi-omics rice blast fungus-rice layered heterogeneous interaction network. First, the PCA dimension-reduction algorithm is applied separately to the gene-gene interaction regulation network, the sRNA-mRNA interaction network, the rice blast fungus-rice proteomic data interaction network and the metabolomic relation network, and the results are recorded as the DR method set. The data are then decomposed with the CP (CANDECOMP/PARAFAC) tensor model and expressed as factor matrices. Outliers in the sample matrix are deleted with an outlier-detection rule such as the median absolute deviation (MAD), the top 5% of DR models are selected, and a clustering algorithm is applied to the DR method set. Quality is then evaluated with the local continuity meta-criterion (LCMC), and the result with the highest score is selected as output. Finally, the omics relation networks are visualized to establish the rice blast fungus-rice multi-omics layered heterogeneous network. The specific process is as follows:
S1, within the DR method set, using the CP model to decompose the sample × sample × DR method tensor into three factor matrices of sizes n × p, n × p and m × p, where the first two matrices represent the decomposition of the sample modes and the third represents the DR method mode;
S2, judging and deleting outliers using the rejection criterion of the median absolute deviation (MAD) method, the rejection criterion being as follows:
S3, selecting the top 5% of DR method models after outlier removal and applying the k-clustering method to the DR method set to obtain k-clustering-based DR method groups;
S4, evaluating quality with the local continuity meta-criterion LCMC (QL) and selecting the highest value as output. The omics relation structures are then visualized to obtain the rice blast fungus-rice multi-omics layered heterogeneous interaction network.
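The MAD-based outlier rejection in S2 can be illustrated with a minimal sketch. The cutoff of 3 scaled MADs and the consistency constant 1.4826 are common conventions and are assumptions here, not values taken from this description; the function name `mad_outlier_mask` and the sample scores are hypothetical.

```python
from statistics import median

def mad_outlier_mask(values, threshold=3.0, scale=1.4826):
    """Flag values whose distance from the median exceeds `threshold` scaled MADs.

    `threshold` and `scale` are conventional defaults (assumptions), not
    values specified by the source text.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate case: more than half the values are identical; flag nothing.
        return [False] * len(values)
    return [abs(v - med) / (scale * mad) > threshold for v in values]

# Hypothetical per-model scores; one value is clearly off.
scores = [0.81, 0.79, 0.83, 0.80, 0.12, 0.82]
mask = mad_outlier_mask(scores)
kept = [s for s, is_outlier in zip(scores, mask) if not is_outlier]
print(kept)
```

Using the median rather than the mean keeps the criterion itself from being distorted by the very outliers it is meant to detect.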
(4) In the pathogenic factor mining unit, the multi-omics layered heterogeneous interaction network is used as the model input, and the Pyricularia oryzae sRNA nodes in the network are hierarchically clustered to obtain a hierarchical clustering tree. If certain sRNAs consistently show similar regulatory effects in a physiological process or across different tissues, it is reasonable to regard them as functionally related, and they can be defined as a module. Different branches of the clustering tree represent different sRNA modules: the sRNAs are classified by regulation pattern, and sRNAs with similar patterns are assigned to the same module. The output is the pathogenic sRNA regulation modules, which are then transmitted to the key sRNA identification unit;
(5) The key sRNA identification unit receives the pathogenic sRNA regulation modules and identifies the key sRNAs of the rice blast fungus, yielding the final identification result;
The pathogenic factor mining unit reads the multi-omics layered heterogeneous interaction network data in sequence, uses them as the model input, and transmits all output pathogenic sRNA regulation modules to the key sRNA identification unit. Once all data have been processed, the identification results are stored in the external storage unit and sent through the RS-232 serial port to the display unit for output and display.
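The module definition in step (4), where sRNAs with similar regulation patterns fall on the same branch of a clustering tree, can be sketched as follows. This is an illustration only: the choice of average linkage, Euclidean distance and a fixed cut distance are assumptions, and the profile values and the name `cluster_modules` are hypothetical.

```python
from math import sqrt

def euclidean(a, b):
    """Euclidean distance between two equal-length regulation profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_modules(profiles, max_distance):
    """Average-linkage agglomerative clustering of sRNA regulation profiles.

    Merging stops once the two closest clusters are farther apart than
    `max_distance`; each remaining cluster is treated as one module.
    """
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        dists = [euclidean(profiles[a], profiles[b]) for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        dist, i, j = min(
            ((linkage(clusters[i], clusters[j]), i, j)
             for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda t: t[0],
        )
        if dist > max_distance:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical regulation profiles (one value per condition or tissue).
profiles = {
    "sRNA_1": [1.0, 2.0, 3.0],
    "sRNA_2": [1.1, 2.1, 2.9],
    "sRNA_3": [5.0, 0.5, 0.2],
    "sRNA_4": [5.2, 0.4, 0.1],
}
modules = cluster_modules(profiles, max_distance=1.0)
print(modules)
```

Cutting the tree at a distance threshold is one of several ways to turn branches into modules; a fixed number of modules would be an equally plausible choice.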
In this embodiment, a multi-omics data comprehensive clustering (MODSC) algorithm is used to establish the cross-species pathogenic sRNA regulation network of the rice blast fungus infecting rice, reveal the correlations among pathogenicity-related omics factors, and mine the cross-species pathogenic sRNA regulation modules of the infection process; the key sRNAs related to rice blast fungus pathogenicity are then identified from the pathogenic sRNA regulation modules. As shown in FIG. 6, the MODSC algorithm combines a local biclustering algorithm, OP-Cluster, with a global multi-omics joint analysis algorithm, iNMF.
First, the primary pathogenic regulation network is constructed: the rice blast fungus sRNA-rice mRNA data matrix is obtained from the constructed multi-omics layered heterogeneous network, and OP-Cluster biclustering analysis is performed repeatedly to mine pathogenic factors, completing the primary pathogenic regulation network. OP-Cluster is a biclustering algorithm based on order comparison; its goal is to find biclusters whose members show a consistent relative trend of change. To preprocess the data, all entry values in each row are first sorted in non-decreasing order. Each sorted row is then organized into a sequence of groups according to similarity. For example, the sequence ba(cd) comprises the three groups b, a and cd; the elements within one group are unordered, and the column features in (cd) satisfy $|c - d| \le 0.1 \times \min(c, d)$.
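The row preprocessing described above can be sketched in a few lines. This is a hedged illustration, not the OP-Cluster implementation: it assumes positive entry values, a non-empty row, and that each column is compared with its immediate predecessor in sorted order; the function names and the sample row are hypothetical.

```python
def group_sequence(row, tolerance=0.1):
    """Sort a row's columns by value (non-decreasing) and merge neighbours
    whose values differ by at most tolerance * min(value pair) into one group."""
    ordered = sorted(row.items(), key=lambda item: item[1])
    groups = [[ordered[0]]]
    for name, value in ordered[1:]:
        _, prev_value = groups[-1][-1]
        if abs(value - prev_value) <= tolerance * min(value, prev_value):
            groups[-1].append((name, value))   # similar enough: same group
        else:
            groups.append([(name, value)])     # start a new group
    # Within a group the order carries no meaning, so names are sorted.
    return [sorted(name for name, _ in group) for group in groups]

def format_sequence(groups):
    """Render groups in the ba(cd) notation used in the text."""
    return "".join(g[0] if len(g) == 1 else "(" + "".join(g) + ")" for g in groups)

# Hypothetical row: columns c and d are within 10% of each other.
row = {"a": 5.0, "b": 1.0, "c": 10.0, "d": 10.5}
sequence = group_sequence(row)
print(format_sequence(sequence))
```

Here b and a stay in separate groups because they differ by far more than 10% of the smaller value, while c and d are merged.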
The OP-Cluster algorithm uses a compact tree structure, the OPC-Tree, to store the key information needed for mining OP-clusters. It discovers frequent subsequences and, at the same time, the rows associated with them; sequences that share the same prefix are collected and recorded at the same location. As a result, any further operation along a shared prefix is performed only once for all rows sharing it, pruning techniques are easy to apply to the OPC-Tree, and all biclustering results can be obtained with a single depth-first (DFS) traversal of the OPC-Tree.
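The prefix-sharing idea can be illustrated with a plain prefix tree. This is a deliberately simplified sketch: it only shares and counts prefixes, whereas the full OPC-Tree mining also recovers frequent subsequences that are not prefixes. The class `PrefixNode`, both function names and the sample rows are hypothetical.

```python
class PrefixNode:
    """One node of a prefix tree; `rows` holds the ids of rows passing through it."""
    def __init__(self):
        self.children = {}
        self.rows = set()

def build_prefix_tree(sequences):
    """Insert every row's ordered sequence; rows with a common prefix share nodes."""
    root = PrefixNode()
    for row_id, seq in sequences.items():
        node = root
        for symbol in seq:
            node = node.children.setdefault(symbol, PrefixNode())
            node.rows.add(row_id)
    return root

def frequent_prefixes(root, min_rows):
    """Single depth-first traversal; branches supported by fewer than
    `min_rows` rows are pruned and never visited."""
    found = []
    stack = [((), root)]
    while stack:
        prefix, node = stack.pop()
        for symbol, child in sorted(node.children.items(), reverse=True):
            if len(child.rows) >= min_rows:
                stack.append((prefix + (symbol,), child))
        if prefix:
            found.append((prefix, sorted(node.rows)))
    return found

# Hypothetical ordered column sequences for three rows.
sequences = {
    "r1": ("b", "a", "c"),
    "r2": ("b", "a", "d"),
    "r3": ("b", "c", "a"),
}
tree = build_prefix_tree(sequences)
result = frequent_prefixes(tree, min_rows=2)
print(result)
```

Because r1 and r2 share the prefix (b, a), that path is stored and examined once for both rows.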
Next, the complete pathogenic regulation network is constructed. The internal relations among biomarkers across the omics layers are obtained from the multi-omics layered heterogeneous network, with the structural relations among nodes fully preserved. Pathogenicity-related multi-omics data with a significance of difference (p-value) below 0.05 are selected on the basis of rice mRNA differential expression and rice blast fungus sRNA changes and are input into the iNMF multi-omics joint analysis, which trims the primary pathogenic regulation network built by OP-Cluster and thereby completes the pathogenic regulation network. The iNMF multi-omics joint analysis proceeds as follows. iNMF jointly fits K Gaussian latent variable models, capturing the shared and the data-specific structure in K data sets $X_k$ of dimension $P_k \times N$, where N is the number of samples. First, the latent variables are estimated with the orthogonality constraint replaced by a non-negativity constraint. Second, the sparse matrix $H_k$ is shared between the data-specific basis matrix $V_k$ and the common basis matrix $W$. Finally, k-means clustering on the posterior expectation of the latent factors, $E(W \mid X)$, determines the cluster assignment. Here the coefficient matrix and the basis matrix correspond to the loading matrix and the latent variable matrix. The iNMF cluster analysis thus further refines the pathogenic factor regulation network. iNMF is posed as an optimization problem with a Euclidean loss function, and the model formula is as follows:
where $W \ge 0$, $H_k \ge 0$, $k = 1, \dots, K$.
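For orientation, the integrative NMF objective is commonly written in the literature in the form below. Whether the formula referred to above has exactly this form is an assumption; the penalty weight $\lambda$, the non-negativity of $V_k$, and the use of the sample-by-feature layout $X_k^{\top}$ (the transpose of the $P_k \times N$ layout) are taken from that literature and not from this description.

```latex
\min_{W \ge 0,\; V_k \ge 0,\; H_k \ge 0}\;
\sum_{k=1}^{K} \left\lVert X_k^{\top} - (W + V_k)\, H_k \right\rVert_F^{2}
\;+\; \lambda \sum_{k=1}^{K} \left\lVert V_k H_k \right\rVert_F^{2}
```

In this form the first term is the Euclidean (Frobenius) reconstruction loss, $W$ carries the structure shared by all K data sets, and the second term keeps the data-specific parts $V_k$ small so that they absorb only what $W$ cannot explain.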
The key sRNAs are identified from the pathogenic factor network and the MODSC method as follows:
The top 5% of sRNA regulation modules most relevant to rice blast expression are selected from the pathogenic factor network, and within the selected modules the sRNA nodes with a differential expression p-value below 0.05 are taken as the key sRNAs related to rice blast pathogenicity.
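The two-stage selection can be sketched as below. How module relevance is scored is not specified here, so the relevance values are hypothetical inputs; rounding the 5% down while always keeping at least one module is likewise an assumption, as are all names in the sketch.

```python
def select_key_srnas(module_relevance, module_members, node_pvalues,
                     top_fraction=0.05, alpha=0.05):
    """Keep the top `top_fraction` of modules by relevance, then keep the
    sRNA nodes in those modules whose differential-expression p-value < alpha."""
    # Assumption: round down, but always keep at least one module.
    n_top = max(1, int(top_fraction * len(module_relevance)))
    ranked = sorted(module_relevance, key=module_relevance.get, reverse=True)
    top_modules = ranked[:n_top]
    key_srnas = [
        srna
        for module in top_modules
        for srna in module_members[module]
        if node_pvalues[srna] < alpha
    ]
    return top_modules, key_srnas

# Hypothetical data: 20 modules with increasing relevance scores.
module_relevance = {f"M{i}": i / 20 for i in range(1, 21)}
module_members = {"M20": ["sRNA_a", "sRNA_b", "sRNA_c"]}
node_pvalues = {"sRNA_a": 0.01, "sRNA_b": 0.20, "sRNA_c": 0.049}

top_modules, key_srnas = select_key_srnas(module_relevance, module_members, node_pvalues)
print(top_modules, key_srnas)
```

With 20 modules, the top 5% is a single module, and only its nodes passing the p-value filter are reported.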
If further study of the identified key sRNAs is required, the selected key pathogenic sRNAs can be clustered with the multivariate rank sum test (MRST) for two species samples.
The multivariate rank sum test for two species samples is used to analyze differences in the high-throughput Pyricularia oryzae sRNA data. First, only the sRNAs common to both sets of expression data are selected so that the dimensions agree, and the data within each set are then normalized so that the scales agree, giving the normalized observations $X_1, X_2, \dots, X_N$, where $X_i$ is the observed value of the i-th sample. For $X_i$, the spatial rank is calculated as:
$R_i = m_j[S(A_x(X_i - X_j))], \quad i, j = 1, 2, \dots, N$
where $m(z)$ denotes the average of $z$ over all samples, $m_j(z)$ denotes the j-th dimension of that average, $S(z)$ denotes the sign of $z$ (+ or -), and $A_x$ is chosen to satisfy a condition in which $\lambda$ is the adjustment coefficient.
The two-sample spatial rank statistic is then selected, where $s = 1, 2$, $n_s$ is the weight, and the statistic is built from the average vector of the spatial centered ranks.
The key sRNAs of the rice blast fungus selected from the pathogenic factor network are evaluated with this statistic, and key sRNAs with similar functions are clustered together.
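The rank formula $R_i = m_j[S(A_x(X_i - X_j))]$ can be illustrated under explicit simplifying assumptions: $A_x$ is taken as the identity matrix, $S$ is applied component by component, and $m_j$ is read as the average over the index j. None of these choices is confirmed by the text, and the function names and sample values are hypothetical.

```python
def sign(x):
    """Return +1, 0 or -1 according to the sign of x."""
    return (x > 0) - (x < 0)

def componentwise_ranks(samples):
    """For each sample X_i, average over j the componentwise sign of (X_i - X_j).

    Assumes A_x is the identity, so no transformation is applied first.
    """
    n = len(samples)
    dim = len(samples[0])
    ranks = []
    for xi in samples:
        totals = [0] * dim
        for xj in samples:
            for d in range(dim):
                totals[d] += sign(xi[d] - xj[d])
        ranks.append([t / n for t in totals])
    return ranks

# Hypothetical normalized observations for three samples in two dimensions.
samples = [[1.0, 5.0], [2.0, 3.0], [3.0, 4.0]]
ranks = componentwise_ranks(samples)
print(ranks)
```

Ranks computed this way are centered: in each dimension they sum to zero across samples, which is the property a rank sum comparison between two groups relies on.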
Although embodiments of the present invention have been disclosed above, the invention is not limited to the applications set out in the description and the embodiments; it can be applied in any field to which it is suited, and further modifications will be readily apparent to those skilled in the art. Accordingly, the invention is not limited to the specific details and illustrations shown and described herein, provided that the general concept defined by the claims and their equivalents is not departed from.
Claims (5)
1. A system for identifying key sRNA of rice infected by rice blast fungus, comprising:
an input unit for inputting multi-omics data of the rice blast fungus and rice;
wherein the multi-omics data comprise: genomic data, transcriptomic data, proteomic data and metabolomic data;
a microprocessor connected to the input unit;
a storage unit connected to the microprocessor;
a processing unit connected to the microprocessor and configured to process the data and obtain an identification result;
wherein the processing unit comprises:
a preprocessing unit that acquires the multi-omics data from the storage unit and performs preprocessing;
a network construction unit that acquires the preprocessed multi-omics data and processes them to obtain the multi-omics layered heterogeneous interaction network of the rice blast fungus and rice;
wherein the multi-omics layered heterogeneous interaction network of the rice blast fungus and rice is constructed by the following steps:
Step 1, in genomics, establishing a rice blast fungus gene-rice gene interaction regulation network; in transcriptomics, establishing a rice blast fungus sRNA-rice mRNA interaction network; in proteomics, establishing a rice blast fungus protein-rice proteome data interaction network; in metabolomics, establishing a rice-rice blast fungus metabolomics relation network;
Step 2, using a clustering dimension-reduction algorithm based on multi-omics data to longitudinally integrate the rice blast fungus gene-rice gene interaction regulation network, the rice blast fungus sRNA-rice mRNA interaction network, the rice blast fungus protein-rice proteome data interaction network and the rice-rice blast fungus metabolomics relation network, thereby establishing a primary rice blast fungus-rice layered heterogeneous interaction network;
Step 3, expanding the established primary rice blast fungus-rice layered heterogeneous interaction network through upstream and downstream omics databases respectively to obtain corresponding regulation networks as demonstration networks; optimizing the primary rice blast fungus-rice layered heterogeneous interaction network and refining the rice blast fungus-rice interaction knowledge database to obtain the rice blast fungus-rice multi-omics layered heterogeneous interaction network;
a pathogenic factor mining unit that takes the multi-omics layered heterogeneous interaction network as input and outputs pathogenic sRNA regulation modules; and
a key sRNA identification unit that receives the pathogenic sRNA regulation modules and identifies the key sRNAs of the rice blast fungus to obtain the identification result.
2. The system for identifying key sRNA of rice infected by rice blast fungus according to claim 1, further comprising:
a first interface unit connected to the input unit and comprising a first USB interface, a first JTAG debugging interface, a first Ethernet interface and a first RS-232 interface; and
a second interface unit connected to the preprocessing unit and comprising a second USB interface, a second JTAG debugging interface, a second Ethernet interface and a second RS-232 interface;
wherein the second JTAG debugging interface is connected to the first JTAG debugging interface, the second Ethernet interface is connected to the first Ethernet interface, and the second RS-232 interface is connected to the first RS-232 interface.
3. A method for identifying key sRNA of rice infected by rice blast fungus, characterized in that it uses the system for identifying key sRNA of rice infected by rice blast fungus according to claim 1 or 2 and comprises the following steps:
Step one, collecting multi-omics data of the rice blast fungus and rice;
wherein the multi-omics data comprise: genomic data, transcriptomic data, proteomic data and metabolomic data;
Step two, constructing the multi-omics layered heterogeneous interaction network of the rice blast fungus and rice from the multi-omics data;
Step three, performing differential expression analysis of rice blast fungus sRNA and rice mRNA on the multi-omics layered heterogeneous interaction network of the rice blast fungus and rice to obtain the cross-species pathogenic sRNA regulation modules of the rice blast fungus infecting rice;
Step four, identifying the key sRNAs related to rice blast fungus pathogenicity from the pathogenic sRNA regulation modules.
4. The method for identifying key sRNA of rice infected by rice blast fungus according to claim 3, wherein in said step 1, establishing the rice blast fungus protein-rice proteome data interaction network comprises:
obtaining differentially expressed proteins from the proteomic data, submitting the differentially expressed proteins to the STRING database, and building the rice PPI network and the rice blast fungus PPI network;
completing node selection and network visualization with a graph-based support vector machine method and assigning label attributes to the nodes;
screening the positive-label nodes in the network and visualizing the proteomic data network to obtain the rice blast fungus-rice proteomic data interaction network.
5. The method for identifying key sRNA of rice infected by rice blast fungus according to claim 4, wherein in step three, the cross-species pathogenic sRNA regulation modules of the rice blast fungus infecting rice are obtained by establishing a cross-species pathogenic sRNA regulation network of the rice blast fungus infecting rice;
wherein establishing the cross-species pathogenic sRNA regulation network of the rice blast fungus infecting rice comprises the following steps:
step a, constructing a primary pathogenic regulation network with the OP-Cluster algorithm;
step b, acquiring the integrated data in the multi-omics layered heterogeneous network, screening the pathogenicity-related multi-omics data, inputting them into the iNMF multi-omics joint analysis, and trimming the primary pathogenic regulation network to obtain the cross-species pathogenic sRNA regulation network of the rice blast fungus infecting rice.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111047153.8A CN113870950B (en) | 2021-09-07 | 2021-09-07 | Identification system and identification method for key sRNA of rice infected by Pyricularia oryzae |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113870950A (en) | 2021-12-31 |
CN113870950B (en) | 2024-05-17 |
Family
ID=78994785
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111047153.8A Active CN113870950B (en) | 2021-09-07 | 2021-09-07 | Identification system and identification method for key sRNA of rice infected by Pyricularia oryzae |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113870950B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN113870950A (en) | 2021-12-31 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |