CN113870950B

CN113870950B - Identification system and identification method for key sRNA of rice infected by Pyricularia oryzae

Info

Publication number: CN113870950B
Application number: CN202111047153.8A
Authority: CN
Inventors: 张�浩; 赵恒毅; 刘元宁; 赵天横; 赵恩爽; 张天悦; 袁帅
Original assignee: Jilin University
Current assignee: Jilin University
Priority date: 2021-09-07
Filing date: 2021-09-07
Publication date: 2024-05-17
Anticipated expiration: 2041-09-07
Also published as: CN113870950A

Abstract

The invention discloses a system and a method for identifying key sRNAs of rice infected by the rice blast fungus (Pyricularia oryzae). The identification system comprises: an input unit for inputting multi-omics data of Pyricularia oryzae and rice; a microprocessor connected to the input unit; a storage unit connected to the microprocessor; and a processing unit connected to the microprocessor, which processes the data and obtains a recognition result. The processing unit comprises: a preprocessing unit that acquires the multi-omics data from the storage unit and performs preprocessing; a network construction unit that acquires the preprocessed multi-omics data and processes it to obtain a multi-omics layered heterogeneous interaction network of the rice blast fungus and rice; a pathogenic factor mining unit that takes the multi-omics layered heterogeneous interaction network as input and outputs a pathogenic sRNA regulatory network; and a key sRNA recognition unit that receives the pathogenic sRNA regulatory network and identifies the key sRNAs of Pyricularia oryzae to obtain a recognition result.

Description

Identification system and identification method for key sRNA of rice infected by Pyricularia oryzae

Technical Field

The invention belongs to the technical field of bioinformatics, and particularly relates to a system and a method for identifying key sRNAs of rice infected by the rice blast fungus.

Background

Rice blast, caused by Pyricularia oryzae, poses a serious threat to the production of rice and other cereal crops worldwide. Although Pyricularia oryzae is a model fungus for the study of plant fungal diseases, current studies show that long-term field control of the fungus by applying fungicides or selecting disease-resistant rice varieties is unreliable. To find a durable and effective control method for rice blast, the key is to reveal the cellular mechanism by which the rice blast fungus infects rice.

With the emergence of large amounts of transcriptomic, genomic, proteomic and metabolomic data on fungus-plant interactions, researchers have begun to mine key factors in biological processes from omics data, using computational biology methods to assist and guide biological experiments in revealing biomolecular interactions. Current multi-omics data integration methods have shown some success in prognosis analysis and classification of complex diseases such as cancer, but their application to fungus-plant interaction mechanisms is still at an early stage.

Deep learning now achieves very effective results on prediction problems in a wide variety of fields. By constructing a multi-omics hierarchical heterogeneous network, a deep learning method can extract effective, implicit features from large-scale data and use them to build an effective prediction model. Deep learning has already achieved major breakthroughs in bioinformatics. Applying deep learning methods to identify key pathogenic sRNAs from multi-omics data is therefore a new area of research.

At present, methods for identifying key sRNAs of the rice blast fungus have the following problems:

(1) Although omics data for the rice blast fungus and rice are increasingly abundant, integrating and analyzing the data platform resources of the individual omics remains difficult;

(2) How pathogenic sRNAs participate in the fungus-plant interaction process, and their regulatory relationships with the pathogenic markers of each omics layer, are not known.

Disclosure of Invention

The invention aims to provide a system for identifying key sRNAs of rice infected by the rice blast fungus, which can accurately analyze the multi-omics data of the rice blast fungus and rice and identify the key sRNAs related to rice blast pathogenicity.

The invention also provides a method for identifying key sRNAs of rice infected by the rice blast fungus, which mines cross-species key sRNAs from a statistical perspective in combination with a pathogenic factor association network, so that the key pathogenic sRNAs of the rice blast fungus can be identified more accurately.

The technical scheme provided by the invention is as follows:

A system for identifying key sRNAs of rice infected by the rice blast fungus comprises:

an input unit for inputting multi-omics data of Pyricularia oryzae and rice;

a microprocessor connected to the input unit;

a storage unit connected to the microprocessor; and

a processing unit connected to the microprocessor, which processes the data and obtains a recognition result;

wherein the processing unit comprises:

a preprocessing unit that acquires the multi-omics data from the storage unit and performs preprocessing;

a network construction unit that acquires the preprocessed multi-omics data and processes it to obtain a multi-omics layered heterogeneous interaction network of the rice blast fungus and rice;

a pathogenic factor mining unit that takes the multi-omics layered heterogeneous interaction network as input and outputs a pathogenic sRNA regulation module; and

a key sRNA recognition unit that receives the pathogenic sRNA regulation module and identifies the key sRNAs of the rice blast fungus to obtain a recognition result.

Preferably, the system for identifying key sRNAs of rice infected by the rice blast fungus further comprises:

a first interface unit connected to the input unit and comprising a first USB interface, a first JTAG debugging interface, a first Ethernet interface and a first RS-232 interface; and

a second interface unit connected to the preprocessor and comprising a second USB interface, a second JTAG debugging interface, a second Ethernet interface and a second RS-232 interface;

wherein the second JTAG debugging interface is connected to the first JTAG debugging interface, the second Ethernet interface is connected to the first Ethernet interface, and the second RS-232 interface is connected to the first RS-232 interface.

A method for identifying key sRNAs of rice infected by the rice blast fungus comprises the following steps:

Step one, collecting multi-omics data of the rice blast fungus and rice;

wherein the multi-omics data comprise genomic data, transcriptomic data, proteomic data and metabolomic data;

Step two, constructing a multi-omics layered heterogeneous interaction network of the rice blast fungus and rice from the multi-omics data;

Step three, performing differential expression analysis of rice blast fungus sRNA and rice mRNA on the multi-omics layered heterogeneous interaction network to obtain a cross-species pathogenic sRNA regulation module for the process of the rice blast fungus infecting rice;

Step four, identifying the key sRNAs related to the pathogenicity of the rice blast fungus through the pathogenic sRNA regulation module.

Preferably, in step two, constructing the multi-omics layered heterogeneous interaction network of the rice blast fungus and rice comprises the following steps:

Step 1, in genomics, establishing an interaction regulation network of rice blast fungus genes and rice genes; in transcriptomics, establishing an interaction network of rice blast fungus sRNA and rice mRNA; in proteomics, establishing a data interaction network of rice blast fungus protein and the rice proteome; in metabolomics, establishing a metabolomics relationship network of rice and the rice blast fungus;

Step 2, using a clustering dimension reduction algorithm based on multi-omics data to longitudinally integrate the rice blast fungus gene and rice gene interaction regulation network, the rice blast fungus sRNA and rice mRNA interaction network, the rice blast fungus protein and rice proteome data interaction network, and the rice and rice blast fungus metabolomics relationship network, thereby establishing a primary rice blast fungus-rice layered heterogeneous interaction network;

Step 3, expanding each of the four networks that were longitudinally integrated into the primary rice blast fungus-rice layered heterogeneous interaction network through upstream and downstream omics databases to obtain corresponding regulation networks as demonstration networks; and optimizing the primary rice blast fungus-rice layered heterogeneous interaction network and completing a rice blast fungus-rice interaction knowledge database, thereby obtaining the rice blast fungus-rice multi-omics layered heterogeneous interaction network.

Preferably, in step 1, establishing the data interaction network of rice blast fungus protein and the rice proteome comprises:

obtaining differentially expressed proteins from the proteomic data, inputting the differentially expressed proteins into the STRING database, and constructing a rice PPI network and a rice blast fungus PPI network;

completing node selection and network visualization by using a graph-based support vector machine method, and assigning label attributes to the nodes; and

screening the positive-label nodes in the network and visualizing the proteomics data network to obtain the rice blast fungus-rice proteomics data interaction network.

Preferably, in step three, the cross-species pathogenic sRNA regulation module is obtained by establishing a cross-species pathogenic sRNA regulation network for the process of the rice blast fungus infecting rice;

wherein establishing the cross-species pathogenic sRNA regulation network comprises the following steps:

Step a, constructing a primary pathogenic regulation network by using the OP-Cluster algorithm;

Step b, acquiring the integrated data in the multi-omics layered heterogeneous network, screening out the pathogenicity-related multi-omics data, inputting these data into iNMF multi-omics joint analysis, and pruning the primary pathogenic regulation network to obtain the cross-species pathogenic sRNA regulation network for the process of the rice blast fungus infecting rice.

The beneficial effects of the invention are as follows:

(1) The identification system provided by the invention can accurately analyze the multi-omics data of the rice blast fungus and rice and identify the key sRNAs related to the pathogenicity of the rice blast fungus.

(2) The identification method provided by the invention mines cross-species key sRNAs from a statistical perspective in combination with a pathogenic factor association network, so that the key pathogenic sRNAs of the rice blast fungus can be identified more accurately.

Drawings

FIG. 1 is a schematic structural diagram of the system for identifying key sRNAs of rice infected by Pyricularia oryzae according to the invention.

FIG. 2 is a schematic model diagram of the key sRNA identifier for rice infected by Pyricularia oryzae according to the invention.

FIG. 3 is a schematic circuit diagram of the key sRNA identifier for rice infected by Pyricularia oryzae according to the invention.

FIG. 4 is a flow chart of the method for identifying key sRNAs of rice infected by Pyricularia oryzae according to the invention.

FIG. 5 is a flow chart of the transverse construction of the multi-omics heterogeneous network according to the invention.

FIG. 6 is a flow chart of the local biclustering algorithm OP-Cluster and the global multi-omics joint analysis algorithm iNMF according to the invention.

Detailed Description

The present invention is described in further detail below with reference to the drawings to enable those skilled in the art to practice the invention by referring to the description.

As shown in FIG. 1, the invention provides a system, based on multi-omics data integration, for identifying key sRNAs of rice infected by the rice blast fungus. The upper computer 01 consists of an input unit 0111, a first interface unit 012 (comprising a USB interface 0121, a JTAG debugging interface 0122, an Ethernet interface 0123 and an RS-232 serial port 0124) and a display unit 0131, and works in coordination with the ARM9 microprocessor 023. The input unit 0111 is connected to the first interface unit 012 and is responsible for inputting the multi-omics data of the rice blast fungus and rice; the first interface unit 012 is responsible for communication with the ARM9 microprocessor 023; the display unit 0131 is connected to the first interface unit 012 and is responsible for displaying the identification result for the key sRNAs of the rice blast fungus.

The key sRNA identifier for rice infected by the rice blast fungus consists of an ARM9 microprocessor 023, a second interface unit 021, a storage unit 022 and a processing unit 024. The second interface unit 021 comprises a USB interface 0211, a JTAG debug interface 0212, an Ethernet interface 0213 and an RS-232 serial port 0214. The USB interface 0211 can be connected to a USB flash drive to transfer the result data obtained from identifying the key sRNAs of the rice blast fungus, thereby expanding the storage capacity; the JTAG debug interface 0212 is connected to the JTAG interface 0122 of the upper computer through a JTAG emulator (programmer) conversion device and is used for online debugging of the program; the Ethernet interface 0213 is connected to the Ethernet interface 0123 of the upper computer 01, and the RS-232 serial port 0214 is connected to the RS-232 serial port 0124 of the upper computer 01, each providing communication between the ARM9 microprocessor 023 and the upper computer 01.

The storage unit 022 includes a memory unit 0221, a cache unit 0222 and an external storage unit 0223. The memory unit 0221 is connected to the cache unit 0222 and is responsible for storing the multi-omics data of the rice blast fungus and rice; the cache unit 0222 is connected to the memory unit 0221, the preprocessing unit 0241 and the network construction unit 0242, and is responsible for storing the intermediate data of the identification of the key sRNAs of the rice blast fungus; the external storage unit 0223 is connected to both the key sRNA identification unit 0244 and the RS-232 serial port 0214, and is responsible for storing the result data of the identification and returning it through the RS-232 serial port to the display unit of the upper computer for display.

The processing unit 024 comprises a preprocessing unit 0241, a network construction unit 0242, a pathogenic factor mining unit 0243 and a key sRNA identification unit 0244. The network construction unit encapsulates the clustering dimension reduction algorithm (Multi-omics Cluster-DR), and the pathogenic factor mining unit encapsulates the multi-omics data comprehensive clustering (MODSC) algorithm. The key sRNA identification unit identifies the key pathogenic sRNAs of the rice blast fungus in the pathogenic factor network and clusters the identified key pathogenic sRNAs using a multi-rank sum test method (MRST) on samples of the two species, providing support for subsequent research on rice blast fungus-rice interaction.

In this embodiment, a general-purpose PC is used as the upper computer 01. Through the RS-232 serial port it connects to the key sRNA identifier based on multi-omics data integration, which is built on a microprocessor with a 32-bit ARM920T core produced by Samsung, and the two together complete the task of identifying the key sRNAs of the rice blast fungus.

The input unit 0111 and the display unit 0131 of the upper computer 01 are both implemented by the input and output devices of the PC.

Communication between the upper computer 01 and the ARM9 microprocessor 023 is realized through the Ethernet interface 0123 of the upper computer 01 and the Ethernet interface 0213 of the ARM9 microprocessor 023; the Ethernet interface uses the DM9000, a fully integrated, low-cost single-chip Fast Ethernet controller.

In addition, the JTAG debugging interface 0122 of the upper computer 01 and the JTAG debugging interface 0212 of the ARM9 microprocessor 023 are connected through a JTAG emulator (programmer) conversion device, so that the upper computer 01 can analyze and monitor the execution of programs on the ARM9 microprocessor 023 in real time.

The USB interfaces are USB 3.0 interfaces, so the identification result data for the key sRNAs of Pyricularia oryzae can be transferred to a USB flash drive through the USB interface 0121 of the upper computer 01 or the USB interface 0211 of the ARM9 microprocessor 023, thereby expanding the storage capacity.

For the program storage unit 022 of the ARM9 microprocessor 023 system, a 32 MB HY57V561620CT SDRAM from Hynix is selected as the memory unit 0221, a 64 MB K9F1208UOM NAND flash from Samsung as the cache unit 0222, and a 100 GB hard disk as the expansion external storage unit 0223.

Each unit of the processing unit 024 is a deep learning algorithm for identifying the key sRNAs of the rice blast fungus, packaged on the ARM9 microprocessor 023; the computations use a 32-bit arithmetic unit.

FIG. 2 is a schematic model diagram of the key sRNA identifier for rice infected by the rice blast fungus. The connection relationships are as follows: the data input ports Vin of the USB interface 0211, the JTAG debug interface 0212, the Ethernet interface 0213 and the RS-232 serial port 0214 are each connected to the data output pins Vout1[0..7] of the ARM9 microprocessor 023, and their GND terminals are each connected to the GND of the ARM9 microprocessor 023.

The data input port Vin of the memory unit 0221 is connected to the data output pins Vout1[0..7] of the ARM9 microprocessor 023, its data output port Vout is connected to the data input port Vin of the cache unit 0222, and its GND is connected to the GND of the ARM9 microprocessor 023. The data input port Vin of the cache unit 0222 is connected to the data output port Vout of the memory unit 0221; its data output port Vout is connected to the data input port Vin of the preprocessing unit, the data input port Vin of the network construction unit and the data input pins Vin1[0..7] of the ARM9 microprocessor 023; and its GND is connected to the GND of the ARM9 microprocessor 023. The data input port Vin of the external storage unit 0223 is connected to the data output port Vout of the key sRNA recognition unit 0244, its data output port Vout is connected to the data input pins Vin1[0..7] of the ARM9 microprocessor 023, and its GND is connected to the GND of the ARM9 microprocessor 023.

The data input port Vin of the preprocessing unit 0241 is connected to the data output port Vout of the cache unit 0222; its data output port Vout is connected to the data input pins Vin1[0..7] of the ARM9 microprocessor 023 and to the data input port Vin of the network construction unit 0242; and its GND is connected to the GND of the ARM9 microprocessor 023. The data input port Vin of the network construction unit 0242 is connected to the data output port Vout of the preprocessing unit 0241 and the data output port Vout of the cache unit, its data output port Vout is connected to the data input port Vin of the pathogenic factor mining unit 0243, and its GND is connected to the GND of the ARM9 microprocessor 023. The data input port Vin of the pathogenic factor mining unit 0243 is connected to the data output port Vout of the network construction unit 0242, its data output port Vout is connected to the data input port Vin of the key sRNA identification unit, and its GND is connected to the GND of the ARM9 microprocessor 023. The data input port Vin of the key sRNA recognition unit 0244 is connected to the data output port Vout of the pathogenic factor mining unit 0243, its data output port Vout is connected to the data input port Vin of the external storage unit 0223, and its GND is connected to the GND of the ARM9 microprocessor 023.

As shown in FIG. 3, the circuit connections of the key sRNA identifier for rice infected by the rice blast fungus are as follows:

1) The F line is an address bus and is respectively connected with pins A0 to A17 of the ARM9 microprocessor, the external storage unit, the cache unit and the memory unit and is responsible for transmitting address information;

2) The A line is a data bus and is respectively connected with D0-D15 pins of the ARM9 microprocessor, the preprocessing unit, the network construction unit, the pathogenic factor mining unit, the key sRNA identification unit, the internal storage unit, the cache unit and the external storage unit, and is responsible for transmitting various data information;

3) The B line is a control bus and is respectively connected with OE, WR, CS3, CS1, HBE and LBE pins of the ARM9 microprocessor, the external storage unit, the cache unit, the internal storage unit, the preprocessing unit, the network construction unit, the pathogenic factor mining unit and the key sRNA identification unit, and is responsible for transmitting control signals and time sequence signals;

4) The C line is a power supply positive line and is respectively connected with the ARM9 microprocessor, the external storage unit, the cache unit, the internal storage unit, the preprocessing unit, the network construction unit, the pathogenic factor mining unit, the key sRNA identification unit, the USB interface, the JTAG interface, the TRS232 interface and the DS Ethernet interface;

5) The D line is a power supply negative electrode line and is respectively connected with the ARM9 microprocessor, the external storage unit, the cache unit, the internal storage unit, the preprocessing unit, the network construction unit, the pathogenic factor mining unit, the key sRNA identification unit, the USB interface, the JTAG interface, the TRS232 interface and the DS Ethernet interface;

6) The E line is a general I/O line and is respectively connected with the ARM9 microprocessor, the USB interface, the JTAG interface and the TRS232 interface to be responsible for data transmission of input and output equipment;

7) The G line is a network interface line, is respectively connected with the ARM9 microprocessor and the DS Ethernet interface and is responsible for network transmission.

As shown in FIG. 4, the invention also provides a method, based on multi-omics data integration, for identifying key sRNAs of rice infected by Pyricularia oryzae, which comprises the following steps:

(1) The multi-omics data of the rice blast fungus and rice are input through the input unit 0111 of the upper computer, transmitted via the RS-232 serial port to the memory unit 0221 of the key sRNA identifier, and then read into the cache unit 0222;

(2) The preprocessing unit 0241 reads the multi-omics data of the rice blast fungus and rice from the cache unit 0222, preprocesses them, and outputs the result to the network construction unit 0242;

(3) The network construction unit 0242 reads the preprocessed multi-omics data from the preprocessing unit 0241 and processes them to obtain the multi-omics layered heterogeneous interaction network;

Operations (2) and (3) are repeated until all the multi-omics data have been processed.

The process of establishing the multi-omics layered heterogeneous interaction network includes transversely establishing the network structure within each omics layer and then longitudinally constructing the multi-omics layered heterogeneous network across the layers.

As shown in FIG. 5, the transverse construction of the multi-omics heterogeneous network proceeds as follows:

First, the omics data of each type, from before and after infection of rice by Pyricularia oryzae, are collected from multiple databases such as NCBI, miRBase, GenBank and eRice and stored in a local database.

A transverse relationship network is then established within each omics layer:

① In the genome, the gene expression, copy number variation and DNA methylation data from before and after infection, obtained from the genomic data in the local database, are represented by adjacency matrices X, D and C, respectively; genome network visualization is completed with a gene network construction method based on relational data (Gene-relational Data Network), establishing a gene-gene interaction regulation network.

② In the transcriptome, differentially expressed sRNAs are obtained from the transcriptomic data in the local database; the target genes of the sRNAs are predicted with the starBase database and core targets are selected, finally yielding the sRNA-mRNA interaction network.

③ In the proteome, differentially expressed proteins are obtained from the proteomic data in the local database and input into the STRING database, and a network construction tool is used to build the rice PPI network and the rice blast fungus PPI network; node selection and network visualization are then completed with a graph-based support vector machine (SVM) method.

The positive-label nodes in the network are then screened, the proteomics data network is visualized, and the rice blast fungus-rice proteomics data interaction network is established.

④ In the metabolome, differentially expressed metabolomic data are obtained from the metabolomic data in the local database and reduced in dimension by PCA; KEGG metabolic pathway analysis is then performed separately on the metabolomic data of rice and of the rice blast fungus to establish a metabolomics relationship network.
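As an illustration of the transcriptome layer in step ② above, the following is a minimal sketch of how a differentially expressed sRNA list, a set of predicted sRNA-target pairs and a set of core targets could be combined into an sRNA-mRNA interaction network. The function name, the identifiers and the assumption that the predictions are available as plain (sRNA, mRNA) pairs are all hypothetical; the patent does not specify a data format or a query interface.

```python
from collections import defaultdict


def build_srna_mrna_network(diff_srnas, predicted_targets, core_targets):
    """Return {sRNA: sorted list of core target mRNAs}.

    diff_srnas: set of differentially expressed sRNA ids (hypothetical ids).
    predicted_targets: iterable of (sRNA, mRNA) pairs, assumed to have been
        exported beforehand from a target-prediction resource.
    core_targets: set of mRNA ids retained as core targets.
    """
    network = defaultdict(set)
    for srna, mrna in predicted_targets:
        # Keep an edge only if both endpoints pass their filters.
        if srna in diff_srnas and mrna in core_targets:
            network[srna].add(mrna)
    return {srna: sorted(targets) for srna, targets in network.items()}


# Toy data, invented for illustration only.
pairs = [
    ("sRNA1", "Os01g1"),
    ("sRNA1", "Os02g5"),
    ("sRNA2", "Os01g1"),
    ("sRNA3", "Os09g9"),
]
net = build_srna_mrna_network({"sRNA1", "sRNA2"}, pairs, {"Os01g1", "Os09g9"})
print(net)
```

Storing the network as an adjacency mapping keeps the bipartite structure explicit (sRNA on one side, mRNA on the other), which is one convenient form for later stacking with the other omics layers.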

Next, the multi-omics layered heterogeneous network is constructed longitudinally: a clustering dimension reduction algorithm based on multi-omics data (Multi-omics Cluster-DR) is applied to the layered networks of the individual omics to establish the multi-omics rice blast fungus-rice layered heterogeneous interaction network. First, the PCA dimension reduction algorithm is applied separately to the gene-gene interaction regulation network, the sRNA-mRNA interaction network, the rice blast fungus-rice proteome data interaction network and the metabolomics relationship network, and the results are recorded as the DR method set. The data matrix is then decomposed with the CP tensor model and expressed as factor matrices; outliers in the sample matrices are deleted using an outlier criterion such as the median absolute deviation (MAD), the top 5% of DR models are selected, and a clustering algorithm is applied to the DR methods. Quality is then evaluated with the local continuity meta-criterion (LCMC), and the result with the highest evaluation value is selected as the output. Finally, the multiple omics relationship networks are visualized to establish the rice blast fungus-rice multi-omics layered heterogeneous network. The specific process is as follows:

S1, over the DR method set, the sample × sample × DR method tensor is decomposed with the CP model into three factor matrices of sizes n×p, n×p and m×p, where the first two matrices represent the decomposition of the sample modes and the third represents the DR method mode;

S2, outliers are identified and deleted using the rejection criterion of the median absolute deviation (MAD) method, the rejection criterion being as follows:

S3, the top 5% of DR method models are selected, and a k clustering method is applied to the DR method set to obtain DR method groups based on k clustering;

S4, quality is evaluated with the local continuity meta-criterion LCMC (QL), and the highest value is selected as the output. The multiple omics relationship structures are then visualized to obtain the rice blast fungus-rice multi-omics layered heterogeneous interaction network.
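The quality evaluation in S4 can be illustrated with a small sketch. It uses the commonly cited definition of the local continuity meta-criterion, LCMC(K) = (1/(nK)) Σ_i |N_K^high(i) ∩ N_K^low(i)| − K/(n − 1), which rewards a low-dimensional embedding for preserving each sample's K nearest neighbours. Whether the patent uses exactly this form is an assumption, and the toy data, the candidate names and the choice K = 2 are invented for illustration.

```python
import math


def knn_sets(points, k):
    """For each point, return the set of indices of its k nearest neighbours."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        # Ties are broken by index so the result is deterministic.
        others.sort(key=lambda j: (math.dist(p, points[j]), j))
        result.append(set(others[:k]))
    return result


def lcmc(high_dim, low_dim, k):
    """Local continuity meta-criterion for neighbourhood size k."""
    n = len(high_dim)
    high_nn = knn_sets(high_dim, k)
    low_nn = knn_sets(low_dim, k)
    overlap = sum(len(a & b) for a, b in zip(high_nn, low_nn))
    return overlap / (n * k) - k / (n - 1)


# Toy samples in the original space and two hypothetical DR outputs.
high = [(0, 0, 0), (1, 0, 0.1), (2, 0, 0), (3, 0, 0.1), (4, 0, 0), (5, 0, 0.1)]
candidates = {
    "good": [(0,), (1,), (2,), (3,), (4,), (5,)],  # preserves the ordering
    "bad": [(0,), (3,), (1,), (5,), (2,), (4,)],   # scrambles the ordering
}
scores = {name: lcmc(high, emb, k=2) for name, emb in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Selecting `best` by the maximum score mirrors the rule in S4 that the result with the highest evaluation value is taken as the output.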

(4) In the pathogenic factor mining unit, the multi-omics layered heterogeneous interaction network is used as the input of the model, and the Pyricularia oryzae sRNA nodes in the network are hierarchically clustered to obtain a hierarchical clustering tree. If certain sRNAs consistently show similar regulatory behavior in a physiological process or across different tissues, it is reasonable to regard them as functionally related and to define them as a module. Different branches of the clustering tree represent different sRNA modules: the sRNAs are classified by regulation pattern, and sRNAs with similar patterns are assigned to one module. The output is the pathogenic sRNA regulation module, which is then passed to the key sRNA recognition unit;

(5) The key sRNA recognition unit receives the pathogenic sRNA regulation module, identifies the key sRNAs of the rice blast fungus, and finally obtains an accurate identification result;

The pathogenic factor mining unit reads the multi-omics layered heterogeneous interaction network data in sequence, takes them as the input of the model, and passes all output pathogenic sRNA regulation modules to the key sRNA identification unit. Once all the data have been processed, the identification results are stored in the external storage unit and sent through the RS-232 serial port to the display unit for display.
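The module definition in step (4), in which sRNAs with similar regulation patterns fall on the same branch of a clustering tree, can be sketched as follows. This is a minimal illustration only: the distance (1 − Pearson correlation), the average linkage, the cut-off `max_distance` and the toy profiles are assumptions, since the patent does not state which linkage or threshold is used. Profiles are assumed to be non-constant so that the correlation is defined.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def srna_modules(profiles, max_distance=0.5):
    """Average-linkage agglomerative clustering with distance 1 - Pearson r.

    Merging stops once the closest pair of clusters is farther apart than
    max_distance, which corresponds to cutting the clustering tree there.
    """
    names = sorted(profiles)
    dist = {
        (a, b): 1 - pearson(profiles[a], profiles[b])
        for a in names for b in names if a < b
    }
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        pairs = [(min(a, b), max(a, b)) for a in c1 for b in c2]
        return sum(dist[p] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        index_pairs = [
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        i, j = min(index_pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        if linkage(clusters[i], clusters[j]) > max_distance:
            break
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return sorted(clusters)


# Hypothetical regulation profiles of four sRNAs across four conditions.
profiles = {
    "sRNA_a": [1, 2, 3, 4],
    "sRNA_b": [2, 4, 6, 8.5],
    "sRNA_c": [4, 3, 2, 1],
    "sRNA_d": [8, 6, 4.5, 2],
}
modules = srna_modules(profiles)
print(modules)
```

With these toy profiles the two rising sRNAs form one module and the two falling sRNAs form another, matching the idea that each branch of the tree groups sRNAs with a similar regulation pattern.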

In this embodiment, the multi-omics data comprehensive clustering (MODSC) algorithm is used to establish the cross-species pathogenic sRNA regulation network for the process of the rice blast fungus infecting rice, to reveal the correlations among pathogenicity-related omics factors, and to mine the cross-species pathogenic sRNA regulation module; the key sRNAs related to the pathogenicity of the rice blast fungus are then identified through the pathogenic sRNA regulation module. As shown in FIG. 6, the MODSC algorithm combines the local biclustering algorithm OP-Cluster with the global multi-omics joint analysis algorithm iNMF.

First, a preliminary pathogenic regulation network is constructed: the rice blast fungus sRNA-rice mRNA data matrix is obtained from the constructed multi-layer heterogeneous network, and OP-Cluster biclustering analysis is performed repeatedly to mine pathogenic factors, which completes the preliminary network. The OP-Cluster algorithm is a biclustering algorithm based on sequence comparison, and aims to find biclusters whose members show a consistent relative trend. To preprocess the data, all entry values in each row are first sorted in non-decreasing order. Second, each sorted row is organized into a sequence of groups according to similarity. Such a sequence has the form ba(cd), which comprises the three groups b, a and cd; the column features within one group are unordered, and the column features in (cd) satisfy |c - d| ≤ 0.1 × min(c, d).
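The row preprocessing just described (sort each row, then group columns whose values are similar) can be sketched as follows. This is an illustrative reading, not the OP-Cluster reference code: the function names, the dictionary input layout, and the choice to compare each value against the smallest value of the current group are assumptions. For positive values that choice guarantees that every pair in a group satisfies |c - d| ≤ 0.1 × min(c, d).

```python
def row_to_group_sequence(row, threshold=0.1):
    """Turn one data row into an ordered sequence of column groups.

    row: dict mapping a column label to a positive value (hypothetical layout).
    Columns are sorted by value in non-decreasing order; a column joins the
    current group when its value is within threshold * min(value, anchor)
    of the group's smallest value (the anchor).
    """
    if not row:
        return []
    ordered = sorted(row.items(), key=lambda item: item[1])
    groups = [[ordered[0]]]
    for label, value in ordered[1:]:
        anchor = groups[-1][0][1]  # smallest value in the current group
        if value - anchor <= threshold * min(value, anchor):
            groups[-1].append((label, value))
        else:
            groups.append([(label, value)])
    # Labels inside a group are unordered, so sort them for a stable output.
    return [sorted(label for label, _ in group) for group in groups]


def format_sequence(groups):
    """Render groups in the ba(cd) notation used in the text."""
    return "".join(g[0] if len(g) == 1 else "(" + "".join(g) + ")"
                   for g in groups)


example_row = {"a": 2.0, "b": 1.0, "c": 5.0, "d": 5.3}
print(format_sequence(row_to_group_sequence(example_row)))  # ba(cd)
```

In the example row, c and d differ by 0.3, which is below 0.1 × 5.0 = 0.5, so they form one group, while b and a stay in groups of their own.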

The OP-Cluster algorithm uses a compact tree structure, the OPC-Tree, to store the key information for mining OP-Clusters. It discovers frequent subsequences and the rows associated with them at the same time: sequences sharing the same prefix are collected and recorded at the same location. Consequently, for all rows that share a prefix, further operations along that prefix are performed only once, pruning techniques can be applied easily to the OPC-Tree structure, and all biclustering results are obtained with a single depth-first search (DFS) traversal of the OPC-Tree.

Next, the complete pathogenic regulation network is constructed. The intrinsic relationships among biomarkers across the omics layers are obtained from the multi-omics layered heterogeneous network, with the structural relationships among nodes fully preserved. Pathogenicity-related multi-omics data with a significance of difference (p-value) below 0.05 are selected on the basis of rice mRNA differential expression and rice blast fungus sRNA changes, and are input into the iNMF multi-omics joint analysis. This analysis prunes the preliminary pathogenic regulation network built by OP-Cluster and thereby completes the construction of the complete pathogenic regulation network.

The iNMF multi-omics joint analysis proceeds as follows. iNMF jointly fits K Gaussian latent variable models, capturing shared and data-specific structure in K data sets X_k of dimension P_k × N, where N is the number of samples. First, the latent variables are estimated with the orthogonality constraint replaced by a non-negativity constraint. Second, the sparse matrix H_k is shared between the data-specific basis matrix V_k and the common basis matrix W. Finally, k-means clustering is performed on the posterior expectation of the latent factors, E(W|X), to determine the cluster assignment. Here the coefficient matrix and the basis matrix correspond to the loading and latent variable matrices. The iNMF cluster analysis further refines the pathogenic factor regulation network. iNMF is formulated as an optimization problem with a Euclidean loss function, and the model formula is as follows:

where W ≥ 0, H_k ≥ 0, k = 1, …, K.
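The model formula itself is not reproduced in this text, so the following is only a hedged sketch of one objective that is consistent with the stated constraints (W ≥ 0, H_k ≥ 0) and the Euclidean loss: it minimizes the sum over k of the squared Frobenius norm of X_k^T - W H_k using multiplicative updates. The data-specific matrices V_k mentioned above are deliberately omitted, and the function name `joint_nmf`, its parameters, and the use of NumPy are assumptions for illustration.

```python
import numpy as np


def joint_nmf(datasets, rank, n_iter=200, seed=0, eps=1e-9):
    """Joint non-negative factorization with a shared sample factor W.

    datasets: list of non-negative arrays X_k of shape (P_k, N), where the
    N samples are shared across data sets. Each X_k is transposed so that
    X_k^T (N, P_k) is approximated by W (N, rank) @ H_k (rank, P_k).
    Assumed objective: sum_k ||X_k^T - W H_k||_F^2 with W >= 0, H_k >= 0.
    """
    rng = np.random.default_rng(seed)
    xs = [np.asarray(x, dtype=float).T for x in datasets]
    n_samples = xs[0].shape[0]
    w = rng.random((n_samples, rank)) + eps
    hs = [rng.random((rank, x.shape[1])) + eps for x in xs]
    history = []
    for _ in range(n_iter):
        # Multiplicative update of the shared factor W.
        numer = sum(x @ h.T for x, h in zip(xs, hs))
        denom = w @ sum(h @ h.T for h in hs) + eps
        w *= numer / denom
        # Multiplicative update of each data-specific coefficient matrix H_k.
        for k, x in enumerate(xs):
            hs[k] *= (w.T @ x) / (w.T @ w @ hs[k] + eps)
        history.append(sum(np.linalg.norm(x - w @ h) ** 2
                           for x, h in zip(xs, hs)))
    return w, hs, history
```

Under these assumptions, the rows of the returned W give one low-dimensional representation per sample, which could then be passed to a k-means routine for the cluster assignment step described above.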

The process for identifying key sRNAs based on the pathogenic factor network and the MODSC method is as follows:

The top 5% of sRNA regulation modules most relevant to rice blast expression in the pathogenic factor network are selected, and the sRNA nodes with a differential expression p-value below 0.05 within the selected modules are taken as the key sRNAs related to rice blast pathogenicity.
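The two-stage selection above (top 5% of modules, then p-value below 0.05) can be sketched as follows. The relevance score per module, the three dictionary inputs, and the rule of rounding the 5% count up with a minimum of one module are assumptions for illustration; the text does not define how module relevance is scored.

```python
from math import ceil


def select_key_srnas(module_scores, module_members, srna_pvalues,
                     top_fraction=0.05, alpha=0.05):
    """Pick key sRNAs from the most relevant regulation modules.

    module_scores:  dict module id -> relevance score (hypothetical measure).
    module_members: dict module id -> list of sRNA ids in that module.
    srna_pvalues:   dict sRNA id -> differential expression p-value.
    """
    ranked = sorted(module_scores, key=module_scores.get, reverse=True)
    # Keep at least one module; rounding up is an assumption.
    n_top = max(1, ceil(top_fraction * len(ranked)))
    top_modules = ranked[:n_top]
    key_srnas = {srna
                 for module in top_modules
                 for srna in module_members[module]
                 if srna_pvalues.get(srna, 1.0) < alpha}
    return top_modules, sorted(key_srnas)
```

sRNAs without a recorded p-value are treated as non-significant in this sketch.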

If further research on the identified key sRNAs is required, the selected key pathogenic sRNAs can be clustered using a two-sample multivariate rank sum test (MRST).

The two-sample multivariate rank sum test is used to analyze differences in the high-throughput rice blast fungus sRNA data. First, only the sRNAs common to both sets of expression data are selected so that the dimensions agree, and the data within each set are then normalized so that the scales agree. This gives the normalized observations X_1, X_2, …, X_N, where X_i denotes the observed value of the i-th sample. For X_i, the spatial rank is calculated as:

R_i = m_j[S(A_x(X_i - X_j))], i, j = 1, 2, ..., N

where m[z] denotes the average of z over all samples, m_j[z] denotes the j-th dimension of that average, S(z) denotes the sign of z (+ or -), and A_x satisfies a condition [formula missing] in which λ is the adjustment coefficient.
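A minimal sketch of the rank computation above is given below, under loudly stated assumptions: because the condition defining A_x is not available in this text, the transform defaults to the identity; S is taken as a component-wise sign, following the "(+ or -)" wording; and m_j is read as an average over the index j. The function names are hypothetical.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def spatial_ranks(samples, transform=None):
    """Compute R_i as the average over j of S(A_x(X_i - X_j)).

    samples:   list of equal-length numeric vectors X_1..X_N.
    transform: optional function applying A_x to a difference vector;
               the identity is used when it is None (an assumption).
    """
    n = len(samples)
    dim = len(samples[0])
    apply_a = transform or (lambda z: z)
    ranks = []
    for xi in samples:
        total = [0.0] * dim
        for xj in samples:
            diff = apply_a([a - b for a, b in zip(xi, xj)])
            for d, value in enumerate(diff):
                total[d] += sign(value)
        ranks.append([t / n for t in total])
    return ranks
```

With this definition the ranks are centered: in every dimension they sum to zero over all samples, because each pairwise sign is cancelled by its mirror term.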

The two-sample spatial rank statistic is selected as follows: [formula missing], where s = 1, 2, n_s is the weight, and [symbol missing] is the mean vector of the spatial centered ranks.

The statistic is calculated for the key sRNAs of the rice blast fungus selected from the pathogenic factor network, and key sRNAs with similar functions are clustered together.

Although embodiments of the present invention have been disclosed above, the invention is not limited to the applications listed in the description and the embodiments. It can be applied in any field to which it is suited, and further modifications can readily be made by those skilled in the art. Accordingly, the invention is not limited to the specific details and illustrations shown and described herein, provided that the general concept defined by the claims and their equivalents is not departed from.

Claims

1. A rice blast infection rice key sRNA recognition system, comprising:

a microprocessor connected to the input unit;

a storage unit connected to the microprocessor;

wherein the processing unit comprises:

the method for constructing the multi-component heterogeneous interaction network of the rice blast fungus and the rice comprises the following steps:

Step 3, expanding the established preliminary rice blast fungus-rice layered heterogeneous interaction network through upstream and downstream omics databases respectively to obtain the corresponding regulation networks as a demonstration network; optimizing the preliminary rice blast fungus-rice layered heterogeneous interaction network, refining the rice blast fungus-rice interaction knowledge database, and obtaining the rice blast fungus-rice multi-omics layered heterogeneous interaction network;

2. The rice blast infection key sRNA identification system of claim 1, further comprising:

The second interface unit is connected with the preprocessing unit and comprises a second USB interface, a second JTAG debugging interface, a second Ethernet interface and a second RS-232 interface;

3. A method for identifying key sRNA of rice infected with rice blast fungus, characterized in that the key sRNA identification system of rice infected with rice blast fungus according to claim 1 or 2 is used, comprising the following steps:

Step one, collecting multi-omics data of the rice blast fungus and rice;

4. A method for identifying key sRNA of rice infected with rice blast fungus according to claim 3, wherein in said step 1, establishing a data interaction network of rice blast fungus protein and rice protein group comprises:

5. The method for identifying key sRNA of rice infected with rice blast fungus according to claim 4, wherein in the third step, a cross-species pathogenic sRNA regulation module in rice infected with rice blast fungus is obtained by establishing a cross-species pathogenic sRNA regulation network in rice infected with rice blast fungus;