CN111370060A

CN111370060A - Protein interaction network co-location co-expression complex recognition system and method

Info

Publication number: CN111370060A
Application number: CN202010204246.6A
Authority: CN
Inventors: 张锦雄; 钟诚
Original assignee: Guangxi University
Current assignee: Guangxi University
Priority date: 2020-03-21
Filing date: 2020-03-21
Publication date: 2020-07-03

Abstract

The invention belongs to the technical field of protein complex identification and discloses a system and method for identifying co-localized, co-expressed complexes in a protein interaction network. The system comprises a data extraction module, a matrix data generation module, an identification and evaluation module, a core mining module, an attachment adding module and a complex screening module. The method comprises: organizing protein localization data, gene expression data, protein interaction data and protein GO similarity data as matrices; and applying a seed expansion strategy to identify co-localized, co-expressed protein complexes based on the core-attachment structure. By discovering protein complexes in the protein interaction network, the invention helps in understanding both the topological structure of the network and the biological significance of the complexes, and is valuable for predicting the functions of unknown proteins and designing disease-targeted drugs.

Description

Protein interaction network co-location co-expression complex recognition system and method

Technical Field

The invention belongs to the technical field of protein complex identification, and particularly relates to a protein interaction network co-localization co-expression complex identification system and method.

Background

With the advent of the post-genomic era, the proteome has become another major subject of research. In cells, proteins rarely work alone; they must bind to and interact with other proteins to perform their biological functions. Protein-protein interaction (PPI) is essential to all vital activities and underlies all metabolic activities of the cell. Revealing and mapping the interaction relationships among proteins has therefore become both a hot spot of proteomics research and a difficult problem of the post-genomic era. Among the various biological networks, protein interaction networks (PPINs) are the basis of cellular function and control a large number of life processes. Abnormal regulation caused by perturbation of protein-protein interactions is a major cause of many diseases, so PPINs are becoming a major tool for revealing disease mechanisms at the molecular level.

Proteins are the products of gene expression; they are the executors of the physiological functions of organisms and the direct manifestation of life phenomena. Proteomics is the discipline that systematically studies the properties of proteins, and it can provide detailed descriptions of the structure, function and regulation of biological systems in healthy and diseased states. Almost all biological processes are accomplished through a series of protein interactions. From the perspective of systems biology, studying and analyzing biological functions by means of a protein interaction network has important prospects and practical value. A protein complex is a set of proteins that interact at the same time and in the same place to form a multi-molecular machine, which is the primary form in which proteins perform their functions. Identifying protein complexes not only facilitates the understanding of complex life activities, but also provides a valuable theoretical reference for discovering the mechanisms of complex diseases and designing targeted drugs.

Currently, methods for mining protein complexes can be roughly classified into 3 types. The first type is identification based on traditional graph theory, for example the RNSC algorithm (partition-based clustering), the MCODE algorithm (density-based clustering) and the GN algorithm (hierarchical clustering). These methods can save some time cost, but they are sensitive to the clustering center, the data, the parameters and so on, which affects the overall efficiency of the algorithm to a certain extent. The second type is identification based on multi-omics data fusion, which generally integrates biological information data into the existing protein network to enhance the accuracy and reliability of the network and thereby mitigate the false positives and false negatives present in interaction data; however, its inherent limitations make it difficult to meet the performance requirements of the algorithm. The third type is identification based on intelligent optimization, such as the ant colony optimization algorithm and the particle swarm optimization algorithm, which simulates the group behaviors of organisms in nature and uses interactive cooperation among individuals to search for an approximately optimal solution to the problem; such methods show good performance.

Meanwhile, constructing a protein interaction network (PPIN) from existing protein interaction data (PPI data) and finding meaningful substructures in the PPIN, such as protein complexes, functional modules and motifs, has become a hot spot of research at home and abroad. To make these substructures easier to find, it is common practice to represent the protein interaction network as a graph, with proteins as vertices and interactions between proteins as edges, and then to mine biologically meaningful substructures, namely protein complexes, using various algorithms.
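The graph representation described above can be illustrated with a minimal sketch. The function name `build_ppi_graph` and the third example pair are hypothetical and chosen only for illustration; the sketch assumes an undirected, unweighted network.

```python
from collections import defaultdict


def build_ppi_graph(pairs):
    """Return an undirected adjacency map: protein -> set of interaction partners."""
    graph = defaultdict(set)
    for a, b in pairs:
        if a == b:
            continue  # self-interactions are ignored in this sketch
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


# Proteins are vertices; each interaction pair is an edge.
pairs = [
    ("YKL171W", "YML096W"),
    ("YFL017W-A", "YFR031C-A"),
    ("YML096W", "YFL017W-A"),  # hypothetical pair, added only to link the two edges
]
graph = build_ppi_graph(pairs)
```

An adjacency map keeps neighbor lookups cheap, which matters for the expansion-style algorithms discussed later in the description.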

In summary, the problems of the prior art are as follows:

(1) The existing identification methods based on traditional graph theory are sensitive to the clustering center, the data, the parameters and so on, which affects the overall efficiency of the algorithm to a certain extent, and their accuracy is low.

(2) The existing identification methods based on multi-omics data fusion have inherent limitations that make it difficult to meet the performance requirements of the algorithm, and their accuracy is low.

(3) The existing identification methods based on intelligent optimization are time-consuming and labor-intensive, converge slowly, search inefficiently and easily fall into local optima.

The difficulty of solving the technical problems is as follows:

(1) the existing identification methods based on traditional graph theory can hardly identify protein complexes accurately, so an algorithm needs to be redesigned around the co-localization and co-expression attributes of protein complexes;

(2) most of the existing identification methods based on multi-omics biological data fusion use only 2 classes of biological data; using more biological data means that there are many possible ways of fusing the data, and the optimal fusion mode needs to be selected;

(3) an NP-hard problem cannot be solved by exhaustive search, and falling into a local optimum on such a problem is something the existing intelligent-optimization-based identification methods cannot avoid; combining seed expansion with a greedy strategy can effectively improve search efficiency.

The significance of solving the technical problems is as follows:

(1) co-localized co-expression is a fundamental attribute of protein complex assembly, and redesigning an algorithm based on the fundamental attribute is a prerequisite for accurate identification of protein complexes.

(2) Fusing more biological data into the algorithm makes the identified protein complexes more biologically meaningful.

(3) Seed expansion coupled with a greedy strategy makes efficient and accurate identification of protein complexes feasible.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides a protein interaction network co-localization co-expression complex recognition system and a protein interaction network co-localization co-expression complex recognition method.

The invention is realized in such a way that a protein interaction network co-localization co-expression complex recognition method comprises the following steps:

step one, a matrix data preparation stage: extracting protein localization data, gene expression data, protein interaction data and protein GO annotation data;

step two, analyzing and calculating to generate, in sequence, an interaction matrix with reliability scores between proteins, a protein localization matrix, a gene expression matrix, a CC-based protein similarity matrix, an MF-based protein similarity matrix and a BP-based protein similarity matrix;

step three, identifying protein complexes with the core algorithm ICJointLE under the tuned parameter settings, in the following 3 sequential stages:

(1) protein complex core mining stage: mining densely and reliably connected, jointly co-localized and jointly co-expressed protein cores by applying a seed expansion strategy according to the core-attachment structure;

(2) protein complex attachment addition stage: adding strongly and reliably connected, jointly co-localized and jointly co-expressed protein attachments;

(3) overlapping protein complex screening stage: deleting overlapping complexes with low reliable connection density;

step four, evaluating the quality of the identified complexes by taking CYC2008 as a reference.

Further, the protein interaction network co-localization co-expression complex identification method uses Saccharomyces cerevisiae (yeast) data sets.

Further, CYC2008, used as the set of known complexes, comprises 408 manually curated heteromeric protein complexes; the gene expression data GSE3431 contains not only gene expression data for 3 consecutive metabolic cycles, but also GO term annotations, in 3 categories, of the expressed genes.

Further, ICJointLE identifies complexes in CYC2008 that contain proteins lacking localization data as follows: some of the proteins in the CYC2008 and PPI data sets have no protein localization data, and when the joint co-localization count is calculated for a group of proteins that contains such proteins, the localization vector of each protein lacking localization data is set to all 1's.

Another object of the present invention is to provide a protein-interacting network co-localized co-expression complex recognition system for implementing the protein-interacting network co-localized co-expression complex recognition method, the protein-interacting network co-localized co-expression complex recognition system comprising:

the data extraction module is used for extracting protein positioning data, gene expression data, protein interaction data and protein GO labeling data;

the matrix data generation module is used for sequentially generating an interaction matrix with reliability scores among proteins, a protein positioning matrix, a gene expression matrix, a CC-based protein similarity matrix, an MF-based protein similarity matrix and a BP-based protein similarity matrix;

the identification and evaluation module is used for identifying the protein compound under the parameter tuning setting through a core algorithm ICJointLE, and then performing quality evaluation on the identified compound by taking CYC2008 as a reference;

the core mining module is used for mining the densely and reliably connected joint co-localization joint co-expression protein core;

an attachment addition module for adding a strongly reliably linked joint co-localized joint co-expressed protein attachment;

and the complex screening module is used for deleting the overlapped complexes with low reliable connection density.

The invention also aims to provide an information data processing terminal for realizing the protein interaction network co-localization co-expression complex identification method.

It is another object of the present invention to provide a computer-readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the protein interaction network co-localized co-expression complex identification method.

In summary, the advantages and positive effects of the invention are as follows: the invention identifies co-localized, co-expressed protein complexes in a protein interaction network by means of the software suite ICJointLE (Identifying protein Complexes with the features of Joint co-Localization and joint co-Expression) V1.0. By discovering protein complexes in the protein interaction network (PPIN), the invention helps in understanding both the topological structure of the protein network and the biological significance of the complexes, and is valuable for predicting the functions of unknown proteins and human disease-causing genes.

Drawings

FIG. 1 is a schematic diagram of a protein interaction network co-localization co-expression complex recognition system provided in an embodiment of the present invention;

in the figure: 1. a data extraction module; 2. a matrix data generation module; 3. a complex recognition module; 4. a core mining module; 5. an attachment adding module; 6. a complex screening module; 7. a complex evaluation module.

FIGS. 2 and 3 are flow charts of methods for identifying co-localized co-expression complexes of protein interaction networks according to embodiments of the present invention.

Fig. 4 is a schematic diagram of the folder in which ICJointLE V1.0 is initially installed, and of its structure, according to an embodiment of the present invention.

Fig. 5 is a schematic diagram of preparing a data set folder according to an embodiment of the present invention.

Fig. 6 is a schematic diagram of creating a STRING folder according to an embodiment of the present invention.

Fig. 7 is a diagram of a PPI file for preparing a STRING data set according to an embodiment of the present invention.

Fig. 8 is a schematic diagram of a matrix data file generation process of the STRING data set according to the embodiment of the present invention.

Fig. 9 is a schematic diagram of a matrix data file of a STRING data set according to an embodiment of the present invention.

FIG. 10 is a schematic diagram of a process for identifying and evaluating complexes in a STRING interaction network, according to an embodiment of the present invention.

FIG. 11 is a schematic diagram of the complexes identified in the STRING interaction network and their evaluation provided by embodiments of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Aiming at the problems in the prior art, the invention provides a protein interaction network co-localization co-expression complex recognition system and a protein interaction network co-localization co-expression complex recognition method, and the invention is described in detail below with reference to the attached drawings.

As shown in fig. 1, the system for identifying co-localized co-expression complexes of protein interaction networks provided in the embodiments of the present invention includes: a data extraction module 1, a matrix data generation module 2, an identification and evaluation module 3, a core mining module 4, an attachment adding module 5 and a complex screening module 6.

The data extraction module 1 is used for extracting protein positioning data, gene expression data, protein interaction data and protein GO labeling data;

the matrix data generation module 2 is used for sequentially generating an interaction matrix with reliability scores among proteins, a protein positioning matrix, a gene expression matrix, a CC-based protein similarity matrix, an MF-based protein similarity matrix and a BP-based protein similarity matrix;

the identification and evaluation module 3 is used for identifying protein complexes with the core algorithm ICJointLE under the tuned parameter settings, and then evaluating the quality of the identified complexes by taking CYC2008 as a reference;

the core mining module 4 is used for mining a densely and reliably connected combined co-localization combined co-expression protein core;

an attachment adding module 5 for adding a strongly reliable linked joint co-localized joint co-expressed protein attachment;

and a complex screening module 6 for deleting overlapping complexes with low reliable connection density.

The data adopted by the system provided by the embodiment of the invention is a saccharomyces cerevisiae (yeast) related data set.

CYC2008, used in the embodiments of the present invention as the set of known complexes, includes 408 manually curated heteromeric protein complexes. The gene expression data GSE3431 contains not only gene expression data for 3 consecutive metabolic cycles, but also GO term annotations, in 3 categories, of the expressed genes.

In the embodiments of the present invention, ICJointLE identifies complexes in CYC2008 that contain proteins lacking localization data as follows:

some proteins in the CYC2008 and PPI data sets have no protein localization data. To accurately identify as many protein complexes in CYC2008 as possible, the localization vector of each protein lacking localization data is set to all "1"s when the joint co-localization count is calculated for a group of proteins that contains such proteins.
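The effect of the all-"1" vector can be shown with a small sketch. It assumes, as an illustration only, that the joint localization vector of a group is the element-wise AND of the members' binary localization vectors and that the joint co-localization count is the number of 1's in it; the text does not spell out this formula. The protein identifiers and the three compartments are hypothetical.

```python
def localization_vector(protein, localization, n_compartments):
    """Return the binary localization vector of a protein.

    Proteins without localization data get an all-ones vector, as described
    in the text, so they never veto a shared compartment.
    """
    return localization.get(protein, [1] * n_compartments)


def joint_colocalization_count(group, localization, n_compartments):
    """Count compartments shared by every protein in the group (assumed AND rule)."""
    joint = [1] * n_compartments
    for protein in group:
        vec = localization_vector(protein, localization, n_compartments)
        joint = [j & v for j, v in zip(joint, vec)]
    return sum(joint)


# Hypothetical data: 3 compartments, P3 has no localization record.
localization = {"P1": [1, 0, 1], "P2": [1, 1, 0]}
count = joint_colocalization_count(["P1", "P2", "P3"], localization, 3)
```

With an all-zeros vector for P3 the count would drop to 0 and the group could never qualify as co-localized; the all-ones choice leaves the decision to the proteins that do have data.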

As shown in fig. 2, the method for identifying a co-localized co-expression complex of a protein interaction network provided in the embodiment of the present invention includes the following steps:

S101: matrix data preparation stage: extracting protein localization data, gene expression data, protein interaction data and protein GO annotation data.

S102: analyzing and calculating to generate, in sequence, an interaction matrix with reliability scores between proteins, a protein localization matrix, a gene expression matrix, a CC-based protein similarity matrix, an MF-based protein similarity matrix and a BP-based protein similarity matrix.

S103: identifying protein complexes with the core algorithm ICJointLE under the tuned parameter settings.

S103-1: protein complex core mining stage: mining densely and reliably connected, jointly co-localized and jointly co-expressed protein cores by applying a seed expansion strategy according to the core-attachment structure.

S103-2: protein complex attachment addition stage: adding strongly and reliably connected, jointly co-localized and jointly co-expressed protein attachments.

S103-3: overlapping protein complex screening stage: deleting overlapping complexes with low reliable connection density.

S104: evaluation of protein complexes: evaluating the quality of the identified complexes with reference to CYC2008.
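The core mining stage (S103-1) can be sketched as a greedy seed expansion. This is a simplified illustration rather than the ICJointLE algorithm itself: it assumes that "reliable connection density" is the sum of reliability scores on internal edges divided by the number of possible pairs, and it omits the co-localization and co-expression checks. The function names, the threshold `min_density` and the toy network are all hypothetical.

```python
from itertools import combinations


def weighted_density(nodes, w):
    """Sum of reliability scores on internal edges divided by the number of pairs."""
    nodes = list(nodes)
    if len(nodes) < 2:
        return 0.0
    total = sum(w.get(a, {}).get(b, 0.0) for a, b in combinations(nodes, 2))
    return total / (len(nodes) * (len(nodes) - 1) / 2)


def grow_core(seed, w, min_density):
    """Greedily add the neighbour that keeps the weighted density highest."""
    core = {seed}
    while True:
        candidates = set().union(*(w.get(p, {}).keys() for p in core)) - core
        best, best_density = None, 0.0
        for cand in sorted(candidates):
            d = weighted_density(core | {cand}, w)
            if d >= min_density and (best is None or d > best_density):
                best, best_density = cand, d
        if best is None:
            return core  # no neighbour keeps the density above the threshold
        core.add(best)


# Hypothetical symmetric network with reliability scores in [0, 1].
w = {
    "A": {"B": 0.9, "C": 0.8, "D": 0.2},
    "B": {"A": 0.9, "C": 0.7},
    "C": {"A": 0.8, "B": 0.7, "D": 0.1},
    "D": {"A": 0.2, "C": 0.1},
}
core = grow_core("A", w, min_density=0.6)
```

Here the seed "A" absorbs "B" and then "C"; adding "D" would pull the density down to 0.45, below the threshold, so expansion stops.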

The technical solution of the present invention is further described below.

The invention points out that a group of proteins must interact with each other at the same time and in the same place to form a complex. In other words, the proteins in a complex are jointly co-localized and jointly co-expressed, and they are densely connected in the static PPI network (SPPIN). The software suite ICJointLE V1.0 mines clusters of proteins that are co-localized, co-expressed, densely and reliably connected, and similar in biological function from the static PPI network according to the core-attachment structure, and outputs them as protein complexes. To this end, the software suite ICJointLE V1.0 implements a co-localization criterion for a group of proteins based on their joint localization vector, and calculates the co-expression level of a group of proteins from their joint gene expression pattern. In addition, it combines the similarities of the several kinds of protein Gene Ontology (GO) features to establish a criterion for judging protein functional similarity, so as to ensure that the identified protein complexes are consistent in biological function.
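One way to make the notion of a group co-expression level concrete is the mean pairwise Pearson correlation of the members' expression profiles. This is an assumption made for illustration; the text does not state which formula ICJointLE uses for the joint gene expression pattern. The identifiers and profiles below are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sx * sy)


def mean_coexpression(group, expression):
    """Average Pearson correlation over all pairs in the group (assumed measure)."""
    pairs = list(combinations(group, 2))
    if not pairs:
        return 0.0
    return sum(pearson(expression[a], expression[b]) for a, b in pairs) / len(pairs)


# Hypothetical profiles over 4 time points.
expression = {
    "P1": [1.0, 2.0, 3.0, 4.0],
    "P2": [2.0, 4.0, 6.0, 8.0],
    "P3": [4.0, 3.0, 2.0, 1.0],
}
level = mean_coexpression(["P1", "P2"], expression)
```

Adding the anti-correlated profile "P3" to the group lowers the level sharply, which is the behavior a co-expression criterion needs in order to reject unrelated proteins.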

1. Overview of software

1.1 principle

The software suite ICJointLE V1.0 organizes protein localization data, gene expression data, protein interaction data and protein GO similarity data as matrices, and then, based on the core-attachment structure, applies a seed expansion strategy in 3 steps (protein core mining, attachment protein addition and candidate protein complex screening) to identify jointly co-localized and co-expressed protein complexes.
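As an illustration of the matrix organization, the sketch below turns a list of scored interactions into a symmetric interaction matrix. The function name, the input tuple layout and the scaling of scores to the range 0 to 1 (by dividing by an assumed maximum of 1000) are assumptions, not details taken from the suite; the example scores are hypothetical.

```python
def build_interaction_matrix(scored_pairs, max_score=1000.0):
    """Build a symmetric protein-by-protein matrix of scaled reliability scores.

    scored_pairs: iterable of (protein_a, protein_b, score) tuples.
    Returns the sorted protein list (row/column order) and the matrix.
    """
    scored_pairs = list(scored_pairs)
    proteins = sorted({p for a, b, _ in scored_pairs for p in (a, b)})
    index = {p: i for i, p in enumerate(proteins)}
    n = len(proteins)
    matrix = [[0.0] * n for _ in range(n)]
    for a, b, score in scored_pairs:
        i, j = index[a], index[b]
        # Interactions are undirected, so both triangles get the same value.
        matrix[i][j] = matrix[j][i] = score / max_score
    return proteins, matrix


# Hypothetical scores for two interactions.
proteins, matrix = build_interaction_matrix([
    ("YKL171W", "YML096W", 900),
    ("YFL017W-A", "YFR031C-A", 450),
])
```

Keeping one shared protein order for every matrix (interaction, localization, expression, GO similarity) lets row `i` refer to the same protein throughout.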

The software suite ICJointLE V1.0 is divided into 2 phases. The first stage is the matrix data preparation stage. This stage in turn generates an interaction matrix with reliability scores between proteins, a protein localization matrix, a gene expression matrix, a CC-based protein similarity matrix, an MF-based protein similarity matrix, and a BP-based protein similarity matrix. The second phase is the protein complex recognition phase. At this stage, according to the core attachment structure, a seed expansion strategy is applied, firstly, a densely and reliably connected combined co-localization combined co-expression protein core is excavated, then, a strongly and reliably connected combined co-localization combined co-expression protein attachment is added, and finally, an overlapping compound with low reliable connection density is deleted.
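The last two steps of the second phase can be sketched as follows. Both rules are illustrative assumptions: an attachment is taken to be a neighbor whose average reliability score to the core members reaches `min_link`, and two complexes are taken to overlap when the shared fraction of the smaller one exceeds `max_overlap`, in which case the one with the lower weighted density is dropped. None of these names or thresholds come from the suite, and complexes are assumed to be non-empty sets.

```python
from itertools import combinations


def weighted_density(nodes, w):
    """Sum of reliability scores on internal edges divided by the number of pairs."""
    nodes = list(nodes)
    if len(nodes) < 2:
        return 0.0
    total = sum(w.get(a, {}).get(b, 0.0) for a, b in combinations(nodes, 2))
    return total / (len(nodes) * (len(nodes) - 1) / 2)


def add_attachments(core, w, min_link):
    """Attach neighbours whose mean score to the core members is at least min_link."""
    result = set(core)
    neighbours = set().union(*(w.get(p, {}).keys() for p in core)) - set(core)
    for cand in neighbours:
        link = sum(w.get(cand, {}).get(p, 0.0) for p in core) / len(core)
        if link >= min_link:
            result.add(cand)
    return result


def screen_overlaps(complexes, w, max_overlap):
    """Keep complexes in order of decreasing density, dropping heavy overlaps."""
    kept = []
    for c in sorted(complexes, key=lambda s: weighted_density(s, w), reverse=True):
        if all(len(c & k) / min(len(c), len(k)) <= max_overlap for k in kept):
            kept.append(c)
    return kept


# Hypothetical symmetric network with reliability scores in [0, 1].
w = {
    "A": {"B": 0.9, "C": 0.8, "D": 0.2, "E": 0.7},
    "B": {"A": 0.9, "C": 0.7, "E": 0.6},
    "C": {"A": 0.8, "B": 0.7, "D": 0.1, "E": 0.8},
    "D": {"A": 0.2, "C": 0.1},
    "E": {"A": 0.7, "B": 0.6, "C": 0.8},
}
extended = add_attachments({"A", "B", "C"}, w, min_link=0.5)
kept = screen_overlaps([extended, {"A", "B", "C"}, {"C", "D"}], w, max_overlap=0.5)
```

Processing candidates in order of decreasing density means that, whenever two candidates overlap too much, the denser one is always the survivor.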

1.2 scheme

The operation flow of the present system is shown in fig. 3.

2. Operating environment

The experimental environment is shown in table 1.

TABLE 1 Experimental Environment

3. Instructions for use

3.1 software suite deployment

The software suite ICJointLE V1.0 is composed of a set of program modules running under a console and a plurality of related public data sets, and a user can deploy the software suite ICJointLE V1.0 into a designated folder.

3.1.1 software suite Structure and specific files

Under the folder designated by the user, the directory structure of the package of files is as follows.

3.1.2 software suite usage

The software suite ICJointLE V1.0 is run in the following two steps.

(1) Data preparation phase

preparing_data creates a default directory "yourdata" under the current directory.

Or

preparing_data datadir creates a directory "datadir" under the current directory.

Or

preparing_data datadir your_PPIs.txt generates all matrix data files within the directory "datadir" containing your_PPIs.txt.

The PPIs file format:

After creating the directory "yourdata" or "datadir", copy the user's PPIs file (e.g., your_PPIs.txt) into that directory. Note that the PPIs file must conform to the following format.

your_PPIs.txt: each line contains a pair of systematic names separated by a tab

YKL171W	YML096W

YFL017W-A	YFR031C-A

...
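A PPIs file in this format can be checked before the data preparation step with a small reader such as the one below. The function `read_ppi_pairs` is a hypothetical helper, not part of the software suite; it accepts any open text handle and rejects lines that do not contain exactly two tab-separated names.

```python
import io


def read_ppi_pairs(handle):
    """Parse lines of the form 'NAME_A<TAB>NAME_B' into a list of pairs."""
    pairs = []
    for line_no, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 2 or not all(f.strip() for f in fields):
            raise ValueError(
                f"line {line_no}: expected two tab-separated names, got {line!r}"
            )
        pairs.append((fields[0].strip(), fields[1].strip()))
    return pairs


# An in-memory stand-in for your_PPIs.txt.
sample = io.StringIO("YKL171W\tYML096W\nYFL017W-A\tYFR031C-A\n")
pairs = read_ppi_pairs(sample)
```

A line in which the tab has been lost, so that the two names run together, raises a `ValueError` that reports the line number instead of being silently misread.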

Thus, all matrix data files can be generated using the following format.

preparing_data yourdata your_PPIs.txt

Or

preparing_data datadir your_PPIs.txt

After the data preparation phase is completed, the directory "yourdata" or "datadir" (assumed to be "yourdata") contains the files listed in table 2.

TABLE 2 associated data files

(2) Identification and evaluation phase

At this stage, protein complexes are first identified by the core algorithm ICJointLE under the parameter settings listed in table 3, and then the quality of the identified complexes is evaluated with reference to CYC2008.

Optional parameters

TABLE 3 optional parameter description file

Examples

Setting all optional parameters

identify_and_analyze yourdata your_PPIs.txt -L 1 -r 999 -d 0.3 -c 0.7 -f 0.75 -p 0.3 -m 0.08 -u 0.01 -e 0.9

Setting some optional parameters (the others take their default values)

identify_and_analyze yourdata your_PPIs.txt -r 990 -c 0.6 -f 0.8 -p 0.1 -m 0.4 -e 0.7

Using the default value of every optional parameter (as listed in Table 3)

identify_and_analyze STRING STRING_PPIs.txt

identify_and_analyze BioGrid BioGrid_PPIs.txt

identify_and_analyze DIP DIP_PPIs.txt
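When these commands are generated from a script, assembling the arguments as a list avoids the spacing mistakes that are easy to make when typing them by hand. The helper below is a hypothetical convenience wrapper, not part of the software suite; it only builds the argument list and does not run anything. The flag letters are those used in the examples above, and their meanings are defined in table 3.

```python
def build_identify_command(data_dir, ppi_file, **options):
    """Return the argument list for an identify_and_analyze call.

    Each keyword becomes a '-<flag> <value>' pair, in the order given.
    """
    args = ["identify_and_analyze", data_dir, ppi_file]
    for flag, value in options.items():
        args += [f"-{flag}", str(value)]
    return args


cmd = build_identify_command(
    "yourdata", "your_PPIs.txt", r=990, c=0.6, f=0.8, p=0.1, m=0.4, e=0.7
)
print(" ".join(cmd))
```

Omitting every keyword yields the all-defaults form, for example `build_identify_command("STRING", "STRING_PPIs.txt")`.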

3.2 Related data

The data currently used by the software suite ICJointLE V1.0 are Saccharomyces cerevisiae (yeast) data sets. Saccharomyces cerevisiae has been studied extensively as a model organism, and a large amount of biological data is available for it, which is the main reason this study uses Saccharomyces cerevisiae data sets. In the experiments, 6 yeast PPI data sets were selected. The first data set is from version 10 of the STRING database and contains 6418 proteins and 939998 interactions, each with a reliability score. The second data set consists of 5811 proteins and 256516 interactions, derived from the yeast PPI data of version 3.4.128 of the BioGrid database. The third yeast PPI data set is derived from the DIP database release of 2015/07/01 and comprises 5022 proteins and 22381 interactions. The remaining 3 data sets are yeast binary interaction data generated by yeast two-hybrid experiments: Uetz, Ito and Yu. The Uetz data set contains 910 proteins and 823 interactions, the Ito data set consists of 765 proteins and 733 interactions, and the Yu data set consists of 1203 proteins and 1610 interactions.

CYC2008, used as the set of known complexes, contains 408 manually curated heteromeric protein complexes. The gene expression data GSE3431 contains not only gene expression data for 3 consecutive metabolic cycles, but also GO term annotations, in 3 categories, of the expressed genes. Yeast protein localization data was derived from http:// yeastgfp. Some proteins in the CYC2008 and PPI data sets have no protein localization data. To accurately identify as many protein complexes in CYC2008 as possible, the localization vector of each protein lacking localization data is set to all "1"s when the joint co-localization count is calculated for a group of proteins that contains such proteins. Thus, the method of the invention, ICJointLE, can still identify the complexes in CYC2008 that contain proteins lacking localization data.

3.3 output results

The software suite ICJointLE V1.0 produces the identified complexes and their quality evaluation, and stores the output as files in the "complexes" subdirectory, as listed in Table 4.

TABLE 4 identified complexes and their quality evaluation

Example 2: example of user operation

As shown in FIG. 4, assume that the software suite ICJointLE V1.0 is installed in the folder d:\ICJointLE V1.0.

1. Data preparation phase

The related data generation process is described by taking the STRING data set as an example.

Creating a data set folder

Enter the program module folder of the software suite ICJointLE V1.0 at the command line, and then execute the batch command preparing_data.bat in the following format; the operation process is shown in fig. 5.

As shown in fig. 6, a folder named STRING is created.

Preparing PPI dataset files

The PPI file (STRING_PPIs.txt) that meets the format requirements is copied into the folder STRING, as shown in fig. 7.

Generating matrix data files

Enter the program module folder of the software suite ICJointLE V1.0 at the command line, and then execute the batch command preparing_data.bat in the following format; the operation process is shown in fig. 8.

preparing_data STRING STRING_PPIs.txt

After the data preparation phase is complete, a series of matrix data files is generated in the folder STRING (see FIG. 9).

2. Identification and evaluation of protein complexes

Enter the program module folder of the software suite ICJointLE V1.0 at the command line, and then execute the batch command in the following format; the operation process is shown in FIG. 10.

identify_and_analyze STRING STRING_PPIs.txt -L 1 -r 999 -d 0.3 -c 0.7 -f 0.75 -p 0.3 -m 0.08 -u 0.01 -e 0.9

After the identification and evaluation phase is complete, the files shown in FIG. 11 are generated in the subfolder "complexes" of the folder STRING.

The technical effects of the present invention will be described in detail with reference to experiments.

To reflect the quality of the protein complexes identified by the software suite, tables 5-7 compare the evaluation indexes of the complexes identified on the STRING PPI data set by 9 algorithms, including ICJointLE, from 3 perspectives: exact matching, approximate matching and biological relevance.

Table 5 compares the numbers of exactly matched complexes. The total number of complexes exactly identified by the software suite ICJointLE is clearly larger than that of the other algorithms, especially for complexes of size 2-3.

TABLE 5 comparison of quantity distributions of accurately identified protein complexes on different scales

Table 6 compares the evaluation indexes for approximate matching. The software suite ICJointLE outperforms the other algorithms on every index except the Sn index.

TABLE 6 comparison of evaluation indexes for identifying protein complexes

Table 7 compares the significance of functional enrichment in BP terms. The percentage of complexes identified by ICJointLE that are significantly enriched in BP terms is greater than that of the other algorithms, both overall and within the different size groups.

Table 7 identification of protein Complex BP enrichment significance comparison
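Exact and approximate matching against a reference set such as CYC2008 can be sketched as follows. The text does not name the score used for approximate matching; the sketch assumes the neighborhood affinity score, |A∩B|² / (|A|·|B|), with a threshold of 0.2, which is a common convention in the complex-detection literature. The function names and the toy complexes are hypothetical.

```python
def neighborhood_affinity(a, b):
    """Overlap score |A & B|^2 / (|A| * |B|) between two protein sets."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) ** 2 / (len(a) * len(b))


def count_matches(predicted, reference, threshold=0.2):
    """Return (exactly matched, approximately matched) counts of predicted complexes."""
    exact = sum(
        1 for p in predicted if any(set(p) == set(r) for r in reference)
    )
    approx = sum(
        1 for p in predicted
        if any(neighborhood_affinity(p, r) >= threshold for r in reference)
    )
    return exact, approx


# Hypothetical predicted and reference complexes.
predicted = [{"A", "B", "C"}, {"D", "E"}, {"X", "Y"}]
reference = [{"A", "B", "C"}, {"D", "E", "F", "G"}]
exact, approx = count_matches(predicted, reference)
```

In this toy case one predicted complex matches a reference complex exactly, and a second one matches approximately with a score of 0.5.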

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. A protein interaction network co-localization co-expression complex identification method is characterized by comprising the following steps:

step three, a core algorithm ICJointLE identifies a protein complex under the parameter tuning setting, and the process is divided into the following 3 sequential steps:

(1) protein complex core recognition phase: excavating a densely and reliably connected combined co-location co-expression protein core by applying a seed expansion strategy according to the core-attachment structure;

(3) overlapping protein complex screening phase: deleting overlapping complexes of low reliable ligation density;

2. The method for identifying protein interaction network co-localized co-expression complexes of claim 1, wherein the method uses Saccharomyces cerevisiae (yeast) data sets.

3. The method for identifying protein interaction network co-localized co-expression complexes of claim 1, wherein CYC2008, used as the set of known complexes, comprises 408 manually curated heteromeric protein complexes; the gene expression data GSE3431 contains not only gene expression data for 3 consecutive metabolic cycles, but also GO term annotations, in 3 categories, of the expressed genes.

4. The method for identifying protein interaction network co-localized co-expression complexes of claim 1, wherein ICJointLE identifies complexes in CYC2008 that contain proteins lacking localization data as follows: some of the proteins in the CYC2008 and PPI data sets have no protein localization data, and when the joint co-localization count is calculated for a group of proteins that contains such proteins, the localization vector of each protein lacking localization data is set to all 1's.

5. A protein-interacting network co-localized co-expression complex recognition system for implementing the method of any one of claims 1 to 4, wherein the protein-interacting network co-localized co-expression complex recognition system comprises:

the protein complex core mining module is used for mining the densely and reliably connected joint co-localization joint co-expression protein core;

a protein complex attachment addition module for adding a strongly reliably linked co-localized co-expressed protein attachment;

a protein complex screening module for deleting overlapping complexes of low reliable ligation density.

6. An information data processing terminal for implementing the method for identifying a co-localized and co-expressed complex in a protein interaction network according to any one of claims 1 to 4.

7. A computer-readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the method for protein-interacting network co-localized co-expression complex identification of any one of claims 1-4.