CN111028887B

CN111028887B - Method and device for identifying ncRNA (non-coding ribonucleic acid) cooperative competition network

Info

Publication number: CN111028887B
Application number: CN201911229601.9A
Authority: CN
Inventors: 张俊鹏; 饶妮妮; 王光斌
Original assignee: University of Electronic Science and Technology of China; Guangdong Electronic Information Engineering Research Institute of UESTC
Current assignee: University of Electronic Science and Technology of China; Guangdong Electronic Information Engineering Research Institute of UESTC
Priority date: 2019-12-04
Filing date: 2019-12-04
Publication date: 2021-04-06
Anticipated expiration: 2039-12-04
Also published as: CN111028887A

Abstract

The invention provides a method and a device for identifying an ncRNA (non-coding ribonucleic acid) cooperative competition network, and relates to the technical field of gene identification. The method acquires ncRNA and mRNA expression profile data of a target disease type matching sample and, according to the ncRNA and mRNA expression profile data and preset ncRNA-mRNA competition relationship data, determines that an ncRNA-ncRNA pair consisting of two ncRNAs meeting preset conditions in the expression profile data is a cooperative competition relationship pair. In this way, ncRNA-ncRNA cooperative competition relationship pairs can be identified, an ncRNA cooperative competition network in which multiple ncRNAs cooperatively compete with target gene mRNAs can be recognized, and a reference can be provided for clinical diagnosis and targeted treatment of complex human diseases such as cancer.

Description

Method and device for identifying ncRNA (non-coding ribonucleic acid) cooperative competition network

Technical Field

The invention relates to the technical field of gene identification, in particular to a method and a device for identifying ncRNA (non-coding ribonucleic acid) cooperative competition networks.

Background

Micro ribonucleic acid (microRNA, miRNA) is an endogenous small non-coding regulatory RNA molecule about 22 nucleotides in length that can regulate the expression level of the messenger RNA (mRNA) of protein-coding genes. Existing research shows that miRNAs play an important regulatory role in biological processes such as cell differentiation, cell proliferation, cell growth, cell migration, cell apoptosis, and cancer. According to the competing endogenous RNA (ceRNA) hypothesis, gene expression is regulated through competition among different gene transcripts for miRNA response elements (MREs). Transcripts in such a competitive relationship are collectively called ceRNAs and include protein-coding mRNAs, long non-coding RNAs (lncRNAs), pseudogene transcripts (pseudogenes), and circular RNAs (circRNAs); the RNA regulatory network they form is called the ceRNA interaction network.

The ceRNA interaction network is closely related to many complex human diseases (such as cancer) and can serve as a novel biomarker for the diagnosis and targeted treatment of such diseases, providing a reference for the clinical diagnosis and targeted treatment of complex human diseases such as cancer.

In general, in a ceRNA interaction network involving non-coding RNA (ncRNA), the competition relationship between ncRNAs and target gene mRNAs is many-to-many. This competition relationship indicates that multiple ncRNAs can cooperatively compete with target gene mRNAs, forming an ncRNA cooperative competition network. Although research on the cooperative competition relationships in such a network can help to understand the cooperative competition mechanism of ncRNAs in complex human diseases, the prior art offers no feasible scheme for identifying the ncRNA cooperative competition network.

Disclosure of Invention

The invention aims to provide a method and a device for identifying a ncRNA (non-coding ribonucleic acid) cooperative competition network, which can screen the ncRNA cooperative competition network associated with complex diseases and provide reference for clinical diagnosis and targeted treatment of human complex diseases such as cancer.

In a first aspect, an embodiment of the present invention provides a method for identifying an ncRNA cooperative competition network, including: acquiring ncRNA and mRNA expression profile data of a target disease type matching sample; and determining ncRNA-ncRNA pairs consisting of two ncRNAs meeting preset conditions in the ncRNA and mRNA expression profile data to form a cooperative competition relationship pair according to the ncRNA and mRNA expression profile data and preset ncRNA-mRNA competition relationship data.

In an alternative embodiment, determining, according to the ncRNA and mRNA expression profile data and the preset ncRNA-mRNA competition relationship data, that an ncRNA-ncRNA pair consisting of two ncRNAs meeting preset conditions in the ncRNA and mRNA expression profile data is a cooperative competition relationship pair comprises: obtaining an ncRNA₁-ncRNA₂ pair composed of ncRNA₁ and ncRNA₂ in the ncRNA and mRNA expression profile data; calculating, according to the preset ncRNA-mRNA competition relationship data, the cooperative competition mRNA statistical significance probability value, the positive correlation significance probability value, and the sensitivity partial correlation coefficient value corresponding to the ncRNA₁-ncRNA₂ pair; and if the ncRNA₁-ncRNA₂ pair simultaneously satisfies that the cooperative competition mRNA statistical significance probability value is smaller than a first threshold, the positive correlation significance probability value is smaller than a second threshold, and the sensitivity partial correlation coefficient value is larger than a third threshold, determining that the ncRNA₁-ncRNA₂ pair is a cooperative competition relationship pair.

In an alternative embodiment, calculating the cooperative competition mRNA statistical significance probability value corresponding to the ncRNA₁-ncRNA₂ pair comprises: measuring, by a hypergeometric distribution test according to the preset ncRNA-mRNA competition relationship data, the statistical significance probability value of the mRNAs with which ncRNA₁ and ncRNA₂ in the ncRNA₁-ncRNA₂ pair cooperatively compete.

In an alternative embodiment, calculating the positive correlation significance probability value corresponding to the ncRNA₁-ncRNA₂ pair comprises: calculating the Pearson correlation coefficient between ncRNA₁ and ncRNA₂ in the ncRNA₁-ncRNA₂ pair; and calculating the positive correlation significance probability value according to the Pearson correlation coefficient.

In an alternative embodiment, calculating the sensitivity partial correlation coefficient value corresponding to the ncRNA₁-ncRNA₂ pair comprises: calculating the sensitivity partial correlation coefficient value according to the correlation coefficient value between ncRNA₁ and ncRNA₂ in the ncRNA₁-ncRNA₂ pair and the corresponding partial correlation coefficient value between ncRNA₁ and ncRNA₂ conditioned on the mRNAs.

In an alternative embodiment, determining that the ncRNA₁-ncRNA₂ pair is a cooperative competition relationship pair, if the pair simultaneously satisfies that the cooperative competition mRNA statistical significance probability value is smaller than the first threshold, the positive correlation significance probability value is smaller than the second threshold, and the sensitivity partial correlation coefficient value is larger than the third threshold, comprises: if the cooperative competition mRNA statistical significance probability value is less than 0.05, the positive correlation significance probability value is less than 0.05, and the sensitivity partial correlation coefficient value is greater than 0.1, determining that the ncRNA₁-ncRNA₂ pair is a cooperative competition relationship pair.

In an optional embodiment, before determining, according to the ncRNA and mRNA expression profile data and the preset ncRNA-mRNA competition relationship data, that an ncRNA-ncRNA pair consisting of two ncRNAs meeting preset conditions in the ncRNA and mRNA expression profile data is a cooperative competition relationship pair, the method further includes: obtaining the ncRNA-mRNA competition relationship data by fusing a plurality of different databases to acquire prior ncRNA-mRNA competition network data associated with the ncRNA and mRNA expression profile data of the target disease type matching sample.

In an optional embodiment, before determining, according to the ncRNA and mRNA expression profile data and the preset ncRNA-mRNA competition relationship data, that an ncRNA-ncRNA pair consisting of two ncRNAs meeting preset conditions in the ncRNA and mRNA expression profile data is a cooperative competition relationship pair, the method further includes: preprocessing the ncRNA and mRNA expression profile data to remove duplicate entries as well as ncRNAs and mRNAs without gene names.
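The preprocessing step can be illustrated with a minimal sketch. It assumes, purely for illustration, that the expression profile is held as a list of (gene name, expression values) rows; the source does not specify a data layout, and keeping the first occurrence of a repeated gene name is likewise an assumption. The function name `preprocess_expression` and the toy rows are hypothetical.

```python
def preprocess_expression(rows):
    """Drop rows without a gene name and repeated gene names (first occurrence kept)."""
    seen = set()
    cleaned = []
    for name, values in rows:
        # Treat None, empty strings and whitespace-only strings as "no gene name".
        if name is None or not str(name).strip():
            continue
        key = str(name).strip()
        if key in seen:
            continue  # repeated entry: skip everything after the first occurrence
        seen.add(key)
        cleaned.append((key, list(values)))
    return cleaned


# Toy rows (illustrative values only, not real expression data).
rows = [
    ("geneA", [1.2, 3.4]),
    ("", [0.5, 0.7]),
    ("geneB", [2.0, 2.1]),
    ("geneA", [1.2, 3.4]),
    (None, [9.9, 9.9]),
]
cleaned = preprocess_expression(rows)
```

Other de-duplication policies (for example, averaging repeated rows) would fit the same interface; the text only states that repeated items are removed.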

In an alternative embodiment, the method further comprises: evaluating the ncRNA-ncRNA cooperative competition network formed by the ncRNA₁-ncRNA₂ pairs determined to be cooperative competition relationship pairs, in the following manner:

1) fitting whether the connectivity of the ncRNA-ncRNA cooperative competition network obeys a power law distribution, to determine whether the network is a scale-free network;

2) determining the nodes whose connectivity ranks in the top 10% of the ncRNA-ncRNA cooperative competition network as hub nodes;

3) determining an ncRNA₁-ncRNA₂ pair in which both ncRNAs are associated with the target disease type as an ncRNA-ncRNA cooperative competition pair corresponding to the target disease type;

4) identifying ncRNA-ncRNA cooperative competition modules in the ncRNA-ncRNA cooperative competition network by using a Markov clustering algorithm;

5) according to the prior ncRNAs associated with the target disease type and a hypergeometric distribution test, determining an ncRNA-ncRNA cooperative competition module whose significance probability value of functional association with the target disease type is less than 0.05 as an ncRNA-ncRNA cooperative competition module corresponding to the target disease type;

6) for each ncRNA-ncRNA cooperative competition module, calculating a risk value of each target disease type matching sample by using a multivariate Cox model; dividing the target disease type matching samples into a high-risk sample set and a low-risk sample set according to their risk values; calculating a risk value according to the high-risk sample set and the low-risk sample set; calculating the significance probability value of the difference in survival time between the high-risk sample set and the low-risk sample set by a log-rank test, to obtain a log-rank test significance probability value; and determining an ncRNA-ncRNA cooperative competition module with a risk value greater than 1 and a log-rank test significance probability value less than 0.05 as a biomarker of the target disease type.
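Item 2) above, the selection of highly connected nodes, can be sketched as follows. The sketch assumes that "connectivity" means node degree in the undirected network of cooperative competition pairs, that the 10% cut-off is rounded up, and that at least one node is returned for a non-empty network; none of these details is fixed by the text. The function name `hub_nodes` and the toy network are hypothetical.

```python
import math
from collections import Counter


def hub_nodes(pairs, top_fraction=0.10):
    """Return the nodes whose degree ranks in the top `top_fraction` of the network."""
    degree = Counter()
    for first, second in pairs:
        degree[first] += 1
        degree[second] += 1
    # Rank by degree (descending); break ties by name so the result is deterministic.
    ranked = sorted(degree, key=lambda node: (-degree[node], node))
    n_hubs = max(1, math.ceil(top_fraction * len(ranked)))
    return ranked[:n_hubs]


# Toy network: node "A" is paired with nine other nodes, plus one extra edge B-C.
pairs = [("A", other) for other in "BCDEFGHIJ"] + [("B", "C")]
hubs = hub_nodes(pairs)
```

With ten nodes, the top 10% corresponds to a single node, so only the most connected node is reported as a hub.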

In alternative embodiments, the target disease type includes any of the following: glioblastoma multiforme, squamous cell carcinoma of the lung, ovarian cancer, and prostate cancer.

In alternative embodiments, the ncRNA comprises any one of: long non-coding RNA, circular RNA, and pseudogenes.

In a second aspect, an embodiment of the present invention provides an apparatus for identifying ncRNA cooperative competition network, including: the acquisition module is used for acquiring ncRNA and mRNA expression profile data of the target disease type matching sample; and the recognition module is used for determining that the ncRNA-ncRNA pair formed by two ncRNAs meeting the preset conditions in the ncRNA and mRNA expression profile data is a cooperative competition relationship pair according to the ncRNA and mRNA expression profile data and the preset ncRNA-mRNA competition relationship data.

In an alternative embodiment, the identification module comprises: an acquisition submodule, configured to obtain an ncRNA₁-ncRNA₂ pair composed of ncRNA₁ and ncRNA₂ in the ncRNA and mRNA expression profile data; a calculation submodule, configured to calculate, according to the preset ncRNA-mRNA competition relationship data, the cooperative competition mRNA statistical significance probability value, the positive correlation significance probability value, and the sensitivity partial correlation coefficient value corresponding to the ncRNA₁-ncRNA₂ pair; and an identification submodule, configured to determine that the ncRNA₁-ncRNA₂ pair is a cooperative competition relationship pair if the pair simultaneously satisfies that the cooperative competition mRNA statistical significance probability value is smaller than a first threshold, the positive correlation significance probability value is smaller than a second threshold, and the sensitivity partial correlation coefficient value is larger than a third threshold.

In an alternative embodiment, the calculation submodule is specifically configured to measure, by a hypergeometric distribution test according to the preset ncRNA-mRNA competition relationship data, the statistical significance probability value of the mRNAs with which ncRNA₁ and ncRNA₂ in the ncRNA₁-ncRNA₂ pair cooperatively compete.

In an alternative embodiment, the calculation submodule is specifically configured to calculate the Pearson correlation coefficient between ncRNA₁ and ncRNA₂ in the ncRNA₁-ncRNA₂ pair, and to calculate the positive correlation significance probability value according to the Pearson correlation coefficient.

In an alternative embodiment, the calculation submodule is specifically configured to calculate the sensitivity partial correlation coefficient value according to the correlation coefficient value between ncRNA₁ and ncRNA₂ in the ncRNA₁-ncRNA₂ pair and the corresponding partial correlation coefficient value between ncRNA₁ and ncRNA₂ conditioned on the mRNAs.

In an alternative embodiment, the identification submodule is specifically configured to determine that the ncRNA₁-ncRNA₂ pair is a cooperative competition relationship pair if the cooperative competition mRNA statistical significance probability value is less than 0.05, the positive correlation significance probability value is less than 0.05, and the sensitivity partial correlation coefficient value is greater than 0.1.

In an alternative embodiment, the apparatus further comprises: a competition data module, configured to acquire, by fusing a plurality of different databases, prior ncRNA-mRNA competition network data associated with the ncRNA and mRNA expression profile data of the target disease type matching sample, before the recognition module determines, according to the ncRNA and mRNA expression profile data and the preset ncRNA-mRNA competition relationship data, that an ncRNA-ncRNA pair consisting of two ncRNAs meeting the preset conditions in the ncRNA and mRNA expression profile data is a cooperative competition relationship pair.

In an alternative embodiment, the apparatus further comprises: a preprocessing module, configured to preprocess the ncRNA and mRNA expression profile data to remove duplicate entries as well as ncRNAs and mRNAs without gene names, before the recognition module determines, according to the ncRNA and mRNA expression profile data and the preset ncRNA-mRNA competition relationship data, that an ncRNA-ncRNA pair consisting of two ncRNAs meeting the preset conditions in the ncRNA and mRNA expression profile data is a cooperative competition relationship pair.

In an alternative embodiment, the apparatus further comprises: an evaluation module, configured to evaluate the ncRNA-ncRNA cooperative competition network formed by the ncRNA₁-ncRNA₂ pairs determined to be cooperative competition relationship pairs, in the following manner:

1) fitting whether the connectivity of the ncRNA-ncRNA cooperative competition network obeys a power law distribution, to determine whether the network is a scale-free network;

2) determining the nodes whose connectivity ranks in the top 10% of the ncRNA-ncRNA cooperative competition network as hub nodes;

3) determining an ncRNA₁-ncRNA₂ pair in which both ncRNAs are associated with the target disease type as an ncRNA-ncRNA cooperative competition pair corresponding to the target disease type;

4) identifying ncRNA-ncRNA cooperative competition modules in the ncRNA-ncRNA cooperative competition network by using a Markov clustering algorithm;

5) according to the prior ncRNAs associated with the target disease type and a hypergeometric distribution test, determining an ncRNA-ncRNA cooperative competition module whose significance probability value of functional association with the target disease type is less than 0.05 as an ncRNA-ncRNA cooperative competition module corresponding to the target disease type;

6) for each ncRNA-ncRNA cooperative competition module, calculating a risk value of each target disease type matching sample by using a multivariate Cox model; dividing the target disease type matching samples into a high-risk sample set and a low-risk sample set according to their risk values; calculating a risk value according to the high-risk sample set and the low-risk sample set; calculating the significance probability value of the difference in survival time between the high-risk sample set and the low-risk sample set by a log-rank test, to obtain a log-rank test significance probability value; and determining an ncRNA-ncRNA cooperative competition module with a risk value greater than 1 and a log-rank test significance probability value less than 0.05 as a biomarker of the target disease type.

In a third aspect, an embodiment of the present invention provides an ncRNA cooperative competition network identification device, including: a processor, a storage medium, and a bus, wherein the storage medium stores machine-readable instructions executable by the processor; when the ncRNA cooperative competition network identification device operates, the processor and the storage medium communicate through the bus, and the processor executes the machine-readable instructions to perform the method of the first aspect.

In a fourth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the method according to the first aspect.

The invention has the beneficial effects that:

According to the invention, ncRNA and target gene mRNA expression profile data of a target disease type matching sample are acquired, and, according to the ncRNA and mRNA expression profile data and preset ncRNA-mRNA competition relationship data, an ncRNA-ncRNA pair consisting of two ncRNAs meeting preset conditions in the ncRNA and mRNA expression profile data is determined to be a cooperative competition relationship pair. In this way, ncRNA-ncRNA cooperative competition relationship pairs can be identified, and an ncRNA cooperative competition network in which multiple ncRNAs cooperatively compete with target gene mRNAs can be recognized, so that a reference can be provided for clinical diagnosis and targeted treatment of complex human diseases such as cancer.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

Fig. 1 is a schematic flow chart of an ncRNA cooperative competition network identification method according to an embodiment of the present invention;

Fig. 2 is another schematic flow chart of the ncRNA cooperative competition network identification method according to an embodiment of the present invention;

Fig. 3 is a schematic flow chart of an ncRNA cooperative competition network identification method according to an embodiment of the present invention;

Fig. 4 is a schematic flow chart of an ncRNA cooperative competition network identification method according to an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of an ncRNA cooperative competition network identification apparatus according to an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of an identification module according to an embodiment of the present invention;

Fig. 7 is another schematic structural diagram of an ncRNA cooperative competition network identification apparatus according to an embodiment of the present invention;

Fig. 8 is a schematic structural diagram of an ncRNA cooperative competition network identification apparatus according to an embodiment of the present invention;

Fig. 9 is a schematic structural diagram of an ncRNA cooperative competition network identification apparatus according to an embodiment of the present invention;

Fig. 10 is a schematic structural diagram of an ncRNA cooperative competition network identification device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.

Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that: like reference numbers and letters denote like items in the following figures and formulas, and thus, once an item is defined in one figure or formula, it need not be further defined and explained in subsequent figures or formulas. It should also be noted that the descriptions of first, second, third, etc. are merely used for distinguishing and are not intended to indicate relative importance.

Some embodiments of the invention are described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.

The embodiment of the invention provides a ncRNA cooperative competition network identification method, wherein an execution main body of the ncRNA cooperative competition network identification method can be a terminal device with computing capacity, for example: a desktop computer, a notebook computer, a server, a cloud, a customized terminal or an intelligent terminal, etc., which are not limited herein.

Fig. 1 is a schematic flow diagram of a ncRNA cooperative competition network identification method according to an embodiment of the present invention, and as shown in fig. 1, the ncRNA cooperative competition network identification method may include:

S110, acquiring ncRNA and mRNA expression profile data of the target disease type matching sample.

The target disease type may include any one of: glioblastoma multiforme, squamous cell carcinoma of the lung, ovarian cancer, and prostate cancer. The present invention does not specifically limit the target disease type.

Taking glioblastoma multiforme (GBM) as the target disease type and lncRNA as the ncRNA as an example, the ncRNA and mRNA expression profile data of the target disease type matching sample can be obtained as follows: lncRNA and mRNA expression profile data of glioblastoma multiforme matched samples are collected from The Cancer Genome Atlas (TCGA), an internationally known cancer gene expression profile database. The address of TCGA is https:// cancerrgeneme.

S120, determining, according to the ncRNA and mRNA expression profile data and preset ncRNA-mRNA competition relationship data, that an ncRNA-ncRNA pair consisting of two ncRNAs meeting preset conditions in the ncRNA and mRNA expression profile data is a cooperative competition relationship pair.

In an optional embodiment, before determining, according to the ncRNA and mRNA expression profile data and the preset ncRNA-mRNA competition relationship data, that an ncRNA-ncRNA pair consisting of two ncRNAs meeting preset conditions in the ncRNA and mRNA expression profile data is a cooperative competition relationship pair, the method may further include: acquiring, by fusing a plurality of different databases, prior ncRNA-mRNA competition network data associated with the ncRNA and mRNA expression profile data of the target disease type matching sample, thereby obtaining the ncRNA-mRNA competition relationship data.

For example, a plurality of different databases can be fused in advance to obtain prior ncRNA-mRNA competition network data associated with the ncRNA and mRNA expression profile data of the target disease type matching sample, so as to obtain the prior ncRNA-mRNA competition relationships.

The ncRNA-mRNA competition relationship refers to the competition relationship between an ncRNA and an mRNA that share MREs. The prior ncRNA-mRNA competition network data can be computationally predicted or experimentally validated data, and may originate from a single database or from the merging of multiple different databases.

Again taking glioblastoma multiforme as an example, the lncRNA-mRNA competition relationship pairs associated with the glioblastoma multiforme expression profile data can be obtained by integrating the four databases miRSponge, LncCeRBase, LncACTdb v2.0, and ENCORI. LncRNAs associated with glioblastoma multiforme can also be collected from the three databases LncRNADisease v2.0, Lnc2Cancer v2.0, and MNDR v2.0.
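The fusion of several databases can be sketched as a simple union of competition pairs. The sketch assumes that each database has already been reduced to a list of (ncRNA, mRNA) name pairs; the real databases named above have their own formats, which are not described in the source. The function name `merge_competition_pairs` and the placeholder identifiers are hypothetical and do not come from any database.

```python
from collections import defaultdict


def merge_competition_pairs(*sources):
    """Union (ncRNA, mRNA) pairs from several sources into {ncRNA: set of mRNAs}."""
    targets = defaultdict(set)
    for source in sources:
        for ncrna, mrna in source:
            targets[ncrna].add(mrna)  # sets absorb pairs reported by several sources
    return dict(targets)


# Placeholder pairs standing in for records exported from different databases.
source_one = [("lncA", "m1"), ("lncA", "m2"), ("lncB", "m2")]
source_two = [("lncA", "m2"), ("lncA", "m3"), ("lncC", "m4")]
merged = merge_competition_pairs(source_one, source_two)
```

A mapping from each ncRNA to its set of competing mRNAs is convenient for the later steps, because the mRNAs shared by two ncRNAs are then a set intersection.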

Fig. 2 is another schematic flow chart of the ncRNA cooperative competition network identification method provided in the embodiment of the present invention, and optionally, as shown in fig. 2, the step S120 may specifically include:

S121, obtaining an ncRNA₁-ncRNA₂ pair composed of ncRNA₁ and ncRNA₂ in the ncRNA and mRNA expression profile data.

Here, ncRNA₁ and ncRNA₂ in the ncRNA₁-ncRNA₂ pair represent two ncRNAs of the same type. Alternatively, the type of ncRNA can be any of long non-coding RNA (lncRNA), circular RNA (circRNA), or pseudogene transcript (also known as pseudogene).

Taking ncRNA cooperative competition involving lncRNA, circRNA, and pseudogene as an example, the cooperative competition modes may specifically include the following six types: pseudogene-pseudogene, pseudogene-circRNA, pseudogene-lncRNA, circRNA-circRNA, circRNA-lncRNA, and lncRNA-lncRNA.
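The six modes listed above are the unordered pairings, with repetition, of the three ncRNA types. A short sketch that enumerates them (the variable names are illustrative):

```python
from itertools import combinations_with_replacement

# The three ncRNA types named in the text.
NCRNA_TYPES = ["pseudogene", "circRNA", "lncRNA"]

# Unordered type pairs, allowing a type to pair with itself.
modes = [
    f"{first}-{second}"
    for first, second in combinations_with_replacement(NCRNA_TYPES, 2)
]
```

For n ncRNA types this yields n(n+1)/2 modes, which is 6 for n = 3.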

S122, calculating, according to the preset ncRNA-mRNA competition relationship data, the cooperative competition mRNA statistical significance probability value, the positive correlation significance probability value, and the sensitivity partial correlation coefficient value corresponding to the ncRNA₁-ncRNA₂ pair.

S123, if the ncRNA₁-ncRNA₂ pair simultaneously satisfies that the cooperative competition mRNA statistical significance probability value is smaller than a first threshold, the positive correlation significance probability value is smaller than a second threshold, and the sensitivity partial correlation coefficient value is larger than a third threshold, determining that the ncRNA₁-ncRNA₂ pair is a cooperative competition relationship pair.

In an alternative embodiment, the first threshold may be 0.05, the second threshold may be 0.05, and the third threshold may be 0.1. If the cooperative competition mRNA statistical significance probability value is less than 0.05, the positive correlation significance probability value is less than 0.05, and the sensitivity partial correlation coefficient value is greater than 0.1, the ncRNA₁-ncRNA₂ pair may be determined to be a cooperative competition relationship pair.

Alternatively, the second threshold may be the same as the first threshold, or may be different from the first threshold. It should be noted that the first threshold, the second threshold and the third threshold are only exemplary in the embodiment of the present invention, and in order to improve the accuracy of ncRNA cooperative competition network identification, a person skilled in the art may set specific values of the first threshold, the second threshold and the third threshold to other values according to actual needs, for example: the second threshold may also be 0.01, 0.001, etc., and the invention is not limited in this regard.
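The decision rule of step S123 can be sketched as a single predicate. The function and parameter names are hypothetical; the default thresholds are the example values given above (0.05, 0.05, and 0.1). Treating a missing positive correlation probability value as a failed criterion is an assumption made for this sketch.

```python
def is_cooperative_pair(
    p_shared,
    p_positive,
    sppc,
    first_threshold=0.05,
    second_threshold=0.05,
    third_threshold=0.1,
):
    """Return True only if all three criteria of step S123 hold at once."""
    # Assumption: if the positive correlation p-value was not computed
    # (represented here as None), the pair cannot satisfy the second criterion.
    if p_positive is None:
        return False
    return (
        p_shared < first_threshold
        and p_positive < second_threshold
        and sppc > third_threshold
    )
```

Stricter settings, such as a second threshold of 0.01 or 0.001, only require passing different keyword arguments, for example `is_cooperative_pair(0.01, 0.004, 0.3, second_threshold=0.001)`.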

As described above, according to the embodiment of the present invention, ncRNA and target gene mRNA expression profile data of a target disease type matching sample are acquired, and, according to the ncRNA and mRNA expression profile data and preset ncRNA-mRNA competition relationship data, an ncRNA-ncRNA pair consisting of two ncRNAs meeting preset conditions in the ncRNA and mRNA expression profile data is determined to be a cooperative competition relationship pair. This makes it possible to identify an ncRNA cooperative competition network in which multiple ncRNAs cooperatively compete with target gene mRNAs, and further provides a reference for clinical diagnosis and targeted treatment of complex human diseases such as cancer.

In an alternative embodiment, the step of calculating, in step S122 above, the cooperative competition mRNA statistical significance probability value corresponding to the ncRNA₁-ncRNA₂ pair may comprise: measuring, by a hypergeometric distribution test according to the preset ncRNA-mRNA competition relationship data, the statistical significance probability value of the mRNAs with which ncRNA₁ and ncRNA₂ in the ncRNA₁-ncRNA₂ pair cooperatively compete.

The statistical significance probability value may be calculated as follows:

$$p\text{-value} = 1 - \sum_{i=0}^{L_1 - 1} \frac{\binom{M_1}{i}\binom{N_1 - M_1}{K_1 - i}}{\binom{N_1}{K_1}}$$

wherein p-value represents the statistical significance probability value of the mRNAs with which ncRNA₁ and ncRNA₂ cooperatively compete; N₁ represents the number of all mRNAs in the data set; M₁ and K₁ represent the numbers of mRNAs competing with ncRNA₁ and ncRNA₂, respectively; and L₁ (usually not less than 3) represents the number of mRNAs with which ncRNA₁ and ncRNA₂ cooperatively compete.
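The hypergeometric test described above can be sketched as the upper-tail probability of observing at least L₁ shared mRNAs, given N₁, M₁, and K₁. This is a minimal standard-library sketch of the standard hypergeometric tail, not necessarily the patent's exact implementation; the function and argument names are hypothetical, and the upper tail is mathematically the same as one minus the cumulative probability of fewer than L₁ shared mRNAs.

```python
from math import comb


def shared_mrna_p_value(n_total, m_first, k_second, l_shared):
    """P(X >= l_shared) for a hypergeometric X.

    n_total  : number of all mRNAs in the data set (N1)
    m_first  : number of mRNAs competing with ncRNA1 (M1)
    k_second : number of mRNAs competing with ncRNA2 (K1)
    l_shared : number of mRNAs shared by both ncRNAs (L1)
    """
    denominator = comb(n_total, k_second)
    upper = min(m_first, k_second)  # the overlap cannot exceed either set size
    tail = sum(
        comb(m_first, i) * comb(n_total - m_first, k_second - i)
        for i in range(l_shared, upper + 1)
    )
    return tail / denominator


# Toy numbers: 10 mRNAs in total, 4 and 3 competitors, 2 of them shared.
p_value = shared_mrna_p_value(10, 4, 3, 2)
```

Exact integer arithmetic with `math.comb` avoids rounding error in the binomial coefficients; only the final division produces a float.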

Fig. 3 is a schematic flow chart of a ncRNA cooperative competition network identification method according to an embodiment of the present invention.

As shown in Fig. 3, in an alternative embodiment, the step of calculating, in step S122 above, the positive correlation significance probability value corresponding to the ncRNA₁-ncRNA₂ pair may comprise:

S1221, calculating the Pearson correlation coefficient between ncRNA₁ and ncRNA₂ in the ncRNA₁-ncRNA₂ pair.

Each ncRNA₁-ncRNA₂Alignment of ncRNA₁And ncRNA₂Pearson Correlation (PC) values between them were calculated as follows:

wherein,

representation of ncRNA₁And ncRNA₂Pearson correlation coefficient therebetween; variable x ═ x₁,x₂,...,x_s) And y ═ y₁,y₂,...,y_s) Respectively represent ncRNA₁And ncRNA₂The amount of gene expression of (a),

and

respectively representing the mean expression quantity of variables x and y, and s is the number of matched samples.

S1222, calculating the positive correlation significance probability value according to the Pearson correlation coefficient.

When $\rho_{ncRNA_1,ncRNA_2}$ is greater than 0, the positive correlation significance probability value is calculated as follows (it is not calculated when $\rho_{ncRNA_1,ncRNA_2}$ is not greater than 0):

p-value＝2pt(t-value)；

wherein $\rho_{ncRNA_1,ncRNA_2}$ represents the Pearson correlation coefficient between ncRNA₁ and ncRNA₂; the pt() function is used to calculate the probability p value corresponding to the t-value; p-value represents the positive correlation significance probability value; and s is the number of matched samples.
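
Steps S1221 and S1222 can be sketched as follows. This is a minimal illustration rather than the implementation of the invention: it assumes the standard t-statistic for a correlation coefficient, t = ρ·sqrt((s − 2)/(1 − ρ²)) with s − 2 degrees of freedom, and a two-sided tail probability, neither of which is spelled out in the text above. The helper names are hypothetical, and the Student t tail is integrated numerically so that only the standard library is needed.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    s = len(x)
    mean_x, mean_y = sum(x) / s, sum(y) / s
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def two_sided_t_pvalue(t, df, steps=2000):
    """P(|T| >= |t|) for Student's t with df degrees of freedom.

    The substitution u = sqrt(df) * tan(theta) turns the density integral
    into a bounded integral of cos(theta) ** (df - 1), which is evaluated
    with Simpson's rule (steps must be even).
    """
    theta_max = math.atan(abs(t) / math.sqrt(df))
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(math.pi))
    h = theta_max / steps
    total = 1.0 + math.cos(theta_max) ** (df - 1)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(i * h) ** (df - 1)
    central = 2.0 * math.exp(log_c) * total * h / 3.0
    return max(0.0, 1.0 - central)


def positive_correlation_pvalue(x, y):
    """Return (rho, p); p is None when rho is not greater than 0."""
    s = len(x)
    rho = pearson(x, y)
    if rho <= 0:
        return rho, None          # not calculated for non-positive rho
    if rho >= 1.0:
        return rho, 0.0           # perfect positive correlation
    t_value = rho * math.sqrt((s - 2) / (1.0 - rho * rho))
    return rho, two_sided_t_pvalue(t_value, s - 2)
```

The numerical integration is chosen only to keep the sketch free of third-party dependencies; a statistics library providing the t distribution would normally be used instead.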

In an alternative embodiment, the step of calculating, in step S122 above, the sensitivity partial correlation coefficient value corresponding to the ncRNA₁-ncRNA₂ pair may comprise: calculating the sensitivity partial correlation coefficient value according to the correlation coefficient value between ncRNA₁ and ncRNA₂ in the ncRNA₁-ncRNA₂ pair and the partial correlation coefficient value between ncRNA₁ and ncRNA₂ conditioned on the corresponding mRNAs.

Specifically, to calculate the Sensitivity Partial Pearson Correlation (SPPC) value, the mRNAs are treated as condition variables. Given the mRNAs for which ncRNA₁ and ncRNA₂ cooperatively compete, the sensitivity partial correlation coefficient value between ncRNA₁ and ncRNA₂ is defined as follows:

$$SPPC_{ncRNA_1,ncRNA_2} = \rho_{ncRNA_1,ncRNA_2} - \rho_{ncRNA_1,ncRNA_2 \mid mRNAs}$$

wherein $\rho_{ncRNA_1,ncRNA_2}$ is the correlation coefficient value between ncRNA₁ and ncRNA₂, and $\rho_{ncRNA_1,ncRNA_2 \mid mRNAs}$ is the partial correlation coefficient value between ncRNA₁ and ncRNA₂ given the cooperatively competed mRNAs. Assuming that the ncRNA₁-ncRNA₂ pair cooperatively competes for m mRNAs (m is usually not less than 3), denoted Z = (Z₁, Z₂, ..., Z_m), the partial correlation coefficient value is calculated recursively as follows:

$$cor(x,y \mid Z_1,\ldots,Z_m) = \frac{cor(x,y \mid Z_1,\ldots,Z_{m-1}) - cor(x,Z_m \mid Z_1,\ldots,Z_{m-1})\,cor(y,Z_m \mid Z_1,\ldots,Z_{m-1})}{\sqrt{1-cor(x,Z_m \mid Z_1,\ldots,Z_{m-1})^2}\,\sqrt{1-cor(y,Z_m \mid Z_1,\ldots,Z_{m-1})^2}}$$

wherein x = (x₁, x₂, ..., x_s), y = (y₁, y₂, ..., y_s) and Z_i = (z_i,1, z_i,2, ..., z_i,s) (i ∈ [1, 2, ..., m]); cor(x, y | Z₁, ..., Z_m) represents the partial correlation coefficient value between x and y conditioned on (Z₁, Z₂, ..., Z_m); cor(x, y | Z₁, ..., Z_m-1) represents the partial correlation coefficient value between x and y conditioned on (Z₁, Z₂, ..., Z_m-1); cor(x, Z_m | Z₁, ..., Z_m-1) represents the partial correlation coefficient value between x and Z_m conditioned on (Z₁, Z₂, ..., Z_m-1); and cor(y, Z_m | Z₁, ..., Z_m-1) represents the partial correlation coefficient value between y and Z_m conditioned on (Z₁, Z₂, ..., Z_m-1).
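
The recursive partial correlation and the resulting sensitivity value can be illustrated with the short sketch below. It assumes that the sensitivity partial correlation is the Pearson correlation minus the partial correlation conditioned on the shared mRNAs, and the function names `partial_corr` and `sppc` are hypothetical. The plain recursion makes three recursive calls per conditioning variable, so it is only suitable for small m.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    s = len(x)
    mean_x, mean_y = sum(x) / s, sum(y) / s
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, conditions):
    """cor(x, y | Z_1..Z_m) via the order-reducing recursion."""
    if not conditions:
        return pearson(x, y)
    *rest, z_m = conditions            # condition on Z_1..Z_{m-1} first
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z_m, rest)
    r_yz = partial_corr(y, z_m, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def sppc(x, y, conditions):
    """Assumed definition: correlation minus partial correlation."""
    return pearson(x, y) - partial_corr(x, y, conditions)
```

In the toy case x = z + a and y = z + b with mutually uncorrelated a, b and z, the whole correlation between x and y is explained by z, so the partial correlation vanishes and the sensitivity value equals the Pearson correlation.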

Based on the statistical significance probability value of the cooperatively competed mRNAs, the positive correlation significance probability value and the sensitivity partial correlation coefficient value calculated for the ncRNA₁-ncRNA₂ pair in step S122 above, step S123 can determine whether the ncRNA₁-ncRNA₂ pair is a cooperative competition relationship pair, thereby realizing identification of the ncRNA cooperative competition network.

In an optional embodiment, before determining, according to the ncRNA and mRNA expression profile data and the preset ncRNA-mRNA competition relationship data, that an ncRNA-ncRNA pair composed of two ncRNAs satisfying a preset condition in the ncRNA and mRNA expression profile data is a cooperative competition relationship pair, the method may further include: preprocessing the ncRNA and mRNA expression profile data to remove duplicate entries and ncRNAs and mRNAs without gene names.

Accordingly, when the ncRNA₁-ncRNA₂ pairs composed of ncRNA₁ and ncRNA₂ in the ncRNA and mRNA expression profile data are acquired in step S121, the ncRNA₁-ncRNA₂ pairs may be obtained from the preprocessed ncRNA and mRNA expression profile data of the matched samples of the target disease type.
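
The preprocessing step can be sketched as a simple filter. The data layout (a list of `(gene_name, expression_values)` tuples) and the choice to keep the first occurrence of a duplicated gene name are assumptions made for illustration; the description does not specify how duplicates are resolved.

```python
def preprocess(rows):
    """Drop entries without a gene name and repeated gene names.

    rows -- iterable of (gene_name, expression_values) tuples
    Keeps the first occurrence of each gene name (an assumption).
    """
    seen = set()
    cleaned = []
    for name, values in rows:
        name = (name or "").strip()
        if not name or name in seen:
            continue                  # no gene name, or a duplicate entry
        seen.add(name)
        cleaned.append((name, values))
    return cleaned
```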

Optionally, the ncRNA cooperative competition network identification method may further include:

evaluating, in the following manner, the ncRNA-ncRNA cooperative competition network formed by the ncRNA₁-ncRNA₂ pairs determined to be cooperative competition relationship pairs:

1) fitting whether the connectivity of the ncRNA-ncRNA cooperative competition network complies with power law distribution to determine whether the ncRNA-ncRNA cooperative competition network belongs to a scale-free network (network topology analysis):

Previous studies have shown that real biomolecular networks tend to be scale-free networks. In a scale-free biomolecular network, most molecules are connected through a few hub molecules, which means that the molecules do not occupy equal positions in the network and that the hub molecules play a key role in maintaining the integrity of the network.

A scale-free network is a network in which the distribution of connectivity follows a power law, expressed as y = bx^a (x is the connectivity, y is the frequency of occurrence of that connectivity, and a and b are parameters). To assess whether the identified ncRNA-ncRNA cooperative competition network is a scale-free network, a power law distribution can be fitted to the connectivity of the ncRNA-ncRNA cooperative competition network. The goodness of fit is measured by the test statistic R²; the closer R² is to 1, the more closely the connectivity follows a power law distribution.
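
One common way to carry out such a fit is ordinary least squares on the log-transformed data, since y = bx^a becomes log y = log b + a·log x. The sketch below takes that approach as an assumption (the description does not state the fitting procedure), and the function names are hypothetical.

```python
import math
from collections import Counter


def degree_frequencies(edges):
    """Map each connectivity (degree) to the number of nodes having it."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())


def fit_power_law(freq):
    """Fit y = b * x**a in log-log space; return (a, b, r_squared)."""
    xs = [math.log(d) for d in freq]
    ys = [math.log(freq[d]) for d in freq]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((u - mean_x) ** 2 for u in xs)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(xs, ys))
    a = sxy / sxx
    log_b = mean_y - a * mean_x
    ss_res = sum((v - (log_b + a * u)) ** 2 for u, v in zip(xs, ys))
    ss_tot = sum((v - mean_y) ** 2 for v in ys)
    return a, math.exp(log_b), 1.0 - ss_res / ss_tot
```

At least two distinct connectivity values with differing frequencies are needed for the fit to be defined.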

2) Determining the nodes whose connectivity ranks in the top 10% of the ncRNA-ncRNA cooperative competition network as hub nodes, also referred to as pivot nodes (identifying hub ncRNAs):

Hub ncRNAs play a key role in maintaining the integrity of the ncRNA-ncRNA cooperative competition network, and they can be used as biomarkers to provide a reference for clinical diagnosis and targeted therapy of human complex diseases such as cancer. Typically, the nodes with the highest connectivity (the top 10%) are regarded as hub nodes. In this embodiment, the ncRNAs whose connectivity ranks in the top 10% are regarded as hub ncRNAs.
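
Selecting hub nodes by connectivity can be sketched as follows. The rounding rule (taking the ceiling of 10% of the node count, with at least one hub) and the alphabetical tie-break are assumptions introduced for illustration, and `hub_nodes` is a hypothetical name.

```python
from collections import Counter


def hub_nodes(edges, top_percent=10):
    """Return the nodes whose connectivity ranks in the top `top_percent`."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # Highest degree first; ties broken by node name for determinism.
    ranked = sorted(degree, key=lambda node: (-degree[node], node))
    n_hubs = max(1, -(-len(ranked) * top_percent // 100))   # ceiling division
    return ranked[:n_hubs]
```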

3) Determining the ncRNA₁-ncRNA₂ pairs in which both ncRNAs are associated with the target disease type as the ncRNA-ncRNA cooperative competition pairs corresponding to the target disease type (identifying ncRNA-ncRNA cooperative competition pairs associated with the target disease type):

Based on the ncRNAs known to be associated with the target disease type, the ncRNA-ncRNA cooperative competition pairs associated with the target disease type are extracted. An ncRNA-ncRNA cooperative competition pair is regarded as associated with the target disease type if and only if both ncRNAs in the pair are associated with the target disease type.
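
The "if and only if both ncRNAs" rule amounts to a set-membership filter, as in this minimal sketch (function and argument names are hypothetical):

```python
def disease_associated_pairs(pairs, disease_ncrnas):
    """Keep only pairs in which both ncRNAs are disease-associated."""
    disease = set(disease_ncrnas)
    return [(a, b) for a, b in pairs if a in disease and b in disease]
```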

4) Based on the ncRNA-ncRNA cooperative competition network, identifying ncRNA-ncRNA cooperative competition modules by using a Markov clustering algorithm (identifying ncRNA-ncRNA cooperative competition modules):

The ncRNA-ncRNA cooperative competition modules are identified from the ncRNA-ncRNA cooperative competition network by using the Markov Clustering Algorithm (MCL). Each ncRNA-ncRNA cooperative competition module contains at least 3 ncRNAs.
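
For orientation, a bare-bones version of Markov clustering on an unweighted network is sketched below: self-loops are added, the matrix is column-normalized, and expansion (matrix squaring) and inflation (element-wise power followed by renormalization) alternate until the matrix stops changing. The inflation value 2.0, the tolerance and the way clusters are read from the rows are assumptions for this sketch; an established MCL implementation would normally be used in practice.

```python
def _normalize_columns(matrix):
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] for j in range(n)] for i in range(n)]


def markov_clustering(edges, inflation=2.0, max_iter=100, tol=1e-9, min_size=3):
    """Return a set of frozensets, each a module with at least min_size nodes."""
    nodes = sorted({node for edge in edges for node in edge})
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    if n == 0:
        return set()
    # Adjacency matrix with self-loops, then column-stochastic normalization.
    matrix = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for a, b in edges:
        matrix[index[a]][index[b]] = matrix[index[b]][index[a]] = 1.0
    matrix = _normalize_columns(matrix)
    for _ in range(max_iter):
        # Expansion: one step of matrix squaring.
        expanded = [[sum(matrix[i][k] * matrix[k][j] for k in range(n))
                     for j in range(n)] for i in range(n)]
        # Inflation: element-wise power, then renormalize the columns.
        inflated = _normalize_columns(
            [[v ** inflation for v in row] for row in expanded])
        change = max(abs(inflated[i][j] - matrix[i][j])
                     for i in range(n) for j in range(n))
        matrix = inflated
        if change < tol:
            break
    clusters = set()
    for row in matrix:
        # Each non-empty row lists the members attracted to that node.
        members = frozenset(nodes[j] for j, v in enumerate(row) if v > 1e-6)
        if len(members) >= min_size:
            clusters.add(members)
    return clusters
```

The `min_size=3` default mirrors the requirement that each module contain at least 3 ncRNAs.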

5) Determining a ncRNA-ncRNA cooperative competition module with a significance probability value related to the functionality of the target disease type less than 0.05 as a ncRNA-ncRNA cooperative competition module corresponding to the target disease type (ncRNA-ncRNA cooperative competition module enrichment analysis) according to the ncRNAs related to the prior target disease type and a hyper-geometric distribution test algorithm:

Based on the ncRNAs known a priori to be associated with the target disease type, a hypergeometric distribution test is used to test whether each ncRNA-ncRNA cooperative competition module is functionally associated with the target disease type, calculated as follows:

$$p\text{-value} = 1 - \sum_{i=0}^{L_2-1} \frac{\binom{M_2}{i}\binom{N_2-M_2}{K_2-i}}{\binom{N_2}{K_2}}$$

wherein p-value represents the significance probability value of the functional association with the target disease type; N₂ represents the number of ncRNAs in the data set; M₂ represents the number of ncRNAs in the data set that are associated with the target disease type; K₂ represents the number of ncRNAs in the ncRNA-ncRNA cooperative competition module; and L₂ represents the number of ncRNAs in the module that are associated with the target disease type. If the p-value is less than 0.05, the ncRNA-ncRNA cooperative competition module is identified as a module associated with the target disease type.

6) For each ncRNA-ncRNA cooperative competition module, calculating a risk value of each target disease type matching sample by using a multivariate Cox model; dividing the target disease type matching sample into a high risk sample set and a low risk sample set according to the risk value of the target disease type matching sample; calculating a risk value according to the high risk sample set and the low risk sample set; calculating significance probability values of the difference of the survival times of the high-risk sample set and the low-risk sample set according to a logarithmic rank test algorithm to obtain logarithmic rank test significance probability values; and determining the ncRNA-ncRNA cooperative competition module with the risk value of more than 1 and the significance probability value of the logarithmic rank test of less than 0.05 as the biomarker of the target disease type (ncRNA-ncRNA cooperative competition module survival analysis):

For each ncRNA-ncRNA cooperative competition module, a multivariate Cox model is applied to calculate the risk value of each target disease type sample, as follows:

h(t,R)＝h₀(t)exp(β'R)＝h₀(t)exp(β₁R₁+β₂R₂+...+β_kR_k)

where h(t, R) is the risk function value at time t of a target disease type sample with covariate vector R; t is the survival time; R = (R₁, R₂, ..., R_k)' denotes the ncRNAs that may influence survival time; h₀(t) is the baseline risk function value when all covariates are 0; and β = (β₁, β₂, ..., β_k)' is the vector of regression coefficients of the Cox model.

According to the risk function value h(t, R) of each sample, the target disease type samples are divided equally into a high-risk group and a low-risk group. The risk value (hazard ratio, HR) between the high-risk group and the low-risk group of target disease type samples is calculated as follows:

HR＝h(t,R_h)/h(t,R_l)＝exp[β(R_h-R_l)]

wherein h(t, R_h) is the risk function value of the high-risk group of the target disease type; h(t, R_l) is the risk function value of the low-risk group of the target disease type; R_h denotes the high-risk ncRNAs that may affect survival time; and R_l denotes the low-risk ncRNAs that may affect survival time. The threshold for HR may be set to 1.

Further, a log-rank test can be used to compare whether the survival times of the high-risk group and the low-risk group of target disease type samples are the same. The test statistic χ² is calculated as follows:

$$\chi^2 = \sum \frac{(A-T)^2}{T}$$

wherein A is the observed number of deaths for the target disease type and T is the theoretical (expected) number of deaths for the target disease type, and the sum runs over the two groups. The larger the calculated χ² value, the smaller the significance p-value of the difference, which indicates that the survival times of the high-risk group and the low-risk group are less likely to be the same. If the HR value is greater than 1 and the significance p-value of the log-rank test is less than 0.05, the ncRNA-ncRNA cooperative competition module is identified as a module biomarker of the target disease type.
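
The grouping and the log-rank comparison can be sketched with the standard library as below. The sketch assumes that the Cox regression coefficients have already been fitted elsewhere; because exp(·) is monotonic, ranking samples by the linear predictor β'R gives the same ordering as ranking by h(t, R). The χ² statistic uses the (A − T)²/T form given above with one degree of freedom, for which the tail probability equals erfc(sqrt(χ²/2)). All function names and the `(time, event)` data layout are hypothetical.

```python
import math


def linear_risk_scores(covariates, beta):
    """beta'R for every sample (rows of covariates), beta assumed fitted."""
    return [sum(b * r for b, r in zip(beta, row)) for row in covariates]


def split_by_risk(scores):
    """Split sample indices into equally sized (high, low) risk groups."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    half = len(order) // 2
    return sorted(order[half:]), sorted(order[:half])


def logrank_test(high, low):
    """Log-rank chi-square and p-value for two groups.

    high, low -- lists of (time, event), event = 1 for death, 0 for censored
    """
    samples = [(t, e, 0) for t, e in high] + [(t, e, 1) for t, e in low]
    observed = [0.0, 0.0]             # A for each group
    expected = [0.0, 0.0]             # T for each group
    for t in sorted({u for u, e, _ in samples if e}):
        at_risk = [sum(1 for u, _, g in samples if u >= t and g == k)
                   for k in (0, 1)]
        deaths = [sum(1 for u, e, g in samples if u == t and e and g == k)
                  for k in (0, 1)]
        d, n = sum(deaths), sum(at_risk)
        for k in (0, 1):
            observed[k] += deaths[k]
            expected[k] += d * at_risk[k] / n
    chi2 = sum((a - tt) ** 2 / tt for a, tt in zip(observed, expected) if tt > 0)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))   # chi-square tail, 1 df
    return chi2, p_value
```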

Based on the foregoing embodiments, the embodiment of the present invention further provides a method for identifying a ncRNA cooperative competition network, and fig. 4 is a schematic flow chart of the method for identifying a ncRNA cooperative competition network provided in the embodiment of the present invention.

As shown in fig. 4, the ncRNA cooperative competition network identification method may include:

S401, acquiring ncRNA and mRNA expression profile data of the matched samples of the target disease type.

S402, preprocessing the ncRNA and mRNA expression profile data to remove duplicate entries and ncRNAs and mRNAs without gene names.

S403, acquiring ncRNA₁-ncRNA₂ pairs, each composed of ncRNA₁ and ncRNA₂ in the ncRNA and mRNA expression profile data.

S404, according to the preset ncRNA-mRNA competition relationship data, using a hypergeometric distribution test to measure the statistical significance probability value of the mRNAs for which ncRNA₁ and ncRNA₂ in the ncRNA₁-ncRNA₂ pair cooperatively compete.

S405, calculating the Pearson correlation coefficient between ncRNA₁ and ncRNA₂ in the ncRNA₁-ncRNA₂ pair.

S406, calculating the positive correlation significance probability value according to the Pearson correlation coefficient.

S407, calculating the sensitivity partial correlation coefficient value according to the correlation coefficient value between ncRNA₁ and ncRNA₂ in the ncRNA₁-ncRNA₂ pair and the partial correlation coefficient value between ncRNA₁ and ncRNA₂ conditioned on the corresponding mRNAs.

S408, judging whether the statistical significance probability value of the cooperatively competed mRNAs is less than 0.05, the positive correlation significance probability value is less than 0.05 and the sensitivity partial correlation coefficient value is greater than 0.1.

If so, step S409 is executed; if not, the process ends, or a new ncRNA₁-ncRNA₂ pair is acquired and the above process is repeated (not shown in the figure).

S409, determining that the ncRNA₁-ncRNA₂ pair is a cooperative competition relationship pair.
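
Steps S403 and S408 to S409 can be summarized in the following sketch, which enumerates candidate pairs and applies the three thresholds. The dictionary layout of the competition data, the function names and the `min_shared=3` pre-filter (motivated by the remark that the number of shared mRNAs is usually not less than 3) are assumptions for illustration; the three statistics themselves are assumed to be computed separately.

```python
from itertools import combinations


def candidate_pairs(competition, min_shared=3):
    """Yield (ncRNA1, ncRNA2, shared_mRNAs) for pairs sharing enough mRNAs.

    competition -- dict mapping each ncRNA to the set of mRNAs it competes for
    """
    for a, b in combinations(sorted(competition), 2):
        shared = competition[a] & competition[b]
        if len(shared) >= min_shared:
            yield a, b, shared


def is_cooperative_pair(p_shared, p_positive, sppc_value,
                        first=0.05, second=0.05, third=0.1):
    """S408: all three conditions must hold simultaneously."""
    if p_positive is None:            # correlation was not positive
        return False
    return p_shared < first and p_positive < second and sppc_value > third
```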

In order to make the technical solution recorded in the ncRNA cooperative competition network identification method provided in the embodiment of the present invention clearer, the present invention now describes the steps of the ncRNA cooperative competition network identification method through the following specific embodiments:

Example 1

Taking the lncRNA-lncRNA cooperative competition in glioblastoma multiforme as an example, the method for identifying lncRNA-lncRNA cooperative competition network in this embodiment is implemented by the following steps:

Step 1: Data source acquisition

lncRNA and mRNA expression profile data of glioblastoma multiforme (GBM) matched samples were collected from the internationally well-known cancer gene expression profile database TCGA (The Cancer Genome Atlas, https://cancergenome.nih.gov/). After preprocessing (removing duplicate entries and lncRNAs and mRNAs without gene names), expression profile data of 9704 lncRNAs and 18282 mRNAs for 451 glioblastoma multiforme matched samples, together with the clinical information of the samples, were finally obtained. In this example, the ncRNA is lncRNA.

The prior lncRNA-mRNA competition network data were obtained by fusing multiple different databases, specifically by integrating the four databases miRSponge, LncCeRBase, LncACTdb v2.0 and ENCORI. Finally, 10099 lncRNA-mRNA competition relationship pairs associated with the glioblastoma multiforme expression profile data were obtained. In addition, 166 glioblastoma multiforme-associated lncRNAs were collected from the three databases LncRNADisease v2.0, Lnc2Cancer v2.0 and MNDR v2.0.

Step 2: identification of lncRNA-lncRNA cooperative competition network

In the lncRNA-lncRNA cooperative competition network, each lncRNA-lncRNA cooperative competition pair must satisfy the following conditions: the significance probability p-value of the cooperatively competed mRNAs is less than 0.05, the positive correlation significance probability p-value is less than 0.05, and the sensitivity partial correlation coefficient value is greater than 0.1.

Step 3: Evaluation of the lncRNA-lncRNA cooperative competition network

The identified lncRNA-lncRNA cooperative competition network can then be evaluated from the following six aspects:

1) network topology analysis

A scale-free network is a network in which the distribution of connectivity follows a power law, expressed as y = bx^a (x is the connectivity, y is the frequency of occurrence of that connectivity, and a and b are parameters). To evaluate whether the identified lncRNA-lncRNA cooperative competition network is a scale-free network, a power law distribution can be fitted to the connectivity of the lncRNA-lncRNA cooperative competition network. The goodness of fit is measured by the test statistic R²; the closer R² is to 1, the more closely the connectivity follows a power law distribution.

2) Identifying hub lncRNAs

Hub lncRNAs play a key role in maintaining the integrity of the lncRNA-lncRNA cooperative competition network, and they can be used as biomarkers to provide a reference for clinical diagnosis and targeted therapy of human complex diseases such as cancer. Typically, the nodes with the highest connectivity (the top 10%) are regarded as hub nodes. In this example, the lncRNAs whose connectivity ranks in the top 10% are regarded as hub lncRNAs.

3) Identifying glioblastoma multiforme-associated lncRNA-lncRNA cooperative competition pairs

Based on the lncRNAs known to be associated with glioblastoma multiforme, the glioblastoma multiforme-associated lncRNA-lncRNA cooperative competition pairs are extracted. An lncRNA-lncRNA cooperative competition pair is regarded as associated with glioblastoma multiforme if and only if both lncRNAs in the pair are associated with glioblastoma multiforme.

4) Identifying lncRNA-lncRNA cooperative competition modules

The lncRNA-lncRNA cooperative competition modules are identified from the lncRNA-lncRNA cooperative competition network by using the Markov Clustering Algorithm (MCL). Each lncRNA-lncRNA cooperative competition module contains at least 3 lncRNAs.

5) lncRNA-lncRNA cooperative competition module enrichment analysis

Based on the lncRNAs known a priori to be associated with glioblastoma multiforme, a hypergeometric distribution test was used to test whether each lncRNA-lncRNA cooperative competition module is functionally associated with glioblastoma multiforme, calculated as follows:

$$p\text{-value} = 1 - \sum_{i=0}^{L_2-1} \frac{\binom{M_2}{i}\binom{N_2-M_2}{K_2-i}}{\binom{N_2}{K_2}}$$

wherein N₂ represents the number of lncRNAs in the data set; M₂ represents the number of glioblastoma multiforme-associated lncRNAs in the data set; K₂ represents the number of lncRNAs in the lncRNA-lncRNA cooperative competition module; and L₂ represents the number of lncRNAs in the module that are associated with glioblastoma multiforme.

In this example, the significance probability p is less than 0.05, and the lncRNA-lncRNA cooperative competition module is determined as the glioblastoma multiforme association module.

6) lncRNA-lncRNA cooperative competition module survival analysis

For each lncRNA-lncRNA co-competition module, a multivariate Cox model was applied to calculate the risk value for each glioblastoma multiforme sample, as follows:

h(t,R)＝h₀(t)exp(β'R)＝h₀(t)exp(β₁R₁+β₂R₂+...+β_kR_k)

where h(t, R) is the risk function value at time t of a glioblastoma multiforme sample with covariate vector R; t is the survival time; R = (R₁, R₂, ..., R_k)' denotes the lncRNAs that may influence survival time; h₀(t) is the baseline risk function value when all covariates are 0; and β = (β₁, β₂, ..., β_k)' is the vector of regression coefficients of the Cox model.

Based on the risk function value h (t, R) of each sample, the 451 glioblastoma multiforme samples were equally divided into two groups of high-risk and low-risk samples. The risk value (HR) of the high-risk and low-risk groups of glioblastoma multiforme samples was calculated as follows:

HR＝h(t,R_h)/h(t,R_l)＝exp[β(R_h-R_l)]

wherein h(t, R_h) is the risk function value of the high-risk group of glioblastoma multiforme; h(t, R_l) is the risk function value of the low-risk group of glioblastoma multiforme; R_h denotes the high-risk lncRNAs that may affect survival time; and R_l denotes the low-risk lncRNAs that may affect survival time. The threshold for HR in this example is set to 1.

Further, a log-rank test can be used to compare whether the survival times of the high-risk and low-risk groups of glioblastoma multiforme samples are the same. The test statistic χ² is calculated as follows:

$$\chi^2 = \sum \frac{(A-T)^2}{T}$$

wherein A is the observed number of glioblastoma multiforme deaths and T is the theoretical (expected) number of glioblastoma multiforme deaths, and the sum runs over the two groups. The larger the calculated χ² value, the smaller the significance p-value of the difference, indicating that the survival times of the high-risk and low-risk groups of glioblastoma multiforme samples are less likely to be the same. In this example, if the HR value is greater than 1 and the significance p-value of the log-rank test is less than 0.05, the lncRNA-lncRNA cooperative competition module is identified as a glioblastoma multiforme module biomarker.

Example 2

Taking the lncRNA-lncRNA cooperative competition corresponding to the lung squamous cell carcinoma as an example, the method for identifying the lncRNA-lncRNA cooperative competition network in the embodiment is implemented by the following steps:

In step 1 of this example, lncRNA and mRNA expression profile data of lung squamous cell carcinoma (LSCC) matched samples were collected from the internationally well-known cancer gene expression profile database TCGA (The Cancer Genome Atlas, https://cancergenome.nih.gov/). After preprocessing (removing duplicate entries and lncRNAs and mRNAs without gene names), expression profile data of 9704 lncRNAs and 18282 mRNAs for 113 lung squamous cell carcinoma matched samples, together with the clinical information of the samples, were finally obtained. In this example, the ncRNA is lncRNA.

The prior lncRNA-mRNA competition network data are the same as in Example 1, and 10099 lncRNA-mRNA competition relationship pairs associated with the lung squamous cell carcinoma expression profile data were finally obtained. In addition, 429 lncRNAs associated with lung cancer were collected from the three databases LncRNADisease v2.0, Lnc2Cancer v2.0 and MNDR v2.0.

Other steps are the same as embodiment 1 and are not described herein again.

Example 3

Taking lncRNA-lncRNA cooperative competition in ovarian cancer as an example, the method for identifying lncRNA-lncRNA cooperative competition network in this embodiment is implemented by the following steps:

In step 1 of this example, lncRNA and mRNA expression profile data of ovarian cancer (OvCa) matched samples were collected from the internationally well-known cancer gene expression profile database TCGA (The Cancer Genome Atlas, https://cancergenome.nih.gov/). After preprocessing (removing duplicate entries and lncRNAs and mRNAs without gene names), expression profile data of 9704 lncRNAs and 18282 mRNAs for 585 ovarian cancer matched samples, together with the clinical information of the samples, were finally obtained. In this example, the ncRNA is lncRNA.

The prior lncRNA-mRNA competition network data are the same as in Example 1, and 10099 lncRNA-mRNA competition relationship pairs associated with the ovarian cancer expression profile data were finally obtained. In addition, 140 lncRNAs associated with ovarian cancer were collected from the three databases LncRNADisease v2.0, Lnc2Cancer v2.0 and MNDR v2.0.

Other steps are the same as embodiment 1 and are not described herein again.

Example 4

Taking lncRNA-lncRNA cooperative competition corresponding to prostate cancer as an example, the method for identifying lncRNA-lncRNA cooperative competition network in this embodiment is implemented by the following steps:

In step 1 of this example, lncRNA and mRNA expression profile data of prostate cancer (PrCa) matched samples were collected from MSKCC (Memorial Sloan-Kettering Cancer Center, https://www.mskcc.org/). After preprocessing (removing duplicate entries and lncRNAs and mRNAs without gene names), expression profile data of 9704 lncRNAs and 18282 mRNAs for 150 prostate cancer matched samples, together with the clinical information of the samples, were finally obtained. In this example, the ncRNA is lncRNA.

The prior lncRNA-mRNA competition network data are the same as in Example 1, and 10099 lncRNA-mRNA competition relationship pairs associated with the prostate cancer expression profile data were finally obtained. In addition, 141 lncRNAs associated with prostate cancer were collected from the three databases LncRNADisease v2.0, Lnc2Cancer v2.0 and MNDR v2.0.

Other steps are the same as embodiment 1, and are not repeated herein.

Based on the foregoing Examples 1-4, the evaluation of the lncRNA-lncRNA cooperative competition network identification results is shown in Tables 1-6 below. Table 1 gives the topology analysis of the lncRNA-lncRNA cooperative competition networks mined in Examples 1-4; Table 2 gives the hub lncRNAs mined in Examples 1-4; Table 3 gives the disease-associated lncRNA-lncRNA cooperative competition relationships mined in Examples 1-4; Table 4 gives the lncRNA-lncRNA cooperative competition modules mined in Examples 1-4; Table 5 gives the lncRNA-lncRNA cooperative competition modules enriched for disease association in Examples 1-4; and Table 6 gives the lncRNA-lncRNA cooperative competition modules serving as biomarkers in Examples 1-4.

Table 1. Topology analysis of the lncRNA-lncRNA cooperative competition networks mined in Examples 1-4

Table 2. Hub lncRNAs mined in Examples 1-4

Table 3. Disease-associated lncRNA-lncRNA cooperative competition relationships mined in Examples 1-4

Table 4. lncRNA-lncRNA cooperative competition modules mined in Examples 1-4

Table 5. lncRNA-lncRNA cooperative competition modules enriched for disease association in Examples 1-4

Table 6. lncRNA-lncRNA cooperative competition modules serving as biomarkers in Examples 1-4

As shown in Table 1, the lncRNA-lncRNA cooperative competition networks mined from the four data sets of Examples 1-4 (GBM, LSCC, OvCa and PrCa) substantially conform to the scale-free characteristic of real biomolecular networks (the goodness-of-fit test statistic R² is greater than 0.69 in all cases). Some of the hub lncRNAs and lncRNA-lncRNA cooperative competition relationships are associated with the diseases (GBM, LSCC, OvCa and PrCa), as shown in Tables 2 and 3. Among the mined lncRNA-lncRNA cooperative competition modules (see Table 4), most are significantly enriched for disease association (see Table 5) and can serve as biomarkers (see Table 6). The results of the method provided by the invention are basically consistent across the four data sets, showing that the method can robustly identify lncRNA-lncRNA cooperative competition networks.

In conclusion, the ncRNA cooperative competition network identification method provided by the invention can effectively mine the cooperative competition relationships between ncRNAs, and the identified ncRNA cooperative competition network substantially conforms to the scale-free characteristic of real biomolecular networks. Based on the identified ncRNA cooperative competition network, disease-associated hub ncRNAs, disease-associated ncRNA cooperative competition networks and modules, and disease biomarkers can be further identified. In particular, when applied to gene expression profile data sets of complex diseases, the method provides technical support and a means of understanding for clinical diagnosis and targeted therapy of human complex diseases such as cancer, and has important biological significance.

Based on the ncRNA cooperative competition network identification method provided by the above method embodiments, the embodiment of the invention correspondingly provides an ncRNA cooperative competition network identification device. Fig. 5 is a schematic structural diagram of an ncRNA cooperative competition network identification device according to an embodiment of the present invention. As shown in fig. 5, the ncRNA cooperative competition network identification device may include: an acquisition module 10 for acquiring ncRNA and target gene mRNA expression profile data of the matched samples of a target disease type; and an identification module 20 for determining, according to the ncRNA and mRNA expression profile data and preset ncRNA-mRNA competition relationship data, that an ncRNA-ncRNA pair composed of two ncRNAs satisfying a preset condition in the ncRNA and mRNA expression profile data is a cooperative competition relationship pair.

Fig. 6 is a schematic structural diagram of an identification module according to an embodiment of the present invention.

As shown in fig. 6, in an alternative embodiment, the identification module 20 includes: an acquisition submodule 21 for acquiring ncRNA₁-ncRNA₂ pairs, each composed of ncRNA₁ and ncRNA₂ in the ncRNA and mRNA expression profile data; a calculation submodule 22 for calculating, according to the preset ncRNA-mRNA competition relationship data, the statistical significance probability value of the cooperatively competed mRNAs, the positive correlation significance probability value and the sensitivity partial correlation coefficient value corresponding to the ncRNA₁-ncRNA₂ pair; and an identification submodule 23 for determining that the ncRNA₁-ncRNA₂ pair is a cooperative competition relationship pair if the pair simultaneously satisfies the following: the statistical significance probability value of the cooperatively competed mRNAs is less than a first threshold, the positive correlation significance probability value is less than a second threshold, and the sensitivity partial correlation coefficient value is greater than a third threshold.

In an alternative embodiment, the calculation submodule 22 is specifically configured to use a hypergeometric distribution test, according to the preset ncRNA-mRNA competition relationship data, to measure the statistical significance probability value of the mRNAs for which ncRNA₁ and ncRNA₂ in the ncRNA₁-ncRNA₂ pair cooperatively compete.

In an alternative embodiment, the calculation submodule 22 is specifically configured to calculate the Pearson correlation coefficient between ncRNA₁ and ncRNA₂ in the ncRNA₁-ncRNA₂ pair, and to calculate the positive correlation significance probability value according to the Pearson correlation coefficient.

In an alternative embodiment, the calculation submodule 22 is specifically configured to calculate the sensitivity partial correlation coefficient value according to the correlation coefficient value between ncRNA₁ and ncRNA₂ in the ncRNA₁-ncRNA₂ pair and the partial correlation coefficient value between ncRNA₁ and ncRNA₂ conditioned on the corresponding mRNAs.

In an alternative embodiment, the identification submodule 23 is specifically configured to determine that the ncRNA₁-ncRNA₂ pair is a cooperative competition relationship pair if the following are simultaneously satisfied: the statistical significance probability value of the cooperatively competed mRNAs is less than 0.05, the positive correlation significance probability value is less than 0.05, and the sensitivity partial correlation coefficient value is greater than 0.1.

Fig. 7 is another schematic structural diagram of a ncRNA cooperative competition network identification apparatus according to an embodiment of the present invention.

As shown in fig. 7, in an alternative embodiment, the apparatus further comprises: a competition data module 30 configured to acquire, by fusing multiple different databases, prior ncRNA-mRNA competition network data associated with the ncRNA and mRNA expression profile data of the target disease type matching sample, before the recognition module 20 determines, according to the ncRNA and mRNA expression profile data and the preset ncRNA-mRNA competition relationship data, that a ncRNA-ncRNA pair composed of two ncRNAs meeting a preset condition in the ncRNA and mRNA expression profile data is a cooperative competition relationship pair.
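As a rough sketch of the fusion step, the code below takes the union of ncRNA-mRNA pairs from several sources and keeps only the pairs whose members both appear in the expression profile data. The embodiment does not specify the fusion rule, so the union is an assumption; the database contents and gene identifiers are invented for illustration.

```python
def fuse_competition_data(databases, expressed_ncrnas, expressed_mrnas):
    """Union of (ncRNA, mRNA) pairs restricted to genes present in the profiles."""
    fused = set()
    for edges in databases:
        for ncrna, mrna in edges:
            if ncrna in expressed_ncrnas and mrna in expressed_mrnas:
                fused.add((ncrna, mrna))
    return fused


# Hypothetical contents of two source databases.
db_a = [("lncA", "GENE1"), ("lncA", "GENE2"), ("lncB", "GENE9")]
db_b = [("lncA", "GENE1"), ("lncB", "GENE2")]

fused = fuse_competition_data(
    [db_a, db_b],
    expressed_ncrnas={"lncA", "lncB"},
    expressed_mrnas={"GENE1", "GENE2"},
)
print(sorted(fused))
# [('lncA', 'GENE1'), ('lncA', 'GENE2'), ('lncB', 'GENE2')]
```

Pairs reported by more than one database are stored once, and pairs involving genes absent from the expression profile data are discarded.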

Fig. 8 is a schematic structural diagram of a ncRNA cooperative competition network recognition apparatus according to an embodiment of the present invention.

As shown in fig. 8, in an alternative embodiment, the apparatus further comprises: a preprocessing module 40 configured to preprocess the ncRNA and mRNA expression profile data, so as to remove duplicate entries and ncRNAs and mRNAs without gene names from the ncRNA and mRNA expression profile data, before the recognition module 20 determines, according to the ncRNA and mRNA expression profile data and the preset ncRNA-mRNA competition relationship data, that a ncRNA-ncRNA pair composed of two ncRNAs meeting a preset condition in the ncRNA and mRNA expression profile data is a cooperative competition relationship pair.
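The preprocessing described above might look like the following sketch. Keeping the first occurrence of a duplicated gene name is an assumption, since the embodiment does not say which duplicate is retained; the row layout and names are hypothetical.

```python
def preprocess_profile(rows):
    """Drop rows without a gene name and keep the first row for each gene name."""
    seen = set()
    cleaned = []
    for name, values in rows:
        if name is None or not name.strip():
            continue  # entry has no gene name
        key = name.strip()
        if key in seen:
            continue  # duplicate entry (assumption: first occurrence wins)
        seen.add(key)
        cleaned.append((key, values))
    return cleaned


# Hypothetical expression rows: (gene name, expression values per sample).
raw = [
    ("lncA", [1.0, 2.0]),
    ("", [0.5, 0.7]),
    ("GENE1", [3.0, 4.0]),
    ("lncA", [9.0, 9.0]),
    (None, [1.0, 1.0]),
]
clean = preprocess_profile(raw)
print([name for name, _ in clean])  # ['lncA', 'GENE1']
```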

Fig. 9 is a schematic structural diagram of a ncRNA cooperative competition network recognition apparatus according to an embodiment of the present invention.

As shown in fig. 9, in an alternative embodiment, the apparatus further comprises: an evaluation module 50 for evaluating, in the following manner, the ncRNA-ncRNA cooperative competition network formed by the ncRNA₁-ncRNA₂ pairs determined to be cooperative competition relationship pairs:

1) fitting the connectivity (node degree) distribution of the ncRNA-ncRNA cooperative competition network to determine whether it obeys a power-law distribution, and thereby whether the ncRNA-ncRNA cooperative competition network is a scale-free network;

2) determining nodes whose connectivity ranks in the top 10% of the ncRNA-ncRNA cooperative competition network as hub nodes;

3) determining ncRNA₁-ncRNA₂ pairs in which both ncRNAs are associated with the target disease type as ncRNA-ncRNA cooperative competition pairs corresponding to the target disease type;

4) identifying ncRNA-ncRNA cooperative competition modules by applying a Markov clustering algorithm to the ncRNA-ncRNA cooperative competition network;

5) determining, according to ncRNAs known a priori to be associated with the target disease type and a hypergeometric distribution test, a ncRNA-ncRNA cooperative competition module whose significance probability value of functional association with the target disease type is less than 0.05 as a ncRNA-ncRNA cooperative competition module corresponding to the target disease type;

6) for each ncRNA-ncRNA cooperative competition module, calculating a risk score of each target disease type matching sample using a multivariate Cox model; dividing the target disease type matching samples into a high-risk sample set and a low-risk sample set according to the risk scores; calculating a risk value (hazard ratio) between the high-risk sample set and the low-risk sample set; calculating, by a log-rank test, the significance probability value of the difference in survival time between the high-risk sample set and the low-risk sample set to obtain a log-rank test significance probability value; and determining a ncRNA-ncRNA cooperative competition module whose risk value is greater than 1 and whose log-rank test significance probability value is less than 0.05 as a biomarker of the target disease type.
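Steps 1) and 2) of the evaluation can be sketched as follows. The power-law check here is a least-squares line fitted to the log-log degree distribution, which is a common heuristic chosen for illustration and not necessarily the fitting procedure of the embodiments; tie-breaking by node name and rounding the top 10% down (with at least one hub) are likewise assumptions. Markov clustering, the Cox model and the log-rank test are not shown.

```python
from collections import Counter
from math import log


def degrees(edges):
    """Connectivity (degree) of every node in an undirected edge list."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def hub_nodes(edges, top_fraction=0.10):
    """Nodes whose degree ranks in the top fraction of the network."""
    deg = degrees(edges)
    n_hubs = max(1, int(len(deg) * top_fraction))
    ranked = sorted(deg, key=lambda node: (-deg[node], node))
    return ranked[:n_hubs]


def power_law_fit(edges):
    """Fit log P(k) = c - gamma * log k; return (gamma, r_squared)."""
    deg = degrees(edges)
    freq = Counter(deg.values())  # degree k -> number of nodes with that degree
    if len(freq) < 2:
        raise ValueError("need at least two distinct degrees")
    n = len(deg)
    xs = [log(k) for k in freq]
    ys = [log(c / n) for c in freq.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return -sxy / sxx, r_squared


# Hypothetical star-shaped network: one central ncRNA linked to nine others.
edges = [("hub", f"n{i}") for i in range(9)]
print(hub_nodes(edges))  # ['hub']
gamma, r2 = power_law_fit(edges)
```

A coefficient of determination close to 1 together with a positive exponent suggests, under this heuristic, that the degree distribution is compatible with a power law.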

The above modules may be one or more integrated circuits configured to implement the above methods, for example: one or more Application Specific Integrated Circuits (ASICs), one or more Digital Signal Processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs), among others. For another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor capable of calling program code. For another example, these modules may be integrated together and implemented in the form of a System-on-a-Chip (SoC).

An embodiment of the invention also provides a ncRNA cooperative competition network identification device, which may be a desktop computer, a notebook computer, a server, a cloud platform, a customized terminal, an intelligent terminal, or the like.

As shown in fig. 10, the ncRNA cooperative competition network identification device may include: a processor 100, a storage medium 200 and a bus 300, wherein the storage medium 200 stores machine-readable instructions executable by the processor 100. When the ncRNA cooperative competition network identification device runs, the processor 100 communicates with the storage medium 200 through the bus 300, and the processor 100 executes the machine-readable instructions to perform the ncRNA cooperative competition network identification method of the foregoing method embodiments.

It is noted that a processor may include one or more processing cores (e.g., a single-core processor or a multi-core processor). Merely by way of example, a processor may include a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), an Application Specific Instruction Set Processor (ASIP), a Graphics Processing Unit (GPU), a Physics Processing Unit (PPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), a Programmable Logic Device (PLD), a controller, a microcontroller unit, a Reduced Instruction Set Computer (RISC), a microprocessor, or the like, or any combination thereof.

The storage medium may include mass storage, removable storage, volatile read-write memory, Read-Only Memory (ROM), or the like, or any combination thereof. By way of example, mass storage may include magnetic disks, optical disks, solid state drives, and the like; removable storage may include flash drives, floppy disks, optical disks, memory cards, zip disks, magnetic tapes, and the like; volatile read-write memory may include Random Access Memory (RAM); the RAM may include Dynamic RAM (DRAM), Double Data Rate Synchronous Dynamic RAM (DDR SDRAM), Static RAM (SRAM), Thyristor-based RAM (T-RAM), Zero-capacitor RAM (Z-RAM), and the like. By way of example, ROM may include Mask ROM (MROM), Programmable ROM (PROM), Erasable Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), Compact Disc ROM (CD-ROM), Digital Versatile Disc ROM (DVD-ROM), and the like.

For ease of illustration, only one processor is depicted in the ncRNA cooperative competition network identification device. However, it should be noted that the ncRNA cooperative competition network identification device in the present invention may also include multiple processors, and thus the steps described in the present invention as performed by one processor may also be performed by multiple processors jointly or individually. For example, if the processor of the ncRNA cooperative competition network identification device performs steps A and B, it should be understood that steps A and B may also be performed by two different processors jointly, or separately within one processor. For example, a first processor performs step A and a second processor performs step B, or the first processor and the second processor perform steps A and B together.

Optionally, the present invention further provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, performs the steps of the ncRNA cooperative competition network identification method described in the foregoing method embodiments.

In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative. For example, the division of the units is only one logical division, and other divisions are possible in practice: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.

The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for identifying a ncRNA cooperative competition network is characterized by comprising the following steps:

acquiring ncRNA and mRNA expression profile data of a target disease type matching sample;

determining, according to the ncRNA and mRNA expression profile data and preset ncRNA-mRNA competition relationship data, that a ncRNA-ncRNA pair consisting of two ncRNAs meeting preset conditions in the ncRNA and mRNA expression profile data is a cooperative competition relationship pair, wherein the determining comprises:

obtaining a ncRNA₁-ncRNA₂ pair composed of ncRNA₁ and ncRNA₂ in the ncRNA and mRNA expression profile data;

calculating, according to the preset ncRNA-mRNA competition relationship data, the cooperative competition mRNA statistical significance probability value, the positive correlation significance probability value and the sensitivity partial correlation coefficient value corresponding to the ncRNA₁-ncRNA₂ pair;

if the ncRNA₁-ncRNA₂ pair simultaneously satisfies that the cooperative competition mRNA statistical significance probability value is less than a first threshold, the positive correlation significance probability value is less than a second threshold, and the sensitivity partial correlation coefficient value is greater than a third threshold, determining that the ncRNA₁-ncRNA₂ pair is a cooperative competition relationship pair.

2. The method of claim 1, wherein calculating the cooperative competition mRNA statistical significance probability value corresponding to the ncRNA₁-ncRNA₂ pair comprises:

measuring, using a hypergeometric distribution test according to the preset ncRNA-mRNA competition relationship data, the statistical significance probability value of the cooperative competition mRNAs of ncRNA₁ and ncRNA₂ in the ncRNA₁-ncRNA₂ pair.

3. The method of claim 2, wherein calculating the positive correlation significance probability value corresponding to the ncRNA₁-ncRNA₂ pair comprises:

calculating the Pearson correlation coefficient between ncRNA₁ and ncRNA₂ in the ncRNA₁-ncRNA₂ pair;

and calculating and obtaining the positive correlation significance probability value according to the Pearson correlation coefficient.

4. The method of claim 3, wherein calculating the sensitivity partial correlation coefficient value corresponding to the ncRNA₁-ncRNA₂ pair comprises:

calculating the sensitivity partial correlation coefficient value from the correlation coefficient value between ncRNA₁ and ncRNA₂ in the ncRNA₁-ncRNA₂ pair and the correlation coefficient value between ncRNA₁ and ncRNA₂ in the pair conditioned on the corresponding mRNAs.

5. The method of any of claims 2-4, wherein, if the ncRNA₁-ncRNA₂ pair simultaneously satisfies that the cooperative competition mRNA statistical significance probability value is less than a first threshold, the positive correlation significance probability value is less than a second threshold, and the sensitivity partial correlation coefficient value is greater than a third threshold, determining that the ncRNA₁-ncRNA₂ pair is a cooperative competition relationship pair comprises:

if the cooperative competition mRNA statistical significance probability value is less than 0.05, the positive correlation significance probability value is less than 0.05, and the sensitivity partial correlation coefficient value is greater than 0.1, all simultaneously, determining that the ncRNA₁-ncRNA₂ pair is a cooperative competition relationship pair.

6. The method of any one of claims 1 to 4, wherein before determining that ncRNA-ncRNA pairs consisting of two ncRNAs satisfying a predetermined condition in the ncRNA and mRNA expression profile data are in a cooperative competition relationship pair according to the ncRNA and mRNA expression profile data and the predetermined ncRNA-mRNA competition relationship data, the method further comprises:

and acquiring prior ncRNA-mRNA competitive network data associated with the ncRNA and mRNA expression profile data of the target disease type matching sample by fusing a plurality of different databases to obtain the ncRNA-mRNA competitive relationship data.

7. The method according to any one of claims 1-4, further comprising:

evaluating, in the following manner, the ncRNA-ncRNA cooperative competition network formed by the ncRNA₁-ncRNA₂ pairs determined to be cooperative competition relationship pairs:

1) fitting the connectivity (node degree) distribution of the ncRNA-ncRNA cooperative competition network to determine whether it obeys a power-law distribution, and thereby whether the ncRNA-ncRNA cooperative competition network is a scale-free network;

2) determining nodes whose connectivity ranks in the top 10% of the ncRNA-ncRNA cooperative competition network as hub nodes;

3) determining ncRNA₁-ncRNA₂ pairs in which both ncRNAs are associated with the target disease type as ncRNA-ncRNA cooperative competition pairs corresponding to the target disease type;

4) identifying the ncRNA-ncRNA cooperative competition module by utilizing a Markov clustering algorithm based on the ncRNA-ncRNA cooperative competition network;

5) determining, according to ncRNAs known a priori to be associated with the target disease type and a hypergeometric distribution test, a ncRNA-ncRNA cooperative competition module whose significance probability value of functional association with the target disease type is less than 0.05 as a ncRNA-ncRNA cooperative competition module corresponding to the target disease type;

6) for each ncRNA-ncRNA cooperative competition module, calculating a risk score of each target disease type matching sample using a multivariate Cox model; dividing the target disease type matching samples into a high-risk sample set and a low-risk sample set according to the risk scores; calculating a risk value (hazard ratio) between the high-risk sample set and the low-risk sample set; calculating, by a log-rank test, the significance probability value of the difference in survival time between the high-risk sample set and the low-risk sample set to obtain a log-rank test significance probability value; and determining a ncRNA-ncRNA cooperative competition module whose risk value is greater than 1 and whose log-rank test significance probability value is less than 0.05 as a biomarker of the target disease type.

8. The method of any one of claims 1 to 4, wherein said ncRNA comprises any one of: long non-coding RNA, circular RNA, and pseudogenes.

9. An ncRNA cooperative competition network recognition device, comprising:

an acquisition module for acquiring ncRNA and mRNA expression profile data of a target disease type matching sample;

and a recognition module for determining, according to the ncRNA and mRNA expression profile data and preset ncRNA-mRNA competition relationship data, that a ncRNA-ncRNA pair formed by two ncRNAs meeting preset conditions in the ncRNA and mRNA expression profile data is a cooperative competition relationship pair.