CN111198905B

CN111198905B - Visual analysis framework for understanding missing links in a bipartite network

Info

Publication number: CN111198905B
Application number: CN201911126664.1A
Authority: CN
Inventors: 赵健; 弗朗辛·陈; P·邱
Original assignee: Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2018-11-19
Filing date: 2019-11-18
Publication date: 2024-02-13
Anticipated expiration: 2039-11-18
Also published as: CN111198905A; US11176460B2; JP7423998B2; US20200160188A1; JP2020098585A

Abstract

Visual analysis framework for understanding missing links in a bipartite network. Example implementations described herein relate to an interface for computing and displaying missing links for data represented as a bipartite network, and to novel methods for improving prior art link prediction algorithms. Through the example implementations described herein, the accuracy of a link prediction algorithm may be improved, providing a user with a more accurate understanding of data in a bipartite network.

Description

Visual analysis framework for understanding missing links in a bipartite network

Technical Field

The present disclosure relates generally to data analysis, and more particularly, to determining and visualizing missing links in a bipartite network.

Background

Many real-world complex systems can be modeled as a bipartite network (two-mode network), in which there are two types of nodes and links exist only between nodes of different types. Analysis of bipartite relationships has been used for data analysis in various application fields, such as studying political leanings using a voting network based on roll-call voting records, and studying gene expression networks in bioinformatics.

One analytical problem for such networks is link prediction (e.g., detecting missing links), which infers new relationships between nodes based on the currently observed links. Such link prediction is valuable because real-world data may be noisy or incomplete. In general, however, the output of a link prediction algorithm is simply a list of the scores or probabilities of all predicted missing links, which is difficult to interpret, and the results may be inaccurate.

Disclosure of Invention

In practice, analysts need to apply their domain knowledge to check the algorithm output. To solve the problems of the prior art, the present disclosure presents a generic visual analysis framework for detecting and examining missing links in a bipartite network. First, the framework provides a novel link prediction method for bipartite networks, which is an ensemble method that utilizes information about the bicliques in the network. Second, interactive visualizations present the detected missing links through the two most common network analysis approaches, metrics-based (e.g., computing node betweenness) and motif-based (e.g., detecting cliques), enabling a better understanding of the meaning and impact of the missing links.

Furthermore, prior art systems do not address the problem of detecting and visualizing missing links. More specifically, the example implementations employ a matrix-based design because links are the focus of the framework and need to be visually emphasized.

Generic link prediction algorithms for networks fall broadly into two categories: learning-based and similarity-based. The learning-based approach treats link prediction as a binary classification problem and trains a machine learning model to predict a class label for each unconnected node pair (i.e., positive for a potential link). One prior art method is feature-based classification, which extracts features based on node attributes, topology, social theory, or a combination thereof. Another approach is based on probabilistic graphical models, including relational models, entity-relationship models, and the like. These techniques, while effective, are less versatile: they typically require additional information (e.g., semantic node attributes) beyond the observed network structure. Moreover, the trained machine learning model may perform well only on networks with specific characteristics (depending on the training set).

On the other hand, the similarity-based approach calculates a similarity score for each unconnected node pair and ranks all of these potential links. Methods of computing similarity include random-walk-based simulations and neighbor-based measures, such as common neighbors, the Jaccard coefficient, the Adamic-Adar coefficient, and preferential attachment. Researchers have extended some of these similarity measures to the bipartite network case. Example implementations go one step further and propose a series of ensemble methods that incorporate an important type of structural information in a bipartite network, the biclique, to improve prediction performance.

Aspects of the disclosure include a method that may include the following steps performed for data represented as a bipartite network and for a set of missing links in the bipartite network: calculating weights for each of the missing links in the set based on the bicliques in the bipartite network; executing a link prediction algorithm configured to incorporate the weights of each of the missing links; and providing a missing link of the set of missing links selected by the link prediction algorithm as a predicted missing link of the bipartite network.

Aspects of the disclosure may also include a non-transitory computer-readable medium storing instructions for performing a process, the instructions comprising: for data represented as a bipartite network, and for a set of missing links in the bipartite network: calculating weights for each of the missing links in the set based on the bicliques in the bipartite network; executing a link prediction algorithm configured to incorporate the weights of each of the missing links; and providing a missing link of the set of missing links selected by the link prediction algorithm as a predicted missing link of the bipartite network.

Aspects of the disclosure include a system that may include: means for calculating, for data represented as a bipartite network and for a set of missing links in the bipartite network, weights for each of the missing links in the set based on the bicliques in the bipartite network; means for executing a link prediction algorithm configured to incorporate the weights of each of the missing links; and means for providing a missing link of the set of missing links selected by the link prediction algorithm as a predicted missing link of the bipartite network.

Aspects of the present disclosure include an apparatus that may include a processor configured to: for data represented as a bipartite network and for a set of missing links in the bipartite network, calculate weights for each of the missing links in the set based on the bicliques in the bipartite network; execute a link prediction algorithm configured to incorporate the weights of each of the missing links; and provide a missing link of the set of missing links selected by the link prediction algorithm as a predicted missing link of the bipartite network.

Drawings

FIG. 1 illustrates an example system diagram according to an example implementation.

FIG. 2 illustrates bicliques according to an example implementation.

FIG. 3 illustrates an example interface for facilitating visual exploration of missing links according to an example implementation.

FIGS. 4(a) and 4(b) illustrate example flowcharts according to an example implementation.

FIG. 5 illustrates numerically the average performance of the experimental results under each condition.

FIG. 6 illustrates an example computing environment having example computer apparatus suitable for use in example implementations.

Detailed Description

The following detailed description provides further details of example implementations and figures of the present application. For clarity, reference numerals and descriptions of redundant components between the drawings are omitted. The terminology used throughout the description is provided by way of example only and is not intended to be limiting. For example, the use of the term "automated" may involve a fully automated implementation or a semi-automated implementation involving the control of certain aspects of the implementation by a user or administrator, depending on the desired implementation by one of ordinary skill in the art in specific practice of the implementations of the present application. The selection may be made by the user through a user interface or other input device, or may be implemented by a desired algorithm. The example implementations described herein may be utilized alone or in combination, and the functions of the example implementations may be implemented by any means depending on the desired implementation.

As set forth herein, the term "unconnected node pair" is defined as a pair of nodes that are not connected in the original network. The term "set of missing links" is defined as the potential links that may exist between unconnected node pairs. The term "predicted missing link" is defined as a missing link having a probability generated by the algorithm of the example implementations described herein.

FIG. 1 illustrates an example system diagram according to an example implementation. In the example implementations described herein, the data 100 is processed through a framework that includes an analysis module and a visualization module. The analysis module supports missing link prediction 101 in a bipartite network and the two most common ways of examining a network: node metrics 103 and subnetwork motifs 102. The link prediction method described herein uses the structural information of the bicliques in the network and can be integrated with any prior art similarity-based link prediction algorithm. The visualization module displays all of the output of the analysis module and enables the analyst to explore the data through rich user interactions. An analyst may visually investigate the identified missing links 104, network motifs 105, and node metrics 106, and further examine the impact of particular links by comparing the analysis results for the original network with those for the network to which these links are added.

Formally, a bipartite network can be defined as $G = \langle X, Y, E \rangle$, where $X$ and $Y$ are two disjoint sets of nodes and $E$ is a set of links that exist only between $X$ and $Y$, i.e., $e = \langle x, y \rangle \in E$, where $x \in X$ and $y \in Y$. For a bipartite network, the number of all possible links is $|X| \cdot |Y|$; the set of these links is denoted $U$. The link prediction problem is thus to identify which links in the set $U - E$ may be missing.
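By way of non-limiting illustration, the candidate set U - E defined above can be enumerated directly for a small network. The function name `candidate_missing_links` and the toy data are assumptions introduced only for this sketch.

```python
from itertools import product

def candidate_missing_links(X, Y, E):
    """Return U - E: every cross-type node pair that is not an observed link."""
    observed = set(E)
    return [(x, y) for x, y in product(X, Y) if (x, y) not in observed]

# Toy network: three nodes of one type, two of the other (illustrative data).
X = ["a", "b", "c"]
Y = [1, 2]
E = {("a", 1), ("a", 2), ("b", 1), ("c", 2)}

missing = candidate_missing_links(X, Y, E)
print(missing)  # [('b', 2), ('c', 1)]
```

Here |U| = 3 x 2 = 6 and |E| = 4, so two candidate links remain to be scored.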

The similarity of each unconnected node pair is first calculated using a link prediction algorithm, in particular a similarity-based method. Based on the similarity values, the method generates a ranked list of recommended missing links in order of decreasing score. One way to calculate the similarity between a node pair is via random walks. Another is to compare the neighbors of the two nodes, using measures such as common neighbors, the Jaccard coefficient, the Adamic-Adar coefficient, and preferential attachment.
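The neighbor-based measures named above can be sketched as follows. Because the two endpoints of a candidate link in a bipartite network are of different types and never share a direct neighbor, this sketch assumes one common extension: the neighbors of x are compared with the nodes two hops away from y. That choice, and all function names, are assumptions of this illustration and are not taken from the present disclosure.

```python
import math
from collections import defaultdict

def neighbor_maps(E):
    """Build neighbor sets for both node types from (x, y) links."""
    nx, ny = defaultdict(set), defaultdict(set)
    for x, y in E:
        nx[x].add(y)
        ny[y].add(x)
    return nx, ny

def similarity_scores(x, y, nx, ny):
    gx = nx.get(x, set())                                  # Y-nodes linked to x
    # Y-nodes reachable from y in two hops (assumed bipartite extension).
    gy_hat = set().union(*(nx[z] for z in ny.get(y, set())))
    common, union = gx & gy_hat, gx | gy_hat
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        # Shared Y-nodes with few links contribute more.
        "adamic_adar": sum(1 / math.log(len(ny[z])) for z in common if len(ny[z]) > 1),
        "preferential_attachment": len(gx) * len(ny.get(y, set())),
    }

E = {("a", 1), ("a", 2), ("b", 1), ("c", 2)}   # illustrative data
nx, ny = neighbor_maps(E)
scores = similarity_scores("b", 2, nx, ny)
print(scores)
```

Scoring every pair in U - E with one of these measures and sorting in decreasing order yields the ranked list described above.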

Based on the above algorithms, the example implementations described herein provide a novel approach that incorporates an important type of structure in a bipartite network, the biclique (i.e., a complete bipartite subgraph). Formally, a biclique is defined as a subnetwork $G' = \langle X', Y', E' \rangle$, where $X' \subseteq X$ and $Y' \subseteq Y$, and there is a link $e = \langle x, y \rangle \in E'$ between every node pair $x \in X'$ and $y \in Y'$. Many algorithms have been proposed to efficiently detect all bicliques in a network; in the example implementations described herein, the Maximal Biclique Enumeration Algorithm (MBEA) has been tested.
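For illustration only, the complete bipartite subnetworks (bicliques) defined above can be enumerated on a toy network by brute force. This sketch is not MBEA: it tries every subset of X and is exponential in |X|, so it is suitable only for very small inputs. The function name is an assumption of this sketch.

```python
from itertools import combinations

def maximal_bicliques(X, Y, E):
    """Brute-force enumeration of maximal bicliques (toy sizes only)."""
    nx = {x: {y for (a, y) in E if a == x} for x in X}
    ny = {y: {x for (x, b) in E if b == y} for y in Y}
    found = set()
    for r in range(1, len(X) + 1):
        for subset in combinations(X, r):
            # Y-nodes linked to every node in the subset.
            ys = set.intersection(*(nx[x] for x in subset))
            if not ys:
                continue
            # All X-nodes linked to every one of those Y-nodes (closure).
            xs = set.intersection(*(ny[y] for y in ys))
            found.add((frozenset(xs), frozenset(ys)))
    return found

X, Y = ["a", "b", "c"], [1, 2]
E = {("a", 1), ("a", 2), ("b", 1), ("c", 2)}   # illustrative data
found = maximal_bicliques(X, Y, E)
for xs, ys in sorted(found, key=lambda c: (sorted(c[0]), sorted(c[1]))):
    print(sorted(xs), sorted(ys))
```

Each returned pair (X', Y') satisfies the definition: every x in X' is linked to every y in Y', and no node can be added without breaking that property.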

FIG. 2 illustrates bicliques according to an example implementation. Two bicliques can be treated as two communities that have some nodes in common; each missing link between non-overlapping nodes from the two communities helps to form a larger community that benefits all nodes. If the two communities have many nodes in common, each of the few missing links that can be added carries more value, because a larger biclique can be formed quite easily. On the other hand, if the two communities have few nodes in common, more links may need to be added to merge the two bicliques into a larger one, and each of the missing links carries less value.

Following this intuition, example implementations relate to an algorithm that re-ranks the list of missing links generated by the above similarity-based methods. In an example implementation, the proposed algorithm calculates a weight $w_e$ for every missing link (in $M_4$ of FIG. 2) based on the information of the bicliques in the network. The weight of a link is the sum of the values calculated in processing each pair of bicliques, where each value is determined by the sizes of the difference and of the overlap of the two bicliques. Intuitively, as shown in FIG. 2, the value calculated in each iteration corresponds to the area of the intersection $M_1$ divided by the area of the missing part $M_4$. The weights and the similarity scores are then each normalized by their maximum values, and a new ranked list with new scores is generated using $s'(x, y) = w(x, y) \cdot s(x, y)$. The above method can be combined with any existing generic similarity-based link prediction algorithm to generate a series of algorithms.
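The re-ranking step described above (normalize the weights and the similarity scores by their maxima, then multiply) can be sketched as follows. The function name `rerank` and the sample values are assumptions of this illustration, as is the handling of links that have no weight entry.

```python
def rerank(similarity, weights):
    """Combine normalized weight and similarity as s' = w * s, highest first."""
    max_s = max(similarity.values(), default=0.0) or 1.0   # avoid division by zero
    max_w = max(weights.values(), default=0.0) or 1.0
    combined = {
        link: (weights.get(link, 0.0) / max_w) * (s / max_s)
        for link, s in similarity.items()
    }
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

# Illustrative values only.
similarity = {("b", 2): 0.5, ("c", 1): 1.0, ("d", 3): 0.8}
weights = {("b", 2): 4.0, ("c", 1): 1.0, ("d", 3): 2.0}

ranking = rerank(similarity, weights)
print([link for link, _ in ranking])  # [('b', 2), ('d', 3), ('c', 1)]
```

In this sketch a link with no weight entry receives a combined score of zero; whether an implementation treats such links differently is a design choice left open here.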

However, the algorithm is not perfect; missing link predictions may be erroneous, because the real world is much more complex and it is difficult to account for every nuance of every field in the algorithm design. The prior knowledge of the analyst is required to further examine the output of the algorithm, thereby combining the flexibility of humans with the scalability of machines.

Example implementations include a visual interface to help an analyst better understand the missing links identified by the foregoing method in a bipartite network. The visualization module involves five interactively coordinated views (as shown in FIG. 3): a network view and a link list view for supporting exploration of missing links, a motif overview and a motif detail view for providing motif analysis, and a metrics view for displaying node-based metrics. These views present the output of the analysis module in visual form to enable the analyst to effectively answer the what, why, and how questions regarding the formation of missing links.

FIG. 3 illustrates an example interface for facilitating visual exploration of missing links according to an example implementation. In the example interface of FIG. 3, there are several views. First, the network view 300, shown in interface pane (a), displays the biadjacency matrix of the bipartite network, where rows and columns represent the two different types of nodes, respectively. Links are represented as squares at the intersections of rows and columns. Existing links in the network may be shown in a first color scale (e.g., yellow to green), where the shade reflects the weight of the link. If the network is unweighted, all links are displayed in the darkest shade (e.g., green). Predicted missing links are displayed in a second color scale (e.g., white to purple), where a darker color reflects a higher probability or score determined by the link prediction algorithm.

Further, the link list view 301, shown in interface pane (b), is configured to present the missing links in a linear list with their probabilities or scores, where each link is visualized in a manner similar to that in the network view 300. Additional information, such as the rank of each link and the nodes it connects, is provided. The link list view 301 is used together with the network view 300 to enable an analyst to better understand the missing link predictions from different angles.

In an example implementation, various interface functions are provided. In the network view 300, an analyst may reorder the rows and columns of the matrix using certain criteria, such as node labels, average prediction scores, and the total number of missing links detected. The analyst may also filter the matrix based on the prediction scores, e.g., to reveal the most likely missing links suggested by the algorithm. In addition, different link prediction algorithms can be applied and viewed in the visualization, so that the results are easily compared.
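Two of the interactions described above, filtering by prediction score and reordering rows by average prediction score, might be sketched as below. The function names and the sample scores are assumptions of this illustration; columns could be ordered the same way.

```python
def filter_links(pred, threshold):
    """Keep only predicted links whose score meets the threshold."""
    return {link: s for link, s in pred.items() if s >= threshold}

def order_rows_by_average_score(X, pred):
    """Order row nodes by the mean score of their predicted missing links."""
    per_row = {x: [] for x in X}
    for (x, _y), s in pred.items():
        per_row[x].append(s)
    avg = {x: (sum(v) / len(v) if v else 0.0) for x, v in per_row.items()}
    return sorted(X, key=lambda x: avg[x], reverse=True)

X = ["a", "b", "c"]
pred = {("b", 2): 0.9, ("c", 1): 0.4, ("c", 3): 0.2}   # illustrative scores

print(order_rows_by_average_score(X, pred))  # ['b', 'c', 'a']
print(filter_links(pred, 0.5))               # {('b', 2): 0.9}
```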

Moreover, an analyst may explore the link predictions and add certain missing links to examine their impact using the visual analysis of motifs and metrics described below. The added links are marked on the matrix (e.g., with black crosses) and are also displayed at the top of the list. Links may be added one at a time or as a group by selecting them in the matrix.

Motif analysis is one of the main ways to understand the topology of a network. In a bipartite network, the biclique is one of the most important structural patterns. A motif detail view 302 in interface pane (c) and a motif overview 303 in interface pane (d) are provided for viewing the motifs at different scales. These two views support visual exploration of all the bicliques detected in the network and, if some missing links are added, investigation of how the results change. In the motif detail view 302, the bicliques are shown as small multiples of matrices using a visual encoding similar to that of the network view 300. Essentially, each biclique is a part of the biadjacency matrix of the entire network. In addition, the motif overview 303 displays all the bicliques as points in two-dimensional space based on a multi-dimensional scaling (MDS) projection. The distance between two bicliques is measured as the sum of the Jaccard distances between the node sets of each type in the two bicliques.
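The distance described above, the sum of the Jaccard distances over the two node-type sets, can be sketched as follows. Each biclique is represented here as a pair of sets (an assumption of this illustration); the resulting pairwise distance matrix is the kind of input an MDS projection would take, and the projection itself is not shown.

```python
def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|, defined as 0 for two empty sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def biclique_distance(c1, c2):
    """Sum of Jaccard distances between the node sets of each type."""
    (x1, y1), (x2, y2) = c1, c2
    return jaccard_distance(x1, x2) + jaccard_distance(y1, y2)

# Illustrative bicliques as (X-node set, Y-node set).
bicliques = [
    ({"a"}, {1, 2}),
    ({"a", "b"}, {1}),
    ({"a", "c"}, {2}),
]
matrix = [[biclique_distance(p, q) for q in bicliques] for p in bicliques]
for row in matrix:
    print([round(d, 3) for d in row])
```

The distance ranges from 0 (identical bicliques) to 2 (no nodes of either type in common).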

To support an analyst in comparing the two sets of bicliques detected in the network with and without the added links, the motif detail view 302 organizes the bicliques into three columns relative to the set of bicliques of the original network: removed bicliques, newly added bicliques, and unchanged bicliques. The bicliques are drawn within borders of different colors (e.g., red, green, and gray). In each column, the bicliques are sorted by size by default, and the order may be changed to other sorting criteria. Similarly, the motif overview 303 encodes these bicliques in three different colors.
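The three-way grouping described above amounts to a set comparison between the bicliques detected before and after links are added. The sketch below assumes each biclique is represented as a pair of frozensets and uses illustrative data; the helper names are hypothetical.

```python
def biclique(xs, ys):
    """Hashable representation of a biclique (assumed for this sketch)."""
    return (frozenset(xs), frozenset(ys))

def compare_bicliques(before, after):
    """Group bicliques into removed, added, and unchanged."""
    before, after = set(before), set(after)
    return {
        "removed": before - after,
        "added": after - before,
        "unchanged": before & after,
    }

# Bicliques of an original network and of the same network with one link added.
original = {
    biclique(["a"], [1, 2]),
    biclique(["a", "b"], [1]),
    biclique(["a", "c"], [2]),
    biclique(["d"], [3]),
}
with_added = {
    biclique(["a", "b"], [1, 2]),
    biclique(["a", "b", "c"], [2]),
    biclique(["d"], [3]),
}

groups = compare_bicliques(original, with_added)
print({name: len(members) for name, members in groups.items()})
# {'removed': 3, 'added': 2, 'unchanged': 1}
```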

Furthermore, the Jaccard distance may be used to calculate the similarity between the added and removed bicliques to facilitate a better understanding of the structural changes and the effects of the missing links. In the motif detail view 302, when an analyst hovers the mouse over a biclique, this information is shown as lines connecting the related bicliques, and the thickness of each line is mapped to the similarity value of the connected pair.

Computing node metrics is a method for characterizing network features in the social sciences and other fields. The metrics view 304 in interface pane (e) supports this analysis by presenting a number of metrics in a conventional table view, including the degree, affinity, and centrality before and after certain missing links are added. Changes in the metric values are highlighted (e.g., in red). The table is also interactively linked with the other views. For example, hovering the mouse over a row may emphasize the corresponding node in the network view 300. Since there may be a large number of nodes (rows), a search function may also be provided, and hovering the mouse over a node in another view automatically navigates to the corresponding row in the table.
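As a minimal illustration of the before-and-after comparison described above, the sketch below computes only node degree and reports the nodes whose value changes when links are added. Other metrics would follow the same pattern. Function names and data are assumptions of this sketch.

```python
from collections import Counter

def degrees(E):
    """Degree of every node; keys are tagged by node type to avoid clashes."""
    deg = Counter()
    for x, y in E:
        deg[("X", x)] += 1
        deg[("Y", y)] += 1
    return deg

def changed_degrees(E, added):
    """Map each affected node to its (before, after) degree."""
    before = degrees(E)
    after = degrees(set(E) | set(added))
    return {n: (before[n], after[n]) for n in after if before[n] != after[n]}

E = {("a", 1), ("a", 2), ("b", 1), ("c", 2)}   # illustrative data
changes = changed_degrees(E, {("b", 2)})
print(changes)
```

Only the two endpoints of the added link change, which corresponds to the rows that the table would highlight.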

To confirm the accuracy of the proposed missing link prediction method, quantitative experiments were performed with three bipartite networks: a weighted personal network extracted from the Atlantic Storm corpus, a weighted user-dialogue bipartite network detected from Slack communication messages, and an unweighted bipartite network between authors and papers from the IEEE VIS publication corpus.

Since ground truth for missing links is not available, the test randomly removes a certain number of links from the original network, applies a link prediction algorithm to the new network, and measures performance by comparing the detected missing links with the removed (actually missing) links (i.e., the ground truth). To validate the ensemble method, five existing link prediction algorithms are integrated into it: common neighbors, the Jaccard coefficient, the Adamic-Adar coefficient, preferential attachment, and random walk. For each algorithm, the test randomly removes 1%, 2%, 5%, 10%, and 15% of the links from the input network in order to test the performance of the algorithm under different conditions. For each of these conditions, the random link removal is repeated five times in order to reduce sampling bias.
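The link-removal protocol described above can be sketched as follows. The function name `hold_out`, the use of a seed per repetition, the rounding rule, and the toy network are assumptions of this illustration; the predictor itself is omitted.

```python
import random

def hold_out(E, fraction, seed=0):
    """Randomly remove a fraction of links; return (remaining, removed)."""
    rng = random.Random(seed)
    links = sorted(E)                               # fixed order for reproducibility
    k = max(1, round(fraction * len(links)))        # assumed: remove at least one link
    removed = set(rng.sample(links, k))
    return set(links) - removed, removed

# Toy network with 100 links (illustrative data).
E = {(f"u{i}", f"v{j}") for i in range(10) for j in range(10)}

for fraction in (0.01, 0.02, 0.05, 0.10, 0.15):
    for run in range(5):                            # five repetitions per condition
        train, removed = hold_out(E, fraction, seed=run)
        # A predictor would be applied to `train` and scored against `removed`.
        print(fraction, run, len(train), len(removed))
```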

Fig. 4 (a) illustrates an example flow for the proposed algorithm according to an example implementation.

At 400, the flow detects the bicliques within the bipartite network $G = \langle X, Y, E \rangle$ and collects them in a list $L = \{C_i = \langle X_i, Y_i, E_i \rangle\}$, where $X$ and $Y$ are the two sets of nodes in the bipartite network and $E$ is the set of links present in the bipartite network. The bicliques may be detected by any method, depending on the desired implementation.

At 401, the flow initializes the weight $w_e$ of every missing link $e \in U - E$, where $U$ is the set of all possible links that may exist in the bipartite network. In an example implementation, the flow sets $w_e \leftarrow 0$ or another base value, according to the desired implementation.

At 402, the flow considers each pair of bicliques $(C_i, C_j)$ in the list $L$ and determines a score $o$ based on the number of overlapping nodes and the sizes of the two bicliques. For each pair whose score $o$ meets a threshold, the flow calculates the weights of the missing links, as depicted at 403. The threshold may be set according to the desired implementation. An example formula for calculating the score o may be as follows:

If $o$ fails to meet the threshold, the biclique pair is discarded and the next pair is considered. Otherwise, the flow proceeds to 403 to calculate the weights of the missing links between the pair of bicliques. Example calculations may be performed incrementally, based on the number of overlapping nodes between the pair of bicliques and on the impact of the links as reflected in corresponding values (e.g., the sizes of the bicliques). In an example implementation, the formula for performing such calculations may include:

At 404, the flow loops back to 403 until all biclique pairs have been processed.
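The flow of 400 to 404 might be sketched as below. The formulas for the overlap score and for the per-pair value are not reproduced in this text, so both are placeholders assumed for this sketch: the overlap score is taken as the number of shared nodes divided by the size of the smaller biclique, and the per-pair value as the number of links in the intersection divided by the number of missing links between the non-overlapping nodes, following the intuition of FIG. 2. The threshold value is likewise hypothetical.

```python
from collections import defaultdict
from itertools import combinations, product

def biclique_weights(bicliques, E, threshold=0.25):
    """Accumulate a weight per missing link over all qualifying biclique pairs."""
    observed = set(E)
    w = defaultdict(float)                                  # 401: weights start at 0
    for (xi, yi), (xj, yj) in combinations(bicliques, 2):   # 402: each pair
        xo, yo = xi & xj, yi & yj
        # Placeholder overlap score (assumption, not the original formula).
        o = (len(xo) + len(yo)) / min(len(xi) + len(yi), len(xj) + len(yj))
        if o < threshold:
            continue                                        # discard this pair
        # Missing links between the non-overlapping nodes of the two bicliques.
        candidates = set(product(xi - xj, yj - yi)) | set(product(xj - xi, yi - yj))
        missing = {e for e in candidates if e not in observed}
        if not missing:
            continue
        value = (len(xo) * len(yo)) / len(missing)          # placeholder: M1 / M4
        for e in missing:                                   # 403: add to each weight
            w[e] += value
    return dict(w)

# Two overlapping bicliques (illustrative data).
c1 = (frozenset({"a", "b"}), frozenset({1, 2}))
c2 = (frozenset({"b", "c"}), frozenset({2, 3}))
E = set(product(*c1)) | set(product(*c2))

weights = biclique_weights([c1, c2], E)
print(weights)
```

In this toy case the bicliques share node b and node 2, the two missing links are (a, 3) and (c, 1), and each receives a value of 1 / 2.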

FIG. 4(b) illustrates an example overall flow according to an example implementation. Given data represented as a bipartite network and a set of missing links in the bipartite network, at 410, the algorithm of FIG. 4(a) is performed to calculate the weight of each missing link in the set based on the bicliques in the bipartite network.

At 411, the flow executes a link prediction algorithm configured to incorporate the weights of each of the missing links. Any link prediction algorithm known in the art may be used for this purpose, such as the algorithms described herein.

At 412, the flow provides the missing links in the set selected by the link prediction algorithm as predicted missing links of the bipartite network, as shown in FIG. 3. This may include presenting the bipartite network as a biadjacency matrix comprising rows representing nodes of a first type in the bipartite network and columns representing nodes of a second type, each entry in the matrix representing a link between a node of the first type and a node of the second type. The predicted missing links may be selected as those whose scores from the link prediction algorithm meet a threshold, or all missing links may be displayed in the biadjacency matrix, according to the desired implementation. Providing the predicted missing links may include rendering each entry in a color shade according to the score provided by the link prediction algorithm. Further, presenting the bipartite network may include providing an interface configured to order the rows and columns of the biadjacency matrix according to a selected criterion (e.g., type of node, average score). Further, providing the predicted missing links may include presenting them in a linear list according to probability.

FIG. 5 illustrates numerically the average performance of the experimental results under each condition. For each condition (i.e., in each table cell), three numbers represent (1) the average metric for the baseline, (2) the average metric for the proposed method, and (3) the improvement of the proposed method over five runs (different numbers of links removed from the original dataset). The highest performance and improvement for each metric in each dataset are highlighted in bold. The performance metric (R-precision or area under the precision-recall curve (AUCPR)) is calculated in each run using the input network built by removing a proportion of the links.
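The two performance metrics named above, rendered here as R-precision and area under the precision-recall curve, can be sketched for a ranked list of predicted links. The area is estimated with average precision, a common step-wise estimator; the exact computation used in the experiments is not stated, so that choice is an assumption of this sketch.

```python
def r_precision(ranked, truth):
    """Fraction of the top-R predictions that are true, with R = |truth|."""
    r = len(truth)
    return sum(1 for link in ranked[:r] if link in truth) / r if r else 0.0

def average_precision(ranked, truth):
    """Step-wise estimate of the area under the precision-recall curve."""
    hits, total = 0, 0.0
    for k, link in enumerate(ranked, start=1):
        if link in truth:
            hits += 1
            total += hits / k          # precision at each recall step
    return total / len(truth) if truth else 0.0

# Ranked predictions (best first) and the links actually removed (illustrative).
ranked = [("a", 3), ("b", 4), ("c", 1), ("d", 2)]
truth = {("a", 3), ("c", 1)}

print(r_precision(ranked, truth))                  # 0.5
print(round(average_precision(ranked, truth), 4))  # 0.8333
```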

From these results, the proposed biclique-based methods improve on their baselines under all conditions, to varying degrees, for both R-precision and AUCPR. Some of the performance gains were considerable, with the preferential attachment algorithm exhibiting the greatest improvement (R-precision of 0.564, AUCPR of 0.557) for the unweighted Atlantic Storm data set. Thus, by implementing the algorithms described in FIGS. 4(a) and 4(b), prior art link prediction algorithms may be enhanced to detect missing links more accurately.

Such an example implementation is particularly useful for big data analysis, where there is a large amount of data and the data includes real-world data that may be noisy. For example, for data used in determining gene expression, genes are related to different conditions, and the bipartite network involves nodes of a first type (genes) and nodes of a second type (conditions or diseases that may occur). It is impractical to experiment with every gene combination because there are too many genes and conditions. Through the example implementations, such a bipartite network may be analyzed by missing link detection to identify which combinations of genes are likely to lead to which conditions, and the user may then focus on those specific gene and condition experiments.

In another example implementation involving drug discovery, the bipartite network may involve different types of molecules and different types of conditions (e.g., side effects, disease treatment efficacy). Drug discovery can involve a large number of experiments because there may be too many different types of molecules and conditions that a user may be interested in. By applying the algorithm as described herein, the causal relationship between drug molecule combinations and conditions can be determined more accurately than prior art link prediction algorithms, and thus, the user can concentrate on drug experiments to test such conditions accordingly.

FIG. 6 illustrates an example computing environment having an example computer device suitable for use in example implementations. The computer device 605 in the computing environment 600 may include one or more processing units, cores, or processors 610, memory 615 (e.g., RAM, ROM, etc.), internal storage 620 (e.g., magnetic, optical, solid state, and/or organic storage), and/or I/O interfaces 625, any of which may be coupled to a communication mechanism or bus 630 for communicating information, or embedded in the computer device 605.

The computer device 605 may be communicatively coupled to an input/user interface 635 and an output device/interface 640. Either or both of the input/user interface 635 and the output device/interface 640 may be wired or wireless interfaces and separable. The input/user interface 635 may include any device, component, sensor, or interface (e.g., buttons, touch screen interface, keyboard, pointing/cursor control, microphone, camera, braille, motion sensor, optical reader, etc.) that may be used to provide input. The output device/interface 640 may include a display, television, monitor, printer, speaker, braille, etc. In some example implementations, the input/user interface 635 and the output device/interface 640 may be embedded in the computer device 605 or physically coupled with the computer device 605. In other example implementations, other computer devices may serve as or provide the functionality of the input/user interface 635 and the output device/interface 640 of the computer device 605. In example implementations involving a touch screen display, a television display, or any other form of display, the display is configured to provide a user interface such as that illustrated at fig. 3.

Examples of computer device 605 may include, but are not limited to, highly mobile devices (e.g., smart phones, devices in vehicles and other machines, devices carried by humans and animals, etc.), mobile devices (e.g., tablet computers, notebook computers, laptops, personal computers, portable televisions, radios, etc.), and devices of non-mobile design (e.g., desktop computers, other computers, kiosks, televisions with one or more processors embedded and/or coupled thereto, radios, etc.).

The computer device 605 may be communicatively coupled (e.g., via the I/O interface 625) to external storage 645 and a network 650 to communicate with any number of networked components, devices, and systems, including one or more computer devices of the same or a different configuration. The computer device 605 or any connected computer device may function as, provide the services of, or be referred to as a server, client, thin server, general purpose machine, special purpose machine, or another label.

The I/O interface 625 may include, but is not limited to, a wired and/or wireless interface using any communication or I/O protocol or standard (e.g., Ethernet, 802.11x, Universal System Bus, WiMAX, modem, cellular network protocol, etc.) for communicating information to and/or from at least all of the connected components, devices, and networks in the computing environment 600. The network 650 may be any network or combination of networks (e.g., the Internet, a local area network, a wide area network, a telephone network, a cellular network, a satellite network, etc.).

The computer device 605 may communicate using and/or with computer-usable or computer-readable media including transitory and non-transitory media. Transitory media include transmission media (e.g., metal cables, optical fibers), signals, carriers, and the like. Non-transitory media include magnetic media (e.g., magnetic disks and tapes), optical media (e.g., CD ROM, digital video disk, blu-ray disk), solid state media (e.g., RAM, ROM, flash memory, solid state storage), and other non-volatile storage or memory.

In some example computing environments, the computer device 605 may be used to implement techniques, methods, applications, processes, or computer-executable instructions. Computer-executable instructions may be retrieved from a transitory medium, and stored on and retrieved from a non-transitory medium. The executable instructions may originate from one or more of any programming, scripting, and machine languages (e.g., C, C++, C#, Java, Visual Basic, Python, Perl, JavaScript, among others).

The memory 615 may be configured to store or manage algorithms to be executed by the processor 610 and data to be processed, for example, in the flows described at FIGS. 4(a) and 4(b). The example implementations described herein may be performed alone or in any combination with one another depending on the desired implementation and are not limited to a particular example implementation.
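As a non-authoritative illustration of the kind of data and algorithm the memory 615 might hold, the following minimal Python sketch builds a matrix whose rows are nodes of a first type and whose columns are nodes of a second type, maps each score to a color, and orders predicted missing links by probability, in the manner recited in the claims. The node names, the scores, the function names, and the white-to-blue color ramp are all assumptions made for this example; the flows of FIGS. 4(a) and 4(b) are not reproduced here.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]

# Hypothetical toy data: first-type nodes (rows) and second-type nodes (columns).
ROWS = ["a1", "a2", "a3"]
COLS = ["b1", "b2"]

# Observed links of the bipartite network (assumed example data).
OBSERVED: Set[Link] = {("a1", "b1"), ("a2", "b1"), ("a2", "b2")}

# Scores for predicted missing links, as a link prediction algorithm
# might return them (assumed values, not results from the source).
PREDICTED: Dict[Link, float] = {
    ("a1", "b2"): 0.8,
    ("a3", "b1"): 0.3,
    ("a3", "b2"): 0.55,
}


def build_matrix(rows: List[str], cols: List[str],
                 observed: Set[Link],
                 predicted: Dict[Link, float]) -> List[List[float]]:
    """Return a matrix with 1.0 for observed links, the score for
    predicted missing links, and 0.0 for all other entries."""
    return [
        [1.0 if (r, c) in observed else predicted.get((r, c), 0.0) for c in cols]
        for r in rows
    ]


def score_to_hex(score: float) -> str:
    """Map a score in [0, 1] to a color from white (0) to blue (1).
    The color ramp is an arbitrary choice for this sketch."""
    score = min(max(score, 0.0), 1.0)
    level = round(255 * (1.0 - score))
    return f"#{level:02x}{level:02x}ff"


def ranked_links(predicted: Dict[Link, float]) -> List[Tuple[Link, float]]:
    """List predicted missing links from most to least probable."""
    return sorted(predicted.items(), key=lambda item: item[1], reverse=True)
```

An interface could then color each matrix cell with `score_to_hex(matrix[i][j])` and show `ranked_links(PREDICTED)` as a list beside the matrix; both choices are illustrative only.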

The processor 610 may execute under any Operating System (OS) (not shown) in a native or virtual environment. One or more applications may be deployed that include a logic unit 660, an Application Programming Interface (API) unit 665, an input unit 670, an output unit 675, and an inter-unit communication mechanism 695 by which the different units communicate with each other, with the OS, and with other applications (not shown). The units and elements described may vary in design, function, configuration, or implementation and are not limited to the description provided. The processor 610 may be in the form of a physical processor or Central Processing Unit (CPU) configured to execute instructions loaded from the memory 615.

In some example implementations, when the API unit 665 receives information or execution instructions, the information or execution instructions may be transferred to one or more other units (e.g., the logic unit 660, the input unit 670, the output unit 675). In some cases, the logic unit 660 may be configured to control the information flow between the units and, in some of the example implementations described above, to direct the services provided by the API unit 665, the input unit 670, and the output unit 675. For example, the flow of one or more processes or implementations may be controlled by the logic unit 660 alone or in combination with the API unit 665. The input unit 670 may be configured to obtain input for the calculations described in the example implementations, and the output unit 675 may be configured to provide output based on the calculations described in the example implementations.
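The routing described above can be pictured with a minimal Python sketch in which an API unit hands whatever it receives to a logic unit, which forwards it to a registered input or output unit. The class names, method names, and the string-keyed registry are hypothetical and chosen only for illustration; the source does not prescribe any particular software structure for units 660 to 675.

```python
from typing import Any, Callable, Dict


class LogicUnit:
    """Controls information flow between units (hypothetical design)."""

    def __init__(self) -> None:
        self._units: Dict[str, Callable[[Any], Any]] = {}

    def register(self, name: str, handler: Callable[[Any], Any]) -> None:
        """Make a unit reachable under the given name."""
        self._units[name] = handler

    def dispatch(self, target: str, payload: Any) -> Any:
        """Forward the payload to the named unit and return its result."""
        if target not in self._units:
            raise KeyError(f"no unit registered as {target!r}")
        return self._units[target](payload)


class ApiUnit:
    """Receives information or instructions and transfers them onward."""

    def __init__(self, logic: LogicUnit) -> None:
        self._logic = logic

    def receive(self, target: str, payload: Any) -> Any:
        return self._logic.dispatch(target, payload)


def parse_scores(text: str) -> list:
    """Stand-in input unit: turn comma-separated text into scores."""
    return [float(part) for part in text.split(",")]


def format_scores(scores: list) -> str:
    """Stand-in output unit: render scores with two decimal places."""
    return ", ".join(f"{s:.2f}" for s in scores)
```

For example, after `logic = LogicUnit()`, `logic.register("input", parse_scores)`, and `logic.register("output", format_scores)`, a call such as `ApiUnit(logic).receive("input", "0.8,0.3")` is routed to the input unit without the API unit knowing how that unit is implemented.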

Some portions of the detailed description are presented in terms of algorithms and symbolic representations of operations within a computer. These algorithmic descriptions and symbolic representations are the means used by those skilled in the data processing arts to convey the substance of their innovations to others skilled in the art. An algorithm is a defined sequence of steps leading to a desired end state or result. In example implementations, the steps performed require physical manipulation of tangible quantities to achieve a tangible result.

Unless specifically stated otherwise, as apparent from the discussion, it is appreciated that throughout the description, discussions utilizing terms such as "processing," "computing," "calculating," "determining," "displaying," or the like, may include the actions and processes of a computer system, or other information processing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other information storage, transmission or display devices.

Example implementations may also relate to an apparatus for performing the operations herein. The apparatus may be specially constructed for the required purposes, or it may comprise one or more general-purpose computers selectively activated or reconfigured by one or more computer programs. Such a computer program may be stored in a computer-readable medium such as a computer-readable storage medium or a computer-readable signal medium. Computer-readable storage media may involve tangible media such as, but not limited to, optical disks, magnetic disks, read-only memories, random access memories, solid state devices and drives, or any other type of tangible or non-transitory media suitable for storing electronic information. Computer-readable signal media may include media such as carrier waves. The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. The computer program may involve a purely software implementation comprising instructions for carrying out the operations of the desired implementation.

Various general-purpose systems may be used with programs and modules in accordance with the examples herein, or it may prove convenient to construct more specialized apparatus to perform the desired method steps. In addition, the example implementations are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the example implementations as described herein. The instructions of the programming language may be executed by one or more processing devices, such as a Central Processing Unit (CPU), processor, or controller.

The operations described above may be performed by hardware, software, or some combination of software and hardware, as is known in the art. Various aspects of the example implementations may be implemented using circuitry and logic (hardware), while other aspects may be implemented using instructions stored on a machine-readable medium (software), which, if executed by a processor, would cause the processor to perform a method to carry out implementations of the present application. Moreover, some example implementations of the present application may be performed using only hardware, while other example implementations may be performed using only software. Furthermore, the various functions described may be performed in a single unit or may be distributed among multiple components in any number of ways. When performed by software, the method may be executed by a processor, such as that of a general-purpose computer, based on instructions stored on a computer-readable medium. The instructions may be stored on the medium in compressed and/or encrypted format, if desired.

Further, other implementations of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the teachings herein. The various aspects and/or components of the described example implementations may be used alone or in any combination. The specification and example implementations are to be considered merely as examples and the true scope and spirit of the application is indicated by the following claims.

Claims

1. A method for determining a set of missing links in a bipartite network, the method comprising the steps of:

for data represented as the bipartite network, for the set of missing links in the bipartite network:

calculating weights for each of the missing links in the set based on the tuples in the bipartite network;

executing a link prediction algorithm configured to incorporate the weights of each of the missing links;

providing a missing link of the set of missing links selected by the link prediction algorithm as a predicted missing link of the bipartite network; and

presenting the bipartite network as a dual adjacency matrix comprising rows representing nodes of a first type and columns representing nodes of a second type in the bipartite network, each of the entries in the matrix representing links between nodes of the first type and nodes of the second type,

wherein the step of presenting the bipartite network comprises: providing an interface configured to represent each of the entries in the matrix as a hue according to a score for each of the predicted missing links, the score indicating a probability that a respective predicted missing link exists between a respective node of the first type and a respective node of the second type.

2. The method of claim 1, wherein calculating the weight of each of the missing links in the set based on the tuples in the bipartite network comprises:

for each pair of tuples having a score that meets a threshold, the score being based on the number of overlapping nodes and the size of each pair of tuples, calculating the weight of a missing link in the set of missing links between the pair of tuples.

3. The method of claim 2, wherein calculating the weight for a missing link in the set of missing links between each pair of tuples is based on the number of overlapping nodes and the size of each pair of tuples.

4. The method of claim 1, wherein the score is provided by the link prediction algorithm.

5. The method of claim 4, wherein presenting the bipartite network comprises: providing the interface configured to rank the rows and columns of the dual adjacency matrix according to a selected criterion.

6. The method of claim 1, wherein providing a missing link of the set of missing links selected by the link prediction algorithm as the predicted missing link of the bipartite network comprises: presenting the predicted missing links linearly according to probability.

7. A non-transitory computer-readable medium storing instructions for performing a process, the instructions comprising:

for data represented as a bipartite network, for a set of missing links in the bipartite network:

wherein presenting the bipartite network comprises: providing an interface configured to represent each of the entries in the matrix as a hue according to a score for each of the predicted missing links, the score indicating a probability that a respective predicted missing link exists between a respective node of the first type and a respective node of the second type.

8. The non-transitory computer-readable medium of claim 7, wherein calculating the weight of each of the missing links in the set based on the tuples in the bipartite network comprises:

9. The non-transitory computer-readable medium of claim 8, wherein calculating the weight for a missing link in the set of missing links between each pair of tuples is based on the number of overlapping nodes and the size of each pair of tuples.

10. The non-transitory computer-readable medium of claim 7, wherein the score is provided by the link prediction algorithm.

11. The non-transitory computer-readable medium of claim 10, wherein presenting the bipartite network comprises: providing the interface configured to rank the rows and columns of the dual adjacency matrix according to a selected criterion.

12. The non-transitory computer-readable medium of claim 7, wherein providing a missing link of the set of missing links selected by the link prediction algorithm as the predicted missing link of the bipartite network comprises: presenting the predicted missing links linearly according to probability.

13. An apparatus for determining a set of missing links in a bipartite network, the apparatus comprising:

a processor configured to:

provide a missing link of the set of missing links selected by the link prediction algorithm as a predicted missing link of the bipartite network; and

wherein the processor is configured to provide an interface configured to represent each of the entries in the matrix as a hue according to a score for each of the predicted missing links, the score indicating a probability that a respective predicted missing link exists between a respective node of the first type and a respective node of the second type.

14. The apparatus of claim 13, wherein the processor is configured to calculate the weight of each of the missing links in the set based on the tuples in the bipartite network by:

15. The apparatus of claim 14, wherein the processor is configured to calculate the weight of a missing link in the set of missing links between each pair of tuples based on the number of overlapping nodes and the size of each pair of tuples.

16. The apparatus of claim 13, wherein the score is provided by the link prediction algorithm.

17. The apparatus of claim 16, wherein the processor is configured to present the bipartite network by providing the interface configured to rank the rows and columns of the dual adjacency matrix according to a selected criterion.

18. The apparatus of claim 13, wherein the processor is configured to provide the missing link in the set of missing links selected by the link prediction algorithm as the predicted missing link of the bipartite network by linearly presenting the predicted missing links according to probability.

19. The apparatus of claim 13, wherein the processor is configured to:

in response to a selection, on the interface, of one of the predicted missing links:

perform at least one of a motif analysis or a metric analysis on the selected one of the predicted missing links by adding the selected one of the predicted missing links to the bipartite network; and

provide a result of the at least one of the motif analysis or the metric analysis of the selected one of the predicted missing links.