METHOD FOR SIMULATION OF INTRACELLULAR PROCESSES, AND RELATED SIMULATOR APPARATUS
The present invention refers to a method for simulation of intracellu- lar processes, that allows, in a simple, efficient, and reliable way, observing the evolution of a "virtual" intracellular process for aiding biological research, particularly aiming at production of new drugs and new therapies.
The present invention further refers to the related simulator appara- tus and to the related instruments necessary for performing the method.
Recent developments in genomics have brought to complete genome sequence of numerous organisms, among which the human one. In parallel with the efforts accomplished for sequencing genomes, specific softwares have been developed for carrying out comparisons among se- quences and for foreseeing their possible functions. These software packages, based on comparison of new sequences with already characterised and noted sequences, are very powerful and have allowed carrying out many functional forecasts which have been revealed as correct. The discovery that most of the human genome does not encode proteins has opened new research fields for understanding "grammar" and "syntax" of what was considered as "trash DNA" until few years ago and that, instead, should have an important role in regulating the portion that is encoded by it.
With regard to a different subject, the one of proteomics, it is to be completed the map of the protein set expressed within various cellular types and biochemical paths have been discovered which allow extracellular signals to reach the nucleus and to activate the proper transcriptional response. Also in the field of proteomics some specific softwares have been developed allowing foreseeing which is the likely function of the sin- gle domains of which the proteins are composed. Software packages have been also developed capable to follow the dynamics of the structure of proteins by simulating forces (forces substantially of electrostatic character) operating among various amino acids, as well as the dynamics of short tracts of nucleotides (i.e., DNA and RNA) in order to understand their various bends and their ways of interacting with other molecules.
Since the beginning of the last century, mathematical models have been developed which allow describing in a precise way the action of enzymes onto their own substrates and which have become the basis for
carrying out detailed studies concerning properties of differential equations of enzymatic kinetics. Since analytical handling of these systems of differential equations is possible only in the case when variables are few, preferably equal to only two or three, the interest in these models has been always limited to very short chains of biochemical reactions. While numerous scientific papers have been published about models with few variables, literature about models with many variables is very poor. Actually, a dichotomy has been created between biomathematicians' interest, aimed at reproducing some typical phenomena of non linear dynamics, and biological interest in these models, that is the possibility of simulating on a large scale the real behaviour of the cell.
Since few years some research teams have faced the problem of developing a software capable to reproduce in silico a large number of biochemical reactions. However, these attempts remained limited to the research interest of single American university teams. Although some of these softwares have several definitely interesting features, since they are however softwares substantially intended for simulating specific reactions, they all give the impression of being "partial attempts" not suitable for large scale generalised simulations. In the world of the information technology it is evident the delay in the application of information tools to biotechnology, most of all in a time when decisions have to be taken with regard to the use of genetically modified products, the usefulness or dangerousness of which to health is nowadays matter only for speculation, due to lack of data about body re- actions to food having a new genomic structure not exhaustively tested in laboratory. Today available experimental data are fragmentary, except in some anecdotal cases.
It is therefore an object of the present invention to solve the aforementioned drawbacks and problems, permitting the implementation of a "virtual cell" which is usable, in a simple, reliable, sturdy, and safe way, by the biological research as an aid in the production of new drugs and new therapies.
More in particular, it is therefore an object of the present invention to allow the implementation of a "virtual cell" permitting large scale simula- tions of both prokaryotic and eukarγotic intracellular processes to be carried out.
It is a further object of the present invention to provide the instru-
ments and the apparatus performing the method.
It is specific subject matter of the present invention a method for simulation of intracellular processes, comprising the following steps:
A. defining, through a graphic interface, a graphic representation of one or more intracellular reactions to be simulated,
B. associating one or more first values corresponding to one or more biochemical characteristics of one or more biochemical components involved in said one or more reactions defined in step A and/or one or more second values corresponding to one or more biokinetic con- stants of said one or more reactions defined in step A,
C. transforming said graphic representation of said one or more reactions into one or more corresponding systems of ordinary differential equations and/or of partial derivative differential equations modelling said one or more reactions, employing said one or more first values and/or said one or more second values,
D. performing one or more mathematical processings comprising a numerical integration of said one or more systems of ordinary differential equations and/or of partial derivative differential equations, and
E. displaying, through said graphic interface, the results of said one or more mathematical processings, said one or more mathematical processings modelling said one or more intracellular reactions with at least one neural network comprising one or more nodes, said ordinary differential equations and/or partial derivative differential equations of said one or more systems of equations being transfer functions of said one or more nodes.
Always according to the invention, said one or more intracellular reactions may be metabolic and/or proteic reactions.
Still according to the invention, step B may comprise the following sub-step: B.1 inputting, through said graphic interface, at least one of said one or more first values and/or at least one of said one or more second values.
Preferably according to the invention, step B comprises the following sub-step: B.2 interrogating a database storing values of biochemical characteristics of biochemical components and of biokinetic constants of reactions, fetching, in the case when they are present, at least one of
said first values and/or at least one of said one or more second values.
Furthermore according to the invention, step B may comprise the following sub-step: B.3 selecting, through said graphic interface, at least one of said one or more first values among a first plurality of values, related to the corresponding biochemical characteristic, stored in the database and/or at least one of said one or more second values among a second plurality of values, stored in the database. Always according to the invention, step B may comprise the following sub-step:
B.4 storing in the database said at least one first value inputted in sub- step B.1 and/or said at least one second value inputted in sub-step B.1. Still according to the invention, step E may comprise the following sub-step:
E.1 storing in the database at least one of said results of said one or more mathematical processings.
Furthermore according to the invention, sub-step E.1 may be auto- matically performed.
Always according to the invention, said sub-step E.1 may be enable- able through said graphic interface.
Still according to the invention, the database may store said values of biochemical characteristics of biochemical components and of bioki- netic constants of reactions according to a classification by species, cellular type, and physical-chemical condition.
Furthermore according to the invention, the database may comprise a portion storing metabolic data and a portion storing proteic data.
Always according to the invention, said graphic interface may display at least one description, stored in the database, on at least one of said one or more biochemical characteristics and/or on at least one of said one or more biochemical components and/or on at least one of said one or more biokinetic constants of reactions.
Still according to the invention, said graphic interface may display at least one hypertextual link or hyperlink, stored in the database, to at least one web page containing at least one description on at least one of said one or more biochemical characteristics and/or on at least one of said one
or more biochemical components and/or on at least one of said one or more biokinetic constants of reactions.
Furthermore according to the invention, step B may comprise the following sub-step: B.5 selecting, through said graphic interface, at least one predefined module, comprising one or more cascades of intracellular reactions, that is stored in the database, said at least one predefined module comprising one or more input elements and/or one or more output elements apt to be linked to at least one biochemical component and/or to at least one predefined module.
Always according to the invention, step B may comprise the following sub-step:
B.6 for at least one predefined module selected in sub-step B.5, further selecting, through said graphic interface, one or more third values, stored in the database, related to one or more biochemical characteristics and/or to one or more biokinetic constants defining said at least one selected predefined module.
Still according to the invention, said at least one neural network may comprise a layer of input nodes, a layer of output nodes, and one or more layers of intermediate nodes.
Furthermore according to the invention, said at least one neural network may be preliminarily trained through one or more training methods employing one or more experimentally detected time curves of one or more biochemical characteristics of one or more biochemical components involved in tested intracellular reactions.
Always according to the invention, said tested intracellular reactions may comprise at least one reaction selected from the group including:
- PC12 paradox in rat PC12 cells;
- transduction of the signal related to PDGF receptors in rat fibroblasts; - transduction of the signal related to integrins in human platelets, effects of the constitutive activation of some integrins;
- role of phosphatases in mouse pancreas cells;
- perturbation of transduction paths related to NF-kB factor and to EGF receptor by viruses; - MAPK signalling path in yeast cells;
- signal transduction by insulin receptor; and
- intracellular signal transduction in cardiac cells and anagenesis.
Still according to the invention, each one of said one or more experimentally detected time curves may reach one or more steady asymptotic states.
Furthermore according to the invention, said at least one neural net- work may be preliminarily trained for determining the existence of one or more steady asymptotic states corresponding to said one or more steady asymptotic states reached by said one or more experimentally detected time curves.
Always according to the invention, said one or more training meth- ods may comprise the error backpropagation method and/or the simulated annealing method.
Still according to the invention, said numerical integration of step D may be performed with a stability control based on multistep numerical integration. Furthermore according to the invention, at least one of said ordinary differential equations and/or partial derivative differential equations may model a kinetics of at least one of said intracellular reactions.
Always according to the invention, said one or more systems of ordinary differential equations and/or of partial derivative differential equa- tions may model said one or more biochemical components as punctiform objects devoid of geometric structure, structural changes being modelled as variations of at least one of said one or more biokinetic constants.
Still according to the invention, in step C, an ordinary or partial derivative differential equation may correspond to each biochemical compo- nent, involved in an intracellular reaction of said one or more reactions defined in step A, equation modelling the interactions among said biochemical component and all the other biochemical components involved in the same intracellular reaction as well.
Furthermore according to the invention, at least one biochemical component may be an enzyme j involved in a reaction, comprising an activation kinetics and a deactivation kinetics, with N sub-layers, with N > 1 , and an equation of said one or more systems of ordinary differential equations and/or partial derivative differential equations may model said reaction through the following formula: dx N -£- = ∑(at, -χ -cι/ -χι 'x ) >
where
- jc, is the active fraction, equal to the ratio of active molecules number to molecules total number, of the t'-th sub-layer,
- Xj is the active fraction, equal to the ratio of active molecules number to molecules total number, of said enzymey,
- ay = kyβ is the constant of said activation kinetics of said reaction, equal to the product of the catalysis constant ky between said active enzymey and the t'-th inactive sub-layer with the concentration βf of said enzymey, and - cv = y + b,j , wherein bυ = [M} ] • hy is the constant of said deactivation kinetics of said reaction, equal to the product of the catalysis constant hy between said active enzymey and the t-th active sub-layer with the concentration [MJ of said enzymey. Always according to the invention, at least one equation of said one or more systems of ordinary differential equations and/or of partial derivative differential equations may model a creation process and/or a destruction process of at least one of said one or more biochemical components. Still according to the invention, said equation modelling said reaction of said enzymey with N sub-layers, with N > 1 , may further model a crea- tion process and/or a destruction process of said enzymey.
Furthermore according to the invention, at least one equation of said one or more systems of ordinary differential equations and/or of partial derivative differential equations may model at least one auto-interaction of at least one of said one or more biochemical components. Always according to the invention, at least one equation of said one or more systems of ordinary differential equations and/or of partial derivative differential equations may model at least one feedback of a biochemical transduction signal.
Still according to the invention, said one or more systems of ordinary differential equations and/or of partial derivative differential equations may be apt to model at least one transport phenomenon of at least one of said one or more biochemical components.
Furthermore according to the invention, said one or more systems of differential equations may be apt to model said at least one transport phenomenon through at least one corresponding time delay of the numerical integration of at least one equation performed in step D.
Always according to the invention, one or more biochemical charac-
teristics, resulting from said one or more mathematical processings of step D, may be stored so as to be used in said one or more mathematical processings with said at least one time delay.
Still according to the invention, at least one time delay may be setta- ble through said graphic interface.
Furthermore according to the invention, at least one time delay may be stored in the database.
Always according to the invention, said one or more systems of ordinary differential equations and/or of partial derivative differential equa- tions may be apt to model at least one diffusion phenomenon of at least one of said one or more biochemical components.
Still according to the invention, said one or more systems of ordinary differential equations and/or of partial derivative differential equations may be apt to model noise phenomena of at least one of said one or more in- tracellular reactions through stochastic terms.
Furthermore according to the invention, said one or more systems of ordinary differential equations and/or of partial derivative differential equations may be apt to model multistable intracellular reactions.
Always according to the invention, at least one time step and/or at least one duration of said numerical integration of step D may be selectable through the graphic interface.
Preferably according to the invention, said one or more systems of ordinary differential equations and/or of partial derivative differential equations are apt to model a compartmentalisation of at least one intra- cellular environment, whereby said intracellular environment comprises two or more compartments in at least two of which at least one particular biochemical component of said one or more biochemical components behaves differently in the reactions in which it is involved.
Still according to the invention, in step B, a first value different for each one of said at least two compartments may be associated with one or more biochemical characteristics of said at least one particular biochemical component and/or a second value different for each one of said at least two compartments may be associated with one or more biokinetic constants of said one or more reactions in which said at least one par- ticular biochemical component is involved.
Furthermore according to the invention, said compartmentalisation of said at least one intracellular environment may be defined in step A.
Always according to the invention, the results displayed in step E may comprise at least one time curve of a biochemical characteristic of at least one of said one or more biochemical components.
Still according to the invention, said biochemical characteristic of which said time curve is displayed may be a chemical concentration.
Furthermore according to the invention, said graphic interface may be apt to add and to eliminate biochemical components.
Always according to the invention, said graphic interface may be apt to make at least one biochemical component constitutively active, so that in step C said one or more systems of ordinary differential equations and/or of partial derivative differential equations comprise an equation modelling a null or constant kinetics of said at least one constitutively active biochemical component.
It is further specific subject matter of the present invention a system for simulation of intracellular processes, comprising one or more electronic computers, characterised in that it is apt to perform the previously described method for simulation of intracellular processes.
Always according to the invention, said system may have either a centralised architecture, comprising a sole electronic computer, or a dis- tributed architecture, comprising at least two electronic computers apt to be connected through one or more communication networks, preferably through the Internet network.
Still according to the invention, said system may have an information architecture of client/server type. It is another specific subject matter of the present invention a computer program characterised in that it comprises code means adapted to execute, when running on a system comprising one or more electronic computers, the previously described method for simulation of intracellular processes. It is further specific subject matter of the present invention a Memory medium, readable by a computer, storing a program, characterised in that the program is at least a portion of the computer program just described.
The present invention will be now described, by way of illustration and not by way of limitation, according to its preferred embodiments, by particularly referring to the Figures of the enclosed drawings, in which:
Figure 1 shows a schematic representation of the structure of the preferred embodiment of the method of simulation according to the inven-
tion;
Figure 2 shows a schematic representation of a neural network of the type employed in the method of Figure 1 ;
Figure 3 shows a schematic representation of the process of the dy- namics of the calcium ion within the cell;
Figure 4 shows a selectable predefined module in the method of Figure 1 ;
Figure 5 shows an interaction among predefined modules that is definable in the method of Figure 1 ; Figure 6 shows the graphic representation of the glycolysis definable in the method of Figure 1 ;
Figures 7 and 8 show some kinetic curves resulting from mathematical processing by the method of Figure 1 on the glycolysis of Figure 6; Figure 9 shows the graphic representation of the Krebs cycle defin- able in the method of 1 ; and
Figures 10 and 11 show some kinetic curves resulting from mathematical processing by the method of Figure 1 on the Krebs cycle of Figure 9.
The present invention uses mathematical methods describing kinet- ics of the biochemical reactions, extending them to the whole set of the intracellular biochemical phenomena, such as, for example, metabolism of sugars and activation of operons in bacteria, and biochemical signal transduction in cytoplasmic environment in eukaryotes. The method according to the invention, extends mathematical models to interactions among cells and to population dynamics in general.
Figure 1 show a schematic representation of the structure of the method of simulation, and in particular of the software implementing it.
The simulator comprises a numerical processor 1 that performs mathematical processing related to the intracellular reactions to be simu- lated. Such processing substantially comprise integration of systems of ordinary or partial derivative differential equations modelling the intracellular reactions to be simulated.
In particular, an operator 2 defines the reactions to be simulated through a graphic interface 3, that communicates with the numerical proc- essor 1 transforming the reactions which are graphically schematised by the operator on the interface 3 into the corresponding systems of ordinary differential equations and/or of partial derivative differential equations
modelling such reactions.
In performing the integrations, the numerical processor 1 interrogates a database 4 that stores the characteristics of biochemical components and of reactions constants, by retrieving the values corresponding to the specific components and constants of the reactions inputted by the operator 2. In particular, some values of the database 4 may be also directly selected by the operator 2 through a database interface 5 linked to the database 4 and to the graphic interface 3. Moreover, some values 6 (for example obtainable from new experimental data), which are not yet stored in the database 4, may be directly inputted through the graphic interface 3 (or through the database interface 5 as well) and through this possibly afterwards be stored in the database 4.
In particular, the various embodiments of the method according to the invention may be implemented by apparatuses having indifferently either centralised or distributed architecture.
By way of example, the database 4 may be locally centralised on a computer of the operator 2, or it may be remote and accessible by the operator 2 through a communication network (for example, the Internet network). Furthermore, the database 4 may be distributed among more re- mote stations which the computer of the operator 2 accesses through one or more communication networks.
Similarly, the architecture of the numerical processor 1 may be also either centralised or distributed, and it may be locally located on a computer of the operator 2, or it may be remote and accessible by the com- puter of the operator 2 through a communication network (for example, the Internet network). In the latter case, the method according to the invention may be implemented by a software architecture of client/server type, wherein a client software is executed on a computer (or more computers, in furthermore distributed architectures) of the operator 2 and a server software is executed on a remote computer (or more computers, in furthermore distributed architectures).
The method according to the invention assumes that, in the intracellular compartments, biomolecules interact in a so dense way to establish a network, on the global behaviour of which events, occurring owing to stimulation by environmental factors, depend. However, models do not concern the structural features of biomolecules (mainly proteins): bio- molecule is always represented as a punctiform object, devoid of its
structure.
Structural changes are simply interpreted as changes of kinetics with which reactions catalyzed by proper enzymes occur. As an example, an enzyme which is activated through phosphorylation modifies its own con- figuration and consequently the kinetics with which it catalyzes a downstream reaction. For the mathematical formulation used by the method, it is sufficient that a one-to-one correspondence exists between configuration and kinetics in order to neglect structural features and to model only kinetics. Large scale mathematical models used by the method represent an innovation since biomathematical modelling has concerned a large variety of models describable with few variables so far.
The method of simulation according to the invention is based on the modular construction, by the operator 2, of protein network interacting in the same way in which the so-called neural networks are built. In particular, the operator may build the network according to two different modes, possibly in combination: an elementary mode, i.e. using the "atomic" elements of the network, that is proteins and experimental data of the database 4; a modular mode, i.e. using the predefined modules of cascades of transduction of the intracellular signal, stored in the database 4, exploiting the fact that certain transduction paths are of universal nature, present in all the cells or even in all the organisms. Also, the method offers to the operator 2 the possibility to store networks constructed during use of the simulator in order to then re-use them in successive simulations. As said, the operating principle of the virtual cell which can be made by means of the method according to the invention is the numerical integration of ordinary differential and/or partial derivative equations. The integration result is the time curve of the activity of the network proteins, that may be compared with possible corresponding graphs obtained by experimental way. Effectively, the operator 2 does not see on the graphic interface 3 the progress of the chemical reactions as images, but he/she follows its time curves of chemical concentrations. By interpreting such curve, the operator 2 is able to immediately evaluate, for example, that a pathological condition of the cell is occurring. By way of example, in cer- tain tumour conditions it is possible to determine the constitutive activity of certain elements of the network (said oncoproteins): by comparing the time curves resulting from the simulation with a known condition of the
cell, it is possible to make forecast about the "normal" or "pathological" status of the same cell.
Simulations carried out through the method of simulation according to the invention have shown that this gives results in accordance with ex- perimental data.
As said, the intracellular networks are formalized by the numerical processor 1 as neural networks, i.e. artificial non linear systems capable to learn, to store and categorise, and to generalise, and in which connection among the various units is a decisive factor for the behaviour of the system. The neural network allows modelling the very complex network of interactions such as those made by biological entities (individuals, cells or biomolecules).
The concept of network, which a priori provides the interaction of all the proteins of the compartment among them, permits modular construc- tion of all functional interactions, so as to reproduce the paths of the signals coming from any type of receptor.
The theory of neural networks, well known by the skilled in the art, comes from the dense network of interconnections among brain neurons. Each neuron is a "processing unit" receiving various input data from other units to which it is connected and gives a processed output datum to the downstream units. Each neuron processes the input set received from the upstream units with a "transfer function". An example of very simple network is reported in Figure 2: in this case there are four input units 7 (a sort of layer collecting information from the outside), forming the so-called "input layer"; three intermediate units 8, forming the "intermediate layer" receiving data from the input units 7, processing them on the basis of well defined rules, and giving output data to the output layer, composed in this case by six output units 9, giving the processing final results. In the case of Figure 2 the layer are three, but, in general, the number of layers of a neural network, as well as the number of nodes of each layer, may be any, depending on the complexity of the biological network to be simulated.
The network formed by biomolecules represents an interesting generalisation of the neural networks: in fact auto-interactions of the units (for example auto-phosphorylation of receptors) or signal back-propagation are provided.
The fundamental element of the network is represented by what is
called "weights", i.e. the intensity with which the units interact and with which an input is distributed toward the successive processing layer. The value of these weights determines the final response of the network, and hence the output value. In a neural network the initial values of the pa- rameters, i.e. the weights, are randomly assigned. At this point it is provided that the network "learns", i.e. the weights are modified so that the output is closer and closer to the real one. It is object of the learning to faithfully reproduce matching between simulated input and output with real input and output. The modification of parameters follows a well precise rule, that tends to minimise (i.e., to optimise) what is called cost function H, representing from a physical point of view the Hamiltonian of the system. One of the methods preferably used for optimising the cost function is the so-called simulated annealing, well known by the skilled in the art. Other embodiments of the method of simulation according to the invention use other known learning methods as an alternative or in combination with the simulated annealing. By way of example, the latter may be initially used for a first general learning of the whole network, while one or more other methods may be after employed for further detailed learning of specific portions of the neural network. Therefore, the protein network is considered as a neural network.
There exist an input layer (represented by the membrane receptors; in this case the input is the extracellular message reaching the set of external receptors of a given cell), an output layer (represented by the set of molecules entering and acting within the cellular nucleus; the output is the code that is read by the nuclear system so as to create an activity pattern of the genome of the single cell) and one or more intermediate layers ("hidden", formed by the various cytoplasmic molecular species). The kinetic constants are randomly assigned, even if on the basis of reasonableness criteria, along with initial values of the molecular concentrations. It is evident that network learning consists in making kinetic constants of this "molecular black box" vary according to the simulated annealing method (and/or to other learning methods), so as to obtain a reasonable matching between simulated output and experimental output, under the same input condition. It is also evident that the experimental output is the asymptotic value of the concentration of a given molecular species. This also implies that the simulated network must reach a steady and unique asymptotic state. In case of very large number of variables, such as in the
case of complex simulations carried out by the method according to the invention, preliminary simulations are also necessary for determining the existence of the asymptotic state and its steadiness.
Reaching of the global optimum of the molecular network means that it has been obtained the set of kinetic constant values that approximates the real values of such constants best. The basic concept is that by enlarging the network with new elements, these constants either must be not much changed or they do not change at all. With these assumptions, the network is capable to generalise, i.e. to face new situations giving results adhering to reality.
In intracellular protein networks, it is observed the auto-interaction of the active proteins (for example, the active Ras tends to auto-deactivate itself, or the activated receptors tend to dimerise and to make an auto- phosphorylase). The possibility of these auto-interactions, i.e. of these feedbacks, is provided in the method of simulation according to the invention. New experimental data on the kinetics of some phosphatases (that, as it is known, tend to "turn the activation signals off') may give useful indicators for introducing into the network the kinetics of these important elements acting with a negative feedback for making the network activa- tion in response to an input transient. It is inevitable that many of these phosphatases reveal themselves as proto-oncoproteins. Using the same previously disclosed principles of network learning to recognise a pathological state, it is possible to carry out simulations about the cellular behaviour in diseases related to mutations of kinases or phosphatases. In other words, with the simulated annealing method (and/or similar learning methods) the network is "constrained" to give the desired output for a specific input. The learning, achieved along many signal transduction paths, allows generalising correct operation of the virtual cell to "new" situations, i.e. to cases in which the operator 2 add new element (for ex- ample, interaction among just discovered proteins or new interactions among already known proteins) to the network.
During learning, equations describing the enzymatic kinetics with related parameters (i.e., kinetic constants) are arranged, numerical simulations are carried out and parameters are automatically modified so that theoretical data matches experimental data (even quantitatively). Experimental data which are used in this stage are the time curves of concentrations of the involved molecular species.
ln particular, a cellular type having the characteristics to be well studied and significant may be used for "training" the neural network of the method of simulation according to the invention. By way of illustration and not by way of limitation, the following cellular types have been used for training the neural network so far:
- PC12 paradox in rat PC12 cells;
- transduction of the signal related to PDGF receptors in rat fibroblasts;
- transduction of the signal related to integrins in human platelets, effects of the constitutive activation of some integrins (for example, alfallb-beta3);
- role of phosphatases in mouse pancreas cells (for example, PTP-1 B);
- perturbation of transduction paths related to NF-kB factor and to EGF receptor by viruses;
- MAPK signalling path in yeast cells (specificity of kinases for invasive growth path and response to pheromones);
- signal transduction by insulin receptor; and
- intracellular signal transduction in cardiac cells and anagenesis.
It is also possible to simulate more complex events, such as for example encoding of information as frequency (oscillations of intracellular calcium), capacity of presenting a multistability of cellular behaviour (a sort of cell "memory" of its past history, important for simulating biochemical events occurring in cells which process information as neurons). From the modelling point of view, the most interesting phenomenon is surely that of signal transduction for starting the cellular cycle. From the bio- medical point of view, cellular cycle deregulation is related to tumour proliferation and to cancer onset. But biochemical system of the cycle is related to other systems, such as apoptosis (programmed cellular death), repair of DNA damages and DNA replication/recombination. From the point of view of modelling and implementation on graphic interfaces, the cellular cycle is the most interesting phenomenon because mathematical models extended to few proteins (for example activation of kinases which make cell enter mitosis phase) have been successfully simulated and signal transduction has a strongly modular structure.
Using the same previously disclosed principles of network learning to recognise a pathological state, it is possible to carry out simulations about cellular behaviour in diseases related to mutations of kinases or phosphatases.
This type of generalisation is a great inventive contribution of the method of simulation according to the invention. In fact, the operator 2 may build a "customised network" on the basis of data deduced from scientific literature and he/she controls its operation. By inputting "new ele- ments", the operator 2 has the possibility to check the normal or pathological behaviour of his/her network. To make an example: in his/her simulations, the operator 2 discovers that by modifying kinetics of a given element a tumour behaviour is obtained; that specific element of the network automatically becomes the target of pharmacology research for that form of tumour. Or also: in a tumour condition, the researcher discover that by modifying element kinetics such condition is left for entering a stationary phase (cellular cycle stop). This element also becomes a pharmacology target: it is studied an active principle (for example an enzyme) stimulating this element activity. It is evident that on the basis of the results from the simulations it is possible to establish that soma elements are irrelevant for the network operation, that is they are redundant. These elements may be eliminated from the pharmacology research. Effectively, it is very frequent, when not inevitable, formation of a hierarchy of the elements of a biological net- work: some of them have a drastic effect on the network behaviour, others are irrelevant. This type of properties, named "emerging properties", are a natural consequence of the non-linearity of the underlying mathematical model.
Some proteins being downstream with respect to other proteins may by-pass the upstream signalling becoming constitutively active, i.e. sending signals independently from (either activating or inhibiting) interactions by the proteins with which they interact. Some tumour forms develop in this way: constitutive activation of an element (for example Ras) makes a continuous proliferation signal, independent from the regulation that should occur upstream, arrive at the nucleus.
The method of simulation according to the invention also allows, in a simple way, "eliminating" from the network one or more elements (besides adding them) or making them constitutively active (a physiologically important condition, for example, for the Ras proto-oncogene), by simply annulling kinetics of a given element or making it constant.
The numerical processor 1 operates on the basis of a mathematical modelling of the enzymatic chains within cells leading to the solution of
systems of ordinary differential equations (ODE) and/or partial derivative equations (PDE). Each element used in the chains represents a new differential equation the terms of which are represented by the bonds existing among this element and the other chain elements. Each element con- tributes to formation or degradation of other elements or products which in turn operates as sub-layer for other products. The speed with which exchanges occur is proportional, through a kinetic constant, to the concentration of the element operating as sub-layer.
The numerical processor 1 executes numerical integrations neces- sary to the solution of systems with a large number of differential equations, not treatable according to an analytical treatment.
The numerical processor 1 is further capable to execute numerical integration of systems of partial derivative differential equations, also representing transport phenomena of molecules and other specific phenom- ena of biomolecule behaviour. The cell is wholly treated as a compartmentalised space. Moreover, in formalising processes, mathematical modelling takes account of noise accompanying protein activity and genes transcription, whereby it also comprises stochastic terms.
In order to obtain the necessary huge computation power, the method of simulation according to the invention is preferably executed by means of computers with parallel architecture, or by means of supercom- putation technologies.
The numerical processor 1 ensures the existence of asymptotic stationary states, which describe the state on long time of the network; from the biological point of view, such asymptotic states describe the cell activity as a response to an external input.
In the numerical integration of the differential equations describing the network, the method according to the invention allows the operator 2 to select both time step and duration of the integration. Going more in detail through the mathematical formulations used by the numerical processor 1 , chemical species which are handled are often proteins which may be in two states, an active state (wherein they have an enzymatic activity) and a non active state (wherein the enzymes may only operate as sub-layers for another active enzyme). Therefore, for each protein, it is useful to introduce the active fraction variable x, defined as the ratio between the number of active molecules of a given species and the total number of molecules of the same species. Indicating with Mi the
i-th molecular species and with [Mj] its molar concentration, the concentration of the active form will be equal to Xι[Mi], while, in the case when variable x is normalised to 1 , the concentration of the inactive form will be equal to (1 - Xi)[Mj]. This choice of variables is suitable to describe the enzymatic reactions which are simulated, which are activation (most of all owing to phosphorylation) or deactivation (most of all owing to dephos- phorylation) reactions of the sub-layer. Hence, in case of an activation reaction, the j-th enzyme (that is a protein in its active state) with concentration Xj[Mj] acts on the i-th sub-layer, that is in turn an enzyme in its in- active state with concentration (1 - Xi)[Mj]. By using a kinetics based on mass action, the speed of such reaction will be proportional to the two concentrations, through a constant that is the reaction kinetic constant. These equations are involved in the definition of the transfer functions of the neural network nodes. The numerical processor 1 also takes account of the fact that the enzyme is expressed by genes and it is degraded after a certain time. In other words, the numerical processor 1 takes account of the dynamics of enzymes, i.e. of the processes of creation and destruction of enzymes. By way of example, in case of simulation of the lac operon these processes are explicitly inserted into the equations.
The same equations which have been justified with the mass action law may be obtained starting from the particular case of the Michaelis- Menten (MM) kinetics in conditions of very low concentration [S] of the sub-layer, that is in the situation in which [S] « Km, where Km is Mi- chaelis constant. In these conditions, it is:
^ = k2 -[Et]-[ l
where k2 is the catalysis constant and [Et] is the enzyme total concentration, assuming that all the enzymes are active (i.e., there is saturation) and that [Et] is constant. In the kinetics of the network of proteins such concentration is not constant, but in this case MM general equations are still valid; it is however necessary to take account of the fact that also [Et] is a kinetics variable. Finally, by taking account of the fact that in the case of the network of enzymes the "product" is nothing more than the activated "sub-layer", the last equation may be redrafted as:
[ !.]- ^ = ^, - [ !]. [ .] - x . . (l- I.)
where y is the catalysis constant between the j-th active enzyme and its i-th inactive sub-layer.
These considerations are important, because assuming the kinetics of activation/deactivation of the enzymes as a Michaelis-Menten kinetics, it is possible to link the constants appearing in the general equations to kinetic constants which may be experimentally measured. Then by simplifying the concentration [Mj] of the active enzyme, the last equation may be redrafted as
10 dx{
— - = a ■ ■ x ■ • (1 — x ) dt υ J , } where ay = ky[Mj] (the constant has been redrafted in matrix form, ai} , because the network of enzymes is formalised by a system of differential equations in which various enzymes acting on different sub-layers will appear; it is then necessary to specify, for each pair of activator/sub-layer, which is the proper kinetic constant: the first index refers to the activator number, the second index refers to the sub-layer number).
However, it is to be noted that in the simulator all these subtleties are lost. Through the graphic interface 3, the operator 2 will see only "blocks" containing the enzyme name, in the case when the block represents the active or inactive "concentration" of the enzyme, or the kinetics characteristics in the case when the block is that representing the simulation of the chemical reaction. A system of similar equations is also valid in the case of deactivation of active enzymes (it is the case of phosphatases which generally carry their sub-layer back to an inactive state):
[ I.]-^^= ^ -[ !.].[ .]. ,. .χ.
where hy is the catalysis constant between the j-th active enzyme and its i-th active sub-layer. The latter formula may be expressed as follows:
By introducing the constants by =[M}]-hυ , similarly to what done for the a constants, the following equations are obtained for the inactivation reactions: dxi y m=-h*-x' x' At this point, the numerical processor 1 constructs the system of differential equations that takes account of the fact that all the enzymes interact among them, with activation and deactivation kinetics. By extending the mass action law to all these possible reactions, equations which are integrated by the numerical processor 1 are reached: dx N ~T = ∑(aυ - X J ~ Cv ' *, ■ *, ) . With Cy = ay + by. at j—
As said before, the numerical processor 1 , on demand from the operator 2 through the graphic interface 3, adds "birth" terms (related to the expression of the gene of the single enzyme) and "death" terms (related to the degradation of the same enzyme) to these equations. Finally, from numerical integration of the systems of the obtained differential equations, the numerical processor 1 shows on the graphic interface 3 the curves of the variations of the involved elements.
The numerical processor 1 is further capable to integrate equations with time delay. This requirement is necessary to better represent the biological transport effects: a given molecule (for example a kinase) is normally activated in a compartment of the cell, it moves around the environment and it acts in a position different from the initial one. The delay mentioned here is nothing more than the time necessary to carry out this position change. To this end, the numerical processor 1 preferably keeps in memory the concentration values of some of the variables calculated in preceding time.
The graphic interface 3 preferably comprises a "button" selectable by the operator 2 that automatically introduces a time delay on the desired variable. In the case when a given molecule always behaves in the same way in any cellular type (same time delay), this datum is directly stored in the database 4, and the numerical processor 1 will automatically introduce this information each time that the network is constructed by inserting such an element.
Preferably, the numerical processor 1 is capable to integrate differential equations with steadiness control based on multistep type numerical integration.
As it has been said above, the method according to the invention allows intracellular environment compartmentalisation or biomolecule diffusion to be taken into consideration, so taking account of cell spatial component.
As said before, proteins are considered as fixed material points in space, as if a single protein is activated in a certain point in space and it acts there on its sub-layers. By considering intracellular environment compartmentalisation, the method may take account of the fact that some enzymes act on their own sub-layer only in certain cellular compartments (for example, transcription factors interact and activate transcription only in the cellular nucleus). In other words, compartments within the cell are space regions within which the protein behaves in a way different from how it behaves outside of them. This ensures a correct simulation of some phenomena drastically depending on the spatial component: for example, the role of calcium ion within a cell and oscillatory phenomena related to its intracellular concen- tration may be understood by only introducing space transport and diffusion effects. In technical terms, differential equations describing waves and oscillations of intracellular calcium are not any more ordinary, but they are partial derivative ones: the mathematical model of the numerical processor 1 describing a so complex phenomenon is formed by partial derivative differential equations.
By way of example, Figure 3 schematically represents the complex process of the dynamics of the calcium ion within the cell, that may cross channels present on both cellular membrane and membrane of internal stores, it may diffuse through the cytoplasm and interact with specific molecules of each one of these compartments (for example, the cytoplas- mic calmodulin). Thanks to the modelling of the cellular compartments, the method according to the invention is capable to simulate such dynamics of the calcium ion.
Compartmentalisation allows dealing the fact that the cell interior is not an homogeneous environment wherein molecules remotely act so as to activate/deactivate each other. Some molecules, once activated, tend to move in the cytoplasmic gel, through either an active transport or sim-
ple passive transport. Then, some tend to move among the cellular compartments, operationally (and sometimes also physically) separated from the rest of the cellular space and within which chemical reactions are different from those occurring elsewhere. Examples of such compartments are the plasmic membrane (the cell external membrane), to which numerous molecules are anchored by chains of fats and on which these molecules interact in a specific way and move by diffusion as within a liquid; mitochondrions, in which oxidation reactions of the "cellular breathing" occur; the endoplasmic reticulum, in which just synthesized proteins are modified so as to be capable to be put in the cellular circle; and most of all the cellular nucleus, in which DNA is contained, which is the keeper of the genetic information (the "genome") that is used for the proper cell operation. An example of protein moving among compartments is the already cited MAPK, a kinase that is activated within the cytoplasm but the action of which occur within the nucleus, after that both diffusion and active transport through the membrane covering the cellular nucleus are occurred. Another important molecule within the cell, subject to compartmentalisation, is cyclic adenosine monophosphate cAMP.
Similarly, within the nucleus, MAPK interacts with a specific activa- tion/deactivation network of this compartment. Also kinase MEK/MAPKK enters the nucleus by diffusion and here it continues to interact with its sub-layer ERK/MAPK. This means that the system of equations describing the interactions limited to the pair ERK-MEK will contain, in principle, four variables instead of two. In fact, when considering the complete system, the variables would be eight (active MEK, inactive MEK, active ERK, inactive ERK, in the cytoplasm; active MEK, inactive MEK, active ERK, inactive ERK in the nucleus), but the fact that the concentration of enzymes may be considered constant binds the variables in pairs (for example active ERK + inactive ERK = constant), so that independent variables are only four.
In other words, in the two compartments similar sets of equations are valid but they concern variables having different biological meaning. In fact interactions between ERK and MEK are identical in the two compartments, even if the active transport of the dimer ERK-MEK from the nu- cleus to the cytoplasm must be formalised with a new variable [ERK-MEK] intervening in the dynamics of the nuclear assembly of these two molecules.
Partial derivative differential equations, with which the numerical processor 1 describes the dynamics within the compartments, may be provided with possible terms of diffusion formalised through the Fick law with a term as a2[cv+] D°r -aT-
where constant DCfl2+ is named diffusion coefficient of the ion Cα2+ . Diffusion coefficients, for diffusing molecular species, differ according to the compartment in which the molecule is. Form the point of view of the graphic construction on the graphic interface 3, each time that the opera- tor 2 inputs (by operating, for example, on an element of the graphic interface 3, such as a button or a pull-down menu) a diffusing molecular species, the diffusion term is automatically introduced in the equations of the corresponding module.
Also, the compartment-like structure is constructed by subdividing the whole cellular space (roughly considered as a sphere) in sub-spaces at the border of which border constraints of the equations describing the dynamics of the molecules within them are imposed. Such conditions must be joined with those of the environment or compartment with which a given compartment borders. On the graphic interface 3, it is preferably present a selectable "button" corresponding to the definition of a compartment: by using it, the operator 2 will have to specify the characteristics of the compartment that he/she defines, such as the type of cell and the type of compartment, starting from physical-chemical data of the corresponding cell (such as cell size, compartment-like structure, compartment size and position).
In other words, the existence of the compartments lead to a proliferation of the equations of those molecules which move from a compartment to another or which exist in different compartments. By way of illustration, when the operator 2 inputs the calcium molecule in his/her model, he/she effectively introduces a plurality of equations of the dynamics of calcium, one per compartment (with the proper border constraints and spatial conditions of validity of such equations, imposed once that cell geometry has been defined).
As previously said, the database 4 contains the values of all the ki- netic constants necessary to conduct simulations, classified by species, by cellular type and by physical-chemical conditions. The database 4 is
intelligent in the sense that, in simulations, it is possible to assume interactions among proteins which have not been experimentally discovered the data of which are consequently not stored in the database 4. With an extrapolation operation, the numerical processor 1 is capable to find out the values of the interaction constants allowing the neural network to better reproduce the real behaviour of an overall cell, through the introduction of an "energy" function of the neural network that is preferably optimised through the known Monte Carlo methods. These "theoretical" values of the constants related to the assumed interactions are automatically inserted in the database 1.
The database 1 may be also updated with new experimental data. In the preferred embodiment of the method according to the invention, the sequence of operations that the operator 2 will have to perform on the graphic interface 3 for accessing data concerning a single protein (enzyme) (bearing that not all the enzymes are proteins in mind, as for example ribozymes made of RNA) are:
- searching the type (with, type it is meant the type of network to be simulated: either proteic or metabolic);
- searching the species (each living species has enzymes differing from those of other species as to activity);
- searching the cellular type (within a pluricellular organism, the various cellular types may have different isoforms of the same enzyme, with different activity);
- searching the pathway (there are signal transduction paths which are kingdom-specific and species-specific, i.e. which are not present in all the living organisms: for example, photosynthesis path is present only in vegetable cells, or urea/glyoxylate path that ha different complexity in amphibians, birds and mammals);
- searching the protein/enzyme (the search may occur through EC num- ber international nomenclature, or according to the protein/enzyme name, and inputting the protein name in a corresponding field it is preferably obtained a pull-down menu containing the list of all the sublayers of this protein/enzyme);
- searching the sub-layer. At the end of these operations, kinetic constant values of the enzyme operating on this sub-layer are univocally determined.
Since two types of processes (metabolic and proteic) are provided,
some of them, as for example the lac operon activation (which is a mix of variables having different significance), have to make use of the two data types: the simulator has to retrieve the necessary data from the database 4, in particular it has to fetch the kinetic constants Vmax and Km of MM equations, the initial concentrations of the sub-layers O, and the biochemical characteristics of involved molecular species, such as DNA and RNA (transcription and transduction). Alternatively, the operator 2 may directly input at least some of these values through the graphic interface 3. Preferably, the database 4 comprises a metabolic database and a protein database.
The graphic interface 3 may further display all the available descriptions (of experimental nature) for each element, also stored in the database 4, as for example the conditions in which the experiment has been conducted to which experimental data correspond and, in general, all the information on the enzyme not strictly concerning the simulation. Alternatively, the graphic interface 3 may display an hyperlink directed to a web page including a description of the corresponding element.
For each one of the parameters different values may be present which are obtained by varying temperature and pH conditions, all stored in the database 4. The graphic interface 3 allows the operator 2 to choose which set of values will be used by the numerical processor 1.
The graphic interface 3 preferably comprises fields, placed in correspondence with each experimental datum, selectable by the operator 2 for confirming the use of such datum. When the use is confirmed, the graphic interface 3 transfers the value to the numerical processor for its insertion in the system of differential equations.
Since the method according to the invention allows simulating dynamics of sub-layers, dynamics of the active fraction of enzymes and genic regulation/RNA transcription/protein encoding/kinetics of sub-layers (for the CP, it is the lac operon), it also analyses the kinetics of the transcription/regulation of DNA. Data related to this kinetics, for example mRNA concentration, are also stored in the database 4.
Preferably, the graphic interface 3, through which the operator 2 graphically constructs the protein network, comprises dynamically variable windows containing information on the single proteins (obtained from database 4) and/or reporting the time curve of the single activity (resulting
from the numerical integrations). In this way, the operator 2 graphically constructs the system of differential equations provided with the parameter values by simply using the mouse of his/her computer. In fact, this interface makes possible transduction, automatic and transparent to the operator 2, of the graphic scheme constructed by him/her into a system of differential equations which are then integrated by the numerical processor 2.
In other words, the graphic interface 3 makes possible determination of the equations to be used and selection of the elements participating in them by means of the simple selection of a button, linking the various terms by means of the use, for example, of arrows which, depending on the directions which they indicate, make an element operating as sublayer or as reaction product. In this way all the mathematical formulation that is at the base of the behaviour of cellular chains is made transparent to the operator 2 who, without any mathematical complication, is capable to start the simulation of the processes which are of interest to him/her, giving to the method the task of automatically determining the right mathematical relations present among the elements, of linking the various elements and hence of solving the system of so obtained differential equations.
In particular, the graphic interface 3 preferably comprises selectable buttons making possible definition of:
- compartmentalisation of a molecule;
- spatial diffusion; - time delay;
- stochastic noise.
Selection of elements preferably occur through a simple menu and/or a "drag and drop" action. The graphic interface 3 further permits the possible indication of the substance name, its initial concentration, entry of a note near the element (the operator 2, for example, could need to note some characteristics of a given enzyme, reactions to which it participates, and the EC code), and selection of specific background colours for the elements. At least some of these data may possibly automatically appear, if stored in the database 4. Similarly, for inserting a reaction, for example of the mass action type, the graphic interface 3 makes possible an operation of the type "drag and drop" from a work board to the active window of the "reaction"
element.
For some types of reaction in which a certain substance is given in a continuous way, it is possible to bufferize the concentration by clicking a button related to the current concentration. Also for the reactions it will be possible to write a short note in text format containing essential information on the enzyme properties and an essential bibliography from which new data may be taken. These data may possibly automatically appear if present in the database 4.
For the reactions in which two or more sub-layers are necessary for obtaining a product, it is provided the possibility of activating an option of "higher order reaction", present as a button on the main board, for example by clicking on an "option" button that opens a pull-down menu in which a higher order reaction activation button is present.
In each cell, apart from details of some signalling paths, signalling or metabolic paths (as the glycolytic cycle) repeats and they vary from species to species only for few details. The concept of the intracellular network just consists in this "universal" structure of the intracellular environment: with different connectivity, all the cells use the same "modules" (cascades) for making as specific the information reaching the nucleus and propagating in the cytoplasm and for determining the nature of its own proteome. The preferred embodiment of the method according to the invention manages a modular structure of the signal transduction paths.
It is hence provided the existence of modules containing cascades of "block-like" reactions, one per cellular type. The kinetic constants neces- sary for making these reactions specific are store in the from which they are automatically fetched when the operator 2 specifies the module corresponding to the cellular type that he/she wants to select for the specific simulation he/she is constructing. Within the database 4 information of each "block" (all the reactions participating to a given cascade) are pres- ent. Each cellular type containing such cascade has specific connections with other active cascades (in term of both kinetic constants and true connectivity), whereby each predefined module is conceived as having various "inputs" and "outputs" that the operator 2 suitably connects with the inputs and outputs of possible other modules. What is obtained is a "layer-like" structure: the operator 2 is not constrained to construct his/her cellular model on the same work windows, but he/she will construct a structure in which each layer contains a cascade connected with the other
cascades in the "second dimension".
Hence, the set of single molecules does not appear any more on the graphic interface 3, but the connected network of cascades of signal transduction and metabolic cycles do, and the operator 2 may choose to display only one of these cascades (a map of cascades to which the current cascade is connected will appear on the work window).
Each predefined module is a series of chemical reactions in cascade having roughly linear structure. For example, Figure 4 shows the module of the cascade of kinases of the proteins activated of mitogenes, or MAP. These modules are kept by the most simple species (for example yeasts and bacteria) up to the most complex species (as man), whereby a series of universal modules, adapted to the single species, are inserted in the database 4.
On the graphic interface 3, it is preferably present a module selec- tion icon, by activating which a stored modules menu opens. Since the cascades of reactions are species or cellular type specific, the choice of the specific module must follow a series of selection operations similar to that with which it is possible to access the single molecules stored in the database 4. The choice of the specific module leads to the construction, within the numerical processor 1 , of a system of differential equations. In the case of the module of Figure 4, formed by the proteins MKKK, MKK and MAPK, the three concentrations (actually active fractions) [MKKK], [MKK] and [MAPK] and the three equations of the chemical kinetics are automatically introduced, the equations having the following logic: the concentration first derivative is proportional to the product of the concentrations of the upstream active protein and of the downstream inactive protein. Indicating with [A*] the generic active concentration and with [A] the inactive concentration of a given chemical species, and with kt the single kinetic constants (parameters) of the chemical reactions, it is:
A i - upstream kinetics dt ~
d[MKK*] = k rMKKK* [MKK] dt
d\MAPK = Λ tA4 :*][AΛ4 K:] dt
where "upstream kinetics" is the activation kinetics of the module first element: the module requires that the operator 2 specifies which is the protein that activates the first element of the cascade. It is foreseeable that the last element of the cascade, i.e. of the module, is connected to some module located downstream. In the record of the module present in the database 4 the existence of possible downstream modules is specified. For example, downstream the MAPK cascade there is a network of strictly compartmentalised molecular interactions, namely the interactions among the molecular factors providing the transcription of the genes. It is also possible that a transduction path is not completely linear, but it has one or more "forks".
Figure 5 shows an example of interaction among modules related to the simulation of the role of the MAPK path in cellular senescence and proliferation. One of the key molecules in the cancerous transformation of cells is the small Ras molecule, the mutations of which may lead to the constitutive activation (i.e. not controllable through biochemical signals) of the MAPK proliferation path. HaRasV12 mutation modifies the behaviour of two signal transduction paths in mammalian cells: p38 kinase path, leading to cellular senescence, and JNK kinase proliferation path. Mutated HaRasV12 molecule acts on these two paths through MAPK cascade. In Figure 5 the three modules MKK3/6-p38-p53/p16, JNK-AP1, Raf-MEK-MAPK are well recognisable. In particular, the first two modules are confined to the nuclear compartment, while the third one is cytoplas- mic; ERK kinase act as a bridge between the two compartments, being activated in the cytoplasm and becoming an effector only once it is moved in the nucleus). In the following reasoning it is assumed that the cell does not include compartments, even if the formalisation of the latter situation is fairly direct.
This simulation is graphically constructed by the operator 2 on the graphic interface 3 by collecting the aforesaid three modules. In each module, graphic "punctiform connectors" are programmed, so that it is possible to specify that not only the MAPK module is connected to the p38 one, but also that it is the ERK element that is connected to MKK3/6 kinase; the same is valid for the connection of the ERK with the AP-1 ele- ment of JNK module.
From the point of view of the construction of the mathematical model, in the database 4 the three independent modules are stored. The first
module related to the p38 path gives the following equations: d[MKK3/6*] r . .
— dt - = u rpstream - kinetics ^38*] kλ[MKK?>l6*][p3$\ dt
<*l 53*] = *2| 38*]| 53] dt
The second module related to the MAPK path gives the following
equations: d[Raf*]
= upstream _kinetics dt
d[MEK*] = k^Rqf*-^MEK] dt
dtERK*] = kA[MEK*][ERK] dt
The third module related to the JNK path gives the following equations:
— = upstream kinetics dt ~
d^AP ~ 1^ = h [JNK*][AP - 1] dt
Once graphically connected the three modules as in Figure 5, the numerical processor 1 constructs the following system of equations with HaRasV12 that is active in constitutive way (d[HaRasV12*]/dt = 0, with concentration value fixed on the basis of the experimental data):
&£1 = k
w[HaRasVl2*][Raf] dt
d ERK*]
= k^
MEK*
[ERK dt
d[λ KK3/6*]
= kϊl[ERK*][MKK3/6\ dt
d[P3S*]
= kl[MKK3!6*][p38] dt
d[p53* = k2[p3S*][p53] dt
d[JNK*] ku[HaRas*][JNK] dt
d[AP -l*] k6 [JNK*][AP - 1] + *13 [ERK*][AP - 1] dt
where the terms of interaction among modules with the additional kinetic constants &10, kn ι kn and k have been automatically introduced. The graphic interface 3 requires the introduction of the numerical values of the new constants, or the creation of new records (related to these interactions) within the database 4, so that it is possible to re-use them for future simulations.
The method of simulation according to the invention allows taking account of the "promiscuity" of some molecules. By way of illustration, cal- cium ion Cα2+ participates to various signalling paths, besides passing through various membranes (also the plasmic one dividing the cell from the surrounding environment) by means of specific channels permeable to calcium only. When the connected module network is constructed on the graphic interface 3, the method takes account of this promiscuity. Another classical example is represented by ADP and ATP molecules, which are essential for the biochemical reactions requiring energy exchange (see for example the cascade of the glycolytic cycle, in which ADP/ATP molecules participate to a plurality of reactions in various stages of the cascade).
These co-factors are preferably represented on the graphic interface 3 with separated icons which are connected to all the modules (also the ones representing different compartments) constituting the network of a given cellular type.
The inventors have carried out some test simulations with the
method of simulation according to the invention.
A first example of simulation has concerned the simulation of metabolism of sugars by a bacterium, of which suitable experimental data to be used as input data for the simulations are available. As known, they are essentially chemical reactions in which one or more enzymes catalyze the transformation of the intracellular sugars, the classical modelling of which comprises the use of the Michaelis-Menten equations. The parameters appearing in these equations are the experimental values of the kinetic constants (affinity and speed of reaction). In principle, knowing experimental values of these constants and knowing the mode according to which reactions occur, it would be possible to reproduce in silico the whole cell metabolism. With the method according to the invention, a linear chain of cascaded reactions has been constructed on the graphic interface 3, without branches. The results obtained with the method ac- cording to the invention are in accordance with the experimental data.
A second example of simulation has concerned the simulation of the regulation of bacteria operons, that represents a refinement and an extension of the preceding approach. In fact, the equations describing these processes are more complex and involve transcription and transduction of the information content of genes, integrating genomics data with proteomics ones. Constructing the various molecular species (proteins, sugars, RNA and DNA) of bacteria (which have no cellular nucleus) on the graphic interface 3, the simulation results are still in accordance with the experimental data in the case of the regulation of the well known lac op- eron.
A third example of performed simulation has concerned the intracellular signal transduction towards the cellular nucleus. This is completely different from the previous types of simulations since it presumes that the intracellular proteins are interconnected as a network of interactions. Each protein potentially interacts with all the other intracellular proteins, and these interactions lead to either the activation or the deactivation of an enzymatic activity. It has been chosen a well known phenomenon related to the PC12 tumour cells of rat. Without going into details of this phenomenon, the link of different extracellular growth factors with the re- lated membrane receptors leads to diverging ends of the same not differentiated cells, even if the two intracellular signals substantially use the same proteins for reaching the nucleus. This phenomenon has been cho-
sen because it shows that the specificity of the intracellular signal may be not due to a specificity of the components of the transduction system, but to quantitative differences in the activity of shared components. Also in this case, the performed simulations on the system of equations describ- ing the behaviour of the elementary network led by RAS protein in PC12 cells have given results in good accordance with the experimental data.
A fourth example of performed simulation has concerned a network of signal transduction having a receptor tyrosine kinase (RTK) upstream, i.e. anchored onto the plasmic membrane. Experimentally ascertained oncogenic products, i.e. those proteins the mutations of which lead to the proliferation and tumour behaviour of the cell, are numerous. Among them there is the Ras protein, on which some simulations have been carried out through the method according to the invention. The method according to the invention easily allows reproducing the constitutive activation of this protein and the tumour behaviour of the PC12 cell, that makes use of this signal transduction path. Also in this case, the simulation has been successfully performed, giving output data in accordance with the experimental ones.
A fifth example of performed simulation has concerned the metabolic schemes of the glycolysis. All the co-factors present in the cycle (ATP, NAD, FAD, GTP) have been ignored.
The choice of δ G° for obtaining reaction constants (δGσ = -2,303-R T IogKeq ; Keq = ka/kb) (even if it is preferable to use δG°, because δ G° is calculated in standard conditions which are usually not found in vivo while δ G° refers to real conditions).
Glycolysis is a linear metabolic path in which metabolites are both product of a reaction and sub-layer of the successive one. For this reason, analysing the metabolic path alone, disconnected from the cellular network, the metabolites should have a concentration time curve that may be described through a monotonic, either increasing or decreasing, function that corresponds to either accumulation or consumption of a certain metabolite in relation to the Keq values of the upstream and downstream reactions.
Figure 6 shows the chain constructed on the graphic interface 3 for the simulation of glycolysis. The initial conditions of the simulation have been set as follows: [Glucose] = 100 μM
The good results obtained by the method of simulation according to the invention are shown by the fact that the kinetic curves reported in Figures 7 and 8 reflects what can be expected by analysing the values of constants.
A sixth example of performed simulation has concerned the metabolic schemes of the Krebs cycle. Since the Krebs cycle is a cyclic and non linear metabolic path, there are some metabolites having a particular role, such as AcetilCoA, H20 and CO2. These compounds are not directly "along the line" on the cycle, i.e. they are not both product of a reaction and sub-layer of the successive one, but they are only either sub-layers or products and their kinetics should present curves only either decreasing or increasing.
The role of CoA is still more particular. Since within the Krebs cycle CoA is both produced (cytricsynthetase, succinilCoA synthetase) and consumed (ketoglutaric dh), it represents a sort of cycle in the cycle. The concentration time curve of this co-enzyme should be describable through an initially oscillating function that then stabilises on a value representing the cycle dynamic balance. Figure 9 shows the chain constructed on the graphic interface 3 for the simulation of the Krebs cycle. The initial conditions of the simulation have been set as follows: [AcetylCoA] = [oxalacetate] = 10 μM, [H20] = 20 μM, ka = kb = 1 for all the cycle reactions.
The values of the constants are the following:
The good results obtained by the method of simulation according to the invention are shown by the fact that the kinetic curves reported in Figures 10 and 11 fit well what is apriori foreseeable through a qualitative analysis on paper. The good accordance between what has been assumed for the key compounds and the qualitative curve obtainable from the simulation is particularly worth to be noted.
The results obtained from the simulation carried out through the method of simulation according to the invention are fully in accordance with what experimentally observed.
The advantages offered by the method according to the invention are evident, and the number of its possible applications is large.
First of all, considering that the deregulation of some proteins leads to tumour proliferation, and that these proteins have a counterpart in the simulated network in those elements the unregulated activity of which leads to an aberrant output towards the cellular nucleus, the use of the method according to the invention could lead to the discovery of potential targets of the pharmacology research (such as new oncogenes or simply proteins the dysregulated activity leads to the pathological state). In this way, the research would be directed towards a drug having well precise characteristics, with evident reduction in use of both human and material resources, with great advantage on time to clinical use of a product.
Still, thanks to the method according to the invention, the formulation of a complex modelling of the global behaviour of the cytoplasmic com- partment of a cell, nowadays still non-existent, makes available to pharmaceutical and chemical industries, research institutes such as universities and scientific laboratories, and also biotechnology oriented information companies, a conceptually new instrument for research and manu-
facture of new products in a simple, fast, inexpensive, and effective way.
By way of example, a pharmaceutical industry nowadays wishing to achieve a new drug, is constrained to face a research that could last many years and involve huge expenses, besides having uncertain results. Ex- penses may be quantified also in millions of Euros.
It is evident that a research supported by information means directing it towards specific molecular targets would quicken drug study and clinic test time and would make possible a drastic lowering of costs, or a better distribution of financing available to the research. In fact, consider- ing that the implementation of an experimental protocol requires staff training, purchase of specific (and sometimes very expensive) reactants and the use of laboratory animals, definition of a well precise target of the research has a positive effect on all these aspects.
From a more purely medical point of view, it is possible to shorten the time to market of effective drugs, aspect that has a great and also ethical value.
From the point of view of the true experimentation, it is possible to reduce the number of cavies to be used, of patients on which human experimentation stage is to be carried out; it has not to be neglected that the use of a virtual cell "living" in a computer could solve the problem of the experimentation on stem or germinal line cells, with an evident positive effect on transparency and ethical correctness of pharmaceutical research.
Not least, the use of a reliable instrument of simulation also makes possible a high saving for the National Public Health System, with a reduction of costs of the pharmaceutical products and of the related therapies.
The method of simulation according to the invention, that is apt to integrate the proteome with the genome, is a useful instrument for check- ing modifications in modifications of the cellular behaviour in presence of specific changes of its genome.
The construction of the networks may be carried out independently from the requirements of pharmaceutical nature and it may hence make possible the construction of theoretical models in the field of biology, it may be used in the teaching field (also at a university level) as a useful instrument for understanding the quantitative behaviour of the cell.
The preferred embodiments have been above described and some
modifications of this invention have been suggested, but it should be understood that those skilled in the art can make variations and changes, without so departing from the related scope of protection, as defined by the following claims.