CN110928757B

CN110928757B - Performance analysis method for locating key inefficient functions of HDFS (Hadoop Distributed File System) based on a Bayesian network

Info

Publication number: CN110928757B
Application number: CN201911163380.XA
Authority: CN
Inventors: 杨海龙; 刘一; 陈鹤; 李云春
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2019-11-25
Filing date: 2019-11-25
Publication date: 2021-03-23
Anticipated expiration: 2039-11-25
Also published as: CN110928757A

Abstract

The invention relates to a performance analysis method for locating key inefficient functions of HDFS (Hadoop Distributed File System) based on a Bayesian network. Widely used big data platforms such as Hadoop and Spark take HDFS as their default distributed file system. When the distributed file system supports upper-layer applications, inefficiency in certain functions can lower the execution efficiency of the whole big data application, and detecting these key inefficient functions helps big data application developers improve performance. The method statistically analyzes the function running time and I/O data volume obtained by system instrumentation, calculates the inefficiency probability of each function, builds a Bayesian network from these probabilities, and uses it to find the key inefficient functions in the HDFS source code that are worth optimizing.

Description

Performance analysis method for locating key inefficient functions of HDFS (Hadoop Distributed File System) based on a Bayesian network

Technical Field

The invention relates to a performance analysis method for locating key inefficient functions of HDFS (Hadoop Distributed File System) based on a Bayesian network. It is used for performance analysis, resource monitoring, performance bottleneck diagnosis and visualization of big data distributed file systems.

Background

Over the past decades, the rapid development of internet technology and the spread of terminals such as computers and mobile phones have brought the internet into countless households. The resulting exponential growth of data requires the support of big data and distributed computing. A big data storage system can be analyzed at four levels: raw devices, local file systems, distributed file systems, and big data applications. Among these, the distributed file system is a key link: distributed computing depends on it, so analyzing and optimizing its performance has significant research and application value. Today's widely used big data platforms, Hadoop and Spark, both use HDFS (Hadoop Distributed File System) as their default distributed file system, so optimizing the performance of HDFS is likewise of significant research and application value.

Most previous research treats HDFS as a black box. Such coarse-grained analysis cannot find the root cause of a performance problem, so the optimization schemes it proposes cannot fundamentally solve the performance problems of HDFS. Other studies locate performance bottlenecks through Trace analysis, but their analysis is neither deep nor systematic enough. Existing HDFS analysis methods fall roughly into the following categories:

(1) Benchmark testing

For performance analysis of distributed file systems, much existing work measures read and write time by running benchmarks or simple test programs. Some work evaluates HDFS performance with TestDFSIO, which ships with Hadoop. Some uses two typical benchmarks, TeraSort and TestDFSIO, to compare Hadoop performance on the HDFS and Lustre file systems. Some uses Performance Evaluation Process Algebra (PEPA), a formal language, to analyze HDFS performance. However, these approaches yield only indicators at the level of write-operation response time; the granularity is too coarse to guide optimization inside HDFS;

(2) Statistics-based analysis

These methods instrument the distributed file system to obtain detailed low-level information, calculate the time a task spends in each step, and analyze each function statistically to find the causes of inefficiency. Their drawback is that the search for inefficient functions is purely statistical; some merely compare times and pick the functions that take longest. The causes of inefficiency they report are therefore not comprehensive, and they give no indication of whether an improvement is feasible;

(3) Local optimization

Many studies on HDFS performance optimization address only a single aspect of its performance problems. For example, some optimize the redundant backup strategy of HDFS, some address its poor performance on small files, and some specifically optimize the name node. None of them optimizes HDFS performance globally, at the macro level.

In summary, the prior art is either too coarse-grained or not comprehensive enough, and cannot analyze and optimize HDFS performance from a global perspective.

Disclosure of Invention

Problem solved by the invention: to overcome the shortcomings of the prior art, the invention provides a performance analysis method for locating key inefficient functions of HDFS based on a Bayesian network. The method locates key inefficient functions at fine, function-level granularity and takes a global view.

Technical solution of the invention: in a performance analysis method for locating key inefficient functions of HDFS (Hadoop Distributed File System) based on a Bayesian network, function characteristic information is obtained through HTrace-based instrumentation, and the functions are then analyzed with a Bayesian network and statistical cause analysis to obtain the key inefficient functions of HDFS. The advantage is that the causes of inefficiency at the HDFS layer are identified at a finer granularity, which makes it easier for users to locate performance bottlenecks and improve the distributed file system. The method comprises the following steps (1) to (8):

step (1), performing function-level source code instrumentation on the HDFS;

during instrumentation, a probe, that is, code for collecting performance data, is inserted into the target functions in the HDFS source code (the functions are listed in Table 1), so that while an application runs, the entry and exit timestamps of each instrumented HDFS function and the amount of data the function reads or writes can be obtained. Code that calls the instrumentation function is inserted at the entry of each function to be instrumented. The function interface used for instrumentation is:

TraceScope newPathTraceScope(String description, String path);
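The probe idea can be illustrated with a small Python analogue. This is a sketch only, not HTrace's API: `trace_scope`, `RECORDS` and `read_block` are hypothetical names introduced here, and a real probe would hand its span to the tracing framework rather than to an in-memory list. It shows the three quantities the step collects: entry timestamp, exit timestamp, and data volume.

```python
import time
from contextlib import contextmanager

# Hypothetical in-memory sink for probe records (an assumption for this sketch).
RECORDS = []


@contextmanager
def trace_scope(description, path, clock=time.monotonic):
    """Record entry/exit timestamps and the bytes moved by the wrapped code."""
    record = {"description": description, "path": path,
              "entry": clock(), "exit": None, "bytes": 0}
    try:
        yield record  # the instrumented body adds to record["bytes"]
    finally:
        record["exit"] = clock()
        RECORDS.append(record)


def read_block(path, payload):
    """Stand-in for an instrumented read function (not an HDFS function)."""
    with trace_scope("read_block", path) as scope:
        scope["bytes"] += len(payload)
        return payload
```

Each call to `read_block` leaves one record in `RECORDS`, from which the execution time (`exit - entry`) and the data volume can be read.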

Table 1. Instrumentation target functions

step (2), sampling the data obtained by instrumentation;

when probes are inserted into code segments that execute briefly but very frequently, the instrumentation code may severely disturb program performance. Furthermore, if the program runs for a long time and the system is large, the data generated is sometimes too large to store and analyze. The method therefore samples with an improved HTrace. The sampling method is a token bucket algorithm, with the token bucket size set to 1000 and the time interval set to 180 ms;
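A token bucket sampler along these lines might look as follows. This is a minimal sketch, not the improved HTrace sampler: the class name `TokenBucketSampler` is hypothetical, and it assumes that the 180 ms interval means the bucket is refilled to its full size of 1000 once per interval, which the text does not spell out.

```python
import time


class TokenBucketSampler:
    """Keep a span only while tokens remain in the bucket."""

    def __init__(self, capacity=1000, interval_s=0.180, clock=time.monotonic):
        self.capacity = capacity
        self.interval_s = interval_s
        self.clock = clock          # injectable so the sampler can be tested
        self.tokens = capacity
        self.last_refill = clock()

    def _refill(self):
        # Assumption: the bucket is topped up to capacity once per interval.
        now = self.clock()
        if now - self.last_refill >= self.interval_s:
            self.tokens = self.capacity
            self.last_refill = now

    def should_sample(self):
        """Return True and consume a token if the current span is sampled."""
        self._refill()
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False
```

Under this reading, at most 1000 spans are recorded in any 180 ms window, which bounds both the probe overhead and the volume of trace data.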

step (3), calculating the inefficiency probability of each function;

statistical calculation is performed on the function execution times obtained in step (2). For a function that involves I/O, the calculated index is the read/write time per unit of data:

$$t_f^i = \frac{T_f^i}{D_f^i}$$

where $T_f^i$ is the execution time of function $f$ at its $i$-th execution and $D_f^i$ is the amount of data read or written by function $f$ at its $i$-th execution. For a function that does not involve I/O, the calculated index is the execution time itself:

$$t_f^i = T_f^i$$

for each function $f$, the average of the $t_f^i$ values lying between the 25% and 75% quantiles is calculated; the number of executions whose $t_f^i$ exceeds 1.5 times this average, divided by the total number of executions, is the inefficiency probability of the function;
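The calculation in step (3) can be sketched in Python as below. The function names are hypothetical, and two details are assumptions of this sketch rather than statements of the source: quartiles are computed with the standard library's inclusive method, and a median fallback is used when no value falls between the quartiles (possible with only two executions).

```python
import statistics


def unit_times(exec_times, data_sizes=None):
    """Per-execution index: time per unit of data for I/O functions, else time."""
    if data_sizes is None:
        return list(exec_times)
    # Assumes every data size is positive.
    return [t / d for t, d in zip(exec_times, data_sizes)]


def inefficiency_probability(exec_times, data_sizes=None, factor=1.5):
    """Share of executions slower than `factor` times the interquartile mean.

    Needs at least two executions.
    """
    values = unit_times(exec_times, data_sizes)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    middle = [v for v in values if q1 <= v <= q3]
    if not middle:  # assumption: fall back to the median in this edge case
        middle = [statistics.median(values)]
    baseline = statistics.fmean(middle)
    slow = sum(1 for v in values if v > factor * baseline)
    return slow / len(values)
```

For example, eight executions of which seven take 1 time unit and one takes 10 give a baseline of 1 and an inefficiency probability of 1/8.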

step (4), constructing a function inefficiency probability data set;

each run of the application through steps (1) to (3) yields one set of function inefficiency probabilities, which forms one data record. Different experiments use different workloads or different data scales. At least 50 experiments yield at least 50 records, and the experimental data are merged into a single data set of function inefficiency probabilities;
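As a rough illustration of the data set layout, the sketch below turns one dictionary per experiment into a table with one row per experiment and one column per function. The name `build_dataset` and the use of `None` for a function that was not observed in a run are assumptions made for this example.

```python
def build_dataset(runs, min_runs=50):
    """runs: list of {function name: inefficiency probability}, one per experiment."""
    if len(runs) < min_runs:
        raise ValueError(f"need at least {min_runs} experiments, got {len(runs)}")
    # Column order: every function seen in any run, sorted by name.
    functions = sorted(set().union(*(run.keys() for run in runs)))
    # Assumption: a function missing from a run is recorded as None.
    rows = [[run.get(name) for name in functions] for run in runs]
    return functions, rows
```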

step (5), constructing a Bayesian network from the inefficient function data set by structure learning;

here the nodes of the Bayesian network represent functions, the parameters of the nodes represent the inefficiency probabilities of the functions, and the directed edges represent how inefficient behavior in one function influences another. Bayesian structure learning generates, from a data set, a directed acyclic graph representing conditional probabilities through statistical methods and Bayesian probability calculation. The method uses a Bayesian scoring strategy together with a hill-climbing search for structure learning: Bayesian scoring is a classic and effective scoring strategy, and hill climbing reduces the search space and the complexity of the algorithm, making the method more feasible. Structure learning is specified in two parts, the scoring function and the search strategy:

Bayesian scoring:

In the scoring function, $G$ is a directed acyclic graph of the probabilistic dependencies among the variables $X_i$ in the variable set $X$, and $D$ is the sample data set. The score also involves a hyperparameter for each combination of indices, where $i$ indexes the $i$-th node $X_i$ of the node set $X$, $j$ indexes the $j$-th value of its parent set $Pa(X_i)$, and $k$ indexes the $k$-th value of $X_i$. $n_{ijk}$ is the number of instances in the data set that satisfy $X_i = x_{ik}$ and $Pa(X_i) = Pa(X_i)_j$.
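The symbols above match the Bayesian Dirichlet score in its usual form, P(G, D) = P(G) ∏_i ∏_j [Γ(α_ij) / Γ(α_ij + n_ij)] ∏_k [Γ(α_ijk + n_ijk) / Γ(α_ijk)], with α_ij = Σ_k α_ijk and n_ij = Σ_k n_ijk. Assuming that is the score intended (the symbol α_ijk for the hyperparameter is an assumption of this sketch), the contribution of one node can be computed in log space as follows. The function name and the choice of a single shared α are hypothetical.

```python
from math import lgamma


def log_bd_family_score(counts, alpha=1.0):
    """Log Bayesian Dirichlet score of one node X_i given its parents.

    counts[j][k] is n_ijk; every hyperparameter alpha_ijk is set to `alpha`.
    The structure prior P(G) is left out.
    """
    score = 0.0
    for row in counts:                      # one row per parent configuration j
        a_ij = alpha * len(row)             # alpha_ij = sum over k of alpha_ijk
        n_ij = sum(row)                     # n_ij = sum over k of n_ijk
        score += lgamma(a_ij) - lgamma(a_ij + n_ij)
        for n_ijk in row:
            score += lgamma(alpha + n_ijk) - lgamma(alpha)
    return score
```

The log score of a whole graph is then the sum of these terms over all nodes, which is what makes a local search such as hill climbing cheap: adding one edge changes only one node's term.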

Hill-climbing search:

let $E$ be the set of all candidate edges, and let $\Delta(e)$ denote the change in the scoring function after a new edge $e$ ($e \in E$) is added to the network structure. Starting from an empty network, select from the candidate set $E$ an edge $e$ that satisfies $\Delta(e) \ge \Delta(e')$ for every other candidate $e'$. If such an edge is found, add $e$ to the current network structure, delete $e$ from $E$, and continue searching for the next edge that satisfies the condition; if no edge satisfying the condition can be found, stop;
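The search can be sketched as a greedy edge-addition loop. Two points are assumptions of this sketch rather than statements of the source: an edge is only added if it improves the score (Δ(e) > 0), and edges that would close a directed cycle are skipped so the result stays a directed acyclic graph. The score change is supplied by the caller through the hypothetical `delta` callback.

```python
from itertools import permutations


def creates_cycle(edges, new_edge):
    """True if adding new_edge = (u, v) would close a directed cycle."""
    u, v = new_edge
    stack, seen = [v], set()
    while stack:                       # depth-first search from v looking for u
        node = stack.pop()
        if node == u:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(child for parent, child in edges if parent == node)
    return False


def hill_climb(nodes, delta):
    """Greedy edge addition starting from the empty network.

    delta(edge, edges) returns the score change from adding `edge` to `edges`.
    """
    edges = []
    candidates = set(permutations(nodes, 2))
    while True:
        legal = [e for e in candidates if not creates_cycle(edges, e)]
        if not legal:
            break
        best = max(legal, key=lambda e: delta(e, edges))
        if delta(best, edges) <= 0:    # assumption: stop when nothing improves
            break
        edges.append(best)
        candidates.discard(best)
    return edges
```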

step (6), performing parameter learning on the basis of structure learning;

parameter learning learns how strongly each variable depends probabilistically on its parent nodes, which yields the local conditional probability distributions. The two parameter learning methods in common use are maximum likelihood estimation and the Bayesian method. When the data set has too few records, maximum likelihood estimation is usually not accurate enough, and in some cases its formula fails altogether. The Bayesian method overcomes these shortcomings, so the invention adopts it. Parameter learning is specified as follows:

assuming that the prior distribution of the parameter $\theta$ is a Dirichlet distribution, the posterior distribution of $\theta$ is also a Dirichlet distribution, and $\theta$ is estimated by its maximum a posteriori estimate.

Here $\theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$, where $\theta_i$ denotes the conditional probability table of node $X_i$ in the network relative to its parent node set $Pa(X_i)$. Each estimated entry is the probability of the $k$-th value of $X_i$ given the $j$-th value of $Pa(X_i)$. The estimate involves a hyperparameter for each combination of indices, where $i$ indexes the $i$-th node $X_i$ of the node set $X$, $j$ indexes the $j$-th value of $Pa(X_i)$, and $k$ indexes the $k$-th value of $X_i$; $n_{ijk}$ is the number of instances in the data set that satisfy $X_i = x_{ik}$ and $Pa(X_i) = Pa(X_i)_j$;
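A common Dirichlet-based estimate consistent with these symbols is θ_ijk = (α_ijk + n_ijk) / (α_ij + n_ij). The sketch below uses that form, which is an assumption: it is the posterior-mean variant, and the exact formula of the source may differ (the strict posterior mode subtracts 1 from each α_ijk). The function name and the single shared α are hypothetical.

```python
def dirichlet_estimate(counts, alpha=1.0):
    """Estimate one node's conditional probability table from counts.

    counts[j][k] is n_ijk; every hyperparameter alpha_ijk is set to `alpha`.
    Returns theta[j][k], an estimate of P(X_i = k-th value | parents = j-th value).
    """
    table = []
    for row in counts:
        total = sum(row) + alpha * len(row)   # n_ij + alpha_ij
        table.append([(n + alpha) / total for n in row])
    return table
```

The second row of `dirichlet_estimate([[3, 1], [0, 0]])` shows why the Bayesian method is preferred on small data sets: a parent configuration that never occurs would make the maximum likelihood ratio 0/0, whereas the smoothed estimate falls back to the prior, here 0.5 for each value.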

step (7), traversing each function and applying the following decision logic to all functions:

(7-1) for each traversed function f, judging whether its inefficiency probability is greater than the preset threshold threshold_low, which is set by user configuration. If it is, the function is marked as an inefficient function and (7-2) is executed; otherwise the function is marked as not inefficient and the remaining functions are examined;

(7-2) setting the inefficiency probability of every inefficient function obtained to 100%. On this premise, the posterior probabilities of the other, non-inefficient function nodes are calculated using the conditional probability distributions learned by the Bayesian network. If the posterior probability of a non-inefficient function exceeds the given threshold threshold_low, that function is considered to have changed from non-inefficient to inefficient. The number of functions that change in this way is recorded as N_pre, and (7-3) is executed;

(7-3) traversing all the inefficient functions. For each inefficient function f, its inefficiency probability is set to 0% while the inefficiency probabilities of the other inefficient functions remain 100%. On this premise, the posterior probabilities of all non-inefficient function nodes are calculated; if the posterior probability of a non-inefficient function exceeds the given threshold, it is considered to have changed from non-inefficient to inefficient, and the number of functions that change in this way is recorded. It is then judged whether the value derived from this count is greater than the preset threshold threshold_node, which is set by user configuration. If it is, the function f is considered a key inefficient function and (7-4) is executed; otherwise the function is not a key inefficient function and the remaining inefficient functions are examined;

(7-4) traversing all the inefficient functions, calculating the proportion of each function's running time in the total running time of the program, and judging whether the proportion is greater than the preset threshold threshold_weight, which is set by user configuration. If it is, the function is a key inefficient function worth optimizing; otherwise the function is not worth optimizing;

step (8), displaying the key inefficient functions worth optimizing and performing cause analysis.
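The decision logic of step (7) can be sketched as below. Inference is delegated to a caller-supplied `posterior` callback, and all names are hypothetical. One point is a loud assumption: the exact quantity compared with threshold_node is not recoverable from the text, so this sketch uses the drop in the number of flipped functions, N_pre minus the count obtained with f set to 0%.

```python
def find_key_functions(p_low, posterior, time_share,
                       threshold_low, threshold_node, threshold_weight):
    """Return the key inefficient functions worth optimizing.

    p_low:      {function: inefficiency probability}
    posterior:  callable(evidence) -> {function: P(inefficient | evidence)},
                where evidence maps inefficient functions to 1.0 or 0.0
    time_share: {function: share of the program's total running time}
    """
    # (7-1) functions whose inefficiency probability exceeds threshold_low
    low = [f for f, p in p_low.items() if p > threshold_low]
    others = [f for f in p_low if f not in low]

    def flipped(evidence):
        # Number of non-inefficient functions pushed above threshold_low.
        post = posterior(evidence)
        return sum(1 for g in others if post[g] > threshold_low)

    # (7-2) all inefficient functions clamped to 100%
    n_pre = flipped({f: 1.0 for f in low})

    key = []
    for f in low:
        # (7-3) clamp f to 0%, keep the other inefficient functions at 100%
        evidence = {g: 1.0 for g in low}
        evidence[f] = 0.0
        n_f = flipped(evidence)
        # ASSUMPTION: the criterion is the drop n_pre - n_f.
        if n_pre - n_f > threshold_node:
            # (7-4) keep f only if it takes a large enough share of run time
            if time_share[f] > threshold_weight:
                key.append(f)
    return key
```

Under this reading, a function is "key" when removing its inefficiency noticeably reduces how many other functions are dragged into inefficiency, and "worth optimizing" when it also accounts for enough of the running time.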

Further, in the performance analysis method for locating key inefficient functions of HDFS based on a Bayesian network, the source code instrumentation of HDFS in step (1) acquires Trace information by extending HTrace. All communication Trace information among the name nodes, data nodes and client nodes in HDFS is obtained, and parameter information of the functions is collected, including the data sizes of send, receive, read and write operations, which provides sufficient data support for further analysis.

Further, in the performance analysis method for locating key inefficient functions of HDFS based on a Bayesian network, because performance detection of HDFS has no established definition of an inefficient function, this patent defines the inefficiency probability of a function by the following formula:

$$p_{low}(f) = \frac{F_{low}(f)}{F_{all}(f)}$$

where $p_{low}(f)$ is the inefficiency probability of function $f$, $F_{low}(f)$ is the number of inefficient executions of $f$, and $F_{all}(f)$ is the total number of executions of $f$. Specifically, instrumentation yields, for every execution $i$ of $f$, the execution time or the read/write time per unit of data, $t_f^i$. The average of the values lying between the 25% and 75% quantiles is calculated and denoted $E(t_f)$. $F_{low}(f)$ is then the number of executions of $f$ that satisfy $t_f^i > 1.5\,E(t_f)$.

Further, in the performance analysis method for locating key inefficient functions of HDFS based on a Bayesian network, step (7-1) requires the total number of inefficient functions. The criterion is the preset threshold threshold_low, and the number of original inefficient functions is counted as follows:

$$Node_f = \begin{cases} 1, & p_{low}(f) > threshold_{low} \\ 0, & p_{low}(f) \le threshold_{low} \end{cases}$$

$$N_{low} = \sum_f Node_f$$

where $N_{low}$ is the total number of inefficient functions, calculated from the statistical results before the Bayesian network is built; $threshold_{low}$ is the preset threshold for deciding whether a function is inefficient; and $Node_f$ is an intermediate variable indicating whether the inefficiency probability of function $f$ is greater than that threshold.

Further, in the performance analysis method for locating key inefficient functions of HDFS based on a Bayesian network, in step (7-2) a non-inefficient function is converted into an inefficient one as follows: with the probability of every inefficient function set to 1 as the initial condition, inference is performed on the network. The posterior probabilities of the non-inefficient functions change after inference; if the changed posterior probability of a function exceeds the preset threshold threshold_low, that function is deemed to have become inefficient. The number of functions converted from non-inefficient to inefficient is recorded as N_pre.
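To make "inference on the network" concrete, the sketch below computes a posterior probability by brute-force enumeration over a small network of binary nodes. It is an illustration only: the source does not say which inference algorithm is used, the representation of the network as `{node: (parents, cpt)}` is an assumption, and enumeration is only practical for a handful of nodes.

```python
from itertools import product


def posterior(network, query, evidence):
    """P(query = 1 | evidence) by enumerating all joint assignments.

    network: {node: (parents, cpt)}, where cpt maps a tuple of parent
             values (0 or 1) to P(node = 1 | parents).
    evidence: {node: 0 or 1}, e.g. an inefficient function clamped to 1.
    """
    nodes = list(network)
    num = den = 0.0
    for values in product((0, 1), repeat=len(nodes)):
        world = dict(zip(nodes, values))
        if any(world[n] != v for n, v in evidence.items()):
            continue                      # inconsistent with the evidence
        p = 1.0
        for node, (parents, cpt) in network.items():
            p1 = cpt[tuple(world[q] for q in parents)]
            p *= p1 if world[node] == 1 else 1.0 - p1
        den += p
        if world[query] == 1:
            num += p
    return num / den
```

With a two-node network A → B where P(A=1) = 0.2, P(B=1 | A=1) = 0.9 and P(B=1 | A=0) = 0.1, clamping A to 1 raises the probability of B from its marginal 0.26 to 0.9; with threshold_low = 0.5, B would then count as converted to inefficient.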

Further, in the performance analysis method for locating key inefficient functions of HDFS based on a Bayesian network, the count obtained in step (7-3) is the number of functions converted from non-inefficient to inefficient after inference on the Bayesian network, on the premise that, starting from step (7-2), the inefficiency probability of the inefficient function f is reduced from 100% to 0%. An inefficient function f for which the condition on threshold_node is satisfied is considered a key inefficient function, where threshold_node is a preset threshold specified by user configuration.

Further, in the performance analysis method for locating key inefficient functions of HDFS based on a Bayesian network, step (7-4) determines the proportion of a function's running time in the total running time of the program by calculating the actual running time of the function, in one of the following three modes:

the first mode is defined in terms of the actual running time of function f that is sought, the average of all execution times of f lying between the 25% and 75% quantiles, and the average value of the running times of f over all nodes;

the second mode is defined in terms of the actual running time of function f that is sought, the sum of all execution times of f, and the total number of nodes on which f runs;

the third mode is defined in terms of the actual running time of function f that is sought and the total execution time of f on a given node. Because the Trace information obtained by instrumentation contains the IP address of the node on which each execution took place, this per-node total can be obtained by simply summing over that information.

Depending on the situation, a single suitable mode may be chosen for calculating the time proportion, or several modes may be used together.
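The three modes can be sketched from trace records of the form (node IP, execution time). The exact formulas are not recoverable from the text, so each function below is one plausible reading and the names are hypothetical: mode two divides the summed execution time by the number of nodes, mode three sums per node, and mode one (the least certain) multiplies the interquartile mean by the average number of executions per node.

```python
import statistics
from collections import defaultdict


def per_node_totals(records):
    """Mode three: total execution time of the function on each node."""
    totals = defaultdict(float)
    for ip, duration in records:
        totals[ip] += duration
    return dict(totals)


def mode_two(records):
    """Mode two: sum of all execution times divided by the number of nodes."""
    totals = per_node_totals(records)
    return sum(totals.values()) / len(totals)


def mode_one(records):
    """Mode one (assumed reading): interquartile mean times runs per node."""
    durations = [d for _, d in records]
    q1, _, q3 = statistics.quantiles(durations, n=4, method="inclusive")
    middle = [d for d in durations if q1 <= d <= q3]
    if not middle:  # assumption: fall back to the median in this edge case
        middle = [statistics.median(durations)]
    runs_per_node = len(records) / len({ip for ip, _ in records})
    return statistics.fmean(middle) * runs_per_node
```

Dividing any of these values by the program's total running time gives the proportion compared with threshold_weight in step (7-4).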

Compared with the prior art, the invention has the following advantages:

(1) performance analysis and localization for HDFS are carried out at function-level granularity. The analysis is based on function characteristics obtained by instrumenting HDFS, so the granularity is fine and the analysis can be deep and systematic.

(2) because the instrumentation sites are selected manually, the instrumented locations can be made detailed and complete, which makes the method more comprehensive than approaches that analyze one specific performance problem.

Drawings

FIG. 1 is a schematic diagram of a system architecture for implementing the performance analysis method for locating the key inefficient function of the HDFS based on the Bayesian network according to the present invention;

FIG. 2 is a flowchart of a performance analysis method for locating a critical inefficiency function of the HDFS based on the Bayesian network according to the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

The basic idea of the method is to extract the running time characteristics of functions and calculate the inefficiency probability of each function, to generate a Bayesian network over the functions from these probabilities, and to screen the inefficient functions by their time characteristics, thereby obtaining the key inefficient functions.

Fig. 1 is a schematic diagram of a system architecture for implementing the performance analysis method for locating key inefficient functions of HDFS based on a Bayesian network according to the present invention. The HDFS nodes provide Traces to the system; the Trace merging module merges the Traces; the Trace processing module structures the function call information; the Bayesian network construction module produces a structured representation of the Bayesian network and passes it to the Bayesian network inference module for probabilistic inference; and the visualization module presents the key inefficient functions to users.
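The first two modules of this pipeline can be illustrated with a short sketch. The span layout (a dictionary with `function`, `entry` and `exit` keys) and the function names are assumptions made for the example; the source does not describe the Trace format.

```python
import heapq
from collections import defaultdict


def merge_traces(*node_traces):
    """Trace merging: combine per-node span lists, each sorted by entry time."""
    return list(heapq.merge(*node_traces, key=lambda span: span["entry"]))


def group_by_function(spans):
    """Trace processing: structure spans as {function: [execution time, ...]}."""
    grouped = defaultdict(list)
    for span in spans:
        grouped[span["function"]].append(span["exit"] - span["entry"])
    return dict(grouped)
```

The per-function lists of execution times produced this way are the input that the inefficiency probability calculation of step (3) needs.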

FIG. 2 is a flowchart of the performance analysis method for locating key inefficient functions of HDFS based on a Bayesian network. The detailed flow comprises steps (1) to (8), as set out in the Disclosure of Invention above.


Details of the invention not described here are well known to those skilled in the art.

The above description is only a part of the embodiments of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.

Claims

1. A performance analysis method for locating key inefficient functions of HDFS (Hadoop Distributed File System) based on a Bayesian network, characterized by comprising the following steps:

step (1): performing function-level instrumentation on specific target functions in the DFSInputStream, BlockReaderLocalLegacy, BlockReaderRemote2, BlockReaderLocal, BlockReaderRemote, BlockSender, BlockReceiver, FSNamesystem, DFSClient and DistributedFileSystem classes of the distributed file system HDFS, to obtain, as the instrumentation data, the entry and exit timestamps of the instrumented HDFS functions and the amount of data the functions read and write;

step (2): sampling the instrumentation data using the improved HTrace to obtain data such as the number of executions of the instrumented functions;

step (3): calculating the inefficiency probability of every instrumented function;

step (4): treating steps (1) to (3) as one pass and repeating it 50 to 100 times, each pass yielding one data record, and constructing a function inefficiency probability data set from the 50 to 100 records;

step (5): constructing a Bayesian network from the inefficient function data set by structure learning, the structure learning method being a Bayesian scoring strategy with hill-climbing search;

step (6): performing parameter learning on the basis of the structure learning to obtain the local conditional probability distributions, the parameter learning algorithm being the Bayesian method;

step (7): traversing each instrumented function obtained in step (3) and performing sensitivity analysis on it to obtain the key inefficient functions;

wherein the HDFS source code instrumentation in step (1) acquires Trace information by extending HTrace, acquires all communication Trace information among the name nodes, data nodes and client nodes in the HDFS, and acquires parameter information of the instrumented functions, the instrumented functions being the following instrumentation target functions: constructors of DFSInputStream class, openInfo, fetchBlockAt, constructors of readWithStratage, actualGetFromOneDataNode, BlockReaderLocalLegacy/BlockReaderRemote2/BlockReaderLocal/BlockReaderRemote class, readFully, readAll, sendPage or transferTo of BlockSender class, blockSender # sendPacket or readLocal, BlockSender \ sendPacket or WeToSot, docSendBlock, FlushOrSync of BlockReceivePack, receivePack, receiveBlockBuck, getLocats of FSNameckSystems, remetLocato, retrieveTo, getLogetLogetLortent, Setletestentry, SetlementLortende, SetlertLogetLortegetLortement, SetlementLortement, SetletionSedelockLocate, SegetLocate, Se;

the step (7) specifically comprises the following steps:

(7-2) setting the inefficiency probability of every inefficient function obtained to 100%, calculating the posterior probabilities of the other, non-inefficient function nodes using the conditional probability distributions learned by the Bayesian network, considering a non-inefficient function to have changed from non-inefficient to inefficient if its posterior probability exceeds the given threshold threshold_low, recording as N_pre the number of functions that change in this way once the inefficiency probabilities of all inefficient functions are set to 100%, and continuing with (7-3);

Judgment of

(7-4) traversing all the inefficient functions, calculating the proportion of each inefficient function's running time in the total running time of the program, that is, calculating the actual running time of the function, and judging whether the proportion is greater than the preset threshold threshold_weight, which is set by user configuration; if the proportion is less than the threshold, the function is not worth optimizing; if it is greater, the function is a key inefficient function worth optimizing.

2. The performance analysis method for locating key inefficient functions of HDFS based on a Bayesian network according to claim 1, wherein: in step (2), sampling is performed with an improved HTrace, a distributed system tracing framework open-sourced by Cloudera; the sampling method is a token bucket algorithm, with the token bucket size set to 1000 and the time interval set to 180 ms.

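3. The performance analysis method for locating key inefficient functions of HDFS based on a Bayesian network according to claim 1, wherein: in step (3), in the performance detection of the HDFS, the inefficiency probability of a function is calculated by the following formula:

$$p_{low}(f) = \frac{F_{low}(f)}{F_{all}(f)}$$

where $p_{low}(f)$ is the inefficiency probability of function $f$; $F_{all}(f)$ is the total number of executions of $f$; and $F_{low}(f)$ is the number of inefficient executions of $f$, obtained by taking from instrumentation the execution time or read/write time per unit of data $t_f^i$ of every execution of $f$, averaging the values lying between the 25% and 75% quantiles to give $E(t_f)$, and counting the executions that satisfy $t_f^i > 1.5\,E(t_f)$.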

4. The performance analysis method for locating key inefficient functions of HDFS based on a Bayesian network according to claim 1, wherein: in step (3), the inefficiency probability of each instrumented function is calculated as follows: the function entry timestamp from step (1) is subtracted from the function exit timestamp to obtain the execution time of the function, and statistical calculation is performed; for a function in the HDFS system that involves I/O, the calculated index is the read/write time per unit of data:

$$t_f^i = \frac{T_f^i}{D_f^i}$$

where $T_f^i$ is the execution time of function $f$ at its $i$-th execution and $D_f^i$ is the amount of data read or written by function $f$ at its $i$-th execution; for a function that does not involve I/O, the calculated index is the execution time itself:

$$t_f^i = T_f^i$$

for each function $f$, the average of the $t_f^i$ values lying between the 25% and 75% quantiles is calculated, and the number of executions whose $t_f^i$ exceeds 1.5 times this average, divided by the total number of executions, is the inefficiency probability of the function.

5. The performance analysis method for locating key inefficient functions of HDFS based on a Bayesian network according to claim 1, wherein: in step (7-4), the proportion of a function's running time in the total running time of the program is determined by any one of the following three modes:

the first mode is defined in terms of the actual running time of function f that is sought;

the second mode is defined in terms of the actual running time of function f that is sought, the sum of all execution times of f, and the total number of nodes on which f runs;

the third mode is defined in terms of the actual running time of function f that is sought and the total execution time of f on a given node; because the Trace information obtained by instrumentation contains the IP address of the node on which each execution took place, this per-node total is obtained by simply summing over that information.