EP3285169A2 - System status visualization method and system status visualization device - Google Patents

System status visualization method and system status visualization device

Info

Publication number
EP3285169A2
EP3285169A2 (application EP17181438.7A)
Authority
EP
European Patent Office
Prior art keywords
response time
application
unit
performance
calculating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP17181438.7A
Other languages
German (de)
English (en)
Other versions
EP3285169A3 (fr)
Inventor
Shuji Suzuki
Yasuhiko Kanemasa
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd filed Critical Fujitsu Ltd
Publication of EP3285169A2
Publication of EP3285169A3
Legal status: Withdrawn (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/30 Monitoring
    • G06F 11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F 11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/30 Monitoring
    • G06F 11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F 11/3466 Performance evaluation by tracing or monitoring
    • G06F 11/3495 Performance evaluation by tracing or monitoring for systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/30 Monitoring
    • G06F 11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F 11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/30 Monitoring
    • G06F 11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F 11/324 Display of status information
    • G06F 11/328 Computer systems status display
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/30 Monitoring
    • G06F 11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F 11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F 11/3419 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment by assessing time
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F 2201/865 Monitoring of software
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F 2201/87 Monitoring of transactions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 2D [Two Dimensional] image generation
    • G06T 11/001 Texturing; Colouring; Generation of texture or colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 2D [Two Dimensional] image generation
    • G06T 11/20 Drawing from basic elements, e.g. lines or circles
    • G06T 11/206 Drawing of charts or graphs

Definitions

  • The embodiment discussed herein relates to a system status visualization method and a system status visualization device.
  • In a system that provides resources to clients, it is important to monitor the status of a provided resource and to check whether there is any problem with the resource.
  • For example, in a cloud system that provides virtual machines, it is important to monitor the response time and the load of the applications running on the virtual machines, and to check whether there is any problem with the performance of those applications.
  • A virtual machine represents a virtual computer that runs on a physical machine (computer).
  • A cloud system represents a system that provides computer hardware or computer software to users via a network.
  • FIG. 29 is a diagram for explaining the monitoring performed by agents.
  • In FIG. 29, a virtual machine 9a runs on a physical machine 9, and an application 9b and an agent 9c are executed on the virtual machine 9a.
  • The agent 9c collects data related to the performance of the application 9b and monitors that performance.
  • The infrastructure of cloud computing represents an infrastructure in which the ICT (Information and Communication Technology) infrastructure of servers, networks, and storage is provided using virtualization technology.
  • The infrastructure of cloud computing has the functions of virtual machine management, storage management, and network management.
  • According to an aspect of the embodiment, a system status visualization program causes a computer to execute a process including: obtaining data passing through a predetermined point of the system and storing the data, for each of a plurality of applications executed in the system; calculating, on an application-by-application basis, an average response time in each predetermined time window using the stored data; calculating a normalized response time on an application-by-application basis by normalizing the calculated average response time; and determining the status of the system according to the magnitude of the calculated normalized response time, and outputting the determined status.
  • FIG. 1 is a diagram illustrating a configuration of the cloud system according to the embodiment.
  • A cloud system 1 according to the embodiment includes a performance status diagnosing device 2, an arbitrary number of physical machines 3, and a network switch 4.
  • The performance status diagnosing device 2 is a device for diagnosing the performance status of the cloud system 1.
  • Each physical machine 3 is a computer that executes applications. On each physical machine 3 runs a virtual machine 3a, and the applications are executed by the virtual machine 3a. Although FIG. 1 illustrates a single virtual machine 3a running on each physical machine 3, a plurality of virtual machines 3a may run on one physical machine 3.
  • The cloud system 1 enables implementation of a three-tier system made of, for example, a web server, an application server, and a database (DB) server.
  • The network switch 4 is a device that connects the physical machines 3 to an external network.
  • The network switch 4 is disposed at the gateway of the cloud system 1.
  • The performance status diagnosing device 2 captures communication packets from the network switch 4 and uses them in diagnosing the performance status of the cloud system 1.
  • A user uses an application in the cloud system 1 via a network, from a client device installed outside the cloud system 1.
  • The communication packets exchanged between the user and the application invariably pass through the network switch 4 disposed at the gateway of the cloud system 1. For that reason, if the communication packets are port-mirrored at that network switch and captured, it becomes possible to obtain, for all applications in the cloud system 1, the communication packets used for communication with the outside.
  • As illustrated in FIG. 1, the performance status diagnosing device 2 includes a capturing unit 21, a packet information storing unit 22, a type-determination-data storing unit 23, a type determining unit 24, a type information storing unit 25, a response time calculating unit 26, a response-time-information storing unit 27, a normalizing unit 28, and a representative information storing unit 29. Moreover, the performance status diagnosing device 2 includes a normalization information storing unit 30, a performance decrease determining unit 31, a determination information storing unit 32, a diagnosing unit 33, a cloud information storing unit 34, a visualizing unit 35, a visualization data storing unit 36, and a display control unit 37.
  • The capturing unit 21 captures the communication packets that are port-mirrored at the network switch 4, and stores the captured communication packets in the packet information storing unit 22; a minimal sketch of such a capture loop is given below.
  • The packet information storing unit 22 is used to store the information of the communication packets passing through the network switch 4.
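As a rough illustration only (the patent gives no code), the following Python sketch shows how a capturing unit of this kind could record port-mirrored packets together with their capture timestamps. The scapy library and the mirror interface name "eth1" are assumptions made for the sketch, not part of the original description.

```python
# Hypothetical sketch of a capture loop like the capturing unit 21.
# Assumes packets at the network switch are port-mirrored to "eth1".
from scapy.all import IP, TCP, sniff

packet_store = []  # stands in for the packet information storing unit 22

def record(pkt):
    # Keep only what the later stages need: the capture timestamp and
    # the addressing information of each TCP/IP packet.
    if IP in pkt and TCP in pkt:
        packet_store.append({
            "time": float(pkt.time),            # capture timestamp (seconds)
            "src": pkt[IP].src, "dst": pkt[IP].dst,
            "sport": pkt[TCP].sport, "dport": pkt[TCP].dport,
            "len": len(pkt),
        })

# Runs until interrupted; store=False avoids keeping raw packets in memory.
sniff(iface="eth1", prn=record, store=False)
```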
  • The type-determination-data storing unit 23 is used to store the data used in determining the types of applications.
  • The types of applications include applications for which the response time holds importance from the performance perspective, and other-type applications.
  • The performance status diagnosing device 2 treats the applications for which the response time holds importance from the performance perspective as the target applications for diagnosis.
  • The type determining unit 24 determines, using the data stored in the type-determination-data storing unit 23, the type of application for each communication connection.
  • FIGS. 2A and 2B are diagrams illustrating an example of the type-determination-data storing unit 23.
  • FIG. 2A illustrates a case in which a port list, i.e., a list of port numbers, is stored as the data for determining the types of applications.
  • The port numbers stored in the type-determination-data storing unit 23 are the port numbers used by the applications for which the response time holds importance from the performance perspective.
  • For example, the type-determination-data storing unit 23 stores "80" and "443" as the port numbers used by applications for which the response time holds importance from the performance perspective.
  • The type determining unit 24 analyzes the information about the communication packets stored in the packet information storing unit 22, and extracts the port number of the server side.
  • Here, the server implies the virtual machine 3a. If the extracted port number is included in the port list stored in the type-determination-data storing unit 23, then the type determining unit 24 determines that the application performing transmission or reception in the analyzed communication packets is an application for which the response time holds importance from the performance perspective. Then, the type determining unit 24 stores the determination result in the type information storing unit 25. A sketch of this port-based check follows.
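As a hedged illustration of the port-list check just described (the names and the data layout here are assumptions, not taken from the patent):

```python
# Illustrative port-based type determination (sketch, not the patent's code).
RESPONSE_TIME_CRITICAL = "response time holds importance from the performance perspective"
OTHER_TYPE = "other-type application"

port_list = {80, 443}  # contents of the type-determination-data storing unit 23

def determine_type_by_port(server_port: int) -> str:
    # An application is a diagnosis target if its server-side port is listed.
    return RESPONSE_TIME_CRITICAL if server_port in port_list else OTHER_TYPE

assert determine_type_by_port(443) == RESPONSE_TIME_CRITICAL
assert determine_type_by_port(5432) == OTHER_TYPE
```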
  • For an application whose type is not determinable from the port number, the type determining unit 24 determines the type by performing machine learning with communication patterns serving as the input.
  • For that purpose, the type determining unit 24 collects communication packets in advance. Then, the type determining unit 24 analyzes the collected communication packets and calculates, for a fixed time window (such as one minute), the average response time, the average communication volume of the server, the average communication count of the server, the average communication volume of the client device, and the average communication count of the client device.
  • The type determining unit 24 builds a learning machine with the calculated values serving as the learning data.
  • As the learning machine, it is possible to use a support vector machine (SVM) or random forests.
  • FIG. 2B illustrates a case in which learning data is stored in the type-determination-data storing unit 23 as the data for determining the types of applications.
  • The following information is stored as a single set of learning data: the type of application, the average response time, the average communication volume of the server, the average communication count of the server, the average communication volume of the client device, and the average communication count of the client device.
  • The average response time is in units of microseconds, and the average communication volume of the server as well as that of the client device is in units of bytes.
  • In FIG. 2B, two sets of learning data are specified for applications of the type "application for which the response time holds importance from the performance perspective", and a single set of learning data is specified for an application of the type "other-type application".
  • the average response time is "600”
  • the average communication volume of the server is "100”
  • the average communication count of the server is "1”.
  • the average communication volume of the client device is "100” and the average communication count of the client device is "1".
  • The type determining unit 24 calculates, for each communication connection, the average response time, the average communication volume of the server, the average communication count of the server, the average communication volume of the client device, and the average communication count of the client device for the same time window as that of the learning data. Then, from the calculated values, the type determining unit 24 determines the type of application corresponding to the concerned communication connection using the learning machine. Subsequently, the type determining unit 24 stores the determination result in the type information storing unit 25. A sketch of this learning-based classification is given below.
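The patent names SVMs and random forests but gives no implementation. The following scikit-learn sketch is one plausible reading; the feature ordering, the label encoding, and the training values are invented here for illustration.

```python
# Illustrative learning machine for type determination (assumed details).
from sklearn.svm import SVC

# One row per application: [average response time (us), average server
# communication volume (bytes), average server communication count,
# average client communication volume (bytes), average client count].
X_train = [
    [600.0, 100.0, 1.0, 100.0, 1.0],          # response-time-critical
    [800.0, 150.0, 2.0, 120.0, 2.0],          # response-time-critical
    [50000.0, 900000.0, 40.0, 500.0, 3.0],    # other type (e.g. bulk transfer)
]
y_train = [1, 1, 0]  # 1 = response time holds importance, 0 = other type

clf = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)

# At run time, the same five features are computed per communication
# connection over the same time window as the learning data.
features = [[650.0, 110.0, 1.0, 90.0, 1.0]]
print("response-time-critical" if clf.predict(features)[0] == 1 else "other-type")
```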
  • The type information storing unit 25 is used to store the determination results about the types of applications.
  • FIG. 3 is a diagram illustrating an example of the type information storing unit 25.
  • As illustrated in FIG. 3, the type information storing unit 25 is used to store the IP address, the port number, and the type on an application-by-application basis.
  • The IP address represents the IP address of the virtual machine 3a on which the concerned application is running.
  • The port number represents the port number used by the concerned application.
  • The type represents the type of the concerned application. For example, an application that runs on the virtual machine 3a having the IP address "10.20.30.40" uses the port number "80" and is of the type "application for which the response time holds importance from the performance perspective".
  • The response time calculating unit 26 analyzes the communication packets, calculates the response time, and stores the calculated response time in the response-time-information storing unit 27. If the communication packets are not encrypted, then the response time calculating unit 26 rebuilds the protocol messages and calculates the response time according to the timing of the request and the timing of the response.
  • That is, the response time calculating unit 26 reconstructs the protocol messages from the communication packets, and identifies the communication packets representing the request message and those representing the response message. Then, the response time calculating unit 26 calculates, as the response time, the period from the transmission of the request message to the reception of the response message.
  • FIG. 4 is a diagram for explaining a method for calculating the response time.
  • In FIG. 4, a request message transmitted by a client device is processed by an application 3b running on the virtual machine 3a in the cloud system 1, and a response message is transmitted from the application 3b to the client device.
  • The response time calculating unit 26 sets the period between the timing of capturing the request message and the timing of capturing the response message as the response time. A sketch of this calculation follows.
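Given capture records of the kind assumed in the earlier sketch, the calculation itself reduces to a difference of capture timestamps; the function below is a hypothetical transcription, with the microsecond unit taken from the patent's tables.

```python
# Illustrative response-time calculation (sketch; record layout is assumed).
def response_time_us(request_capture_time: float, response_capture_time: float) -> float:
    # Response time = period between capturing the request message and
    # capturing the response message, in microseconds.
    return (response_capture_time - request_capture_time) * 1_000_000

# Example: the response is captured 600 microseconds after the request.
t_req = 1466758800.000000
t_res = 1466758800.000600
print(response_time_us(t_req, t_res))  # 600.0
```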
  • If the communication packets are encrypted, the response time calculating unit 26 analyzes the transmission-reception flow of the communication packets and estimates the response time of the application.
  • In that case, the protocol is not analyzable because the contents of the communication packets are not known.
  • Hence, the response time calculating unit 26 is not able to reconstruct the request message or the response message.
  • Instead, the response time calculating unit 26 estimates the response time from the timings of the communication packets exchanged between the client device and the application in the cloud system 1.
  • The response-time-information storing unit 27 is used to store the response times calculated on an application-by-application basis by the response time calculating unit 26.
  • FIG. 5 is a diagram illustrating an example of the response-time-information storing unit 27. As illustrated in FIG. 5, the response-time-information storing unit 27 is used to store the timing, the IP address, the port number, and the response time in a corresponding manner.
  • The timing represents the timing of calculation of the response time.
  • The IP address represents the IP address of the virtual machine 3a on which the concerned application is running.
  • The port number represents the port number used by the concerned application.
  • The response time represents the response time calculated by the response time calculating unit 26.
  • The response time is in units of microseconds. For example, an application that runs on the virtual machine 3a having the IP address "10.20.30.40" and that uses the port number "80" has the response time of "600" at "24/06/2016 09:00:00".
  • The normalizing unit 28 reads the response times calculated by the response time calculating unit 26 from the response-time-information storing unit 27, and calculates the average response time in each time window on an application-by-application basis. Then, the normalizing unit 28 normalizes the average response time using the information stored in the representative information storing unit 29, and stores the normalized average response time in the normalization information storing unit 30.
  • FIG. 6 is a diagram for explaining the normalization of the average response time.
  • For each application, the response time can take different values under normal conditions, and each application has a different standard for what counts as a delay.
  • Hence, the performance status diagnosing device 2 normalizes the average response time and converts it into a scale that is comparable among the applications.
  • For example, for applications #1 and #2, which take different values under normal conditions, normalizing the respective average response times makes the response times comparable.
  • Examples of the fundamental statistic include the average, the median, and the mode.
  • The representative information storing unit 29 is used to store the representative response time of each application.
  • FIG. 7 is a diagram illustrating an example of the representative information storing unit 29. As illustrated in FIG. 7, the representative information storing unit 29 is used to store the timing, the IP address, the port number, and the response time on an application-by-application basis.
  • The timing represents the timing of calculation of the representative response time.
  • The IP address represents the IP address of the virtual machine 3a on which the concerned application is running.
  • The port number represents the port number used by the concerned application.
  • The response time represents the representative response time.
  • The representative response time is in units of microseconds. For example, an application that runs on the virtual machine 3a having the IP address "10.20.30.40" and that uses the port number "80" has the representative response time of "600" as calculated at "23/06/2017 00:00:00".
  • The normalizing unit 28 calculates the average response time t for each fixed time window (such as one minute) on an application-by-application basis, and calculates the fundamental statistic t_r of the average response times t; the normalized average response time is then t/t_r, as sketched below.
  • As the data for calculating the fundamental statistic, the data of the whole previous day is used.
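A minimal sketch of this normalization, assuming the median as the fundamental statistic (the description equally allows the average or the mode):

```python
# Illustrative normalization (sketch; the choice of statistic is open).
from statistics import median

def representative_response_time(prev_day_averages: list[float]) -> float:
    # Fundamental statistic t_r of the previous day's average response
    # times; the median is one of the options named in the description.
    return median(prev_day_averages)

def normalize(avg_response_time: float, representative: float) -> float:
    # Normalized average response time = t / t_r.
    return avg_response_time / representative

t_r = representative_response_time([580.0, 600.0, 620.0, 610.0])
print(normalize(600.0, t_r))  # close to 1.0 in normal condition
```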
  • FIGS. 8A and 8B are diagrams for explaining the ex-Gaussian distribution.
  • The ex-Gaussian distribution is a type of probability distribution and, as illustrated in FIG. 8A, is obtained by convolution of the Gaussian distribution (normal distribution) and the exponential distribution.
  • The ex-Gaussian distribution is determined by three parameters, namely, the mean μ of the Gaussian component, the standard deviation σ of the Gaussian component, and the mean τ of the exponential component.
  • The parameter μ represents the value at the peak portion of the distribution.
  • The normalizing unit 28 assesses the goodness of the fitting using the one-sample Kolmogorov-Smirnov test.
  • For the one-sample Kolmogorov-Smirnov test, there are two inputs, namely, the distribution of average response times and the distribution curve of the fitting result.
  • The normalizing unit 28 performs the test at, for example, the significance level of 0.05 and, if the test result indicates that the distribution of average response times follows the ex-Gaussian distribution, sets the parameter μ of the ex-Gaussian distribution as the representative average response time.
  • FIG. 9 is a diagram for explaining an outlier. As illustrated in FIG. 9, an outlier is a value that lies far away from the other values. If an outlier is present among the average response times, the fitting to the ex-Gaussian distribution sometimes fails. For that reason, the normalizing unit 28 removes such outliers before performing the fitting. Examples of the method for outlier removal include Tukey's outlier removal. A sketch combining outlier removal, fitting, and the test follows.
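As a rough SciPy sketch of this pipeline, under stated assumptions: SciPy's exponnorm parameterizes the ex-Gaussian with K = τ/σ, loc = μ, and scale = σ; Tukey's fences with the usual 1.5 × IQR factor stand in for the outlier removal; and a Kolmogorov-Smirnov p-value above 0.05 is read here as an acceptable fit, which is one interpretation of the test described above.

```python
# Illustrative representative-response-time estimation (sketch).
import numpy as np
from scipy import stats

def representative_by_exgauss(avg_rts: np.ndarray) -> float | None:
    # 1) Tukey outlier removal: keep values in [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    q1, q3 = np.percentile(avg_rts, [25, 75])
    iqr = q3 - q1
    kept = avg_rts[(avg_rts >= q1 - 1.5 * iqr) & (avg_rts <= q3 + 1.5 * iqr)]

    # 2) Fit the ex-Gaussian; exponnorm uses K = tau/sigma, loc = mu, scale = sigma.
    K, mu, sigma = stats.exponnorm.fit(kept)

    # 3) One-sample Kolmogorov-Smirnov test against the fitted curve.
    _, p = stats.kstest(kept, stats.exponnorm(K, loc=mu, scale=sigma).cdf)
    if p < 0.05:
        return None  # the distribution does not fit; no representative value
    return mu        # the peak-portion parameter, used as the representative

sample = stats.exponnorm.rvs(1.5, loc=600.0, scale=20.0, size=500, random_state=0)
print(representative_by_exgauss(sample))  # roughly 600
```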
  • The normalization information storing unit 30 is used to store, on an application-by-application basis, the normalized average response time obtained by the normalizing unit 28.
  • FIG. 10 is a diagram illustrating an example of the normalization information storing unit 30. As illustrated in FIG. 10, the normalization information storing unit 30 is used to store the timing, the IP address, the port number, the normalized average response time, and the request count on an application-by-application basis.
  • The timing represents the timing of calculation of the response time.
  • The IP address represents the IP address of the virtual machine 3a on which the concerned application is running.
  • The port number represents the port number used by the concerned application.
  • The normalized average response time represents the average response time after normalization.
  • The request count represents the number of requests used in the calculation of the normalized average response time.
  • For example, the normalized average response time related to the response time calculated at "24/06/2016 09:00:00" is "1.0", and the corresponding request count is "2".
  • The performance decrease determining unit 31 determines, based on the normalized average response time and the request count, whether there is a decrease in the performance of the application, and stores the determination result in the determination information storing unit 32.
  • When the request count is low, there is an increase in the variability of the normalized average response time.
  • FIG. 11 is a diagram for explaining the variability of the normalized average response time in the case in which the request count is low. As illustrated in FIG. 11, when the request count is low, the variability of the normalized average response time is high.
  • For that reason, the performance decrease determining unit 31 determines that the performance of the application has decreased only if (the normalized average response time) > (the threshold value T_rt) holds true and (the threshold value T_req-min) < (the request count) holds true, as in the sketch below.
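That condition transcribes directly into code; the threshold values below are placeholders, since the patent does not fix them.

```python
# Illustrative performance-decrease check (sketch; thresholds are assumed).
T_RT = 2.0       # threshold on the normalized average response time
T_REQ_MIN = 10   # minimum request count for a reliable decision

def performance_decreased(norm_avg_rt: float, request_count: int) -> bool:
    # Flag a decrease only when the normalized time is high AND enough
    # requests back the measurement (low counts are too variable).
    return norm_avg_rt > T_RT and request_count > T_REQ_MIN

print(performance_decreased(2.5, 50))  # True
print(performance_decreased(2.5, 3))   # False: too few requests to judge
```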
  • The determination information storing unit 32 is used to store, on an application-by-application basis, the determination result obtained by the performance decrease determining unit 31.
  • FIG. 12 is a diagram illustrating an example of the determination information storing unit 32. As illustrated in FIG. 12, the determination information storing unit 32 is used to store the timing, the IP address, the port number, and the determination result on an application-by-application basis.
  • The timing represents the timing of calculation of the response time.
  • The IP address represents the IP address of the virtual machine 3a on which the concerned application is running.
  • The port number represents the port number used by the concerned application.
  • The determination result represents the determination of whether or not the performance has decreased, and indicates either "no decrease in performance" or "decrease in performance". For example, regarding an application that runs on the virtual machine 3a having the IP address "10.20.30.40" and that uses the port number "80", the determination result "no decrease in performance" is stored corresponding to "24/06/2016 09:00:00".
  • The diagnosing unit 33 refers to the normalization information storing unit 30 and determines whether the decrease in the performance is attributable to the application or attributable to the infrastructure of cloud computing. Then, the diagnosing unit 33 stores the determination result in the cloud information storing unit 34 and, if the decrease is determined to be attributable to the infrastructure of cloud computing, notifies an operations manager 5 of the cloud system 1 via, for example, electronic mail.
  • The diagnosing unit 33 determines whether any one of the following three cases is applicable and accordingly determines whether the decrease in the performance is attributable to the application or to the infrastructure of cloud computing. If none of the three cases is applicable, then the diagnosing unit 33 determines that the cause of the decrease in the performance is not clear.
  • In the first case, the request count of the application is correlated with its own normalized average response time; the diagnosing unit 33 then determines that the decrease in the performance is occurring due to an increase in the load of the application itself, and is thus attributable to the application.
  • Specifically, the diagnosing unit 33 performs a decorrelation test between the request count and the normalized average response time of the application at, for example, the significance level of 0.05 and, if the test result is significant, determines that the decrease in the performance is attributable to the application.
  • In the second case, the normalized average response times of two applications are correlated; the diagnosing unit 33 then determines that the performance has decreased because some resources are in contention among the applications, leading to a shortage of resources, and thus determines that the decrease in the performance is attributable to the infrastructure of cloud computing.
  • Specifically, the diagnosing unit 33 performs a decorrelation test between the normalized average response times of the two applications at, for example, the significance level of 0.05. If the test result is significant, then the diagnosing unit 33 determines that the decrease in the performance is attributable to the infrastructure of cloud computing.
  • In the third case, the request count of the application of a particular user is correlated with the performance status (the normalized average response time) of the application of another user.
  • In that case, the diagnosing unit 33 determines that the one user's use of resources is affecting the performance of the application of the other user and causing the decrease in the performance, and thus determines that the decrease in the performance is attributable to the infrastructure of cloud computing.
  • Specifically, the diagnosing unit 33 performs, at, for example, the significance level of 0.05, a decorrelation test between the normalized average response time of the application that has undergone the decrease in performance and the request count of the other user's application. If the test result is significant, then the diagnosing unit 33 determines that the decrease in the performance is attributable to the infrastructure of cloud computing. A sketch of such a decorrelation test is given below.
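The patent calls these checks "decorrelation tests" without naming an implementation; one common reading, sketched here, is a significance test on the Pearson correlation coefficient.

```python
# Illustrative decorrelation test (sketch; the exact method is not specified).
from scipy.stats import pearsonr

def correlated(x: list[float], y: list[float], alpha: float = 0.05) -> bool:
    # Test H0 "no correlation"; a p-value below alpha is a significant
    # result, i.e., the two series are judged to be correlated.
    _, p = pearsonr(x, y)
    return p < alpha

request_counts = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
norm_avg_rts = [1.0, 1.2, 1.5, 1.9, 2.2, 2.6]
if correlated(request_counts, norm_avg_rts):
    print("decrease attributable to the application (first case)")
```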
  • The cloud information storing unit 34 is used to store, on an application-by-application basis, the determination result obtained by the diagnosing unit 33.
  • FIG. 13 is a diagram illustrating an example of the cloud information storing unit 34. As illustrated in FIG. 13, the cloud information storing unit 34 is used to store the timing, the IP address, the port number, and the determination result on an application-by-application basis.
  • The timing represents the timing of calculation of the response time.
  • The IP address represents the IP address of the virtual machine 3a on which the concerned application is running.
  • The port number represents the port number used by the concerned application.
  • The determination result represents the determination result obtained by the diagnosing unit 33. When an applicable case is present, a note to that effect is added to the determination result stored in the determination information storing unit 32. On the other hand, when there is no decrease in the performance, the determination result is the same as the information stored in the determination information storing unit 32.
  • The visualizing unit 35 reads the normalized average response times from the normalization information storing unit 30, creates visualization data for all applications in such a way that the color varies continuously with the magnitude of the normalized average response time, and stores the visualization data in the visualization data storing unit 36.
  • For a normalized average response time indicating normal condition, the visualizing unit 35 creates visualization data representing "green".
  • For a normalized average response time indicating worsening of the performance to a certain extent, the visualizing unit 35 creates visualization data representing "yellow".
  • For a normalized average response time indicating worsening of the performance, the visualizing unit 35 creates visualization data representing "red".
  • When there is no data, the visualizing unit 35 creates visualization data representing "white".
  • The visualization data storing unit 36 is used to store the visualization data created by the visualizing unit 35.
  • FIG. 14 is a diagram illustrating an example of the visualization data storing unit 36. As illustrated in FIG. 14, the visualization data storing unit 36 is used to store the timing, the IP address, the port number, the color, and the opacity on an application-by-application basis.
  • The timing represents the timing of calculation of the response time.
  • The IP address represents the IP address of the virtual machine 3a on which the concerned application is running.
  • The port number represents the port number used by the concerned application.
  • The color represents the RGB value of the color indicating the performance status.
  • The opacity represents a value indicating the magnitude of the request count, and ranges from 0 to 1.0.
  • For example, the visualization data for the normalized average response time calculated at "24/06/2016 09:00:00" has "#00FF00" as the RGB value of the color indicating the performance status, and has "0.02" as the opacity indicating the magnitude of the request count. A sketch of such a color and opacity calculation follows.
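As an illustration of how such a color and opacity could be derived (the breakpoints, the intermediate color values, and the opacity scale are not fixed by the patent and are assumptions here):

```python
# Illustrative color/opacity calculation (sketch; breakpoints are assumed).
def status_color(norm_avg_rt: float | None) -> str:
    if norm_avg_rt is None:
        return "#FFFFFF"   # white: absence of data
    if norm_avg_rt <= 1.5:
        return "#00FF00"   # green: normal condition
    if norm_avg_rt <= 2.5:
        return "#FFFF00"   # yellow: worsened to a certain extent
    return "#FF0000"       # red: worsened performance

def opacity(request_count: int, max_count: int = 100) -> float:
    # Opacity in [0, 1.0] growing with the request count, so that
    # high-frequency (high-impact) applications stand out.
    return min(request_count / max_count, 1.0)

print(status_color(1.0), opacity(2))   # #00FF00 0.02 (cf. FIG. 14's example row)
print(status_color(3.0), opacity(80))  # #FF0000 0.8
```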
  • The display control unit 37 reads the visualization data from the visualization data storing unit 36, and displays the performance status of each application on a display device 6.
  • FIG. 15 is a diagram illustrating an exemplary display of the performance statuses.
  • In FIG. 15, the vertical axis represents the applications, and the horizontal axis represents the timings; the timings given on the horizontal axis have intervals of, for example, 10 minutes.
  • Since FIG. 15 is illustrated in grayscale, the colors are not visible. However, in the actual display screen, for example, a display position 44 is displayed in green, a display position 45 in yellow, a display position 46 in red, and a display position 47 in white.
  • With such a display, the operations manager 5 of the cloud system 1 becomes able to get an overview of the performance status of all applications in the cloud system 1.
  • For example, the operations manager 5 of the cloud system 1 can check the number of virtual machines 3a in which the performance is lagging, and check the tendency of occurrence of the lag.
  • Meanwhile, the visualizing unit 35 can also create visualization data in which the density of the colors is varied. For example, if the request frequency per unit time is high, then the visualizing unit 35 creates visualization data with dark colors; if the request frequency per unit time is low, it creates visualization data with faint colors.
  • FIG. 16 is a diagram illustrating a display example in which the density of the colors is varied according to the request frequency.
  • In FIG. 16, a display position 48 is displayed with a dark color because of a high request frequency, while a display position 49 is displayed with a faint color because of a low request frequency.
  • With such a display, the operations manager 5 becomes able to correctly understand the overall performance status of the cloud system 1.
  • That is, the performance status diagnosing device 2 makes the response delays that have a high request frequency, and hence a greater impact, more prominent, so that oversights by the operations manager 5 can be prevented.
  • FIG. 17 is a flowchart for explaining the flow of the packet capturing operation.
  • As illustrated in FIG. 17, the capturing unit 21 captures communication packets at regular intervals (Step S1) and writes the information of the captured communication packets in the packet information storing unit 22.
  • The capturing unit 21 repeatedly performs the operation at Step S1 until a termination command is received from the performance status diagnosing device 2.
  • FIG. 18 is a flowchart for explaining the flow of the operation for diagnosing the performance status of the infrastructure of cloud computing. As illustrated in FIG. 18, until a termination command is received, the performance status diagnosing device 2 repeatedly performs the operations from Step S11 to Step S21 explained below.
  • The performance status diagnosing device 2 reads the information about communication packets from the packet information storing unit 22 (Step S11), and repeatedly performs the subsequent operations from Step S12 to Step S19 for a number of times equal to the number of communication connections.
  • The performance status diagnosing device 2 performs a type determination operation for determining the type of the concerned application (Step S12), and determines whether or not the application is of the type for which the response time holds importance from the performance perspective (Step S13). If the application is not of that type, then the performance status diagnosing device 2 processes the next communication connection.
  • Otherwise, the performance status diagnosing device 2 calculates the response time (Step S14) and stores it in the response-time-information storing unit 27. Then, the performance status diagnosing device 2 counts the requests in the time window within which the response time is calculated (Step S15). Subsequently, the performance status diagnosing device 2 calculates the average response time (Step S16) and performs a normalization operation to normalize the average response time (Step S17).
  • Then, the performance status diagnosing device 2 determines whether or not information about the normalized average response time is available (Step S18). If the information is not available, then the performance status diagnosing device 2 processes the next communication connection.
  • The case in which the information about the normalized average response time is not available is the case in which, at the time of calculating the representative response time using the ex-Gaussian distribution, the distribution of average response times does not fit the ex-Gaussian distribution.
  • If the information is available, the performance status diagnosing device 2 performs a performance decrease determination operation for determining whether or not the performance of the application has decreased (Step S19). Subsequently, the performance status diagnosing device 2 processes the next communication connection.
  • After repeatedly performing the operations from Step S12 to Step S19 for a number of times equal to the number of communication connections, the performance status diagnosing device 2 performs a diagnosis operation for diagnosing whether or not the decrease in the performance is attributable to the infrastructure of cloud computing (Step S20). Then, the performance status diagnosing device 2 performs a visualization operation for creating visualization data (Step S21). Subsequently, the performance status diagnosing device 2 displays the visualization data, which is stored in the visualization data storing unit 36, on the display device 6 (Step S22).
  • In this way, the performance status diagnosing device 2 can identify whether the decrease in the performance of an application is attributable to the infrastructure of cloud computing or attributable to the application.
  • FIG. 19 is a flowchart for explaining the flow of the type determination operation.
  • As illustrated in FIG. 19, the type determining unit 24 extracts, from the information about communication packets, the port number of the server side of the communication connection (Step S31). Then, the type determining unit 24 reads the port list from the type-determination-data storing unit 23 (Step S32).
  • Then, the type determining unit 24 determines whether or not the extracted port number is present in the port list (Step S33). If the extracted port number is present in the port list, then the type determining unit 24 sets the type of the application as an application for which the response time holds importance from the performance perspective (Step S34), and writes the type in the type information storing unit 25. However, if the extracted port number is not present in the port list, then the type determining unit 24 sets the type of the application as an other-type application (Step S35) and writes the type in the type information storing unit 25.
  • FIG. 20 is a flowchart for explaining the flow of the normalization operation.
  • As illustrated in FIG. 20, the normalizing unit 28 determines whether or not it is the timing for calculating the representative response time (Step S41). If it is not, then the system control proceeds to Step S43. If it is, the normalizing unit 28 calculates the fundamental statistic of the average response times and sets it as the latest representative response time (Step S42).
  • Then, the normalizing unit 28 sets (the average response time)/(the latest representative response time) as the normalized average response time (Step S43).
  • FIG. 21 is a flowchart for explaining the flow of the performance decrease determination operation.
  • As illustrated in FIG. 21, the performance decrease determining unit 31 determines whether or not the normalized average response time is greater than the threshold value T_rt and whether or not the request count is greater than the threshold value T_req-min (Step S51).
  • If both conditions are satisfied, the performance decrease determining unit 31 determines that the performance of the application has decreased (Step S52), and writes the determination result in the determination information storing unit 32.
  • Otherwise, the performance decrease determining unit 31 determines that the performance of the application has not decreased (Step S53), and writes the determination result in the determination information storing unit 32.
  • FIG. 22 is a flowchart for explaining the flow of the diagnosis operation. As illustrated in FIG. 22, the diagnosing unit 33 repeatedly performs the operations from Step S61 to Step S70 explained below for a number of times equal to the number of applications stored in the determination information storing unit 32.
  • The diagnosing unit 33 determines whether or not the performance of the application has decreased (Step S61). If the performance of the application has not decreased, then the diagnosing unit 33 processes the next application. However, if the performance of the application has decreased, then the diagnosing unit 33 performs a decorrelation test to check whether there is a correlation between the normalized average response time and the request count of the application that has undergone the decrease in performance (Step S62).
  • Then, the diagnosing unit 33 determines whether or not the result of the test is significant (Step S63). If the result of the test is significant, then the diagnosing unit 33 determines that the decrease in the performance is attributable to the application (Step S64), and writes the determination result in the cloud information storing unit 34. Subsequently, the diagnosing unit 33 processes the next application.
  • If the result is not significant, the diagnosing unit 33 repeatedly performs the following operations from Step S65 to Step S69 with respect to each user other than the user of the application that has undergone the decrease in performance.
  • The diagnosing unit 33 performs a decorrelation test to check whether there is a correlation between the normalized average response time of the application that has undergone the decrease in performance and the normalized average response time of the application of the other user (Step S65). Then, the diagnosing unit 33 determines whether or not the result of the test is significant (Step S66). If the result of the test is significant, then the diagnosing unit 33 determines that the decrease in the performance is attributable to the infrastructure of cloud computing (Step S67), and writes the determination result in the cloud information storing unit 34. Subsequently, the diagnosing unit 33 processes the next application.
  • Next, the diagnosing unit 33 performs a decorrelation test to check whether there is a correlation between the normalized average response time of the application that has undergone the decrease in performance and the request count of the application of the other user (Step S68). Then, the diagnosing unit 33 determines whether or not the result of the test is significant (Step S69). If the result of the test is significant, then the diagnosing unit 33 determines that the decrease in the performance is attributable to the infrastructure of cloud computing (Step S67), and writes the determination result in the cloud information storing unit 34. Subsequently, the diagnosing unit 33 processes the next application.
  • If none of the test results is significant, the diagnosing unit 33 determines that the cause of the decrease in the performance is not clear (Step S70). Subsequently, the diagnosing unit 33 writes the determination result in the cloud information storing unit 34 and processes the next application.
  • After performing the operations from Step S61 to Step S70 for a number of times equal to the number of applications stored in the determination information storing unit 32, the diagnosing unit 33 determines whether or not a decrease in the performance attributable to the infrastructure of cloud computing was found (Step S71). If so, the diagnosing unit 33 notifies the operations manager 5 of the cloud system 1 accordingly (Step S72).
  • FIG. 23 is a flowchart for explaining the flow of the visualization operation. As illustrated in FIG. 23, the visualizing unit 35 repeatedly performs the following operations at Steps S81 and S82 for a number of times equal to the number of applications for which the normalized average response time could be calculated.
  • The visualizing unit 35 calculates the color according to the normalized average response time (Step S81) and calculates the opacity according to the request count (Step S82). Then, the visualizing unit 35 writes the calculated color and opacity in the visualization data storing unit 36.
  • FIG. 24 is a flowchart for explaining the flow of the type determination operation performed using machine learning.
  • As illustrated in FIG. 24, the type determining unit 24 extracts, from the information about communication packets, the port number of the server side of the communication connection (Step S91). Then, the type determining unit 24 reads the port list from the type-determination-data storing unit 23 (Step S92).
  • Then, the type determining unit 24 determines whether or not the extracted port number is present in the port list (Step S93). If the extracted port number is present in the port list, then the type determining unit 24 sets the type of the application as an application for which the response time holds importance from the performance perspective (Step S94). However, if the extracted port number is not present in the port list, then the type determining unit 24 performs an input calculation operation for calculating the data to be input to the learning machine (Step S95), and determines the type of the application using the learning machine (Step S96).
  • FIG. 25 is a flowchart for explaining the flow of the input calculation operation.
  • As illustrated in FIG. 25, the type determining unit 24 calculates the average response time (Step S101). Then, the type determining unit 24 calculates the average communication count of the server (Step S102) and the average communication volume of the server (Step S103). Subsequently, the type determining unit 24 calculates the average communication volume of the client device (Step S104) and the average communication count of the client device (Step S105).
  • FIG. 26 is a flowchart for explaining the flow of the operation for building the learning machine.
  • As illustrated in FIG. 26, the type determining unit 24 reads the communication packets of applications for which the response time holds importance from the performance perspective (Step S111), and reads the other communication packets (Step S112).
  • Then, the type determining unit 24 performs the input calculation operation for a number of times equal to the number of applications (Step S113). Subsequently, with the average response time, the average communication count of the server, the average communication volume of the server, the average communication volume of the client device, and the average communication count of the client device serving as the input, the type determining unit 24 builds a learning machine for outputting the type of the application (Step S114).
  • As a result, the type determining unit 24 can perform type determination even for an application whose type is not determinable from the port number.
  • FIG. 27 is a flowchart for explaining the flow of the normalization operation performed using the ex-Gaussian distribution.
  • As illustrated in FIG. 27, the normalizing unit 28 determines whether or not it is the timing for calculating the representative response time (Step S121). If it is not, then the normalizing unit 28 determines whether or not the latest representative response time is available (Step S122). If the latest representative response time is available, then the normalizing unit 28 sets (the average response time)/(the latest representative response time) as the normalized average response time (Step S123).
  • When it is the timing for calculating the representative response time, the normalizing unit 28 removes the outliers among the average response times (Step S124). If the outliers among the average response times are not to be removed, then the normalizing unit 28 skips the operation at Step S124.
  • Then, the normalizing unit 28 fits the distribution of average response times, from which the outliers have been removed, to the ex-Gaussian distribution (Step S125), and performs the one-sample Kolmogorov-Smirnov test with the distribution of average response times and the distribution curve of the fitting result serving as the input (Step S126).
  • Then, the normalizing unit 28 determines whether or not the result of the test is significant (Step S127). If the result of the test is significant, then the normalizing unit 28 sets the parameter μ of the ex-Gaussian distribution as the representative response time (Step S128), and the system control proceeds to Step S123. On the other hand, if the result of the test is not significant, then the normalizing unit 28 ends the operations without performing the normalization.
  • In this way, the normalizing unit 28 can obtain the representative response time by fitting the distribution of average response times to the ex-Gaussian distribution.
  • As described above, in the embodiment, the communication packets of the applications for which the response time holds importance from the performance perspective are used by the response time calculating unit 26 to calculate the response time on an application-by-application basis.
  • Then, the normalizing unit 28 calculates the average response time and normalizes it using the representative response time, thereby calculating the normalized response time on an application-by-application basis.
  • Then, the performance decrease determining unit 31 uses the normalized response time to determine whether or not the performance of the concerned application has decreased.
  • If the performance has decreased, the diagnosing unit 33 determines whether the decrease is attributable to the application or attributable to the infrastructure of cloud computing. With that, the performance status diagnosing device 2 becomes able to identify whether the decrease in the performance of an application is attributable to the infrastructure of cloud computing or attributable to the application.
  • Moreover, in the embodiment, for an application whose type is not determinable from the port number, the type determining unit 24 determines the type using machine learning. Hence, the type of the application can be reliably determined.
  • Furthermore, in the embodiment, the normalizing unit 28 calculates the representative response time by fitting the distribution of average response times to the ex-Gaussian distribution. Hence, the representative response time can be accurately calculated.
  • Moreover, since the normalizing unit 28 fits the post-outlier-removal distribution of average response times to the ex-Gaussian distribution, the possibility of achieving a fit to the ex-Gaussian distribution is enhanced.
  • Furthermore, in the embodiment, the visualizing unit 35 calculates the colors according to the normalized average response times, and the display control unit 37 displays the normalized average response times on the display device 6 using the respective colors.
  • Hence, the operations manager 5 becomes able to check the number of virtual machines 3a in which the performance is lagging, and check the tendency of occurrence of the lag.
  • Moreover, in the embodiment, the visualizing unit 35 calculates the density of the colors according to the request count, and the display control unit 37 displays the normalized average response times on the display device 6 using the respective colors and densities.
  • Hence, in the performance status diagnosing device 2, the performance status of the applications having a high request frequency and a significant impact can be displayed in a prominent manner.
  • Furthermore, in the embodiment, the performance decrease determining unit 31 performs the determination by further using the request count of the concerned application. Hence, a decrease in the performance of the application can be accurately determined.
  • Meanwhile, the explanation given above is about the performance status diagnosing device 2. However, the configuration of the performance status diagnosing device 2 can be implemented using software, so that a performance status diagnosing program having identical functions can be obtained.
  • Given below is the explanation of a computer that executes the performance status diagnosing program.
  • FIG. 28 is a diagram illustrating a configuration of the computer that executes the performance status diagnosing program according to the embodiment.
  • As illustrated in FIG. 28, a computer 50 includes a main memory 51, a central processing unit (CPU) 52, a local area network (LAN) interface 53, and a hard disk drive (HDD) 54.
  • Moreover, the computer 50 includes a super input-output (IO) 55, a digital visual interface (DVI) 56, and an optical disk drive (ODD) 57.
  • The main memory 51 is a memory for storing computer programs and the intermediate execution results of computer programs.
  • The CPU 52 is a central processing device that reads computer programs from the main memory 51 and executes them.
  • The CPU 52 includes a chipset having a memory controller.
  • The LAN interface 53 is an interface for connecting the computer 50 to other computers via a LAN.
  • The HDD 54 is a disk device for storing computer programs and data.
  • The super IO 55 is an interface for connecting input devices such as a mouse and a keyboard.
  • The DVI 56 is an interface for connecting a liquid crystal display device.
  • The ODD 57 is a device for performing reading and writing with respect to digital versatile discs (DVDs).
  • The LAN interface 53 is connected to the CPU 52 using PCI Express (PCIe).
  • The HDD 54 and the ODD 57 are connected to the CPU 52 using Serial Advanced Technology Attachment (SATA).
  • The super IO 55 is connected to the CPU 52 using the low pin count (LPC) interface.
  • The performance status diagnosing program to be executed in the computer 50 is stored in a DVD, read from the DVD by the ODD 57, and installed in the computer 50.
  • Alternatively, the performance status diagnosing program is stored in a database of another computer connected via the LAN interface 53, read from that database, and installed in the computer 50.
  • The installed performance status diagnosing program is stored in the HDD 54, read into the main memory 51, and executed by the CPU 52.
  • Meanwhile, although the embodiment describes the case of diagnosing the performance status of the cloud system, the present invention is not limited to that case and can be implemented in an identical manner for diagnosing the performance status of any arbitrary system.
EP17181438.7A 2016-08-17 2017-07-14 System status visualization method and system status visualization device Withdrawn EP3285169A3 (fr)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2016160149A 2016-08-17 2016-08-17 System status visualization program, system status visualization method, and system status visualization device

Publications (2)

Publication Number Publication Date
EP3285169A2 (fr) 2018-02-21
EP3285169A3 (fr) 2018-06-13

Family

ID=59350780

Family Applications (1)

Application Number Title Priority Date Filing Date
EP17181438.7A Withdrawn EP3285169A3 (fr) 2016-08-17 2017-07-14 System status visualization method and system status visualization device

Country Status (3)

Country Link
US (1) US20180052755A1 (fr)
EP (1) EP3285169A3 (fr)
JP (1) JP2018028783A (fr)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10885676B2 (en) * 2016-12-27 2021-01-05 Samsung Electronics Co., Ltd. Method and apparatus for modifying display settings in virtual/augmented reality
US11175959B2 (en) * 2019-05-01 2021-11-16 International Business Machines Corporation Determine a load balancing mechanism for allocation of shared resources in a storage system by training a machine learning module based on number of I/O operations
US11175958B2 (en) * 2019-05-01 2021-11-16 International Business Machines Corporation Determine a load balancing mechanism for allocation of shared resources in a storage system using a machine learning module based on number of I/O operations
JP2020201638A * 2019-06-07 2020-12-17 Kyocera Document Solutions Inc. Monitoring system and monitoring program
JP7311319B2 * 2019-06-19 2023-07-19 Fanuc Corp Time-series data display device
JP7302439B2 * 2019-10-30 2023-07-04 Fujitsu Ltd System analysis method and system analysis program
JP7250107B1 * 2021-12-13 2023-03-31 Yonsei University Industry-Academic Cooperation Foundation User interface device and method based on preemptive and reactive input quantification

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020198985A1 (en) * 2001-05-09 2002-12-26 Noam Fraenkel Post-deployment monitoring and analysis of server performance
JP2004206658A * 2002-10-29 2004-07-22 Fuji Xerox Co Ltd Display control method, information display processing system, client terminal, management server, and program
JP5418250B2 * 2010-01-26 2014-02-19 Fujitsu Ltd Abnormality detection device, program, and abnormality detection method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006011683A 2004-06-24 2006-01-12 Fujitsu Ltd System analysis program, system analysis method, and system analysis device
JP2011258057A 2010-06-10 2011-12-22 Fujitsu Ltd Analysis program, analysis method, and analysis device
JP2015011653A 2013-07-02 2015-01-19 Fujitsu Ltd Performance measurement method, performance measurement program, and performance measurement device

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109308243A * 2018-09-12 2019-02-05 Hangzhou Langhe Technology Co., Ltd. Data processing method and apparatus, computer device, and medium

Also Published As

Publication number Publication date
JP2018028783A (ja) 2018-02-22
EP3285169A3 (fr) 2018-06-13
US20180052755A1 (en) 2018-02-22

Similar Documents

Publication Publication Date Title
EP3285169A2 System status visualization method and system status visualization device
US11150974B2 (en) Anomaly detection using circumstance-specific detectors
US10212063B2 (en) Network aware distributed business transaction anomaly detection
US10069900B2 (en) Systems and methods for adaptive thresholding using maximum concentration intervals
US10904112B2 (en) Automatic capture of detailed analysis information based on remote server analysis
US9384114B2 (en) Group server performance correction via actions to server subset
US20210092160A1 (en) Data set creation with crowd-based reinforcement
CN107704387B Method and apparatus for system early warning, electronic device, and computer-readable medium
US20130227690A1 (en) Program analysis system and method thereof
US9658917B2 (en) Server performance correction using remote server actions
US11106562B2 (en) System and method for detecting anomalies based on feature signature of task workflows
US11416321B2 (en) Component failure prediction
JP5865486B2 User experience quality estimation device, terminal bottleneck determination device, similar operation extraction device, method, and program
US10684906B2 (en) Monitoring peripheral transactions
CN111897700A Application index monitoring method and apparatus, electronic device, and readable storage medium
CN110955890B Method and apparatus for detecting malicious batch access behavior, and computer storage medium
US10936401B2 (en) Device operation anomaly identification and reporting system
Lee et al. ATMSim: An anomaly teletraffic detection measurement analysis simulator
US20130046809A1 (en) Method and apparatus for monitoring network traffic and determining the timing associated with an application
WO2020163230A1 (fr) Systems and methods for item response modelling of digital assessments
WO2023181241A1 (fr) Monitoring server device, system, method, and program
US20210092159A1 (en) System for the prioritization and dynamic presentation of digital content
CN114329450A Data security processing method, apparatus, device, and storage medium
JP2018022305A (ja) Boundary value identification program, boundary value identification method, and boundary value identification device
EP3547628A1 Method and high-performance computing (HPC) switch for optimizing distribution of data packets

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

PUAL Search report despatched

Free format text: ORIGINAL CODE: 0009013

AK Designated contracting states

Kind code of ref document: A3

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

RIC1 Information provided on ipc code assigned before grant

Ipc: G06F 11/30 20060101ALI20180504BHEP

Ipc: G06F 11/34 20060101ALI20180504BHEP

Ipc: G06F 11/32 20060101AFI20180504BHEP

17P Request for examination filed

Effective date: 20180626

RBV Designated contracting states (corrected)

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Effective date: 20200723