WO2018162034A1 - Method and control node for enabling detection of states in a computer system

Info

Publication number
WO2018162034A1
Authority
WO
WIPO (PCT)
Prior art keywords
computer system
data
control node
pixels
model
Prior art date
Application number
PCT/EP2017/055211
Other languages
French (fr)
Inventor
Nicolas Seyvet
Hjalmar Olsson
Steven Corroy
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Priority date
Filing date
Publication date
Application filed by Telefonaktiebolaget Lm Ericsson (Publ)
Priority to PCT/EP2017/055211
Publication of WO2018162034A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0709 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/327 Alarm or error message display

Definitions

  • the present disclosure relates generally to a method and a control node for enabling detection of states in a computer system comprising one or more computers for processing data.
  • performance and various functions in a computer system or the like are generally monitored by collecting various data generated in the computer system and analyzing said data in order to detect and find a cause to any problem or incident that has occurred.
  • an alarm may be issued to notify a system operator about a potential problem which may need to be resolved or at least addressed by taking some action in the computer system.
  • an observed state may indicate that a potential problem may be forthcoming and this state may in that case be handled so that the actual problem can be proactively avoided, reduced or otherwise addressed.
  • a computer system is used to represent one or more computers, servers or similar entities for handling data, which are operative to perform tasks that involve some kind of data processing.
  • a computer system may be comprised of a cluster of computers or servers in a data center which could be useful in a manufacturing or administration facility and/or for providing cloud services for clients, to mention some illustrative examples although the following description is not limited thereto.
  • the computer system may further be used for managing a communications network such as a wireless network.
  • a method is performed by a control node for enabling detection of states in a computer system comprising one or more servers for processing data.
  • the control node collects data generated in the computer system during a training phase, and creates an image from the collected data by translating the collected data into pixels according to predefined rules.
  • the control node further assigns a label to the created image representing a state in the computer system as reflected by the collected data.
  • the control node builds a set of reference images with associated labels by repeating the preceding actions when different states occur in the computer system during said training phase.
  • the control node also builds a model for mapping the reference images to said labels, and provides the model for use in detecting a potential state in the computer system during a usage phase by applying the model on a new image created according to said predefined rules.
  • a control node is arranged to enable detection of states in a computer system comprising one or more servers for processing data.
  • the control node is configured to collect data generated in the computer system during a training phase, and to create an image from the collected data by translating the collected data into pixels according to predefined rules.
  • the control node is further configured to assign a label to the created image representing a state in the computer system as reflected by the collected data, and to build a set of reference images with associated labels by repeating the preceding actions when different states occur in the computer system during said training phase.
  • the control node is also configured to build a model for mapping the reference images to said labels, and to provide the model for use in detecting a potential state in the computer system during a usage phase by applying the model on a new image created according to said predefined rules.
  • the above method and control node may be configured and implemented according to different optional embodiments to accomplish further features and benefits, to be described below. It is an advantage that different states in the computer system can be detected automatically and on a continuous basis, basically without requiring any manual work for analyzing large amounts of data generated in the computer system. Another advantage is that the data is translated into more easily handled images which are effectively a compressed version of the data, requiring fewer resources for storing. Still another advantage is that if the data is sensitive to exposure, e.g. in terms of privacy or integrity, the images will effectively encode, or "hash", the data so as to protect it from explicit exposure.
  • a computer program is also provided which comprises instructions which, when executed on at least one processor, cause the at least one processor to carry out the method described above.
  • a carrier containing the above computer program is further provided, wherein the carrier is one of an electronic signal, an optical signal, a radio signal, or a computer readable storage medium.
  • Fig. 1A is a scenario illustrating that data generated in a computer system is translated into reference images in a training phase, according to some possible embodiments.
  • Fig. 1B is a scenario illustrating that the reference images created in Fig. 1A are used for detection of a state in a usage phase, according to further possible embodiments.
  • Fig. 2 is a flow chart illustrating a procedure that may be performed in a control node, according to further possible embodiments.
  • Fig. 3 is a diagram illustrating an example of how a control node may operate when the solution is used for creating reference images, according to further possible embodiments.
  • Fig. 3A is a diagram illustrating in more detail how the control node may perform one of the actions of Fig. 3, according to further possible embodiments.
  • Fig. 4 is another diagram illustrating an example of how the control node may operate when the solution is used for detecting a state, according to further possible embodiments.
  • Fig. 5 is a schematic diagram illustrating an example of how pixels may be defined using an ARGB (Alpha, Red, Green, Blue) format, according to further possible embodiments.
  • Fig. 6 is a table illustrating an example of how labels may be assigned to reference images when the solution is used for a base station of a wireless network, according to further possible embodiments.
  • Fig. 7A is a block diagram illustrating an example of how a control node may be configured, according to further possible embodiments.
  • Fig. 7B is a block diagram illustrating another example of how a control node may be configured, according to further possible embodiments.
  • Detailed description
  • a solution for enabling detection of states in a computer system in which one or more servers are used for processing data, e.g. in a data center for cloud services or the like, or in a network for telecommunication such as a wireless network.
  • detection is enabled as follows.
  • reference images are created from data generated in the computer system, by translating the data into pixels according to certain predefined rules which may be defined in the form of a
  • a label is further assigned to each reference image to represent a state in the computer system as reflected by the data being translated into the image.
  • a model for mapping images to the labels is also built, e.g. by training a neural network.
  • model as used herein thus represents a mapping function or the like for mapping images to labels representing states in the computer system.
  • state indicates a certain situation, condition or characteristic of the computer system which may potentially need to be addressed or handled in some way, e.g. in order to avoid a problem or shortcoming in the system caused by a detected state.
  • the model can then be provided and later used for detecting a potential state in the computer system during a usage phase by creating a new image according to the same predefined rules and applying the model on the new image to find a label that indicates the state.
  • the outcome of this operation is thus a label derived from the new image by means of the model, i.e. the label that was assigned to a reference image that effectively matches the new image, which label is a classification of the state that generated the data used for creating the new image.
  • the obtained label indicates or describes, i.e. classifies, the detected state in the computer system.
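  • As a rough illustration of the usage phase, the sketch below returns the label of the reference image that most closely matches a new image. A nearest-match lookup is used here as a simple stand-in for the trained model, which the text describes as e.g. a neural network; the function names, pixel values and labels are all hypothetical.

```python
def pixel_distance(image_a, image_b):
    """Sum of squared differences between two equally sized pixel lists."""
    return sum((a - b) ** 2 for a, b in zip(image_a, image_b))


def detect_state(new_image, reference_images):
    """Return the label of the reference image closest to `new_image`."""
    _, best_label = min(
        reference_images, key=lambda ref: pixel_distance(new_image, ref[0])
    )
    return best_label


# Hypothetical reference set: (flattened pixel values, label) pairs.
references = [
    ([255, 0, 0, 0], "power failure"),
    ([0, 255, 0, 0], "high workload"),
]

# A new image created with the same rules as the reference images.
label = detect_state([250, 10, 0, 0], references)
```

  • The returned label plays the role of the classification of the detected state; a trained network would generalize beyond exact or near-exact matches, which this lookup does not attempt.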
  • a state in the computer system may refer to a power failure, a faulty component, a malfunction, a high workload, to mention some illustrative but non-limiting examples.
  • a malfunction in this context may e.g. be a misconfiguration of software or a "bug" in a computer program. Any of the above examples may cause a "situation" or problem that may need to be solved or at least addressed in some way. It may therefore be of interest to detect such states or situations as soon as possible so that an operator can take suitable action, even proactively before any problem becomes serious and harmful in some sense. For example, an alarm or other suitable notification may be issued automatically, at least when detecting certain potentially harmful states, so that personnel can be alerted to take suitable action in the computer system.
  • the above-described training phase may continue so as to overlap the usage phase, wherein the model, e.g. a neural network, may continue to be further trained at the same time as the model is also used in practice for detecting states from new images.
  • the usage phase may be employed in the manner of a validation procedure where the reference images and the trained model are validated by creating the new images and detecting states therefrom. For example, retraining of the model may be triggered at regular intervals, or when new images are added such as when adding at least a predefined amount of new images to the set of reference images.
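  • The second retraining trigger mentioned above, i.e. retraining once at least a predefined amount of new images has been added, could be sketched as follows. The class name and the threshold value are assumptions made for illustration only.

```python
class RetrainTrigger:
    """Signals retraining once enough new reference images have been added."""

    def __init__(self, threshold):
        self.threshold = threshold  # hypothetical "predefined amount"
        self.pending = 0            # images added since the last retraining

    def add_image(self):
        """Record one added reference image; return True if retraining is due."""
        self.pending += 1
        if self.pending >= self.threshold:
            self.pending = 0
            return True
        return False


trigger = RetrainTrigger(threshold=3)
decisions = [trigger.add_image() for _ in range(7)]
```

  • With a threshold of 3, retraining is signalled on the third and sixth added image. A time-based trigger at regular intervals could be combined with this counter.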
  • control node which term is used herein to generally represent a logical entity that is capable of performing at least the above-described procedure in the training phase, and optionally also in the usage phase.
  • the control node may be integrated in the computer system or it may be a separate entity that can be connected to the computer system, e.g. temporarily, in order to execute the various actions described herein.
  • the reference images and the model are created, i.e. built, in the above-mentioned training phase during which the reference images and the model may be created and refined based on data collected during multiple succeeding time intervals. Thereby, the reference images and the model can be gradually improved and "stabilized" for each time interval in the manner of training a neural network.
  • Fig. 1A illustrates how a control node 100 may operate during such a training phase.
  • the control node 100 collects data that is generated by various data sources in a computer system 102, e.g. during different successive time intervals.
  • the data sources may include log files, CSV (Comma-Separated Values) files, an SQL (Structured Query Language) database, etc., although the solution is not limited to these examples.
  • a further action 1:2 illustrates that the control node 100 creates reference images from the collected data by translating the collected data into pixels according to predefined rules.
  • the predefined rules may dictate how the collected data shall be translated into pixels in the image so that the image is built by different pixels representing different data values.
  • the collected data may also be combined into a so-called "data frame" which is then translated into a reference image.
  • a plurality of collected data values may be combined into one representative average value in the data frame, or more generally a useful representation of the data values.
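  • A minimal sketch of combining collected values into a data frame is given below. It assumes that the arithmetic mean is the chosen representative value and uses made-up metric names; the text only requires some useful representation of the data values.

```python
from statistics import mean


def combine_into_data_frame(samples):
    """Collapse raw samples into one representative value per metric.

    `samples` maps a (hypothetical) metric name to the list of values
    collected for that metric during one time interval.
    """
    # Sorting the names keeps the entry order identical across data frames.
    return {name: mean(values) for name, values in sorted(samples.items()) if values}


raw = {
    "temperature_c": [40.0, 42.0],
    "cpu_load": [0.25, 0.75, 0.5],
}
frame = combine_into_data_frame(raw)
```

  • Keeping a fixed entry order matters because the same position in every image should represent the same kind of data.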
  • a label is also assigned to each reference image in this action, so that the label represents or indicates a state in the computer system that is more or less reflected by the collected data. Effectively, the label of a reference image can be seen as a classification of that reference image and the state it represents.
  • Another action 1:3 further illustrates that the control node 100 builds a model which can be used for mapping images to the above labels.
  • This model may be created by training a neural network which is a technique used in the field of machine learning, also known as "Deep Learning".
  • training a neural network means finding the weight of each neuron or node such that a Mean Square Error, MSE, between the output of the network after propagating an image through it and the label assigned to that image is minimized.
  • the neural network is a function f with N weights w1 through wN.
  • the best function f(w1, ..., wN) that maps an image to a label or class can be determined by minimizing MSE(f(X), y) over w1, ..., wN, where X denotes the pixels of an image and y denotes its label.
  • the pixels of the image are thus used as input to the neural network and a resulting label or class is the output from the neural network.
  • the above-mentioned model would then be the resulting function f(w1, ..., wN).
  • model could be replaced by the term "mapping function", thus referring to a function for mapping images, as defined by pixels, to labels representing states in the computer system.
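  • The MSE minimization can be illustrated with a deliberately tiny example. The sketch below assumes a single linear neuron instead of a full neural network, numeric labels, and plain gradient descent; the step count, learning rate and data are hypothetical and chosen only to keep the example short.

```python
def predict(weights, pixels):
    """f(w1, ..., wN) applied to one image: a weighted sum of its pixels."""
    return sum(w * p for w, p in zip(weights, pixels))


def mse(weights, images, labels):
    """Mean square error between the outputs and the numeric labels."""
    errors = [(predict(weights, x) - y) ** 2 for x, y in zip(images, labels)]
    return sum(errors) / len(errors)


def train(images, labels, steps=500, lr=0.1):
    """Plain gradient descent on the MSE over the weights w1..wN."""
    n = len(images[0])
    weights = [0.0] * n
    for _ in range(steps):
        grads = [0.0] * n
        for x, y in zip(images, labels):
            err = predict(weights, x) - y
            for i in range(n):
                # Derivative of the mean of squared errors w.r.t. weight i.
                grads[i] += 2.0 * err * x[i] / len(images)
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights


# Two tiny 2-pixel "images" with pixel values scaled to [0, 1].
# Hypothetical numeric labels: 1.0 = "cell down", 0.0 = "cell up".
images = [[1.0, 0.0], [0.0, 1.0]]
labels = [1.0, 0.0]
weights = train(images, labels)
```

  • After training, the output for the first image is close to 1.0 and the output for the second is close to 0.0, so thresholding the output recovers the label. A practical model would use a multi-layer network and a machine learning library.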
  • a final action 1:4 illustrates that the control node 100 basically provides the model built in action 1:3, for use in detecting a potential state in the computer system 102 during the usage phase.
  • This action may be realized by saving the model in a suitable storage 100A which can be accessed during the usage phase, e.g. by the control node 100 itself or by another node that performs the usage phase.
  • Fig. 1B illustrates how the above-mentioned usage phase can be executed, which in this example is done by the control node 100 although it could also be done by another node as mentioned above.
  • the control node 100 collects further data that is generated by various data sources in the computer system 102, which may be performed basically in the same manner as action 1:1 above. It is assumed that it would be of interest to detect whether the computer system 102 is in any particular state that is reflected by the collected data, e.g. to be able to adapt operation of the computer system 102 to the detected state which may be done to avoid, eliminate, reduce or otherwise address some problem or issue implied by the detected state.
  • the control node 100 creates a new image from the collected data by translating the data into pixels according to the same predefined rules that were used in the creation of reference images in action 1:2.
  • the control node 100 retrieves the model from storage 100A and applies it on the new image, which produces a label that indicates a state in the computer system 102.
  • the new image can be used accordingly by applying the model on the new image to find a label that indicates or classifies a potential state in the computer system.
  • the model is able to predict a label/state, at least within a certain accuracy.
  • In a first action 200, the control node 100 collects data generated in the computer system during a training phase, which corresponds to the above action 1:1.
  • data sources in a computer system from which data can be collected have been mentioned above.
  • the control node 100 creates an image from the collected data by translating the collected data into pixels according to predefined rules, which corresponds to the above action 1:2.
  • the control node 100 also assigns a label to the created image representing a state in the computer system 102 as reflected by the collected data. This may be done by knowing which state the computer system 102 was in when the data was collected, or has entered immediately after the data was collected, and assigning a suitable, e.g. descriptive, name for that state as the label for the image. For example, such a label may be named "power failure", "cell down", "malfunction in server X", etc., depending on the state.
  • In a next action 206, the control node 100 builds a set of reference images with associated labels by repeating the preceding actions at different points when different states occur in the computer system during said training phase. For example, actions 200-206 may be repeated at successive time intervals such that new data is collected at each time interval as a basis for creating a new image or for validating or confirming an already created image.
  • control node 100 further builds a model for mapping the reference images to said labels.
  • This model may be built by training a neural network, which has been described above for action 1:3.
  • the model may also be denoted a mapping function.
  • In a final action 210 of the training phase, the control node 100 provides the model, e.g. by saving it in a suitable storage 100A, for use in detecting a potential state in the computer system 102 during a usage phase by applying the model on a new image created according to said predefined rules.
  • Action 210 corresponds to the above action 1:4.
  • the usage phase can be executed by using the model on a new image for finding a corresponding label that indicates a new state, either by the control node 100 itself or by another node that has access to the set of reference images and the model.
  • The remaining actions in Fig. 2 are thus performed during the usage phase and in this example it is the control node 100 that performs them, although they could also be performed by another node as indicated above.
  • the control node 100 uses new collected data to create a new image according to the same predefined rules that were used in action 202 above, which corresponds to the above action 1:6.
  • the new image is effectively created in the same manner as the reference images and the model can be used to find a label that indicates or classifies the new image and thereby a potential state in the computer system resulting from the new data.
  • Another action 214 illustrates accordingly that the control node 100 detects a potential state in the computer system by applying the model built in action 208 on the new image, which basically corresponds to the above actions 1:7 - 1:8.
  • a final optional action 216 indicates that a suitable notification or alarm may be
  • building the model may comprise training a neural network, and one possible way of doing this has been described above for action 1:3.
  • applying the model on the new image may comprise applying the neural network to the new image.
  • the pixels of the new image may be fed into the neural network and the resulting output from the neural network will then be the label of the new image.
  • Fig. 3 illustrates an example of an operation flow executed by the control node 100 for realizing the above actions 200-206, as follows. First, data is collected from the server network in action 300. Then, the collected data is combined in action 302, to form a single data frame which is used for creating an image in action 304, by translating the data frame into pixels. An example of how an image could be created in this way will be described below with reference to Fig. 4. Since the amount of data may be huge, it is practical to combine that data into a more condensed or summarized data set, i.e. the data frame created in action 302 which may be referred to as a "blob".
  • hundreds of collected data values such as fluctuating measurements or the like, may be translated into a single pixel, according to the afore-mentioned predefined rules, that somehow describes or represents those data values.
  • the created image can be seen as a compressed version of the collected data set.
  • labels representing different states of the computer system are defined in action 306. These labels are assigned to the images from action 304 and a set of reference images with assigned labels is built in action 308. The created images and their assigned labels are then used for training a neural network for the above-described model for mapping images to labels, in an action 310. Such training may comprise feeding each image's pixels into an algorithm or function and minimizing the MSE to obtain the corresponding labels, as described above. It should be noted that several different data sources may be used in this procedure, such as: CSV, SQL, log files, metrics, ESR, etc.
  • the data frame of action 302 may have the following features:
  • Each data frame may have several X values and a single Y value.
  • each row of the data frame may have one Y value and a large number of X values, i.e. columns or features.
  • a "metadata map" may be built to record the type of each entry in the data frame. That metadata map can then be used to generate a "color map" which dictates which colors to use. It may only be necessary to generate the metadata map once.
  • the metadata map and optionally also the color map basically refer to the above-mentioned predefined rules for translating data into pixels.
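  • One possible reading of the metadata map and the color map is sketched below. It assumes that the "type" of an entry is simply its Python type and that each type is given one fixed base color; both choices, and all names, are hypothetical.

```python
def build_metadata_map(data_frame):
    """Record the type of every entry in the data frame (done once)."""
    return {name: type(value).__name__ for name, value in data_frame.items()}


# Hypothetical color rule: one base RGB color per entry type.
TYPE_COLORS = {
    "float": (255, 0, 0),  # numeric measurements in red
    "int": (0, 255, 0),    # counters in green
    "str": (0, 0, 255),    # textual entries in blue
}


def build_color_map(metadata_map):
    """Choose the color used for each data frame entry from its type."""
    return {name: TYPE_COLORS[kind] for name, kind in metadata_map.items()}


frame = {"cpu_load": 0.5, "restarts": 2, "log_level": "ERROR"}
metadata_map = build_metadata_map(frame)
color_map = build_color_map(metadata_map)
```

  • Because the maps depend only on the layout of the data frame, they can be reused unchanged for every later data frame with the same entries.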
  • the combining of data may be performed repeatedly at each time interval t so that different snapshots of the computer system are produced at regular intervals.
  • the new image may be created from data generated in the computer system during the usage phase by translating the generated data into pixels according to said predefined rules.
  • applying the model on a new image may produce a label which may classify the detected potential state.
  • the collected data may comprise multiple data values which are translated into pixels according to said predefined rules.
  • the pixels may be defined by at least one of: color and transparency.
  • the pixels may further have an ARGB (Alpha, Red, Green, Blue) format with N positions where a numerical value is assigned to each ARGB position.
  • the Alpha value is used to indicate the transparency.
  • each collected data value can be translated into M pixels to obtain up to M 2N different numbers.
  • each data value may indicate an observed or measured feature in the computer system according to a predefined metadata map.
  • each reference image may represent data generated in the computer system during a specific time frame so that the reference image is in that case created from data generated during said specific time frame.
  • the whole image represents a single "snapshot" in time of the computer system.
  • each reference image may comprise multiple rows of pixels where each pixel row represents data generated in the computer system during a specific time frame so that the reference image is in that case created from data generated during a series of consecutive time frames.
  • each row in the image represents a snapshot in time of the computer system such that the whole image represents a series of snapshots.
  • different columns in the pixel rows may have different weights to indicate different importance to the data represented by respective pixels.
  • weighting can be used in the above-mentioned predefined rules for translating the collected data into pixels, such that some columns are made more important than others by giving them a higher weight. This can be achieved by allocating more pixels for that column, or by increasing the alpha value.
  • These weights can influence the rules used to generate an image from the row.
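  • The first weighting option, allocating more pixels to an important column, might look like the sketch below. Integer weights that directly give the number of pixels per column are an assumption made for illustration.

```python
def weighted_row(pixels, weights):
    """Repeat each column's pixel `weight` times so it covers more of the row."""
    row = []
    for pixel, weight in zip(pixels, weights):
        row.extend([pixel] * weight)
    return row


# The middle column is given three pixels, the others one pixel each.
row = weighted_row([10, 200, 30], [1, 3, 1])
```

  • The alternative mentioned in the text, raising the alpha value of the pixels in an important column, keeps the row width unchanged instead.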
  • each row in the image may represent a snapshot in time of the computer system such that the whole image represents a series of snapshots which are based on data collected in a corresponding time frame. In this case a data frame may be created from data in each snapshot or time frame.
  • Fig. 3A illustrates another example of an operation flow executed by the control node 100 for realizing the above action 304 of creating an image from data combined in such multiple data frames associated with a series of time frames, as follows.
  • a metadata map is generated from a data frame in action 304A, the metadata map indicating which type of data there is in each entry in the data frame.
  • In action 304B, a pixel is assigned to each entry in the data frame, to generate an image row of pixels.
  • In action 304C, the image is built by adding the image row. The procedure then repeats actions 304A-304C for data collected during a next time frame, as indicated by a dashed arrow, thus producing a new data frame and corresponding image row, until the image contains a desired set of rows representing a number of corresponding time frames.
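  • The row-by-row procedure of Fig. 3A can be sketched roughly as follows. The translation rule used here, linear scaling of each value into an 8-bit gray level within an assumed value range, is only one conceivable predefined rule, and the metric names and ranges are hypothetical.

```python
def value_to_pixel(value, low, high):
    """Scale a value into an 8-bit gray level (an assumed translation rule)."""
    clipped = min(max(value, low), high)
    return round(255 * (clipped - low) / (high - low))


def build_image(data_frames, ranges):
    """Append one pixel row per data frame, i.e. one row per time frame."""
    image = []
    for frame in data_frames:
        # Fixed column order so each column always represents the same entry.
        row = [value_to_pixel(frame[name], *ranges[name]) for name in sorted(ranges)]
        image.append(row)
    return image


ranges = {"cpu_load": (0.0, 1.0), "temperature_c": (0.0, 100.0)}
frames = [
    {"cpu_load": 0.0, "temperature_c": 40.0},   # first time frame
    {"cpu_load": 1.0, "temperature_c": 100.0},  # next time frame
]
image = build_image(frames, ranges)
```

  • Each inner list is one image row, so the finished image holds one snapshot per time frame, in chronological order.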
  • Fig. 4 illustrates another example of an operation flow executed, e.g. by the control node 100 or by another node, for realizing the above actions 212-214 of detecting a state in the computer system in the usage phase in case the operation flows of Figs 3 and 3A have been executed.
  • the procedure in Fig. 4 may also be referred to as "classification" of a detected state.
  • data is collected from the server network in action 400, which can be performed as in action 300.
  • In action 402, the collected data is combined to produce a data frame, which can be performed in the same manner as in action 302.
  • the data frame is then used for creating a new image in action 404, by translating the data frame into pixels, using the same predefined rules for translation that were used in action 304.
  • In action 406, the neural network trained in action 310 is applied on the created new image, which produces a label that indicates a detected state in the computer system.
  • the pixels may have an ARGB format with N positions where a numerical value is assigned to each ARGB position.
  • An example of how the pixels can be defined in this manner is illustrated in Fig. 5 where Alpha, Red, Green, Blue are referred to as "channels".
  • In Fig. 5, there are 32 possible positions in total, numbered 0-31, with 8 positions in each channel.
  • ARGB values are typically expressed using 8 hexadecimal digits, with each pair of the hexadecimal digits representing values of the Alpha, Red, Green and Blue channel, respectively.
  • the ARGB format as such is well-known in the art for defining pixels, which is thus not necessary to describe in detail here.
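  • For readers less familiar with the format, the sketch below packs four 8-bit channel values into one 32-bit ARGB pixel and prints it as 8 hexadecimal digits. The function names are chosen here for illustration.

```python
def pack_argb(alpha, red, green, blue):
    """Pack four 8-bit channel values into one 32-bit ARGB pixel."""
    for channel in (alpha, red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must fit in 8 bits")
    return (alpha << 24) | (red << 16) | (green << 8) | blue


def unpack_argb(pixel):
    """Split a 32-bit ARGB pixel back into its four channels."""
    return (
        (pixel >> 24) & 0xFF,
        (pixel >> 16) & 0xFF,
        (pixel >> 8) & 0xFF,
        pixel & 0xFF,
    )


pixel = pack_argb(255, 18, 52, 86)
hex_text = f"{pixel:08X}"  # "FF123456": one pair of hex digits per channel
```

  • Each pair of hexadecimal digits corresponds to one channel, in the order Alpha, Red, Green, Blue.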
  • a set of reference images with associated labels is built during the training phase.
  • the table in Fig. 6 illustrates some examples of how state-describing labels may be assigned to 4 different reference images created from different data sets generated in the computer system.
  • Image 1 represents the state "cell down" to indicate that this state means that a certain cell of a wireless or cellular network is "down", i.e. not active, for whatever reason.
  • Image 2 represents the state "cell up" to indicate that the cell is active.
  • Images 3 and 4 are two different images both representing the same state
  • T_STATE represents the maximal time correlation that can be observed. As a consequence, the resulting image will have n rows.
  • Another approach to create an image for a specific time t is to use information about the computer system to build the picture.
  • the correlation between different parts of the data may be reflected as well. Such correlations may be present and therefore visible in the image.
  • the dimensions of the data may be translated to the generated image. For instance, information about time, location, network topology and software architecture is typically available in generated data sets. These parameters can be transformed and represented by the dimensions available in an image such as X, Y, color and transparency, number of pixels used for each data value, etc.
  • a simplified example to illustrate the above dimension aspects may be produced when a simple data set is transformed into the image and the relationship between the different values in the same column is also reflected in the image.
  • different levels of log entries can be represented as different colors and the network topology can be represented as X and Y distance of the pixels.
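  • As a small illustration of this idea, the sketch below colors each log entry by its level and places it at the grid position of the node that produced it. The level-to-color rule and the representation of the topology as (x, y) grid coordinates are assumptions.

```python
# Hypothetical rule: one RGB color per log level.
LEVEL_COLORS = {
    "INFO": (0, 255, 0),
    "WARNING": (255, 255, 0),
    "ERROR": (255, 0, 0),
}


def place_log_entries(entries, width, height):
    """Draw each log entry at its node's (x, y) position in the image."""
    image = [[(0, 0, 0)] * width for _ in range(height)]
    for x, y, level in entries:
        image[y][x] = LEVEL_COLORS[level]
    return image


# Each entry: (x, y) of the node in an assumed topology grid, plus log level.
entries = [(0, 0, "INFO"), (2, 1, "ERROR")]
image = place_log_entries(entries, width=3, height=2)
```

  • Nodes that are close in the topology end up close in the image, so a fault that spreads between neighbouring nodes would show up as a cluster of similarly colored pixels.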
  • many different types of information can be embedded or encoded in the image when the above-described solution is employed.
  • Another feature that could be applied in the solution is to add new images that are labeled to the set of reference images, while some labels may be deleted because the associated images are never used. For example, it may be counted how many times a state Y in the metadata map is detected, in order to keep a record of the usefulness of this state Y. This procedure could be useful when a computer system has changed its configuration in such a way that a known state Y can never occur again.
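  • Keeping a usage record per state and dropping labels that are never detected could be sketched as below. The threshold of "at least one detection" and the data layout are assumptions made for illustration.

```python
from collections import Counter


def prune_unused_labels(reference_set, detections):
    """Keep only reference images whose label was detected at least once."""
    usage = Counter(detections)  # how many times each state was detected
    return [(image, label) for image, label in reference_set if usage[label] > 0]


# Hypothetical reference set and detection history.
reference_set = [
    ("img1", "cell down"),
    ("img2", "cell up"),
    ("img3", "legacy fault"),
]
detections = ["cell up", "cell down", "cell up"]
kept = prune_unused_labels(reference_set, detections)
```

  • In practice the detection history would cover a long enough period that a rare but still possible state is not pruned by mistake.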
  • the block diagram in Fig. 7 illustrates a detailed but non-limiting example of how a control node 700 may be structured to bring about the above-described solution and embodiments thereof.
  • the control node 700 may be configured to operate according to any of the examples and embodiments of employing the solution as described above, where appropriate, and as follows.
  • the control node 700 is shown to comprise a processor P and a memory M, said memory comprising instructions executable by said processor P whereby the control node 700 is operable as described herein.
  • the control node 700 also comprises a communication circuit C with suitable equipment for receiving and transmitting signals in the manner described herein.
  • the communication circuit C may be configured for communication with a server controller or the like in the computer system, using a suitable protocol depending on the implementation.
  • the solution and embodiments herein are thus not limited to using any specific types of networks, technology or protocols for communication.
  • the control node 700 is operable to perform at least some of the actions 200-214 in Fig. 2, and optionally also at least some of the operations and functions described above for Figs 3-4.
  • the control node 700 is arranged or configured to enable detection of states in a computer system comprising one or more servers for processing data.
  • the control node 700 is configured to collect data generated in the computer system during a training phase. This operation may be performed by a collecting unit 700A in the control node 700, e.g. in the manner described for action 200 above.
  • the control node 700 is also configured to create an image from the collected data by translating the collected data into pixels according to predefined rules. This operation may be performed by a creating unit 700B in the control node 700, e.g. as described for action 202 above.
  • the control node 700 is also configured to assign a label to the created image representing a state in the computer system as reflected by the collected data. This operation may be performed by an assigning unit 700C in the control node 700, e.g. as described above for action 204.
  • the control node 700 is further configured to build a set of reference images with associated labels by repeating the preceding actions when different states occur in the computer system during said training phase. This operation may be performed by a building unit 700D in the control node 700, e.g. as described above for action 206.
  • the control node 700 is also configured to build a model for mapping images to said labels. This operation may be performed by the building unit 700D, e.g. as described above for action 208.
  • the control node 700 is also configured to provide the model for use in detecting a potential state in the computer system during a usage phase by applying the model to a new image created according to said predefined rules. Applying the model to the new image may produce a label which indicates the detected state.
  • This providing operation may be performed by a providing unit 700E in the control node 700, e.g. as described above for action 210.
  • An optional detection unit, not shown, in the control node 700 may also be configured to detect a potential state in the computer system during a usage phase by applying the model on a new image created according to said
  • a detection unit may reside in another node and not in the above-described control node 700, such that the training phase would be executed by the control node 700 and the usage phase would be executed by the other node based on the reference images and model provided by the control node 700, e.g. in a suitable storage that the other node can access.
  • Fig. 7A illustrates various functional units or modules in the control node 700, and the skilled person is able to implement these functional units or modules in practice using suitable software and hardware.
  • the solution is generally not limited to the shown structures of the control node 700, and the functional units or modules 700A-E therein may be configured to operate according to any of the features and embodiments described in this disclosure, where appropriate.
  • the functional units or modules 700A-E described above can be implemented in the control node 700 by means of suitable hardware and program modules of a computer program comprising code means which, when run by the processor P causes the control node 700 to perform at least some of the above-described actions and procedures.
  • control node 700 comprises the functional units or modules 700A-E and a processor P, the units or modules 700A-E being configured to operate in the manner described above e.g. with reference to Fig 2.
  • the processor P may comprise a single Central
  • the processor P may include a general purpose microprocessor, an instruction set processor and/or related chip sets and/or a special purpose microprocessor such as an Application Specific Integrated Circuit (ASIC).
  • the processor P may also comprise a storage for caching purposes.
  • a computer program 700F is also provided comprising instructions which, when executed on at least one processor, cause the at least one processor to carry out either of the methods described above.
  • a carrier is further provided that contains the above computer program 700F, wherein the carrier comprises an electronic signal, an optical signal, a radio signal, or a computer readable storage medium 700G, the latter shown in Fig. 7.
  • the computer program 700F may be stored on the computer readable storage medium 700G in the form of computer program modules or the like.
  • the memory M may be a flash memory, a Random-Access Memory (RAM), a Read-Only Memory (ROM), an Electrically Erasable Programmable ROM (EEPROM) or hard drive storage (HDD), and the program modules could in alternative embodiments be distributed on different computer program products in the form of memories within the control node 700.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Image Analysis (AREA)

Abstract

A method and control node (100) for enabling detection of states in a computer system (102) comprising one or more servers for processing data. Reference images are created (1:2) from data collected (1:1) from the computer system during a training phase, by translating the data into pixels according to predefined rules. A label is further assigned to each reference image to represent a state in the computer system as reflected by the collected data. A model is also built (1:3) for mapping images to said labels. The model is then provided (1:4) for use in detecting (1:8) a potential state in the computer system (102) during a usage phase by creating (1:6) a new image according to said predefined rules and applying (1:7) the model on the new image to find a label that indicates the detected state.

Description

METHOD AND CONTROL NODE FOR ENABLING DETECTION OF STATES IN A COMPUTER SYSTEM
Technical field
The present disclosure relates generally to a method and a control node for enabling detection of states in a computer system comprising one or more computers for processing data.
Background
In the field of data processing, performance and various functions in a computer system or the like are generally monitored by collecting various data generated in the computer system and analyzing said data in order to detect and find the cause of any problem or incident that has occurred. When detecting some state or incident in the computer system that could deteriorate the performance in some way, an alarm may be issued to notify a system operator about a potential problem which may need to be resolved or at least addressed by taking some action in the computer system. For example, an observed state may indicate that a problem is forthcoming, and this state may in that case be handled so that the actual problem can be proactively avoided, reduced or otherwise addressed.
In this disclosure, the term "computer system" is used to represent one or more computers, servers or similar entities for handling data, which are operative to perform tasks that involve some kind of data processing. For example, a computer system may be comprised of a cluster of computers or servers in a data center which could be useful in a manufacturing or administration facility and/or for providing cloud services for clients, to mention some illustrative examples although the following description is not limited thereto. The computer system may further be used for managing a communications network such as a wireless network.
However, computer systems typically generate huge amounts of data, which makes it very laborious to detect states and incidents by reviewing and analyzing the data, e.g. as available in log files or the like, and this work typically must be done manually. This approach may work well for a very small system with a limited number of log files, preferably having a homogeneous format. However, for larger systems, e.g. using multiple formats, manual exploration of such log files, or automatic exploration based on man-made rules, becomes more or less impossible to manage because of the analysis work and the overwhelming number of possibilities and combinations of data from various sources, such as individual computers, radio processors, memory, disks, software, etc. For example, cloud services executed in large data centers are becoming very popular, and identifying a certain state and finding its cause is very difficult using the above manual approach in such large data centers or computer systems where huge amounts of data are constantly generated and processed.
Summary
It is an object of embodiments described herein to address at least some of the problems and issues outlined above. It is possible to achieve this object and others by using a method and a control node as defined in the attached independent claims.
According to one aspect, a method is performed by a control node for enabling detection of states in a computer system comprising one or more servers for processing data. In this method, the control node collects data generated in the computer system during a training phase, and creates an image from the collected data by translating the collected data into pixels according to predefined rules. The control node further assigns a label to the created image representing a state in the computer system as reflected by the collected data. This way, the control node builds a set of reference images with associated labels by repeating the preceding actions when different states occur in the computer system during said training phase. The control node also builds a model for mapping the reference images to said labels, and provides the model for use in detecting a potential state in the computer system during a usage phase by applying the model on a new image created according to said predefined rules.
According to another aspect, a control node is arranged to enable detection of states in a computer system comprising one or more servers for processing data. The control node is configured to collect data generated in the computer system during a training phase, and to create an image from the collected data by translating the collected data into pixels according to predefined rules. The control node is further configured to assign a label to the created image representing a state in the computer system as reflected by the collected data, and to build a set of reference images with associated labels by repeating the preceding actions when different states occur in the computer system during said training phase.
The control node is also configured to build a model for mapping the reference images to said labels, and to provide the model for use in detecting a potential state in the computer system during a usage phase by applying the model on a new image created according to said predefined rules.
The above method and control node may be configured and implemented according to different optional embodiments to accomplish further features and benefits, to be described below. It is an advantage that different states in the computer system can be detected automatically and on a continuous basis, basically without requiring any manual work for analyzing large amounts of data generated in the computer system. Another advantage is that the data is translated into more easily handled images which are effectively a compressed version of the data, requiring fewer resources for storing. Still another advantage is that if the data is sensitive to exposure, e.g. in terms of privacy or integrity, the images will effectively encode, or "hash", the data so as to protect it from explicit exposure.
A computer program is also provided which comprises instructions which, when executed on at least one processor, cause the at least one processor to carry out the method described above. A carrier containing the above computer program is further provided, wherein the carrier is one of an electronic signal, an optical signal, a radio signal, or a computer readable storage medium.
Brief description of drawings
The solution will now be described in more detail by means of exemplary embodiments and with reference to the accompanying drawings, in which:
Fig. 1A is a scenario illustrating that data generated in a computer system is translated into reference images in a training phase, according to some possible embodiments.
Fig. 1B is a scenario illustrating that the reference images created in Fig. 1A are used for detection of a state in a usage phase, according to further possible embodiments.
Fig. 2 is a flow chart illustrating a procedure that may be performed in a control node, according to further possible embodiments.
Fig. 3 is a diagram illustrating an example of how a control node may operate when the solution is used for creating reference images, according to further possible embodiments.
Fig. 3A is a diagram illustrating in more detail how the control node may perform one of the actions of Fig. 3, according to further possible embodiments.
Fig. 4 is another diagram illustrating an example of how the control node may operate when the solution is used for detecting a state, according to further possible embodiments.
Fig. 5 is a schematic diagram illustrating an example of how pixels may be defined using an ARGB (Alpha, Red, Green, Blue) format, according to further possible embodiments.
Fig. 6 is a table illustrating an example of how labels may be assigned to reference images when the solution is used for a base station of a wireless network, according to further possible embodiments.
Fig. 7A is a block diagram illustrating an example of how a control node may be configured, according to further possible embodiments.
Fig. 7B is a block diagram illustrating another example of how a control node may be configured, according to further possible embodiments.
Detailed description
Briefly described, a solution is provided for enabling detection of states in a computer system in which one or more servers are used for processing data, e.g. in a data center for cloud services or the like, or in a network for telecommunication such as a wireless network. Such detection is enabled as follows. During a so-called "training phase", reference images are created from data generated in the computer system, by translating the data into pixels according to certain predefined rules which may be defined in the form of a "metadata map", to be described below. A label is further assigned to each reference image to represent a state in the computer system as reflected by the data being translated into the image. A model for mapping images to the labels is also built, e.g. by training a neural network.
The term "model" as used herein thus represents a mapping function or the like for mapping images to labels representing states in the computer system. Further, the term "state" indicates a certain situation, condition or characteristic of the computer system which may potentially need to be addressed or handled in some way, e.g. in order to avoid a problem or shortcoming in the system caused by a detected state.
The model can then be provided and later used for detecting a potential subsequent state in the computer system, which operation is referred to as a "usage phase", by creating a new image according to the same predefined rules and applying the model on the new image to find a label that indicates the state. The outcome of this operation is thus a label derived from the new image by means of the model, i.e. the label that was assigned to a reference image that effectively matches the new image, which label is a classification of the state that generated the data used for creating the new image. As a result, the obtained label indicates or describes, i.e. classifies, the detected state in the computer system. Thereby, it is an advantage that detection of different states in the computer system is made completely automatic and can be executed on a continuous basis, such that no manual work is required for analyzing large amounts of data generated in the computer system. The term "class" could be used throughout this description instead of label.
In this context, a state in the computer system may refer to a power failure, a faulty component, a malfunction, a high workload, to mention some illustrative but non-limiting examples. A malfunction in this context may e.g. be a misconfiguration of software or a "bug" in a computer program. Any of the above examples may cause a "situation" or problem that may need to be solved or at least addressed in some way. It may therefore be of interest to detect such states or situations as soon as possible so that an operator can take suitable action, even proactively before any problem becomes serious and harmful in some sense. For example, an alarm or other suitable notification may be issued automatically, at least when detecting certain potentially harmful states, so that personnel can be alerted to take suitable action in the computer system.
The above-described training phase may continue so as to overlap the usage phase, wherein the model, e.g. a neural network, may continue to be further trained at the same time as the model is also used in practice for detecting states from new images. In that case the usage phase may be employed in the manner of a validation procedure where the reference images and the trained model are validated by creating the new images and detecting states therefrom. For example, retraining of the model may be triggered at regular intervals, or when new images are added such as when adding at least a predefined amount of new images to the set of reference images.
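The retraining trigger mentioned above can be sketched in a few lines. This is only an illustration under assumptions: the class name `RetrainTrigger` and the threshold `min_new_images` are hypothetical and do not come from the disclosure, which only states that retraining may be triggered when at least a predefined amount of new images has been added.

```python
class RetrainTrigger:
    """Signals retraining once enough new labeled images have accumulated."""

    def __init__(self, min_new_images):
        # min_new_images is a hypothetical threshold chosen by the operator.
        self.min_new_images = min_new_images
        self.pending = 0

    def add_image(self):
        """Record one new reference image; return True if retraining is due."""
        self.pending += 1
        if self.pending >= self.min_new_images:
            self.pending = 0  # start counting towards the next retraining
            return True
        return False


trigger = RetrainTrigger(min_new_images=3)
# One decision per image added to the set of reference images.
decisions = [trigger.add_image() for _ in range(7)]
```

A time-based trigger, i.e. retraining at regular intervals, could be combined with this counter in the same way.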
The solution will now be described in terms of functionality in a "control node" which term is used herein to generally represent a logical entity that is capable of performing at least the above-described procedure in the training phase, and optionally also in the usage phase. The control node may be integrated in the computer system or it may be a separate entity that can be connected to the computer system, e.g. temporarily, in order to execute the various actions described herein. It was mentioned above that the reference images and the model are created, i.e. built, in the above-mentioned training phase during which the reference images and the model may be created and refined based on data collected during multiple succeeding time intervals. Thereby, the reference images and the model can be gradually improved and "stabilized" for each time interval in the manner of training a neural network.
Fig. 1A illustrates how a control node 100 may operate during such a training phase. In a first action 1:1, the control node 100 collects data that is generated by various data sources in a computer system 102, e.g. during different successive time intervals. The data sources may include log files, CSV (Comma-Separated Values) files, an SQL (Structured Query Language) database, etc., although the solution is not limited to these examples. A further action 1:2 illustrates that the control node 100 creates reference images from the collected data by translating the collected data into pixels according to predefined rules. For example, the predefined rules may dictate how the collected data shall be translated into pixels in the image so that the image is built by different pixels representing different data values. Some non-limiting examples of how this translation could be done will be described in more detail below. The collected data may also be combined into a so-called "data frame" which is then translated into a reference image. For example, a plurality of collected data values may be combined into one representative average value in the data frame, or more generally a useful representation of the data values. A label is also assigned to each reference image in this action, so that the label represents or indicates a state in the computer system that is more or less reflected by the collected data. Effectively, the label of a reference image can be seen as a classification of that reference image and the state it represents.
Another action 1:3 further illustrates that the control node 100 builds a model which can be used for mapping images to the above labels. This model may be created by training a neural network which is a technique used in the field of machine learning, also known as "Deep Learning". In this context, training a neural network means finding the weight for each neuron or node that minimizes a Mean Square Error, MSE, between the output of the network after propagating an image into it and the label defined for that image. Briefly described, if X denotes a collection of images and y denotes a vector of labels, and the neural network is a function f with N weights w1 through wN, the best function f(w1, ..., wN) that maps an image to a label or class can be determined by minimizing MSE(f(X), y) over w1, ..., wN. The pixels of the image are thus used as input to the neural network and a resulting label or class is the output from the neural network. The above-mentioned model would then be the resulting function f(w1, ..., wN). Throughout this description, the term "model" could be replaced by the term "mapping function", thus referring to a function for mapping images, as defined by pixels, to labels representing states in the computer system.
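As a minimal sketch of minimizing MSE(f(X), y) over the weights, the following example uses plain gradient descent. It rests on several assumptions that are not in the disclosure: f is a single linear unit rather than a multi-layer neural network, images are tiny flattened pixel lists, and the labels "cell up" and "cell down" are encoded as the numbers 0.0 and 1.0. The function names, step count and learning rate are likewise illustrative.

```python
def predict(weights, pixels):
    """f(w1, ..., wN): weighted sum of the pixel values of one image."""
    return sum(w * p for w, p in zip(weights, pixels))


def mse(weights, images, labels):
    """Mean square error between predictions and numeric labels."""
    errors = [(predict(weights, x) - y) ** 2 for x, y in zip(images, labels)]
    return sum(errors) / len(errors)


def train(images, labels, steps=500, lr=0.1):
    """Minimize the MSE over the weights by gradient descent."""
    n = len(images[0])
    weights = [0.0] * n
    for _ in range(steps):
        grads = [0.0] * n
        for x, y in zip(images, labels):
            err = predict(weights, x) - y
            for i in range(n):
                # d/dw_i of the mean of err**2
                grads[i] += 2.0 * err * x[i] / len(images)
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights


# Two toy 2x2 images, flattened row by row (assumed encoding).
images = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
labels = [0.0, 1.0]  # assumed numeric codes: 0.0 = "cell up", 1.0 = "cell down"
weights = train(images, labels)
final_error = mse(weights, images, labels)
```

A real deployment would more likely use a multi-layer network from an established machine learning library; the point here is only the shape of the optimization problem.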
A final action 1:4 illustrates that the control node 100 basically provides the model built in action 1:3, for use in detecting a potential state in the computer system 102 during the usage phase. This action may be realized by saving the model in a suitable storage 100A which can be accessed during the usage phase, e.g. by the control node 100 itself or by another node that performs the usage phase.
Fig. 1B illustrates how the above-mentioned usage phase can be executed, which in this example is done by the control node 100 although it could also be done by another node as mentioned above. In a first action 1:5 of the usage phase, the control node 100 collects further data that is generated by various data sources in the computer system 102, which may be performed basically in the same manner as action 1:1 above. It is assumed that it would be of interest to detect whether the computer system 102 is in any particular state that is reflected by the collected data, e.g. to be able to adapt operation of the computer system 102 to the detected state which may be done to avoid, eliminate, reduce or otherwise address some problem or issue implied by the detected state.
In a next action 1:6, the control node 100 creates a new image from the collected data by translating the data into pixels according to the same predefined rules that were used in the creation of reference images in action 1:2. In a following action 1:7, the control node 100 retrieves the model from storage 100A and applies it on the new image, which produces a label that indicates a state in the computer system 102. In this action, the new image can be used accordingly by applying the model on the new image to find a label that indicates or classifies a potential state in the computer system. In other words, by using the new data/image as input, the model is able to predict a label/state, at least within a certain accuracy. The resulting detected state is thus obtained as indicated by a final action 1:8.
An example will now be described, with reference to the flow chart in Fig. 2, of how the solution may be employed in terms of actions which may be performed by the above-mentioned control node. Some optional example embodiments that could be used in this procedure will also be described below. Reference will also be made, without limiting the described features and embodiments, to the example scenario shown in Figs 1A-B. This procedure can be performed by a control node 100 for enabling detection of states in a computer system comprising one or more servers for processing data.
In a first action 200, the control node 100 collects data generated in the computer system during a training phase, which corresponds to the above action 1:1. Some non-limiting examples of data sources in a computer system from which data can be collected have been mentioned above. In a next action 202, the control node 100 creates an image from the collected data by translating the collected data into pixels according to predefined rules, which corresponds to the above action 1:2. Some examples of how an image may be created in this action will be described in more detail below with reference to Figs 3-5.
In a next action 204, the control node 100 also assigns a label to the created image representing a state in the computer system 102 as reflected by the collected data. This may be done by knowing which state the computer system 102 was in when the data was collected, or has entered immediately after the data was collected, and assigning a suitable, e.g. descriptive, name for that state as the label for the image. For example, such a label may be named "power failure", "cell down", "malfunction in server X", etc., depending on the state.
In a next action 206, the control node 100 builds a set of reference images with associated labels by repeating the preceding actions at different points when different states occur in the computer system during said training phase. For example, actions 200-206 may be repeated at successive time intervals such that new data is collected at each time interval as a basis for creating a new image or for validating or confirming an already created image.
In a next action 208, the control node 100 further builds a model for mapping the reference images to said labels. This model may be built by training a neural network, which has been described above for action 1:3. As said above, the model may also be denoted mapping function.
In a final action 210 of the training phase, the control node 100 provides the model, e.g. by saving it in a suitable storage 100A, for use in detecting a potential state in the computer system 102 during a usage phase by applying the model on a new image created according to said predefined rules. Action 210 corresponds to the above action 1:4. Thereby, the usage phase can be executed by using the model on a new image for finding a corresponding label that indicates a new state, either by the control node 100 itself or by another node that has access to the set of reference images and the model.
The remaining actions in Fig. 2 are thus performed during the usage phase and in this example it is the control node 100 that performs the following, although they could also be performed by another node as indicated above. In a further action 212, thus starting the usage phase, the control node 100 uses new collected data to create a new image according to the same predefined rules that were used in action 202 above, which corresponds to the above action 1:6. Thereby, the new image is effectively created in the same manner as the reference images and the model can be used to find a label that indicates or classifies the new image and thereby a potential state in the computer system resulting from the new data. Another action 214 illustrates accordingly that the control node 100 detects a potential state in the computer system by applying the model built in action 208 on the new image, which basically corresponds to the above actions 1:7 - 1:8. A final optional action 216 indicates that a suitable notification or alarm may be generated, e.g. in case the detected state needs to be announced and addressed in some manner.
Some further embodiments and examples of how the above procedure in Fig. 2 may be realized will now be outlined. In one example embodiment, building the model may comprise training a neural network, and one possible way of doing this has been described above for action 1:3. In another example embodiment, applying the model on the new image may comprise applying the neural network to the new image. In this embodiment, the pixels of the new image may be fed into the neural network and the resulting output from the neural network will then be the label of the new image.
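The usage-phase flow of actions 212-216 can be illustrated with a small sketch. Note the substitution: instead of a trained neural network, the stand-in model below returns the label of the closest reference image, which is a nearest-neighbour lookup and not the method of the disclosure. The pixel values, the `ALARM_STATES` set and all function names are hypothetical.

```python
def nearest_label(reference_set, new_pixels):
    """Stand-in model: label of the reference image closest to the new image."""
    def distance(a, b):
        # Squared Euclidean distance between two flattened images.
        return sum((p - q) ** 2 for p, q in zip(a, b))

    _, best_label = min(reference_set,
                        key=lambda item: distance(item[0], new_pixels))
    return best_label


# Hypothetical set of states that should raise a notification (action 216).
ALARM_STATES = {"power failure", "cell down"}


def detect(reference_set, new_pixels):
    """Return the detected label and whether an alarm should be issued."""
    label = nearest_label(reference_set, new_pixels)
    return label, label in ALARM_STATES


# Reference images as (flattened pixels, label) pairs from the training phase.
reference_set = [
    ([255, 255, 0, 0], "cell up"),
    ([0, 0, 255, 255], "cell down"),
]
label, alarm = detect(reference_set, [10, 5, 240, 250])
```

Swapping `nearest_label` for a call into a trained network would leave the surrounding detection and notification logic unchanged.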
Fig. 3 illustrates an example of an operation flow executed by the control node 100 for realizing the above actions 200-206, as follows. First, data is collected from the server network in action 300. Then, the collected data is combined in action 302, to form a single data frame which is used for creating an image in action 304, by translating the data frame into pixels. An example of how an image could be created in this way will be described below with reference to Fig. 4. Since the amount of data may be huge, it is practical to combine that data into a more condensed or summarized data set, i.e. the data frame created in action 302 which may be referred to as a "blob". For example, hundreds of collected data values such as fluctuating measurements or the like, may be translated into a single pixel, according to the afore-mentioned predefined rules, that somehow describes or represents those data values. Thereby, the created image can be seen as a compressed version of the collected data set.
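One conceivable rule for condensing many data values into a single pixel is to average them and scale the result to a gray level. The sketch below assumes exactly that; the averaging rule, the 0-255 range, the value bounds `lo` and `hi` and the function name are illustrative choices and are not prescribed by the disclosure.

```python
def values_to_gray(values, lo, hi):
    """Condense many measurements into one 8-bit gray level (0-255).

    lo and hi are assumed bounds of the measured quantity; values outside
    the bounds are clipped before scaling.
    """
    mean = sum(values) / len(values)
    clipped = min(max(mean, lo), hi)
    return round(255 * (clipped - lo) / (hi - lo))


# Fluctuating CPU load samples (percent) collected during one time interval.
cpu_load = [10.0, 30.0, 20.0, 20.0]
pixel = values_to_gray(cpu_load, lo=0.0, hi=100.0)
```

Other summaries, such as the maximum or a percentile, may be more useful when short spikes are what indicates a state.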
In addition to action 304, labels representing different states of the computer system are defined in action 306. These labels are assigned to the images from action 304 and a set of reference images with assigned labels is built in action 308. The created images and their assigned labels are then used for training a neural network for the above-described model for mapping images to labels, in an action 310. Such training may comprise feeding each image's pixels into an algorithm or function and minimizing the MSE to obtain the corresponding labels, as described above. It should be noted that several different data sources may be used in this procedure, such as: CSV, SQL, log files, metrics, ESR, etc. The data frame of action 302 may have the following features:
Each data frame may have several X values and a single Y value. The Y value is the result in a classical mathematical sense of f(x)=y, where y is effectively a description of the state and its corresponding label.
Basically each row of the data frame may have one Y value and a large number of X values, i.e. columns or features. In addition to creating the data frame that contains only the combined data, a "metadata map" may be built to keep track of the type of each entry in the data frame. That metadata map can then be used to generate a "color map" which dictates which colors to use. It may only be necessary to generate the metadata map once. The metadata map and optionally also the color map basically refer to the above-mentioned predefined rules for translating data into pixels. The combining of data may be performed repeatedly at each time interval t so that different snapshots of the computer system are produced at regular intervals.
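A minimal sketch of how a metadata map and a color map could be derived from one row of the data frame is given below. The three entry types ("flag", "metric", "log"), the column names and the colors chosen per type are hypothetical and only serve to illustrate the idea of recording a type per entry and then choosing a color per type.

```python
def build_metadata_map(row):
    """Record the type of each entry (column) in a data-frame row."""
    metadata_map = {}
    for column, value in row.items():
        # bool must be tested first because bool is a subclass of int in Python.
        if isinstance(value, bool):
            metadata_map[column] = "flag"
        elif isinstance(value, (int, float)):
            metadata_map[column] = "metric"
        else:
            metadata_map[column] = "log"
    return metadata_map


# Hypothetical base colors per entry type, as (A, R, G, B) tuples.
COLOR_BY_TYPE = {
    "flag": (255, 0, 0, 255),
    "metric": (255, 0, 255, 0),
    "log": (255, 255, 0, 0),
}


def build_color_map(metadata_map):
    """Derive which color to use for each column from its recorded type."""
    return {column: COLOR_BY_TYPE[kind] for column, kind in metadata_map.items()}


row = {"cpu_load": 0.75, "link_up": True, "last_log": "ERROR"}
metadata_map = build_metadata_map(row)
color_map = build_color_map(metadata_map)
```

Because the maps depend only on the column types and not on the values, they can be built once and reused for every later snapshot.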
In another example embodiment, the new image may be created from data generated in the computer system during the usage phase by translating the generated data into pixels according to said predefined rules. In another example embodiment, applying the model on a new image may produce a label which may classify the detected potential state.
In another example embodiment, the collected data may comprise multiple data values which are translated into pixels according to said predefined rules. In another example embodiment, the pixels may be defined by at least one of: color and transparency. In another example embodiment, the pixels may further have an ARGB (Alpha, Red, Green, Blue) format with N positions where a numerical value is assigned to each ARGB position. In this scheme, the Alpha value is used to indicate the transparency. When the latter embodiment is used, another example embodiment may be that each collected data value can be translated into M pixels to obtain up to M2N different numbers. In another example embodiment, each data value may indicate an observed or measured feature in the computer system according to a predefined metadata map.
In another example embodiment, each reference image may represent data generated in the computer system during a specific time frame so that the reference image is in that case created from data generated during said specific time frame. In this case, the whole image represents a single "snapshot" in time of the computer system.
In another example embodiment, each reference image may comprise multiple rows of pixels where each pixel row represents data generated in the computer system during a specific time frame so that the reference image is in that case created from data generated during a series of consecutive time frames. In this case, each row in the image represents a snapshot in time of the computer system such that the whole image represents a series of snapshots.
In another example embodiment, different columns in the pixel rows may have different weights to indicate different importance to the data represented by respective pixels. It should be noted that weighting can be used in the above-mentioned predefined rules for translating the collected data into pixels, such that some columns are made more important than others by giving them a higher weight and thereby more effect. This can be achieved by allocating more pixels for that column, or by increasing the alpha value. These weights can influence the rules used to generate an image from the row. It was mentioned above that each row in the image may represent a snapshot in time of the computer system such that the whole image represents a series of snapshots which are based on data collected in a corresponding time frame. In this case a data frame may be created from data in each snapshot or time frame. Fig. 3A illustrates another example of an operation flow executed by the control node 100 for realizing the above action 304 of creating an image from data combined in such multiple data frames associated with a series of time frames, as follows.
First, a metadata map is generated from a data frame in action 304A, the metadata map indicating which type of data there is in each entry in the data frame. Then in action 304B, a pixel is assigned to each entry in the data frame, to generate an image row of pixels. In action 304C the image is built by adding the image row. The procedure then repeats actions 304A-304C for data collected during a next time frame, as indicated by a dashed arrow, thus producing a new data frame and corresponding image row, until the image contains a desired set of rows representing a number of corresponding time frames.
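The loop over actions 304A-304C can be sketched as below, assuming that each data frame is a simple mapping from entry name to value. The pixel rule `entry_to_pixel` is a hypothetical stand-in for the predefined rules, and the function names and the use of Python type names as metadata are assumptions made for this sketch.

```python
def entry_to_pixel(value, kind):
    """Hypothetical rule: numbers become grey levels, other entries a fixed red pixel."""
    if kind in ("int", "float"):
        level = max(0, min(255, int(value)))
        return (255, level, level, level)
    return (255, 255, 0, 0)


def build_image(data_frames, n_rows):
    """Build an image with one pixel row per time frame until n_rows rows exist."""
    image = []
    for frame in data_frames:
        # Action 304A: metadata map giving the type of each entry in the data frame.
        metadata_map = {name: type(value).__name__ for name, value in frame.items()}
        # Action 304B: assign one pixel to each entry to form an image row.
        row = [entry_to_pixel(frame[name], metadata_map[name]) for name in sorted(frame)]
        # Action 304C: add the row to the image.
        image.append(row)
        if len(image) == n_rows:
            break
    return image


# Hypothetical data frames from three consecutive time frames.
frames = [{"cpu": 10, "log": "ERROR"}, {"cpu": 300, "log": "INFO"}, {"cpu": 5, "log": "INFO"}]
image = build_image(frames, n_rows=2)
```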
Fig. 4 illustrates another example of an operation flow executed, e.g. by the control node 100 or by another node, for realizing the above actions 212-214 of detecting a state in the computer system in the usage phase in case the operation flows of Figs 3 and 3A have been executed. The procedure in Fig. 4 may also be referred to as "classification" of a detected state.
First, data is collected from the server network in action 400, which can be performed as in action 300. Then in action 402, the collected data is combined to produce a data frame, which can be performed in the same manner as in action 302. The data frame is then used for creating a new image in action 404, by translating the data frame into pixels, using the same predefined rules for translation that were used in action 304. Next in action 406, the neural network trained in action 310 is applied on the created new image, which produces a label that indicates a detected state in the computer system.
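The order of actions 400-406 can be summarised in a short sketch in which each step is passed in as a function. All four stand-in functions below, the labels "overload" and "normal", and in particular the simple threshold used in place of the trained neural network, are hypothetical and serve only to make the sketch runnable.

```python
def detect_state(collect, combine, to_image, model):
    """Run the usage-phase flow and return the label produced by the model."""
    raw_data = collect()               # action 400: collect data
    data_frame = combine(raw_data)     # action 402: combine into a data frame
    new_image = to_image(data_frame)   # action 404: same translation rules as in training
    return model(new_image)            # action 406: apply the trained model


def collect():
    return [("cpu", 97), ("cpu", 99), ("mem", 40)]


def combine(raw_data):
    data_frame = {}
    for name, value in raw_data:
        data_frame.setdefault(name, []).append(value)
    return data_frame


def to_image(data_frame):
    # One grey level per entry: the mean of its values, clamped to 0..255.
    return [max(0, min(255, round(sum(values) / len(values))))
            for _, values in sorted(data_frame.items())]


def model(image):
    # Stand-in for the neural network trained in action 310.
    return "overload" if image[0] > 90 else "normal"


label = detect_state(collect, combine, to_image, model)
```

The point of passing `to_image` explicitly is that the usage phase must reuse exactly the translation rules that produced the reference images.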
It was mentioned above that the pixels may have an ARGB format with N positions where a numerical value is assigned to each ARGB position. An example of how the pixels can be defined in this manner is illustrated in Fig. 5 where Alpha, Red, Green, Blue are referred to as "channels". In Fig. 5 the positions are numbered 0-31, i.e. there are 32 positions in total, with 8 positions in each channel. ARGB values are typically expressed using 8 hexadecimal digits, with each pair of the hexadecimal digits representing values of the Alpha, Red, Green and Blue channel, respectively. The ARGB format as such is well-known in the art for defining pixels, and it is thus not necessary to describe it in detail here.
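For illustration, the common packing of four 8-bit channels into one 32-bit ARGB value, and its 8-digit hexadecimal form, can be written as follows. The sketch assumes that Alpha occupies the most significant 8 positions, matching the order of the hexadecimal pairs mentioned above; the exact numbering of positions in Fig. 5 is not reproduced here, and the function names are chosen for this sketch.

```python
def pack_argb(alpha, red, green, blue):
    """Pack four 8-bit channel values into one 32-bit ARGB integer."""
    for channel in (alpha, red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must fit in 8 positions (0..255)")
    return (alpha << 24) | (red << 16) | (green << 8) | blue


def unpack_argb(value):
    """Split a 32-bit ARGB integer back into its (alpha, red, green, blue) channels."""
    return ((value >> 24) & 0xFF, (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF)


def argb_hex(value):
    """Format a 32-bit ARGB integer as 8 hexadecimal digits, one pair per channel."""
    return f"{value:08X}"


pixel = pack_argb(255, 18, 52, 86)
text = argb_hex(pixel)
```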
It has been described above that a set of reference images with associated labels is built during the training phase. The table in Fig. 6 illustrates some examples of how state-describing labels may be assigned to 4 different reference images created from different data sets generated in the computer system.
Image 1 represents the state "cell down" to indicate that this state means that a certain cell of a wireless or cellular network is "down", i.e. not active, for whatever reason. Image 2 represents the state "cell up" to indicate that the cell is active. Images 3 and 4 are two different images both representing the same state "power failure" to indicate that the computer system, or a certain part thereof, has no power and is therefore not active. It should be noted that two different sets of data can result in the same state, as illustrated by Images 3 and 4.
One approach for creating an image was described above: the image may be created by generating multiple rows, one for each collection time frame. This way, a temporal representation of the computer system can be built. Temporal correlation between different states of the computer system and its dynamics may also be explored as follows. It is possible to represent the state of the system by its time evolution between a first time t and a second time t+T_STATE, where
T_STATE = T_DATA * n
T_STATE represents the maximal time correlation that can be observed. As a consequence, the resulting image will have n rows.
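The relation between T_STATE, T_DATA and the number of rows n can be sketched as below, together with a rolling image that always holds the n most recent rows. Treating both times as integers in the same unit, and the class and function names, are assumptions made for this sketch.

```python
from collections import deque


def rows_in_state_image(t_state, t_data):
    """Return n such that T_STATE = T_DATA * n."""
    if t_state <= 0 or t_data <= 0 or t_state % t_data != 0:
        raise ValueError("T_STATE must be a positive multiple of T_DATA")
    return t_state // t_data


class RollingImage:
    """Keep the n most recent pixel rows, one row per collection time frame."""

    def __init__(self, n_rows):
        self._rows = deque(maxlen=n_rows)

    def add_row(self, row):
        # When the image is full, the oldest row is dropped automatically.
        self._rows.append(row)

    def is_complete(self):
        return len(self._rows) == self._rows.maxlen

    def rows(self):
        return list(self._rows)


n = rows_in_state_image(t_state=60, t_data=15)
image = RollingImage(n)
for t in range(6):
    image.add_row([t, t, t])  # placeholder pixel row for time frame t
```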
Another approach to create an image for a specific time t is to use information about the computer system to build the picture. When transforming the data to an image, the correlation between different parts of the data may be reflected as well. Such correlations may be present and therefore visible in the image. The dimensions of the data may be translated to the generated image. For instance, information about time, location, network topology and software architecture is typically available in generated data sets. These parameters can be transformed and represented by the dimensions available in an image such as X, Y, color and transparency, number of pixels used for each data value, etc.
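One way to picture this use of image dimensions is to place each data value at an X, Y position derived from where it originates and to choose its color from the kind of value. In the sketch below, the grid coordinates, the log levels and the colors assigned to them are all hypothetical choices made for illustration.

```python
# Hypothetical colors per log level, as (A, R, G, B) tuples.
LEVEL_COLORS = {
    "INFO": (255, 0, 255, 0),
    "WARNING": (255, 255, 255, 0),
    "ERROR": (255, 255, 0, 0),
}
BLANK = (0, 0, 0, 0)  # fully transparent pixel where no data exists


def place_on_grid(entries, width, height):
    """Build an image where position encodes origin and color encodes log level."""
    image = [[BLANK] * width for _ in range(height)]
    for x, y, level in entries:
        image[y][x] = LEVEL_COLORS[level]
    return image


# Each entry: (x, y, log level), with x and y taken from an assumed topology layout.
entries = [(0, 0, "ERROR"), (2, 1, "INFO")]
grid = place_on_grid(entries, width=3, height=2)
```

With such a layout, values from neighbouring parts of the system end up as neighbouring pixels, so spatial patterns in the image reflect correlations in the data.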
A simplified example to illustrate the above dimension aspects may be produced when a simple data set is transformed into the image and the relationship between the different values in the same column is also reflected in the image. In that case, different levels of log entries can be represented as different colors and the network topology can be represented as X and Y distance of the pixels. In conclusion, many different types of information can be embedded or encoded in the image when the above-described solution is employed. Another feature that could be applied in the solution is to add new images that are labeled to the set of reference images, while some labels may be deleted because the associated images are never used. For example, it may be counted how many times a state Y in the metadata map is detected, in order to keep a record of the usefulness of this state Y. This procedure could be useful when a computer system has changed its configuration in such a way that a known state Y can never occur again.
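The bookkeeping mentioned above, counting how often each state is detected and dropping reference images whose labels are never used, could look roughly like this. The threshold `min_detections`, the function name and the representation of reference images as (image, label) pairs are assumptions for this sketch; when and whether to delete a label is a policy choice that the disclosure leaves open.

```python
from collections import Counter


def prune_reference_set(reference_set, detected_labels, min_detections=1):
    """Keep only reference images whose label was detected often enough."""
    counts = Counter(detected_labels)
    kept = [(image, label) for image, label in reference_set
            if counts[label] >= min_detections]
    return kept, counts


# Hypothetical reference set and a record of labels detected during usage.
reference_set = [("image 1", "cell down"), ("image 2", "cell up"), ("image 3", "power failure")]
detected = ["cell up", "cell up", "cell down"]
kept, counts = prune_reference_set(reference_set, detected)
```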
The block diagram in Fig. 7 illustrates a detailed but non-limiting example of how a control node 700 may be structured to bring about the above-described solution and embodiments thereof. The control node 700 may be configured to operate according to any of the examples and embodiments of employing the solution as described above, where appropriate, and as follows. The control node 700 is shown to comprise a processor P and a memory M, said memory comprising instructions executable by said processor P whereby the control node 700 is operable as described herein. The control node 700 also comprises a communication circuit C with suitable equipment for receiving and transmitting signals in the manner described herein.
The communication circuit C may be configured for communication with a server controller or the like in the computer system, using a suitable protocol depending on the implementation. The solution and embodiments herein are thus not limited to using any specific types of networks, technology or protocols for communication. The control node 700 is operable to perform at least some of the actions 200-214 in Fig. 2, and optionally also at least some of the operations and functions described above for Figs 3-4.
The control node 700 is arranged or configured to enable detection of states in a computer system comprising one or more servers for processing data. The control node 700 is configured to collect data generated in the computer system during a training phase. This operation may be performed by a collecting unit 700A in the control node 700, e.g. in the manner described for action 200 above.
The control node 700 is also configured to create an image from the collected data by translating the collected data into pixels according to predefined rules. This operation may be performed by a creating unit 700B in the control node 700, e.g. as described for action 202 above.
The control node 700 is also configured to assign a label to the created image representing a state in the computer system as reflected by the collected data. This operation may be performed by an assigning unit 700C in the control node 700, e.g. as described above for action 204.
The control node 700 is further configured to build a set of reference images with associated labels by repeating the preceding actions when different states occur in the computer system during said training phase. This operation may be performed by a building unit 700D in the control node 700, e.g. as described above for action 206. The control node 700 is also configured to build a model for mapping images to said labels. This operation may be performed by the building unit 700D, e.g. as described above for action 208.
The control node 700 is also configured to provide the model for use in detecting a potential state in the computer system during a usage phase by applying the model to a new image created according to said predefined rules. Applying the model to the new image may produce a label which indicates the detected state. This providing operation may be performed by a providing unit 700E in the control node 700, e.g. as described above for action 210. An optional detection unit, not shown, in the control node 700 may also be configured to detect a potential state in the computer system during a usage phase by applying the model on a new image created according to said predefined rules, e.g. as described above for actions 212 and 214. Alternatively, such a detection unit may reside in another node and not in the above-described control node 700, such that the training phase would be executed by the control node 700 and the usage phase would be executed by the other node based on the reference images and model provided by the control node 700, e.g. in a suitable storage that the other node can access.
It should be noted that Fig. 7A illustrates various functional units or modules in the control node 700, and the skilled person is able to implement these functional units or modules in practice using suitable software and hardware. Thus, the solution is generally not limited to the shown structures of the control node 700, and the functional units or modules 700A-E therein may be configured to operate according to any of the features and embodiments described in this disclosure, where appropriate.
The functional units or modules 700A-E described above can be implemented in the control node 700 by means of suitable hardware and program modules of a computer program comprising code means which, when run by the processor P causes the control node 700 to perform at least some of the above-described actions and procedures.
Another example of how the control node 700 may be configured is shown schematically in the block diagram of Fig. 7B. In this example, the control node 700 comprises the functional units or modules 700A-E and a processor P, the units or modules 700A-E being configured to operate in the manner described above e.g. with reference to Fig 2.
In either Fig. 7 or Fig. 7A, the processor P may comprise a single Central Processing Unit (CPU) or Graphics Processing Unit (GPU), or could comprise two or more processing units such as CPUs or GPUs. For example, the processor P may include a general purpose microprocessor, an instruction set processor and/or related chip sets and/or a special purpose microprocessor such as an Application Specific Integrated Circuit (ASIC). The processor P may also comprise a storage for caching purposes.
A computer program 700F is also provided comprising instructions which, when executed on at least one processor, cause the at least one processor to carry out either of the methods described above. A carrier is further provided that contains the above computer program 700F, wherein the carrier comprises an electronic signal, an optical signal, a radio signal, or a computer readable storage medium 700G, the latter shown in Fig. 7. For example, the computer program 700F may be stored on the computer readable storage medium 700G in the form of computer program modules or the like. The memory M may be a flash memory, a Random-Access Memory (RAM), a Read-Only Memory (ROM), an Electrically Erasable Programmable ROM (EEPROM) or hard drive storage (HDD), and the program modules could in alternative embodiments be distributed on different computer program products in the form of memories within the control node 700.
While the solution has been described with reference to specific exemplifying embodiments, the description is generally only intended to illustrate the inventive concept and should not be taken as limiting the scope of the solution. For example, the terms "control node", "computer system", "state", "training phase", "usage phase", "reference image", "label" and "metadata map" have been used throughout this disclosure, although any other corresponding entities, functions, and/or parameters could also be used having the features and characteristics described here. The solution is defined by the appended claims.

Claims

1. A method performed by a control node (100) for enabling detection of states in a computer system comprising one or more servers for processing data, the method comprising:
- collecting (200) data generated in the computer system during a training phase,
- creating (202) an image from the collected data by translating the collected data into pixels according to predefined rules,
- assigning (204) a label to the created image representing a state in the computer system as reflected by the collected data,
- building (206) a set of reference images with associated labels by repeating the preceding actions when different states occur in the computer system during said training phase,
- building (208) a model for mapping the reference images to said labels, and
- providing (210) the model for use in detecting (214) a potential state in the computer system during a usage phase by applying the model on a new image created (212) according to said predefined rules.
2. A method according to claim 1, wherein building the model comprises training a neural network.
3. A method according to claim 2, wherein applying the model on the new image comprises applying the neural network to the new image.
4. A method according to any of claims 1-3, wherein the new image is created from data generated in the computer system during the usage phase by translating the generated data into pixels according to said predefined rules.
5. A method according to any of claims 1-4, wherein applying the model on a new image produces a label which classifies the detected potential state.
6. A method according to any of claims 1-5, wherein the collected data comprises multiple data values which are translated into pixels according to said predefined rules.
7. A method according to claim 6, wherein the pixels are defined by at least one of: colour and transparency.
8. A method according to claim 7, wherein the pixels have an ARGB (Alpha, Red, Green, Blue) format with N positions where a numerical value is assigned to each ARGB position.
9. A method according to claim 8, wherein each collected data value is translated into M pixels to obtain up to M2N different numbers.
10. A method according to any of claims 6-9, wherein each data value indicates an observed or measured feature in the computer system according to a predefined metadata map.
11. A method according to any of claims 1-10, wherein each reference image represents data generated in the computer system during a specific time frame so that the reference image is created from data generated during said specific time frame.
12. A method according to any of claims 1-10, wherein each reference image comprises multiple rows of pixels where each pixel row represents data generated in the computer system during a specific time frame so that the reference image is created from data generated during a series of consecutive time frames.
13. A method according to claim 12, wherein different columns in the pixel rows have different weights to indicate different importance to the data represented by respective pixels.
14. A control node (700) arranged to enable detection of states in a computer system comprising one or more servers for processing data, wherein the control node (700) is configured to:
- collect (700A) data generated in the computer system during a training phase,
- create (700B) an image from the collected data by translating the collected data into pixels according to predefined rules,
- assign (700C) a label to the created image representing a state in the computer system as reflected by the collected data,
- build (700D) a set of reference images with associated labels by repeating the preceding actions when different states occur in the computer system during said training phase,
- build (700D) a model for mapping the reference images to said labels, and
- provide (700E) the model for use in detecting (212) a potential state in the computer system during a usage phase by applying the model on a new image created according to said predefined rules.
15. A control node (700) according to claim 14, wherein building the model comprises training a neural network.
16. A control node (700) according to claim 15, wherein applying the model on the new image comprises applying the neural network to the new image.
17. A control node (700) according to any of claims 14-16, wherein the new image is created from data generated in the computer system during the usage phase by translating the generated data into pixels according to said predefined rules.
18. A control node (700) according to any of claims 14-17, wherein applying the model on a new image produces a label which classifies the detected potential state.
19. A control node (700) according to any of claims 14-18, wherein the collected data comprises multiple data values which are translated into pixels according to said predefined rules.
20. A control node (700) according to claim 19, wherein the pixels are defined by at least one of: colour and transparency.
21. A control node (700) according to claim 20, wherein the pixels have an ARGB (Alpha, Red, Green, Blue) format with N positions where a numerical value is assigned to each ARGB position.
22. A control node (700) according to claim 21, wherein each collected data value is translated into M pixels to obtain up to M2N different numbers.
23. A control node (700) according to any of claims 19-22, wherein each data value indicates an observed or measured feature in the computer system according to a predefined metadata map.
24. A control node (700) according to any of claims 14-23, wherein each reference image represents data generated in the computer system during a specific time frame so that the reference image is created from data generated during said specific time frame.
25. A control node (700) according to any of claims 14-23, wherein each reference image comprises multiple rows of pixels where each pixel row represents data generated in the computer system during a specific time frame so that the reference image is created from data generated during a series of consecutive time frames.
26. A control node (700) according to claim 25, wherein different columns in the pixel rows have different weights to indicate different importance to the data represented by respective pixels.
27. A computer program (700F) comprising instructions which, when executed on at least one processor (P), cause the at least one processor (P) to carry out the method according to any one of claims 1-13.
A carrier containing the computer program (700F) of claim 27, wherein the carrier is one of an electronic signal, optical signal, radio signal, or computer readable storage medium (700G).
PCT/EP2017/055211 2017-03-06 2017-03-06 Method and control node for enabling detection of states in a computer system WO2018162034A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2017/055211 WO2018162034A1 (en) 2017-03-06 2017-03-06 Method and control node for enabling detection of states in a computer system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2017/055211 WO2018162034A1 (en) 2017-03-06 2017-03-06 Method and control node for enabling detection of states in a computer system

Publications (1)

Publication Number Publication Date
WO2018162034A1 true WO2018162034A1 (en) 2018-09-13

Family

ID=58231621

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2017/055211 WO2018162034A1 (en) 2017-03-06 2017-03-06 Method and control node for enabling detection of states in a computer system

Country Status (1)

Country Link
WO (1) WO2018162034A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2699685C1 (en) * 2018-12-18 2019-09-09 федеральное государственное бюджетное образовательное учреждение высшего образования "Южно-Российский государственный политехнический университет (НПИ) имени М.И. Платова" Method of analyzing and monitoring the state of a technical installation comprising a plurality of dynamic systems

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6327550B1 (en) * 1998-05-26 2001-12-04 Computer Associates Think, Inc. Method and apparatus for system state monitoring using pattern recognition and neural networks
EP1280298A1 (en) * 2001-07-26 2003-01-29 BRITISH TELECOMMUNICATIONS public limited company Method and apparatus of detecting network activity

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FADI AL-KALANI ET AL: "Implementation of Image Encoding Based on RGB and ARGB Implementation of Image Encoding Based on RGB and ARGB Implementation of Image Encoding Based on RGB and ARGB", GLOBAL JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 30 June 2012 (2012-06-30), pages 1 - 5, XP055425522, Retrieved from the Internet <URL:https://computerresearch.org/index.php/computer/article/download/532/532> [retrieved on 20171115] *
LI YEN-HAN ET AL: "VISO: Characterizing Malicious Behaviors of Virtual Machines with Unsupervised Clustering", 2015 IEEE 7TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING TECHNOLOGY AND SCIENCE (CLOUDCOM), IEEE, 30 November 2015 (2015-11-30), pages 34 - 41, XP032859096, DOI: 10.1109/CLOUDCOM.2015.19 *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17709058

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17709058

Country of ref document: EP

Kind code of ref document: A1