WO2010120737A1 - Learning program behavior for anomaly detection - Google Patents

Learning program behavior for anomaly detection

Info

Publication number
WO2010120737A1
Authority
WO
WIPO (PCT)
Prior art keywords
event
sequences
events
program
event sequences
Prior art date
Application number
PCT/US2010/030838
Other languages
English (en)
Inventor
Hiralal Agrawal
Clifford Behrens
Balakrishnan Dasarathy
Original Assignee
Telcordia Technologies, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US12/694,806 (US8522085B2)
Application filed by Telcordia Technologies, Inc.
Publication of WO2010120737A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/28 Error detection; Error correction; Monitoring by checking the correct order of processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/243 Classification techniques relating to the number of classes
    • G06F 18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Definitions

  • the subject matter of the present application relates generally to techniques for machine learning of program behaviors by observing application level events.
  • One purpose for learning program behavior can be to support runtime anomaly detection, for example.
  • a method can be provided for learning behavior of a program.
  • a program can be executed while varying a plurality of stimuli provided to the program.
  • Stimuli typically are information received as input to the program.
  • the stimuli can affect results of executing the program.
  • Results of executing the program include events.
  • the method can include recording a multiplicity of sequences of events of different types.
  • the sequences of events may vary in the combination of the different types of events, in the order in which events occur in the sequence, or in both the combination and the order in which the different types of events occur.
  • at least one of the combination or the order in which the events occur in the sequence is determined by the results of executing the program.
  • the multiplicity of sequences can be arranged in a plurality of clusters based on similarities, e.g., edit distances, between the sequences of events.
  • the arrangement of an event sequence in a cluster can be performed in which all of the events in an event sequence are considered.
  • a plurality of signatures corresponding to the plurality of clusters can be determined, where each signature can be a sequence of events which is representative of a respective cluster.
  • Each of the plurality of signatures can be a benchmark representative of acceptable behavior of the program.
  • a computer-enabled method is provided for learning a behavior of a program.
  • a processor can execute a target program during a learning interval while varying a plurality of stimuli provided to the program, the stimuli affecting results of executing the program, so as to produce a multiplicity of different sequences of events which differ in the combination of types of events in respective sequences, an order in which the types of events occur in respective sequences, or in the combination and in the order in which the types of events occur.
  • the multiplicity of event sequences can be recorded, and a second program can be executed by a processor to (a) determine a plurality of clusters based on similarities between the event sequences; and (b) determine a plurality of signatures corresponding to the plurality of clusters, each signature being a sequence of events representative of a respective cluster.
  • each of the plurality of signatures can be a benchmark representative of acceptable behavior of the target program.
  • the method can include varying the stimuli in a multiplicity of ways exemplary of acceptable stimuli to produce event sequences representative of acceptable behavior of the target program, and steps (a) and (b) can be performed during a learning interval of executing the program.
  • the arranging of event sequences in clusters can be performed using a spatial clustering technique, such as K-means clustering. Spatial clustering can be performed such that when two sequences have a relatively small edit distance between them, the two sequences of events can be assigned to one and the same cluster.
  • principal component analysis can be performed on the matrix of edit distances between every pair of event sequences to reduce the number of dimensions for spatial clustering. In this way, the complexity of spatial clustering can be managed.
  • the determination of the signature of a respective cluster can include determining a longest common subsequence of events included in the event sequences of the cluster as the signature for such cluster.
  • the arranging of recorded event sequences in clusters can include finding event subsequences in loops which are repeated in at least some of the event sequences, e.g., by finding sub-string structures in loops which are repeated therein, and generating linearized event sequences which are representative of the repeated sub-strings.
  • the arranging of the recorded event sequences can include arranging the linearized event sequences with the recorded event sequences in the plurality of clusters.
  • the finding of the repeated event subsequences can include inferring state information regarding the program by analyzing at least some of the recorded event sequences.
  • the determining of the clusters can be performed by considering the entireties of the event sequences.
  • a method can be performed which includes further executing the target program during an in-service interval after determining the clusters and determining the signatures of the clusters.
  • Such method can include detecting whether a given sequence of events observed during an in-service interval is anomalous based on a difference between the given sequence of events and cluster signatures.
  • a detected degree of difference between a given one of the sequences of events and the plurality of signatures is determined based on edit distance between the given sequence of events and cluster signatures.
  • signatures can be hierarchically ordered.
  • An edit distance can be determined between the given event sequence and one or more signatures. If the edit distance between the event sequence and a signature at the top of a hierarchically ordered group is sufficiently small, further determination can be made of edit distances between the event sequence and other signatures within the hierarchically ordered group to determine which signature is closest to the event sequence.
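  • As an illustration of this hierarchical matching, the sketch below only descends into a signature group when the observed event string is already close to the group's top-level signature. The grouping structure, the threshold value, and the third-party Levenshtein dependency are illustrative assumptions, not details prescribed by the application.

```python
import Levenshtein  # third-party package providing Levenshtein.distance; an assumption here

def closest_signature(observed, signature_groups, group_threshold=3):
    """signature_groups: list of (top_signature, [other_signatures_in_group]).

    Returns the (distance, signature) pair of the closest signature found,
    or None if no group's top-level signature was sufficiently close.
    """
    best = None
    for top_sig, members in signature_groups:
        # Only examine a group whose top-level signature is already close.
        if Levenshtein.distance(observed, top_sig) <= group_threshold:
            for sig in [top_sig] + members:
                d = Levenshtein.distance(observed, sig)
                if best is None or d < best[0]:
                    best = (d, sig)
    return best
```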
  • an information processing apparatus includes a processor and a set of instructions which are executable by the processor to perform a method such as described in the foregoing.
  • a computer-readable medium is provided which has instructions recorded thereon, wherein the instructions are executable by a processor to perform a method such as described in the foregoing.
  • Fig. 1 is a schematic block diagram functionally illustrating a system for learning the behavior of a program, and for detecting an anomaly during execution of a program, in accordance with an embodiment of the invention.
  • Fig. 2A is a diagram illustrating an example of principal components analysis which can be performed in accordance with one embodiment herein. Figure 2A illustrates that the number of dimensions for spatial clustering can be reduced to two.
  • Fig. 2B is a diagram illustrating an arrangement of event sequences in a plurality of clusters, in accordance with an embodiment of the invention.
  • FIG. 3 is a flow diagram illustrating actions in a method of learning a behavior of a program, in accordance with an embodiment of the invention.
  • Fig. 4 is a flow diagram illustrating actions in a method of detecting an anomaly during execution of a program, in accordance with an embodiment of the invention.
  • Fig. 5 is a schematic block diagram functionally illustrating a system for learning the behavior of a program, and for detecting an anomaly during execution of a program, in accordance with a variation of the embodiment of the invention.
  • Fig. 6 is a diagram illustrating an inferred state machine showing repetitiveness of events produced.
  • Fig. 7 is a flow diagram illustrating actions in a method of learning the behavior of a program, in accordance with a variation of an embodiment of the invention.
  • Fig. 8 is a block and schematic diagram illustrating an information processing apparatus having a processor in accordance with an embodiment of the invention.

DETAILED DESCRIPTION

  • a method for learning behavior of a program.
  • a program installed for use in a processor-enabled system, e.g., a computer or other system, can be executed during a learning interval while varying a plurality of stimuli provided thereto.
  • the stimuli can include input information which affects the execution of the program.
  • Results of executing the program can include producing a multiplicity of different sequences of events, which then are recorded.
  • the sequences can differ in the combinations of types of events in respective sequences, the orders in which the types of events occur, or in both the combinations and the orders in which the types of events occur.
  • the recorded event sequences can be used in determining a plurality of clusters, and signatures can be determined for the respective clusters, where each signature can be a sequence of events that is representative of a respective cluster. Each resulting signature can be a benchmark representative of acceptable behavior of the program.
  • the embodiments provided herein can be applied to learning the behavior of a variety of programs.
  • Some programs, e.g., programs executing on server computers which deliver functionality to many clients or users, are intended to operate on a continuous or near-continuous basis. Such programs may be required to service many different types of requests and may need to respond in a predictable manner even when the input presented during the operation of such programs cannot be completely characterized or predicted in advance.
  • Programs executed by control systems that operate or monitor facilities, equipment, installations of computer or telecommunications devices or networks and the like, are among programs which have a need to perform predictably and reliably even in the face of unpredicted input thereto.
  • a computer's operating system is another program which has a strong need to perform predictably and reliably even when unpredicted input is received.
  • One illustrative example used herein is a program that controls a PBX office communications system.
  • the letters PBX are an acronym for "private branch exchange".
  • the abbreviation commonly refers to many different types of analog, digital, and combined analog and digital switching systems which provide telephone switching for offices or other communications within an office, facility or organization, even if such systems are not literally a "private branch exchange".
  • Such systems also connect telephone calls between internal telephone locations and external carrier lines.
  • the stimuli can include a signal that a particular telephone in the office served by the system has gone off-hook, that a particular telephone has gone on-hook, and can be key input from a telephone keypad, for example.
  • a multiplicity of sequences of events are recorded which are determined by the results of executing a target program whose behavior is to be learned.
  • the recorded sequences of events are representative of, and can be considered manifestations of the behavior of the program.
  • the combination of events and the order of events within each recorded sequence of events are indicative of how the program responds to stimuli.
  • a sequence of events can include a sequence which occurs when a call is made from one extension of the PBX to another extension and a connection is established.
  • extension can refer to one of many numerically dial -able or otherwise addressable internal telephone locations served by the PBX.
  • a recorded event sequence could include, for example, the following: extension 1 goes off-hook (Event A); extension 1 dials a number assigned to extension 2 (Event B); extension 1 receives a ring-back tone from extension 2 (Event C); extension 2 rings (Event D); extension 2 goes off-hook (Event E); the ring-back tone ends at extension 1 (Event F); and the calling extension is now connected with the called extension for voice communication (Event G).
  • the above-described sequence of events can manifest a normal intended behavior of the program, e.g., PBX control program.
  • the control program for a PBX can usually handle more than connecting one extension with another.
  • a control program may need to support services such as voice response units ("VRUs"), call forwarding, voice messaging, and conferencing.
  • services to an extension are usually available via many different routes. For example, one extension may be dialed from another extension.
  • a voice messaging service can be available when the called extension does not pick up, or when the called extension is busy at the time.
  • a hypothetical example of malicious misuse of a PBX might be if a PBX could be used without authorization to connect a particular extension to an external carrier line to establish a telephone call to an international destination.
  • One way that the call might be placed without authorization is if the PBX allowed such call to originate not merely from an internal extension served by the PBX, but instead from a connection from an external location outside of the office.
  • An embodiment of the invention herein can provide a way of learning acceptable behavior of a program by executing the program during a learning interval and determining a plurality of signatures which can be benchmarks representative of acceptable behavior of the program.
  • a benefit of learning acceptable program behavior may be to detect possible malicious misuse of a PBX. In that way, it may be possible for a program executing in a system to block an attempt at misuse, or avoid possible harm from occurring by halting the further progress of an attempt to misuse the system.
  • Fig. 1 illustrates a system 100 which can be used to detect occurrence of an anomaly during the execution of a program.
  • the system can be used to detect the occurrence of an anomaly during the execution of a program that automatically controls a PBX.
  • System 100 can have two phases of operation. In the first phase, a learning interval, a target program is executed by a processor to operate a target system 10 under relatively controlled conditions. During the learning interval, a multiplicity of event sequences is recorded, which can then be used by a Clustering Component 110 of the learning program to characterize acceptable behavior of the target system program.
  • In the second phase, a Run Time Anomaly Detector 20 can use the characterization of acceptable target program behavior made by the Clustering Component 110 to determine when an anomaly occurs during execution of the target system program.
  • the Clustering Component 110 can operate with respect to event strings 102, i.e., sequences of events observed during the operation of the target system 10, that is, during the execution of a target program of target system 10 by a processor.
  • the event strings 102 can be generated during the learning interval of operation of the target system 10.
  • the behavior of a program can be represented by the particular sequences of events which occur. Some of the events can occur in response to stimuli.
  • the Clustering Component 110 can perform a key function in arranging the observable manifestations of behavior of the program, e.g., event sequences, into clusters.
  • the determination of a plurality of clusters based on the event sequences can be performed in which all of the events in an event sequence are considered.
  • This type of operation can be contrasted with techniques which consider only substrings of events of fixed length within a sliding window, e.g., substrings of two, three, four, or five events in length. Such techniques can be referred to as an n-gram technique.
  • The output 112 of the Clustering Component 110 includes the clusters and the signatures which correspond to the clusters.
  • the Clustering Component 110 can determine the plurality of clusters in the following way.
  • the clustering process can be performed based on edit distances among event sequence strings.
  • the edit distances among all the event sequences recorded during the learning interval can be determined.
  • each recorded event sequence can be modeled as a character string which is composed from a customized alphabet, the alphabet representing each type of event by one or more alphabetic, alphanumeric or numeric characters, for example.
  • edit distance between two event sequence strings can be determined as a Levenshtein distance, i.e., the minimum number of simple edit operations (character insertions, deletions, and substitutions) required to convert one event string into the other.
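  • The following sketch illustrates the two preceding bullets: each event type is mapped to a single character of a customized alphabet, and the Levenshtein distance between two event strings is computed with the usual dynamic program. The event names in the alphabet are hypothetical stand-ins for the PBX events discussed earlier.

```python
# Hypothetical customized alphabet: one character per event type.
EVENT_ALPHABET = {
    "off_hook": "A", "dial": "B", "ringback_start": "C", "ring": "D",
    "answer": "E", "ringback_end": "F", "connected": "G",
}

def encode(events):
    """Model a recorded sequence of events as a character string."""
    return "".join(EVENT_ALPHABET[e] for e in events)

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of character insertions,
    deletions, and substitutions needed to convert one string into the other."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute ca -> cb
        prev = cur
    return prev[-1]

# The two similar strings discussed below differ by two edit operations.
print(edit_distance("ABCDEFGB", "ABCEFGC"))  # -> 2
```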
  • the computed edit distances among all the recorded event sequence strings form an N-dimensional data space, where N, the number of event sequence strings, can be very large.
  • Principal components analysis (PCA) can be applied to the matrix of edit distances to reduce the number of dimensions for spatial clustering.
  • As illustrated in Fig. 2A, a scree plot of factor scores derived from the first two principal components can reveal that most of the variability in the distance matrix can be accounted for by sequence scores on these principal components.
  • Results of PCA can include mapping the first two principal component scores for all the recorded event sequence strings as points in a two-dimensional space, as shown illustratively in Fig. 2B as the horizontal scale "PC1 Scores" and the vertical scale "PC2 Scores" therein.
  • a spatial clustering algorithm, e.g., a K-means algorithm, can be applied to the first two principal component scores to determine clusters of similar event sequence strings, based on their proximity within the two-dimensional space.
  • the results of this analysis can be used in determining the content of clusters, and the boundaries between the clusters.
  • the boundaries between clusters 210 are illustratively depicted using dotted lines.
  • spatial clustering algorithms other than a K-means algorithm can be used to determine clusters and the signatures of respective clusters.
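  • A minimal sketch of this clustering stage is shown below: pairwise edit distances form a distance matrix, the matrix is projected onto its first two principal components, and K-means is applied to the resulting scores. The choice of libraries (numpy, scikit-learn, the third-party Levenshtein package) and the number of clusters are illustrative assumptions.

```python
import numpy as np
import Levenshtein                      # third-party; provides Levenshtein.distance
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def cluster_event_strings(strings, n_clusters=4):
    """Arrange event-sequence strings into clusters by edit-distance similarity."""
    n = len(strings)
    # Pairwise Levenshtein distances form an n x n symmetric matrix.
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = Levenshtein.distance(strings[i], strings[j])

    # Reduce each sequence's row of distances to its first two principal component scores.
    scores = PCA(n_components=2).fit_transform(dist)

    # Spatial clustering (K-means) on the two-dimensional scores.
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(scores)
    return labels, scores   # labels[i] is the cluster index assigned to strings[i]
```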
  • an event string ABCDEFGB is recorded when executing the target system program during a learning interval, the event string representing a sequence of events in which each different type of event is indicated by a different letter of the alphabet and each event occurs in the order listed.
  • Another (second) recorded event string, ABCEFGC, is similar, but not the same. Event "D", which occurs in the first sequence, is absent from the second sequence. Also, event "C" now occurs as the final event in the second event string, rather than event "B".
  • the Clustering Component 110 determines edit distances of each event string from each other event string observed during the learning phase, finds the corresponding principal components, and uses spatial clustering, e.g., K-means clustering, to determine clusters.
  • a signature can be determined which is representative of each respective cluster.
  • the signature can be determined as a longest common subsequence ("LCS") of the plurality of event sequences which belong to the cluster. For example, when the cluster includes event sequences (1) ABCDGGH and (2) ABDGI, the LCS is ABDG.
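  • A sketch of deriving a cluster signature as a longest common subsequence is shown below. Folding a pairwise LCS over all members of a cluster is a simplifying heuristic assumed here (an exact multi-sequence LCS is harder to compute); it reproduces the ABCDGGH/ABDGI example above.

```python
from functools import reduce

def lcs(a: str, b: str) -> str:
    """Longest common subsequence of two event strings (dynamic programming)."""
    dp = [[""] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a):
        for j, cb in enumerate(b):
            if ca == cb:
                dp[i + 1][j + 1] = dp[i][j] + ca
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j], key=len)
    return dp[len(a)][len(b)]

def cluster_signature(member_strings):
    """Fold the pairwise LCS over a cluster's member strings (a heuristic)."""
    return reduce(lcs, member_strings)

print(cluster_signature(["ABCDGGH", "ABDGI"]))  # -> "ABDG"
```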
  • Referring to Fig. 3, a method can be provided for analyzing the behavior of a target program in response to stimuli provided thereto.
  • the target program can be executed during a learning interval (block 310).
  • a learning interval can be an interval other than an in-service interval in which the target program is being executed for use during normal operation.
  • stimuli can be provided to the target program
  • the stimuli can be such as described above, e.g., signals, keypad input, voice input, etc., which affect a result of executing the program, and which can affect occurrence of events which make up the behavior of the program.
  • the stimuli provided during execution of the target program are controlled so as to vary in ways which are exemplary of acceptable stimuli.
  • sequences of events which occur during the execution of the target program are recorded. Providing stimuli to exercise various functionality and recording event sequences can be repeated many times.
  • the determination of a plurality of clusters (block 340) based on the recorded sequences of events can be performed by determining edit distances among the event sequence strings representing the events, determining principal component scores, and then applying spatial clustering, e.g., K-means clustering, to arrange the event sequences in clusters and to set boundaries between clusters.
  • a method can be performed for detecting an anomaly during in-service execution of a target program.
  • the method can be performed by a "Run Time Anomaly Detector" 20 (Fig. 1) provided for that purpose. Referring to Fig. 4, such method can be performed while a target program is being executed during an operating, i.e., in-service, interval of operation (block 410).
  • the method can be performed while the target system 10 (Fig. 1) is operating in service.
  • stimuli are received by the program.
  • the stimuli can include signals or other input representing events or other occurrences relating to operation of the system.
  • the stimuli can include signals indicating when a particular telephone unit goes off-hook, when the telephone unit is dialing, the number dialed, whether a ring-back tone is active, for example, as well as many others.
  • event sequences which occur during operation of the target system 10 can be compared with the signatures of the respective clusters 210 (Fig. 2B) .
  • the edit distance between a current event sequence and each signature can be determined.
  • a "small" edit distance from the signature means an edit distance that falls within a quantile of the distribution that is relatively close to the signature.
  • if the observed sequence of events is anomalous, that edit distance will not be a small edit distance.
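  • A sketch of this run-time check appears below: the observed event string is compared against every cluster signature, and an anomaly is flagged only when no signature is within its "small" distance. The per-signature thresholds are assumed to have been chosen during the learning interval (e.g., as a quantile of the distances observed then); the names and the third-party Levenshtein dependency are illustrative.

```python
import Levenshtein  # third-party; provides Levenshtein.distance

def is_anomalous(observed: str, signatures: dict, thresholds: dict) -> bool:
    """signatures: {cluster_id: signature string}
    thresholds: {cluster_id: largest edit distance still regarded as "small"
                 for that signature, estimated during the learning interval}."""
    for cluster_id, signature in signatures.items():
        if Levenshtein.distance(observed, signature) <= thresholds[cluster_id]:
            return False   # close to at least one benchmark of acceptable behavior
    return True            # not close to any signature: report an anomaly
```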
  • when an anomaly is detected, an alarm indicating that an anomaly is present can be displayed, printed, or sounded audibly.
  • a system administrator who notices the alert can then take an appropriate action, e.g., enabling or disabling a particular function of the system from which the alert originated.
  • the system administrator can isolate, suspend execution of, reset or shut down the system which generated the alert while a solution is determined.
  • a system 500 (Fig. 5) for learning the behavior of a program can include one or more additional main components which can operate together with the Clustering Component 110 to 'learn' acceptable behaviors of a program under test.
  • the system 500 can include three main components: the Clustering Component 110, having a function as described above, a Loop Linearization component 520, and a State Machine Inference component 530.
  • the Loop Linearization Component 520 can also participate in the processing of event sequences during the execution of the program. This component can reduce event sequences which include some repeated events into simplified representations.
  • the Loop Linearization Component 520 can recognize that the event sequence IABDEGABDEGT contains the subsequence ABDEG, and that this subsequence occurs twice in succession.
  • the Loop Linearization Component 520 replaces the event sequence IABDEGABDEGT with the expression I(ABDEG)2T. Having simplified the expression for an event sequence that contains a repeated subsequence, the Clustering Component 110 can now determine that the edit distance between the simplified expression I(ABDEG)2T and IABDEGT is much less than the edit distance between IABDEGABDEGT and IABDEGT. Moreover, other ways of simplifying expressions are possible. For example, an original event sequence IACDEGABDFGT can be replaced with the expression I(A(B|C)D(E|F)G)2T.
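  • A sketch of this simplification is shown below: an immediately repeated subsequence is collapsed into a parenthesized unit with a repetition count, reproducing the IABDEGABDEGT example. The regular-expression approach is an illustrative assumption, and it does not handle the alternative forms such as (B|C) mentioned above.

```python
import re

def linearize_loops(event_string: str) -> str:
    """Collapse an immediately repeated substring into (substring)<count>."""
    def collapse(match):
        unit = match.group(1)
        count = len(match.group(0)) // len(unit)
        return f"({unit}){count}"
    # Scanning left to right, collapse the shortest unit that repeats back to back.
    return re.sub(r"(.+?)\1+", collapse, event_string)

print(linearize_loops("IABDEGABDEGT"))  # -> "I(ABDEG)2T"
```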
  • a State Machine Inference Component 530 can be used to infer state information from the program under test by analyzing event sequences that occur while executing the program during the learning interval.
  • the problem of determining a regular expression from a given set of event sequences is the same as that of inferring a finite state machine (FSM) that accepts those sequences. Determining an appropriate solution to this problem is computationally hard, i.e., requiring unusually large amounts of computing resources. Determining a solution to this problem can also require determining and analyzing examples of sequences that should be rejected as input.
  • the State Machine Inference Component 530 takes a practical approach to derive state information from data contained in the events themselves. To do so, the State Machine Inference Component 530 can discover and use clues about the internal "states" of a program from the events as they are emitted.
  • state information can be obtained from at least some events of the event sequences which occur during operation of such program.
  • some events contain a field that reports the status of the telephone line or channel involved.
  • Such events can be recorded with a description in a field using terms like "Ringing", "Busy", "Up", or "Down". Such a field can provide a direct clue about the internal state of the corresponding phone line.
  • these states are not unique for a particular type of call; rather, the states are shared by phone lines involved in all types of calls: incoming calls, outgoing calls, internal calls, conference calls, interactive voice response ("IVR") sessions, etc. Therefore, it may be beneficial to further distinguish between such states based on the type of the call.
  • one feature of such a program is that different types of calls can be handled by different parts of dial plans for the PBX system, the dial plans being, in essence, scripts which direct the PBX how to handle various types of calls.
  • in an open source PBX, for example, one type of event which can provide more information for the event record is "NewExtenEvent".
  • This type of event can contain three fields: "context", "extension", and "priority", which together provide further clues about which part of the scripts is responsible for generating an event.
  • the "context" field may directly name the command group in the dial plan that is now handling that call.
  • the "extension” field can identify the physical or logical keys that were dialed or pressed.
  • the "priority” field can identify the position of the current command in the command group that led to the generation of that event. Combinations of values in these fields, along with the values of the aforementioned channel status field can be used to derive states in the inferred finite state machine.
  • Fig. 6 is an inferred state diagram that illustrates a result of analysis performed by the State Machine Inference Component 530.
  • events of types selected from the group consisting of I, A, B, C, D, E, F, G, and T are recorded during the execution of the target program. The events do not necessarily occur in the order IABCDEFGT, nor do all such events usually occur in one sequence.
  • Fig. 6 shows, as a result of the analysis, that event A occurs only after either event I or event G, and that event A occurs only sometimes after event G.
  • the State Machine Inference Component also determines that event B and event C are alternative events which can occur only after event A, and that event E and event F are alternative events which can occur only after event D. Through analyzing relationships between events such as these, the State Machine Inference Component can determine the structure of an inferred state machine 600 (Fig. 6) which describes the operational states of the target program during execution.
  • the State Machine Inference Component 530 (Fig. 5) has determined that the program under test has eight different states. The states are labeled 1, 2, 3, ..., 8 in Fig. 6. Observed events I, A, B, C, D, E, F, G, and T are emitted when the program transitions from one state to another.
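  • A sketch of the kind of relationship used to build such an inferred state machine is shown below: for each event type, collect the set of event types observed to follow it. The example traces are hypothetical sequences chosen to be consistent with Fig. 6 as described here.

```python
from collections import defaultdict

def successor_map(event_strings):
    """For each event type, record the set of event types observed to follow it."""
    follows = defaultdict(set)
    for s in event_strings:
        for current, following in zip(s, s[1:]):
            follows[current].add(following)
    return follows

# Hypothetical traces consistent with Fig. 6: B/C are alternatives after A,
# E/F are alternatives after D, and A occurs only after I or G.
traces = ["IABDEGT", "IACDFGT", "IABDEGACDFGT"]
print(dict(successor_map(traces)))
# e.g. follows["A"] == {"B", "C"} and follows["D"] == {"E", "F"}
```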
  • the Loop Linearization Component 520 (Fig. 5) can regularize expressions for event sequences that contain alternative expressions therein. The Loop Linearization Component 520 can determine from the inferred state machine 600 (Fig. 6) that two different types of events normally occur in the alternative.
  • events B or C normally only occur in the alternative following another different type of event, for example, event A.
  • events E or F normally only occur in the alternative following another different type of event, for example, event D.
  • the regular expression 532 produced by the State Machine Inference Component 530 which includes state information, i.e., correspondence between recordable events and the internal state of the programmed system, can then be provided to the Loop Linearization Component 520, as seen in Fig. 5.
  • the Loop Linearization Component 520 can produce simplified expressions for the recorded event sequences, i.e., "linearized event strings" 522, which are then provided to the Clustering Component 110, having a function as described above.
  • Fig. 7 is a flow diagram illustrating a method of learning the behavior of a program in accordance with a variation of the above-described embodiment (Fig. 3).
  • loop linearization is performed, such as described above with respect to Fig. 5.
  • Loop Linearization can be referred to as a method of finding "repeated substring structures in loops", as indicated in block 730.
  • the method illustrated in Fig. 7 can differ from the method described in Fig. 3 by the insertion of an additional block 730.
  • Block 730 relates to handling of event sequences which include repeated substring structures, e.g., subsequences which are the same or nearly the same and can be described by a simplified expression.
  • an event sequence such as the sequence IABDEGABDEGT noted above can be described with the simplified expression I(ABDEG)2T.
  • an original event sequence IACDEGABDFGT can be replaced with the expression I(A(B|C)D(E|F)G)2T.
  • the rest of the actions performed in accordance with this variation can be the same as those described above with reference to Fig. 3.
  • the addition of the extra step 730 enables event sequences that have repeated substrings to be clustered together. If not for this step 730, these strings may be placed in different clusters, as edit distance is very sensitive to the lengths of the strings compared.
  • FIG. 8 illustrates an information processing apparatus 800 in accordance with an embodiment of the invention.
  • the information processing apparatus can include a central processing unit (CPU) 810 provided with a memory 820.
  • the CPU 810 may include a single processor or a plurality of processors arranged to execute instructions of a program in a parallel or semi-parallel manner.
  • An input/output (I/O) interface 830 can be provided for inputting a program, including instructions and data, to the CPU 810 for execution of the instructions or portions thereof, and for outputting the results of executing the instructions.
  • the I/O interface 830 may include an optical, magnetic or electrical scanning or reading function, for example, and may include one or more types of equipment for reading the contents of storage media.
  • Storage media can include, for example, but are not limited to a magnetic disk, magneto-optic disk, read/write and/or read only optical disc, tape, removable or non-removable disk drive and/or removable or non-removable memory, e.g., a semiconductor memory such as a memory card, and other sources of stored information that can be read optically, magnetically or electrically.
  • the I/O interface can include a network interface such as a modem or network adapter card for permitting transfer of information to and from a network.
  • the I/O interface 830 may include a display for outputting information (events and alarms) to, and/or for inputting information (stimuli) from, a user.
  • a program containing a set of instructions to perform a method for learning the behavior of a target program can be stored in such storage medium.
  • a set of instructions in such program can be received as input 840 through the I/O interface 830 to the CPU 810.
  • a corresponding set of data to be operated upon by the instructions can also be received as input through the I/O interface 830.
  • the CPU can execute instructions relative to the corresponding data and provide output 850 to the I/O interface 830.
  • a program containing instructions for performing a method of learning a behavior of a target program can be stored on one or more removable storage media to be provided to the I/O interface 830, the instructions then being loaded into the CPU 810.
  • the program can be stored in a fixed system storage medium of a computer, e.g., a hard-disk drive memory, electronic memory system or other storage medium of the computer which is designed to be a permanent part of the computer, although such part may be replaceable when upgrading the computer with a different fixed storage medium or when repairing a malfunctioning storage medium.
  • a set of instructions included in the program can be received from a storage medium such as a memory of one or more computers or other storage devices of a network at a modem, network adapter or other device of the I/O interface 830 and received at the CPU 810.
  • the CPU 810 can then execute the instructions relative to a set of data provided to the CPU 810.
  • the instructions of a program used to learn the behavior of a target program can be executed by a processor relative to a data set which includes a multiplicity of event sequences recorded based on execution of the target program, to arrange the recorded event sequences in a plurality of clusters, and determine a plurality of signatures representative of the respective clusters, each signature being a benchmark representative of acceptable behavior of the target program.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Quality & Reliability (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention relates to a computer-enabled method for learning the behavior of a program. A processor can execute a target program during a learning interval while varying a plurality of stimuli provided to the target program, so as to produce a multiplicity of different event sequences which differ in the combinations of types of events in respective sequences, in the orders in which the types of events occur in respective sequences, or in both the combinations and the orders in which the types of events occur. The multiplicity of event sequences can be recorded, and a second program can be executed by a processor to determine a plurality of clusters based on similarities between the event sequences in their entireties.
PCT/US2010/030838 2009-04-13 2010-04-13 Learning program behavior for anomaly detection WO2010120737A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US16876509P 2009-04-13 2009-04-13
US61/168,765 2009-04-13
US12/694,806 2010-01-27
US12/694,806 US8522085B2 (en) 2010-01-27 2010-01-27 Learning program behavior for anomaly detection

Publications (1)

Publication Number Publication Date
WO2010120737A1 (fr) 2010-10-21

Family

ID=42982807

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2010/030838 WO2010120737A1 (fr) 2009-04-13 2010-04-13 Learning program behavior for anomaly detection

Country Status (1)

Country Link
WO (1) WO2010120737A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040216061A1 (en) * 2003-04-28 2004-10-28 International Business Machines Corporation Embeddable method and apparatus for functional pattern testing of repeatable program instruction-driven logic circuits via signal signature generation
US20060053422A1 (en) * 2004-09-07 2006-03-09 El Hachemi Alikacem Antipattern detection processing for a multithreaded application
US20070260950A1 (en) * 2006-02-16 2007-11-08 Morrison Gary R Method and apparatus for testing a data processing system

Similar Documents

Publication Publication Date Title
US8522085B2 (en) Learning program behavior for anomaly detection
US10902105B2 (en) Fraud detection in interactive voice response systems
KR100445599B1 (ko) Method and system for detecting fraudulent use of a communication network
US9503571B2 (en) Systems, methods, and media for determining fraud patterns and creating fraud behavioral models
US8793131B2 (en) Systems, methods, and media for determining fraud patterns and creating fraud behavioral models
US20100146622A1 (en) Security system and method for detecting intrusion in a computerized system
US11445066B2 (en) Call data management platform
CN110113315A (zh) Service data processing method and device
Chen et al. Log analytics for dependable enterprise telephony
US20090304162A1 (en) User authenticating method, user authenticating system, user authenticating device and user authenticating program
CN110061876B (zh) Optimization method and system for an operation and maintenance audit system
CN109478156A (zh) Density-based apparatus, computer program, and method for reclassifying test data points as not being an anomaly
KR102228021B1 (ko) Machine-learning-based illegal call detection system and control method thereof
CN109521889A (zh) Input method and device, terminal, and storage medium
WO2010120737A1 (fr) Learning program behavior for anomaly detection
Chandrasekran et al. Adoption of future banking using biometric technology in automated teller machine (atm)
CN114598556B (zh) IT infrastructure configuration integrity protection method and protection system
US20020166055A1 (en) Secure pin entry into a security chip
WO2016123758A1 (fr) Method and device for hiding personal information on a call interface
CA2129903C (fr) Leakage device for monitoring industrial processes
US11900179B1 (en) Detection of abnormal application programming interface (API) sessions including a sequence of API requests
EP4160454A1 (fr) Systèmes et procédés mis en uvre par ordinateur pour l'identification et l'authentification d'applications
JP2007286656A (ja) Software operation modeling device, software operation monitoring device, software operation modeling method, and software operation monitoring method
CN118118223A (zh) Method for constructing a collusion behavior recognition model for multi-party associated data, recognition method, and device
Kaplan et al. Just the Fax-Differentiating Voice and Fax Phone Lines Using Call Billing Data.

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10764992

Country of ref document: EP

Kind code of ref document: A1

DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10764992

Country of ref document: EP

Kind code of ref document: A1