CN1860476B - Systems and methods for automated computer support - Google Patents

Systems and methods for automated computer support

Info

Publication number
CN1860476B
CN1860476B CN2004800281029A CN200480028102A CN1860476B CN 1860476 B CN1860476 B CN 1860476B CN 2004800281029 A CN2004800281029 A CN 2004800281029A CN 200480028102 A CN200480028102 A CN 200480028102A CN 1860476 B CN1860476 B CN 1860476B
Authority
CN
China
Prior art keywords
snapshot
reference model
unusual
adaptive reference
assets
Prior art date
Legal status
Expired - Fee Related
Application number
CN2004800281029A
Other languages
Chinese (zh)
Other versions
CN1860476A (en
Inventor
David Eugene Hooks (大卫·尤金·胡克斯)
Current Assignee
Chorus Systems Inc
Original Assignee
Chorus Systems Inc
Priority date
Filing date
Publication date
Application filed by Chorus Systems Inc
Priority claimed from PCT/US2004/026186 (WO2005020001A2)
Publication of CN1860476A
Application granted
Publication of CN1860476B
Legal status: Expired - Fee Related

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Debugging And Monitoring (AREA)

Abstract

Systems and methods for creation and use of an adaptive reference model are described. One described method comprises receiving a plurality of snapshots from a plurality of computers, each of the plurality of snapshots comprising a plurality of pairs of asset names and asset values, and automatically creating an adaptive reference model based at least in part on the plurality of snapshots.

Description

Systems and methods for automated computer support
Cross-Reference to Related Applications
This application claims the benefit of U.S. Provisional Application No. 60/494,225, filed August 11, 2003, and of U.S. Application Serial No. 10/916,800, entitled "Systems and Methods for Creation and Use of an Adaptive Reference Model," Attorney Docket No. 52270/302840, filed herewith. Both applications are incorporated herein by reference in their entirety.
Technical Field
The present invention relates generally to systems and methods for automated computer support.
Background
As information technology continues to grow in complexity, the cost of problem management escalates: support incidents become more frequent, and the skills required of human analysts become more demanding. Traditional problem management tools are designed to reduce costs by making the people who perform these support tasks more efficient. This is typically done by at least partially automating the capture of trouble ticket information and by facilitating access to knowledge bases. While useful, this kind of automation has reached the point of diminishing returns, because it fails to address the fundamental weakness of the support model itself: its dependence on humans.
Table 1 shows the distribution of labor costs associated with resolving incidents in a traditional human-based support model. The data are provided by Motive Communications Inc. of Austin, Texas (www.motive.com), a major supplier of help desk software. The highest-cost items are those associated with tasks that require human analysis and/or interaction (e.g., diagnosis, investigation, resolution).
Table 1:
Support task                                        % of labor cost
Simple and repetitive problems (30%)
  Desktop configuration (user induced)              4%
  Desktop environment (software failure)            9%
  Networking and connectivity                       7%
  How-to (questions)                                10%
Complex and dynamic problems (70%)
  Triage (identify user and support entitlement)    7%
  Diagnosis (analyze machine state)                 11%
  Investigation (find source of problem)            35%
  Resolution and repair (walk user through repair)  18%
Traditional software solutions for automating problem management attempt to reduce these costs by adding value across a wide range of service levels. Forrester Research, Inc. of Cambridge, MA (www.forrester.com) provides a useful characterization of these service levels. Forrester Research divides traditional automated computer support solutions into five service levels: (1) Mass-Healing: resolving incidents before they occur; (2) Self-Healing: resolving incidents when they occur; (3) Self-Service: resolving incidents before a user calls; (4) Assisted-Service: resolving incidents when a user calls; and (5) Desk-side Visit: resolving incidents when all else fails. According to Forrester, the cost per incident using a traditional self-healing service is less than one dollar. However, the cost escalates rapidly, reaching more than 300 dollars per incident if a desk-side visit is ultimately required.
The goal of mass-healing is to resolve incidents before they occur. In traditional systems, this goal is pursued by making all PC configurations identical, or at a minimum by ensuring that a problem found on one PC cannot be replicated on any other PC. Traditional products typically associated with this service level consist of software distribution tools and configuration management tools. Security products such as anti-virus scanners, intrusion detection systems, and data integrity checkers are also considered part of this level, since they focus on preventing incidents from occurring.
Traditional products that attempt to address this service level operate by restricting the managed population to a small number of known good configurations and by detecting and eliminating a relatively small number of known bad configurations (e.g., virus signatures). The problem with this approach is that it assumes that (1) all good and bad configurations can be known ahead of time; and (2) once known, they remain relatively stable. As computers and networked systems grow more complex, the stability of any particular node in the network tends to decrease. Both the hardware and the software on any particular node may change frequently. For example, many software products can update themselves automatically using software patches accessed via an internal network or the Internet. Since there are an infinite number of good and bad configurations, and since they change frequently, these traditional self-healing products can never be more than partially effective.
Further, virus authors continue to develop ever more clever viruses. Traditional virus detection and eradication software depends on the ability to recognize known patterns in order to detect and eradicate a virus. However, as the number and complexity of viruses increase, the resources required to maintain a database of known viruses and of the fixes for those viruses, combined with the resources required to distribute those fixes to the population of nodes on a network, become overwhelming. In addition, a conventional PC running a Microsoft Windows operating system contains more than 7,000 system files and more than 100,000 registry keys, all of which are multi-valued. Accordingly, for all practical purposes, there may be an infinite number of good states and an infinite number of bad states, which makes the task of identifying the bad states all the more complicated.
The goal of the self-healing level is to sense and automatically correct problems before they result in a call to the help desk, and ideally before the user is even aware that a problem exists. Traditional self-healing tools have existed since the late 1980s, when Peter Norton introduced a suite of PC diagnostic and repair tools (www.symantec.com). These tools also include utilities that allow a user to restore a PC to a restore point set prior to the installation of a new product. However, none of the conventional tools works well under real-world conditions.
One fundamental problem with these conventional tools is the difficulty of creating a reference model with sufficient scope, granularity, and flexibility to allow "normal" to be reliably distinguished from "abnormal". Compounding the problem is the fact that the definition of "normal" must change constantly as new software updates and applications are deployed. This is a formidable technical challenge, and one that has yet to be conquered by any conventional tool.
The goal of the self-service level is to reduce the volume of help desk calls by providing a collection of automated tools and knowledge bases that enable end users to help themselves. Traditional self-service products consist of "how-to" knowledge bases and collections of software solutions that automate low-risk, repetitive support functions such as resetting forgotten passwords. These conventional solutions have a significant downside: they increase the likelihood of self-inflicted damage. For this reason, they are limited to specific types of problems and applications.
The goal of the assisted-service level is to enhance human efficiency by providing an automated infrastructure for managing service requests and by providing the ability to remotely control a personal computer and interact with end users. Traditional assisted-service products include help desk software, online reference materials, and remote control software.
Although the products at this service level are perhaps the most mature of the traditional products and solutions described here, they still fail to fully meet the needs of users and organizations. In particular, the ability of these products to diagnose problems automatically is severely limited, both in the types of problems that can be correctly identified and in the accuracy of the diagnosis (often multiple choice).
A desk-side visit becomes necessary when all else fails. This service level includes any "hands-on" activity that may be necessary to restore a computer that cannot be diagnosed or repaired remotely. It also includes tracking and managing these activities to ensure timely resolution. Of all the service levels, this one is the most likely to require significant time from highly trained, and therefore expensive, human resources.
Traditional products at this level consist of specialized diagnostic tools and software products that track the resolution of customer problems over time and potentially across multiple customer service representatives.
What is needed, then, is a paradigm shift that significantly reduces support costs. This shift will be characterized by the emergence of a new support model in which machines serve as the primary agents for making decisions and initiating actions.
Summary of the Invention
Embodiments of the present invention provide systems and methods for automated computer support. One method according to an embodiment of the present invention comprises: receiving a plurality of snapshots from a plurality of computers; storing the plurality of snapshots in a data store; and creating an adaptive reference model based at least in part on the plurality of snapshots. In another embodiment, a computer-readable medium (such as, for example, random access memory or a computer disk) comprises code for carrying out such a method.
These embodiments are mentioned not to limit or define the invention, but to provide examples of embodiments of the invention to aid understanding thereof. Illustrative embodiments are discussed in the Detailed Description, and further description of the invention is provided there. Advantages offered by the various embodiments of the present invention may be further understood by examining this specification.
Brief Description of the Drawings
These and other features, aspects, and advantages of the present invention are better understood when the following Detailed Description is read with reference to the accompanying drawings, wherein:
Fig. 1 shows an exemplary environment for implementing one embodiment of the present invention;
Fig. 2 is a block diagram showing the flow of information and actions in one embodiment of the present invention;
Fig. 3 is a flowchart showing the overall process of anomaly detection in one embodiment of the present invention;
Fig. 4 is a block diagram showing the components of an adaptive reference model in one embodiment of the present invention;
Fig. 5 is a flowchart showing a process for normalizing registry information on a client in one embodiment of the present invention;
Fig. 6 is a flowchart showing a method for identifying and responding to anomalies in one embodiment of the present invention;
Fig. 7 is a flowchart showing a process for identifying certain types of anomalies in one embodiment of the present invention;
Fig. 8 is a flowchart showing a process for generating an adaptive reference model in one embodiment of the present invention;
Fig. 9 is a flowchart showing a process for proactive anomaly detection in one embodiment of the present invention;
Fig. 10 is a flowchart showing a reactive process for anomaly detection in one embodiment of the present invention;
Fig. 11 is a screenshot of a user interface for creating an adaptive reference model in one embodiment of the present invention;
Fig. 12 is a screenshot of a user interface for managing an adaptive reference model in one embodiment of the present invention;
Fig. 13 is a screenshot of a user interface for selecting snapshots for use in creating a recognition filter in one embodiment of the present invention;
Fig. 14 is a screenshot of a user interface for managing recognition filters in one embodiment of the present invention;
Fig. 15 is a screenshot showing a user interface for selecting a "golden system" for use in a policy template in one embodiment of the present invention; and
Fig. 16 is a screenshot of a user interface for selecting policy template assets in one embodiment of the present invention.
Detailed Description
Embodiments of the present invention provide systems and methods for automated computer support. Referring now to the drawings, in which like numerals indicate like elements throughout the several figures, Fig. 1 is a block diagram showing an exemplary environment for implementing one embodiment of the present invention. The embodiment shown includes an automated support facility 102. Although the automated support facility 102 is shown as a single facility in Fig. 1, it may comprise multiple facilities or be incorporated into the site where the managed population resides. The automated support facility includes a firewall 104, in communication with a network 106, for providing security to the data stored within the automated support facility 102. The automated support facility 102 also includes a collection component 108. The collection component 108 provides, among other features, a mechanism for transferring data into and out of the automated support facility 102. The transfer routine may use a standard protocol, such as File Transfer Protocol (FTP) or Hypertext Transfer Protocol (HTTP), or may use a proprietary protocol. The collection component also provides the processing logic necessary to download, decompress, and parse incoming snapshots.
The automated support facility 102 shown also includes an analysis component 110, which is in communication with the collection component 108. The analysis component 110 comprises hardware and software for implementing the adaptive reference model described herein and for storing the adaptive reference model in the database component 112. The analysis component 110 retrieves adaptive reference models and snapshots from the database component 112, analyzes the snapshots in the context of the reference model, identifies and filters any anomalies, and transmits one or more response agents when appropriate. The analysis component 110 also provides the user interface for the system.
The embodiment shown also includes a database component 112, which is in communication with the collection component 108 and the analysis component 110. The database component 112 provides a means for storing data from the agents and data used by the processes that embodiments of the present invention perform. A primary function of the database component may be to store snapshots and adaptive reference models. It includes a set of database tables and the processing logic necessary to manage those tables automatically. The embodiment shown includes only one database component 112 and one analysis component 110. Other embodiments include multiple database components 112 and/or analysis components 110. One embodiment includes one database component and multiple analysis components, allowing multiple support personnel to share a single database while performing analysis tasks in parallel.
Embodiments of the present invention provide automated support to a managed population 114, which may comprise a plurality of client computers 116a, b. The managed population provides data to the automated support facility 102 via the network 106.
In the embodiment shown in Fig. 1, an agent component 202 is deployed within each monitored machine 116a, b. The agent component 202 gathers data from the client 116. At scheduled intervals (e.g., once per day) or in response to a command from the analysis component 110, the agent component 202 takes a detailed snapshot of the state of the machine on which it resides. This snapshot includes a detailed examination of all system files, designated application files, the registry, performance counters, processes, services, communication ports, hardware configuration, and log files. The results of each scan are then compressed and transmitted in the form of a snapshot to the collection component 108.
Each of the servers, computers, and network components shown in Fig. 1 includes a processor and a computer-readable medium. As is well known to those skilled in the art, an embodiment of the present invention may be configured in numerous ways, either by combining multiple functions into a single computer or, alternatively, by using multiple computers to perform a single task.
The processors used by embodiments of the present invention may include, for example, digital logic processors capable of processing input, executing algorithms, and generating output as necessary in support of processes according to the present invention. Such processors may include a microprocessor, an ASIC, and state machines. Such processors include, or may be in communication with, media, for example computer-readable media, that store instructions which, when executed by the processor, cause the processor to perform the steps described herein.
Embodiments of computer-readable media include, but are not limited to, an electronic, optical, magnetic, or other storage or transmission device capable of providing a processor, such as a processor in communication with a touch-sensitive input device, with computer-readable instructions. Other examples of suitable media include, but are not limited to, a floppy disk, CD-ROM, magnetic disk, memory chip, ROM, RAM, an ASIC, a configured processor, all optical media, all magnetic tape or other magnetic media, or any other medium from which a computer processor can read instructions. Also, various other forms of computer-readable media may transmit or carry instructions to a computer, including a router, private or public network, or other transmission device or channel, both wired and wireless. The instructions may comprise code from any computer programming language, including, for example, C, C#, C++, Visual Basic, Java, and JavaScript.
Fig. 2 is a block diagram showing the flow of information and actions in one embodiment of the present invention. The embodiment shown includes an agent component 202. The agent component 202 is the part of the system that is deployed within each monitored machine. It can perform three major functions. First, it can be responsible for gathering data. At scheduled intervals, in response to a command from the analysis component 110, or in response to events of interest detected by the agent component 202, the agent component 202 can perform an extensive scan of the client machine 116a, b. This scan may include a detailed examination of all system files, designated application files, the registry, performance counters, hardware configuration, logs, running tasks, services, network connections, and other relevant data. The results of each scan are compressed and transmitted over the network 106 in the form of a "snapshot" to the collection component 108.
In one embodiment, the agent component 202 reads every byte of each file to be examined and creates a digital signature, or hash, for each file. The digital signature identifies the exact contents of each file rather than simply providing metadata such as size and creation date. Some conventional viruses alter file header information in an attempt to fool systems that rely on metadata for detection. Because the signature depends on the file contents rather than on metadata, such an embodiment can successfully detect such viruses.
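The content-based signature described above can be sketched as follows. This is a minimal illustration only: the patent does not name a hash algorithm, so the use of SHA-256 and the function name `file_signature` are assumptions.

```python
import hashlib


def file_signature(path, chunk_size=65536):
    """Return a hex digest computed over every byte of the file.

    SHA-256 is an assumption; the source only says "digital signature or hash".
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files with the same size and timestamps but different contents yield different signatures, which is the property that metadata-only checks lack.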
A scan of the client by the agent component 202 can be resource-intensive. In one embodiment, a full scan is performed periodically, for example daily, at a time when the user is not using the client machine. In another embodiment, the agent component 202 performs an incremental scan of the client machine, recording only the changes since the last scan. In another embodiment, the agent component 202 performs scans on demand, providing a valuable tool for a technician or support person attempting to remedy an anomaly on the client machine.
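An incremental scan of the kind mentioned above can be pictured as a difference between two scans. The sketch below assumes each scan is a dictionary mapping asset names to asset values; that representation and the function name `incremental_changes` are hypothetical and are not taken from the patent.

```python
def incremental_changes(previous, current):
    """Return only what changed between two scans (dicts of asset name -> value)."""
    added = {name: value for name, value in current.items() if name not in previous}
    removed = sorted(name for name in previous if name not in current)
    modified = {
        name: value
        for name, value in current.items()
        if name in previous and previous[name] != value
    }
    return {"added": added, "removed": removed, "modified": modified}
```

Only the three small collections need to be transmitted, rather than the full scan result.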
The second major function performed by the agent component 202 is behavior blocking. The agent component 202 constantly (or substantially constantly) monitors access to critical system resources such as system files and the registry. It can selectively block access to these resources in real time to prevent damage from malicious software. While behavior monitoring occurs on an ongoing basis, behavior blocking can be enabled as part of a repair action. For example, if the analysis component 110 suspects the presence of a virus, it can download a repair action that causes the agent to block the virus from accessing critical information resources within the managed system. The agent component 202 provides information from the monitoring process as part of the snapshot.
The third major function performed by the agent component 202 is to provide an execution environment for response agents. Response agents are mobile software components that implement automated procedures to address various types of trouble conditions. For example, if the analysis component 110 suspects the presence of a virus, it can download a response agent that causes the agent component 202 to remove the suspicious assets from the managed system. The agent component 202 may run as a service or other background process on the computer being monitored. Because of the breadth and granularity of the information provided by embodiments of the present invention, repairs can be performed more accurately than with conventional systems. Although described in terms of client machines, the managed population 114 may comprise PC workstations, servers, or any other type of computer.
The embodiment shown also includes an adaptive reference model component 206. One difficult technical challenge in building an automated support product is creating a reference model that can be used to distinguish between normal and abnormal system states. The system state of a modern computer is determined by many multi-valued variables, and consequently there is a virtually infinite number of normal and abnormal states. To make matters worse, these variables change frequently as new software updates are deployed and as end users communicate. The adaptive reference model 206 in the embodiment shown analyzes snapshots from many computers and identifies statistically significant patterns using a generic data mining algorithm or a proprietary data mining algorithm designed specifically for this purpose. The resulting rule set is extremely rich (hundreds or thousands of rules) and is customized to the unique characteristics of the managed population. In the embodiment shown, the process of building a new reference model is completely automated and can be executed periodically, allowing the model to adapt to desired changes such as the planned deployment of a software update.
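To make the idea of learning "normal" from many snapshots concrete, the following sketch derives a very simple model from frequency counts. It is not the data mining algorithm of the patent, which is not specified here. The dictionary layout, the `min_support` threshold, and the function name `build_reference_model` are all assumptions chosen for illustration.

```python
from collections import Counter


def build_reference_model(snapshots, min_support=0.9):
    """Summarize which assets and values are common across a population.

    snapshots: list of dicts mapping asset name -> asset value.
    An asset is marked "expected" when it appears in at least min_support
    of the snapshots (a hypothetical criterion, not the patented one).
    """
    if not snapshots:
        raise ValueError("at least one snapshot is required")
    total = len(snapshots)
    presence = Counter()
    values = {}
    for snapshot in snapshots:
        for name, value in snapshot.items():
            presence[name] += 1
            values.setdefault(name, set()).add(value)
    return {
        name: {
            "expected": count / total >= min_support,
            "known_values": values[name],
        }
        for name, count in presence.items()
    }
```

Rebuilding the model from fresh snapshots after a planned software rollout lets the definition of normal follow the population, which is the adaptive behavior the paragraph above describes.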
Because the adaptive reference model 206 is used to analyze statistically significant patterns from a population of machines, in one embodiment a minimum number of machines is analyzed to ensure the accuracy of the statistical measures. In one embodiment, a minimum population of approximately 50 machines is tested to achieve systemically relevant patterns for analysis of the machines. Once a reference is established, samples can be used to determine whether anything abnormal is occurring within the entire population or within any member of the population.
In another embodiment, the analysis component 110 calculates a set of maturity metrics that enable the user to determine when a sufficient number of samples have been accumulated to provide accurate analysis. These maturity metrics indicate the percentage of available relationships at each level of the model that have met predetermined criteria corresponding to various levels of confidence (e.g., high, medium, and low). In one such embodiment, the user monitors the metrics and ensures that enough snapshots have been assimilated to create a mature model. In another such embodiment, the analysis component 110 assimilates samples until it reaches a set of maturity goals predetermined by the user. In either such embodiment, it is not necessary to assimilate a fixed number of samples (e.g., 50).
The embodiment shown in Fig. 2 also includes a policy template component 208. The policy template component 208 allows the service provider to insert rules manually into the adaptive reference model in the form of "policies". Policies are combinations of attributes (files, registry keys, etc.) and values that, when applied to a model, override a portion of the statistically generated information in the model. This mechanism can be used to automate a variety of common maintenance activities, such as verifying compliance with security policies and checking to ensure that the appropriate software updates have been installed.
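The override behavior of a policy can be sketched with plain dictionaries. Here the model is simplified to a mapping from attribute name to expected value; that simplification and the names `apply_policies` and `policy_violations` are hypothetical.

```python
def apply_policies(model, policies):
    """Return a copy of the model in which each policy value replaces the learned one."""
    merged = dict(model)
    merged.update(policies)  # manually defined policies take precedence
    return merged


def policy_violations(snapshot, policies):
    """List the policy attributes whose value in the snapshot differs from the required value."""
    return sorted(
        name for name, required in policies.items() if snapshot.get(name) != required
    )
```

For example, a policy such as `{"firewall.enabled": "1"}` would flag any machine whose snapshot lacks that attribute or reports another value, regardless of what the statistics say is common.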
When something goes wrong with a computer, it usually affects a number of different information assets (files, registry keys, etc.). For example, a "Trojan" might install malicious files, add certain registry keys to ensure that those files are executed, and open ports for communication. The embodiment shown in Fig. 2 detects these undesirable changes as anomalies by comparing the snapshot from the infected machine with the norm embodied in the adaptive reference model. An anomaly is defined as an asset that is unexpectedly present, an asset that is unexpectedly absent, or an asset that has an unknown value. Anomalies are matched against a library of recognition filters 216. A recognition filter 216 comprises a particular pattern of anomalies that indicates the presence of a particular root-cause condition or a generic class of conditions. A recognition filter 216 also associates the condition with a severity indication, a textual description, and a link to a response agent. In another embodiment, a recognition filter 216 can be used to identify and interpret benign anomalies. For example, if a user adds a new application that the administrator is confident will not cause any problems, a system according to the present invention will still report the new application as a set of anomalies. If the application is new, reporting the assets that it adds as anomalies is correct. However, the administrator can use a recognition filter 216 to interpret the anomalies produced by adding the application as benign.
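The three kinds of anomaly defined above (unexpectedly present, unexpectedly absent, unknown value) can be illustrated with a small comparison routine. The model layout used here (asset name mapped to an "expected" flag and a set of "known_values") and the function name `find_anomalies` are assumptions made for this sketch.

```python
def find_anomalies(snapshot, model):
    """Compare one snapshot (asset name -> value) with a reference model.

    model: asset name -> {"expected": bool, "known_values": set} (hypothetical layout).
    Returns a sorted list of (asset name, anomaly kind) tuples.
    """
    anomalies = []
    for name, value in snapshot.items():
        entry = model.get(name)
        if entry is None:
            anomalies.append((name, "unexpectedly present"))
        elif value not in entry["known_values"]:
            anomalies.append((name, "unknown value"))
    for name, entry in model.items():
        if entry["expected"] and name not in snapshot:
            anomalies.append((name, "unexpectedly absent"))
    return sorted(anomalies)
```

An asset that the model knows about but does not mark as expected is allowed to be missing, so only widely present assets produce "unexpectedly absent" reports.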
In embodiments of the present invention, certain attributes relate to continuous processes. For example, performance data comprises various counters. These counters measure the occurrence of various events during a particular time period. To determine whether the value of such a counter is normal across a population, one embodiment of the present invention computes a mean and a standard deviation. An anomaly is declared if the value of the counter deviates from the mean by more than a certain number of standard deviations.
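The mean and standard deviation test can be written as a short sketch. The threshold of three standard deviations, the use of the population standard deviation, and the name `counter_anomalies` are assumptions; the source says only "a certain number of standard deviations".

```python
from statistics import mean, pstdev


def counter_anomalies(values_by_machine, max_deviations=3.0):
    """Return the machines whose counter value lies too far from the population mean.

    values_by_machine: dict mapping machine name -> numeric counter value.
    """
    values = list(values_by_machine.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # every machine reports the same value
    return sorted(
        machine
        for machine, value in values_by_machine.items()
        if abs(value - mu) > max_deviations * sigma
    )
```

In LaTeX notation, a value $x$ is declared anomalous when $|x - \mu| > k\sigma$, where $k$ corresponds to `max_deviations`.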
In another embodiment, a mechanism handles the case in which the adaptive reference model 206 assimilates a snapshot containing an anomaly. Once the model reaches the desired level of maturity, it undergoes a process that removes anomalies that may have been assimilated. These anomalies are visible in a mature model as isolated exceptions to strong relationships. For example, if file A appears together with file B in 999 machines, but in 1 machine file A is present and file B is missing, the process can assume that the latter relationship is anomalous and remove it from the model. When the model is subsequently used for checking, any machine that contains file A but does not contain file B will be flagged as anomalous.
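The pruning step in the file A and file B example can be sketched as follows. The data layout (an antecedent mapped to counts of observed outcomes), the `min_strength` threshold of 0.99, and the name `prune_isolated_exceptions` are hypothetical choices for illustration.

```python
def prune_isolated_exceptions(observed, min_strength=0.99):
    """Drop rare outcomes that contradict an otherwise strong relationship.

    observed: dict mapping an antecedent (e.g. "fileA present") to a dict of
              outcome -> number of machines on which that outcome was seen.
    """
    pruned = {}
    for antecedent, outcomes in observed.items():
        if not outcomes:
            continue
        total = sum(outcomes.values())
        dominant, count = max(outcomes.items(), key=lambda item: item[1])
        if count / total >= min_strength:
            # Strong relationship: keep only the dominant outcome.
            pruned[antecedent] = {dominant: count}
        else:
            # No strong relationship: keep every observed outcome.
            pruned[antecedent] = dict(outcomes)
    return pruned
```

With 999 machines showing file B present and 1 showing it absent, the dominant outcome has a strength of 0.999, so the single exception is removed and later becomes detectable as an anomaly.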
The embodiment of the present invention shown in Fig. 2 also includes a response agent library 212. The response agent library 212 allows the service provider to create and store automated responses for specific trouble conditions. These automated responses are constructed from a collection of scripts that can be dispatched to a managed machine to perform actions such as replacing a file or changing a registry value. Once a trouble condition has been analyzed and a response agent has been defined, any subsequent occurrence of the same trouble condition should be corrected automatically.
Fig. 3 is a flowchart showing the overall process of anomaly detection in one embodiment of the present invention. In the embodiment shown, the agent component (202) takes a snapshot on a periodic basis, for example once a day 302. Taking the snapshot involves collecting a large amount of data and can take anywhere from a few minutes to several hours, depending on the configuration of the client. When the scan is complete, the results are compressed, formatted, and transmitted in the form of a snapshot to a secure server called the collection component 304. The collection component serves as a central repository for all snapshots submitted from the managed population. Each snapshot is then decompressed, parsed, and stored by the collection component in various tables of the database.
The detection function (218) uses the data stored in the adaptive reference model component (206) to check the contents of the snapshot against hundreds of statistical relationships that are known to be normal for the managed population 308. If no anomalies are found 310, the process ends 324.
If anomalies are found 310, the identification filters (210) are consulted to determine whether the anomalies match any known condition 312. If so, the anomalies are reported in terms of the diagnosed condition 314. Otherwise, the anomalies are reported as unrecognized anomalies 316. The identification filters (216) also indicate whether an automated response has been authorized for the particular type of condition 318.
In one embodiment, the identification filters (216) can identify and consolidate multiple anomalies. The process of matching identification filters to anomalies is performed after the entire snapshot has been analyzed and all anomalies associated with the snapshot have been detected. If a match is found between a subset of the anomalies and an identification filter, the name of the identification filter can be associated with that subset of anomalies in the output stream. For example, the presence of a virus may generate a group of file anomalies, process anomalies, and registry anomalies. An identification filter can be used to consolidate these anomalies so that the user simply sees a descriptive name that relates all of the anomalies to a probable common cause, namely a virus.
If an automated response is authorized, the response agent library (212) downloads the appropriate response agent to the affected machine 320. The agent component (202) on the affected machine then executes the script sequence needed to correct the fault condition 322. The process shown then ends 324.
Embodiments of the present invention significantly reduce the cost of maintaining a population of personal computers and servers. One embodiment accomplishes this by automatically detecting and correcting fault conditions before they escalate to the help desk, and by providing diagnostic information that shortens the time a support analyst needs to resolve any problem that is not handled automatically.
Anything that reduces the frequency of incidents has a significant positive impact on the cost of computer support. One embodiment of the present invention monitors and adjusts the state of managed machines so that they are more resistant to threats. Using policy templates, the service provider can routinely monitor the security posture of each managed system, automatically adjusting security settings and installing software updates to eliminate known vulnerabilities.
In a support model based on human effort, fault conditions are detected by the end user, reported to the help desk, and diagnosed by human experts. This process increases cost in several ways. First, there is the cost of lost productivity while the end user waits for a resolution. There is also the cost of data collection, which is typically performed by help desk personnel. In addition, there is the cost of diagnosis, which requires the services of a trained (expensive) support analyst. In contrast, a machine-based support model implemented according to the present invention automatically senses, reports, and diagnoses many software-related fault conditions. The adaptive reference model technology makes it possible to detect anomalous conditions in an extremely diverse and changing environment with a sensitivity and accuracy that were not previously possible.
In one embodiment of the present invention, to prevent false positives, the system can be configured to operate at various confidence levels, and identification filters can be used to filter out anomalies that are known to be benign. Identification filters can also be used to alert the service provider to the presence of particular types of undesirable or malicious software.
In conventional systems, computer incidents are usually resolved manually by applying a series of trial-and-error repair actions. These repair actions tend to be of the "sledgehammer" variety, that is, solutions that affect far more than the fault condition they are intended to correct. Multiple-choice repair procedures and sledgehammer solutions are the result of an insufficient understanding of the problem and a source of unnecessary cost. Because a system according to the present invention has the data to characterize the problem fully, it can reduce repair costs in two ways. First, if an identification filter has been defined that specifies the required automated response, the incident can be resolved automatically. Second, if automatic repair is not possible, the diagnostic capability of the system eliminates the guesswork inherent in the manual repair process, reducing execution time and allowing greater precision.
Fig. 4 is a block diagram showing the components of the adaptive reference model in one embodiment of the present invention. Fig. 4 is merely exemplary.
The embodiment shown in Fig. 4 depicts a multi-layer, single-silo adaptive reference model 402. In the embodiment shown, the silo 404 comprises three layers: the value layer 406, the cluster layer 408, and the profile layer 410.
The value layer 406 tracks the values of the asset/value pairs provided by the agent components (202) described herein in the managed population (114) of Fig. 1. When a snapshot is compared with the adaptive reference model 402, the value layer 406 of the adaptive reference model 402 evaluates the value portion of each asset/value pair contained in the snapshot. This evaluation comprises determining whether any asset value in the snapshot violates a statistically significant pattern of asset values within the managed population, as represented by the adaptive reference model 402.
For example, an agent (116b) transmits a snapshot that includes a digital signature for a particular system file. During the assimilation process (when the adaptive reference model is being constructed), the model records, for each asset name, the values it encounters and the number of times each value is encountered. Thus, for each asset name, the model knows the "legitimate" values it has seen in the population. When the model is used for checking, the value layer 406 determines whether the value of each attribute in the snapshot matches one of the "legitimate" values in the model. In the case of a file, for example, several "legitimate" values are possible because different versions of the file may exist in the managed population. If the model contains one or more file values that are statistically consistent, and the snapshot contains a file value that does not match any of the file values in the model, an anomaly is declared. The model can also detect the case in which no "legitimate" value exists for an attribute. Log files, for example, have no legitimate value because they change frequently. If no "legitimate" value exists, the attribute value in the snapshot is ignored during checking.
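The counting and checking behavior of the value layer might be sketched as follows. The class name `ValueLayer`, the asset names, and the `min_count` parameter are hypothetical; in particular, `min_count` is only a crude stand-in for whatever statistical test decides that a value is "legitimate".

```python
from collections import Counter

class ValueLayer:
    """Toy value layer: counts every value seen for each asset name."""

    def __init__(self, min_count=2):
        # min_count is an assumed stand-in for "statistically significant".
        self.min_count = min_count
        self.counts = {}

    def assimilate(self, snapshot):
        """Record each asset name/value pair of one snapshot."""
        for name, value in snapshot.items():
            self.counts.setdefault(name, Counter())[value] += 1

    def legitimate_values(self, name):
        seen = self.counts.get(name, {})
        return {v for v, n in seen.items() if n >= self.min_count}

    def check(self, snapshot):
        """Return asset names whose value matches no legitimate value."""
        anomalies = []
        for name, value in snapshot.items():
            legitimate = self.legitimate_values(name)
            # No legitimate value at all (e.g. a log file): ignore the asset.
            if legitimate and value not in legitimate:
                anomalies.append(name)
        return anomalies

layer = ValueLayer()
for i in range(5):
    layer.assimilate({"system.dll": "sig-v1", "app.log": f"hash-{i}"})
for i in range(3):
    layer.assimilate({"system.dll": "sig-v2", "app.log": f"hash-{i + 5}"})

suspect = layer.check({"system.dll": "sig-tampered", "app.log": "hash-99"})
clean = layer.check({"system.dll": "sig-v2", "app.log": "hash-100"})
```

Here both file versions are accepted, the tampered signature is reported, and the log file is ignored because none of its values ever repeats.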
In one embodiment, the adaptive reference model 402 applies criteria to ensure that an anomaly is truly an anomaly and not merely a new variant of a file. The criteria can include a confidence level. A confidence level does not prevent a unique file from being reported as an anomaly. Rather, the confidence level restricts the relationships in the model that are used during the checking process to those that satisfy certain criteria. The criteria associated with each level are designed to achieve a certain statistical probability. For example, in one embodiment, the criteria for the high confidence level are designed to achieve a statistical probability of greater than 90%. If a lower confidence level is specified, additional relationships that are less statistically reliable are included in the checking process. Considering plausible but less probable relationships is similar to the human process of speculation when we need to make a decision and do not have all of the information that would allow us to be certain. In a continuously changing environment, an administrator may wish to filter out the anomalies associated with low confidence levels, that is, the administrator may wish to eliminate as many false positives as possible.
In an embodiment that implements confidence levels, if a user reports that a fault has occurred on a machine but the administrator does not see any anomalies at the default confidence level, the administrator can lower the confidence level, allowing the analysis process to consider relationships that have lower statistical significance and are ignored at higher confidence levels. By lowering the confidence level, the administrator allows the adaptive reference model 402 to include patterns that may not have enough samples to be statistically significant but that may provide a clue as to what the problem is. In other words, the administrator is allowing the machine to speculate.
In another embodiment, if an asset value fails to exhibit any stable pattern after a specified number of snapshots have been assimilated, the value layer 406 eliminates the asset value from the adaptive reference model 402. For example, many applications generate log files. The values of log files change constantly and are rarely the same from machine to machine. In one embodiment, these file values are evaluated initially and, after a specified number of evaluations, are eliminated from the adaptive reference model 402. By eliminating these types of file values from the model 402, the system eliminates unnecessary comparisons during the detection process 218 and reduces database storage requirements by pruning low-value information.
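One plausible way to decide that a value shows no stable pattern is to compare the number of distinct values with the number of snapshots in which the asset was seen. The sketch below assumes that approach; the function name `unstable_assets` and both thresholds are illustrative choices, not values given in the text.

```python
def unstable_assets(value_counts, min_snapshots=10, max_ratio=0.5):
    """value_counts maps asset name -> {value: times_seen}.

    An asset is treated as unstable when, after at least min_snapshots
    observations, distinct values / observations exceeds max_ratio.
    Both thresholds are assumptions made for this sketch."""
    unstable = []
    for name, counts in value_counts.items():
        snapshots_seen = sum(counts.values())
        if snapshots_seen < min_snapshots:
            continue  # too early to judge this asset
        if len(counts) / snapshots_seen > max_ratio:
            unstable.append(name)
    return unstable

model = {
    "app.log": {f"hash-{i}": 1 for i in range(12)},   # never repeats
    "system.dll": {"sig-v1": 9, "sig-v2": 3},          # two stable versions
    "new.cfg": {"a": 1, "b": 1},                       # not enough data yet
}
pruned = unstable_assets(model)
```

Values returned by such a function could then be dropped from the model so that later checks skip them.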
Embodiments of the present invention are not limited to eliminating asset values from the adaptive reference model 402. In one embodiment, the process applies equally to asset names. Some asset names are "inherently unique", that is, they are unique to a particular machine but are a by-product of normal operation. In one embodiment, a separate process handles unstable asset names. In such an embodiment, this process identifies the inherently unique asset names and allows them to remain in the model so that they are not reported as anomalies.
The second layer shown in Fig. 4 is the cluster layer 408. The cluster layer 408 tracks relationships between asset names. An asset name can apply to a variety of entities, including a file name, a registry key name, a port number, a process name, a service name, a performance counter name, or a hardware characteristic. When a particular group of asset names is generally present together on the machines in the managed population (114), the cluster layer 408 can flag an anomaly if some members of that group of asset names are absent.
For example, many applications on a computer running the Microsoft Windows operating system require multiple dynamic link libraries (DLLs). Each DLL typically depends on one or more other DLLs. If the first DLL is present, the other DLLs must also be present. The cluster layer 408 tracks this dependency, and if one of the DLLs is missing or altered, the cluster layer 408 alerts the administrator that an anomaly has occurred.
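The missing-member check performed by the cluster layer can be illustrated with simple set operations. This sketch assumes the clusters have already been learned and are supplied as sets of asset names; the function name, cluster name, and file names are hypothetical, and a real model would weigh each relationship statistically rather than flag every partial cluster.

```python
def missing_cluster_members(clusters, snapshot_names):
    """Report clusters that are only partly present in a snapshot.

    clusters: {cluster_id: set of asset names that normally occur together}
    snapshot_names: set of asset names found on one machine."""
    anomalies = {}
    for cluster_id, members in clusters.items():
        present = members & snapshot_names
        # Flag a cluster only when some, but not all, members are present.
        if present and present != members:
            anomalies[cluster_id] = sorted(members - present)
    return anomalies

clusters = {"editor": {"editor.exe", "render.dll", "spell.dll"}}
healthy = missing_cluster_members(
    clusters, {"editor.exe", "render.dll", "spell.dll", "notes.txt"})
broken = missing_cluster_members(clusters, {"editor.exe", "render.dll"})
absent = missing_cluster_members(clusters, {"notes.txt"})
```

A machine on which the application is not installed at all produces no anomaly, because no member of the cluster is present.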
The third layer in the adaptive reference model 402 shown in Fig. 4 is the profile layer 410. The profile layer 410 in the embodiment shown detects anomalies based on violations of relationships between clusters. There are two types of relationship: association (clusters that appear together) and exclusion (clusters that never appear together). The profile layer 410 allows the adaptive reference model to detect missing assets, and conflicts between assets, that are not detected by the cluster layer. The profile layer 410 determines which clusters have strong association and exclusion relationships with one another. In such an embodiment, if a particular cluster is not detected in a snapshot in which it would normally be expected because of the presence of other clusters with which it has a strong association relationship, the profile layer 410 detects the absence of that cluster as an anomaly. Likewise, if a cluster is detected in a snapshot in which it would normally not be expected because of the presence of other clusters with which it has a strong exclusion relationship, the profile layer 410 detects the presence of the first cluster as an anomaly. The profile layer 410 allows the adaptive reference model 402 to detect anomalies that are undetectable at the lower levels of the silo 404.
The adaptive reference model 402 shown in Fig. 4 can be implemented in a variety of ways well known to those skilled in the art. By optimizing the processing of the adaptive reference model 402 and by providing sufficient processing and storage resources, embodiments of the present invention can support an unlimited number of managed populations and individual clients. Both the assimilation of a new model and the use of the model in checking involve comparing hundreds of attribute names and values. Performing these comparisons using text strings for the names and values is a very demanding processing task. In one embodiment of the present invention, each unique string in an incoming snapshot is assigned an integer identifier. Comparisons are then performed using the integer identifiers rather than the strings. Because a computer can compare integers much faster than the long strings associated with file names or registry key names, processing efficiency is greatly enhanced.
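Assigning an integer identifier to each unique string is commonly called string interning. A minimal sketch follows; the class name `StringIndex`, the helper `encode`, and the sample paths are assumptions made for illustration.

```python
class StringIndex:
    """Assigns a stable integer identifier to every distinct string."""

    def __init__(self):
        self._ids = {}       # string -> identifier
        self._strings = []   # identifier -> string

    def intern(self, text):
        ident = self._ids.get(text)
        if ident is None:
            # First occurrence: allocate the next identifier.
            ident = len(self._strings)
            self._ids[text] = ident
            self._strings.append(text)
        return ident

    def lookup(self, ident):
        return self._strings[ident]

def encode(snapshot, index):
    """Replace asset names and values with integer identifiers."""
    return {index.intern(name): index.intern(value)
            for name, value in snapshot.items()}

index = StringIndex()
machine_a = encode({"c:\\windows\\a.dll": "sig-v1",
                    "c:\\windows\\b.dll": "sig-v1"}, index)
machine_b = encode({"c:\\windows\\a.dll": "sig-v1",
                    "c:\\windows\\b.dll": "sig-v2"}, index)
# Names and values that differ between the two machines, found by
# comparing integers only.
differing = [index.lookup(k) for k in machine_a if machine_a[k] != machine_b[k]]
```

The index is the only place where the full strings are stored, which also reduces the space needed for frequently repeated names and values.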
The adaptive reference model 402 relies on data from the agent component (202). The functionality of the agent component (202) is described above in the summary of the user interface and features of the agent component (202) in one embodiment of the present invention.
Embodiments of the present invention can compare registry entries across the client machines in a managed population. One difficulty in comparing registry entries across different machines running the Microsoft Windows operating system stems from the use of globally unique identifiers ("GUIDs"). The GUID used for a particular item on one machine may differ from the GUID used for the same item on a second machine. Embodiments of the present invention therefore provide a mechanism for normalizing GUIDs for the purpose of comparison.
Fig. 5 is a flowchart showing a process for normalizing registry information on client machines in one embodiment of the present invention. In the embodiment shown, the GUIDs are first sorted into two groups 502. The first group contains GUIDs that are not unique (that is, they repeat) across the machines in the managed population. The second group contains GUIDs that are unique across machines, that is, the same key has a different GUID on different machines within the managed population. Next, the keys for the second group are sorted 504. In this way, relationships between two or more keys within the same machine can be identified. The goal is to normalize such relationships in a manner that allows them to be compared across multiple machines.
Next, the embodiment shown creates a hash of the values in the key 506. This creates a unique signature for all of the names, pathnames, and other values contained in the key. The GUID is then replaced with the hash 508. In this way, uniqueness is preserved within a machine, but the same hash appears on every machine, so that relationships can be identified. These relationships allow the adaptive reference model to identify anomalies within the managed population.
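The replacement of machine-specific GUIDs by a content hash might look like the following sketch. The text does not name a hash algorithm, so SHA-256 is an assumption here, as are the function names, the registry paths, and the way shared GUIDs are supplied as a set.

```python
import hashlib
import re

GUID_RE = re.compile(
    r"\{[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}\}")

def content_hash(values):
    """Hash the names and values stored under a key (SHA-256 is assumed)."""
    digest = hashlib.sha256()
    for name in sorted(values):
        digest.update(f"{name}={values[name]}\n".encode("utf-8"))
    return digest.hexdigest()[:16]

def normalize_key(path, values, shared_guids):
    """Replace machine-specific GUIDs in a key path with a content hash.

    GUIDs in shared_guids repeat across machines and are left unchanged."""
    def replace(match):
        guid = match.group(0)
        if guid.upper() in shared_guids:
            return guid
        return "{" + content_hash(values) + "}"
    return GUID_RE.sub(replace, path)

shared = {"{00000000-0000-0000-0000-000000000001}"}
values = {"InprocServer32": "C:\\Program Files\\App\\app.dll",
          "ThreadingModel": "Both"}
prefix = "HKLM\\Software\\Classes\\CLSID\\"
key_a = normalize_key(
    prefix + "{11111111-2222-3333-4444-555555555555}", values, shared)
key_b = normalize_key(
    prefix + "{AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE}", values, shared)
key_shared = normalize_key(
    prefix + "{00000000-0000-0000-0000-000000000001}", values, shared)
```

With identical key contents, `key_a` and `key_b` normalize to the same path even though the two machines used different GUIDs. This sketch would not handle values that themselves embed the GUID.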
For example, a conventional virus often changes registry entries so that the infected machine will run the executable file that spreads the virus. Because of its ability to normalize registry entries, an embodiment of the present invention can identify changes to the registry on one or more machines in the population.
Fig. 6 is a flowchart showing a method for identifying and responding to anomalies in one embodiment of the present invention. In the embodiment shown, a processor, such as the collection component (108), receives a plurality of snapshots from a plurality of computers 602. Although the following discussion describes the process shown in Fig. 6 as being performed by the analysis component (110), any suitable processor can perform the process shown. The plurality of snapshots can comprise as few as two snapshots from two computers. Alternatively, the plurality of snapshots can comprise thousands of snapshots. The snapshots comprise data about the computers in the population to be examined. For example, a plurality of snapshots can be received from each of the computers that communicate over an organization's local area network. Each snapshot comprises a set of asset/value pairs that represent the state of a computer at a particular point in time.
As the collection component (108) receives the snapshots, it stores them 604. Storing the snapshots can comprise storing them in a data store, such as the database (112) or a memory (not shown). The snapshots can be stored temporarily or permanently. In one embodiment of the present invention, the entire snapshot is stored in the data store. In another embodiment, only the portion of a snapshot that has changed from a previous version is stored (that is, an incremental snapshot).
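An incremental snapshot can be represented as the set of changed pairs plus the names that disappeared. The sketch below assumes a snapshot is a plain dictionary of asset names to values; the function names and the delta layout are hypothetical.

```python
def snapshot_delta(previous, current):
    """Return only what changed between two snapshots of one machine."""
    changed = {name: value for name, value in current.items()
               if name not in previous or previous[name] != value}
    removed = sorted(name for name in previous if name not in current)
    return {"changed": changed, "removed": removed}

def apply_delta(previous, delta):
    """Rebuild the full snapshot from the previous one plus a delta."""
    restored = {name: value for name, value in previous.items()
                if name not in delta["removed"]}
    restored.update(delta["changed"])
    return restored

monday = {"a.dll": "sig-1", "b.dll": "sig-2", "old.tmp": "sig-3"}
tuesday = {"a.dll": "sig-1", "b.dll": "sig-9", "new.exe": "sig-4"}
delta = snapshot_delta(monday, tuesday)
rebuilt = apply_delta(monday, delta)
```

Only `b.dll`, `new.exe`, and the removal of `old.tmp` need to be stored for the second day, and the full snapshot can still be reconstructed on demand.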
The analysis component (110) uses the data in the plurality of snapshots to create an adaptive reference model 606. Each of the snapshots comprises a plurality of assets, which comprise a plurality of pairs of asset names and asset values. An asset is an attribute of a computer, such as a file name, a registry key name, a performance parameter, or a communication port. The assets reflect the actual or virtual state of the computers within the population being analyzed. An asset value is the state of an asset at a particular point in time. For example, for a file, the value can comprise an MD5 hash that represents the contents of the file; for a registry key, the value can comprise a text string that represents the data assigned to the key.
The adaptive reference model likewise comprises a plurality of assets. The assets of the adaptive reference model can be compared with the assets of a snapshot to identify anomalies and for other purposes. In one embodiment, the adaptive reference model comprises a collection of data about various relationships between assets that characterize one or more normal computers at a particular point in time.
In one embodiment, the analysis component (110) identifies clusters of asset names. A cluster comprises one or more non-overlapping groups of asset names that appear together. The analysis component (110) can also attempt to identify relationships between clusters. For example, the analysis component (110) can compute a probability matrix that, given the presence of a particular cluster in a snapshot, predicts the likelihood that any other cluster is present in the snapshot. Probabilities that are based on a large number of snapshots and that are either very high (for example, greater than 95%) or very low (for example, less than 5%) can be used by the model to detect anomalies. Probabilities that are based on a small number of snapshots (that is, a number that is not statistically significant), or that are neither very high nor very low, are not used to detect anomalies.
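The conditional probabilities between clusters, and the rule for keeping only the strong ones, can be sketched as follows. The 95% and 5% cutoffs come from the examples in the text; the function name, the `min_snapshots` value of 100, and the labels "association" and "exclusion" are assumptions for this illustration.

```python
from itertools import permutations

def cluster_relationships(snapshots, min_snapshots=100, high=0.95, low=0.05):
    """Estimate P(b present | a present) for every ordered cluster pair and
    keep only strong relationships backed by enough snapshots.

    snapshots: list of sets, each holding the cluster ids found on a machine."""
    clusters = sorted(set().union(*snapshots))
    relationships = {}
    for a, b in permutations(clusters, 2):
        with_a = [s for s in snapshots if a in s]
        if len(with_a) < min_snapshots:
            continue  # too few samples to be statistically meaningful
        probability = sum(b in s for s in with_a) / len(with_a)
        if probability > high:
            relationships[(a, b)] = "association"
        elif probability < low:
            relationships[(a, b)] = "exclusion"
    return relationships

# Hypothetical population: clusters A and B always occur together, and
# neither ever occurs with cluster C.
snapshots = [{"A", "B"}] * 120 + [{"C"}] * 120
strong = cluster_relationships(snapshots)
```

A snapshot containing A without B, or A together with C, would then violate a strong relationship and could be reported as an anomaly.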
The adaptive reference model can comprise confidence criteria for determining when a relationship can be used to test a snapshot. For example, the confidence criteria can comprise a minimum threshold for the number of snapshots contained in the adaptive reference model. If the threshold is not exceeded, the relationship is not used. The adaptive reference model can also, or instead, comprise a minimum threshold for the number of snapshots contained in the adaptive reference model that include the relationship; the relationship is used only when the threshold is exceeded. In one embodiment, the adaptive reference model comprises a maximum threshold for the ratio of the number of distinct asset values to the number of snapshots containing the asset value. The adaptive reference model can also comprise one or more minimum and maximum thresholds associated with numeric asset values.
Each of the plurality of assets in the adaptive reference model or a snapshot can be associated with an asset type. Asset types can include, for example, files, registry keys, performance measures, services, hardware components, running processes, logs, and communication ports. Embodiments of the present invention can also use other asset types. To conserve space, asset names and asset values can be compressed. For example, in one embodiment of the present invention, the collection component (108) identifies the first occurrence of an asset name or asset value in one of the plurality of snapshots and generates an identifier associated with that first occurrence. Subsequently, if the collection component (108) identifies a second occurrence of the asset name or asset value, the collection component (108) associates the identifier with the second occurrence. The identifier and the asset name or asset value can then be stored in an index, and only the identifier is stored with the data in the adaptive reference model or snapshot. In this way, the space required to store frequently repeated asset names or values is minimized.
The adaptive reference model can be generated automatically. In one embodiment, the adaptive reference model is generated automatically and then manually revised to reflect the knowledge of technical support personnel or others. Figure 11 is a screenshot of a user interface for creating an adaptive reference model in one embodiment of the present invention. In the embodiment shown, the user selects the snapshots to be included in the model by moving them from the machine selection menu window 1102 to the task machine window 1104. When the user completes the selection process and clicks the Finish button 1106, an automated task is created that causes the model to be generated. Once the model has been created, the user can manage it using another interface screen. Figure 12 is a screenshot of a user interface for managing an adaptive reference model in one embodiment of the present invention.
Referring again to Fig. 6, once the adaptive reference model has been created, the analysis component (110) compares at least one of the plurality of snapshots with the adaptive reference model 608. For example, the collection component (108) can receive 100 snapshots and store them in the database component (112). The analysis component (110) uses the 100 snapshots to create the adaptive reference model. The analysis component (110) then begins comparing each of the plurality of snapshots with the adaptive reference model. After a period of time, the collection component (108) can receive 100 new snapshots from the agent components, which the analysis component can then use to generate a revised version of the adaptive reference model and to identify anomalies.
In one embodiment of the present invention, comparing one or more snapshots with the adaptive reference model comprises examining the relationships between asset names. For example, the probability that a first asset name is present may be very high when a second asset name is present. In one embodiment, the comparison comprises determining whether all of the asset names in the snapshot are present in the adaptive reference model and whether they are consistent with a plurality of high-probability relationships between asset names.
Still referring to Fig. 6, in one embodiment the analysis component (110) compares a snapshot with the adaptive reference model in order to identify any anomalies that may be present on a computer 610. An anomaly indicates that some portion of a snapshot deviates from normal as defined by the adaptive reference model. For example, an asset name or value can deviate from the normal asset names and asset values expected in a particular situation as defined by the adaptive reference model. An anomaly may or may not signal that a known or new fault or problem condition exists on, or in relation to, the computer with which the snapshot is associated. A condition is a group of related anomalies. For example, a group of anomalies can be related because they arise from a single root cause. For example, an anomaly can indicate the presence of a particular application on a computer when that application is not normally present on the other computers within a given population. The identification of anomalies can also be used for functions such as capacity balancing. For example, by evaluating performance measures of several servers, the analysis component (110) can determine when to trigger the automatic deployment and configuration of a new server to handle changing demand.
A condition comprises a group of related anomalies. For example, a group of anomalies can be related because they arise from a single root cause, such as the installation of an application or the presence of a "worm". Conditions can be grouped into condition classes. A condition class allows various conditions to be grouped with one another.
In the embodiment shown in Fig. 6, if anomalies are found, the analysis component (110) attempts to match the anomalies with identification filters in order to diagnose a condition 612. Anomalies can be identified as benign in order to eliminate noise during analysis, that is, to avoid obscuring real fault conditions with anomalies that exist as a result of normal operation. A check is a comparison of a snapshot with the adaptive reference model. A check can be performed automatically. The output of a check can comprise a set of anomalies and conditions that have been detected. In one embodiment, the anomalies are matched with a plurality of identification filters. An identification filter comprises a signature of a condition or of a condition class. For example, an identification filter can comprise a set of paired asset names and values that, taken together, represent the signature of a condition that it is desirable to identify, such as the presence of a worm. A generic identification filter can provide a template for creating more specific filters. For example, an identification filter suited to searching for worms in general can be adapted to search for a specific worm.
In one embodiment of the present invention, an identification filter comprises at least one of the following: an asset name associated with a condition, an asset value associated with a condition, a combination of an asset name and an asset value associated with a condition, a maximum threshold associated with an asset value and a condition, and a minimum threshold associated with an asset value and a condition. Asset name/value pairs from a snapshot can be compared with name/value pairs from an identification filter to find a match and diagnose a condition. A name/value match can be exact, or the identification filter can comprise wildcards, which allow a partial value to be entered in the identification filter and then matched with the snapshot. A particular asset name and/or value can match a plurality of identification filters in diagnosing a condition.
Identification filters can be created in various ways. For example, in one embodiment of the present invention, a user copies anomalies from a machine that has the condition of interest. The anomalies can be presented in an anomaly summary, from which they can be selected and copied into the filter. In another embodiment, the user enters a wildcard in the filter definition. For example, a piece of spyware called Gator generates thousands of registry entries that begin with the string "hklm software gator". Embodiments of the present invention can provide a wildcard mechanism to handle this situation efficiently. The wildcard can be, for example, a percent sign (%) and can be used before a text string, after a text string, or in the middle of a text string. Continuing the Gator example, if the user enters the string "hklm software gator %" in the filter body, the filter will identify any entry that begins with "hklm software gator". A user may also wish to construct a filter for a condition that has not yet been experienced in the managed population, for example a filter for a virus that is based on information publicly available on the Internet rather than on an actual instance of the virus within the managed population. To handle this situation, the user enters the relevant information directly into the filter.
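The percent-sign wildcard can be implemented by translating a filter pattern into a regular expression. The sketch below is one possible approach; the function names are hypothetical, the pattern `hklm\software\gator\%` assumes that the separators missing from the strings quoted above were backslashes, and case-insensitive matching is an assumption based on Windows registry key names not being case-sensitive.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a filter pattern in which % matches any run of characters."""
    literal_parts = (re.escape(part) for part in pattern.split("%"))
    return re.compile("^" + ".*".join(literal_parts) + "$",
                      re.IGNORECASE | re.DOTALL)

def filter_matches(pattern, asset_name):
    return wildcard_to_regex(pattern).match(asset_name) is not None

# Assumed form of the Gator pattern, with backslash separators.
gator = r"hklm\software\gator\%"
hit = filter_matches(gator, r"HKLM\Software\Gator\Settings\Id")
miss = filter_matches(gator, r"HKLM\Software\Other\Settings")
middle = filter_matches(r"hklm\%\run", r"hklm\software\vendor\run")
```

Because every literal part is escaped, characters such as the backslash or a period in an asset name are matched exactly and only `%` acts as a wildcard.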
Figure 13 is a screenshot of a user interface for selecting a snapshot to be used in creating an identification filter in one embodiment of the present invention. The user accesses the screen shown to select the snapshot to be used in creating the identification filter. Figure 14 is a screenshot of a user interface for creating or editing an identification filter in one embodiment of the present invention. In the embodiment shown, the assets of the snapshot selected in the interface shown in Figure 13 are displayed in the data source window 1402. The user selects these assets and copies them to the source window 1404 to create the identification filter.
In one embodiment, a match between an identification filter and a group of anomalies is associated with a quality measure. For example, an exact match of all of the asset names and asset values in an identification filter with the asset names and asset values in a group of anomalies can be associated with a higher quality measure than a match of only a subset of the asset names and asset values in the identification filter with the asset names and asset values in the group of anomalies.
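One simple quality measure is the fraction of a filter's name/value pairs that appear among the anomalies, so that an exact match scores 1.0 and a partial match scores less. The text does not specify a formula, so this fraction, the function names, the filter names, and the `min_quality` cutoff are all assumptions for the sketch.

```python
def match_quality(filter_pairs, anomaly_pairs):
    """Fraction of the filter's (name, value) pairs found in the anomalies."""
    if not filter_pairs:
        return 0.0
    return len(filter_pairs & anomaly_pairs) / len(filter_pairs)

def diagnose(filters, anomaly_pairs, min_quality=0.5):
    """Return (filter name, quality) for acceptable matches, best first."""
    scored = [(match_quality(pairs, anomaly_pairs), name)
              for name, pairs in filters.items()]
    return [(name, quality) for quality, name in sorted(scored, reverse=True)
            if quality >= min_quality]

filters = {
    "worm-x": {("file", "w.exe"), ("registry", "run\\w"), ("port", "4444")},
    "toolbar-y": {("file", "t.dll"), ("registry", "bho\\t")},
    "game-z": {("file", "z.exe"), ("file", "z.dat"), ("file", "z.cfg")},
}
anomalies = {("file", "w.exe"), ("registry", "run\\w"),
             ("port", "4444"), ("file", "t.dll")}
diagnosis = diagnose(filters, anomalies)
```

In this example the worm filter matches exactly, the toolbar filter matches on half of its pairs, and the unrelated filter is dropped.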
An identification filter can also comprise other attributes. For example, in one embodiment, an identification filter comprises a control flag for determining whether to include the asset names and asset values in the adaptive reference model. In another embodiment, an identification filter comprises one or more textual descriptions associated with one or more conditions. In yet another embodiment, an identification filter comprises a severity indicator that indicates the severity of a condition in terms of, for example, how much damage it can cause, how difficult it is to remove, or some other suitable measure.
An identification filter can comprise fields that are administrative in nature. For example, in one embodiment, an identification filter comprises an identification filter identifier, a creator name, and an update date and time.
Still referring to Fig. 6, the analysis component (110) next responds to the condition 614. Responding to the condition may include, for example: generating a notification, such as an email, to a support technician; submitting a trouble ticket to a problem-management system; requesting permission to take an action, for example requesting confirmation from a support technician before installing a patch; and removing the condition from at least one of the plurality of computers. Removing the condition may include, for example, causing a response agent program to execute on any of the plurality of computers affected by the condition. A condition may be associated with an automated response. The steps of diagnosing 612 and responding to the condition 614 may be repeated for each condition. Likewise, the process of finding anomalies 610 may be repeated for each individual snapshot.
In the embodiment shown in Fig. 6, the analysis component (110) next determines whether additional snapshots are to be analyzed 616. If so, the following steps are repeated for each snapshot: comparing the snapshot with the adaptive reference model 608; finding anomalies 610; matching the anomalies against identification filters to diagnose a condition 612; and responding to the condition 614. Once all snapshots have been analyzed, the process ends 618.
In one embodiment of the invention, once the analysis component (110) has identified a condition, it attempts to determine which of the plurality of computers within the population are affected by that condition. For example, the analysis component (110) may examine the snapshots to identify a particular set of anomalies. The analysis component (110) may then cause a response to the condition to be executed on behalf of each affected computer. For example, in one embodiment, an agent component (202) resides on each of the plurality of computers. The agent component (202) generates the snapshots that are evaluated by the analysis component (110). In such an embodiment, if the analysis component (110) identifies a condition on a computer, it uses the agent component (202) to execute a response program. In diagnosing a condition, the analysis component (110) may or may not be able to identify the root cause of the condition.
Fig. 7 is a flowchart illustrating a process for identifying certain types of anomalies in one embodiment of the present invention. In the embodiment shown, the analysis component (110) evaluates snapshots of a plurality of computers 702. These snapshots may be base snapshots, which contain the complete state of a computer, or delta snapshots, which contain the changes in a computer's state since the last base snapshot. The analysis component (110) uses the snapshots to create an adaptive reference model 704. Note that when delta snapshots are used for this purpose, the analysis component must first reconstitute the equivalent of a base snapshot by applying the changes described in the delta snapshot to the most recent base snapshot. The analysis component (110) subsequently receives a second snapshot (base or delta) for at least one of the plurality of computers 706. A snapshot may be created in response to a variety of events, such as the passage of a predetermined amount of time, the installation of a new program, or some other suitable event.
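Reconstituting a base snapshot from a delta snapshot can be sketched as follows. The delta layout used here (a "changed" mapping plus a "removed" list) is an assumption made for illustration; the text does not specify how a delta snapshot encodes its changes.

```python
def apply_delta(base, delta):
    """Return a new full snapshot: a copy of the base snapshot with the
    changes recorded in the delta snapshot applied to it."""
    snapshot = dict(base)                      # leave the base untouched
    snapshot.update(delta.get("changed", {}))  # added or modified assets
    for name in delta.get("removed", []):      # assets deleted since the base
        snapshot.pop(name, None)
    return snapshot


# Hypothetical asset names and values, for illustration only.
base = {"file:driver.sys": "hash-a1", "file:old.dll": "hash-b2"}
delta = {
    "changed": {"file:driver.sys": "hash-c3", "file:new.exe": "hash-d4"},
    "removed": ["file:old.dll"],
}
print(apply_delta(base, delta))
```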
The analysis component (110) compares the second snapshot with the adaptive reference model to attempt to detect anomalies. Various types of anomalies may exist on a computer. In the embodiment shown, the analysis component (110) first attempts to identify asset names that are unexpectedly absent 710. For example, all or substantially all of the computers within a population may contain a particular file. The presence of the asset name in the adaptive reference model records the existence of that file. If the file is unexpectedly missing from a computer within the population, that is, the asset name is not found, then some condition may be affecting the computer from which the file is missing. If an asset name is unexpectedly absent, the absence is identified as an anomaly 712. For example, an entry identifying the computer, the date, and the unexpectedly absent asset may be entered in a data store.
The analysis component (110) next attempts to identify asset names that are unexpectedly present 714. The presence of an unexpected asset name, such as a file name or registry key, may indicate the presence of a fault condition such as a computer worm. An asset name is unexpectedly present if it has never been seen before, or if it has never been seen before in the context in which it is found. If an asset name is unexpectedly present, its presence is identified as an anomaly 716.
The analysis component (110) next attempts to identify unexpected asset values 718. For example, in one embodiment, the analysis component (110) attempts to identify a string asset value that is unknown for the asset name associated with it. In another embodiment, the analysis component (110) compares a numeric asset value with a minimum or maximum threshold associated with the corresponding asset name. In embodiments of the invention, the thresholds may be set automatically based on the mean and standard deviation of the asset values within the population. In the embodiment shown, if an unexpected asset value is detected, it is identified as an anomaly 720. The process then ends 722.
Although the process in Fig. 7 is shown as a serial process, the comparison of a snapshot with the adaptive reference model and the identification of anomalies may occur in parallel. Likewise, each of the steps described may be repeated many times. Further, either a delta snapshot or a base snapshot may be compared with the adaptive reference model during each cycle.
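The three checks of Fig. 7 (unexpectedly absent asset names, unexpectedly present asset names, and unexpected asset values) can be sketched in a few lines. The model layout below, with hypothetical keys "required", "strings", "min" and "max", is an assumption made for illustration and is not the data structure of the described system.

```python
def find_anomalies(snapshot, model):
    """Compare one snapshot (asset name -> value) with a reference model and
    return a list of (kind, asset name, value) anomalies."""
    anomalies = []
    # Check 1: asset names the model expects but the snapshot lacks.
    for name, entry in model.items():
        if entry.get("required") and name not in snapshot:
            anomalies.append(("unexpectedly_absent", name, None))
    for name, value in snapshot.items():
        entry = model.get(name)
        # Check 2: asset names the model has never seen.
        if entry is None:
            anomalies.append(("unexpectedly_present", name, value))
            continue
        # Check 3: unexpected values, handled per value type.
        if isinstance(value, str):
            known = entry.get("strings")
            if known is not None and value not in known:
                anomalies.append(("unexpected_value", name, value))
        else:
            low, high = entry.get("min"), entry.get("max")
            if (low is not None and value < low) or (high is not None and value > high):
                anomalies.append(("unexpected_value", name, value))
    return anomalies


# Hypothetical model and snapshot, for illustration only.
model = {
    "file:driver.sys": {"required": True, "strings": {"hash-a1"}},
    "file:hosts": {"required": True, "strings": {"hash-h1"}},
    "perf:paging": {"min": 0, "max": 100},
}
snapshot = {"file:driver.sys": "hash-zz", "perf:paging": 500, "file:worm.exe": "hash-ff"}
for anomaly in find_anomalies(snapshot, model):
    print(anomaly)
```

The checks are independent of one another, which is consistent with the remark that they need not run serially.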
Once the analysis is complete, the analysis component (110) may generate a result such as an anomaly report. This report may further be provided to a user. For example, the analysis component (110) may generate a web page containing the results of comparing a snapshot with the adaptive reference model. Embodiments of the invention may provide a means for performing automated security audits, file and registry integrity checking, anomaly-based virus detection, and automated repair.
Fig. 8 is a flowchart illustrating a process for generating an adaptive reference model in one embodiment of the present invention. In the embodiment shown, the analysis component (110) accesses, via a database component, a plurality of snapshots from a plurality of computers. Each snapshot contains a plurality of pairs of asset names and asset values. The analysis component (110) automatically creates an adaptive reference model based at least in part on the snapshots.
The adaptive reference model may include any number of attributes, relationships, and measures of the various asset names and values. In the embodiment shown in Fig. 8, the analysis component (110) first finds one or more unique asset names and then determines the number of times each unique asset name occurs within the plurality of snapshots 804. For example, a file for a basic operating system driver may occur on substantially all of the computers within a population. The file name is a unique asset name; it occurs only once within a snapshot but is likely to occur in substantially all of the snapshots.
In the embodiment shown, the analysis component (110) next determines the unique asset values associated with each asset name 806. For example, the file name asset for the driver described in relation to step 804 will likely have the same value for every occurrence of that file name asset. In contrast, the file value for a log file will likely have as many different values as occurrences; that is, the log file on any particular computer will contain a different number of entries from the log file on every other computer in the population.
Because a population may be very large, in the embodiment shown in Fig. 8, if the number of unique values associated with an asset name exceeds a threshold 808, the determination is halted 810. In other words, in the log file example described above, whether a computer is in a normal state does not depend on the log file having a consistent value. The contents of a log file are expected to vary from computer to computer. Note, however, that the presence or absence of the log file may still be stored in the adaptive reference model as an indication of normality or abnormality.
In the embodiment shown in Fig. 8, the analysis component (110) next determines the unique string asset values associated with each asset name 812. For example, in one embodiment, there are only two types of asset values: strings and numbers. File hashes and registry key values are examples of strings; a performance counter value is an example of a number.
The analysis component (110) next determines statistical measures for the unique numeric values associated with an asset name 814. For example, in one embodiment, the analysis component (110) captures a performance measure such as memory paging. If one computer in a population pages memory frequently, this may indicate that a rogue program is executing in the background and consuming a large amount of memory. If, however, every computer or a substantial number of the computers in the population page memory frequently, this may indicate that the computers generally lack memory resources. In one embodiment, the analysis component (110) determines the mean and standard deviation of the numeric values associated with a unique asset name. In the memory example, if the memory paging measurement for one computer falls far outside the statistical mean for the population, an anomaly may be identified.
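The steps of Fig. 8 can be summarized in a minimal sketch: count how often each asset name occurs, collect the unique string values up to a cap, and compute the mean and standard deviation of numeric values. The function names, the cap of 50 unique values, and the three-standard-deviation cutoff are assumptions made for illustration; the text does not give specific numbers.

```python
from statistics import mean, pstdev


def build_reference_model(snapshots, max_unique_values=50):
    """Build a simple reference model from snapshots (asset name -> value)."""
    counts, strings, numbers = {}, {}, {}
    for snapshot in snapshots:
        for name, value in snapshot.items():
            counts[name] = counts.get(name, 0) + 1
            if isinstance(value, str):
                seen = strings.setdefault(name, set())
                if seen is not None:
                    seen.add(value)
                    # Too many distinct values (for example log contents):
                    # stop tracking values and keep only presence information.
                    if len(seen) > max_unique_values:
                        strings[name] = None
            else:
                numbers.setdefault(name, []).append(value)
    model = {}
    for name, count in counts.items():
        entry = {"frequency": count / len(snapshots)}
        if name in strings:
            entry["strings"] = strings[name]
        if name in numbers:
            entry["mean"] = mean(numbers[name])
            entry["stdev"] = pstdev(numbers[name])
        model[name] = entry
    return model


def is_numeric_outlier(value, entry, k=3.0):
    """True if value lies more than k standard deviations from the mean."""
    return abs(value - entry["mean"]) > k * entry["stdev"]
```

With four hypothetical computers reporting paging counts of 10, 12, 11 and 9, a later reading of 500 would be flagged by `is_numeric_outlier`, while a reading of 11 would not.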
In one embodiment of the invention, the adaptive reference model may be modified by applying a policy template. A policy template is a collection of asset/value pairs that are identified and applied to the adaptive reference model to establish a norm that reflects a specific policy. For example, a policy template may contain a plurality of pairs of asset names and asset values that are expected to be present in a normal computer. In one embodiment, applying the policy template comprises modifying the adaptive reference model so that the pairs of asset names and asset values present in the policy template appear to be present in each of the plurality of snapshots, that is, appear to be the normal state of the computers in the population.
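Applying a policy template can be sketched as overwriting model entries so that each templated asset/value pair looks as if it occurred in every snapshot. The model layout (keys "frequency", "strings", "policy") is hypothetical and chosen only for this illustration, and template values are assumed to be strings.

```python
def apply_policy_template(model, template):
    """Return a copy of the model in which every asset/value pair of the
    template is treated as present on every computer in the population."""
    updated = {name: dict(entry) for name, entry in model.items()}
    for name, value in template.items():
        updated[name] = {
            "frequency": 1.0,    # appears to occur in every snapshot
            "strings": {value},  # the only value considered normal
            "policy": True,      # marks the entry as policy-driven
        }
    return updated


# Hypothetical model and template, for illustration only.
model = {"service:antivirus": {"frequency": 0.6, "strings": {"running", "stopped"}}}
template = {"service:antivirus": "running"}
print(apply_policy_template(model, template))
```

After the template is applied, a computer whose antivirus service is stopped or missing no longer matches the model's norm, so an ordinary check against the model doubles as a policy compliance test.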
Figure 15 is a screenshot of a user interface for selecting a "golden system" for use in a policy template in one embodiment of the present invention. As described above, the user first selects the golden system on which the policy template is to be based. Figure 16 is a screenshot of a user interface for selecting policy template assets in one embodiment of the present invention. As with the user interface for creating an identification filter, the user selects assets from a data source window 1602 and copies them to a content window, the template content window 1604.
Fig. 9 is a flowchart illustrating a process for proactive anomaly detection in one embodiment of the present invention. In the embodiment shown, when an analysis occurs, the analysis component (110) establishes a connection to the database (112), which stores the snapshots to be analyzed 902. In the embodiment shown, only one database is used. In other embodiments, however, data from multiple databases may be analyzed.
Before a diagnostic check is performed, one or more reference models are created 904. The reference models are updated periodically, for example once a week, to ensure that the information they contain remains current. One embodiment of the present invention provides a task scheduler that allows model creation to be configured as a fully automated process.
Once a reference model has been created, it can be processed in various ways to enable different types of analysis. For example, a policy template may be defined 906 as described above. Such a template may require, for instance, that all machines in the managed population have antivirus software installed and running. Once a policy template has been applied to a model, diagnostic checks against that model include a test for policy compliance. Policy templates can be used in a variety of applications, including automated security audits, performance threshold checking, and software update management. A policy template contains a set of assets and values that can be forced into the model as the norm. In one embodiment, the template editing process is based on a "golden system" approach. A golden system is one that exhibits the assets and values a user wishes to incorporate into a template. The user locates the snapshot that corresponds to the golden system and then selects each asset/value pair to be included in the template.
In the process shown in Fig. 9, the policy template is then applied to the model to modify its definition of normal 908. This allows the model to be shaped so that compliance with user-defined policies can be checked as described herein.
A model may also be converted 910. The conversion process alters the reference model. For example, in one embodiment, the conversion process removes from the model any unique information assets, that is, any asset that occurs in one and only one snapshot. When a check is performed against a converted model, all unique information assets are reported as anomalies. Such a check is useful for surfacing previously unknown fault conditions that already exist when the agent component is first installed. Converted models are useful for establishing an initial baseline because they expose unique characteristics. For this reason, converted models are sometimes referred to as baseline models in embodiments of the invention.
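The conversion to a baseline model can be sketched as dropping every asset name that occurs in exactly one snapshot, so that a later check reports those assets as anomalies. The function names are hypothetical, and the sketch tracks asset names only, ignoring values, as a simplifying assumption.

```python
from collections import Counter


def baseline_asset_names(snapshots):
    """Return the asset names that remain in the converted (baseline) model:
    those occurring in more than one snapshot."""
    counts = Counter(name for snapshot in snapshots for name in snapshot)
    return {name for name, n in counts.items() if n > 1}


def unique_assets(snapshot, baseline):
    """Return the asset names in a snapshot that the baseline model lacks."""
    return sorted(name for name in snapshot if name not in baseline)


# Hypothetical snapshots from three computers, for illustration only.
snapshots = [
    {"file:driver.sys": "h1", "file:office.exe": "h2"},
    {"file:driver.sys": "h1", "file:office.exe": "h2", "file:toolbar.dll": "h9"},
    {"file:driver.sys": "h1"},
]
baseline = baseline_asset_names(snapshots)
print(unique_assets(snapshots[1], baseline))
```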
In another embodiment, the model building process removes from the model any information assets that match an identification filter, to ensure that known fault conditions are not incorporated into the model. When a system is first installed, the managed population quite commonly contains a number of known fault conditions that have not yet been noticed. It is important to find these conditions and remove them from the model, because otherwise they would be incorporated into the adaptive reference model as part of the normal state of a machine.
The agent component (202) takes a snapshot of the state of each managed machine on a scheduled basis 924. The snapshot is transmitted and entered into the database. Snapshots may also be generated on demand or in response to a particular event, such as the installation of an application.
In the proactive problem management process shown, a periodic check of the latest snapshots against the most recent reference model is performed 912. The output of the periodic check is a set of anomalies, which is displayed to the user as a result 914. The results also include any conditions identified by matching the anomalies against the identification filters. Identification filters may be defined 916 as described above. The anomalies are passed through the identification filters for interpretation, which yields a set of conditions. Conditions can range in severity from something as benign as a software update to something as serious as a Trojan.
The fault conditions that can occur in a computer change as the hardware and software components that make up the computer evolve. Consequently, there is a continuing need to define and share new identification filters as new combinations of anomalies are discovered. Identification filters can be viewed as a highly detailed and structured way of documenting fault conditions, and as such they represent an important mechanism for facilitating collaboration. The embodiment shown includes a mechanism for exporting identification filters to an XML file and importing identification filters from an XML file.
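A round trip of a filter through XML can be sketched with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`filter`, `description`, `asset`, `id`, `name`, `value`) are hypothetical; the text does not describe the actual XML schema.

```python
import xml.etree.ElementTree as ET


def filter_to_xml(filter_id, description, pairs):
    """Serialize a filter (identifier, description, asset/value pairs) to XML."""
    root = ET.Element("filter", {"id": filter_id})
    ET.SubElement(root, "description").text = description
    for name, value in pairs:
        ET.SubElement(root, "asset", {"name": name, "value": value})
    return ET.tostring(root, encoding="unicode")


def filter_from_xml(text):
    """Parse XML produced by filter_to_xml back into its three parts."""
    root = ET.fromstring(text)
    pairs = [(a.get("name"), a.get("value")) for a in root.findall("asset")]
    return root.get("id"), root.findtext("description"), pairs


# Hypothetical filter content, for illustration only.
xml_text = filter_to_xml(
    "F-001",
    "Example worm signature",
    [("file:evil.exe", "present"), ("registry:Run/evil", "evil.exe")],
)
print(xml_text)
```

A plain text format of this kind is easy to exchange between installations, which fits the stated goal of sharing filters.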
Once conditions have been identified, a report documenting the results of the proactive check is generated 920. The report may include, for example, a summary description of all the conditions detected or a detailed description of a particular condition.
Figure 10 is a flowchart illustrating a reactive process in one embodiment of the present invention. In the process shown in Figure 10, it is assumed that an adaptive reference model has already been created. The process shown begins when a user calls the help desk to report a problem 1002. In the traditional help desk paradigm, the next step would be to gather information verbally about the symptoms the user is experiencing. In contrast, in the embodiment of the invention shown, the next step is to run a diagnostic check of the suspect machine against the most recent snapshot 1003. If this does not produce an immediate diagnosis of the problem, three possibilities exist: (1) the condition has arisen since the last snapshot was taken; (2) the condition is new and is not recognized by the existing filters; or (3) the condition lies outside the scope of the analysis, for example a hardware problem.
If it is suspected that the condition has arisen since the last snapshot was taken, the user can cause the agent component (202) on the client computer to take another snapshot 1006. Once the resulting snapshot is available, a new diagnostic check can be performed 1004.
If it is suspected that the fault condition is new, the analyst can execute a compare function, which provides a breakdown of the changes in the state of a machine during a specific time window, such as a new application that may have been installed 1008. The user can also view a detailed representation of the state of a machine at various points in time 1010. If the analyst identifies a new fault condition, the user can define a set of assets as an identification filter for use in subsequent analyses 1012.
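A compare function of this kind can be sketched as a three-way difference between two snapshots of the same machine. The function name and the shape of the result are assumptions made for illustration.

```python
def compare_snapshots(earlier, later):
    """Break down the state changes between two snapshots of one machine."""
    added = {n: later[n] for n in later.keys() - earlier.keys()}
    removed = {n: earlier[n] for n in earlier.keys() - later.keys()}
    changed = {
        n: (earlier[n], later[n])
        for n in earlier.keys() & later.keys()
        if earlier[n] != later[n]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshots taken before and after a problem was reported.
monday = {"file:driver.sys": "h1", "file:old.dll": "h2", "perf:paging": 10}
friday = {"file:driver.sys": "h7", "file:newapp.exe": "h3", "perf:paging": 10}
print(compare_snapshots(monday, friday))
```

The assets listed under "added" and "changed" are natural candidates for the set of assets an analyst would turn into a new filter.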
While conventional products have focused on enhancing the efficiency of the human-based support model, embodiments of the present invention are designed around a different paradigm: a machine-based support model. This fundamental difference in approach manifests itself most profoundly in the areas of data collection and analysis. Because a machine rather than a human will analyze most of the data collected, the volume of data collected can be very large. For example, in one embodiment, the data collected from a single machine, referred to as a "health check" or snapshot of that machine, includes values for hundreds of attributes. The ability to collect a large volume of data gives embodiments of the invention a significant advantage over conventional systems in the number and variety of conditions that can be detected.
Another embodiment of the invention provides powerful analytical capabilities. The foundation for high-value analysis in such an embodiment is the ability to distinguish accurately between normal and abnormal conditions. For example, a system according to the invention synthesizes its reference model automatically by distilling statistically significant relationships from the snapshot data collected from its clients. The resulting "adaptive" reference model defines what is normal for that particular managed population at that particular moment in time.
One embodiment of the present of invention are in conjunction with above-described data aggregation and adaptive analysis feature.In such embodiments; the data collection capability of the brilliance that combines with the analysis ability of adaptive reference model is converted into some significant competitive advantages; comprise following ability: by implementing the safety examination every day and checking that software upgrading to eliminate weakness, provides the automatic protection to security threat.Such embodiment can also can scan the system of all management to find them on the basis in routine on one's own initiative before problem causes the throughput rate forfeiture or calls out Help Desk.
Embodiments of the invention that implement the adaptive reference model capability can also detect previously unknown fault conditions. Further, such an embodiment is synthesized and maintained automatically and requires few or no vendor updates to function. Such an embodiment is customized automatically to a particular managed population, which enables it to detect failure modes unique to that population.
An additional advantage of embodiments of the invention is that, where a fault condition cannot be resolved automatically, such an embodiment can provide a large amount of structured technical information to facilitate the work of support technicians.
One embodiment of the present of invention provide the ability of the problem of automatic reparation identification.Such embodiment, when with before the adaptive reference model of embodiment of description when combining, because of ability of its all aspects of identification fault state can be repaired uniquely automatically.
Embodiments of the invention also provide many advantages over conventional systems and methods with respect to the service levels described herein. For example, at the mass-recovery service level, preventing an incident is significantly less expensive than resolving it once damage has occurred. Embodiments of the invention significantly increase the percentage of incidents that can be detected or prevented without human intervention, in a manner that accommodates the diversity and dynamic nature of computers in real-world environments.
Further, by automatically detecting and repairing both known and unknown anomalies, embodiments of the invention can address the automatic-recovery service level. Embodiments that implement the adaptive reference model described herein are uniquely suited to automatic detection and repair. Automated service and repair also help to eliminate, or at least minimize, the need for assisted service and desk-side visits.
By providing superior diagnostic capabilities and extensive information resources, embodiments of the invention provide advantages at the assisted-service level. Embodiments collect and analyze large amounts of end-user data, serving a variety of needs associated with the human-based support model, including security audits, configuration audits, inventory management, performance analysis, and fault diagnosis.
The foregoing description of embodiments of the invention has been presented only for the purposes of illustration and description and is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Numerous modifications and adaptations thereof will be apparent to those skilled in the art without departing from the spirit and scope of the invention.

Claims (19)

1. A method for detecting abnormal system states in computers, comprising:
receiving snapshots from a plurality of computers in a computer population, wherein each snapshot comprises data indicating the state of the respective computer;
storing the plurality of snapshots in a data store;
automatically generating an adaptive reference model, the adaptive reference model comprising a rule set customized to the characteristics of the computer population, the rule set being developed by identifying patterns among the snapshots from the plurality of computers, such that the adaptive reference model represents the normal state of each computer in the population; and
comparing a snapshot from at least one computer with the adaptive reference model to determine whether an anomaly exists in the state of the at least one computer.
2. The method of claim 1, further comprising comparing the anomaly with an identification filter to diagnose a fault condition in the at least one computer; and
if a fault condition is diagnosed, generating an automated response to the fault condition.
3. The method of claim 2, wherein the identification filter comprises a particular pattern of anomalies that indicates the presence of a particular root-cause condition and of a generic class of conditions.
4. The method of claim 1, further comprising:
comparing a plurality of anomalies associated with a particular snapshot with an identification filter to diagnose a fault condition; and
diagnosing the fault condition on the at least one computer in response to at least a subset of the plurality of anomalies matching information in the identification filter.
5. The method of claim 1, wherein each snapshot comprises data associated with at least one of the following: system files, application files, registry entries, performance counters, processes, communication ports, hardware configuration, log files, running tasks, services, and network connections.
6. The method of claim 1, further comprising:
manually inserting rules into the rule set of the adaptive reference model to augment or override rules of the rule set that were generated automatically from the snapshots.
7. The method of claim 1, wherein the adaptive reference model is generated to include a value layer for determining whether an asset value contained in a snapshot is anomalous.
8. The method of claim 1, wherein the adaptive reference model is generated to include a cluster layer for tracking relationships between assets and for identifying an anomaly in response to the unexpected absence or presence of an asset within a set of assets in a snapshot.
9. The method of claim 1, wherein the adaptive reference model is generated to include a profile layer for identifying an anomaly in response to a violation of the relationships among asset clusters in a snapshot.
10. A system for detecting abnormal system states in computers, comprising:
a plurality of software agents residing respectively on a plurality of computers in a computer population, the software agents generating snapshots that comprise data indicating the state of the respective computers; and
an analysis component operable to automatically generate an adaptive reference model, the adaptive reference model comprising a rule set customized to the characteristics of the computer population, the rule set being developed by identifying patterns among the snapshots from the plurality of computers, such that the adaptive reference model represents the normal state of each computer in the population, wherein the analysis component compares a snapshot from at least one computer with the adaptive reference model to determine whether an anomaly exists in the state of the at least one computer.
11. The system of claim 10, wherein the analysis component compares the anomaly with an identification filter to diagnose a fault condition in the at least one computer.
12. The system of claim 11, wherein the analysis component compares the fault condition with a library of response agents and generates an automated response to the fault condition.
13. The system of claim 11, wherein the identification filter comprises a particular pattern of anomalies that indicates the presence of a particular root-cause condition and of a generic class of conditions.
14. The system of claim 10, wherein the analysis component compares a plurality of anomalies associated with a particular snapshot with an identification filter to diagnose a fault condition; and
diagnoses the fault condition on the at least one computer in response to at least a subset of the plurality of anomalies matching information in the identification filter.
15. The system of claim 10, wherein each snapshot comprises data associated with at least one of the following: system files, application files, registry entries, performance counters, processes, communication ports, hardware configuration, log files, running tasks, services, and network connections.
16. The system of claim 10, wherein the analysis component manually inserts rules into the rule set of the adaptive reference model to augment or override rules of the rule set that were generated automatically from the snapshots.
17. The system of claim 10, wherein the adaptive reference model comprises a value layer for determining whether an asset value contained in a snapshot is anomalous.
18. The system of claim 10, wherein the adaptive reference model comprises a cluster layer for tracking relationships between assets and for identifying an anomaly in response to the unexpected absence or presence of an asset within a set of assets in a snapshot.
19. The system of claim 10, wherein the adaptive reference model comprises a profile layer for identifying an anomaly in response to a violation of the relationships among asset clusters in a snapshot.
CN2004800281029A 2003-08-11 2004-08-11 Systems and methods for automated computer support Expired - Fee Related CN1860476B (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US49422503P 2003-08-11 2003-08-11
US60/494,225 2003-08-11
US10/916,800 2004-08-11
PCT/US2004/026186 WO2005020001A2 (en) 2003-08-11 2004-08-11 Systems and methods for automated computer support
US10/916,800 US20050038818A1 (en) 2003-08-11 2004-08-11 Systems and methods for creation and use of an adaptive reference model

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN2010101700582A Division CN101882102A (en) System for automated computer support 2003-08-11 2004-08-11

Publications (2)

Publication Number Publication Date
CN1860476A CN1860476A (en) 2006-11-08
CN1860476B true CN1860476B (en) 2010-06-23

Family

ID=37196005

Family Applications (2)

Application Number Title Priority Date Filing Date
CN 200480027940 Pending CN1856781A (en) 2003-08-11 2004-08-11 Systems and methods for automated computer support
CN2004800281029A Expired - Fee Related CN1860476B (en) 2003-08-11 2004-08-11 Systems and methods for automated computer support

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN 200480027940 Pending CN1856781A (en) 2003-08-11 2004-08-11 Systems and methods for automated computer support

Country Status (2)

Country Link
CN (2) CN1856781A (en)
ZA (2) ZA200601937B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102831010A (en) * 2012-08-30 2012-12-19 腾讯科技(深圳)有限公司 Method and device for opening unknown file
CN105745663B (en) * 2013-12-19 2018-11-16 英特尔公司 Protection system including the assessment of machine learning snapshot
US9904584B2 (en) * 2014-11-26 2018-02-27 Microsoft Technology Licensing, Llc Performance anomaly diagnosis
CN105302657B (en) * 2015-11-05 2020-12-15 网易宝有限公司 Abnormal condition analysis method and device
EP3255562A1 (en) * 2016-06-09 2017-12-13 Mastercard International Incorporated Method and systems for monitoring changes for a server system
CN113282485B (en) * 2021-04-25 2023-11-03 南京大学 Program automatic repairing method based on self-adaptive search

Also Published As

Publication number Publication date
CN1860476A (en) 2006-11-08
ZA200601937B (en) 2007-05-30
CN1856781A (en) 2006-11-01
ZA200601938B (en) 2007-05-30

Similar Documents

Publication Publication Date Title
CN101882102A (en) System for automated computer support
Kabinna et al. Examining the stability of logging statements
US9940190B2 (en) System for automated computer support
Li et al. Towards just-in-time suggestions for log changes
Bailis et al. Macrobase: Prioritizing attention in fast data
Lunt et al. A real-time intrusion-detection expert system (IDES)
Shar et al. Mining SQL injection and cross site scripting vulnerabilities using hybrid program analysis
US9652318B2 (en) System and method for automatically managing fault events of data center
US8214364B2 (en) Modeling user access to computer resources
AU2017274576B2 (en) Classification of log data
CN102812441A (en) Automated malware detection and remediation
Lin et al. Fast dimensional analysis for root cause investigation in a large-scale service environment
Zhang et al. Sentilog: Anomaly detecting on parallel file systems via log-based sentiment analysis
Cai et al. A real-time trace-level root-cause diagnosis system in alibaba datacenters
CN1860476B (en) Systems and methods for automated computer support
He et al. Tscope: Automatic timeout bug identification for server systems
Pecchia et al. Assessing invariant mining techniques for cloud-based utility computing systems
Sosnowski et al. Analyzing logs of the university data repository
US20240080332A1 (en) System and method for gathering, analyzing, and reporting global cybersecurity threats
Borse et al. MDAP: Module Dependency based Anomaly Prediction
Chakraborty Automated Extraction of Behaviour Model of Applications
Farshchi et al. Technical Report: Anomaly Detection for a Critical Industrial System using Context, Logs and Metrics

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C56 Change in the name or address of the patentee

Owner name: CHUNANMUFENTE CO.,LTD

Free format text: FORMER NAME: CHORUS SYSTEMS INC.

CP01 Change in the name or title of a patent holder

Address after: North Carolina

Patentee after: Chorus Systems Inc.

Address before: North Carolina

Patentee before: Chorus Systems Inc.

C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20100623

Termination date: 20130811