US11288153B2 - Self-healing computing device - Google Patents


Info

Publication number
US11288153B2
Authority
US
United States
Prior art keywords
issue
script
hardware component
solution
commands
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US16/905,592
Other versions
US20210397527A1 (en)
Inventor
Sasidhar Purushothaman
Ankush Sethi
Gowthaman Trichy Karuppusamy
Shikha Dixit
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of America Corp
Original Assignee
Bank of America Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bank of America Corp
Priority to US16/905,592
Assigned to BANK OF AMERICA CORPORATION. Assignment of assignors interest (see document for details). Assignors: DIXIT, SHIKHA; PURUSHOTHAMAN, SASIDHAR; SETHI, ANKUSH; TRICHY KARUPPUSAMY, GOWTHAMAN
Publication of US20210397527A1
Application granted
Publication of US11288153B2
Active
Adjusted expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/26 Functional testing
    • G06F11/263 Generation of test inputs, e.g. test vectors, patterns or sequences; with adaptation of the tested hardware for testability with external testers
    • G06F11/2635 Generation of test inputs, e.g. test vectors, patterns or sequences; with adaptation of the tested hardware for testability with external testers using a storage for the test inputs, e.g. test ROM, script files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2247 Verification or detection of system hardware configuration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2268 Logging of test results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G06F9/44505 Configuring for program initiating, e.g. using registry, configuration files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation

Definitions

  • the present disclosure relates generally to computing devices, and more specifically to self-healing computing devices.
  • the system disclosed in the present application provides a technical solution to the technical problems discussed above by monitoring for issues within a computer system and autonomously resolving these issues.
  • a computer system may experience error codes, data errors, data loss, slow response times, an increase in processor usage, a decrease in available memory, a decrease in available bandwidth, a decrease in data throughput, or any other type of decrease in performance.
  • the disclosed system provides the ability to detect and resolve any issues that affect the performance of the computer system.
  • the disclosed system provides several practical applications and technical advantages which include a process for using supervised learning to identify solutions for resolving issues within a computer system and generating scripts that can be used to resolve these issues in the future.
  • this process allows the computer system to learn which commands and operations are typically used to resolve an issue within the computer system and how to apply these commands to resolve future issues within the computer system.
  • this process enables a computer system to then autonomously detect and resolve future issues within the computer system.
  • the source for issues such as a decrease in performance may not be easily detectable. This means that the computer system will experience a decrease in performance, for example a decrease in throughput, until the source of the issue has been determined. Once the source of an issue has been identified, the computer system will need to be at least partially shut down to allow a network operator to make repairs to the computer system. This shutdown results in downtime during which the computer system may operate in a limited capacity.
  • the process disclosed in the present application allows the computer system to quickly detect an issue within the computer system, to identify a source of the issue, and to autonomously implement a solution to resolve the issue.
  • the computer system is able to reduce the amount of time that the computer system operates with degraded performance.
  • the computer system is able to reduce the amount of time it takes to resolve an issue within the computer system which reduces the amount of downtime that the computer system will experience.
  • the computer system is able to spend more time operating at its full capacity which means that the computer system can maintain a higher throughput and improve the utilization of the computer system.
  • the computer system is configured to test and vet solutions using a test environment and testing scripts before they are deployed within the computer system. This process reduces the likelihood of introducing new errors and issues into the system infrastructure after deploying a solution to resolve an issue.
  • the system comprises a system healing device that is configured to monitor the health and operational activity of a computing system infrastructure as the system infrastructure changes over time. This process allows the system healing device to detect and resolve issues as they arise within the system infrastructure. This process reduces the downtime due to diagnosing and resolving issues within the system infrastructure.
  • the system healing device is configured to operate in a learning phase to identify patterns and instructions that are used to resolve issues within the system infrastructure.
  • the system healing device is configured to use Application Programming Interfaces (APIs) to communicate with system components (e.g. software and hardware components) to determine operating characteristics of the system components.
  • the system healing device is further configured to generate solution scripts and testing scripts using supervised learning based on monitoring the actions that are taken by a network operator to resolve issues within the system infrastructure.
  • the solution scripts comprise instructions for resolving an issue.
  • the testing scripts comprise instructions for testing a solution script within a test environment before deploying the solution script within the system infrastructure.
  • the system healing device is configured to operate in an autonomous self-healing phase to begin detecting and resolving issues within the system infrastructure.
  • the system healing device is further configured to autonomously detect issues within the system infrastructure and to identify solution scripts and testing scripts for resolving the detected issue.
  • the system healing device is configured to execute the testing script to generate a test environment for determining which of the identified solution scripts best resolves the detected issue.
  • the system healing device is configured to deploy the selected solution script within the system infrastructure by executing the instructions or commands provided by the solution script. This process allows the system healing device to deploy solutions that have been tested and vetted before they are deployed within the system infrastructure. This process reduces the likelihood of introducing new errors and issues into the system infrastructure after deploying a solution to resolve an issue.
  • FIG. 1 is a schematic diagram of a self-healing computing system
  • FIG. 2 is a flowchart of an embodiment of a self-healing method
  • FIG. 3 is a schematic diagram of an embodiment of a device configured to implement self-healing.
  • FIG. 1 is a schematic diagram of a self-healing computing system 100 .
  • the system 100 comprises a system healing device 102 that is in signal communication with a computing system infrastructure 104 within a network 106 .
  • the network 106 may be any suitable type of wireless and/or wired network including, but not limited to, all or a portion of the Internet, an Intranet, a private network, a public network, a peer-to-peer network, the public switched telephone network, a cellular network, a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), and a satellite network.
  • the network 106 may be configured to support any suitable type of communication protocol as would be appreciated by one of ordinary skill in the art.
  • the system healing device 102 is configured to monitor the health and operational activity of the computing system infrastructure 104 as the system infrastructure changes over time. Initially, the system healing device 102 is configured to operate in a learning phase to identify patterns and instructions that are used to resolve issues within the system infrastructure 104 . The system healing device 102 is further configured to generate solution scripts 120 and testing scripts 122 using supervised learning based on monitoring the actions that are taken by a network operator to resolve issues within the system infrastructure.
  • the solution scripts comprise instructions for resolving an issue.
  • the testing scripts comprise instructions for testing a solution script within a test environment before deploying the solution script within the system infrastructure.
  • the system healing device 102 is configured to operate in an autonomous self-healing phase to begin detecting and resolving issues within the system infrastructure 104 .
  • the system healing device 102 is further configured to autonomously detect issues within the system infrastructure and to identify solution scripts 120 and testing scripts 122 for resolving the detected issue.
  • the system healing device 102 is configured to execute the testing script 122 to generate a test environment for determining which of the identified solution scripts 120 best resolves the detected issue.
  • the system healing device 102 is configured to deploy the selected solution script 120 within the system infrastructure 104 by executing the instructions or commands provided by the solution script 120 .
  • the computing system infrastructure 104 comprises a plurality of system components 108 .
  • System components 108 are hardware and software components that are configured to form a computing system. Examples of system components 108 include, but are not limited to, processors, databases, memories, database management tools, servers, clients, network devices, operating systems, applications, virtual machines, cloud services, development tools, or any other suitable type of hardware or software component.
  • the system healing device 102 is generally configured to monitor the health of the system infrastructure 104 and to autonomously resolve issues within the system infrastructure 104 .
  • the system healing device 102 is in signal communication with the system components 108 using Application Programming Interfaces (APIs) 110 which allow the system healing device 102 to monitor the operational activity and health of the system components 108 .
  • the system healing device 102 may use APIs to determine response times, processor utilization, memory utilization, bandwidth utilization, data throughput, available memory disk space, error codes, job failures, batch errors, or any other suitable type of information about a system component 108 .
  • the system healing device 102 comprises a monitoring engine 112 and a memory 114 .
  • the system healing device 102 may be configured as shown or in any other suitable configuration. Additional information about the hardware configuration of the system healing device 102 is described in FIG. 3 .
  • the memory 114 is configured to store system information 116 , system maintenance logs 118 , solution scripts 120 , testing scripts 122 , script maps 124 , and/or any other suitable type of data.
  • the system information 116 comprises information about the state or health of system components 108 and the overall system infrastructure 104 .
  • the system information 116 may comprise information about response times, processor utilization, memory utilization, bandwidth utilization, data throughput, available memory disk space, error codes, job failures, batch errors, or any other suitable type of information about a system component 108 or the overall system infrastructure 104 .
  • the system maintenance logs 118 comprise a sequence of commands and instructions that are used to resolve an issue.
  • the solution scripts 120 comprise executable commands for performing operations on one or more system components 108 .
  • the testing scripts 122 comprise executable commands for configuring a test environment to simulate one or more solution scripts 120 .
  • the script map 124 is configured to associate an issue with a solution script 120 and a testing script 122 that are associated with resolving the issue.
  • the monitoring engine 112 is generally configured to monitor the health and the operational activity of the system infrastructure 104 . Over time, the system components 108 within the system infrastructure 104 may be modified. For example, a network operator may add new system components 108 to the system infrastructure 104 , remove system components 108 from the system infrastructure 104 , or modify a configuration for a system component 108 .
  • the monitoring engine 112 is configured to monitor the health and operational activity of the system infrastructure 104 as the system infrastructure changes. This process allows the system healing device 102 to detect and to resolve issues as they arise within the system infrastructure 104 . This process reduces the downtime due to diagnosing and resolving issues within the system infrastructure 104 .
  • the monitoring engine 112 is configured to use APIs to communicate with system components 108 to determine operating characteristics of the system components 108 .
  • the monitoring engine 112 may be configured to periodically capture system information 116 that describes the state or health of system components 108 and the overall system infrastructure 104 .
  • the monitoring engine 112 may also be configured to collect system information 116 that comprises data traffic that can be used in a testing environment for resolving issues within the system infrastructure 104 .
  • the monitoring engine 112 operates in a learning phase to identify patterns and instructions that are used to resolve issues within the system infrastructure 104 .
  • the monitoring engine 112 is further configured to generate solution scripts 120 and testing scripts 122 using supervised learning based on monitoring the actions that are taken by a network operator to resolve issues within the system infrastructure 104 .
  • the solution scripts 120 comprise instructions for resolving an issue.
  • the testing scripts 122 comprise instructions for testing a solution script 120 within a test environment before deploying the solution script 120 within the system infrastructure 104 .
  • the monitoring engine 112 is configured to operate in an autonomous self-healing phase.
  • the monitoring engine 112 begins autonomously detecting and resolving issues within the system infrastructure 104 .
  • the monitoring engine 112 is further configured to autonomously detect issues within the system infrastructure 104 and to identify solution scripts 120 and testing scripts 122 for resolving the detected issue.
  • the monitoring engine 112 is configured to execute the testing script 122 to generate a test environment for determining which of the identified solution scripts 120 best resolves the detected issue.
  • the monitoring engine 112 is configured to deploy the selected solution script 120 within the system infrastructure 104 by executing the instructions or commands provided by the solution script 120 .
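  • as a rough illustration only, the autonomous self-healing phase described above can be sketched in Python as a single pass that detects an issue, looks up candidate solution scripts, simulates each one, and deploys the best. The function name `self_heal_once` and the four callables passed to it are hypothetical placeholders and are not part of the disclosure; scoring each simulation with a single number is likewise an assumption.

```python
from typing import Callable, Dict, List, Optional


def self_heal_once(
    detect: Callable[[], Optional[str]],
    find_solutions: Callable[[str], List[str]],
    simulate: Callable[[str], float],
    deploy: Callable[[str], None],
) -> Optional[str]:
    """Run one detect -> look up -> simulate -> deploy pass.

    Returns the identifier of the deployed solution script, or None when
    no issue was detected or no candidate script is known for the issue.
    """
    issue_id = detect()
    if issue_id is None:
        return None  # nothing to heal; the caller keeps monitoring
    candidates = find_solutions(issue_id)
    if not candidates:
        return None  # unknown issue; left for a network operator
    # Simulate every candidate in the test environment before touching
    # the live infrastructure, then deploy the highest-scoring one.
    scores: Dict[str, float] = {sid: simulate(sid) for sid in candidates}
    best = max(scores, key=lambda sid: scores[sid])
    deploy(best)
    return best
```

Passing the four stages in as callables keeps the sketch independent of any particular monitoring API or test environment.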
  • An example of the monitoring engine 112 in operation is described in FIG. 2 .
  • FIG. 2 is a flowchart of an embodiment of a self-healing method 200 .
  • the system healing device 102 may employ method 200 to monitor the health of the system infrastructure 104 and to autonomously resolve issues within the system infrastructure 104 over time using solution scripts 120 and testing scripts 122 .
  • the system healing device 102 may employ method 200 in conjunction with a DevOps toolchain to monitor the health of the system infrastructure 104 as the system components 108 within the system infrastructure 104 change over time. This process allows the system healing device 102 to detect and resolve issues as they arise within the system infrastructure 104 . This process also reduces the downtime due to diagnosing and resolving issues within the system infrastructure 104 .
  • the system healing device 102 is configured to provide an interface (e.g. a graphical user interface) that allows a network operator to monitor the health of the system infrastructure 104 and to modify the system components 108 within the system infrastructure 104 .
  • a network operator may use the system healing device 102 to add new system components 108 to the system infrastructure 104 , to remove system components 108 from the system infrastructure 104 , or to modify a configuration for a system component 108 .
  • the system healing device 102 may allow a user to add a new system component 108 , to remove a system component 108 , or to modify settings of a system component 108 .
  • the system healing device 102 may receive a device configuration for a new hardware device from a user using a graphical user interface (e.g. a web portal). In this example, the system healing device 102 will use the received device configuration to configure the new system component 108 to integrate the new system component 108 with the system infrastructure 104 . As another example, the system healing device 102 may receive a device configuration for reconfiguring a hardware device from a user. In this example, the system healing device 102 will use the received device configuration to reconfigure and modify the operation of the system component 108 .
  • the monitoring engine 112 monitors the operational activity of system components 108 within the system infrastructure 104 .
  • the monitoring engine 112 is configured to periodically measure the performance of the system infrastructure 104 .
  • the monitoring engine 112 may periodically collect data about the number of system components 108 , the types of system components 108 , the operating conditions of the system components 108 , and/or data activity within the system infrastructure 104 .
  • the monitoring engine 112 is configured to use APIs to determine the current operating conditions of the system components 108 .
  • the monitoring engine 112 may periodically send API calls to request information about the current operating conditions of one or more system components 108 .
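  • the periodic monitoring described above might look like the following Python sketch. It assumes, purely for illustration, that each system component's API can be modeled as a callable that returns a dictionary of numeric metrics; the names `poll_components` and `collect_periodically`, the `unreachable` marker, and the injectable `sleep` parameter are all hypothetical.

```python
import time
from typing import Callable, Dict, List

Metrics = Dict[str, float]
ComponentApi = Callable[[], Metrics]  # hypothetical stand-in for a real API call


def poll_components(apis: Dict[str, ComponentApi]) -> Dict[str, Metrics]:
    """Send one API call per component and collect its operating conditions."""
    snapshot: Dict[str, Metrics] = {}
    for name, call_api in apis.items():
        try:
            snapshot[name] = call_api()
        except Exception:
            # A component that cannot be reached is recorded rather than
            # aborting the whole polling cycle.
            snapshot[name] = {"unreachable": 1.0}
    return snapshot


def collect_periodically(
    apis: Dict[str, ComponentApi],
    cycles: int,
    interval_s: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict[str, Metrics]]:
    """Poll all components `cycles` times, waiting `interval_s` between polls."""
    history: List[Dict[str, Metrics]] = []
    for cycle in range(cycles):
        history.append(poll_components(apis))
        if cycle < cycles - 1:
            sleep(interval_s)
    return history
```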
  • the monitoring engine 112 detects an issue within the system infrastructure 104 based on the current operating conditions of the system components 108 .
  • the monitoring engine 112 detects an issue or error that is associated with one or more system components 108 within the system infrastructure 104 .
  • the monitoring engine 112 may detect an issue within the system infrastructure 104 based on a decrease in the performance of one or more system components 108 over a predetermined period of time. For instance, the monitoring engine 112 may compare differences in the operating characteristics of a system component 108 to predetermined threshold values to determine whether the system component 108 is experiencing a decrease in performance.
  • the monitoring engine 112 may detect an increase in processor usage, a decrease in available memory, a decrease in available bandwidth, a decrease in data throughput, or any other type of decrease in performance for a system component 108 .
  • the monitoring engine 112 may detect an issue within the system infrastructure 104 in response to receiving an error code from one or more of the system components 108 .
  • the monitoring engine 112 may detect an issue after integrating the new system component 108 with the system infrastructure 104 , removing a system component 108 from the system infrastructure 104 , or modifying a system component 108 within the system infrastructure 104 .
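  • a minimal Python sketch of the threshold comparison described above follows. It assumes that the metrics for one system component are sampled over the predetermined period and that the change between the first and last sample is compared to per-metric limits; the `Threshold` class, its field names, and the choice to compare only the window endpoints are illustrative assumptions, not details from the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Threshold:
    metric: str
    max_increase: Optional[float] = None  # e.g. processor usage rising
    max_decrease: Optional[float] = None  # e.g. data throughput falling


def detect_degraded_metrics(
    samples: List[Dict[str, float]], thresholds: List[Threshold]
) -> List[str]:
    """Return the metrics whose change over the sample window exceeds a limit."""
    if len(samples) < 2:
        return []  # a change cannot be measured from a single sample
    first, last = samples[0], samples[-1]
    degraded: List[str] = []
    for t in thresholds:
        if t.metric not in first or t.metric not in last:
            continue
        delta = last[t.metric] - first[t.metric]
        rose_too_much = t.max_increase is not None and delta > t.max_increase
        fell_too_much = t.max_decrease is not None and -delta > t.max_decrease
        if rose_too_much or fell_too_much:
            degraded.append(t.metric)
    return degraded
```

An error code received from a system component would bypass this comparison and could be treated as an issue directly.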
  • the monitoring engine 112 identifies solution steps and test cases that are used to resolve the issue.
  • the monitoring engine 112 may be configured to use a system maintenance log 118 to identify actions and test cases that a network operator uses to resolve the issue.
  • the system maintenance log 118 may comprise a sequence of commands and instructions that are used to resolve the issue.
  • the system maintenance log 118 may comprise commands for clearing caches, clearing logs, clearing disk space, rebooting a system component 108 , load balancing data among system components 108 , data prioritization, modifying network settings, modifying settings of a system component 108 , or any other suitable type of commands that are sent to a system component 108 .
  • the system maintenance log 118 may also comprise settings or instructions for configuring a test environment that is used to simulate and test the solution that is used to resolve the issue.
  • the test environment is configured to simulate the effect on the system infrastructure 104 in response to sending commands to one or more hardware components 108 .
  • a network operator may build a test environment that uses synthetic data or previously stored data activity from the system infrastructure 104 for testing a solution to the detected issue.
  • the monitoring engine 112 identifies the settings for the test environment and the commands that were sent to one or more system components 108 to simulate and test a solution for resolving the issue.
  • the monitoring engine 112 may be configured to monitor the operations and actions that are performed by a network operator to identify solution steps and test cases that are used to resolve the issue. For instance, the monitoring engine 112 may monitor the actions that are performed by the network operator while the network operator uses one or more DevOps tools to resolve the detected issue. In this case, the monitoring engine 112 may be configured to monitor the operational activity of the system infrastructure 104 as a network operator resolves the issue. For example, the monitoring engine 112 may use APIs to identify commands and instructions that are used by the network operator to resolve the detected issue. The monitoring engine 112 may also use APIs to identify settings or instructions for configuring and operating a test environment that is used to simulate and test a solution for resolving the issue.
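  • to make the log-based identification above concrete, the following Python sketch extracts an ordered list of solution steps from a system maintenance log. The log format (`target: command` on each line, with `#` marking comments) is invented for this example, since the disclosure does not specify one, and the names `Step` and `parse_maintenance_log` are hypothetical.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    target: str   # system component the command was sent to
    command: str  # command issued by the network operator


def parse_maintenance_log(log_text: str) -> List[Step]:
    """Extract (target, command) steps in the order they appear in the log."""
    steps: List[Step] = []
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        target, separator, command = line.partition(":")
        if not separator or not command.strip():
            continue  # skip lines that do not match the assumed format
        steps.append(Step(target.strip(), command.strip()))
    return steps
```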
  • the monitoring engine 112 generates a solution script 120 based on the identified solution steps.
  • the solution script 120 comprises machine-executable commands for performing operations on one or more system components 108 .
  • the monitoring engine 112 generates a solution script 120 by formatting or converting the identified commands that were sent to system components 108 in step 206 into machine-executable instructions or commands that can be executed by one or more system components 108 .
  • the solution script 120 captures the commands that are used, where the commands are sent, and the sequence in which the commands are sent.
  • the monitoring engine 112 may link the solution script 120 with an identifier 128 that uniquely identifies the solution script 120 .
  • the identifier 128 may be an alphanumeric identifier or any other suitable type of identifier.
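  • one possible way to represent a generated solution script 120 and its identifier 128 is sketched below in Python. The `SolutionScript` class, the `SOL-` prefix, and the choice to derive the identifier from a SHA-256 hash of the command sequence are assumptions made for illustration; the disclosure only requires that the identifier uniquely identify the script.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class SolutionScript:
    identifier: str
    # (target component, command) pairs, kept in the order they were sent
    commands: Tuple[Tuple[str, str], ...]


def generate_solution_script(steps: Iterable[Tuple[str, str]]) -> SolutionScript:
    """Freeze the identified steps into a script and derive an identifier."""
    commands = tuple((target, command) for target, command in steps)
    # Hashing the ordered commands means identical solutions map to the
    # same identifier. This is a design choice of the sketch only.
    digest = hashlib.sha256(repr(commands).encode("utf-8")).hexdigest()
    return SolutionScript(identifier="SOL-" + digest[:8].upper(), commands=commands)
```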
  • the monitoring engine 112 generates a testing script 122 based on the identified test cases.
  • the testing script 122 comprises machine-executable commands for configuring a test environment and simulating one or more solution scripts 120 .
  • the monitoring engine 112 generates a testing script 122 by converting the test environment settings and instructions that were identified in step 206 into machine-executable instructions or commands that can be used to configure a test environment and executable instructions or commands that can be executed by the test environment to simulate one or more solution scripts 120 .
  • the monitoring engine 112 may link the testing script 122 with an identifier 130 that uniquely identifies the testing script 122 .
  • the identifier 130 may be an alphanumeric identifier or any other suitable type of identifier.
  • the monitoring engine 112 links the solution script 120 and the testing scripts 122 in the script map 124 .
  • the monitoring engine 112 creates an entry in the script map 124 that links an issue with a solution script 120 and a testing script 122 that are associated with resolving the issue.
  • the monitoring engine 112 may first associate the issue that was detected in step 204 with an issue identifier 126 .
  • the issue identifier 126 may be an error code or any other suitable type of identifier that uniquely identifies a type of issue.
  • the issue identifier 126 may be uniquely associated with a particular system component 108 type and an error type.
  • the monitoring engine 112 then links the issue identifier 126 with the identifier 128 for the solution script 120 and the identifier 130 for the testing script 122 that was generated in steps 208 and 210 , respectively, in the script map 124 .
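  • the linking step above can be sketched as a small in-memory mapping in Python. The `ScriptMap` class, its method names, and the `component_type:error_type` format produced by `make_issue_id` are hypothetical; the disclosure does not prescribe a storage format for the script map 124 .

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def make_issue_id(component_type: str, error_type: str) -> str:
    """Build an issue identifier from a component type and an error type."""
    return f"{component_type}:{error_type}"


class ScriptMap:
    """Associates an issue identifier with (solution id, testing id) pairs."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def link(self, issue_id: str, solution_id: str, testing_id: str) -> None:
        entry = (solution_id, testing_id)
        if entry not in self._entries[issue_id]:  # avoid duplicate entries
            self._entries[issue_id].append(entry)

    def lookup(self, issue_id: str) -> List[Tuple[str, str]]:
        # .get() avoids creating an empty entry for an unknown issue
        return list(self._entries.get(issue_id, []))
```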
  • the monitoring engine 112 may repeat steps 202 - 212 for a predetermined amount of time or a predetermined number of iterations which allows the monitoring engine 112 to build a repository of solution scripts 120 and testing scripts 122 that can be used to resolve future issues that are detected within the system infrastructure 104 .
  • the monitoring engine 112 may begin using steps 214 - 224 to autonomously detect and resolve issues within the system infrastructure 104 .
  • the monitoring engine 112 may continue to run steps 202 - 210 in parallel after enabling the ability to autonomously detect and resolve issues within the system infrastructure 104 . This option allows the monitoring engine 112 to continue to build the repository of solution scripts 120 and testing scripts 122 .
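  • the transition from the learning phase to the autonomous phase could be tracked with a simple counter, as in the Python sketch below. It covers only the "predetermined number of iterations" option, and the `LearningPhase` class and its member names are hypothetical.

```python
class LearningPhase:
    """Counts learning iterations and reports when autonomous mode may start."""

    def __init__(self, required_iterations: int) -> None:
        if required_iterations < 1:
            raise ValueError("required_iterations must be at least 1")
        self.required_iterations = required_iterations
        self.completed = 0

    def record_iteration(self) -> None:
        """Call once after each pass through the learning steps."""
        self.completed += 1

    @property
    def autonomous_enabled(self) -> bool:
        # Learning may keep running in parallel after this becomes True,
        # so the counter is allowed to grow past the required number.
        return self.completed >= self.required_iterations
```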
  • the monitoring engine 112 determines whether another issue has been detected within the system infrastructure 104 .
  • the monitoring engine 112 determines whether another issue has been detected using a process that is similar to the process described in steps 202 and 204 .
  • the monitoring engine 112 returns to step 202 in response to determining that another issue has not been detected within the system infrastructure 104 . In this case, the monitoring engine 112 returns to step 202 to continue monitoring the operational activity of the system infrastructure 104 .
  • the monitoring engine 112 proceeds to step 216 in response to determining that an issue has been detected within the system infrastructure 104 .
  • the monitoring engine 112 may detect an issue after integrating the new system component 108 with the system infrastructure 104 , removing a system component 108 from the system infrastructure 104 , or modifying a system component 108 within the system infrastructure 104 .
  • the monitoring engine 112 identifies one or more solution scripts 120 that correspond with the detected issue.
  • the monitoring engine 112 will identify an issue identifier 126 that corresponds with the detected issue.
  • the monitoring engine 112 uses the issue identifier 126 with the script map 124 to identify solution scripts 120 that are associated with the issue identifier 126 .
  • the monitoring engine 112 may use the issue identifier 126 as a search token to identify entries in the script map 124 that are associated with the issue identifier 126 .
  • the monitoring engine 112 identifies a testing script 122 that corresponds with the identified solution scripts 120 .
  • the monitoring engine 112 identifies a testing script 122 that is linked with the identified solution scripts 120 in the script map 124 .
  • the monitoring engine 112 executes the testing script 122 with the identified solution scripts 120 .
  • the monitoring engine 112 executes the identified testing script 122 by using the instructions or commands from the testing script 122 to configure a test environment and to simulate the identified solution scripts 120 .
  • the monitoring engine 112 may also use previously collected system information 116 with the test environment to simulate the identified solution scripts 120 .
  • the monitoring engine 112 may use information about the configurations of the system components 108 in the system infrastructure 104 to configure the test environment.
  • the monitoring engine 112 may collect data traffic samples from the system infrastructure 104 while monitoring the operational activity within the system infrastructure 104 . In this case, the monitoring engine 112 may use the collected data traffic with the test environment to simulate the identified solution scripts 120 .
  • the monitoring engine 112 selects a solution script 120 based on the results of executing the testing script 122 .
  • the monitoring engine 112 compares the results from executing the testing scripts 122 to identify which solution script 120 resolves the detected issue and/or provides the greatest performance increase.
  • the issue identifier 126 may be associated with two solution scripts 120 .
  • the monitoring engine 112 may simulate both solution scripts 120 and select the solution script 120 that resolves the detected issue and provides the most performance improvements.
  • the monitoring engine 112 executes the selected solution script 120 within the system infrastructure 104 .
  • the monitoring engine 112 executes the commands and instructions provided by the selected solution script 120 to resolve the detected issue.
  • the monitoring engine 112 may send commands to one or more system components 108 to resolve the detected issue based on the selected solution script 120 .
  • the monitoring engine 112 may reconfigure one or more system components 108 based on instructions provided by the selected solution script 120 .
  • the monitoring engine 112 may terminate method 200 or may return to step 202 to continue monitoring for other issues within the system infrastructure 104 .
  • FIG. 3 is a schematic diagram of an embodiment of a device (e.g. system healing device 102 ) configured to monitor and resolve issues associated with a system infrastructure 104 .
  • the system healing device 102 comprises a processor 302 , a memory 114 , and a network interface 304 .
  • the system healing device 102 may be configured as shown or in any other suitable configuration.
  • the processor 302 comprises one or more processors operably coupled to the memory 114 .
  • the processor 302 is any electronic circuitry including, but not limited to, state machines, one or more central processing unit (CPU) chips, logic units, cores (e.g. a multi-core processor), field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), or digital signal processors (DSPs).
  • the processor 302 may be a programmable logic device, a microcontroller, a microprocessor, or any suitable combination of the preceding.
  • the processor 302 is communicatively coupled to and in signal communication with the memory 114 .
  • the one or more processors are configured to process data and may be implemented in hardware or software.
  • the processor 302 may be 8-bit, 16-bit, 32-bit, 64-bit, or of any other suitable architecture.
  • the processor 302 may include an arithmetic logic unit (ALU) for performing arithmetic and logic operations, processor registers that supply operands to the ALU and store the results of ALU operations, and a control unit that fetches instructions from memory and executes them by directing the coordinated operations of the ALU, registers and other components.
  • the one or more processors are configured to implement various instructions.
  • the one or more processors are configured to execute monitoring instructions 306 to implement a monitoring engine 112 .
  • processor 302 may be a special-purpose computer designed to implement the functions disclosed herein.
  • the monitoring engine 112 is implemented using logic units, FPGAs, ASICs, DSPs, or any other suitable hardware.
  • the monitoring engine 112 is configured to operate as described in FIGS. 1 and 2 .
  • the monitoring engine 112 may be configured to perform the steps of method 200 as described in FIG. 2 .
  • the memory 114 comprises one or more disks, tape drives, or solid-state drives, and may be used as an over-flow data storage device, to store programs when such programs are selected for execution, and to store instructions and data that are read during program execution.
  • the memory 114 may be volatile or non-volatile and may comprise a read-only memory (ROM), random-access memory (RAM), ternary content-addressable memory (TCAM), dynamic random-access memory (DRAM), and static random-access memory (SRAM).
  • the memory 114 is operable to store monitoring instructions 306 , system information 116 , system maintenance logs 118 , solution scripts 120 , testing scripts 122 , script maps 124 , and/or any other data or instructions.
  • the monitoring instructions 306 may comprise any suitable set of instructions, logic, rules, or code operable to execute the monitoring engine 112 .
  • the system information 116 , system maintenance logs 118 , solution scripts 120 , testing scripts 122 , and script maps 124 are configured similarly to the system information 116 , system maintenance logs 118 , solution scripts 120 , testing scripts 122 , and script maps 124 described in FIGS. 1 and 2 , respectively.
  • the network interface 304 is configured to enable wired and/or wireless communications.
  • the network interface 304 is configured to communicate data between the system healing device 102 and other devices (e.g. system components 108 ), systems, or domains.
  • the network interface 304 may comprise a WIFI interface, a LAN interface, a WAN interface, a modem, a switch, or a router.
  • the processor 302 is configured to send and receive data using the network interface 304 .
  • the network interface 304 may be configured to use any suitable type of communication protocol as would be appreciated by one of ordinary skill in the art.
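As a non-limiting illustration, the selection among simulated solution scripts 120 described in the list above (comparing test results and choosing the script that resolves the detected issue and provides the greatest performance increase) may be sketched as follows. The names `SimulationResult`, `select_solution`, and `simulate` are hypothetical and do not appear in the disclosure; `simulate` stands in for executing a testing script 122 in a test environment, which is not modeled here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class SimulationResult:
    """Hypothetical outcome of simulating one solution script in a test environment."""
    solution_id: str
    resolved: bool            # whether the simulated script resolved the issue
    performance_gain: float   # assumed scalar score; higher is better


def select_solution(
    candidate_ids: Iterable[str],
    simulate: Callable[[str], SimulationResult],
) -> Optional[str]:
    """Simulate every candidate and return the identifier of the best one.

    Only candidates that resolve the issue are eligible; among those, the
    one with the largest performance gain is chosen. Returns None when no
    candidate resolves the issue.
    """
    results = [simulate(candidate_id) for candidate_id in candidate_ids]
    resolving = [result for result in results if result.resolved]
    if not resolving:
        return None
    return max(resolving, key=lambda result: result.performance_gain).solution_id
```

In this sketch a script that fails to resolve the issue is never selected, even when it reports a larger performance gain than the scripts that do resolve it.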

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A device configured to periodically monitor operational activity of hardware components within a computing system infrastructure. The device is further configured to detect an issue that is associated with a hardware component, to identify commands that are sent to the hardware component to resolve the first issue, and to identify a test environment configuration for simulating the effect of sending the commands to the hardware component on the computing system infrastructure. The device is further configured to generate a solution script based on the identified commands and a testing script based on the identified test environment configuration, and to store an association between the first issue, the solution script, and the testing script in a script map.

Description

TECHNICAL FIELD
The present disclosure relates generally to computing devices, and more specifically to self-healing computing devices.
BACKGROUND
Existing computer systems are constantly changing to keep up with a consumer's needs. Hardware and software components may be continuously added, removed, or modified as the needs of a computer system evolve. The continuous evolution of a computer system poses a technical challenge because new errors and issues may arise as changes are made to the computer system. Identifying and resolving these types of issues in large computer systems is a difficult and time-consuming task, which results in a significant amount of downtime for the computer system and reduces the throughput of the computing system while the system is being repaired. Issues within a computer system may be unique to the configuration of components within the system infrastructure and they may arise due to any number of variables. This means that each issue requires a significant amount of time to troubleshoot and resolve. This downtime also has a detrimental effect on the performance and throughput of other computer systems that rely on data from the computer system.
SUMMARY
The system disclosed in the present application provides a technical solution to the technical problems discussed above by monitoring for issues within a computer system and autonomously resolving these issues. For example, a computer system may experience error codes, data errors, data loss, slow response times, an increase in processor usage, a decrease in available memory, a decrease in available bandwidth, a decrease in data throughput, or any other type of decrease in performance. The disclosed system provides the ability to detect and resolve any issues that affect the performance of the computer system. The disclosed system provides several practical applications and technical advantages, which include a process for using supervised learning to identify solutions for resolving issues within a computer system and generating scripts that can be used to resolve these issues in the future. For example, this process allows the computer system to learn which commands and operations are typically used to resolve an issue within the computer system and how to apply these commands to resolve future issues within the computer system. After an initial learning phase, this process enables a computer system to autonomously detect and resolve future issues within the computer system. In existing computer systems, the source for issues such as a decrease in performance may not be easily detectable. This means that the computer system will experience a decrease in performance, for example a decrease in throughput, until the source of the issue has been determined. Once the source of an issue has been identified, the computer system will need to be at least partially shut down to allow a network operator to make repairs to the computer system. This shutdown results in downtime during which the computer system may operate in a limited capacity.
In contrast, the process disclosed in the present application allows the computer system to quickly detect an issue within the computer system, to identify a source of the issue, and to autonomously implement a solution to resolve the issue. By reducing the amount of time required to detect an issue and its source, the computer system is able to reduce the amount of time that the computer system operates with degraded performance. In addition, by autonomously identifying and implementing a solution, the computer system is able to reduce the amount of time it takes to resolve an issue within the computer system, which reduces the amount of downtime that the computer system will experience. Reducing the amount of downtime means that the computer system is able to spend more time operating at its full capacity, which allows the computer system to maintain a higher throughput and improves the utilization of the computer system. Furthermore, the computer system is configured to test and vet solutions using test environments and testing scripts before they are deployed within the computer system. This process reduces the likelihood of introducing new errors and issues into the system infrastructure after deploying a solution to resolve an issue.
In one embodiment, the system comprises a system healing device that is configured to monitor the health and operational activity of a computing system infrastructure as the system infrastructure changes over time. This process allows the system healing device to detect and resolve issues as they arise within the system infrastructure. This process reduces the downtime due to diagnosing and resolving issues within the system infrastructure. Initially, the system healing device is configured to operate in a learning phase to identify patterns and instructions that are used to resolve issues within the system infrastructure. In one embodiment, the system healing device is configured to use Application Programming Interfaces (APIs) to communicate with system components (e.g. software and hardware components) to determine operating characteristics of the system components. The system healing device is further configured to generate solution scripts and testing scripts using supervised learning based on monitoring the actions that are taken by a network operator to resolve issues within the system infrastructure. The solution scripts comprise instructions for resolving an issue. The testing scripts comprise instructions for testing a solution script within a test environment before deploying the solution script within the system infrastructure.
After the learning phase and establishing a repository of solution scripts and testing scripts, the system healing device is configured to operate in an autonomous self-healing phase to begin detecting and resolving issues within the system infrastructure. The system healing device is further configured to autonomously detect issues within the system infrastructure and to identify solution scripts and testing scripts for resolving the detected issue. The system healing device is configured to execute the testing script to generate a test environment for determining which of the identified solution scripts best resolves the detected issue. After identifying a solution script that resolves the detected issue, the system healing device is configured to deploy the selected solution script within the system infrastructure by executing the instructions or commands provided by the solution script. This process allows the system healing device to deploy solutions that have been tested and vetted before they are deployed within the system infrastructure. This process reduces the likelihood of introducing new errors and issues into the system infrastructure after deploying a solution to resolve an issue.
Certain embodiments of the present disclosure may include some, all, or none of these advantages. These advantages and other features will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings and claims.
BRIEF DESCRIPTION OF THE DRAWINGS
For a more complete understanding of this disclosure, reference is now made to the following brief description, taken in connection with the accompanying drawings and detailed description, wherein like reference numerals represent like parts.
FIG. 1 is a schematic diagram of a self-healing computing system;
FIG. 2 is a flowchart of an embodiment of a self-healing method; and
FIG. 3 is a schematic diagram of an embodiment of a device configured to implement self-healing.
DETAILED DESCRIPTION
System Overview
FIG. 1 is a schematic diagram of a self-healing computing system 100. In one embodiment, the system 100 comprises a system healing device 102 that is in signal communication with a computing system infrastructure 104 within a network 106.
The network 106 may be any suitable type of wireless and/or wired network including, but not limited to, all or a portion of the Internet, an Intranet, a private network, a public network, a peer-to-peer network, the public switched telephone network, a cellular network, a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), and a satellite network. The network 106 may be configured to support any suitable type of communication protocol as would be appreciated by one of ordinary skill in the art.
In one embodiment, the system healing device 102 is configured to monitor the health and operational activity of the computing system infrastructure 104 as the system infrastructure changes over time. Initially, the system healing device 102 is configured to operate in a learning phase to identify patterns and instructions that are used to resolve issues within the system infrastructure 104. The system healing device 102 is further configured to generate solution scripts 120 and testing scripts 122 using supervised learning based on monitoring the actions that are taken by a network operator to resolve issues within the system infrastructure. The solution scripts comprise instructions for resolving an issue. The testing scripts comprise instructions for testing a solution script within a test environment before deploying the solution script within the system infrastructure.
After the learning phase and establishing a repository of solution scripts 120 and testing scripts 122, the system healing device 102 is configured to operate in an autonomous self-healing phase to begin detecting and resolving issues within the system infrastructure 104. The system healing device 102 is further configured to autonomously detect issues within the system infrastructure and to identify solution scripts 120 and testing scripts 122 for resolving the detected issue. The system healing device 102 is configured to execute the testing script 122 to generate a test environment for determining which of the identified solution scripts 120 best resolves the detected issue. After identifying a solution script 120 that resolves the detected issue, the system healing device 102 is configured to deploy the selected solution script 120 within the system infrastructure 104 by executing the instructions or commands provided by the solution script 120.
Computing System Infrastructure
The computing system infrastructure 104 comprises a plurality of system components 108. System components 108 are hardware and software components that are configured to form a computing system. Examples of system components 108 include, but are not limited to, processors, databases, memories, database management tools, servers, clients, network devices, operating systems, applications, virtual machines, cloud services, development tools, or any other suitable type of hardware or software component.
System Healing Device
The system healing device 102 is generally configured to monitor the health of the system infrastructure 104 and to autonomously resolve issues within the system infrastructure 104. The system healing device 102 is in signal communication with the system components 108 using Application Programming Interfaces (APIs) 110 which allow the system healing device 102 to monitor the operational activity and health of the system components 108. For example, the system healing device 102 may use APIs to determine response times, processor utilization, memory utilization, bandwidth utilization, data throughput, available memory disk space, error codes, job failures, batch errors, or any other suitable type of information about a system component 108.
The system healing device 102 comprises a monitoring engine 112 and a memory 114. The system healing device 102 may be configured as shown or in any other suitable configuration. Additional information about the hardware configuration of the system healing device 102 is described in FIG. 3.
The memory 114 is configured to store system information 116, system maintenance logs 118, solution scripts 120, testing scripts 122, script maps 124, and/or any other suitable type of data. The system information 116 comprises information about the state or health of system components 108 and the overall system infrastructure 104. For example, the system information 116 may comprise information about response times, processor utilization, memory utilization, bandwidth utilization, data throughput, available memory disk space, error codes, job failures, batch errors, or any other suitable type of information about a system component 108 or the overall system infrastructure 104. The system maintenance logs 118 comprise a sequence of commands and instructions that are used to resolve an issue. The solution scripts 120 comprise executable commands for performing operations on one or more system components 108. The testing scripts 122 comprise executable commands for configuring a test environment to simulate one or more solution scripts 120. The script map 124 is configured to associate an issue with a solution script 120 and a testing script 122 that are associated with resolving the issue.
Monitoring Engine
The monitoring engine 112 is generally configured to monitor the health and the operational activity of the system infrastructure 104. Over time, the system components 108 within the system infrastructure 104 may be modified. For example, a network operator may add new system components 108 to the system infrastructure 104, remove system components 108 from the system infrastructure 104, or modify a configuration for a system component 108. The monitoring engine 112 is configured to monitor the health and operational activity of the system infrastructure 104 as the system infrastructure changes. This process allows the system healing device 102 to detect and to resolve issues as they arise within the system infrastructure 104. This process reduces the downtime due to diagnosing and resolving issues within the system infrastructure 104. In one embodiment, the monitoring engine 112 is configured to use APIs to communicate with system components 108 to determine operating characteristics of the system components 108. In some embodiments, the monitoring engine 112 may be configured to periodically capture system information 116 that describes the state or health of system components 108 and the overall system infrastructure 104. The monitoring engine 112 may also be configured to collect system information 116 that comprises data traffic that can be used in a testing environment for resolving issues within the system infrastructure 104.
Initially, the monitoring engine 112 operates in a learning phase to identify patterns and instructions that are used to resolve issues within the system infrastructure 104. The monitoring engine 112 is further configured to generate solution scripts 120 and testing scripts 122 using supervised learning based on monitoring the actions that are taken by a network operator to resolve issues within the system infrastructure 104. The solution scripts 120 comprise instructions for resolving an issue. The testing scripts 122 comprise instructions for testing a solution script 120 within a test environment before deploying the solution script 120 within the system infrastructure 104.
After the learning phase and establishing a repository of solution scripts 120 and testing scripts 122, the monitoring engine 112 is configured to operate in an autonomous self-healing phase. In the self-healing phase, the monitoring engine 112 begins autonomously detecting and resolving issues within the system infrastructure 104. The monitoring engine 112 is further configured to autonomously detect issues within the system infrastructure 104 and to identify solution scripts 120 and testing scripts 122 for resolving the detected issue. The monitoring engine 112 is configured to execute the testing script 122 to generate a test environment for determining which of the identified solution scripts 120 best resolves the detected issue. After identifying a solution script 120 that resolves the detected issue, the monitoring engine 112 is configured to deploy the selected solution script 120 within the system infrastructure 104 by executing the instructions or commands provided by the solution script 120. An example of the monitoring engine 112 in operation is described in FIG. 2.
Self-Healing Process
FIG. 2 is a flowchart of an embodiment of a self-healing method 200. The system healing device 102 may employ method 200 to monitor the health of the system infrastructure 104 and to autonomously resolve issues within the system infrastructure 104 over time using solution scripts 120 and testing scripts 122. The system healing device 102 may use method 200 in conjunction with a DevOps toolchain to monitor the health of the system infrastructure 104 as the system components 108 within the system infrastructure 104 change over time. This process allows the system healing device 102 to detect and resolve issues as they arise within the system infrastructure 104. This process also reduces the downtime due to diagnosing and resolving issues within the system infrastructure 104.
In one embodiment, the system healing device 102 is configured to provide an interface (e.g. a graphical user interface) that allows a network operator to monitor the health of the system infrastructure 104 and to modify the system components 108 within the system infrastructure 104. A network operator may use the system healing device 102 to add new system components 108 to the system infrastructure 104, to remove system components 108 from the system infrastructure 104, or to modify a configuration for a system component 108. For instance, the system healing device 102 may allow a user to add a new system component 108, to remove a system component 108, or to modify settings of a system component 108. As an example, the system healing device 102 may receive a device configuration for a new hardware device from a user using a graphical user interface (e.g. a web portal). In this example, the system healing device 102 will use the received device configuration to configure the new system component 108 to integrate the new system component 108 with the system infrastructure 104. As another example, the system healing device 102 may receive a device configuration for reconfiguring a hardware device from a user. In this example, the system healing device 102 will use the received device configuration to reconfigure and modify the operation of the system component 108.
Learning Phase
At step 202, the monitoring engine 112 monitors the operational activity of system components 108 within the system infrastructure 104. The monitoring engine 112 is configured to periodically measure the performance of the system infrastructure 104. For example, the monitoring engine 112 may periodically collect data about the number of system components 108, the types of system components 108, the operating conditions of the system components 108, and/or data activity within the system infrastructure 104. In one embodiment, the monitoring engine 112 is configured to use APIs to determine the current operating conditions of the system components 108. For example, the monitoring engine 112 may periodically send API calls to request information about the current operating conditions of one or more system components 108.
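As a non-limiting illustration, the periodic collection of operating conditions in step 202 may be sketched as follows. The names `Sample`, `poll_components`, `monitor`, and `fetch_metrics` are hypothetical; `fetch_metrics` stands in for whatever API call a given system component 108 exposes, which the disclosure does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Metrics = Dict[str, float]


@dataclass
class Sample:
    """One reading of a component's operating conditions (hypothetical shape)."""
    component_id: str
    metrics: Metrics


def poll_components(
    component_ids: Iterable[str],
    fetch_metrics: Callable[[str], Metrics],
) -> List[Sample]:
    """Request the current operating conditions of each component once."""
    return [Sample(cid, fetch_metrics(cid)) for cid in component_ids]


def monitor(
    component_ids: List[str],
    fetch_metrics: Callable[[str], Metrics],
    rounds: int,
    wait: Callable[[], None] = lambda: None,
) -> List[List[Sample]]:
    """Poll all components `rounds` times, calling `wait` between rounds.

    `wait` is an assumption of this sketch: a caller might pass
    `lambda: time.sleep(60)` to poll once per minute.
    """
    history: List[List[Sample]] = []
    for _ in range(rounds):
        history.append(poll_components(component_ids, fetch_metrics))
        wait()
    return history
```

Passing the fetch function and the wait function as parameters keeps the sketch independent of any particular component API or polling interval.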
At step 204, the monitoring engine 112 detects an issue within the system infrastructure 104 based on the current operating conditions of the system components 108. Here, the monitoring engine 112 detects an issue or error that is associated with one or more system components 108 within the system infrastructure 104. As an example, the monitoring engine 112 may detect an issue within the system infrastructure 104 based on a decrease in the performance of one or more system components 108 over a predetermined period of time. For instance, the monitoring engine 112 may compare differences in the operating characteristics of a system component 108 to predetermined threshold values to determine whether the system component 108 is experiencing a decrease in performance. For instance, the monitoring engine 112 may detect an increase in processor usage, a decrease in available memory, a decrease in available bandwidth, a decrease in data throughput, or any other type of decrease in performance for a system component 108. As another example, the monitoring engine 112 may detect an issue within the system infrastructure 104 in response to receiving an error code from one or more of the system components 108. In some instances, the monitoring engine 112 may detect an issue after integrating the new system component 108 with the system infrastructure 104, removing a system component 108 from the system infrastructure 104, or modifying a system component 108 within the system infrastructure 104.
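As a non-limiting illustration, the comparison of differences in operating characteristics to predetermined threshold values in step 204 may be sketched as follows. The function name `detect_issues`, the metric names, and the sign convention for thresholds are assumptions of this sketch and are not taken from the disclosure.

```python
from typing import Dict, List


def detect_issues(
    previous: Dict[str, float],
    current: Dict[str, float],
    thresholds: Dict[str, float],
) -> List[str]:
    """Return the metrics whose change between two readings crosses a threshold.

    Assumed convention: a positive threshold flags an increase of at least
    that size (e.g. processor usage), and a negative threshold flags a
    decrease of at least that size (e.g. available memory or throughput).
    Metrics missing from either reading are skipped.
    """
    flagged: List[str] = []
    for name, limit in thresholds.items():
        if name not in previous or name not in current:
            continue
        delta = current[name] - previous[name]
        if (limit > 0 and delta >= limit) or (limit < 0 and delta <= limit):
            flagged.append(name)
    return flagged
```

For example, with thresholds of +30 for processor usage and -200 for throughput, a reading that moves processor usage from 40 to 85 and throughput from 900 to 500 would flag both metrics.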
At step 206, the monitoring engine 112 identifies solution steps and test cases that are used to resolve the issue. As an example, the monitoring engine 112 may be configured to use a system maintenance log 118 to identify actions and test cases that a network operator uses to resolve the issue. In this case, the system maintenance log 118 may comprise a sequence of commands and instructions that are used to resolve the issue. For example, the system maintenance log 118 may comprise commands for clearing caches, clearing logs, clearing disk space, rebooting a system component 108, load balancing data among system components 108, data prioritization, modifying network settings, modifying settings of a system component 108, or any other suitable type of commands that are sent to a system component 108.
The system maintenance log 118 may also comprise settings or instructions for configuring a test environment that is used to simulate and test the solution that is used to resolve the issue. The test environment is configured to simulate the effect on the system infrastructure 104 in response to sending commands to one or more hardware components 108. For example, a network operator may build a test environment that uses synthetic data or previously stored data activity from the system infrastructure 104 for testing a solution to the detected issue. In this case, the monitoring engine 112 identifies the settings for the test environment and the commands that were sent to one or more system components 108 to simulate and test a solution for resolving the issue.
As another example, the monitoring engine 112 may be configured to monitor the operations and actions that are performed by a network operator to identify solution steps and test cases that are used to resolve the issue. For instance, the monitoring engine 112 may monitor the actions that are performed by the network operator while the network operator uses one or more DevOps tools to resolve the detected issue. In this case, the monitoring engine 112 may be configured to monitor the operational activity of the system infrastructure 104 as a network operator resolves the issue. For example, the monitoring engine 112 may use APIs to identify commands and instructions that are used by the network operator to resolve the detected issue. The monitoring engine 112 may also use APIs to identify settings or instructions for configuring and operating a test environment that is used to simulate and test a solution for resolving the issue.
At step 208, the monitoring engine 112 generates a solution script 120 based on the identified solution steps. The solution script 120 comprises machine-executable commands for performing operations on one or more system components 108. Here, the monitoring engine 112 generates a solution script 120 by formatting or converting the identified commands that were sent to system components 108 in step 206 into machine-executable instructions or commands that can be executed by one or more system components 108. The solution script 120 captures the commands that are used, where the commands are sent, and the sequence in which the commands are sent. After generating the solution script 120, the monitoring engine 112 may link the solution script 120 with an identifier 128 that uniquely identifies the solution script 120. The identifier 128 may be an alphanumeric identifier or any other suitable type of identifier.
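As a non-limiting illustration, a solution script 120 that captures the commands that are used, where they are sent, and their sequence may be sketched as follows. The names `Step`, `SolutionScript`, and `build_solution_script`, and the representation of a logged command as a (target, command) pair, are assumptions of this sketch; the disclosure does not specify a script format.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Step:
    """One command in a solution script (hypothetical representation)."""
    order: int      # position in the sequence, starting at 1
    target: str     # component the command is sent to
    command: str    # the command text itself


@dataclass(frozen=True)
class SolutionScript:
    script_id: str            # plays the role of the identifier 128
    steps: Tuple[Step, ...]


def build_solution_script(
    script_id: str,
    logged_commands: Iterable[Tuple[str, str]],
) -> SolutionScript:
    """Convert logged (target, command) pairs into an ordered script.

    The input order is preserved, and surrounding whitespace is stripped
    from each command as a minimal example of normalization.
    """
    steps = tuple(
        Step(order, target, command.strip())
        for order, (target, command) in enumerate(logged_commands, start=1)
    )
    return SolutionScript(script_id, steps)
```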
At step 210, the monitoring engine 112 generates a testing script 122 based on the identified test cases. The testing script 122 comprises machine-executable commands for configuring a test environment and simulating one or more solution scripts 120. Here, the monitoring engine 112 generates a testing script 122 by converting the test environment settings and instructions that were identified in step 206 into machine-executable instructions or commands that can be used to configure a test environment and executable instructions or commands that can be executed by the test environment to simulate one or more solution scripts 120. After generating the testing script 122, the monitoring engine 112 may link the testing script 122 with an identifier 130 that uniquely identifies the testing script 122. The identifier 130 may be an alphanumeric identifier or any other suitable type of identifier.
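As a non-limiting illustration, step 210 might be sketched as follows. The sketch assumes the test environment settings were captured as key-value pairs and that the test environment accepts textual "set" and "simulate" commands. That command syntax, and the names GeneratedTestingScript and generate_testing_script, are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneratedTestingScript:
    """Commands that configure a test environment and simulate solution scripts."""
    identifier: str
    setup_commands: tuple
    simulate_commands: tuple


def generate_testing_script(identifier, settings, solution_ids):
    """Convert captured settings into environment setup and simulation commands."""
    # Assumption: each captured setting becomes one "set key=value" command.
    # Sorting the keys makes the generated script deterministic.
    setup = tuple(f"set {key}={value}" for key, value in sorted(settings.items()))
    # Assumption: each solution script is simulated by referring to its identifier.
    simulate = tuple(f"simulate {sid}" for sid in solution_ids)
    return GeneratedTestingScript(identifier, setup, simulate)
```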
At step 212, the monitoring engine 112 links the solution script 120 and the testing script 122 in the script map 124. Here, the monitoring engine 112 creates an entry in the script map 124 that links an issue with a solution script 120 and a testing script 122 that are associated with resolving the issue. The monitoring engine 112 may first associate the issue that was detected in step 204 with an issue identifier 126. The issue identifier 126 may be an error code or any other suitable type of identifier that uniquely identifies a type of issue. For example, the issue identifier 126 may be uniquely associated with a particular system component 108 type and an error type. The monitoring engine 112 then links the issue identifier 126 with the identifier 128 for the solution script 120 and the identifier 130 for the testing script 122 that were generated in steps 208 and 210, respectively, in the script map 124.
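As a non-limiting illustration, the script map 124 might be sketched as a mapping from an issue identifier to a list of (solution script identifier, testing script identifier) pairs. The class ScriptMap, the helper make_issue_id, and the "component type:error type" key format are hypothetical choices made for this sketch only.

```python
from collections import defaultdict


def make_issue_id(component_type, error_type):
    """Build an issue identifier from a component type and an error type."""
    # Assumption: a simple "type:error" string uniquely identifies an issue type.
    return f"{component_type}:{error_type}"


class ScriptMap:
    """Links an issue identifier with solution and testing script identifiers."""

    def __init__(self):
        self._entries = defaultdict(list)

    def link(self, issue_id, solution_id, testing_id):
        """Create an entry for the issue, ignoring exact duplicates."""
        entry = (solution_id, testing_id)
        if entry not in self._entries[issue_id]:
            self._entries[issue_id].append(entry)

    def lookup(self, issue_id):
        """Return a copy of the entries for the issue (empty if unknown)."""
        return list(self._entries.get(issue_id, []))
```

Keeping a list per issue identifier allows several solution scripts to be associated with the same type of issue.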
Autonomous Self-Healing Phase
In one embodiment, the monitoring engine 112 may repeat steps 202-212 for a predetermined amount of time or a predetermined number of iterations, which allows the monitoring engine 112 to build a repository of solution scripts 120 and testing scripts 122 that can be used to resolve future issues that are detected within the system infrastructure 104. Once the monitoring engine 112 has a suitable number of solution script 120 and testing script 122 entries in the script map 124, the monitoring engine 112 may begin using steps 214-224 to autonomously detect and resolve issues within the system infrastructure 104. In some embodiments, the monitoring engine 112 may continue to run steps 202-210 in parallel after enabling the ability to autonomously detect and resolve issues within the system infrastructure 104. This option allows the monitoring engine 112 to continue to build the repository of solution scripts 120 and testing scripts 122.
At step 214, the monitoring engine 112 determines whether another issue has been detected within the system infrastructure 104. The monitoring engine 112 determines whether another issue has been detected using a process that is similar to the process described in steps 202 and 204. The monitoring engine 112 returns to step 202 in response to determining that another issue has not been detected within the system infrastructure 104. In this case, the monitoring engine 112 returns to step 202 to continue monitoring the operational activity of the system infrastructure 104. The monitoring engine 112 proceeds to step 216 in response to determining that an issue has been detected within the system infrastructure 104. For example, the monitoring engine 112 may detect an issue after integrating a new system component 108 with the system infrastructure 104, removing a system component 108 from the system infrastructure 104, or modifying a system component 108 within the system infrastructure 104.
At step 216, the monitoring engine 112 identifies one or more solution scripts 120 that correspond with the detected issue. When an issue is detected, the monitoring engine 112 will identify an issue identifier 126 that corresponds with the detected issue. The monitoring engine 112 uses the issue identifier 126 with the script map 124 to identify solution scripts 120 that are associated with the issue identifier 126. For example, the monitoring engine 112 may use the issue identifier 126 as a search token to identify entries in the script map 124 that are associated with the issue identifier 126.
At step 218, the monitoring engine 112 identifies a testing script 122 that corresponds with the identified solution scripts 120. Here, the monitoring engine 112 identifies a testing script 122 that is linked with the identified solution scripts 120 in the script map 124.
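As a non-limiting illustration, steps 216 and 218 might be sketched as a single lookup that uses the issue identifier as a search key and groups the matching solution scripts by the testing script they are linked with. The sketch assumes the script map is a plain dictionary of (solution identifier, testing identifier) pairs; the function name find_candidates and that representation are hypothetical.

```python
def find_candidates(script_map, issue_id):
    """Return {testing_id: [solution_id, ...]} for a detected issue."""
    grouped = {}
    # An unknown issue identifier yields no entries and therefore an empty result.
    for solution_id, testing_id in script_map.get(issue_id, []):
        grouped.setdefault(testing_id, []).append(solution_id)
    return grouped
```

An empty result indicates that no stored solution script corresponds with the detected issue.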
At step 220, the monitoring engine 112 executes the testing script 122 with the identified solution scripts 120. The monitoring engine 112 executes the identified testing script 122 by using the instructions or commands from the testing script 122 to configure a test environment and to simulate the identified solution scripts 120. In one embodiment, the monitoring engine 112 may also use previously collected system information 116 with the test environment to simulate the identified solution scripts 120. For example, the monitoring engine 112 may use information about the configurations of the system components 108 in the system infrastructure 104 to configure the test environment. As another example, the monitoring engine 112 may collect data traffic samples from the system infrastructure 104 while monitoring the operational activity within the system infrastructure 104. In this case, the monitoring engine 112 may use the collected data traffic with the test environment to simulate the identified solution scripts 120.
At step 222, the monitoring engine 112 selects a solution script 120 based on the results of executing the testing script 122. When more than one solution script 120 is associated with the issue identifier 126, the monitoring engine 112 compares the results from executing the testing scripts 122 to identify which solution script 120 resolves the detected issue and/or provides the greatest performance increase. For example, the issue identifier 126 may be associated with two solution scripts 120. In this example, the monitoring engine 112 may simulate both solution scripts 120 and select the solution script 120 that resolves the detected issue and provides the greatest performance improvement.
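As a non-limiting illustration, the selection in step 222 might be sketched as follows. The sketch assumes each simulation yields a flag indicating whether the issue was resolved and a single numeric performance gain. The SimulationResult record, the function select_solution, and the reduction of performance to one number are hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulationResult:
    """Outcome of simulating one solution script in the test environment."""
    solution_id: str
    resolved: bool            # whether the simulated issue was resolved
    performance_gain: float   # assumed scalar measure; higher is better


def select_solution(results):
    """Pick the resolving solution script with the greatest performance gain."""
    resolving = [r for r in results if r.resolved]
    if not resolving:
        return None  # no simulated script resolved the issue
    return max(resolving, key=lambda r: r.performance_gain).solution_id
```

In this sketch, a script that fails to resolve the issue is never selected, regardless of its simulated performance gain.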
At step 224, the monitoring engine 112 executes the selected solution script 120 within the system infrastructure 104. Here, the monitoring engine 112 executes the commands and instructions provided by the selected solution script 120 to resolve the detected issue. For example, the monitoring engine 112 may send commands to one or more system components 108 to resolve the detected issue based on the selected solution script 120. As another example, the monitoring engine 112 may reconfigure one or more system components 108 based on instructions provided by the selected solution script 120. After the monitoring engine 112 executes the selected solution script 120, the monitoring engine 112 may terminate method 200 or may return to step 202 to continue monitoring for other issues within the system infrastructure 104.
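As a non-limiting illustration, step 224 might be sketched as a loop that sends each command of the selected solution script to its target component in the recorded order. The function execute_solution and the send callable are hypothetical. Stopping at the first command that fails is an assumption of this sketch and is not described in the disclosure.

```python
def execute_solution(steps, send):
    """Send each (target, command) step in order and return the steps that succeeded.

    `send` is a caller-supplied callable taking (target, command) and returning
    True on success; it stands in for whatever transport reaches the component.
    """
    sent = []
    for target, command in steps:
        # Assumption: stop at the first failure rather than continue blindly.
        if not send(target, command):
            break
        sent.append((target, command))
    return sent
```

Passing the transport in as a callable lets the same loop be exercised against a test environment or against the system infrastructure.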
System Healing Device Hardware Configuration
FIG. 3 is a schematic diagram of an embodiment of a device (e.g. system healing device 102) configured to monitor and resolve issues associated with a system infrastructure 104. The system healing device 102 comprises a processor 302, a memory 114, and a network interface 304. The system healing device 102 may be configured as shown or in any other suitable configuration.
The processor 302 comprises one or more processors operably coupled to the memory 114. The processor 302 is any electronic circuitry including, but not limited to, state machines, one or more central processing unit (CPU) chips, logic units, cores (e.g. a multi-core processor), field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), or digital signal processors (DSPs). The processor 302 may be a programmable logic device, a microcontroller, a microprocessor, or any suitable combination of the preceding. The processor 302 is communicatively coupled to and in signal communication with the memory 114. The one or more processors are configured to process data and may be implemented in hardware or software. For example, the processor 302 may be 8-bit, 16-bit, 32-bit, 64-bit, or of any other suitable architecture. The processor 302 may include an arithmetic logic unit (ALU) for performing arithmetic and logic operations, processor registers that supply operands to the ALU and store the results of ALU operations, and a control unit that fetches instructions from memory and executes them by directing the coordinated operations of the ALU, registers, and other components.
The one or more processors are configured to implement various instructions. For example, the one or more processors are configured to execute monitoring instructions 306 to implement a monitoring engine 112. In this way, processor 302 may be a special-purpose computer designed to implement the functions disclosed herein. In an embodiment, the monitoring engine 112 is implemented using logic units, FPGAs, ASICs, DSPs, or any other suitable hardware. The monitoring engine 112 is configured to operate as described in FIGS. 1 and 2. For example, the monitoring engine 112 may be configured to perform the steps of method 200 as described in FIG. 2.
The memory 114 comprises one or more disks, tape drives, or solid-state drives, and may be used as an over-flow data storage device, to store programs when such programs are selected for execution, and to store instructions and data that are read during program execution. The memory 114 may be volatile or non-volatile and may comprise a read-only memory (ROM), random-access memory (RAM), ternary content-addressable memory (TCAM), dynamic random-access memory (DRAM), and static random-access memory (SRAM).
The memory 114 is operable to store monitoring instructions 306, system information 116, system maintenance logs 118, solution scripts 120, testing scripts 122, script maps 124, and/or any other data or instructions. The monitoring instructions 306 may comprise any suitable set of instructions, logic, rules, or code operable to execute the monitoring engine 112. The system information 116, system maintenance logs 118, solution scripts 120, testing scripts 122, and script maps 124 are configured similarly to the system information 116, system maintenance logs 118, solution scripts 120, testing scripts 122, and script maps 124 described in FIGS. 1 and 2, respectively.
The network interface 304 is configured to enable wired and/or wireless communications. The network interface 304 is configured to communicate data between the system healing device 102 and other devices (e.g. system components 108), systems, or domains. For example, the network interface 304 may comprise a WIFI interface, a LAN interface, a WAN interface, a modem, a switch, or a router. The processor 302 is configured to send and receive data using the network interface 304. The network interface 304 may be configured to use any suitable type of communication protocol as would be appreciated by one of ordinary skill in the art.
While several embodiments have been provided in the present disclosure, it should be understood that the disclosed systems and methods might be embodied in many other specific forms without departing from the spirit or scope of the present disclosure. The present examples are to be considered as illustrative and not restrictive, and the intention is not to be limited to the details given herein. For example, the various elements or components may be combined or integrated with another system or certain features may be omitted, or not implemented.
In addition, techniques, systems, subsystems, and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, modules, techniques, or methods without departing from the scope of the present disclosure. Other items shown or discussed as coupled or directly coupled or communicating with each other may be indirectly coupled or communicating through some interface, device, or intermediate component whether electrically, mechanically, or otherwise. Other examples of changes, substitutions, and alterations are ascertainable by one skilled in the art and could be made without departing from the spirit and scope disclosed herein.
To aid the Patent Office, and any readers of any patent issued on this application in interpreting the claims appended hereto, applicants note that they do not intend any of the appended claims to invoke 35 U.S.C. § 112(f) as it exists on the date of filing hereof unless the words “means for” or “step for” are explicitly used in the particular claim.

Claims (20)

The invention claimed is:
1. A self-healing computing system, comprising:
a plurality of hardware components configured to form a computing system infrastructure; and
a system healing device in signal communication with the plurality of hardware components, comprising:
a memory operable to store:
solution scripts comprising machine executable commands for performing operations on one or more hardware components;
testing scripts comprising machine executable commands for configuring a test environment to simulate one or more solution scripts; and
a script map configured to associate an issue with a solution script and a testing script that are associated with resolving the issue, wherein the issue is associated with one or more hardware components within the computing system infrastructure; and
a processor operably coupled to the memory, configured to:
periodically send a first Application Programming Interface (API) call to request information about the operating conditions for the plurality of hardware components;
detect a first issue that is associated with a hardware component from the plurality of hardware components based on the operating conditions for the plurality of hardware components;
send a second API call to request information identifying one or more commands that are used by an operator to resolve the first issue;
generate a solution script based on the identified one or more commands, wherein generating the solution script comprises converting the identified one or more commands into machine executable commands for performing operations on the hardware component;
send a third API call to request information identifying settings for a test environment configuration that is used to simulate sending the one or more commands to the hardware component to resolve the first issue, wherein the test environment simulates the effect on the computing system infrastructure in response to sending the one or more commands to the hardware component;
generate a testing script based on the identified test environment configuration, wherein generating the testing script comprises converting the identified settings for the test environment configuration into executable commands for configuring a test environment and simulating the solution script; and
store an association between the first issue, the solution script, and the testing script in the script map.
2. The system of claim 1, wherein the system healing device is further configured to:
detect a second issue based on the operating conditions for the plurality of hardware components;
identify one or more solution scripts in the script map that correspond with the detected second issue;
identify a testing script in the script map that corresponds with the identified one or more solution scripts;
configure a test environment based on the identified testing script;
execute commands from the identified testing script to obtain simulation results for the one or more solution scripts;
select a solution script from among the one or more solution scripts that correspond with the detected second issue based on the simulation results; and
execute the selected solution script, wherein executing the selected solution script comprises sending commands to perform operations on one or more hardware components of the computing system infrastructure.
3. The system of claim 2, wherein:
the system healing device is further configured to:
receive a device configuration for a new hardware component;
configure the new hardware component using the received device configuration to integrate the new hardware component with the computing system infrastructure; and
detecting the second issue occurs after integrating the new hardware component with the computing system infrastructure.
4. The system of claim 2, wherein:
the system healing device is further configured to:
receive a device configuration for a hardware component within the computing system infrastructure;
configure the hardware component using the received device configuration to modify an operation of the hardware component; and
detecting the second issue occurs after modifying the operation of the hardware component.
5. The system of claim 1, wherein identifying the one or more commands that are sent to the hardware component to resolve the first issue comprises identifying commands associated with the hardware component in a system maintenance log.
6. The system of claim 1, wherein detecting the first issue comprises receiving an error code from the hardware component that is associated with the first error.
7. The system of claim 1, wherein:
the system healing device is further configured to periodically measure a performance of the hardware component that is associated with the first issue; and
detecting the first issue is based on a decrease in the performance of the hardware component that is associated with the first issue over a predetermined period of time.
8. A self-healing method for a computing system infrastructure, comprising:
periodically sending a first Application Programming Interface (API) call to request information about the operating conditions for a plurality of hardware components within a computing system infrastructure;
detecting a first issue that is associated with a hardware component from the plurality of hardware components based on the operating conditions for the plurality of hardware components;
sending a second API call to request information identifying one or more commands that are used by an operator to resolve the first issue;
generating a solution script based on the identified one or more commands, wherein generating the solution script comprises converting the identified one or more commands into machine executable commands for performing operations on the hardware component;
sending a third API call to request information identifying settings for a test environment configuration that is used to simulate sending the one or more commands to the hardware component to resolve the first issue, wherein the test environment simulates the effect on the computing system infrastructure in response to sending the one or more commands to the hardware component;
generating a testing script based on the identified test environment configuration, wherein generating the testing script comprises converting the identified settings for the test environment configuration into executable commands for configuring a test environment and simulating the solution script; and
storing an association between the first issue, the solution script, and the testing script in a script map.
9. The method of claim 8, further comprising:
detecting a second issue based on the operating conditions for the plurality of hardware components;
identifying one or more solution scripts in the script map that correspond with the detected second issue;
identifying a testing script in the script map that corresponds with the identified one or more solution scripts;
configuring a test environment based on the identified testing script;
executing commands from the identified testing script to obtain simulation results for the one or more solution scripts;
selecting a solution script from among the one or more solution scripts that correspond with the detected second issue based on the simulation results; and
executing the selected solution script, wherein executing the selected solution script comprises sending commands to perform operations on one or more hardware components of the computing system infrastructure.
10. The method of claim 9, further comprising:
receiving a device configuration for a new hardware component; and
configuring the new hardware component using the received device configuration to integrate the new hardware component with the computing system infrastructure; and
wherein detecting the second issue occurs after integrating the new hardware component with the computing system infrastructure.
11. The method of claim 9, further comprising:
receiving a device configuration for a hardware component within the computing system infrastructure;
configuring the hardware component using the received device configuration to modify an operation of the hardware component; and
wherein detecting the second issue occurs after modifying the operation of the hardware component.
12. The method of claim 8, wherein identifying the one or more commands that are sent to the hardware component to resolve the first issue comprises identifying commands associated with the hardware component in a system maintenance log.
13. The method of claim 8, wherein detecting the first issue comprises receiving an error code from the hardware component that is associated with the first error.
14. The method of claim 8, further comprising:
periodically measuring a performance of the hardware component that is associated with the first issue; and
wherein detecting the first issue is based on a decrease in the performance of the hardware component that is associated with the first issue over a predetermined period of time.
15. A computer program comprising executable instructions stored in a non-transitory computer readable medium that when executed by a processor causes the processor to:
periodically send a first Application Programming Interface (API) call to request information about the operating conditions for a plurality of hardware components within a computing system infrastructure;
detect a first issue that is associated with a hardware component from the plurality of hardware components based on the operating conditions for the plurality of hardware components;
send a second API call to request information identifying one or more commands that are used by an operator to resolve the first issue;
generate a solution script based on the identified one or more commands, wherein generating the solution script comprises converting the identified one or more commands into machine executable commands for performing operations on the hardware component;
send a third API call to request information identifying settings for a test environment configuration that is used to simulate sending the one or more commands to the hardware component to resolve the first issue, wherein the test environment simulates the effect on the computing system infrastructure in response to sending the one or more commands to the hardware component;
generate a testing script based on the identified test environment configuration, wherein generating the testing script comprises converting the identified settings for the test environment configuration into executable commands for configuring a test environment and simulating the generated solution script; and
store an association between the first issue, the solution script, and the testing script in a script map.
16. The computer program of claim 15, further comprising instructions that when executed by the processor causes the processor to:
detect a second issue based on the operating conditions for the plurality of hardware components;
identify one or more solution scripts in the script map that correspond with the detected second issue;
identify a testing script in the script map that corresponds with the identified one or more solution scripts;
configure a test environment based on the identified testing script;
execute commands from the identified testing script to obtain simulation results for the one or more solution scripts;
select a solution script from among the one or more solution scripts that correspond with the detected second issue based on the simulation results; and
execute the selected solution script, wherein executing the selected solution script comprises sending commands to perform operations on one or more hardware components of the computing system infrastructure.
17. The computer program of claim 16, further comprising instructions that when executed by the processor causes the processor to:
receive a device configuration for a hardware component within the computing system infrastructure;
configure the hardware component using the received device configuration to modify an operation of the hardware component; and
wherein detecting the second issue occurs after modifying the operation of the hardware component.
18. The computer program of claim 15, wherein identifying the one or more commands that are sent to the hardware component to resolve the first issue comprises identifying commands associated with the hardware component in a system maintenance log.
19. The computer program of claim 15, wherein detecting the first issue comprises receiving an error code from the hardware component that is associated with the first error.
20. The computer program of claim 15, further comprising instructions that when executed by the processor causes the processor to:
periodically measure a performance of the hardware component that is associated with the first issue; and
wherein detecting the first issue is based on a decrease in the performance of the hardware component that is associated with the first issue over a predetermined period of time.
US16/905,592 2020-06-18 2020-06-18 Self-healing computing device Active 2040-07-08 US11288153B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/905,592 US11288153B2 (en) 2020-06-18 2020-06-18 Self-healing computing device

Publications (2)

Publication Number Publication Date
US20210397527A1 US20210397527A1 (en) 2021-12-23
US11288153B2 true US11288153B2 (en) 2022-03-29

Family

ID=79023565

Family Applications (1)

Application Number Title Priority Date Filing Date
US16/905,592 Active 2040-07-08 US11288153B2 (en) 2020-06-18 2020-06-18 Self-healing computing device

Country Status (1)

Country Link
US (1) US11288153B2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102392469B1 (en) * 2021-05-04 2022-04-29 에스디티 주식회사 Method for replicating a project server for trouble-shooting and a cloud development platform system using the same

Patent Citations (51)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5991529A (en) 1997-05-16 1999-11-23 Sony Corporation Testing of hardware by using a hardware system environment that mimics a virtual system environment
US6651240B1 (en) 1999-02-17 2003-11-18 Fujitsu Limited Object-oriented software development support apparatus and development support method
US20040078684A1 (en) 2000-10-27 2004-04-22 Friedman George E. Enterprise test system having run time test object generation
US20020188890A1 (en) 2001-06-04 2002-12-12 Shupps Eric A. System and method for testing an application
US20030041288A1 (en) 2001-08-10 2003-02-27 Adam Kolawa Method and system for dynamically invoking and/or checking conditions of a computer test program
US20030046613A1 (en) 2001-09-05 2003-03-06 Eitan Farchi Method and system for integrating test coverage measurements with model based test generation
US7062755B2 (en) 2002-10-16 2006-06-13 Hewlett-Packard Development Company, L.P. Recovering from compilation errors in a dynamic compilation environment
US7313564B2 (en) 2002-12-03 2007-12-25 Symbioware, Inc. Web-interactive software testing management method and computer system including an integrated test case authoring tool
US6876314B1 (en) 2004-02-18 2005-04-05 Robocoder Corporation Self-generating automatic code generator
US20050229043A1 (en) 2004-03-29 2005-10-13 Nasuti William J System and method for software testing
US20060010429A1 (en) 2004-07-08 2006-01-12 Denso Corporation Method, system and program for model based software development with test case generation and evaluation
US7219279B2 (en) 2005-01-18 2007-05-15 International Business Machines Corporation Software testing
US20060161508A1 (en) 2005-01-20 2006-07-20 Duffie Paul K System verification test using a behavior model
US20060248405A1 (en) 2005-03-21 2006-11-02 Ponczak Joseph M Method for automating unit test development
US20060230320A1 (en) 2005-04-07 2006-10-12 Salvador Roman S System and method for unit test generation
US7840944B2 (en) 2005-06-30 2010-11-23 Sap Ag Analytical regression testing on a software build
US8078924B2 (en) 2005-09-16 2011-12-13 Lsi Corporation Method and system for generating a global test plan and identifying test requirements in a storage system environment
US7913229B2 (en) 2006-09-18 2011-03-22 Sas Institute Inc. Computer-implemented system for generating automated tests from a web application
US20080086348A1 (en) 2006-10-09 2008-04-10 Rajagopa Rao Fast business process test case composition
US8239831B2 (en) 2006-10-11 2012-08-07 Micro Focus (Ip) Limited Visual interface for automated software testing
US8245194B2 (en) 2006-10-18 2012-08-14 International Business Machines Corporation Automatically generating unit test cases which can reproduce runtime problems
US20080115116A1 (en) 2006-11-15 2008-05-15 Timothy Marc Francis Method and apparatus for dynamically binding service component implementations for specific unit test cases
US20080270841A1 (en) 2007-04-11 2008-10-30 Quilter Patrick J Test case manager
US7681180B2 (en) 2007-06-06 2010-03-16 Microsoft Corporation Parameterized test driven development
US8087001B2 (en) 2007-06-29 2011-12-27 Sas Institute Inc. Computer-implemented systems and methods for software application testing
US20090172647A1 (en) 2007-12-31 2009-07-02 Tarun Telang System and method for model driven unit testing environment
US20090271170A1 (en) * 2008-04-25 2009-10-29 Microsoft Corporation Failure simulation and availability report on same
US20110123973A1 (en) 2008-06-06 2011-05-26 Sapient Corporation Systems and methods for visual test authoring and automation
US20100146489A1 (en) 2008-12-10 2010-06-10 International Business Machines Corporation Automatic collection of diagnostic traces in an automation framework
US8881109B1 (en) 2009-01-22 2014-11-04 Intuit Inc. Runtime documentation of software testing
US8549483B1 (en) 2009-01-22 2013-10-01 Intuit Inc. Engine for scalable software testing
US20110131451A1 (en) 2009-11-30 2011-06-02 Ricardo Bosch Methods and system for testing an enterprise system
US20110289488A1 (en) 2010-05-24 2011-11-24 Fujitsu Limited Generating Test Sets Using Intelligent Variable Selection and Test Set Compaction
US20110307860A1 (en) 2010-06-09 2011-12-15 Hong Seong Park Simulation-based interface testing automation system and method for robot software components
US20120084754A1 (en) 2010-09-30 2012-04-05 Oracle International Corporation Streamlining Unit Testing Through Hot Code Swapping
US20130042222A1 (en) 2011-08-08 2013-02-14 Computer Associates Think, Inc. Automating functionality test cases
US20130055195A1 (en) 2011-08-30 2013-02-28 Uniquesoft, Llc System and method for iterative generating and testing of application code
US9268669B2 (en) 2012-01-17 2016-02-23 Microsoft Technology Licensing, Llc Application quality testing time predictions
US20130326471A1 (en) 2012-05-31 2013-12-05 Dell Products, Lp System for Providing Regression Testing of an Integrated Process Development System and Method Therefor
US8938647B2 (en) 2012-06-29 2015-01-20 Sap Se System and method for capturing and using web page views in a test environment
US20140013298A1 (en) 2012-07-06 2014-01-09 International Business Machines Corporation Auto generation and linkage of source code to test cases
US9632754B2 (en) 2012-07-06 2017-04-25 International Business Machines Corporation Auto generation and linkage of source code to test cases
US20150309813A1 (en) 2012-08-31 2015-10-29 iAppSecure Solutions Pvt. Ltd A System for analyzing applications in order to find security and quality issues
US20140282419A1 (en) 2013-03-14 2014-09-18 Fujitsu Limited Software verification
US20150113331A1 (en) 2013-10-17 2015-04-23 Wipro Limited Systems and methods for improved software testing project execution
US20150169433A1 (en) 2013-12-12 2015-06-18 Rafi Bryl Automated Generation of Semantically Correct Test Data for Application Development
US20170083400A1 (en) * 2013-12-23 2017-03-23 Jpmorgan Chase Bank, N.A. Automated Incident Resolution System and Method
US20150324274A1 (en) 2014-05-09 2015-11-12 Wipro Limited System and method for creating universal test script for testing variants of software application
US20170060728A1 (en) 2015-08-24 2017-03-02 Bank Of America Corporation Program Lifecycle Testing
US20170337116A1 (en) 2016-05-18 2017-11-23 Google Inc. Application testing on different device types
US20180217921A1 (en) 2017-02-02 2018-08-02 Cognizant Technology Solutions India Pvt. Ltd. System and method for generating and executing automated test cases

Also Published As

Publication number Publication date
US20210397527A1 (en) 2021-12-23

Legal Events

Date Code Title Description
AS Assignment

Owner name: BANK OF AMERICA CORPORATION, NORTH CAROLINA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:PURUSHOTHAMAN, SASIDHAR;SETHI, ANKUSH;TRICHY KARUPPUSAMY, GOWTHAMAN;AND OTHERS;REEL/FRAME:052981/0895

Effective date: 20200618

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE