CN106776235B - Monitoring system and method for operation and maintenance machine room and search engine - Google Patents

Monitoring system and method for operation and maintenance machine room and search engine

Info

Publication number
CN106776235B
Authority
CN
China
Prior art keywords
node
solution
state
machine room
feature vector
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710067387.6A
Other languages
Chinese (zh)
Other versions
CN106776235A (en)
Inventor
陈超
陈健
黄新平
范瑾
乔楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Parallel Technology Co Ltd
Original Assignee
Beijing Parallel Technology Co Ltd
Application filed by Beijing Parallel Technology Co Ltd
Priority to CN201710067387.6A
Publication of CN106776235A
Application granted
Publication of CN106776235B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a search engine arranged in a monitoring system of an operation and maintenance machine room. The search engine comprises: an input/output interface adapted to receive a search request and to send an operation command to each node; a data storage device adapted to store node state information and operation and maintenance log records, wherein each piece of node state information comprises a node identifier and a state feature vector, and each operation and maintenance log record comprises a node identifier, a solution, a first state feature vector and a second state feature vector; a search module adapted to look up the state feature vector of a node and to generate a list of related solutions according to that state feature vector; a detection module adapted to select solutions from the solution list in sequence and to execute the detection operation defined in each selected solution; and a verification module adapted to compare the detection result with the second state feature vector corresponding to the solution. If the comparison is consistent, the verification passes; if it is inconsistent, the detection module continues with the next solution until the verification passes.

Description

Monitoring system and method for operation and maintenance machine room and search engine
Technical Field
The invention relates to the technical field of operation and maintenance machine room monitoring, in particular to a monitoring system and method of an operation and maintenance machine room and a search engine.
Background
Environmental equipment in a modern information machine room (such as power supply and distribution, air conditioning, fire protection, temperature and humidity, and water leakage equipment) provides a reliable operating environment for the computer system. The operating state of the large-scale computing equipment in the operation and maintenance machine room is equally important to the normal operation of the room. Monitoring is therefore one of the key tasks of machine room maintenance. Many factors need to be monitored, such as the operating state of the computing equipment, the power supply and distribution state, the air conditioner operating condition, fire protection, temperature, humidity, and water leakage. If a power failure, excessive ambient temperature, abnormal air conditioner operation, fire, or water leakage occurs and is not handled in time, it threatens the operation of the computing equipment and even the computer network system, and can cause serious consequences and losses.
The monitoring equipment of a traditional operation and maintenance machine room depends heavily on staff and works inefficiently. When a fault occurs, alarm information is sent to a monitoring center, and staff begin troubleshooting and maintenance only after receiving it, which lengthens troubleshooting time and lowers working efficiency.
Therefore, a monitoring scheme capable of saving operation and maintenance labor cost and time cost is needed.
Disclosure of Invention
To this end, the present invention provides a monitoring system, method and search engine for an operation and maintenance machine room in an attempt to solve or at least alleviate at least one of the problems identified above.
According to an aspect of the present invention, there is provided a search engine arranged in a monitoring system of an operation and maintenance machine room, wherein the operation and maintenance machine room comprises a plurality of nodes, and the search engine comprises: an input/output interface adapted to receive a search request from a client and to send an operation command to each node so that the node executes the corresponding operation; a data storage device adapted to store one or more pieces of node state information and one or more operation and maintenance log records, wherein each piece of node state information comprises a node identifier and a state feature vector representing the state of the node, and each operation and maintenance log record comprises the node identifier, a solution, a first state feature vector of the node before the solution is executed, and a second state feature vector of the node after the solution is executed; a search module adapted to extract the node identifier in the search request, look up the state feature vector corresponding to the node identifier in the one or more pieces of node state information, find, among the one or more operation and maintenance log records, those whose first state feature vector is similar to the state feature vector, and generate a solution list according to the solutions in the records found; a detection module adapted to select a solution from the solution list in sequence, execute the detection operation defined in the solution, and acquire the current state information of the node or the machine room as a detection result; and a verification module adapted to compare the detection result with the second state feature vector corresponding to the solution, the verification passing if the comparison is consistent and failing if it is inconsistent. The input/output interface is further adapted to send the execution operation defined in the solution to the corresponding node when the verification passes, so that the node executes the execution operation of the solution; and the detection module is further adapted, when the verification fails, to select the next solution from the solution list in sequence and detect it, until the verification passes.
Optionally, in the search engine according to the present invention, the search module includes: an extraction subunit adapted to extract the node identifier in the search request; a lookup subunit adapted to look up the state feature vector corresponding to the node identifier in the one or more pieces of node state information; and a calculation subunit adapted to determine, from the one or more operation and maintenance log records, at least one first state feature vector similar to the state feature vector, and to generate a solution list according to the solutions in the operation and maintenance log records containing those first state feature vectors.
Optionally, in the search engine according to the present invention, the search engine is connected to a collector for collecting node state information in a machine room, and the input/output interface is further adapted to obtain the state of each node through the collector, where the state of the node includes CPU running state data and memory running state data of the node.
Optionally, in the search engine according to the present invention, the search engine is connected to a collector for collecting information about a state of a machine room, and the input/output interface is further adapted to obtain the state of the machine room through the collector, where the state of the machine room includes a temperature, a humidity, a power supply, and a network connection state of the machine room.
Optionally, in the search engine according to the present invention, the data storage device is further adapted to generate state feature vectors according to the states of the nodes and the state of the machine room, respectively.
Optionally, in the search engine according to the present invention, the operation and maintenance log record further includes a machine room identifier of a machine room where the node is located, a solution, a first state feature vector of the machine room before the solution is executed, and a second state feature vector of the machine room after the solution is executed.
Optionally, in the search engine according to the present invention, the data storage device is further adapted to record the solution, the node identification, and the first state feature vector and the second state feature vector of the node as an operation and maintenance log record when the verification is passed.
Optionally, in the search engine according to the invention, the calculation subunit is further adapted to calculate similarities between the state feature vectors using a proximity algorithm.
Optionally, in the search engine according to the present invention, the calculation subunit is further adapted to sort the calculated one or more operation and maintenance log records in order of high similarity to low similarity.
Optionally, in the search engine according to the present invention, the operation and maintenance log record further includes a search term corresponding to the node identifier.
Optionally, in the search engine according to the present invention, the extracting subunit is further adapted to extract a search term in the search request; the calculation subunit is further adapted to calculate a solution corresponding to the search term having a correlation with the extracted search term from the operation and maintenance log record, and to list in a solution list.
Optionally, in the search engine according to the present invention, the search module is further adapted to search for a solution corresponding to the computer room for the node when no solution related to the node identification and/or the search term in the search request is searched.
According to another aspect of the present invention, there is provided a monitoring system for an operation and maintenance machine room, including: a plurality of collectors adapted to collect the state of each node in the machine room and the state of the machine room, wherein the state of each node comprises CPU running state data and memory running state data of the node, and the state of the machine room comprises the machine room temperature, humidity, power supply, and network connection state; a search engine as described above; and an executor arranged on each node in the machine room and adapted to execute the execution operation of the corresponding solution when receiving an execution command from the search engine.
Optionally, in the monitoring system according to the present invention, further comprising: and the client is suitable for receiving user input and sending a search request to the search engine.
According to another aspect of the present invention, there is provided a method for monitoring an operation and maintenance machine room, including the steps of: in response to a search request, extracting the node identifier in the search request; looking up the state feature vector corresponding to the node identifier in one or more pieces of node state information, wherein each piece of node state information comprises the node identifier and a state feature vector representing the state of the node; finding, among one or more operation and maintenance log records, the records corresponding to at least one first state feature vector similar to the state feature vector, wherein each operation and maintenance log record comprises a node identifier, a solution, a first state feature vector of the node before the solution is executed, and a second state feature vector of the node after the solution is executed; generating a solution list according to the solutions in the operation and maintenance log records found; selecting a solution from the solution list in sequence, executing the detection operation defined in the solution, and acquiring the current state information of the node or the machine room as a detection result; comparing the detection result with the second state feature vector associated with the solution and, if the comparison is consistent, determining that the verification passes and sending the execution operation defined in the solution to the corresponding node so that the node executes the execution operation of the solution; and, if the comparison is inconsistent, determining that the verification fails, selecting the next solution from the solution list in sequence, and repeating the detecting, comparing and verifying steps until the verification passes.
Optionally, in the monitoring method according to the present invention, before the step of searching the state feature vector corresponding to the node identifier from one or more node state information, the method further includes the steps of: acquiring the state of each node in the machine room and generating a state feature vector of the corresponding node; the state of the node comprises CPU running state data and memory running state data of the node.
Optionally, in the monitoring method according to the present invention, the step of obtaining the state of each node in the machine room further includes: acquiring the state of a machine room and generating a state feature vector of the machine room; the state of the machine room comprises machine room temperature, humidity, a power supply and a network connection state.
Optionally, in the monitoring method according to the present invention, the node state information further includes a machine room identifier of a machine room where the node is located and a state feature vector representing a state of the machine room; the operation and maintenance log record also comprises a machine room identifier of the machine room where the node is located, a solution, a first state feature vector of the machine room before the solution is executed and a second state feature vector of the machine room after the solution is executed.
Optionally, in the monitoring method according to the present invention, after the step of determining that the verification is passed if the comparison is consistent, the method further includes: and taking the solution, the node identification and the first state feature vector and the second state feature vector of the node as an operation and maintenance log record.
Optionally, in the monitoring method according to the present invention, the step of searching for at least one first status feature vector similar to the status feature vector from one or more operation and maintenance log records includes: and calculating the similarity between the state feature vectors by adopting a proximity algorithm.
Optionally, in the monitoring method according to the present invention, the step of generating the solution list according to the solution found in the operation and maintenance log record includes: and sequencing the searched solutions in the operation and maintenance log records according to the sequence of the similarity from high to low.
Optionally, in the monitoring method according to the present invention, the step of extracting the node identifier in the search request further includes: and extracting the search terms in the search request, wherein the operation and maintenance log records also comprise the search terms corresponding to the node identifications.
Optionally, in the monitoring method according to the present invention, the step of searching, from one or more operation and maintenance log records, an operation and maintenance log record corresponding to at least one first state feature vector similar to the state feature vector further includes: and searching at least one operation and maintenance log record similar to the extracted search terms from the operation and maintenance log records.
Optionally, in the monitoring method according to the present invention, further comprising the steps of: and when the solution related to the node identification and/or the search word in the search request is not searched, searching the solution corresponding to the computer room of the node.
According to the scheme of the invention, when a problem arises in the operation and maintenance machine room (whether found by operation and maintenance personnel or raised automatically as an alarm upon detection), the state information of the corresponding node can be looked up through the search engine, the solutions adopted for similar problems in the past can be screened automatically, and the optimal solution capable of solving the problem can be determined through the detection and verification steps. The search engine then notifies the executor of the corresponding node to perform the corresponding execution operation, thereby solving the operation and maintenance problem automatically.
Drawings
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in connection with the following description and the annexed drawings, which are indicative of various ways in which the principles disclosed herein may be practiced, and all aspects and equivalents thereof are intended to be within the scope of the claimed subject matter. The above and other objects, features and advantages of the present disclosure will become more apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout this disclosure, like reference numerals generally refer to like parts or elements.
Fig. 1 shows a schematic diagram of a monitoring system 100 of an operation and maintenance machine room according to an embodiment of the invention;
Fig. 2 shows a schematic diagram of a search engine 120 according to an embodiment of the invention; and
Fig. 3 shows a flowchart of a monitoring method 300 for an operation and maintenance machine room according to an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Fig. 1 shows a schematic diagram of a monitoring system 100 of an operation and maintenance machine room according to an embodiment of the present invention.
A plurality of computing devices or servers are arranged in the operation and maintenance machine room, and especially in the operation and maintenance machine room of a super computing center for completing high-performance computing, the plurality of computing devices or servers execute super computing operation, so that the safe and stable operation of the operation and maintenance machine room is very important. According to the implementation mode of the invention, the computing equipment, the server, the power distribution equipment, the air conditioner and the like in the operation and maintenance machine room are all used as one node in the machine room.
As shown in Fig. 1, the system 100 includes a plurality of collectors 110, a search engine 120, an executor 130 arranged on each node in the operation and maintenance machine room, and a client 140. The collectors 110 are connected to the search engine 120, and the search engine 120 is connected to the executors 130 and the client 140, respectively. The collectors 110 are disposed in the operation and maintenance machine room, for example on each node in the machine room, and are configured to collect the state of each node in the machine room and the state of the machine room. Optionally, the state of a node includes CPU running state data and memory running state data of the node, and the state of the machine room includes the machine room temperature, humidity, power supply, network connection state, and the like. The executor 130 is disposed on each node in the machine room and executes the execution operation of the corresponding solution upon receiving an execution command from the search engine 120. The client 140 receives user input; for example, the user enters the content to be searched in a browser on the client 140, and the client 140 generates a search request and sends it to the search engine 120. It should be understood by those skilled in the art that Fig. 1 is only an exemplary illustration of the above devices; in a practical system, the number of collectors 110, executors 130, etc. is determined according to the actual situation, and the invention is not limited in this respect.
The search engine 120 in the system 100 will be described in detail below. Referring to FIG. 2, a schematic diagram of a search engine 120 is shown, according to one embodiment of the invention. The search engine 120 includes: input/output interface 121, data storage device 123, search module 125, detection module 127, and verification module 129.
The input/output interface 121 receives the state of each node and/or of the machine room acquired by the collector 110. As described above, this includes the CPU running state data and memory running state data of each node, and the temperature, humidity, power supply, network connection state, and the like of the machine room.
The data storage device 123 generates and stores state feature vectors according to the states of the nodes and the state of the machine room; specifically, it stores one or more pieces of node state information and one or more operation and maintenance log records. Each piece of node state information comprises a node identifier and a state feature vector representing the state of that node; each operation and maintenance log record includes a node identifier, a solution, a first state feature vector of the node before the solution is executed, and a second state feature vector of the node after the solution is executed. For example, let v denote the state feature vector of a node, recording the average, maximum, and minimum values of the node's CPU running state data and of its memory running state data over a period of time. Then v can be represented as:
v = [avg_cpu, max_cpu, min_cpu, avg_memory, max_memory, min_memory]
The pieces of node state information may then be stored in the following form:
Node identifier | Node state feature vector
node1           | v_node1
node2           | v_node2
Likewise, the operation and maintenance log records may be stored in the following form:
Node identifier | Solution | First state feature vector | Second state feature vector
node1           | A        | v1                         | v2
node2           | B        | v1’                        | v2’
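As a rough illustration of the two stored forms above, the following minimal Python sketch builds the six-element state feature vector from raw samples and wraps it in record types. The names `state_feature_vector`, `NodeState`, and `OpsLogRecord` are hypothetical and do not appear in the patent; the sample values are made up for the example.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Sequence


def state_feature_vector(cpu: Sequence[float], memory: Sequence[float]) -> List[float]:
    """Summarise CPU and memory samples from one time window.

    Element order follows the description:
    [avg cpu, max cpu, min cpu, avg memory, max memory, min memory].
    """
    if not cpu or not memory:
        raise ValueError("need at least one CPU sample and one memory sample")
    return [mean(cpu), max(cpu), min(cpu), mean(memory), max(memory), min(memory)]


@dataclass
class NodeState:
    """One row of the node state table (hypothetical type name)."""
    node_id: str
    vector: List[float]


@dataclass
class OpsLogRecord:
    """One row of the operation and maintenance log table (hypothetical type name)."""
    node_id: str
    solution: str
    first_vector: List[float]   # node state before the solution was executed
    second_vector: List[float]  # node state after the solution was executed


# Illustrative samples only (percent utilisation over some window).
v = state_feature_vector(cpu=[40.0, 60.0, 80.0], memory=[30.0, 50.0])
node1 = NodeState("node1", v)
```

Keeping the vector as a plain list leaves the choice of similarity measure open for the later search step.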
According to another embodiment of the present invention, the operation and maintenance log record further includes a machine room identifier of the machine room where the node is located, the solution, a first state feature vector of the machine room before the solution is executed, and a second state feature vector of the machine room after the solution is executed.
According to another embodiment of the invention, the operation and maintenance log record further includes a search term corresponding to the node identifier. When a certain search word appears together with a certain node identifier, the search word is correspondingly recorded in the operation and maintenance log record of the node identifier.
The input/output interface 121 receives a search request from the client 140. Optionally, the search request may include a node identifier and may also include a search term. For example, when operation and maintenance staff detect that a server in the machine room is too hot, they may enter "node 3 is too hot" on the client 140. Of course, the search request may also include only the node identifier, which is not limited in the present invention.
The search module 125 extracts the node identifier in the search request and finds matching solutions in the stored operation and maintenance log records. According to one embodiment of the invention, the search module 125 includes an extraction subunit 1252, a lookup subunit 1254, and a calculation subunit 1256.
Specifically, the extraction subunit 1252 extracts the node identifier in the search request; optionally, when the search request includes a search term, the extraction subunit 1252 may also extract the search term.
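The patent does not specify how the node identifier and search term are separated. A minimal sketch, assuming identifiers take the form "node" followed by digits (as in the "node 3 is too hot" example), might look as follows; `parse_search_request` is a hypothetical name.

```python
import re
from typing import Optional, Tuple

# Assumption: a node identifier is the word "node" followed by digits,
# optionally separated by whitespace. Real identifiers may differ.
_NODE_ID = re.compile(r"\bnode\s*(\d+)\b", re.IGNORECASE)


def parse_search_request(text: str) -> Tuple[Optional[str], str]:
    """Split a request into (node identifier, remaining search term).

    The identifier is None when the request names no node.
    """
    match = _NODE_ID.search(text)
    if match is None:
        return None, text.strip()
    node_id = f"node{match.group(1)}"
    # Drop the identifier and normalise whitespace in what is left.
    rest = text[:match.start()] + " " + text[match.end():]
    return node_id, " ".join(rest.split())


node_id, term = parse_search_request("node 3 is too hot")
```

A request that contains only the identifier yields an empty search term, which matches the case where the request "may also include only the node identifier".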
The lookup subunit 1254 then looks up the state feature vector corresponding to the node identification from one or more node state information stored on the data storage device 123.
The calculation subunit 1256 then determines, from the one or more operation and maintenance log records stored on the data storage device 123, at least one first state feature vector similar to the state feature vector. Optionally, the calculation subunit 1256 calculates the similarity between state feature vectors using a proximity algorithm. The proximity algorithm is a commonly used algorithm in data mining classification, and its specific implementation is not described here. Of course, other data clustering and similarity calculation methods may also be used to calculate the similarity between state feature vectors, which is not limited in the present invention. The calculation subunit 1256 then takes the operation and maintenance log records containing the at least one first state feature vector, sorts them in descending order of similarity, and generates a solution list from the solutions in those records. Optionally, the calculation subunit 1256 sorts only the solutions whose calculated similarity exceeds a predetermined threshold (e.g., 0.7) to obtain the solution list.
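The patent leaves the similarity measure open. The sketch below uses cosine similarity purely as an assumed stand-in, together with the 0.7 threshold mentioned above, to show how a ranked solution list could be produced; `cosine_similarity` and `build_solution_list` are hypothetical names.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_solution_list(
    current: Sequence[float],
    records: Iterable[Tuple[str, Sequence[float]]],
    threshold: float = 0.7,
) -> List[str]:
    """Rank solutions by similarity of their first state vector to `current`.

    `records` holds (solution, first state feature vector) pairs. Only
    records scoring above `threshold` are kept; each solution appears once,
    at the position of its best-scoring record.
    """
    scored = [(cosine_similarity(current, first), solution) for solution, first in records]
    scored = [pair for pair in scored if pair[0] > threshold]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    ranked: List[str] = []
    for _, solution in scored:
        if solution not in ranked:
            ranked.append(solution)
    return ranked


solutions = build_solution_list(
    current=[1.0, 0.0],
    records=[("A", [2.0, 1.0]), ("B", [1.0, 0.0]), ("C", [0.0, 1.0])],
)
```

Because cosine similarity ignores magnitude, a distance-based score may suit utilisation data better; the ranking and thresholding logic would stay the same.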
According to the embodiment of the present invention, when the search request includes a search word, the calculating subunit 1256 may also find a corresponding operation and maintenance log record from the operation and maintenance log records by calculating a search word having a correlation with the extracted search word, further find solutions included in the operation and maintenance log records, and merge the solutions into the solution list.
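How search-term correlation is computed is not stated in the patent. As one assumed possibility, the sketch below scores correlation by word overlap (Jaccard index) and appends matching solutions to an existing solution list; `term_overlap`, `merge_by_search_term`, and the 0.5 cut-off are all hypothetical.

```python
from typing import Iterable, List, Tuple


def term_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the word sets of two search terms."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def merge_by_search_term(
    solution_list: List[str],
    query: str,
    records: Iterable[Tuple[str, str]],
    min_overlap: float = 0.5,  # assumed cut-off, not taken from the patent
) -> List[str]:
    """Append solutions whose recorded search term correlates with `query`.

    `records` holds (recorded search term, solution) pairs. Solutions already
    in the list keep their position and are not duplicated.
    """
    merged = list(solution_list)
    for recorded_term, solution in records:
        if term_overlap(query, recorded_term) >= min_overlap and solution not in merged:
            merged.append(solution)
    return merged


merged = merge_by_search_term(
    ["B"],
    "too hot",
    [("too hot", "A"), ("disk full", "C"), ("hot", "B")],
)
```

Appending term-based matches after the vector-based ones keeps the similarity ranking intact; other merge orders are equally consistent with the text.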
According to another embodiment of the present invention, if the search module 125 cannot search for a solution associated with the node identifier and/or the search term in the search request, the search module continues to search for a solution corresponding to the node in the computer room.
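The fallback described above can be pictured as a two-level lookup. This is a minimal sketch with plain dictionaries standing in for the data storage device; the function name and the dictionary layout are assumptions for illustration only.

```python
from typing import Dict, List


def find_solutions(
    node_id: str,
    node_solutions: Dict[str, List[str]],
    node_to_room: Dict[str, str],
    room_solutions: Dict[str, List[str]],
) -> List[str]:
    """Return node-level solutions, falling back to the node's machine room."""
    found = node_solutions.get(node_id, [])
    if found:
        return list(found)
    room_id = node_to_room.get(node_id)
    if room_id is None:
        return []
    return list(room_solutions.get(room_id, []))


# Illustrative data: node3 has no solutions of its own, but its room does.
fallback = find_solutions(
    "node3",
    node_solutions={"node1": ["A"]},
    node_to_room={"node1": "room1", "node3": "room1"},
    room_solutions={"room1": ["lower air conditioner No. 1"]},
)
```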
When the solution list is searched by the above method, the detection module 127 sequentially selects one solution from the solution list, executes the detection operation defined in the solution, and acquires the current state information of the node or the machine room as the detection result.
According to the embodiment of the present invention, each solution includes a detection operation and an execution operation. For example, if the solution is to turn down air conditioner No. 1, the detection operation may be to lower air conditioner No. 1 by 1° for 1 minute, and the execution operation may be to lower it by 2°. The detection module 127 lowers air conditioner No. 1 by 1° for 1 minute as instructed by the detection operation; after 1 minute, the state of the corresponding node is acquired by the collector 110, and corresponding state information is generated as the detection result.
The verification module 129 compares the detection result with the second state feature vector corresponding to the solution, and if the comparison is consistent, the verification is considered to pass, and if the comparison is inconsistent, the verification is considered to fail.
If the verification passes, the solution is valid for the node's problem. The input/output interface 121 therefore sends the execution operation defined in the solution (i.e., lowering the air conditioner temperature by 2°) to the executor 130 on the corresponding node so that the node executes the execution operation of the solution.
On the contrary, if the verification fails, it indicates that the solution may be still deficient, so the detecting module 127 sequentially selects the next solution from the solution list and detects the next solution until the verification passes.
According to some implementations, if no suitable solution is matched after the solutions in the solution list have been verified repeatedly, the verification stops, and the node identifier and/or the search term in the search request are recorded to facilitate subsequent handling by operation and maintenance personnel.
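The detect, verify, and execute cycle described above can be sketched as a simple loop. The patent does not define what counts as a "consistent" comparison, so the relative tolerance used here is an assumption, and `Solution`, `is_consistent`, and `try_solutions` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Solution:
    name: str
    detect: Callable[[], List[float]]  # runs the detection operation, returns observed state
    execute: Callable[[], None]        # execution operation sent to the node's executor
    expected: List[float]              # second state feature vector from the log record


def is_consistent(observed: Sequence[float], expected: Sequence[float],
                  tolerance: float = 0.05) -> bool:
    """Assumed criterion: every component within a relative tolerance."""
    return len(observed) == len(expected) and all(
        abs(o - e) <= tolerance * max(abs(e), 1.0) for o, e in zip(observed, expected)
    )


def try_solutions(solutions: Sequence[Solution]) -> Optional[Solution]:
    """Detect and verify each solution in order; execute the first that passes.

    Returns None when no solution verifies, so the caller can record the
    request for operation and maintenance personnel.
    """
    for solution in solutions:
        if is_consistent(solution.detect(), solution.expected):
            solution.execute()
            return solution
    return None


executed: List[str] = []
chosen = try_solutions([
    Solution("A", lambda: [90.0], lambda: executed.append("A"), [60.0]),
    Solution("B", lambda: [61.0], lambda: executed.append("B"), [60.0]),
])
```

In this example solution A fails verification (90 against an expected 60) and solution B passes, so only B's execution operation runs.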
The above is merely a simple illustrative example; in a practical scenario the solution is more complex. For example, if a node is found to be overheating, the detection operation in the solution may include:
1. Checking the operating condition of the air-conditioning equipment
2. Checking whether the CPU fan speed is below a certain threshold
3. Checking whether CPU usage is above a certain threshold
4. Checking GPU usage
5. Checking whether the water-cooling equipment is operating normally
Optionally, the detection operation and the execution operation may be the same, or may differ only in, for example, the duration of the operation; the present invention is not limited in this respect.
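As an illustration only, a solution whose detection operation consists of several checks could be represented as in the following minimal Python sketch. The class names, the state keys (`cpu_fan_rpm`, `cpu_usage`) and the threshold values are assumptions introduced here for illustration; the embodiment does not prescribe any particular data layout or thresholds.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical type: a node state is modelled as a plain mapping of metric names to values.
State = Dict[str, float]


@dataclass
class Check:
    """One detection check, e.g. 'CPU fan speed below a threshold'."""
    name: str
    predicate: Callable[[State], bool]  # returns True when the check flags a problem


@dataclass
class Solution:
    """A solution with a detection operation (a list of checks) and an execution operation."""
    name: str
    detection_checks: List[Check] = field(default_factory=list)
    execute: str = ""  # execution operation, kept here as an opaque command string


def run_detection(solution: Solution, state: State) -> List[str]:
    """Return the names of all checks in the solution that flag a problem for this state."""
    return [c.name for c in solution.detection_checks if c.predicate(state)]


# Example solution for an overheating node; thresholds are illustrative assumptions.
overheat = Solution(
    name="node overheating",
    detection_checks=[
        Check("cpu_fan_too_slow", lambda s: s["cpu_fan_rpm"] < 1500),
        Check("cpu_usage_too_high", lambda s: s["cpu_usage"] > 0.9),
    ],
    execute="lower air conditioner No. 1 by 2 degrees",
)
```

Keeping each check as a named predicate makes it possible to report which individual check flagged the problem, which may help operation and maintenance personnel when no solution passes verification.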
According to an embodiment of the present invention, when the verification passes, the data storage device 123 records the solution, the node identification, and the first state feature vector and the second state feature vector of the node as an operation and maintenance log record.
According to the scheme of the invention, when operation and maintenance personnel find a problem, they can search for the state information of the corresponding node through the search engine. The search engine automatically screens the solutions that were adopted for similar problems in the past, determines through the detection and verification steps the optimal solution capable of solving the problem, and then notifies the actuator of the corresponding node to execute the corresponding execution operation, thereby solving the operation and maintenance problem automatically.
Further, the search engine can report the current state of the node or machine room to the client, where it is displayed in real time. When the current state of the node or machine room is found to be abnormal, the search engine sends alarm information directly to the client so that operation and maintenance personnel can troubleshoot the problem promptly.
Accordingly, Fig. 3 shows a flowchart of a monitoring method 300 for an operation and maintenance machine room according to an embodiment of the present invention.
The method 300 begins at step S310, in which, in response to a search request, the node identification in the search request is extracted. Optionally, when the search request includes a search term, the search term may also be extracted. For example, if the search request is "node3 is high in temperature", then "node3" is the node identification and "high in temperature" is the search term.
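The extraction in step S310 can be sketched as follows. This is a minimal illustration that assumes node identifications have the form "node" followed by digits and that the remainder of the request (minus a leading "is"/"are") is the search term; neither the pattern nor the function name `parse_search_request` comes from the embodiment.

```python
import re
from typing import Optional, Tuple

# Assumption: node identifications look like "node<digits>", optionally written with a space.
NODE_ID = re.compile(r"\bnode\s*(\d+)\b", re.IGNORECASE)


def parse_search_request(request: str) -> Tuple[Optional[str], str]:
    """Split a search request into (node identification, search term).

    Returns (None, request) when no node identification is present.
    """
    match = NODE_ID.search(request)
    if match is None:
        return None, request.strip()
    node_id = f"node{match.group(1)}"
    # Everything outside the matched node identification is treated as the search term.
    rest = request[:match.start()] + " " + request[match.end():]
    term = re.sub(r"^\s*(is|are)\s+", "", rest.strip(), flags=re.IGNORECASE)
    return node_id, " ".join(term.split())
```

For instance, under these assumptions the request "node3 is high in temperature" yields the node identification "node3" and the search term "high in temperature".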
According to an embodiment of the present invention, before the search request is obtained, the method 300 further includes the steps of: obtaining the state of each node in the machine room and generating a state feature vector for the corresponding node, wherein the state of a node includes the CPU running state data and memory running state data of the node. The state of the machine room may also be obtained and a state feature vector of the machine room generated, wherein the state of the machine room includes the temperature, humidity, power supply and network connection state of the machine room.
For example, let v denote the state feature vector of a node. If the average, maximum and minimum values of the CPU running state data and of the memory running state data of the node over a period of time are recorded, v can be expressed as:
v = [avg_cpu, max_cpu, min_cpu, avg_memory, max_memory, min_memory].
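A minimal sketch of how such a vector might be computed from sampled values is given below. The function name and the assumption that samples arrive as two sequences of numbers (for example, utilisation percentages) are illustrative and not taken from the embodiment.

```python
from statistics import mean
from typing import List, Sequence


def state_feature_vector(cpu: Sequence[float], memory: Sequence[float]) -> List[float]:
    """Build v = [avg_cpu, max_cpu, min_cpu, avg_memory, max_memory, min_memory].

    `cpu` and `memory` hold the samples collected for one node over a period of time.
    """
    if not cpu or not memory:
        raise ValueError("at least one CPU sample and one memory sample are required")
    return [mean(cpu), max(cpu), min(cpu), mean(memory), max(memory), min(memory)]
```

With CPU samples of 20.0, 40.0 and 60.0 and memory samples of 50.0 and 70.0, the resulting vector is [40.0, 60.0, 20.0, 60.0, 70.0, 50.0].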
Then, in step S320, the state feature vector corresponding to the node identification is looked up from one or more pieces of node state information, where each piece of node state information includes a node identification and a state feature vector characterizing the node state, as follows:
Node identification | Node state feature vector
node1 | v_node1
node2 | v_node2
According to another embodiment of the present invention, the node state information further includes a machine room identifier of a machine room where the node is located and a state feature vector representing a state of the machine room.
Then, in step S330, the one or more operation and maintenance log records are searched for at least one first state feature vector similar to the state feature vector, where each operation and maintenance log record includes a node identification, a solution, a first state feature vector of the node before the solution is executed, and a second state feature vector of the node after the solution is executed, as follows:
Node identification | Solution | First state feature vector | Second state feature vector
node1 | A | v1 | v2
node2 | B | v1' | v2'
According to the embodiment of the invention, the operation and maintenance log records further include a search term corresponding to the node identification: when a search term appears together with a node identification, the search term is recorded in the operation and maintenance log record for that node identification. Optionally, the one or more operation and maintenance log records are searched for at least one operation and maintenance log record whose search term is similar to the search term in the search request.
Optionally, a proximity algorithm is used to compute the similarity between the state feature vectors. The proximity algorithm is a commonly used classification algorithm in data mining, and its specific implementation is not described here. Other data clustering and similarity calculation methods may of course be adopted to calculate the similarity between the state feature vectors; the present invention is not limited in this respect. Optionally, the first state feature vectors similar to the state feature vector indicated by the search request are determined according to the calculated similarity values, and the corresponding operation and maintenance log records are then screened out; for example, the operation and maintenance log records whose similarity value is greater than a predetermined threshold (e.g., 0.7) are selected.
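The screening by similarity threshold can be sketched as below. Because the embodiment leaves the similarity calculation open, this sketch assumes cosine similarity as a stand-in; the record layout (a dictionary with `solution` and `first_state` keys) and the function names are likewise assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_records(
    query: Sequence[float],
    records: List[Dict],
    threshold: float = 0.7,
) -> List[Tuple[float, Dict]]:
    """Return (similarity, record) pairs whose first state vector exceeds the threshold."""
    scored = [(cosine_similarity(query, r["first_state"]), r) for r in records]
    return [(score, record) for score, record in scored if score > threshold]
```

In practice the components of a state feature vector may have different scales, so some normalisation before the comparison would likely be needed; that choice is outside what the embodiment specifies.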
According to the embodiment of the invention, the operation and maintenance log record further includes a machine room identifier of the machine room where the node is located, the solution, a first state feature vector of the machine room before the solution is executed, and a second state feature vector of the machine room after the solution is executed.
Then, in step S340, a solution list is generated from the solutions in the operation and maintenance log records that were found. According to the embodiment of the present invention, the operation and maintenance log records found are sorted in descending order of the similarity values calculated in step S330; that is, the solutions in those records are sorted accordingly.
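Step S340 can be sketched as follows, assuming the similarity values are already available as (similarity, solution) pairs. Removing duplicate solutions while keeping the highest-ranked occurrence is an assumption added here for illustration; the embodiment only specifies the descending sort.

```python
from typing import List, Tuple


def build_solution_list(scored: List[Tuple[float, str]]) -> List[str]:
    """Order solutions by descending similarity, keeping the first occurrence of each."""
    ordered = sorted(scored, key=lambda pair: pair[0], reverse=True)
    seen = set()
    solutions: List[str] = []
    for _, solution in ordered:
        if solution not in seen:  # assumption: a solution is tried at most once per pass
            seen.add(solution)
            solutions.append(solution)
    return solutions
```

For the pairs (0.75, "B"), (0.95, "A") and (0.8, "B"), this sketch produces the list ["A", "B"].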
Then, in step S350, solutions are selected from the solution list one at a time, in order; the detection operation defined in the selected solution is executed, and the current state information of the node or the machine room is acquired as the detection result. According to the implementation of the invention, each solution includes a detection operation and an execution operation. The detection operation is executed first, and the execution operation is executed only if verification shows that the detection operation is effective in solving the problem. This improves monitoring efficiency and avoids wasting time on solutions that are ineffective for the search request.
Subsequently, in step S360, the detection result is compared with the second state feature vector associated with the solution. If the comparison is consistent, the verification is considered passed, and the execution operation defined in the solution is sent to the corresponding node so that the actuator on the node executes the execution operation of the solution.
According to the embodiment of the invention, when the verification is passed, the solution, the node identification and the first state feature vector and the second state feature vector of the node (optionally, the search word if any) are recorded as an operation and maintenance log.
Subsequently, in step S370, if the comparison is inconsistent, the verification is considered failed; the next solution is selected from the solution list, in order, and the detection and comparison steps (i.e., steps S350 and S360) are repeated until the verification passes.
Optionally, when no solution related to the node identification and/or the search term in the search request is found, a solution corresponding to the machine room in which the node is located is searched for.
According to some implementations, if no suitable solution has been matched after the solutions in the solution list have been verified repeatedly a number of times, the verification is stopped, and the node identification and/or the search term in the search request are recorded for subsequent handling by operation and maintenance personnel.
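Steps S350 to S370, together with the stop condition, can be sketched as the loop below. The embodiment does not define when a comparison counts as "consistent", so a per-component tolerance is assumed here; the callables `detect` and `execute`, the parameter `max_rounds` and the tolerance value are all hypothetical names and values introduced for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def consistent(detected: Vector, expected: Vector, tol: float = 0.05) -> bool:
    """Assumed consistency test: every component differs by at most `tol`."""
    return len(detected) == len(expected) and all(
        abs(d - e) <= tol for d, e in zip(detected, expected)
    )


def resolve(
    solutions: List[Tuple[str, Vector]],   # (solution name, second state feature vector)
    detect: Callable[[str], Vector],       # runs the detection operation, returns the state
    execute: Callable[[str], None],        # sends the execution operation to the node
    max_rounds: int = 3,
    tol: float = 0.05,
) -> Optional[str]:
    """Try solutions in order; execute and return the first one that passes verification.

    Returns None when no solution passes within `max_rounds` passes over the list,
    in which case the caller would record the request for manual handling.
    """
    for _ in range(max_rounds):
        for name, expected in solutions:
            detected = detect(name)                  # step S350
            if consistent(detected, expected, tol):  # step S360
                execute(name)
                return name
            # step S370: verification failed, continue with the next solution
    return None
```

Bounding the number of passes with `max_rounds` reflects the implementation in which verification is stopped after several repetitions rather than looping indefinitely.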
It should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules or units or components of the devices in the examples disclosed herein may be arranged in a device as described in this embodiment or alternatively may be located in one or more devices different from the devices in this example. The modules in the foregoing examples may be combined into one module or may be further divided into multiple sub-modules.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
The invention also discloses:
A6, the search engine as in A5, wherein the operation and maintenance log record further includes a machine room identifier of the machine room where the node is located, the solution, the first state feature vector of the machine room before the solution is executed, and the second state feature vector of the machine room after the solution is executed.
A7, the search engine as in any of A1-6, wherein the data storage device is further adapted to record the solution, the node identification, and the first state feature vector and the second state feature vector of the node as an operation and maintenance log record when the verification passes.
A8, the search engine as in any of A2-7, wherein the computing subunit is further adapted to compute similarities between state feature vectors using a proximity algorithm.
A9, the search engine as in any one of A2-7, wherein the computing subunit is further adapted to sort the computed one or more operation and maintenance log records in order of high to low similarity.
A10, the search engine as in any one of A2-9, wherein the operation and maintenance log records further include search terms corresponding to node identifications.
A11, the search engine of A10, wherein the extracting subunit is further adapted to extract the search term in the search request; the computing subunit is further adapted to find, among the operation and maintenance log records, solutions whose corresponding search terms are relevant to the extracted search term, and to add them to the solution list.
A12, the search engine of any one of A1-11, wherein the search module is further adapted to search for a solution corresponding to the machine room of the node when no solution related to the node identification and/or the search term in the search request is found.
B18, the method according to any one of B15-17, wherein the node state information further includes machine room identification of the machine room where the node is located and a state feature vector representing the state of the machine room; the operation and maintenance log record also comprises a machine room identifier of the machine room where the node is located, a solution, a first state feature vector of the machine room before the solution is executed and a second state feature vector of the machine room after the solution is executed.
B19, the method as in any one of B15-18, further comprising, after the step of considering the verification to be passed if the comparison is consistent: recording the solution, the node identification, and the first state feature vector and the second state feature vector of the node as an operation and maintenance log record.
B20, the method according to any one of B15-19, wherein the step of finding at least one first state feature vector similar to the state feature vector from one or more operation and maintenance log records comprises: calculating the similarity between the state feature vectors using a proximity algorithm.
B21, the method according to any one of B15-20, wherein the step of generating a solution list according to the found solutions in the operation and maintenance log record comprises: sorting the solutions in the operation and maintenance log records found in descending order of similarity.
B22, the method according to any one of B15-21, wherein the step of extracting the node identification in the search request further comprises: extracting a search term in the search request, wherein the operation and maintenance log records further comprise search terms corresponding to the node identifications.
B23, the method according to B22, wherein the step of searching the operation and maintenance log record corresponding to at least one first state feature vector similar to the state feature vector from the one or more operation and maintenance log records further comprises: searching the operation and maintenance log records for at least one operation and maintenance log record similar to the extracted search term.
B24, the method of any one of B15-23, further comprising the step of: when no solution related to the node identification and/or the search term in the search request is found, searching for a solution corresponding to the machine room of the node.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
Furthermore, some of the described embodiments are described herein as a method or combination of method elements that can be performed by a processor of a computer system or by other means of performing the described functions. A processor having the necessary instructions for carrying out the method or method elements thus forms a means for carrying out the method or method elements. Further, the elements of the apparatus embodiments described herein are examples of the following apparatus: the apparatus is used to implement the functions performed by the elements for the purpose of carrying out the invention.
As used herein, unless otherwise specified the use of the ordinal adjectives "first", "second", "third", etc., to describe a common object, merely indicate that different instances of like objects are being referred to, and are not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this description, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as described herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The present invention has been disclosed in an illustrative rather than a restrictive sense, and the scope of the present invention is defined by the appended claims.

Claims (24)

1. A search engine disposed in a monitoring system of an operation and maintenance machine room, the operation and maintenance machine room including a plurality of nodes, wherein the search engine comprises:
an input/output interface adapted to receive a search request from a client and to send an operation command to each node so that the node executes the corresponding operation;
a data storage device adapted to store one or more pieces of node state information and one or more operation and maintenance log records, wherein each piece of node state information comprises a node identification and a state feature vector representing the state of the node, and each operation and maintenance log record comprises a node identification, a solution, a first state feature vector of the node before the solution is executed and a second state feature vector of the node after the solution is executed;
a search module adapted to extract the node identification in the search request, search the one or more pieces of node state information for the state feature vector corresponding to the node identification, search the one or more operation and maintenance log records for operation and maintenance log records whose first state feature vector is similar to the state feature vector, and generate a solution list from the solutions in the operation and maintenance log records found;
a detection module adapted to select solutions from the solution list one at a time in order, execute the detection operation defined in the selected solution, and acquire current state information of the node or the machine room as a detection result;
a verification module adapted to compare the detection result with the second state feature vector corresponding to the solution, wherein the verification is considered passed if the comparison is consistent and failed if the comparison is inconsistent;
wherein the input/output interface is further adapted to send the execution operation defined in the solution to the corresponding node when the verification passes, so that the node executes the execution operation of the solution; and
the detection module is further adapted to select the next solution from the solution list in order and perform detection for it when the verification fails, until the verification passes.
2. The search engine of claim 1, wherein the search module comprises:
an extraction subunit, adapted to extract the node identifier in the search request;
a searching subunit, adapted to search the state feature vector corresponding to the node identifier from the one or more node state information; and
a computing subunit, adapted to find, from the one or more operation and maintenance log records, at least one first state feature vector similar to the state feature vector, and to generate a solution list from the solutions in the operation and maintenance log records corresponding to the first state feature vectors found.
3. The search engine of claim 2, wherein the search engine is connected with a collector for collecting node state information in the machine room,
the input/output interface is further adapted to obtain the state of each node via the collector, and
the state of a node comprises CPU running state data and memory running state data of the node.
4. The search engine of claim 3, wherein the search engine is connected with a collector for collecting machine room state information,
the input/output interface is further adapted to obtain the state of the machine room via the collector, and
the state of the machine room comprises the temperature, humidity, power supply and network connection state of the machine room.
5. The search engine of claim 4, wherein the data storage device is further adapted to generate state feature vectors based on the state of each node and the state of the machine room, respectively.
6. The search engine of claim 5, wherein
the operation and maintenance log record further comprises a machine room identifier of the machine room where the node is located, a solution, a first state feature vector of the machine room before the solution is executed, and a second state feature vector of the machine room after the solution is executed.
7. The search engine of claim 6, wherein
the data storage device is further adapted to record the solution, the node identification, and the first state feature vector and the second state feature vector of the node as an operation and maintenance log record when the verification passes.
8. The search engine of claim 7, wherein
the computing subunit is further adapted to compute similarities between the state feature vectors using a proximity algorithm.
9. The search engine of claim 7, wherein
the computing subunit is further adapted to sort the one or more operation and maintenance log records found in descending order of similarity.
10. The search engine of claim 9, wherein
the operation and maintenance log records further comprise search terms corresponding to the node identifications.
11. The search engine of claim 10, wherein
the extracting subunit is further adapted to extract a search term in the search request; and
the computing subunit is further adapted to find, among the operation and maintenance log records, solutions whose corresponding search terms are relevant to the extracted search term, and to add them to the solution list.
12. The search engine of any of claims 1-11, wherein
the search module is further adapted to search for a solution corresponding to the machine room of the node when no solution related to the node identification and/or the search term in the search request is found.
13. A monitoring system of an operation and maintenance machine room, comprising:
a plurality of collectors adapted to collect state information of each node in the machine room and the state of the machine room, wherein the state of a node comprises CPU running state data and memory running state data of the node, and the state of the machine room comprises the temperature, humidity, power supply and network connection state of the machine room;
a search engine as claimed in any one of claims 1-12; and
an actuator arranged on each node in the machine room and adapted to execute the execution operation of the corresponding solution upon receiving an execution command from the search engine.
14. The monitoring system of claim 13, further comprising:
a client adapted to receive user input and send a search request to the search engine.
15. A monitoring method for an operation and maintenance machine room, comprising the steps of:
in response to a search request, extracting a node identification in the search request;
searching one or more pieces of node state information for a state feature vector corresponding to the node identification, wherein each piece of node state information comprises a node identification and a state feature vector representing the state of the node;
searching one or more operation and maintenance log records for operation and maintenance log records corresponding to at least one first state feature vector similar to the state feature vector, wherein each operation and maintenance log record comprises a node identification, a solution, a first state feature vector of the node before the solution is executed and a second state feature vector of the node after the solution is executed;
generating a solution list from the solutions in the operation and maintenance log records found;
selecting solutions from the solution list one at a time in order, executing the detection operation defined in the selected solution, and acquiring current state information of the node or the machine room as a detection result;
comparing the detection result with the second state feature vector associated with the solution and, if the comparison is consistent, considering the verification passed and sending the execution operation defined in the solution to the corresponding node so that the node executes the execution operation of the solution; and
if the comparison is inconsistent, considering the verification failed, selecting the next solution from the solution list in order, and repeating the detection, comparison and verification steps until the verification passes.
16. The method of claim 15, further comprising, before the step of searching for the state feature vector corresponding to the node identification from one or more pieces of node state information, the step of:
acquiring the state of each node in the machine room and generating a state feature vector of the corresponding node,
wherein the state of a node comprises CPU running state data and memory running state data of the node.
17. The method of claim 16, wherein the step of acquiring the state of each node in the machine room further comprises:
acquiring the state of the machine room and generating a state feature vector of the machine room,
wherein the state of the machine room comprises the temperature, humidity, power supply and network connection state of the machine room.
18. The method of claim 17, wherein
the node state information further comprises a machine room identifier of the machine room where the node is located and a state feature vector representing the state of the machine room; and
the operation and maintenance log record further comprises a machine room identifier of the machine room where the node is located, a solution, a first state feature vector of the machine room before the solution is executed and a second state feature vector of the machine room after the solution is executed.
19. The method of claim 18, further comprising, after the step of considering the verification passed if the comparison is consistent:
recording the solution, the node identification, and the first state feature vector and the second state feature vector of the node as an operation and maintenance log record.
20. The method of claim 19, wherein the step of finding at least one first state feature vector similar to the state feature vector from one or more operation and maintenance log records comprises:
calculating the similarity between the state feature vectors using a proximity algorithm.
21. The method of claim 20, wherein the step of generating a solution list according to the found solutions in the operation and maintenance log record comprises:
sorting the solutions in the operation and maintenance log records found in descending order of similarity.
22. The method of claim 21, wherein the step of extracting the node identification in the search request further comprises:
extracting a search term in the search request, and
the operation and maintenance log records further comprise search terms corresponding to the node identifications.
23. The method of claim 22, wherein the step of searching one or more operation and maintenance log records for an operation and maintenance log record corresponding to at least one first state feature vector similar to the state feature vector further comprises:
searching the operation and maintenance log records for at least one operation and maintenance log record similar to the extracted search term.
24. The method of any one of claims 15-23, further comprising the step of:
when no solution related to the node identification and/or the search term in the search request is found, searching for a solution corresponding to the machine room of the node.
CN201710067387.6A 2017-02-06 2017-02-06 Monitoring system and method for operation and maintenance machine room and search engine Active CN106776235B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710067387.6A CN106776235B (en) 2017-02-06 2017-02-06 Monitoring system and method for operation and maintenance machine room and search engine

Publications (2)

Publication Number Publication Date
CN106776235A CN106776235A (en) 2017-05-31
CN106776235B true CN106776235B (en) 2019-12-31

Family

ID=58955302

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710067387.6A Active CN106776235B (en) 2017-02-06 2017-02-06 Monitoring system and method for operation and maintenance machine room and search engine

Country Status (1)

Country Link
CN (1) CN106776235B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107181630A (en) * 2017-07-24 2017-09-19 郑州云海信息技术有限公司 The treating method and apparatus of service fault in cloud system
CN109213655B (en) * 2018-07-19 2022-02-18 东软集团股份有限公司 Solution determination method, device, storage medium and equipment for alarm

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101996126A (en) * 2010-11-01 2011-03-30 北京并行科技有限公司 Computer group performance monitoring interface and method
CN103685456A (en) * 2004-01-23 2014-03-26 蒂弗萨公司 Method of optimally utilizing peer to peer network
CN104112282A (en) * 2014-07-14 2014-10-22 华中科技大学 A method for tracking a plurality of moving objects in a monitor video based on on-line study
CN104317658A (en) * 2014-10-17 2015-01-28 华中科技大学 MapReduce based load self-adaptive task scheduling method
CN106354616A (en) * 2016-08-18 2017-01-25 北京并行科技股份有限公司 Method and device for monitoring application execution performance and high-performance computing system

Also Published As

Publication number Publication date
CN106776235A (en) 2017-05-31

Similar Documents

Publication Publication Date Title
Li et al. A data-driven strategy for detection and diagnosis of building chiller faults using linear discriminant analysis
CN111835585B (en) Inspection method and device for Internet of things equipment, computer equipment and storage medium
JP5509765B2 (en) Air conditioning control device, air conditioning control method, and air conditioning control program
CN101582195B (en) Method for generating alarm in dynamic environment monitoring system
CN111814999B (en) Fault work order generation method, device and equipment
KR102285987B1 (en) Method, system and computer program for detecting error of facilities in building
CN106776235B (en) Monitoring system and method for operation and maintenance machine room and search engine
CN111582235A (en) Alarm method, system and equipment for monitoring abnormal events in station in real time
CN106445755A (en) Method for automatically testing integral cabinet servers
CN107102929A (en) The detection method and device of failure
CN104566771A (en) Method and device for controlling refrigerant of dehumidifier
WO2014206099A1 (en) Method and device for collecting fault site information about multi-node server system
CN107423171A (en) The detection method and device of insertion slot type function expansion card based on PCIE standards
CN115660262A (en) Intelligent engineering quality inspection method, system and medium based on database application
CN115666097A (en) Computer room temperature control method and device, storage medium and electronic equipment
CN115118581A (en) Internet of things data full-link monitoring and intelligent security system based on 5G
CN109818825B (en) Rack server intelligent test method and system
CN114063582A (en) Method and device for monitoring a product test process
CN102323975B (en) Message correctness judging method of IEC61850-based model file
CN112330063B (en) Equipment fault prediction method, equipment fault prediction device and computer readable storage medium
CN107330884A (en) Ignition point detection method and device
CN109862530A (en) A kind of automatic repair method of sensor node and device
CN109032897A (en) Data dispatching method, host and solid state hard disk
CN115643163A (en) Fault equipment positioning method, device, equipment and storage medium
CN109828186A (en) A kind of long-range Distribution Network Failure active forewarning system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A monitoring system, method and search engine for operation and maintenance machine room

Effective date of registration: 20230209

Granted publication date: 20191231

Pledgee: Bank of Hangzhou Co., Ltd. Beijing Branch

Pledgor: BEIJING PARATERA TECHNOLOGY Co.,Ltd.

Registration number: Y2023110000057

PC01 Cancellation of the registration of the contract for pledge of patent right

Date of cancellation: 20230414

Granted publication date: 20191231

Pledgee: Bank of Hangzhou Co., Ltd. Beijing Branch

Pledgor: BEIJING PARATERA TECHNOLOGY Co.,Ltd.

Registration number: Y2023110000057

PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A monitoring system, method, and search engine for operation and maintenance computer rooms

Effective date of registration: 20230418

Granted publication date: 20191231

Pledgee: Bank of Hangzhou Co.,Ltd. Beijing Chaoyang Wenchuang Sub branch

Pledgor: BEIJING PARATERA TECHNOLOGY Co.,Ltd.

Registration number: Y2023110000164