CN115709200A - High-performance computing cluster system fault prediction device and use method thereof - Google Patents


Info

Publication number
CN115709200A
Authority
CN
China
Prior art keywords
dust
device body
performance computing
computing cluster
threshold
Legal status
Granted
Application number
CN202211493434.0A
Other languages
Chinese (zh)
Other versions
CN115709200B (en)
Inventor
龙玉江
甘润东
卫薇
李洵
王杰峰
王策
孙骏
钟掖
卢仁猛
袁捷
Current Assignee
Guizhou Power Grid Co Ltd
Original Assignee
Guizhou Power Grid Co Ltd
Application filed by Guizhou Power Grid Co Ltd
Priority to CN202211493434.0A
Publication of CN115709200A
Application granted
Publication of CN115709200B
Legal status: Active

Landscapes

  • Investigating Or Analysing Materials By Optical Means (AREA)

Abstract

The invention discloses a high-performance computing cluster system fault prediction device and a method of using it, and relates to the field of fault prediction devices. The device comprises a device body in which a controller and a fault prediction mechanism are arranged. The fault prediction mechanism comprises a dust fault prediction assembly and a system fault prediction assembly. A partition plate divides the interior of the device body into a monitoring chamber and a system chamber, and the system chamber is divided into four areas. The dust fault prediction assembly cleans away dust by means of a double-acting air cylinder, a first moving part, a second moving part, a movable block, a transfer ball, a pressure pump, an air inlet pipe, a sliding block, an elastic part, a movable ball, a connecting pipe and an output pipe. The device can accurately detect dust and clean it away automatically, which reduces the probability of system failures caused by dust.

Description

High-performance computing cluster system fault prediction device and use method thereof
Technical Field
The invention relates to a high-performance computing cluster system fault prediction device and a method of using it, and belongs to the technical field of system fault prediction devices.
Background
High-performance computing (HPC) refers to computing systems and environments that typically use many processors, or several computers organized in a cluster. HPC systems range from large clusters of standard computers to highly specialized hardware.
Chinese patent publication No. CN105159815B discloses a fault prediction method and apparatus for a high-performance computing cluster system. Its method comprises three steps: obtaining the chip working condition and power output of each service node in the cluster system, analyzing the working state of each service node from these two quantities, and executing a preset maintenance strategy when a service node's working state is abnormal. While that fault prediction device is working, however, the modules in the system are easily affected by dust, and dust can itself cause system faults, which lowers detection efficiency. Moreover, it cannot accurately predict where a faulty part is located, so its detection effect is poor.
Disclosure of Invention
The technical problem to be solved by the invention is as follows. The invention provides a high-performance computing cluster system fault prediction device and a method of using it. These address three problems: during operation, modules in the system are easily affected by dust; dust can itself cause system faults and lower detection efficiency; and the location of a faulty part cannot be accurately predicted, so the detection effect is poor.
The technical scheme adopted by the invention is as follows: a high-performance computing cluster system fault prediction device comprises a device body, wherein a controller and a fault prediction mechanism are arranged in the device body;
the fault prediction mechanism comprises a dust fault prediction assembly and a system fault prediction assembly, the interior of the device body is divided into a monitoring chamber and a system chamber by a partition plate, and the system chamber is divided into four areas;
the system fault prediction assembly comprises a moving part and a display part, the display part being arranged on the moving part and installed in the monitoring chamber. The moving part comprises a driving motor, a lead screw and a moving block: the driving motor is installed on one side inside the monitoring chamber, the lead screw is connected to the output end of the driving motor, and the moving block is movably arranged on the lead screw. The display part comprises a mounting plate, a dismounting part, a multi-color lamp and a lampshade: the mounting plate is connected to the moving block through a connecting rod, the multi-color lamp and the lampshade are mounted on the mounting plate through the dismounting part, and the multi-color lamp is located inside the lampshade. Three strips arranged on the outer side of the device body divide that outer side into four regions, whose positions correspond to the four regions of the system chamber;
wherein the moving part (driving motor, lead screw and moving block) and the display part (mounting plate, dismounting part, multi-color lamp and lampshade) work together to predict system faults through the different colors of the multi-color lamp. The driving motor moves the multi-color lamp along the outer side of the device body, whose four regions match the four regions of the system chamber. Different parts of the high-performance computing cluster system are connected to different regions of the system body inside the device body, so the faulty region can be predicted accurately, which improves prediction accuracy and the detection effect.
The dust fault prediction component comprises a driving part and a cleaning part, the driving part comprises a double-acting cylinder, a first moving part, a second moving part and a movable block, the double-acting cylinder is installed inside the monitoring chamber, the first moving part is connected to one end of the double-acting cylinder, the second moving part is connected to the other end of the double-acting cylinder, one end, far away from the double-acting cylinder, of the second moving part is connected to the movable block, a plurality of annular grooves are formed in the movable block, an installation groove is formed in one side of the device body, and the movable block extends out of the device body through the installation groove;
the cleaning part comprises a transfer ball, a pressure pump, an air inlet pipe, a sliding block, elastic pieces, a movable ball, a connecting pipe and an output pipe, wherein the transfer ball is a hollow ball body and is connected to the moving block through a connecting piece;
wherein the double-acting cylinder, the first moving part, the second moving part and the movable block cooperate with the transfer ball, the pressure pump, the air inlet pipe, the sliding block, the elastic pieces, the movable ball, the connecting pipe and the output pipe. When it is detected that dust is about to affect the normal operation of the system, the double-acting cylinder moves toward the first moving part: one end of the cylinder extends and the other end retracts, and the retracting second moving part pulls the movable block inward. Different dust contents make the double-acting cylinder move different distances, so the movable block moves different distances into the device body. A different number of annular grooves on the movable block therefore remains exposed outside the device body, which shows more accurately how strongly the dust affects the system body. In the initial state, the movable ball sits in the guide groove of the sliding block and blocks it. When the first moving part extends, the movable ball is displaced so that it no longer blocks the guide groove, air can flow, and the airflow is discharged through the output pipe for cleaning. This improves the self-cleaning capability of the device, eliminates latent fault hazards and prolongs the service life of the device;
the double-acting cylinder moves different distances for different dust contents. The higher the dust content, the farther the cylinder moves toward the first moving part and the farther the movable ball ends up from the outlet of the guide groove. The airflow is thus regulated according to the dust content, which makes cleaning more accurate, saves cleaning time and improves cleaning efficiency;
the system chamber is divided into four areas, so the dust content in each of the four areas can be graded while system faults are being predicted. As the driving motor operates, it also moves the dust fault prediction assembly, so the assembly works in a different state in each area.
Preferably, the bottom of the device body is provided with an ash discharge groove, and the ash discharge groove is positioned in the system chamber.
Preferably, a system body and a dust sensing module are arranged in the system chamber, the system body corresponds to four areas of the system chamber, and the dust sensing module is positioned at the top of the system chamber and corresponds to the system body;
the dust sensing module is provided with an analysis submodule which can analyze the dust content and carry out various operations.
Preferably, the system chamber is provided with a connecting assembly comprising connecting lines and interfaces. Four interfaces are arranged on the outer side surface of the device body, each interface is connected to one connecting line, and the end of each connecting line away from its interface is connected to the system body.
Preferably, an air outlet pipe is arranged at one end of the output pipe, which is far away from the sliding block, and an air outlet cover is arranged at one end of the air outlet pipe, which is far away from the output pipe;
wherein the air outlet cover enlarges the air outlet area and improves dust removal efficiency.
Preferably, the device body is provided with an air inlet cover, the air inlet cover is communicated with the air inlet pipe, and the top of the air inlet cover is provided with a plurality of dustproof holes;
wherein outside air is drawn into the air inlet pipe through the air inlet cover, and the dustproof holes effectively prevent dust from entering.
Preferably, the system body comprises an analysis module and an acquisition module, the acquisition module is used for acquiring the chip working condition and the power output power of each service node in the high-performance computing cluster system, the analysis module is used for analyzing the working state of each service node according to the chip working condition and the power output power and transmitting different information to the controller according to the working state, and the controller respectively controls the multi-color lamp, the driving motor, the double-acting cylinder and the dust sensing module.
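The data flow from the acquisition module through the analysis module to the controller can be sketched as follows. This is a minimal illustration only. The patent does not say how the chip working condition and power output are turned into a working state, so the sketch assumes that the analysis module already yields a failure rate for each node, as Example 2 does. The NodeReading structure, the region numbering 1 to 4 and the rule of reporting the worst node in each region are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeReading:
    """One service node's analysis result (hypothetical structure)."""
    node_id: str
    region: int          # system-chamber region (1-4) the node's component is wired to
    failure_rate: float  # analysis-module output, assumed to lie in [0, 1]


def worst_rate_per_region(readings, regions=(1, 2, 3, 4)):
    """Return {region: highest failure rate reported in that region}.

    Using the maximum is an assumption: a single unhealthy node is enough
    to flag its region. Regions with no readings report 0.0.
    """
    worst = {r: 0.0 for r in regions}
    for reading in readings:
        if reading.region not in worst:
            raise ValueError(f"unknown region {reading.region}")
        worst[reading.region] = max(worst[reading.region], reading.failure_rate)
    return worst


readings = [
    NodeReading("node-a", 1, 0.02),
    NodeReading("node-b", 1, 0.20),
    NodeReading("node-c", 3, 0.60),
]
print(worst_rate_per_region(readings))  # {1: 0.2, 2: 0.0, 3: 0.6, 4: 0.0}
```

Reporting one value per region matches the device's layout: the controller needs a single color for each of the four regions as the lamp passes it.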
A use method of a fault prediction device of a high-performance computing cluster system comprises the following steps:
S1: Preparation:
connect the interfaces on the device body to the high-performance computing cluster system, so that the system body in the system chamber is connected to the cluster system for information transmission;
S2: System fault prediction:
three thresholds are set in the analysis module for the working state of each service node: a first threshold, a second threshold and a third threshold. They indicate a low-probability, a medium-probability and a high-probability fault, respectively. The system body corresponds to the four regions of the system chamber and is connected to each component of the high-performance computing cluster system, so it indicates the region in which a fault occurs. On a signal from the controller, the driving motor moves the multi-color lamp past each of the four regions on the outside of the device body in turn. As the lamp passes each region, the controller makes it display a color according to the information transmitted by the system body. The colors are:
green: the working state of each service node in that region is below the first threshold;
orange: it lies between the first and second thresholds;
yellow: it lies between the second and third thresholds;
red: it exceeds the third threshold;
S3: Dust fault prediction:
the dust sensing module detects the dust content in the system chamber in real time. Several dust-content thresholds are set in the dust sensing module, and a different signal is sent to the controller as each threshold is reached. The controller then moves the double-acting cylinder toward the first moving part by a distance that depends on the threshold reached. The number of annular grooves on the movable block still exposed outside the device body indicates the dust content grade;
S4: Self-cleaning:
in the initial state, the movable ball blocks the guide groove in the sliding block. When the controller moves the double-acting cylinder toward the first moving part, the movable ball no longer blocks the guide groove. Air is then led out through the output pipe and the air outlet pipe to clean the system body, and is discharged through the ash discharge groove;
S5: Ending:
disconnect the high-performance computing cluster system from the interfaces on the device body, which completes the work of the prediction device.
Beneficial effects of the invention: compared with the prior art, the invention has the following effects:
1) The double-acting air cylinder, the first moving part, the second moving part and the movable block cooperate with the transfer ball, the pressure pump, the air inlet pipe, the sliding block, the elastic pieces, the movable ball, the connecting pipe and the output pipe. When it is detected that dust is affecting the normal operation of the system, the double-acting air cylinder moves toward the first moving part: one end extends and the other end retracts, and the retracting second moving part pulls the movable block inward. Different dust contents make the cylinder move different distances, so the movable block moves different distances into the device body. A different number of annular grooves on the movable block therefore remains exposed outside the device body, which accurately indicates how strongly the dust affects the system body. In the initial state, the movable ball blocks the guide groove of the sliding block. When the first moving part extends, the movable ball is displaced so that it no longer blocks the guide groove, air can flow, and the airflow is discharged through the output pipe for cleaning. This improves the self-cleaning capability of the device, eliminates latent fault hazards and prolongs the service life of the device;
2) The moving part (driving motor, lead screw and moving block) and the display part (mounting plate, dismounting part, multi-color lamp and lampshade) predict system faults through the different colors of the multi-color lamp. The driving motor moves the lamp along the outside of the device body, whose four regions match the four regions of the system chamber. Different parts of the high-performance computing cluster system are connected to different regions of the system body inside the device body, so the faulty region can be predicted more accurately, which improves prediction accuracy and the detection effect;
3) The double-acting cylinder moves different distances for different dust contents. The higher the dust content, the farther the cylinder moves toward the first moving part and the farther the movable ball ends up from the outlet of the guide groove. The airflow is thus regulated according to the dust content, which makes cleaning more accurate, saves cleaning time and improves cleaning efficiency;
4) The system chamber is divided into four areas, so the dust content in each of the four areas can be graded while system faults are being predicted. As the driving motor operates, it also moves the dust fault prediction component, so the component works in a different state in each area.
Drawings
FIG. 1 is a schematic perspective view of the present invention;
FIG. 2 is a schematic front view of the present invention;
FIG. 3 is a cross-sectional view taken at A-A of FIG. 2;
FIG. 4 is a cross-sectional view taken at B-B of FIG. 2;
FIG. 5 is an enlarged view of a portion C of FIG. 1;
FIG. 6 is an enlarged view of a portion D of FIG. 2;
FIG. 7 is an enlarged view of a portion E of FIG. 3;
FIG. 8 is a system block diagram of the system body.
Reference numerals: 110. device body; 120. controller; 130. monitoring chamber; 140. system chamber; 150. partition plate; 210. driving motor; 220. lead screw; 230. moving block; 240. mounting plate; 250. dismounting part; 260. multi-color lamp; 270. lampshade; 280. connecting rod; 310. double-acting cylinder; 320. first moving part; 330. second moving part; 340. movable block; 350. ring groove; 360. guide rail; 370. guide groove; 410. transfer ball; 420. pressure pump; 430. sliding block; 440. elastic piece; 450. movable ball; 460. connecting pipe; 470. output pipe; 480. connecting piece; 490. air inlet pipe; 510. ash discharge groove; 520. system body; 530. connecting line; 540. interface; 610. air outlet pipe; 620. air outlet cover; 630. air inlet cover; 640. dustproof hole; 650. analysis module; 660. acquisition module; 710. dust sensing module; 720. strip.
Detailed Description
The invention is further described with reference to the accompanying drawings and specific embodiments.
Example 1: as shown in fig. 1 to 8, a failure prediction apparatus for a high performance computing cluster system includes an apparatus body 110, a controller 120 and a failure prediction mechanism disposed inside the apparatus body 110;
the failure prediction mechanism comprises a dust failure prediction component and a system failure prediction component, the interior of the device body 110 is divided into a monitoring chamber 130 and a system chamber 140 by a partition plate 150, and the system chamber 140 is divided into four areas;
the system fault prediction component comprises a moving part and a display part, the display part being arranged on the moving part. The moving part comprises a driving motor 210, a lead screw 220 and a moving block 230: the driving motor 210 is mounted on one side inside the monitoring chamber 130, the lead screw 220 is connected to the output end of the driving motor 210, and the moving block 230 is movably arranged on the lead screw 220. The display part comprises a mounting plate 240, a dismounting part 250, a multi-color lamp 260 and a lampshade 270: the mounting plate 240 is connected to the moving block 230 through a connecting rod 280, the multi-color lamp 260 and the lampshade 270 are mounted on the mounting plate 240 through the dismounting part 250, and the multi-color lamp 260 is located inside the lampshade 270. Three strips 720 arranged on the outer side of the device body 110 divide that outer side into four regions, whose positions correspond to the four regions of the system chamber 140;
the system fault is predicted through different colors of the multi-color lamp 260, the multi-color lamp 260 is driven by the driving motor 210 to move, then four areas outside the device body 110 are matched with the four areas of the system chamber 140, different parts of the high-performance computing cluster system are connected to different areas of the system body 520 in the device body 110, the fault area can be predicted more accurately, the prediction accuracy is improved, and the detection effect is improved.
The dust fault prediction component comprises a driving part and a cleaning part, the driving part comprises a double-acting cylinder 310, a first moving part 320, a second moving part 330 and a movable block 340, the double-acting cylinder 310 is installed inside the monitoring chamber 130, the first moving part 320 is connected to one end of the double-acting cylinder 310, the second moving part 330 is connected to the other end of the double-acting cylinder 310, one end, far away from the double-acting cylinder 310, of the second moving part 330 is connected to the movable block 340, a plurality of annular grooves 350 are formed in the movable block 340, an installation groove is formed in one side of the device body 110, and the movable block 340 extends out of the device body 110 through the installation groove;
the cleaning part comprises a transfer ball 410, a pressure pump 420, an air inlet pipe 490, a sliding block 430, elastic pieces 440, a movable ball 450, a connecting pipe 460 and an output pipe 470, arranged as follows:
the transfer ball 410 is a hollow ball connected to the moving block 230 through a connecting piece 480;
a guide rail 360 is arranged on the partition plate 150, and the sliding block 430 is movably arranged in the guide rail 360;
a truncated-cone-shaped guide groove 370 is formed inside the sliding block 430;
one end of each elastic piece 440 is connected to the inner wall of the guide groove 370 and the other end to the movable ball 450;
one end of the connecting pipe 460 communicates with the transfer ball 410 and the other end with the guide groove 370;
the air inlet pipe 490 is arranged on, and communicates with, the transfer ball 410, and the pressure pump 420 is arranged on the air inlet pipe 490;
the output pipe 470 is arranged on the sliding block 430 and communicates with the guide groove 370;
the double-acting cylinder 310, the first moving part 320, the second moving part 330 and the movable block 340 cooperate with the transfer ball 410, the pressure pump 420, the air inlet pipe 490, the sliding block 430, the elastic pieces 440, the movable ball 450, the connecting pipe 460 and the output pipe 470. When it is detected that dust is about to affect the normal operation of the system, the double-acting cylinder 310 moves toward the first moving part 320: one end of the double-acting cylinder 310 extends and the other end retracts, and the retracting second moving part 330 pulls the movable block 340 inward. Different dust contents make the double-acting cylinder 310 move different distances, so the movable block 340 moves different distances into the device body 110. A different number of annular grooves 350 on the movable block 340 therefore remains exposed outside the device body 110, which accurately indicates how strongly the dust affects the system body 520. In the initial state, the movable ball 450 sits in the guide groove 370 of the sliding block 430 and blocks it. When the first moving part 320 extends, the movable ball 450 is displaced so that it no longer blocks the guide groove 370, air can flow, and the airflow is discharged through the output pipe 470 for cleaning. This improves the self-cleaning capability of the device, eliminates latent fault hazards and prolongs the service life of the device;
the double-acting cylinder 310 moves different distances for different dust contents. The higher the dust content, the farther the double-acting cylinder 310 moves toward the first moving part 320 and the farther the movable ball 450 ends up from the outlet of the guide groove 370. The airflow rate is thus regulated according to the dust content, which makes cleaning more accurate, saves cleaning time and improves cleaning efficiency;
the system chamber 140 is divided into four regions, so the dust content in each of the four regions of the system chamber 140 can be graded while system faults are being predicted. As the driving motor 210 operates, it also moves the dust fault prediction component, so the component works in a different state in each region.
In this embodiment, the stroke of the double-acting cylinder 310 is 5 cm in each direction; that is, it can move 5 cm toward the first moving part 320 and 5 cm toward the second moving part 330. The movable block 340 is provided with four ring grooves 350, spaced 1 cm apart along the side surface of the movable block 340 that protrudes from the apparatus body 110. At the initial position of the double-acting cylinder 310, all four ring grooves 350 are exposed. As the double-acting cylinder 310 moves toward the first moving part 320, the number of exposed ring grooves 350 decreases. The number of exposed ring grooves 350 indicates the failure probability due to dust:
four exposed ring grooves 350: no dust influence;
three exposed ring grooves 350: light dust influence;
two exposed ring grooves 350: medium-light dust influence;
one exposed ring groove 350: medium dust influence;
no exposed ring groove 350: heavy dust influence.
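Under the dimensions given in this embodiment, the ring-groove reading can be modelled as shown below. The sketch assumes that each full 1 cm of cylinder travel toward the first moving part 320 pulls one ring groove 350 inside the apparatus body 110. The patent does not state the exact correspondence between travel and hidden grooves, and the function names are illustrative only.

```python
import math

STROKE_CM = 5.0        # embodiment: 5 cm stroke toward the first moving part 320
GROOVE_PITCH_CM = 1.0  # embodiment: one ring groove 350 every 1 cm
GROOVE_COUNT = 4       # embodiment: four ring grooves 350

# Key = number of exposed ring grooves, as described in the embodiment.
DUST_LEVELS = {
    4: "no dust influence",
    3: "light dust influence",
    2: "medium-light dust influence",
    1: "medium dust influence",
    0: "heavy dust influence",
}


def exposed_grooves(travel_cm: float) -> int:
    """Ring grooves still outside the body after the cylinder travels travel_cm.

    Assumption: every full 1 cm of retraction hides exactly one groove.
    """
    if not 0.0 <= travel_cm <= STROKE_CM:
        raise ValueError("travel must lie within the cylinder stroke")
    hidden = math.floor(travel_cm / GROOVE_PITCH_CM)
    return max(0, GROOVE_COUNT - hidden)


def dust_level(travel_cm: float) -> str:
    """Dust-influence level read from the exposed grooves."""
    return DUST_LEVELS[exposed_grooves(travel_cm)]
```

With four grooves and a 5 cm stroke, the last centimetre of travel leaves the reading at "heavy dust influence". The model therefore saturates rather than reporting a sixth level.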
The bottom of the apparatus body 110 is provided with an ash discharge chute 510, and the ash discharge chute 510 is located in the system chamber 140.
A system body 520 and a dust sensing module 710 are arranged in the system chamber 140, the system body 520 corresponds to four regions of the system chamber 140, and the dust sensing module 710 is positioned at the top of the system chamber 140 and corresponds to the system body 520;
the dust sensing module 710 is provided with an analysis submodule for analyzing the dust content and performing various operations.
The system chamber 140 is provided with a connecting assembly, the connecting assembly includes connecting wires 530 and interfaces 540, the four interfaces 540 are disposed on the outer side of the apparatus body 110, one connecting wire 530 is connected to each interface 540, and one end of each connecting wire 530 far away from the interface 540 is connected to the system body 520.
An air outlet pipe 610 is arranged at one end of the output pipe 470 far away from the sliding block 430, and an air outlet cover 620 is arranged at one end of the air outlet pipe 610 far away from the output pipe 470;
wherein the air outlet cover 620 enlarges the air outlet area and improves dust removal efficiency.
The device body 110 is provided with an air inlet cover 630, the air inlet cover 630 is communicated with an air inlet pipe 490, and the top of the air inlet cover 630 is provided with a plurality of dustproof holes 640;
wherein outside air is drawn into the air inlet pipe 490 through the air inlet cover 630, and the dustproof holes 640 effectively prevent dust from entering.
The system body 520 comprises an analysis module 650 and an acquisition module 660, the acquisition module 660 is used for acquiring the chip working condition and the power output power of each service node in the high-performance computing cluster system, the analysis module 650 is used for analyzing the working state of each service node according to the chip working condition and the power output power and transmitting different information to the controller 120 according to the working state, and the controller 120 respectively controls the multi-color lamp 260, the driving motor 210, the double-acting cylinder 310 and the dust sensing module 710.
Example 2: a use method of a fault prediction device of a high-performance computing cluster system comprises the following steps:
s1: preparing:
connecting the interface 540 on the device body 110 with the high-performance computing cluster system, so that the system body in the system room 140 is connected with the high-performance computing cluster system for information transmission;
s2: and (3) system fault prediction:
in the analysis module 650, three thresholds are set according to the working state of each service node, which are a first threshold, a second threshold and a third threshold, respectively, where the first threshold represents that a low-probability fault occurs, the second threshold represents that a medium-probability fault occurs, the third threshold represents that a high-probability fault occurs, and the system body 520 corresponds to four regions of the system room 140, the system bodies 520 are respectively connected to each component of the high-performance computing cluster system, and represent different regions where a fault occurs, the driving motor 210 receives a signal from the controller 120 to drive the multi-color lamps 260 to move through four regions outside the device body 110, respectively, when one region passes through, the controller 120 controls the multi-color lamps 260 to display different colors according to different information transmitted by the system bodies 520, and the color lamps are respectively red, orange, yellow and green, the working state of each service node representing the region is lower than the first threshold, the working state of each service node representing the region is between the first threshold and the second threshold, the working state of each service node representing the yellow is between the second threshold and the third threshold, and the working state of each service node representing the third threshold is greater than the third threshold;
In this embodiment, the working state of each service node is converted into a failure rate, and the first, second and third thresholds are set to failure rates of 5%, 15% and 50%, respectively, as shown below:
Level          Failure rate   Lamp color   Processing priority
Fourth level   ≤5%            Green        No processing needed
Third level    5%-15%         Orange       Processing needed, but not prioritized
Second level   15%-50%        Yellow       Prioritized processing
First level    >50%           Red          Process as soon as possible
As shown in the table above, when the failure rate is at most 5%, the node is at the fourth level and the multi-color lamp is green; the probability of a failure is low, so no processing is needed. When the failure rate is between 5% and 15%, the node is at the third level and the lamp is orange; the probability of a failure is small, so the node needs processing, but not with priority. When the failure rate is between 15% and 50%, the node is at the second level and the lamp is yellow; the probability of a failure is high, so the node should be processed with priority. When the failure rate is greater than 50%, the node is at the first level and the lamp is red; the probability of a failure is very high, so the node must be processed as soon as possible.
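The grading table can be expressed as a simple lookup. In this minimal sketch, `LEVELS` and `classify` are hypothetical names and failure rates are given in percent. Because the table leaves the boundary values ambiguous (5%, 15% and 50% each appear in two rows), each upper bound is assumed to be inclusive, matching the "≤5%" entry for the fourth level.

```python
# One row per level: (inclusive upper bound in %, level, lamp color, priority).
LEVELS = (
    (5.0, "fourth", "green", "no processing needed"),
    (15.0, "third", "orange", "processing needed, not prioritized"),
    (50.0, "second", "yellow", "prioritized processing"),
    (100.0, "first", "red", "process as soon as possible"),
)


def classify(failure_rate_pct: float) -> tuple:
    """Map a service node's failure rate (percent) to (level, color, priority)."""
    if not 0.0 <= failure_rate_pct <= 100.0:
        raise ValueError("failure rate must be between 0 and 100 percent")
    for upper, level, color, priority in LEVELS:
        # Assumed boundary rule: each upper bound belongs to its own row.
        if failure_rate_pct <= upper:
            return level, color, priority
    # Not reached: the last bound equals the validated maximum of 100%.
    raise AssertionError("unreachable")
```

For instance, `classify(30.0)` returns `("second", "yellow", "prioritized processing")`.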
S3: Dust fault prediction:
The dust sensing module 710 detects the dust content in the system chamber 140 in real time. Several dust-content thresholds are set in the dust sensing module 710, and a different signal is sent to the controller 120 as each threshold is reached. The controller 120 then drives the double-acting cylinder 310 toward the first moving part 320, each threshold corresponding to a different travel distance, so that the number of annular grooves 350 on the movable block 340 exposed outside the device body 110 indicates the dust content grade;
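The dust grading in step S3 can be sketched as a count of thresholds reached. The patent gives no threshold values, units or groove spacing, so `DUST_THRESHOLDS` (in mg/m³) and `GROOVE_PITCH_MM` below are placeholder numbers, and the names `dust_grade` and `cylinder_stroke_mm` are hypothetical. The sketch also assumes that each additional threshold reached advances the movable block 340 by one groove pitch, so that the number of exposed annular grooves 350 equals the grade.

```python
from bisect import bisect_right

# Placeholder values: the patent only says "a plurality of threshold values".
DUST_THRESHOLDS = (0.1, 0.3, 0.6)  # hypothetical, mg/m^3, strictly increasing
GROOVE_PITCH_MM = 8.0              # hypothetical spacing of annular grooves 350


def dust_grade(concentration, thresholds=DUST_THRESHOLDS):
    """Number of thresholds reached (concentration >= threshold).

    Under the one-groove-per-grade assumption this is also the number of
    annular grooves exposed outside the device body.
    """
    # bisect_right counts thresholds <= concentration, i.e. those reached.
    return bisect_right(thresholds, concentration)


def cylinder_stroke_mm(grade, pitch=GROOVE_PITCH_MM):
    """Travel of the double-acting cylinder toward the first moving part
    needed to expose `grade` grooves (assumed linear in the grade)."""
    return grade * pitch
```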
S4: Self-cleaning:
In the initial state, the movable ball 450 blocks the flow guide groove 370 inside the sliding block 430. When the controller 120 drives the double-acting cylinder 310 toward the first moving part 320, the movable ball 450 no longer blocks the flow guide groove 370, and air flows out through the output pipe 470 and the air outlet pipe 610, cleaning the system body 520; the air, carrying the dislodged dust, is then discharged through the ash discharge groove 510;
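The valve behavior in step S4 can be modeled as a small state object. This is a hypothetical sketch: the class `CleaningPart`, its methods, and the rule that any nonzero dust grade extends the cylinder while grade 0 returns it to the initial state are assumptions. The airflow route is one reading of claims 1, 6 and 7 (air inlet cover 630, air inlet pipe 490 with pressure pump 420, transfer ball 410, connecting pipe 460, flow guide groove 370, output pipe 470, air outlet pipe 610, air outlet hood 620), ending at the system body 520 and the ash discharge groove 510.

```python
from dataclasses import dataclass

# One reading of the airflow route when the flow guide groove is open.
AIR_PATH = (
    "air inlet cover 630",
    "air inlet pipe 490 (pressure pump 420)",
    "transfer ball 410",
    "connecting pipe 460",
    "flow guide groove 370",
    "output pipe 470",
    "air outlet pipe 610",
    "air outlet hood 620",
    "system body 520",
    "ash discharge groove 510",
)


@dataclass
class CleaningPart:
    # Initial state: cylinder not moved, movable ball 450 seated in groove 370.
    cylinder_extended: bool = False

    @property
    def groove_blocked(self) -> bool:
        """The movable ball blocks the groove unless the cylinder has moved."""
        return not self.cylinder_extended

    def on_dust_grade(self, grade: int) -> None:
        """Assumed rule: any nonzero dust grade drives the cylinder toward
        the first moving part; grade 0 returns it to the initial state."""
        self.cylinder_extended = grade > 0

    def airflow(self) -> tuple:
        """Components the air passes through; empty while the groove is blocked."""
        return () if self.groove_blocked else AIR_PATH
```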
S5: End:
The high-performance computing cluster system is disconnected from the interface 540 of the device body 110, and operation of the prediction device ends.
The above description is only an embodiment of the present invention, and the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention; therefore, the scope of protection of the present invention shall be determined by the scope of the claims.

Claims (9)

1. A high-performance computing cluster system failure prediction device, characterized in that it comprises a device body (110), and a controller (120) and a failure prediction mechanism arranged in the device body (110);
the failure prediction mechanism comprises a dust fault prediction component and a system fault prediction component; the interior of the device body (110) is divided into a monitoring chamber (130) and a system chamber (140) by a partition plate (150), and the system chamber (140) is divided into four areas;
the system fault prediction component comprises a moving part and a display part, the display part being mounted on the moving part and arranged in the monitoring chamber (130);
the dust fault prediction component comprises a driving part and a cleaning part; the driving part comprises a double-acting cylinder (310), a first moving part (320), a second moving part (330) and a movable block (340); the double-acting cylinder (310) is installed inside the monitoring chamber (130), the first moving part (320) is connected to one end of the double-acting cylinder (310), the second moving part (330) is connected to the other end of the double-acting cylinder (310), and the end of the second moving part (330) away from the double-acting cylinder (310) is connected to the movable block (340); the movable block (340) is provided with a plurality of annular grooves (350), a mounting groove is formed in one side of the device body (110), and the movable block (340) extends from inside the device body (110) to the outside through the mounting groove;
the cleaning part comprises a transfer ball (410), a pressure pump (420), an air inlet pipe (490), a sliding block (430), an elastic part (440), a movable ball (450), a connecting pipe (460) and an output pipe (470); the transfer ball (410) is a hollow sphere connected to the moving block (230) of the moving part through a connecting part (480); a guide rail (360) is arranged on the partition plate (150), and the sliding block (430) is movably arranged in the guide rail (360); a flow guide groove (370) having a truncated-cone shape is formed inside the sliding block (430); one end of the elastic part (440) is connected to the inner wall of the flow guide groove (370) and the other end to the movable ball (450); one end of the connecting pipe (460) communicates with the transfer ball (410) and the other end with the flow guide groove (370); the air inlet pipe (490) is arranged on the transfer ball (410) and communicates with the flow guide groove (370); the pressure pump (420) is arranged on the air inlet pipe (490); and the output pipe (470) is arranged on the sliding block (430) and communicates with the flow guide groove (370).
2. The high-performance computing cluster system failure prediction device according to claim 1, characterized in that: the moving part comprises a driving motor (210), a lead screw (220) and a moving block (230); the driving motor (210) is installed on one side inside the monitoring chamber (130), the lead screw (220) is connected to the motor shaft of the driving motor (210), and the moving block (230) is threaded onto the lead screw (220) through a nut; the display part comprises a mounting plate (240), a dismounting piece (250), a multi-color lamp (260) and a lampshade (270); the mounting plate (240) is connected to the moving block (230) through a connecting rod (280), the multi-color lamp (260) and the lampshade (270) are mounted on the mounting plate (240) through the dismounting piece (250), and the multi-color lamp (260) is located inside the lampshade (270); three strips (720) are arranged on the outer side of the device body (110) and divide it into four areas corresponding in position to the four areas of the system chamber (140).
3. The high-performance computing cluster system failure prediction device according to claim 1, characterized in that: an ash discharge groove (510) is formed in the bottom of the device body (110) and is located in the system chamber (140).
4. The high-performance computing cluster system failure prediction device according to claim 2, characterized in that: a system body (520) and a dust sensing module (710) are arranged in the system chamber (140); the system body (520) corresponds to the four areas of the system chamber (140), and the dust sensing module (710) is located at the top of the system chamber (140) and corresponds to the system body (520).
5. The high-performance computing cluster system failure prediction device according to claim 3, characterized in that: the system chamber (140) is provided with a connecting assembly comprising connecting lines (530) and interfaces (540); four interfaces (540) are arranged on the outer side face of the device body (110), each interface (540) is connected to one connecting line (530), and the end of each connecting line (530) away from its interface (540) is connected to the system body (520).
6. The high-performance computing cluster system failure prediction device according to claim 4, characterized in that: an air outlet pipe (610) is provided at the end of the output pipe (470) away from the sliding block (430), and an air outlet hood (620) is provided at the end of the air outlet pipe (610) away from the output pipe (470).
7. The high-performance computing cluster system failure prediction device according to claim 5, characterized in that: the device body (110) is provided with an air inlet cover (630) communicating with the air inlet pipe (490), and a plurality of dustproof holes (640) are formed in the top of the air inlet cover (630).
8. The high-performance computing cluster system failure prediction device according to claim 6, characterized in that: the system body (520) comprises an analysis module (650) and an acquisition module (660); the acquisition module (660) acquires the chip working condition and the power supply output power of each service node in the high-performance computing cluster system; the analysis module (650) analyzes the working state of each service node from the chip working condition and the output power and sends different information to the controller (120) according to the working state; and the controller (120) controls the multi-color lamp (260), the driving motor (210), the double-acting cylinder (310) and the dust sensing module (710).
9. A method of using the high-performance computing cluster system failure prediction device according to any one of claims 1 to 8, characterized in that the method comprises the following steps:
S1: Preparation:
connecting the interface (540) on the device body (110) to the high-performance computing cluster system, so that the system body (520) in the system chamber (140) is connected to the high-performance computing cluster system for information transmission;
S2: System fault prediction:
setting three thresholds in the analysis module (650) for the working state of each service node, namely a first threshold, a second threshold and a third threshold, indicating a low, medium and high probability of a fault, respectively; the system body (520) corresponds to the four regions of the system chamber (140), and the system bodies (520) are connected to the respective components of the high-performance computing cluster system, so each region represents a different place where a fault may occur; the driving motor (210) receives a signal from the controller (120) and moves the multi-color lamp (260) past the four regions outside the device body (110) in turn; as the lamp passes each region, the controller (120) sets its color according to the information transmitted by the corresponding system body (520): green when the working state of the service nodes in that region is below the first threshold, orange when it is between the first and second thresholds, yellow when it is between the second and third thresholds, and red when it is above the third threshold;
S3: Dust fault prediction:
the dust sensing module (710) detecting the dust content in the system chamber (140) in real time, with several dust-content thresholds set in the dust sensing module (710); as each threshold is reached, a different signal is sent to the controller (120), which drives the double-acting cylinder (310) toward the first moving part (320), each threshold corresponding to a different travel distance, so that the number of annular grooves (350) on the movable block (340) exposed outside the device body (110) indicates the dust content grade;
S4: Self-cleaning:
in the initial state, the movable ball (450) blocks the flow guide groove (370) in the sliding block (430); when the controller (120) drives the double-acting cylinder (310) toward the first moving part (320), the movable ball (450) no longer blocks the flow guide groove (370), and air flows out through the output pipe (470) and the air outlet pipe (610), cleaning the system body (520), and is discharged through the ash discharge groove (510);
S5: End:
disconnecting the high-performance computing cluster system from the interface (540) of the device body (110), whereupon operation of the prediction device ends.
CN202211493434.0A 2022-11-25 2022-11-25 High-performance computing cluster system fault prediction device and application method thereof Active CN115709200B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211493434.0A CN115709200B (en) 2022-11-25 2022-11-25 High-performance computing cluster system fault prediction device and application method thereof

Publications (2)

Publication Number Publication Date
CN115709200A true CN115709200A (en) 2023-02-24
CN115709200B CN115709200B (en) 2024-06-14

Family

ID=85234798

Country Status (1)

Country Link
CN (1) CN115709200B (en)

Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20150135830A (en) * 2014-05-26 2015-12-04 전남대학교산학협력단 Air pulsing controllers for dust collector of petrochemical plants
US20170097863A1 (en) * 2015-10-05 2017-04-06 Fujitsu Limited Detection method and information processing device
JP2018202380A (en) * 2017-12-14 2018-12-27 株式会社セキタ Dust removal device
CN109482573A (en) * 2017-12-29 2019-03-19 国网浙江武义县供电有限公司 A kind of intelligence closed computer host dust pelletizing system and method
CN110694386A (en) * 2019-10-14 2020-01-17 安徽建筑大学 Electric automation control's upset dust type electric appliance cabinet
CN111538396A (en) * 2020-05-07 2020-08-14 杭州浮瓦科技有限公司 Computer mainboard capable of regularly detecting dust condition
CN111530783A (en) * 2020-01-10 2020-08-14 爱景节能科技(上海)有限公司 Automatic purging device of air-cooled screw air compressor and control device thereof
CN111570402A (en) * 2020-06-22 2020-08-25 江苏吉丰自动化设备有限公司 Bidirectional negative pressure type dust remover for full-automatic horn production line
CN111966177A (en) * 2020-08-14 2020-11-20 广州驰创科技有限公司 Big data intelligent processing is with storage hard disk structure
CN212324114U (en) * 2020-04-01 2021-01-08 丽水蓝鸟网络科技有限公司 A nothing hinders detection device for net twine fault detection
CN112960169A (en) * 2021-03-31 2021-06-15 成渝钒钛科技有限公司 High-speed wire bundling machine fault alarm device and using method thereof
CN113641551A (en) * 2021-07-08 2021-11-12 娄底职业技术学院 Computer fault monitoring system based on internet
CN113941534A (en) * 2021-09-16 2022-01-18 泰州市光明电子材料有限公司 Electrochemical detection device with dust removal mechanism for plastic chip manufacturing
WO2022017808A1 (en) * 2020-07-23 2022-01-27 Zf Cv Systems Global Gmbh Cleaning device, sensor cleaning module, vehicle, and method for operating a cleaning device
CN216728582U (en) * 2021-11-03 2022-06-14 青岛双合电力工程有限公司 Online dust collector of high-voltage electrical equipment
CN115254702A (en) * 2022-07-22 2022-11-01 苏州浪潮智能科技有限公司 Automatic server dust removal system and method
KR102464389B1 (en) * 2021-10-19 2022-11-09 주식회사 원어스 Air shower apparatus for positioning beside house component

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant