CN116627770A - Network card temperature prediction method and device, computer equipment and storage medium
- Publication number
- CN116627770A (application CN202310878390.1A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3058—Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/30—Authentication, i.e. establishing the identity or authorisation of security principals
- G06F21/31—User authentication
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention relates to the technical field of servers and discloses a network card temperature prediction method and device, computer equipment and a storage medium. The method comprises: creating a target object for a target network card; starting a target timer for the target network card, wherein the timer is used for polling dynamic information of the target network card, and the dynamic information comprises temperature data of the target network card; determining whether the basic input/output system (BIOS) has completed its self-check; if so, acquiring the temperature data of the target network card, wherein the temperature data comprises a plurality of temperatures respectively corresponding to a plurality of preset time intervals, and the target network card is a server network card; loading a trained multi-layer perceptron model; inputting the temperature data into the trained multi-layer perceptron model to obtain a predicted temperature; and determining the alarm state of the target network card according to the predicted temperature. Because the network card temperature is predicted by the multi-layer perceptron model, the accuracy is high, and the alarm state of the target network card is obtained from the predicted temperature, which meets the alarm requirement.
Description
Technical Field
The invention relates to the technical field of servers, in particular to a network card temperature prediction method, a network card temperature prediction device, computer equipment and a storage medium.
Background
In the information age, as Internet of Things technology develops and matures, the demand for high-speed computation, efficient data processing and secure data storage keeps growing, and the requirements placed on server technology rise accordingly. High-performance servers play an indispensable role, and optimizing them is very important.
The baseboard management controller (BMC) is a core unit of the server: a motherboard processor, based on the Advanced RISC Machine (ARM) architecture, that manages the server. OpenBMC is an open-source software architecture for building a complete, dedicated Linux system image for the BMC. Compared with traditional BMC development, OpenBMC offers advantages such as modular programming, modular debugging and asynchronous scheme management. In the server field, the BMC, as a core component, plays a major role in monitoring the health of the server, including overall performance, power consumption, logs and monitoring. As technology advances, data centers place new requirements on computation density, resource scheduling, autonomous controllability and the like, so network card monitoring and management are particularly important in server applications.
Conventional OpenBMC network card monitoring typically comprises a network card driver, a custom protocol library program, a network card monitoring module program, a system monitoring tool and log records, and provides an upper-layer Redfish interface for the network. The conventional OpenBMC network card monitoring method has low accuracy and poor real-time performance, and under complex system and environmental changes it cannot predict the network card temperature from the real-time temperature, so the alarm requirement cannot be met.
Disclosure of Invention
In view of the above, the present invention provides a method, apparatus, computer device and storage medium for predicting the temperature of a network card, so as to solve the problem that the temperature of the network card cannot be predicted according to the real-time temperature.
In a first aspect, the present invention provides a method for predicting a network card temperature, including: creating a target object for the target network card, wherein the target object is used for determining the type of the target network card and displaying the static information of the target network card, so that the target network card is identified according to its type and static information; starting a target timer for the target network card, wherein the timer is used for polling dynamic information of the target network card, and the dynamic information comprises temperature data of the target network card; determining whether the basic input/output system (BIOS) has completed its self-check; if the BIOS self-check is complete, acquiring temperature data of the target network card, wherein the temperature data comprises a plurality of temperatures respectively corresponding to a plurality of preset time intervals, and the target network card is a server network card; loading a trained multi-layer perceptron model; inputting the temperature data into the trained multi-layer perceptron model to obtain a predicted temperature; and determining the alarm state of the target network card according to the predicted temperature.
The beneficial effects are that: a target object is first created for the target network card. The target object is used for determining the type of the target network card and displaying its static information, so that the target network card is identified by its type and static information and it is clear which network card the obtained predicted temperature belongs to. A target timer is started for the target network card. The timer polls the dynamic information of the target network card, which includes its temperature data, so that the temperature of the target network card is sampled once every preset time interval. Whether the BIOS has completed its self-check is then determined; completion of the self-check indicates that the system has finished reporting the dynamic information of the target network card, so the temperature data in the dynamic information can be acquired.
The trained multi-layer perceptron model is loaded, the temperature data of the target network card is acquired and input into the trained multi-layer perceptron model to obtain a predicted temperature, which realizes temperature prediction for the server network card, and the alarm state of the target network card is determined according to the predicted temperature.
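The polling and prediction steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `TemperaturePoller`, `predict_if_ready` and the `extrapolate` stand-in model are hypothetical names, the window length of 5 is arbitrary, and a real system would call `tick()` from a timer and pass a trained multi-layer perceptron instead of the stand-in.

```python
from collections import deque
from typing import Callable, Deque, List, Optional, Sequence


class TemperaturePoller:
    """Keeps the most recent temperature samples of one network card."""

    def __init__(self, read_temperature: Callable[[], float], window: int = 5) -> None:
        self._read = read_temperature  # hypothetical sensor read function
        self._samples: Deque[float] = deque(maxlen=window)

    def tick(self) -> None:
        """Intended to be called once per preset time interval by the timer."""
        self._samples.append(self._read())

    def ready(self) -> bool:
        """True once one full window of samples has been collected."""
        return len(self._samples) == self._samples.maxlen

    def samples(self) -> List[float]:
        return list(self._samples)


def predict_if_ready(
    post_complete: bool,
    poller: TemperaturePoller,
    model: Callable[[Sequence[float]], float],
) -> Optional[float]:
    """Predict only after the BIOS self-check is complete and a window is full."""
    if not post_complete or not poller.ready():
        return None
    return model(poller.samples())


def extrapolate(temps: Sequence[float]) -> float:
    """Stand-in for the trained model: linear extrapolation of the last step."""
    return temps[-1] + (temps[-1] - temps[-2])


# Simulated sensor readings in place of real hardware access.
readings = iter([40.0, 41.5, 43.0, 44.0, 46.0, 47.0])
poller = TemperaturePoller(lambda: next(readings), window=5)
for _ in range(6):
    poller.tick()

predicted = predict_if_ready(True, poller, extrapolate)
blocked = predict_if_ready(False, poller, extrapolate)
```

Gating on the self-check flag mirrors the order of steps in the method: no temperature window is handed to the model until the dynamic information is known to be available.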
In an alternative embodiment, before creating the target object for the target network card, the method further includes: creating an input/output port, a connection port and a service name of the target network card; judging whether the target server is started; and if the target server is started, acquiring asset information of the basic input/output system.
The beneficial effects are that: the input/output port, the connection port and the service name of the target network card are created in order to create a monitoring process capable of asynchronous communication, so that the information of the monitored target network card can be displayed on the service and the monitoring information can be visualized.
In an alternative embodiment, after determining whether the BIOS has completed its self-check, the method further comprises: creating a service name, wherein the service name is used for determining the service that monitors the temperature of the target network card.
The beneficial effects are that: one server corresponds to a plurality of services, so a service name is created for the service that monitors the temperature of the target network card, allowing that service to be found by its name.
In an alternative embodiment, after creating the service name, the method further comprises: and adding an interface on the service for monitoring the temperature of the target network card, wherein the interface is used for acquiring static information and dynamic information of the target network card.
The beneficial effects are that: an interface is added on the service for monitoring the temperature of the target network card, and the interface is connected to the static information and the dynamic information of the target network card, so that this information can be acquired through the interface whenever it is required.
In an alternative embodiment, the static information of the target network card includes version information of the network card, asset information of the network card, connection status information of the network card, and a network media access control address.
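As an illustration only, the static information listed above could be held in a small immutable record. The class name, field names and example values below are hypothetical and are not taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkCardStaticInfo:
    """Static information of one network card (field names are assumptions)."""
    version: str       # version information of the network card
    asset_info: str    # asset information of the network card
    link_up: bool      # connection status information
    mac_address: str   # network media access control (MAC) address

    def identity(self) -> str:
        """Human-readable identifier used to tell network cards apart."""
        return f"{self.asset_info} ({self.mac_address})"


# Example values are placeholders, not real device data.
card = NetworkCardStaticInfo(
    version="1.0.0",
    asset_info="example-nic",
    link_up=True,
    mac_address="00:11:22:33:44:55",
)
```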
In an alternative embodiment, the predicted temperature is compared with a first preset value and a second preset value, and an alarm state of the predicted temperature is determined according to the comparison result.
The beneficial effects are that: after the predicted temperature is obtained, it is compared with a first preset value and a second preset value, and the alarm state of the predicted temperature is obtained from the comparison result, so that an alarm is raised on the network card temperature and countermeasures can be taken.
In an alternative embodiment, if the predicted temperature is greater than or equal to a first preset value, determining that the alarm state of the predicted temperature is a first alarm state; if the predicted temperature is greater than or equal to the second preset value and less than the first preset value, judging that the alarm state of the predicted temperature is the second alarm state; if the predicted temperature is smaller than the second preset value, judging that the alarm state of the predicted temperature is a third alarm state; the first preset value is greater than the second preset value.
In an alternative embodiment, the first alarm state has an alarm level greater than the alarm level of the second alarm state, and the second alarm state has an alarm level greater than the alarm level of the third alarm state.
The beneficial effects are that: if the predicted temperature of the target network card is greater than or equal to the first preset value, the temperature of the target network card is too high, so the alarm state of the predicted temperature is determined to be the first alarm state, warning that the temperature is too high and cooling is needed. If the predicted temperature of the target network card is greater than or equal to the second preset value and less than the first preset value, the temperature of the target network card is relatively high, so the alarm state of the predicted temperature is determined to be the second alarm state, warning that the temperature is relatively high and cooling is needed. If the predicted temperature of the target network card is less than the second preset value, the temperature of the target network card is not high, so the alarm state of the predicted temperature is determined to be the third alarm state, and no alarm is needed.
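The three-way comparison described above can be sketched as follows. The enum and function names are hypothetical, and the threshold values of 85.0 and 70.0 degrees Celsius are placeholders, since the source gives no concrete preset values.

```python
from enum import IntEnum


class AlarmState(IntEnum):
    """Alarm states ordered by alarm level (higher value = higher level)."""
    THIRD = 0   # temperature not high, no alarm needed
    SECOND = 1  # temperature relatively high
    FIRST = 2   # temperature too high


def determine_alarm_state(
    predicted: float, first_preset: float, second_preset: float
) -> AlarmState:
    """Map a predicted temperature to one of the three alarm states."""
    if first_preset <= second_preset:
        raise ValueError("first preset value must be greater than second preset value")
    if predicted >= first_preset:
        return AlarmState.FIRST
    if predicted >= second_preset:
        return AlarmState.SECOND
    return AlarmState.THIRD


# Hypothetical thresholds in degrees Celsius; not values from the source.
FIRST_PRESET = 85.0
SECOND_PRESET = 70.0
states = [
    determine_alarm_state(t, FIRST_PRESET, SECOND_PRESET)
    for t in (90.0, 85.0, 72.5, 69.9)
]
```

Using an ordered enum keeps the relation between alarm levels explicit, so `AlarmState.FIRST > AlarmState.SECOND > AlarmState.THIRD` holds by construction.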
In an alternative embodiment, the process of training a multi-layer perceptron model includes: acquiring a temperature training data set of the network card, wherein the temperature training data set comprises the temperature and time of the network card; labeling the temperature training data set; dividing the temperature training data set with the marked temperature into a training set and a testing set; constructing a multi-layer perceptron model, wherein the multi-layer perceptron model comprises an input layer, a plurality of hidden layers and an output layer; inputting the training set into a multi-layer perceptron model, and training the multi-layer perceptron model; inputting the test set into a multi-layer perceptron model, and evaluating the multi-layer perceptron model; and saving the trained multi-layer perceptron model.
In an alternative embodiment, acquiring the temperature training data set comprises: acquiring the network card temperature and the time of the network card; writing the network card temperature and the time of the network card into a data file; and taking the data file as the temperature training data set.
The beneficial effects are that: the data file provides data support for model training, so the network card temperature and the time of the network card are written into the data file, and the data file is taken as the temperature training data set.
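A minimal sketch of writing temperature and time samples into a data file and reading them back is given below. The CSV format, the column names `time_s` and `temperature_c`, and the file name are assumptions; the source does not specify a file format.

```python
import csv
import os
import tempfile
from typing import Iterable, List, Tuple

Sample = Tuple[float, float]  # (time in seconds, temperature in degrees Celsius)


def write_training_file(path: str, samples: Iterable[Sample]) -> None:
    """Write (time, temperature) samples to a CSV data file with a header row."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["time_s", "temperature_c"])
        writer.writerows(samples)


def read_training_file(path: str) -> List[Sample]:
    """Load the data file back as the temperature training data set."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        return [(float(row["time_s"]), float(row["temperature_c"])) for row in reader]


# Round trip through a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "nic_temperature.csv")
    write_training_file(data_path, [(0.0, 40.0), (5.0, 41.5), (10.0, 43.0)])
    loaded = read_training_file(data_path)
```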
In an alternative embodiment, a corresponding label is assigned to each network card temperature in the temperature training dataset.
In an alternative embodiment, if the network card temperature is greater than or equal to a first preset value, a first label is assigned to the network card temperature; if the network card temperature is less than the first preset value and greater than or equal to a second preset value, a second label is assigned to the network card temperature; if the network card temperature is less than the second preset value, a third label is assigned to the network card temperature; the first preset value is greater than the second preset value.
In an alternative embodiment, the alert level of the first tag is greater than the alert level of the second tag, and the alert level of the second tag is greater than the alert level of the third tag.
The beneficial effects are that: when the model is trained, a corresponding label is assigned to each network card temperature in the temperature training data set. If the network card temperature is greater than or equal to the first preset value, the temperature of the network card is too high, and the first label is assigned to mark that the temperature is too high and cooling is needed. If the network card temperature is less than the first preset value and greater than or equal to the second preset value, the temperature of the network card is relatively high, and the second label is assigned to mark that the temperature is relatively high and cooling is needed. If the network card temperature is less than the second preset value, the temperature of the network card is not high, and the third label is assigned to mark that the temperature is not high and no alarm is needed.
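The labeling step, followed by the division into a training set and a test set, can be sketched as follows. The label values 1, 2 and 3, the thresholds, the 80/20 split ratio and the chronological split are all assumptions made for illustration; the source does not state them.

```python
from typing import List, Sequence, Tuple

Record = Tuple[float, float]         # (time, temperature)
Labelled = Tuple[float, float, int]  # (time, temperature, label)

FIRST_LABEL, SECOND_LABEL, THIRD_LABEL = 1, 2, 3  # hypothetical label values


def label_temperature(temp: float, first_preset: float, second_preset: float) -> int:
    """Assign the first, second or third label by comparing with the presets."""
    if temp >= first_preset:
        return FIRST_LABEL
    if temp >= second_preset:
        return SECOND_LABEL
    return THIRD_LABEL


def label_and_split(
    records: Sequence[Record],
    first_preset: float,
    second_preset: float,
    train_fraction: float = 0.8,
) -> Tuple[List[Labelled], List[Labelled]]:
    """Label every record, then split chronologically into train and test sets."""
    labelled = [
        (t, temp, label_temperature(temp, first_preset, second_preset))
        for t, temp in records
    ]
    cut = int(len(labelled) * train_fraction)
    return labelled[:cut], labelled[cut:]


# Placeholder samples and thresholds (degrees Celsius), not data from the source.
records = [(0.0, 60.0), (1.0, 72.0), (2.0, 88.0), (3.0, 65.0), (4.0, 90.0)]
train_set, test_set = label_and_split(records, first_preset=85.0, second_preset=70.0)
```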
In an alternative embodiment, the temperature training data set is loaded; the temperature training data set is normalized; and the model algorithm parameters and the loss function are determined.
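The normalization step might look like the sketch below. Min-max scaling to the range [0, 1] is an assumption chosen for illustration; the source does not say which normalization is used.

```python
from typing import List, Sequence, Tuple


def min_max_normalize(values: Sequence[float]) -> Tuple[List[float], float, float]:
    """Scale values to [0, 1]; also return (min, max) so predictions can be mapped back."""
    if not values:
        raise ValueError("values must not be empty")
    lo, hi = min(values), max(values)
    if hi == lo:
        # All samples identical: avoid division by zero and map everything to 0.
        return [0.0 for _ in values], lo, hi
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def denormalize(value: float, lo: float, hi: float) -> float:
    """Map a normalized value back to the original temperature scale."""
    return value * (hi - lo) + lo


scaled, t_min, t_max = min_max_normalize([40.0, 50.0, 60.0])
restored = denormalize(scaled[1], t_min, t_max)
```

Keeping the minimum and maximum from the training data matters because the same scaling has to be applied to live temperature data before it is fed to the trained model.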
In an alternative embodiment, the test set is input into a multi-layer perceptron model to obtain a predicted result; comparing the predicted result with the actual result to obtain a comparison result; and evaluating the multi-layer perceptron model according to the comparison result.
The beneficial effects are that: evaluating the multi-layer perceptron model reveals its accuracy and stability.
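Comparing predicted results with actual results can be sketched with two simple measures. Mean absolute error and a within-tolerance rate are assumptions for illustration; the source does not name an evaluation metric, and the sample values are placeholders.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute difference between predicted and actual temperatures."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def within_tolerance_rate(
    predicted: Sequence[float], actual: Sequence[float], tolerance: float = 1.0
) -> float:
    """Fraction of predictions that fall within `tolerance` degrees of the actual value."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance)
    return hits / len(predicted)


# Placeholder test-set results, not measurements from the source.
predicted_temps = [41.0, 52.0, 59.5]
actual_temps = [40.0, 50.0, 60.0]
mae = mean_absolute_error(predicted_temps, actual_temps)
rate = within_tolerance_rate(predicted_temps, actual_temps, tolerance=1.0)
```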
In an alternative embodiment, the multi-layer perceptron model employs a two-layer fully connected network, in which the input is the network card temperature, the first layer applies a first weight and a first bias, the second layer applies a second weight and a second bias, and the output is the predicted temperature.
The beneficial effects are that: the magnitudes of the first weight and the second weight represent the magnitude of the likelihood, and the weights can be set manually or set automatically by a back-propagation algorithm. Calculating each weight parameter gives insight into the overall performance of the whole neural network, which makes the prediction result more accurate. The two-layer fully connected network is trained continuously: the weights are adjusted according to the difference between the actual output and the expected output, and it is determined whether the training output matches the expected output. If it does, the optimal weights have been obtained, and the most accurate predicted temperature can be obtained from the optimal weights.
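A two-layer fully connected network trained by back-propagation can be sketched in plain Python as follows. The form y = W2 · tanh(W1 · x + b1) + b2, the tanh hidden activation, the hidden size of 4, the squared-error loss, the learning rate and the toy data are all assumptions for illustration; the source specifies only a first and second weight, a first and second bias, the input temperature and the predicted temperature.

```python
import math
import random
from typing import List, Sequence, Tuple


class TwoLayerNet:
    """y = W2 . tanh(W1 . x + b1) + b2 (tanh activation is an assumption)."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [0.0] * n_hidden
        self.b2 = 0.0

    def forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        """Return the hidden activations and the predicted (normalized) temperature."""
        hidden = [
            math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(self.w1, self.b1)
        ]
        y = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return hidden, y

    def train_step(self, x: Sequence[float], target: float, lr: float = 0.05) -> float:
        """One gradient-descent step on the loss 0.5 * (y - target) ** 2."""
        hidden, y = self.forward(x)
        err = y - target  # difference between actual and expected output
        for j, h in enumerate(hidden):
            # Gradient at the hidden pre-activation, using w2[j] before it is updated.
            grad_pre = err * self.w2[j] * (1.0 - h * h)
            self.w2[j] -= lr * err * h
            self.b1[j] -= lr * grad_pre
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_pre * xi
        self.b2 -= lr * err
        return 0.5 * err * err


def mean_loss(net: TwoLayerNet, data: Sequence[Tuple[Sequence[float], float]]) -> float:
    return sum(0.5 * (net.forward(x)[1] - t) ** 2 for x, t in data) / len(data)


# Toy normalized windows mapped to the next value; not data from the source.
data = [([0.2, 0.3, 0.4], 0.5), ([0.4, 0.5, 0.6], 0.7), ([0.6, 0.7, 0.8], 0.9)]
net = TwoLayerNet(n_in=3, n_hidden=4)
initial_loss = mean_loss(net, data)
for _ in range(300):
    for x, t in data:
        net.train_step(x, t)
final_loss = mean_loss(net, data)
```

The weight update follows the description in the text: each step moves the weights and biases against the gradient of the error between the network output and the expected output.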
In an alternative embodiment, the method further includes a process of monitoring the target network card, which comprises: creating an input/output port, a connection port and a service name for monitoring the target network card; determining whether the target server is started; if the target server is started, acquiring asset information of the basic input/output system (BIOS); acquiring device information of the target network card; creating a target object for the target network card, wherein the target object is used for determining the type of the target network card and displaying the static information of the target network card, so that the target network card is identified according to its type and static information; starting a target timer for the target network card, wherein the timer is used for polling dynamic information of the target network card, and the dynamic information comprises temperature data of the target network card; determining whether the BIOS has completed its self-check; if the BIOS self-check is complete, creating a service name, wherein the service name is used for determining the service that monitors the temperature of the target network card; and adding an interface on the service for monitoring the temperature of the target network card, wherein the interface is used for acquiring static information and dynamic information of the target network card.
The beneficial effects are that: the invention creates the input/output port, the connection port and the service name for monitoring the target network card in order to create a monitoring process capable of asynchronous communication, so that the information of the monitored target network card can be displayed on the service and the monitoring information can be visualized. Whether the target server is started is determined, because the asset information of the BIOS can be acquired only after the server is started. The device information of the target network card is acquired so as to distinguish the target device from other devices. A target object is created for the target network card; the target object is used for determining the type of the target network card and displaying its static information, so that the target network card is identified by its type and static information and it is clear which network card the obtained predicted temperature belongs to. A target timer is started for the target network card; the timer polls the dynamic information of the target network card, which includes its temperature data, so that the temperature of the target network card is sampled once every preset time interval. Whether the BIOS has completed its self-check is determined; completion of the self-check indicates that the system has finished reporting the dynamic information of the target network card, so the temperature data in the dynamic information can be acquired.
If the BIOS self-check is complete, a service name is created, and the service that monitors the temperature of the target network card is determined according to the service name. An interface is added on the service for monitoring the temperature of the target network card, and the interface is connected to the static information and the dynamic information of the target network card, so that this information can be acquired through the interface whenever it is required.
In a second aspect, the present invention provides a network card temperature prediction apparatus, including: a target object creating module, used for creating a target object for the target network card, wherein the target object is used for determining the type of the target network card and displaying the static information of the target network card, so that the target network card is identified according to its type and static information; a target timer starting module, used for starting a target timer for the target network card, wherein the timer is used for polling the dynamic information of the target network card, and the dynamic information comprises the temperature data of the target network card; a judging module, used for determining whether the basic input/output system (BIOS) has completed its self-check; a data acquisition module, used for acquiring temperature data of the target network card if the BIOS self-check is complete, wherein the temperature data comprises a plurality of temperatures respectively corresponding to a plurality of preset time intervals, and the target network card is a server network card; a model loading module, used for loading the trained multi-layer perceptron model; a prediction module, used for inputting the temperature data into the trained multi-layer perceptron model to obtain a predicted temperature; and an alarm state confirmation module, used for determining the alarm state of the target network card according to the predicted temperature.
The beneficial effects are that: the target object creating module first creates a target object for the target network card. The target object is used for determining the type of the target network card and displaying its static information, so that the target network card is identified by its type and static information and it is clear which network card the obtained predicted temperature belongs to. The target timer starting module starts a target timer for the target network card; the timer polls the dynamic information of the target network card, which includes its temperature data, so that the temperature of the target network card is sampled once every preset time interval. The judging module determines whether the BIOS has completed its self-check; completion of the self-check indicates that the system has finished reporting the dynamic information of the target network card, so the temperature data in the dynamic information can be acquired. The data acquisition module acquires the temperature data of the target network card, the model loading module loads the trained multi-layer perceptron model, and the prediction module inputs the temperature data of the target network card into the trained multi-layer perceptron model to obtain a predicted temperature, which realizes temperature prediction for the server network card. The alarm state confirmation module obtains the alarm state of the target network card according to the predicted temperature.
In a third aspect, the present invention provides a computer device comprising: the memory and the processor are in communication connection, the memory stores computer instructions, and the processor executes the computer instructions, so that the network card temperature prediction method of the first aspect or any corresponding embodiment of the first aspect is executed.
In a fourth aspect, the present invention provides a computer readable storage medium having stored thereon computer instructions for causing a computer to execute the network card temperature prediction method according to the first aspect or any one of the embodiments corresponding thereto.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a network card temperature prediction method according to an embodiment of the invention;
FIG. 2 is a flow chart of another network card temperature prediction method according to an embodiment of the invention;
FIG. 3 is a flow chart of training a multi-layer perceptron model, in accordance with an embodiment of the present invention;
FIG. 4 is a flow chart of a fan adjustment method according to an embodiment of the present invention;
FIG. 5 is a schematic flow chart of a monitoring target network card according to an embodiment of the invention;
FIG. 6 is a block diagram of a network card temperature prediction apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a hardware structure of a computer device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The embodiment of the invention provides a network card temperature prediction method, which predicts temperature by inputting temperature data into a trained multi-layer perceptron model.
An excessively high temperature of the server network card can damage the server in several ways. First, performance degrades: an overheated server network card reduces the operating efficiency of the server, and high temperatures affect the performance of server components (e.g., central processing units, memory and storage devices), resulting in slower response times and reduced throughput. Second, the system can become unstable and fail: overheating can destabilize the server and cause an unexpected system crash or shutdown, which may result in service interruption, data loss, and possible damage to hardware components. Third, hardware can be damaged: prolonged high temperatures can permanently damage the hardware components of the server, because heat degrades sensitive electronic circuitry, shortens the useful life of the components, and increases the likelihood of hardware failure. Finally, there is even a fire hazard: in extreme cases, overheating of the server network card combined with other factors such as an electrical fault or an inadequate cooling system may cause a fire where the servers are housed, posing a significant risk to the safety of equipment and nearby personnel.
In the related art, the temperature of the server network card is monitored in real time, and a response measure is taken when the temperature is found to be too high. This method is simple, but the server network card is cooled only after the monitored real-time temperature is already too high, by which time the excessive temperature may already have damaged the server. The method of the related art is therefore not timely enough: it cannot raise an alarm when the temperature is about to become too high, and so cannot prevent damage to the server network card.
According to an embodiment of the present invention, there is provided an embodiment of a network card temperature prediction method, it should be noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowchart, in some cases, the steps illustrated or described may be performed in an order different from that illustrated herein.
In this embodiment, a network card temperature prediction method is provided, which may be used for the server network card described above, and fig. 1 is a flowchart of a network card temperature prediction method according to an embodiment of the present invention, as shown in fig. 1, where the flowchart includes the following steps:
Step S101, a target object is created for the target network card, and the target object is used for judging the type of the target network card and displaying the static information of the target network card so as to identify the target network card according to the type and the static information of the target network card.
The types of network cards include Open Compute Project (OCP) network cards and Peripheral Component Interconnect Express (PCIe) network cards.
In some alternative embodiments, the static information of the target network card includes: version information of the network card, network card asset information, network card connection status information, and network media access control address.
The type of the target network card is determined and its static information is acquired, and the target network card is identified by its type and static information, so that it is clear which network card an obtained predicted temperature belongs to.
Step S102, a target timer is started for the target network card, wherein the timer is used for polling the dynamic information of the target network card, and the dynamic information comprises the temperature data of the target network card.
The dynamic information includes: network card temperature, health status, optical module temperature, etc.
Step S103, determining whether the basic input/output system has completed its self-check.
Step S104, if the self-checking of the basic input/output system is completed, acquiring temperature data of the target network card, wherein the temperature data comprises a plurality of temperatures respectively corresponding to a plurality of preset time intervals; the target network card is a server network card.
The temperature data of the target network card is monitored continuously: it is collected once every preset time interval and stored, together with the time at which each temperature was obtained.
For example, the temperature data of the target network card is collected every 1 min.
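The periodic collection described above can be sketched as follows. This is a minimal illustration, not the patented implementation: `poll_temperatures`, `read_temperature`, and the injectable `sleep` and `clock` parameters are hypothetical names introduced here, and the fake sensor stands in for whatever interface actually reports the network card temperature.

```python
import time
from typing import Callable, List, Tuple


def poll_temperatures(
    read_temperature: Callable[[], float],
    interval_s: float,
    samples: int,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.time,
) -> List[Tuple[float, float]]:
    """Collect `samples` (timestamp, temperature) pairs, one per interval."""
    records = []
    for i in range(samples):
        # Store the acquisition time together with the temperature.
        records.append((clock(), read_temperature()))
        if i < samples - 1:
            sleep(interval_s)  # wait for the next preset interval
    return records


# Usage with a fake sensor and no real waiting (hypothetical readings).
fake_readings = iter([41.0, 42.5, 44.0])
records = poll_temperatures(
    lambda: next(fake_readings), interval_s=60, samples=3, sleep=lambda s: None
)
```

Passing `sleep` and `clock` as parameters is only a convenience for testing the loop without waiting a full minute between samples.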
Step S105, loading the trained multi-layer perceptron model.
Step S106, inputting the temperature data into the trained multi-layer perceptron model to obtain the predicted temperature.
The plurality of temperatures corresponding to the plurality of preset time intervals are input into the trained multi-layer perceptron model to obtain a predicted temperature for each of them.
Each predicted temperature is the temperature expected a preset time after the corresponding temperature sample; for example, the predicted temperature is the temperature 5 minutes after a given sample.
Step S107, according to the predicted temperature, the alarm state of the target network card is determined.
In some alternative embodiments, the predicted temperature is compared with a first preset value and a second preset value, and an alarm state of the predicted temperature is determined according to the result of the comparison.
In some alternative embodiments, if the predicted temperature is greater than or equal to the first preset value, determining that the alarm state of the predicted temperature is the first alarm state; if the predicted temperature is greater than or equal to the second preset value and less than the first preset value, judging that the alarm state of the predicted temperature is the second alarm state; if the predicted temperature is smaller than the second preset value, judging that the alarm state of the predicted temperature is a third alarm state; the first preset value is greater than the second preset value.
In some alternative embodiments, the first alert state has a greater alert level than the second alert state, and the second alert state has a greater alert level than the third alert state.
If the predicted temperature of the target network card is greater than or equal to the first preset value, the temperature of the target network card is too high, so the alarm state of the predicted temperature is determined to be the first alarm state, warning that the temperature of the target network card is too high and cooling is needed. If the predicted temperature is greater than or equal to the second preset value and less than the first preset value, the temperature of the target network card is relatively high, so the alarm state is determined to be the second alarm state, warning that the temperature of the target network card is relatively high and cooling is needed. If the predicted temperature is less than the second preset value, the temperature of the target network card is not high, so the alarm state is determined to be the third alarm state and no alarm is needed.
According to the network card temperature prediction method provided by this embodiment, a target object is first created for the target network card; the target object is used to determine the type of the target network card and to display its static information, so that the target network card can be identified by its type and static information and it is clear which network card an obtained predicted temperature belongs to. A target timer is started for the target network card; the timer polls the dynamic information of the target network card, which includes its temperature data, so that the temperature data is sampled once every preset time interval. Whether the basic input/output system has completed its self-check is then determined; completion of the self-check indicates that the system has finished reporting the dynamic information of the target network card, so the temperature data in the dynamic information can be acquired. The trained multi-layer perceptron model is loaded, the temperature data of the target network card is acquired and input into the trained multi-layer perceptron model to obtain a predicted temperature, thereby predicting the temperature of the server network card, and the alarm state of the target network card is determined according to the predicted temperature.
Predicting the temperature with a neural network reduces the cost of manual intervention and maintenance through automation and intelligence, and improves efficiency and benefit.
In this embodiment, a network card temperature prediction method is provided, which may be used for the server network card described above, and fig. 2 is a flowchart of another network card temperature prediction method according to an embodiment of the present invention, as shown in fig. 2, where the flowchart includes the following steps:
Step S201, a target object is created for the target network card, and the target object is used for judging the type of the target network card and displaying the static information of the target network card so as to identify the target network card according to the type and the static information of the target network card. For details, refer to step S101 in the embodiment shown in fig. 1; they are not repeated here.
In an alternative embodiment, before creating the target object for the target network card, the method further includes: creating an input/output port, a connection port and a service name of the target network card; judging whether the target server is started; and if the target server is started, acquiring asset information of the basic input/output system.
An input/output port (io), a connection port (conn) and a service name (service) are created for the target network card in order to create a monitoring process capable of asynchronous communication, so that the information of the monitored target network card can be displayed on the service and the monitoring information is visualized.
Step S202, a target timer is started for the target network card, wherein the timer is used for polling the dynamic information of the target network card, and the dynamic information comprises the temperature data of the target network card. For details, refer to step S102 in the embodiment shown in fig. 1; they are not repeated here.
Step S203, determining whether the basic input/output system has completed its self-check. For details, refer to step S103 in the embodiment shown in fig. 1; they are not repeated here.
In an alternative embodiment, after determining whether the basic input/output system has completed its self-check, the method further comprises: creating a service name, so that the service for monitoring the temperature of the target network card can be determined according to the service name.
In the embodiment of the invention, a service name is created for the service. One server corresponds to a plurality of services, and a service name is created for the service that monitors the temperature of the target network card so that this service can be found.
In an alternative embodiment, after creating the service name, the method includes: adding an interface on the service for monitoring the temperature of the target network card, wherein the interface is used for acquiring static information and dynamic information of the target network card.
In the embodiment of the invention, an interface is added on the service for monitoring the temperature of the target network card, and the interface is connected with the static information and the dynamic information of the target network card, so that the static information and the dynamic information of the target network card are acquired through the interface when the static information and the dynamic information of the target network card are required to be acquired.
Step S204, if the self-checking of the basic input/output system is completed, acquiring temperature data of the target network card, wherein the temperature data comprises a plurality of temperatures respectively corresponding to a plurality of preset time intervals; the target network card is a server network card. For details, refer to step S104 in the embodiment shown in fig. 1; they are not repeated here.
Step S205, loading the trained multi-layer perceptron model. For details, refer to step S105 in the embodiment shown in fig. 1; they are not repeated here.
Step S206, inputting the temperature data into the trained multi-layer perceptron model to obtain the predicted temperature. For details, refer to step S106 in the embodiment shown in fig. 1; they are not repeated here.
Step S207, according to the predicted temperature, the alarm state of the target network card is determined. For details, refer to step S107 in the embodiment shown in fig. 1; they are not repeated here.
Specifically, the step S207 includes:
Step S2071, comparing the predicted temperature with the first preset value and the second preset value, and determining the alarm state of the predicted temperature according to the comparison result.
Illustratively, the first preset value may be 100 ℃ and the second preset value may be 60 ℃.
In some alternative embodiments, if the predicted temperature is greater than or equal to the first preset value, determining that the alarm state of the predicted temperature is the first alarm state; if the predicted temperature is greater than or equal to the second preset value and less than the first preset value, judging that the alarm state of the predicted temperature is the second alarm state; if the predicted temperature is smaller than the second preset value, judging that the alarm state of the predicted temperature is a third alarm state; the first preset value is greater than the second preset value.
In some alternative embodiments, the first alert state has a greater alert level than the second alert state, and the second alert state has a greater alert level than the third alert state.
Illustratively, the first alarm state is a severe alarm, the second alarm state is a prompt alarm, and the third alarm state is a no alarm.
If the predicted temperature of the target network card is greater than or equal to the first preset value, the temperature of the target network card is too high, so the alarm state of the predicted temperature is determined to be the first alarm state, warning that the temperature of the target network card is too high and cooling is needed. If the predicted temperature is greater than or equal to the second preset value and less than the first preset value, the temperature of the target network card is relatively high, so the alarm state is determined to be the second alarm state, warning that the temperature of the target network card is relatively high and cooling is needed. If the predicted temperature is less than the second preset value, the temperature of the target network card is not high, so the alarm state is determined to be the third alarm state and no alarm is needed.
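The comparison in step S2071 can be sketched as a small function. This is an illustrative sketch only: the function name `alarm_state` and the state strings are assumptions introduced here, while the default thresholds reuse the example values of 100 °C and 60 °C given in this embodiment.

```python
FIRST_PRESET_C = 100.0   # example first preset value from this embodiment
SECOND_PRESET_C = 60.0   # example second preset value from this embodiment


def alarm_state(predicted_c: float,
                first: float = FIRST_PRESET_C,
                second: float = SECOND_PRESET_C) -> str:
    """Map a predicted temperature to one of three alarm states."""
    if first <= second:
        raise ValueError("the first preset value must exceed the second")
    if predicted_c >= first:
        return "severe"   # first alarm state
    if predicted_c >= second:
        return "prompt"   # second alarm state
    return "none"         # third alarm state: no alarm


states = [alarm_state(t) for t in (105.0, 100.0, 75.0, 60.0, 42.0)]
```

Checking the higher threshold first keeps the two boundary cases (exactly the first or second preset value) in the state that the "greater than or equal to" wording assigns them.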
In this embodiment, a network card temperature prediction method is provided, which may be used for the server network card described above, and fig. 3 is a flowchart of training a multi-layer perceptron model according to an embodiment of the present invention, as shown in fig. 3, where the flowchart includes the following steps:
Step S301, a temperature training data set of the network card is obtained, where the temperature training data set includes the network card temperature and time.
Specifically, the step S301 includes obtaining a network card temperature and time of the network card; writing the network card temperature and time of the network card into a data file; the data file is taken as a temperature training data set.
The data file is used to provide data support for model training, so that the network card temperature and time of the network card are written into the data file to take the data file as a temperature training data set.
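As a rough illustration of step S301, the sketch below writes time and temperature pairs to a CSV data file and reads them back as a training data set. The CSV layout, the column names `time` and `temperature_c`, the helper names, and the sample timestamps are all assumptions made for this example; the source does not specify the file format.

```python
import csv
import io


def write_temperature_file(records, fileobj):
    """Write (time, temperature) records to a CSV data file."""
    writer = csv.writer(fileobj)
    writer.writerow(["time", "temperature_c"])  # hypothetical column names
    for timestamp, temperature in records:
        writer.writerow([timestamp, temperature])


def read_temperature_file(fileobj):
    """Load the data file back as a list of (time, temperature) samples."""
    reader = csv.DictReader(fileobj)
    return [(row["time"], float(row["temperature_c"])) for row in reader]


# An in-memory buffer stands in for a file on disk; values are hypothetical.
buffer = io.StringIO(newline="")
write_temperature_file(
    [("2023-07-18T10:00:00", 41.0), ("2023-07-18T10:01:00", 42.5)], buffer
)
buffer.seek(0)
dataset = read_temperature_file(buffer)
```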
Step S302, labeling the temperature training data set.
In some alternative embodiments, the method for labeling the temperature training data set is to assign corresponding labels to the temperatures of the network cards in the temperature training data set.
In some optional embodiments, if the network card temperature is greater than or equal to a first preset value, a first label is assigned to the network card temperature; if the network card temperature is smaller than the first preset value and larger than or equal to the second preset value, a second label is distributed to the network card temperature; if the network card temperature is smaller than the second preset value, a third label is allocated to the network card temperature, and the first preset value is larger than the second preset value.
In some alternative embodiments, the alert level of the first tag is greater than the alert level of the second tag, and the alert level of the second tag is greater than the alert level of the third tag.
When the model is trained, a corresponding label is assigned to each network card temperature in the temperature training data set. If the network card temperature is greater than or equal to the first preset value, the temperature of the network card is too high, and the first label is assigned to mark that the temperature is too high and cooling is needed. If the network card temperature is less than the first preset value and greater than or equal to the second preset value, the temperature of the network card is relatively high, and the second label is assigned to mark that the temperature is relatively high and cooling is needed. If the network card temperature is less than the second preset value, the temperature of the network card is not high, and the third label is assigned to mark that the temperature is not high and no warning is needed.
The temperature data is preprocessed for training the multi-layer perceptron model prior to labeling the temperature training dataset.
Preprocessing includes operations such as data cleansing, outlier processing, and normalization or standardization to ensure the quality and trainability of the temperature data.
Data cleansing is the final pass that finds and corrects identifiable errors in the data file, including checking data consistency and handling invalid and missing values.
Outlier processing methods include deleting outliers, replacing outliers, binning (converting outliers into normal values), and model-based processing.
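Two of the preprocessing operations named above can be sketched as follows. This is a minimal sketch under stated assumptions: replacing outliers by clamping to a plausible range and using min-max normalization are choices made for this example, and the bounds of 0 °C and 120 °C are hypothetical, not values taken from the source.

```python
def replace_outliers(temps, low, high):
    """Replace readings outside [low, high] with the nearest bound."""
    return [min(max(t, low), high) for t in temps]


def min_max_normalize(temps):
    """Scale readings linearly into the range [0, 1]."""
    lowest, highest = min(temps), max(temps)
    if highest == lowest:
        return [0.0 for _ in temps]  # constant input: avoid dividing by zero
    return [(t - lowest) / (highest - lowest) for t in temps]


raw = [40.0, 50.0, 300.0, 60.0]           # 300.0 is an implausible reading
cleaned = replace_outliers(raw, low=0.0, high=120.0)
normalized = min_max_normalize(cleaned)   # [0.0, 0.125, 1.0, 0.25]
```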
Labeling the temperature training data set means assigning a corresponding label to each temperature sample: if the temperature is greater than or equal to the first preset value, the temperature is marked as the first alarm state; if the temperature is greater than or equal to the second preset value and less than the first preset value, it is marked as the second alarm state; if the temperature is less than the second preset value, it is marked as the third alarm state. The first preset value is greater than the second preset value.
Step S303, the labeled temperature training dataset is divided into a training set and a test set.
The training set is used for training the multi-layer perceptron model, and the testing set is used for evaluating the performance and generalization capability of the multi-layer perceptron model.
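A possible way to perform the division in step S303 is a shuffled split, sketched below. The 80/20 ratio, the fixed seed, and the name `split_dataset` are assumptions for illustration; the source does not state how the data is divided.

```python
import random


def split_dataset(samples, test_ratio=0.2, seed=0):
    """Shuffle labeled samples and split them into (training set, test set)."""
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


# Ten hypothetical labeled samples: (temperature, label).
samples = [(float(t), "label") for t in range(10)]
train_set, test_set = split_dataset(samples)
```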
Step S304, a multi-layer perceptron model is constructed, wherein the multi-layer perceptron model comprises an input layer, a plurality of hidden layers and an output layer.
The multi-layer perceptron model is an artificial neural network. A single perceptron cannot solve linearly inseparable problems, so an activation function is added to the perceptron model: the output of each perceptron is passed through the activation function, which turns the linear mapping into a nonlinear one. A commonly used activation function is ReLU, max(0, x): it does not suffer from vanishing gradients for positive inputs, discards unimportant features, is simple to compute, converges quickly, and is closer to the activation mechanism of biological neural networks. The main purpose of the activation function is to keep steering the input toward the desired target.
In some alternative embodiments, the complexity of the model may be controlled by adjusting the number of neurons and the number of layers of the hidden layer.
The multi-layer perceptron model is trained as a fully connected neural network in two phases, forward propagation and back-propagation. In forward propagation, the data passes from input to output and the loss function value is then calculated. Back-propagation is the optimization phase: the gradient descent method is used to reduce the loss function value produced by forward propagation, thereby optimizing and updating the parameters.
In some alternative embodiments, before the step of inputting the training set into the multi-layer perceptron model and training it, the method further includes: loading the temperature training data set; normalizing the temperature training data set; and determining the model algorithm parameters and the loss function.
The invention adopts a two-layer fully connected network. The activation function between the layers is ReLU and the activation function of the last layer is SoftMax. The input data is a 3-dimensional vector and the hidden layer has 4 nodes, which means that the 3-dimensional vector is mapped to a 4-dimensional vector by a linear mapping, and the 4-dimensional vector is finally mapped to a 3-dimensional vector for output.
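The forward pass of such a 3-4-3 network can be sketched in plain Python. This is an illustration of the layer shapes only: the weight and bias values are arbitrary placeholders rather than trained parameters, and the helper names are introduced here for the example.

```python
import math


def relu(values):
    """ReLU activation: max(0, x) applied element-wise."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Numerically stable softmax that returns a probability distribution."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(weights, bias, inputs):
    """Fully connected layer: one weight row and one bias per output node."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def forward(inputs, w1, b1, w2, b2):
    """3-dimensional input -> 4 hidden nodes (ReLU) -> 3 outputs (softmax)."""
    hidden = relu(dense(w1, b1, inputs))
    return softmax(dense(w2, b2, hidden))


# Placeholder parameters (hypothetical, untrained): w1 is 4x3, w2 is 3x4.
w1 = [[0.1, 0.2, 0.3], [-0.1, 0.0, 0.1], [0.5, -0.5, 0.2], [0.0, 0.3, -0.2]]
b1 = [0.0, 0.1, -0.1, 0.0]
w2 = [[0.2, -0.1, 0.4, 0.0], [0.0, 0.3, -0.2, 0.1], [-0.3, 0.1, 0.0, 0.2]]
b2 = [0.0, 0.0, 0.0]
probabilities = forward([0.4, 0.5, 0.6], w1, b1, w2, b2)
```

The three outputs sum to 1, so they can be read as the probabilities of the three alarm classes.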
In some alternative embodiments, loading the temperature training data set means loading the acquired temperature training data set of the network card.
The model algorithm parameters and the loss function are determined. Illustratively, the Adam optimizer uses the default parameters published by Keras, and the loss function is the cross-entropy loss function.
The output result is input into the softmax activation function to obtain the probability distribution of a sample, and the cross-entropy loss function measures the error between the distribution predicted by the classifier and the true distribution, which improves prediction accuracy. This scheme uses one-hot encoding for the true distribution and compares the predicted values against it, measuring the difference between the true value and the model's predicted value. Back-propagation is then performed with the gradient descent method: the gradient is propagated from back to front to compute the updated weights. Its main effect is to optimize the weights according to the loss value obtained in forward propagation, so that the loss becomes smaller and smaller and the network's predictions become more accurate.
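The cross-entropy comparison between a one-hot true distribution and a predicted distribution can be sketched as follows. The helper names and the example probabilities are assumptions for illustration; the small `eps` guard against log(0) is an implementation choice of this sketch.

```python
import math


def one_hot(index, num_classes):
    """Encode a class index as a one-hot vector."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def cross_entropy(true_dist, predicted, eps=1e-12):
    """Cross-entropy between a true distribution and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(true_dist, predicted))


target = one_hot(0, 3)                                       # true class is index 0
confident = cross_entropy(target, [0.90, 0.05, 0.05])
mistaken = cross_entropy(target, [0.10, 0.45, 0.45])
```

With a one-hot target the loss reduces to the negative log of the probability assigned to the true class, so the confident prediction yields the smaller loss.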
In some alternative embodiments, the multi-layer perceptron model employs a two-layer fully connected network comprising:
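$$y = \mathrm{softmax}\left(W_2 \cdot \mathrm{ReLU}\left(W_1 x + b_1\right) + b_2\right),$$
where $x$ is the input network card temperature, $W_1$ is the first weight, $W_2$ is the second weight, $b_1$ is the first bias, $b_2$ is the second bias, and $y$ is the predicted temperature.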
In the embodiment of the invention, the magnitudes of the first weight and the second weight in the formula represent the magnitude of the likelihood; the weights can be set manually or set automatically by the back-propagation algorithm. Calculating each weight parameter reveals the overall performance of the whole neural network, which makes the prediction result more accurate. The two-layer fully connected network is trained continuously: the weights are adjusted according to the difference between the actual output and the expected output, and it is judged whether the training output is the same as the expected output. If so, the optimal weights have been obtained, and the most accurate predicted temperature can be obtained from them.
Step S305, inputting the training set into the multi-layer perceptron model, and training the multi-layer perceptron model.
The temperature training data of the training set is input into the multi-layer perceptron model, and the model is trained to obtain the predicted temperature corresponding to each network card temperature in the training set.
Step S306, inputting the test set into the multi-layer perceptron model, and evaluating the multi-layer perceptron model.
Specifically, the step S306 includes:
Step S3061, inputting the test set into the multi-layer perceptron model to obtain a prediction result.
Step S3062, comparing the predicted result with the actual result to obtain a comparison result.
Step S3063, evaluating the multi-layer perceptron model according to the comparison result.
If the difference between the predicted result and the actual result is less than or equal to the preset value, the reliability of the multi-layer perceptron model is relatively high.
If the difference between the predicted result and the actual result is greater than the preset value, the reliability of the multi-layer perceptron model is relatively low.
Evaluating the multi-layer perceptron model reveals its accuracy and stability.
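Steps S3061 to S3063 can be sketched as a comparison of predicted and actual labels. Interpreting "the difference between the predicted result and the actual result" as the fraction of mismatched labels is an assumption of this sketch, as are the function name `evaluate` and the example labels and threshold.

```python
def evaluate(predicted_labels, actual_labels, max_error_rate):
    """Return (error_rate, reliable) for predictions on the test set."""
    if not actual_labels or len(predicted_labels) != len(actual_labels):
        raise ValueError("label lists must be non-empty and equally long")
    mismatches = sum(p != a for p, a in zip(predicted_labels, actual_labels))
    error_rate = mismatches / len(actual_labels)
    # The model is considered reliable when the error stays within the preset value.
    return error_rate, error_rate <= max_error_rate


predicted = ["none", "prompt", "severe", "none"]   # hypothetical model output
actual = ["none", "prompt", "prompt", "none"]      # hypothetical ground truth
error_rate, reliable = evaluate(predicted, actual, max_error_rate=0.25)
```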
Step S307, save the trained multi-layered perceptron model.
The saved formats include the HDF5 format and the SavedModel format. The model can be saved in two ways: model.save_weights() saves only the model parameters, and model.save() saves the entire model.
In some alternative embodiments, the training set loss, the test set loss, the training set accuracy and the test set accuracy are plotted as line graphs for viewing.
In this embodiment, a fan adjusting method is provided, which may be used in the server network card described above, and fig. 4 is a flowchart of a fan adjusting method according to an embodiment of the present invention, as shown in fig. 4, where the flowchart includes the following steps:
Step S401, determining an alarm state of the target network card according to the network card temperature prediction method in any one of the above embodiments.
Step S402, a control signal is generated according to the alarm state of the target network card.
In some optional embodiments, if the alarm state of the target network card is the first alarm state, generating a first control signal; if the alarm state of the target network card is the second alarm state, generating a second control signal; and if the alarm state of the target network card is the third alarm state, generating a third control signal.
Step S403, the control signal is sent to the controller of the cooling fan, so that the controller of the cooling fan adjusts the gear of the cooling fan according to the control signal, and the cooling fan is a fan on the server.
In some alternative embodiments, the first control signal controls the controller of the radiator fan to adjust the gear of the radiator fan to the first gear; the controller of the second control signal control the radiator fan to adjust the gear of the radiator fan to a second gear; the third control signal controls the controller of the cooling fan to adjust the gear of the cooling fan to a third gear; the first gear is greater than the second gear, and the second gear is greater than the third gear.
If the alarm state is the first alarm state, the predicted temperature of the target network card is too high. A first control signal is generated, which causes the controller of the cooling fan to adjust the cooling fan to the first gear, namely the highest gear, so as to cool the target network card and prevent it from being damaged by excessive temperature. If the alarm state is the second alarm state, the predicted temperature of the target network card is elevated. A second control signal is generated, which causes the controller of the cooling fan to adjust the cooling fan to the second gear, namely the middle gear, so as to cool the target network card, return it to its normal operating temperature, and prevent damage from a continued rise in temperature. If the alarm state is the third alarm state, the predicted temperature of the target network card is not high. A third control signal is generated, which causes the controller of the cooling fan to adjust the cooling fan to the third gear, namely the lowest gear, so that the target network card keeps operating at normal temperature.
In other words, when the alarm state of the target network card is a serious alarm, the cooling fan is set to the highest gear; when the alarm state is a prompt alarm, the cooling fan is set to the middle gear; and when the alarm state is no alarm, the cooling fan is set to the lowest gear.
A control signal is generated according to the alarm state of the target network card and sent to the cooling fan, so that the cooling fan adjusts its gear according to the control signal and cools the target network card. Once the alarm state of the target network card is determined, the corresponding control signal is generated automatically. Different alarm states produce different control signals and therefore different gear adjustments, so that heat dissipation of the target network card is handled in a targeted manner under each alarm state and damage to the target network card is avoided. The gear of the cooling fan does not need to be adjusted manually, which saves manpower and improves the accuracy of the gear adjustment.
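By way of illustration only, the mapping from alarm state to control signal to fan gear described above might be sketched as follows. The gear numbers, the dictionary layout of the control signal, and the names `AlarmState`, `make_control_signal` and `FanController` are assumptions introduced for this sketch; the text only requires that the first gear be greater than the second and the second greater than the third.

```python
from enum import Enum


class AlarmState(Enum):
    FIRST = "serious alarm"   # predicted temperature too high
    SECOND = "prompt alarm"   # predicted temperature elevated
    THIRD = "no alarm"        # predicted temperature not high


# Hypothetical gear numbers: only the ordering first > second > third
# comes from the text.
GEAR_FOR_STATE = {
    AlarmState.FIRST: 3,   # highest gear
    AlarmState.SECOND: 2,  # middle gear
    AlarmState.THIRD: 1,   # lowest gear
}


def make_control_signal(state: AlarmState) -> dict:
    """Generate the control signal that corresponds to an alarm state."""
    return {"alarm_state": state.name, "target_gear": GEAR_FOR_STATE[state]}


class FanController:
    """Stand-in for the controller of the cooling fan on the server."""

    def __init__(self) -> None:
        self.gear = 1

    def apply(self, signal: dict) -> None:
        # Adjust the fan gear according to the received control signal.
        self.gear = signal["target_gear"]


if __name__ == "__main__":
    fan = FanController()
    for state in AlarmState:
        fan.apply(make_control_signal(state))
        print(state.value, "-> gear", fan.gear)
```

Keeping the state-to-gear table in one place means that a different number of gears only changes the table, not the signal handling.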
In this embodiment, a network card temperature prediction method is provided, which may be used for the above-mentioned server network card. Fig. 5 is a flowchart of monitoring the target network card according to an embodiment of the present invention. As shown in Fig. 5, the flow includes the following steps:
Step S501, creating an input/output port, a connection port, and a service name for monitoring the target network card.
Step S502, judging whether the target server is started.
Step S503, if the target server is powered on, obtaining the asset information of the basic input/output system (BIOS).
Step S504, obtaining the device information of the target network card.
Step S505, a target object is created for the target network card, and the target object is used for judging the type of the target network card and displaying the static information of the target network card so as to identify the target network card according to the type and the static information of the target network card.
Step S506, a target timer is started for the target network card, wherein the timer is used for polling the dynamic information of the target network card, and the dynamic information comprises the temperature data of the target network card.
Step S507, judging whether the basic input/output system has completed its self-check.
Step S508, if the self-check of the basic input/output system is complete, creating a service name, wherein the service name is used to determine the service for monitoring the temperature of the target network card.
Step S509, adding an interface to the service for monitoring the temperature of the target network card, wherein the interface is used for acquiring static information and dynamic information of the target network card.
Adding an interface means exposing interface functions to external callers; registering an interface function amounts to implementing the corresponding function. The embodiment of the invention uses an interface library (the sdbus library, a community-provided data bus interface library for asynchronous operation) to add the interface. The executable actions include: clearing the initial state, acquiring Management Component Transport Protocol (MCTP) device information, enabling a channel, disabling a channel and resetting a channel, setting the connection, acquiring the connection state, setting virtual local area network filtering, enabling and disabling the virtual local area network, acquiring the version number, acquiring static information, and custom vendor command interfaces.
In the embodiment of the invention, the sdbus library has the advantages of high stability and a rich set of asynchronous operation interfaces. It can reduce the central processing unit (CPU) occupancy of the open baseboard management controller (OpenBMC) and improve resource utilization.
The embodiment of the invention creates an asynchronous monitoring service based on a boost amplifier object. A management object holds the basic information: it initializes the plurality of network cards, acquires the endpoint identifier (eidpoint) and the bus/device/function (bdf) identifier of each network card, and provides the mechanism for monitoring the objects of the plurality of network cards. Information acquisition is then performed for each network card (including static information acquisition in the constructor, data bus (dbus) interface creation, and dynamic information acquisition by the timer) in order to monitor the temperature of the network card.
In the embodiment of the invention, it is first judged whether the server is powered on. The asset information reported by the basic input/output system is parsed, the network card monitoring device information is initialized, and the Management Component Transport Protocol (MCTP) network card device information is updated. An object is created for each piece of network card device information; the object is used for judging the type of the target network card and obtaining the static information of the target network card, so that the target network card is identified according to its type and static information. Each network card object starts a timer, and the timer polls the network card to obtain its dynamic information, namely the network card temperature. It is then queried whether the basic input/output system has been reset, the management object is initialized, and a connection interface request service is created.
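The monitoring flow above (power-on check, per-card object, timer polling, self-check gate) can be sketched roughly as follows. This is a simplified, synchronous stand-in rather than the asynchronous data bus service of the embodiment; the class `NetworkCardObject`, the function `monitor`, the callback parameters, the `"mac"` key, and the window size are all hypothetical names chosen for illustration.

```python
from collections import deque
from typing import Callable, Deque, List


class NetworkCardObject:
    """Hypothetical per-card object: card type, static info, polled temperatures."""

    def __init__(self, card_type: str, static_info: dict,
                 read_temperature: Callable[[], float], window: int = 5) -> None:
        self.card_type = card_type
        self.static_info = static_info
        self._read_temperature = read_temperature
        # Keep only the most recent `window` temperature samples.
        self.samples: Deque[float] = deque(maxlen=window)

    def identity(self) -> str:
        # Identify the card from its type and static information.
        return f"{self.card_type}:{self.static_info.get('mac', 'unknown')}"

    def on_timer_tick(self) -> None:
        # One polling interval: read the dynamic information (temperature).
        self.samples.append(self._read_temperature())


def monitor(card: NetworkCardObject,
            server_powered_on: Callable[[], bool],
            bios_self_check_done: Callable[[], bool],
            ticks: int) -> List[float]:
    """Poll the card `ticks` times; return the samples only if the
    server is powered on and the BIOS self-check has completed."""
    if not server_powered_on():
        return []
    for _ in range(ticks):
        card.on_timer_tick()
    if not bios_self_check_done():
        return []
    return list(card.samples)


if __name__ == "__main__":
    readings = iter([40.0, 41.5, 43.0, 44.0, 46.5, 48.0])
    card = NetworkCardObject("example-card", {"mac": "00:11:22:33:44:55"},
                             lambda: next(readings))
    print(card.identity(), monitor(card, lambda: True, lambda: True, ticks=6))
```

A real service would drive `on_timer_tick` from an asynchronous timer instead of a loop; the loop is used here only so that the sketch is deterministic.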
The embodiment also provides a device for predicting the temperature of the network card, which is used to implement the above embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
The embodiment provides a network card temperature prediction device, as shown in fig. 6, including:
the target object creating module 601 is configured to create a target object for the target network card, where the target object is configured to determine a type of the target network card and display static information of the target network card, so as to identify the target network card according to the type and the static information of the target network card.
The target timer opening module 602 is configured to open a target timer for the target network card, where the timer is configured to poll dynamic information of the target network card, and the dynamic information includes temperature data of the target network card.
The judging module 603 is configured to judge whether the basic input/output system has completed its self-check.
The data obtaining module 604 is configured to obtain temperature data of the target network card if the basic input/output system self-checking is completed, where the temperature data includes a plurality of temperatures corresponding to a plurality of preset time intervals respectively; the target network card is a server network card.
The model loading module 605 is used for loading the trained multi-layer perceptron model.
The prediction module 606 is configured to input temperature data into the trained multi-layer perceptron model to obtain a predicted temperature.
The alarm state confirmation module 607 is configured to determine an alarm state of the target network card according to the predicted temperature.
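As a minimal sketch of what the prediction module might compute, the following forward pass feeds a window of recent temperatures through one hidden fully connected layer and one output layer. The ReLU activation, the layer sizes, and the hand-picked weights are assumptions made for this example only; a deployed model would load trained parameters instead.

```python
from typing import List, Sequence


def relu(values: Sequence[float]) -> List[float]:
    """Element-wise rectified linear unit (assumed activation)."""
    return [max(0.0, v) for v in values]


def dense(inputs: Sequence[float],
          weights: Sequence[Sequence[float]],
          biases: Sequence[float]) -> List[float]:
    """Fully connected layer: out[j] = sum_i weights[j][i] * inputs[i] + biases[j]."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def predict_temperature(temps: Sequence[float],
                        w1: Sequence[Sequence[float]], b1: Sequence[float],
                        w2: Sequence[float], b2: float) -> float:
    """Hidden layer with ReLU followed by a single linear output unit."""
    hidden = relu(dense(temps, w1, b1))
    return dense(hidden, [w2], [b2])[0]


if __name__ == "__main__":
    # Hand-picked illustrative weights (not trained): the first hidden unit
    # copies the latest sample, the second measures the rise over the window.
    w1 = [[0.0, 0.0, 1.0], [-1.0, 0.0, 1.0]]
    b1 = [0.0, 0.0]
    w2 = [1.0, 0.5]
    b2 = 0.0
    print(predict_temperature([40.0, 42.0, 44.0], w1, b1, w2, b2))
```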
In some alternative embodiments, model loading module 605 includes:
a multi-layer perceptron model building unit, configured to build the multi-layer perceptron model and comprising:
an acquisition subunit, configured to acquire a temperature training data set of the network card, wherein the temperature training data set comprises the temperature and time of the network card.
A labeling subunit, configured to label the temperature training data set.
Labeling the temperature training data set comprises: assigning a corresponding label to each network card temperature in the temperature training data set.
In an alternative embodiment, if the temperature of the network card is greater than or equal to a first preset value, a first label is allocated to the temperature of the network card; if the network card temperature is smaller than the first preset value and larger than or equal to the second preset value, a second label is distributed to the network card temperature; if the network card temperature is smaller than the second preset value, a third label is allocated to the network card temperature, and the first preset value is larger than the second preset value. The alarm degree of the first label is larger than that of the second label, and the alarm degree of the second label is larger than that of the third label.
When the model is trained, a corresponding label is allocated to each network card temperature in the temperature training data set. If the network card temperature is greater than or equal to the first preset value, the temperature of the network card is too high, and a first label is allocated to the network card temperature to mark that the temperature is too high and cooling is needed. If the network card temperature is smaller than the first preset value and greater than or equal to the second preset value, the temperature of the network card is elevated, and a second label is allocated to the network card temperature to mark that the temperature is elevated and cooling is needed. If the network card temperature is smaller than the second preset value, the temperature of the network card is not high, and a third label is allocated to the network card temperature to mark that the temperature is not high and no alarm is needed.
A dividing subunit, configured to divide the labeled temperature training data set into a training set and a test set.
A construction subunit, configured to construct a multi-layer perceptron model, wherein the multi-layer perceptron model comprises an input layer, a plurality of hidden layers and an output layer.
A training subunit, configured to input the training set into the multi-layer perceptron model and train the multi-layer perceptron model.
An evaluation subunit, configured to input the test set into the multi-layer perceptron model and evaluate the multi-layer perceptron model.
A storage subunit, configured to store the trained multi-layer perceptron model.
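The data preparation performed by the labeling and dividing subunits can be illustrated with the sketch below, which labels each temperature, normalizes the values, and splits the rows into a training set and a test set. The preset values of 85.0 and 70.0, the 80/20 split ratio, the min-max form of normalization, and the function names are assumptions chosen for the example; the text does not fix any of them.

```python
import random
from typing import List, Sequence, Tuple

FIRST_PRESET = 85.0   # hypothetical first preset value
SECOND_PRESET = 70.0  # hypothetical second preset value (must be lower)


def assign_label(temp: float) -> int:
    """Return 1, 2 or 3 for the first, second or third label."""
    if temp >= FIRST_PRESET:
        return 1
    if temp >= SECOND_PRESET:
        return 2
    return 3


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale values into [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in values]


def split_dataset(rows: Sequence[Tuple], test_ratio: float = 0.2,
                  seed: int = 0) -> Tuple[List[Tuple], List[Tuple]]:
    """Shuffle reproducibly and split into (training set, test set)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    times = list(range(10))
    temps = [60.0 + 3.0 * t for t in times]
    rows = list(zip(times, min_max_normalize(temps),
                    [assign_label(t) for t in temps]))
    train, test = split_dataset(rows)
    print(len(train), "training rows,", len(test), "test rows")
```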
In some alternative embodiments, the alarm state confirmation module 607 includes:
a comparison unit, configured to compare the predicted temperature with the first preset value and the second preset value to obtain an alarm state of the predicted temperature.
If the predicted temperature is greater than or equal to a first preset value, judging that the alarm state of the predicted temperature is a first alarm state; if the predicted temperature is greater than or equal to the second preset value and less than the first preset value, judging that the alarm state of the predicted temperature is the second alarm state; if the predicted temperature is smaller than the second preset value, judging that the alarm state of the predicted temperature is a third alarm state; the first preset value is greater than the second preset value. The alarm degree of the first alarm state is greater than the alarm degree of the second alarm state, and the alarm degree of the second alarm state is greater than the alarm degree of the third alarm state.
If the predicted temperature of the target network card is greater than or equal to the first preset value, the temperature of the target network card is excessively high, so that the alarm state of the predicted temperature is judged to be the first alarm state, and the temperature of the target network card is warned to be excessively high, and cooling treatment is needed. If the predicted temperature of the target network card is greater than or equal to the second preset value and less than the first preset value, the temperature of the target network card is higher, so that the alarm state of the predicted temperature is judged to be a second alarm state, and the target network card is warned to be higher in temperature, and cooling treatment is needed. If the predicted temperature of the target network card is smaller than the second preset value, the predicted temperature of the target network card is not high, so that the alarm state of the predicted temperature is judged to be the third alarm state, and the target network card is not required to be alarmed.
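The comparison performed by the comparison unit reduces to two threshold checks. The sketch below is one possible form; the function name `alarm_state`, the string return values, and the example thresholds are hypothetical, and the only constraint taken from the text is that the first preset value is greater than the second.

```python
def alarm_state(predicted: float, first_preset: float, second_preset: float) -> str:
    """Classify a predicted temperature as the 'first', 'second' or 'third' alarm state."""
    if first_preset <= second_preset:
        raise ValueError("first preset value must be greater than second preset value")
    if predicted >= first_preset:
        return "first"    # temperature too high, cooling needed
    if predicted >= second_preset:
        return "second"   # temperature elevated, cooling needed
    return "third"        # temperature not high, no alarm


if __name__ == "__main__":
    for temp in (90.0, 85.0, 75.0, 69.9):
        print(temp, alarm_state(temp, first_preset=85.0, second_preset=70.0))
```

Both lower bounds are inclusive, matching the "greater than or equal to" wording of the comparison.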
In the embodiment of the invention, the target object creating module 601 creates a target object for the target network card. The target object is used for judging the type of the target network card and displaying the static information of the target network card, so that the target network card is identified according to its type and static information and it is clear which network card the obtained predicted temperature belongs to. The target timer opening module 602 starts a target timer for the target network card. The timer is used for polling the dynamic information of the target network card, which includes the temperature data of the target network card, so that the temperature data is sampled once every preset time interval. The judging module 603 judges whether the basic input/output system has completed its self-check; once the self-check is complete, the system has finished reporting the dynamic information of the target network card, and the temperature data in the dynamic information can be obtained. The data obtaining module 604 obtains the temperature data of the target network card, and the model loading module 605 loads the trained multi-layer perceptron model. The prediction module 606 inputs the temperature data of the target network card into the trained multi-layer perceptron model to obtain the predicted temperature, thereby predicting the temperature of the server network card. The alarm state confirmation module 607 determines the alarm state of the target network card according to the predicted temperature, so that the alarm state of the server network card is obtained from the predicted temperature.
Further functional descriptions of the above respective modules and units are the same as those of the above corresponding embodiments, and are not repeated here.
The network card temperature prediction device in this embodiment is presented in the form of functional units, where a unit refers to an application-specific integrated circuit (ASIC), a processor and a memory executing one or more software or firmware programs, and/or other devices that can provide the above functions.
The embodiment of the invention also provides computer equipment, which is provided with the network card temperature prediction device shown in the figure 6.
Referring to Fig. 7, Fig. 7 is a schematic structural diagram of a computer device according to an alternative embodiment of the present invention. As shown in Fig. 7, the computer device includes: one or more processors 10, memory 20, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are communicatively coupled to each other using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the computer device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In some alternative embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple computer devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 10 is illustrated in Fig. 7.
The processor 10 may be a central processor, a network processor, or a combination thereof. The processor 10 may further include a hardware chip, among others. The hardware chip may be an application specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field programmable gate array, a general-purpose array logic, or any combination thereof.
The memory 20 stores instructions executable by the at least one processor 10, so that the at least one processor 10 performs the methods shown in the above embodiments.
The memory 20 may include a storage program area that may store an operating system, at least one application program required for functions, and a storage data area; the storage data area may store data created according to the use of the computer device, etc. In addition, the memory 20 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, memory 20 may optionally include memory located remotely from processor 10, which may be connected to the computer device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Memory 20 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as flash memory, hard disk, or solid state disk; the memory 20 may also comprise a combination of the above types of memories.
The computer device also includes a communication interface 30 for the computer device to communicate with other devices or communication networks.
The embodiments of the present invention also provide a computer-readable storage medium. The methods according to the embodiments of the present invention described above may be implemented in hardware or firmware, or as computer code that can be recorded on a storage medium, or as computer code that is originally stored in a remote storage medium or a non-transitory machine-readable storage medium, downloaded through a network, and stored in a local storage medium, so that the methods described herein can be processed by such software stored on a storage medium using a general-purpose computer, a special-purpose processor, or programmable or special-purpose hardware. The storage medium can be a magnetic disk, an optical disk, a read-only memory, a random access memory, a flash memory, a hard disk, a solid state disk or the like; further, the storage medium may also comprise a combination of memories of the kinds described above. It will be appreciated that a computer, processor, microprocessor controller or programmable hardware includes a storage element that can store or receive software or computer code that, when accessed and executed by the computer, processor or hardware, implements the methods illustrated by the above embodiments.
Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope of the invention as defined by the appended claims.
Claims (20)
1. A network card temperature prediction method, the method comprising:
creating a target object for a target network card, wherein the target object is used for judging the type of the target network card and displaying static information of the target network card so as to identify the target network card according to the type of the target network card and the static information;
starting a target timer for the target network card, wherein the timer is used for polling dynamic information of the target network card, and the dynamic information comprises temperature data of the target network card;
judging whether the basic input and output system is self-checked;
if the self-checking of the basic input and output system is finished, acquiring temperature data of a target network card, wherein the temperature data comprises a plurality of temperatures corresponding to a plurality of preset time intervals respectively; the target network card is a server network card;
loading a trained multi-layer perceptron model;
inputting the temperature data into the trained multi-layer perceptron model to obtain a predicted temperature;
and determining the alarm state of the target network card according to the predicted temperature.
2. The method of claim 1, wherein prior to said creating a target object for a target network card, the method further comprises:
creating an input/output port, a connection port and a service name of the target network card;
judging whether the target server is started;
and if the target server is started, acquiring asset information of the basic input/output system.
3. The method of claim 1, wherein after the determining whether the basic input output system is self-checking is complete, the method further comprises:
and creating a service name, wherein the service name is used for determining the service for monitoring the temperature of the target network card according to the service name.
4. A method according to claim 3, wherein after said creating a service name, the method further comprises:
and adding an interface on a service for monitoring the temperature of the target network card, wherein the interface is used for acquiring static information and dynamic information of the target network card.
5. The method according to claim 1, characterized in that the method comprises:
the static information of the target network card comprises version information of the network card, asset information of the network card, connection state information of the network card and a network media access control address.
6. The method of claim 1, wherein determining the alarm state of the target network card based on the predicted temperature comprises:
comparing the predicted temperature with a first preset value and a second preset value, and determining an alarm state of the predicted temperature according to a comparison result.
7. The method of claim 6, wherein comparing the predicted temperature with the first preset value and the second preset value and determining the alarm state of the predicted temperature based on the comparison result comprises:
if the predicted temperature is greater than or equal to the first preset value, judging that the alarm state of the predicted temperature is a first alarm state;
if the predicted temperature is greater than or equal to the second preset value and smaller than the first preset value, judging that the alarm state of the predicted temperature is a second alarm state;
if the predicted temperature is smaller than the second preset value, judging that the alarm state of the predicted temperature is a third alarm state; the first preset value is greater than the second preset value.
8. The method according to claim 7, characterized in that the method comprises:
The alarm degree of the first alarm state is larger than the alarm degree of the second alarm state, and the alarm degree of the second alarm state is larger than the alarm degree of the third alarm state.
9. The method of claim 1, further comprising a process of training the multi-layer perceptron model; a process for training the multi-layered perceptron model, comprising:
acquiring a temperature training data set of the network card, wherein the temperature training data set comprises the temperature and time of the network card;
labeling the temperature training data set;
dividing the temperature training data set with the marked temperature into a training set and a testing set;
constructing a multi-layer perceptron model, wherein the multi-layer perceptron model comprises an input layer, a plurality of hidden layers and an output layer;
inputting the training set into the multi-layer perceptron model, and training the multi-layer perceptron model;
inputting the test set into the multi-layer perceptron model, and evaluating the multi-layer perceptron model;
and storing the trained multi-layer perceptron model.
10. The method of claim 9, wherein the acquiring the temperature training dataset of the network card comprises:
Acquiring the network card temperature and time of the network card;
writing the network card temperature and time of the network card into a data file;
the data file is used as a temperature training data set.
11. The method of claim 10, wherein labeling the temperature training dataset comprises:
and distributing corresponding labels for the temperatures of the network cards in the temperature training data set.
12. The method of claim 11, wherein said assigning a corresponding label to each of said network card temperatures in said temperature training dataset comprises:
if the network card temperature is greater than or equal to a first preset value, a first label is allocated to the network card temperature;
if the network card temperature is smaller than the first preset value and larger than or equal to a second preset value, a second label is distributed for the network card temperature;
if the network card temperature is smaller than the second preset value, a third label is allocated to the network card temperature, and the first preset value is larger than the second preset value.
13. The method according to claim 12, characterized in that the method comprises:
the alarm degree of the first tag is greater than that of the second tag, and the alarm degree of the second tag is greater than that of the third tag.
14. The method of claim 9, wherein prior to training the multi-layer perceptron model, further comprising:
loading the temperature training data set;
normalizing the temperature training data set;
model algorithm parameters and loss functions are determined.
15. The method of claim 9, wherein the inputting the test set into the multi-layer perceptron model to evaluate the multi-layer perceptron model comprises:
inputting the test set into the multi-layer perceptron model to obtain a prediction result;
comparing the predicted result with the actual result to obtain a comparison result;
and evaluating the multi-layer perceptron model according to the comparison result.
16. The method of claim 9, wherein the multi-layer perceptron model employs a two-layer fully connected network comprising:
[formula not preserved in the source text],
wherein the input is the network card temperature, the parameters are a first weight, a second weight, a first bias and a second bias, and the output is the predicted temperature.
17. The method of claim 1, further comprising a monitoring process of the target network card; monitoring the target network card, including:
Creating an input/output port, a connection port and a service name for monitoring the target network card;
judging whether the target server is started or not;
if the target server is started, acquiring asset information of the basic input/output system;
acquiring equipment information of the target network card;
creating a target object for the target network card, wherein the target object is used for judging the type of the target network card and displaying static information of the target network card so as to identify the target network card according to the type of the target network card and the static information;
starting a target timer for the target network card, wherein the timer is used for polling dynamic information of the target network card, and the dynamic information comprises temperature data of the target network card;
judging whether the basic input and output system is self-checked;
if the self-checking of the basic input and output system is finished, creating a service name, wherein the service name is used for determining a service for monitoring the temperature of the target network card according to the service name;
and adding an interface on a service for monitoring the temperature of the target network card, wherein the interface is used for acquiring static information and dynamic information of the target network card.
18. A network card temperature prediction apparatus, the apparatus comprising:
the target object creation module is used for creating a target object for a target network card, wherein the target object is used for judging the type of the target network card and displaying static information of the target network card so as to identify the target network card according to the type of the target network card and the static information;
the target timer starting module is used for starting a target timer for the target network card, wherein the timer is used for polling the dynamic information of the target network card, and the dynamic information comprises the temperature data of the target network card;
the judging module is used for judging whether the basic input and output system is self-checked;
the data acquisition module is used for acquiring temperature data of the target network card if the self-checking of the basic input and output system is completed, wherein the temperature data comprises a plurality of temperatures corresponding to a plurality of preset time intervals respectively; the target network card is a server network card;
the model loading module is used for loading the trained multi-layer perceptron model;
the prediction module is used for inputting the temperature data into the trained multi-layer perceptron model to obtain a predicted temperature;
and the alarm state confirmation module is used for determining the alarm state of the target network card according to the predicted temperature.
19. A computer device, comprising:
a memory and a processor, the memory and the processor being communicatively connected to each other, the memory having stored therein computer instructions, the processor executing the computer instructions to perform the network card temperature prediction method of any one of claims 1 to 17.
20. A computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the network card temperature prediction method of any one of claims 1 to 17.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310878390.1A CN116627770B (en) | 2023-07-18 | 2023-07-18 | Network card temperature prediction method and device, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310878390.1A CN116627770B (en) | 2023-07-18 | 2023-07-18 | Network card temperature prediction method and device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116627770A (en) | 2023-08-22 |
CN116627770B (en) | 2023-09-26 |
Family ID: 87638448
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310878390.1A Active CN116627770B (en) | 2023-07-18 | 2023-07-18 | Network card temperature prediction method and device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116627770B (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111367773A (en) * | 2020-02-29 | 2020-07-03 | 苏州浪潮智能科技有限公司 | Method, system, equipment and medium for detecting network card of server |
CN113497725A (en) * | 2020-04-01 | 2021-10-12 | 中国移动通信集团山东有限公司 | Alarm monitoring method, alarm monitoring system, computer readable storage medium and electronic equipment |
CN114330099A (en) * | 2021-11-30 | 2022-04-12 | 广东浪潮智慧计算技术有限公司 | Network card power consumption adjusting method, device, equipment and readable storage medium |
CN114885032A (en) * | 2022-04-29 | 2022-08-09 | 苏州浪潮智能科技有限公司 | Equipment information generating and displaying method, device, equipment and medium |
CN114840263A (en) * | 2022-05-31 | 2022-08-02 | 苏州浪潮智能科技有限公司 | Network card management method, device, equipment and storage medium |
CN115314416A (en) * | 2022-07-15 | 2022-11-08 | 苏州浪潮智能科技有限公司 | Network card state automatic detection method and device, electronic equipment and storage medium |
CN115221017A (en) * | 2022-08-19 | 2022-10-21 | 山东云海国创云计算装备产业创新中心有限公司 | Method, system, equipment and storage medium for self-checking of server temperature sensor |
CN115525512A (en) * | 2022-09-30 | 2022-12-27 | 苏州浪潮智能科技有限公司 | Server fan control method and device and electronic equipment |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117806912A (en) * | 2024-02-28 | 2024-04-02 | 济南聚格信息技术有限公司 | Method and system for monitoring server abnormality |
CN117806912B (en) * | 2024-02-28 | 2024-05-14 | 济南聚格信息技术有限公司 | Method and system for monitoring server abnormality |
Also Published As
Publication number | Publication date |
---|---|
CN116627770B (en) | 2023-09-26 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN108197658B (en) | Image annotation information processing method, device, server and system | |
US6772099B2 (en) | System and method for interpreting sensor data utilizing virtual sensors | |
CN105474577B (en) | System and method for monitoring system performance and availability | |
CN116627770B (en) | Network card temperature prediction method and device, computer equipment and storage medium | |
JP6871877B2 (en) | Information processing equipment, information processing methods and computer programs | |
US20150005946A1 (en) | Multiple level computer system temperature management | |
US9355010B2 (en) | Deriving an operational state of a data center using a predictive computer analysis model | |
CN101999101B (en) | The defining method of system cloud gray model prediction | |
CN115185721B (en) | Data processing method and system based on artificial intelligence | |
CN115549313A (en) | Electricity utilization monitoring method and system based on artificial intelligence | |
CN113487086B (en) | Method, device, computer equipment and medium for predicting residual service life of equipment | |
CN111310778A (en) | Detection device, detection method, and recording medium on which detection program is recorded | |
CN116381479A (en) | State monitoring method, state monitoring device, computer equipment, storage medium and program product | |
WO2023101812A1 (en) | Systems and methods for identifying machine anomaly root cause | |
CN113900718B (en) | Decoupling method, system and device for BMC and BIOS asset information | |
US10573147B1 (en) | Technologies for managing safety at industrial sites | |
CN108880916B (en) | IIC bus-based fault positioning method and system | |
US11874008B2 (en) | HVAC system discomfort index and display | |
CN118394606B (en) | Processor detection device, method, controller, medium, and program product | |
EP4339964A1 (en) | A monitoring agent for medical devices | |
KR102210803B1 (en) | System, method and apparatus for smart management based on augmented reality | |
CN117194049B (en) | Cloud host intelligent behavior analysis method and system based on machine learning algorithm | |
US20220011169A1 (en) | Thermal management system, method, and device for monitoring health of electronic devices | |
CN114756427A (en) | Constant temperature box and control method and system thereof | |
CN116521473A (en) | Method and computing device for determining CPU temperature abnormality |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |