CN115714692A - Model training method for monitoring network card, application and system thereof, and electronic equipment - Google Patents


Publication number
CN115714692A
Authority
CN
China
Prior art keywords
eviction
threshold
network card
soft
threshold value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211453132.0A
Other languages
Chinese (zh)
Inventor
邱述洪
林栋�
刘汉亮
刘俊镜
黄民兴
麦福全
龙步云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Unicom Guangdong Industrial Internet Co Ltd
Original Assignee
China Unicom Guangdong Industrial Internet Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Unicom Guangdong Industrial Internet Co Ltd filed Critical China Unicom Guangdong Industrial Internet Co Ltd
Priority to CN202211453132.0A priority Critical patent/CN115714692A/en
Publication of CN115714692A publication Critical patent/CN115714692A/en
Pending legal-status Critical Current

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention provides a model training method for monitoring a network card, together with an application, a system and electronic equipment thereof. The method comprises the following steps: acquiring an eviction history record set, calculating a validation set matrix from it, and constructing a training set from the validation set matrix; inputting the optimized training set into a convolutional neural network and training it by stochastic gradient descent combined with the back-propagation algorithm to obtain a trained parameter-adjusting model. The parameter-adjusting model dynamically calculates a soft eviction threshold and a hard eviction threshold from the current data on the cluster node machine; when the network card traffic meets the soft eviction threshold or the hard eviction threshold, soft eviction or hard eviction, respectively, is executed on the pod. The eviction history records are the performance parameter indexes of the cluster node machines at the time of historical evictions, together with the corresponding soft and hard eviction thresholds. Compared with the prior art, the optimal eviction threshold is calculated dynamically by the neural network model, achieving intelligent dynamic eviction for network card resources.

Description

Model training method for monitoring network card, application and system thereof, and electronic equipment
Technical Field
The invention relates to the technical field of network communication, in particular to a model training method for monitoring a network card, application and system thereof, and electronic equipment.
Background
With the popularity of containerization and Kubernetes orchestration technologies, almost all applications now run in Kubernetes clusters. Countless services run on the cluster in the form of pods. When a cluster node hits a performance bottleneck in CPU, IO, disk or the like, the cluster reschedules and migrates the pods of that node to a new node with sufficient resources according to a certain policy, so that node resources and services remain balanced. The kubelet monitors various node indexes and compares them with thresholds to trigger active eviction, which is the core mechanism of Kubernetes rescheduling.
Although Kubernetes widely supports monitoring of CPU, IO, disk and similar resources on a machine, it lacks monitoring and eviction based on network card traffic. When network card pressure is high, Kubernetes cannot sense it or trigger pod eviction, so pods cannot be migrated and network traffic cannot be load-balanced actively and in time. Moreover, network resources change dynamically, so balancing must follow the condition of each network node at different moments.
Disclosure of Invention
The invention aims to overcome at least one defect of the prior art by providing a model training method for monitoring a network card, together with an application, a system and electronic equipment thereof, for intelligent network card monitoring and eviction.
The technical scheme adopted by the invention is as follows:
A model training method for monitoring a network card is provided, comprising the following steps:
acquiring an eviction history record set, calculating a validation set matrix from it, and constructing a training set D and a validation set V from the validation set matrix;
inputting the training set into a convolutional neural network and training it by stochastic gradient descent combined with the back-propagation algorithm to obtain a trained parameter-adjusting model, wherein the parameter-adjusting model dynamically calculates a soft eviction threshold and a hard eviction threshold from the current data on the cluster node machine, and when the network card traffic meets the soft eviction threshold or the hard eviction threshold, soft eviction or hard eviction, respectively, is executed on the pod;
the training by stochastic gradient descent combined with the back-propagation algorithm comprises the following steps:
inputting the training set D into the neural network model to obtain the network output
[Formula image BDA0003949781100000021]
and assuming the loss function
[Formula image BDA0003949781100000022]
Parameter learning is performed by calculating the derivative of the loss function with respect to each parameter. The specific steps are as follows:
A1: randomly initialize the weight matrix W and the bias b;
A2: randomly reorder the samples in the training set;
A3: select a sample $(x^{(n)}, y^{(n)})$ from the training set D, with n = 0 initially;
A4: compute by feed-forward the net input $z^{(l)}$ and the activation value $a^{(l)}$ of each layer, up to the last layer;
A5: compute by back propagation the error $\delta^{(l)}$ of each layer; the partial derivative of
[Formula image BDA0003949781100000023]
with respect to the weights $W^{(l)}$ of layer l is:
[Formula image BDA0003949781100000024]
A6: compute the partial derivative of
[Formula image BDA0003949781100000025]
with respect to the bias $b^{(l)}$ of layer l, which is:
[Formula image BDA0003949781100000026]
A7: update the parameters W and b by the formulas:
[Formula image BDA0003949781100000027]
$b^{(l)} \leftarrow b^{(l)} - \alpha \delta^{(l)}$;
A8: add 1 to n and repeat steps A3-A7 until n = N, that is, until all N samples have been trained on;
A9: repeat steps A2-A8 until the error rate of the convolutional neural network model on the validation set V no longer decreases.
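The formula images in steps A5 to A7 are not reproduced in this text. For orientation only, the standard textbook forms for a fully connected layer with activation function $f_l$ are given below. They are an assumption and may differ from the patent's figures (for example, the weight update in the figure may carry a regularization term); only the bias update is confirmed by the surrounding text.

```latex
\begin{aligned}
z^{(l)} &= W^{(l)} a^{(l-1)} + b^{(l)}, \qquad a^{(l)} = f_l\!\left(z^{(l)}\right),\\
\delta^{(l)} &= \frac{\partial \mathcal{L}\!\left(y^{(n)}, \hat{y}^{(n)}\right)}{\partial z^{(l)}}
  = f_l'\!\left(z^{(l)}\right) \odot \left( \left(W^{(l+1)}\right)^{\top} \delta^{(l+1)} \right),\\
\frac{\partial \mathcal{L}\!\left(y^{(n)}, \hat{y}^{(n)}\right)}{\partial W^{(l)}} &= \delta^{(l)} \left(a^{(l-1)}\right)^{\top},
\qquad
\frac{\partial \mathcal{L}\!\left(y^{(n)}, \hat{y}^{(n)}\right)}{\partial b^{(l)}} = \delta^{(l)},\\
W^{(l)} &\leftarrow W^{(l)} - \alpha\, \delta^{(l)} \left(a^{(l-1)}\right)^{\top},
\qquad
b^{(l)} \leftarrow b^{(l)} - \alpha\, \delta^{(l)}.
\end{aligned}
```

Here $\alpha$ is the learning rate and $\odot$ denotes the element-wise product.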
The eviction history records are the performance parameter indexes of the cluster node machines at the time of historical evictions, together with the corresponding soft and hard eviction thresholds.
The performance parameters of the cluster node machine at the time of historical evictions are acquired, including the soft eviction threshold, the hard eviction threshold, CPU utilization, memory utilization, network card utilization, the eviction signal and the like, and are stored in the time-series database of the cluster node machine. A validation set matrix is generated from them and, together with the soft and hard eviction thresholds, forms the training set. Training by stochastic gradient descent combined with the back-propagation algorithm yields a trained parameter-adjusting model, which calculates from the performance parameters the optimal soft and hard eviction thresholds for the given performance indexes. When the machine meets the soft or hard eviction threshold, eviction is executed on the pod. At run time the parameter-adjusting model takes the machine's current, dynamically changing performance parameters and sets the optimal soft and hard eviction thresholds for the machine's present state. The machine can thus set an optimal eviction threshold dynamically according to its own state, realizing intelligent monitoring and eviction for the network card. Because the volume of network card data is very large, training by stochastic gradient descent combined with back propagation improves training efficiency.
Further, calculating the validation set matrix and constructing the training set D and the validation set V from the validation set matrix X specifically includes:
extracting the soft eviction threshold, the hard eviction threshold, the CPU, memory and network card indexes, the eviction signal and the eviction records from the acquired records, combining the extracted data into the validation set matrix, and then generating the training set D and the validation set V from the validation set matrix;
training set
[Formula image BDA0003949781100000031]
wherein X is the validation set matrix, X[0] represents CPU utilization, X[1] represents memory utilization, X[2] represents network card utilization, and y is the eviction ratio of pods under the corresponding CPU, memory and network card utilization;
validation set
[Formula image BDA0003949781100000032]
whose data format is consistent with that of the training set.
A disk read/write program, a traffic stress-test program and a CPU-intensive program are used to adjust the utilization of the CPU, memory and network card. The number y1 of pods running at that moment, the number y0 of pods running on the machine before the stress test, and the CPU, memory and network card utilization are recorded. The number of evicted pods is y2 = y0 - y1, and X[3] = y2/y0. Combined with the CPU, memory and network card utilization at that moment, this forms a 4-tuple of the eviction record matrix. Because the CPU, memory and network card are the main parameters affecting machine performance, analyzing them jointly gives a better estimate of their relationship to the eviction threshold and hence of the optimal eviction threshold.
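The construction of one eviction record can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names `EvictionRecord` and `make_eviction_record` are hypothetical, and the numbers are made up for the example.

```python
from typing import NamedTuple


class EvictionRecord(NamedTuple):
    """One row of the eviction record matrix: (X[0], X[1], X[2], X[3])."""
    cpu_util: float     # X[0], CPU utilization at eviction time
    mem_util: float     # X[1], memory utilization
    nic_util: float     # X[2], network card utilization
    evict_ratio: float  # X[3] = y2 / y0


def make_eviction_record(cpu_util, mem_util, nic_util, pods_before, pods_after):
    """Build a 4-tuple from utilization readings and the pod counts y0 and y1."""
    if pods_before <= 0:
        raise ValueError("pods_before (y0) must be positive")
    evicted = pods_before - pods_after  # y2 = y0 - y1
    return EvictionRecord(cpu_util, mem_util, nic_util, evicted / pods_before)


# Illustrative numbers only, not measurements from the patent.
record = make_eviction_record(0.72, 0.65, 0.91, pods_before=40, pods_after=30)
print(record.evict_ratio)  # 0.25
```

Keeping the ratio rather than the raw count makes records from machines with different pod counts comparable.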
Further, the soft eviction threshold eviction-soft comprises: cpu.available (CPU usage threshold), memory.available (memory usage threshold) and network.available (network card usage threshold);
the hard eviction threshold eviction-hard comprises: cpu.available (CPU usage threshold), memory.available (memory usage threshold) and network.available (network card usage threshold).
CPU, memory and network card usage thresholds are set for both the soft eviction and the hard eviction of K8s, so that the CPU, memory and network card resources can each be monitored. When the current CPU, memory or network card utilization exceeds the corresponding soft or hard eviction threshold, eviction is executed on the pod. Judging on these three factors makes pod eviction more intelligent.
The invention also provides an application of the model for monitoring the network card, comprising the following steps:
configuring an acquisition module to monitor and collect data of the cluster node machine;
preprocessing and storing the collected data;
improving the network card eviction algorithm model based on K8s, comprising: setting initial values of the soft eviction threshold and the hard eviction threshold for the network card, setting them into a monitoring service, and sending an eviction signal to evict pods on the corresponding node when the available network card capacity of that node (network.available) is smaller than the soft eviction threshold or the hard eviction threshold;
and constructing an intelligent parameter-adjusting component, which uses the parameter-adjusting model trained by the above model training method for monitoring the network card to dynamically update the soft eviction threshold and the hard eviction threshold.
The acquisition module uses cAdvisor to collect real-time information such as CPU, memory and network card traffic of the cluster node machine and stores it in a time-series database. A K8s model is configured for the cluster node machine and improved on the basis of network card traffic monitoring: the monitoring and eviction that K8s performs for system CPU, memory and disk capacity is rewritten as monitoring and eviction for network card traffic resources, and corresponding initial soft and hard eviction thresholds are set. When the network card traffic data meets the soft or hard eviction threshold, a certain number of pods in the node cluster are evicted to release network card resources for other pods that need them. The intelligent parameter-adjusting component adjusts the soft and hard eviction thresholds in real time: the trained model takes the real-time CPU, memory and network card traffic information, calculates the optimal soft and hard eviction thresholds for that state, and updates them in real time. The soft and hard eviction thresholds in force before the update are stored in the time-series database together with the state parameter information, serving as new historical data for model training. In this way the cluster node machine can monitor the network card and evict pods intelligently.
Further, the preprocessing and storing comprises:
storing the raw data, namely accumulated received traffic (rx_bytes), accumulated received error traffic (rx_errors), accumulated transmitted traffic (tx_bytes) and accumulated transmitted error traffic (tx_errors), into memory, and calculating the received traffic per second (rx_bytes_persec) and the transmitted traffic per second (tx_bytes_persec);
and recording the calculated per-second received and transmitted traffic in the time-series database.
The acquisition period parameter node-status-update-frequency is set to 5 s, and the accumulated values are divided by the acquisition period to calculate the transmitted and received traffic per second. The per-second transmitted and received volumes better reflect the current network performance of the network card.
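One way to read this preprocessing step is sketched below. It assumes the rate is taken from the difference between two consecutive cumulative readings, which is an interpretation: the text says only that the accumulated value is divided by the acquisition period. The names `NicCounters` and `per_second_rates` are hypothetical.

```python
from dataclasses import dataclass

COLLECTION_PERIOD_S = 5.0  # node-status-update-frequency in the text


@dataclass
class NicCounters:
    """Cumulative byte counters as read from the network card."""
    rx_bytes: int
    tx_bytes: int


def per_second_rates(prev, curr, period_s=COLLECTION_PERIOD_S):
    """Convert two consecutive cumulative readings into per-second rates."""
    if period_s <= 0:
        raise ValueError("period_s must be positive")
    # A counter reset would give a negative delta; clamp it to zero here.
    rx_delta = max(curr.rx_bytes - prev.rx_bytes, 0)
    tx_delta = max(curr.tx_bytes - prev.tx_bytes, 0)
    return {
        "rx_bytes_persec": rx_delta / period_s,
        "tx_bytes_persec": tx_delta / period_s,
    }


# Illustrative readings taken 5 s apart.
rates = per_second_rates(NicCounters(1_000_000, 400_000),
                         NicCounters(1_500_000, 650_000))
print(rates)  # {'rx_bytes_persec': 100000.0, 'tx_bytes_persec': 50000.0}
```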
Further, the specific steps of improving the network card eviction algorithm model based on K8s are as follows:
S1: setting initial values for the soft eviction threshold eviction-soft and the hard eviction threshold eviction-hard;
S2: loading the soft eviction threshold and the hard eviction threshold into the eviction manager;
S3: starting a coroutine monitoring service threshold dNote, wherein the monitoring service acquires the preprocessed data and combines it with the soft and hard eviction thresholds from S2 to form a data set thresholds;
S4: configuring a network card traffic judging unit, which performs a matching operation on the data set from step S3 and judges whether the soft eviction threshold or the hard eviction threshold is met; if so, an eviction signal is sent and recorded in the time-series database; otherwise the data is ignored;
S5: evicting active pods according to the eviction signal sent in step S4;
S6: performing steps S3-S5 in a loop.
Further, the initial value of the soft eviction threshold eviction-soft is set as:
eviction-soft=network.available<20%;
the initial value of the hard eviction threshold eviction-hard is set as:
eviction-hard=network.available<20%.
When the network card traffic exceeds 80% (1 - 20%), an eviction signal is generated to evict pods; the eviction signal is either soft eviction or hard eviction.
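The threshold expression can be evaluated along the following lines. This is a sketch under assumptions: it takes network.available to be 1 minus the network card utilization, and `parse_threshold` and `threshold_met` are hypothetical helpers, not kubelet functions.

```python
import re


def parse_threshold(expr):
    """Split e.g. 'network.available<20%' into ('network.available', 0.2)."""
    match = re.fullmatch(r"\s*([\w.]+)\s*<\s*(\d+(?:\.\d+)?)%\s*", expr)
    if match is None:
        raise ValueError(f"unsupported threshold expression: {expr!r}")
    return match.group(1), float(match.group(2)) / 100.0


def threshold_met(expr, nic_utilization):
    """True when the available share (1 - utilization) is below the limit."""
    _signal, limit = parse_threshold(expr)
    return (1.0 - nic_utilization) < limit


soft = "network.available<20%"
print(threshold_met(soft, 0.85))  # True: only 15% available
print(threshold_met(soft, 0.50))  # False: 50% available
```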
Further, evicting the active pods specifically comprises:
after receiving the eviction signal, the eviction manager acquires the resource usage of the current node and all active pods, sorts all active pods by priority, evicts the low-priority pods in the sorted order, and records the evicted pods in the time-series database.
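The priority ordering described above can be sketched as follows. The `Pod` fields and the `count` parameter are hypothetical; the text says only that low-priority pods are evicted first and does not fix how many.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    priority: int      # a lower value means the pod is evicted earlier
    active: bool = True


def select_pods_to_evict(pods, count):
    """Return up to `count` active pods, lowest priority first."""
    active = [p for p in pods if p.active]
    ranked = sorted(active, key=lambda p: p.priority)
    return ranked[:count]


pods = [Pod("a", 100), Pod("b", 10), Pod("c", 50), Pod("d", 5, active=False)]
victims = select_pods_to_evict(pods, 2)
print([p.name for p in victims])  # ['b', 'c']
```

Inactive pods are filtered out before sorting, so pod "d" is never selected despite having the lowest priority value.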
The eviction manager comprises three components, namely notifier, monitor and synchronize, and the invention optimizes and improves them to support network card traffic monitoring and pod eviction. The notifier is rewritten as the network card monitoring service threshold dNote, which receives the preprocessed data together with the soft and hard eviction thresholds, generates the data set thresholds, and passes it to the network card traffic judging unit. The judging unit performs a matching operation on the data in thresholds and generates an eviction signal if an eviction threshold condition is met. threshold dNote sends the signal to a channel (message channel), which triggers the eviction work of synchronize, and the signal is stored in the time-series database. After receiving the signal, synchronize sorts the pods in the cluster by priority, evicts a certain number of lower-priority pods, and releases network card resources. Initial values are set for the soft and hard eviction thresholds, after which the optimal soft and hard eviction thresholds are calculated by the model and updated in real time.
The invention also provides a system for monitoring the network card, which adopts the above application of the model for monitoring the network card. The system comprises an acquisition module, a storage module, an eviction management module and an intelligent parameter-adjusting component;
the acquisition module is used for monitoring and collecting data of the cluster node machine;
the storage module comprises a time-series database for storing the data collected by the acquisition module and the data generated in other processing;
the eviction management module is used for loading the K8s eviction manager and its components, setting the soft eviction threshold and the hard eviction threshold, analyzing and calculating the data collected by the acquisition module, and judging whether the soft or hard eviction threshold condition is met; if so, eviction is executed, otherwise the data is ignored;
the intelligent parameter-adjusting component is used for dynamically updating the soft eviction threshold and the hard eviction threshold.
The invention also provides electronic equipment for monitoring the network card, comprising:
a memory and a processor;
the memory stores computer-readable instructions which, when executed by the processor, implement the above application of the model for monitoring the network card or the above system for monitoring the network card.
Compared with the prior art, the invention has the following beneficial effects: 1. the K8s components are rewritten for network card resources, so that the network card resources of the cluster machine are monitored and pods are evicted accordingly, solving the problem that pods cannot be evicted in time when network card pressure is high;
2. an intelligent parameter-adjusting component is built by training a neural network model, so that eviction conditions are adjusted intelligently and dynamically.
Drawings
FIG. 1 is a step diagram of the model training method for monitoring a network card according to the present invention;
FIG. 2 is a diagram of the application of the model for monitoring a network card according to the present invention;
FIG. 3 is a diagram of the system for monitoring a network card according to the present invention;
Reference numerals: acquisition module 1, storage module 2, eviction management module 3 and intelligent parameter-adjusting component 4.
Detailed Description
The drawings are only for purposes of illustration and are not to be construed as limiting the invention. For a better understanding of the following embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
Example 1
The embodiment provides a model training method for monitoring a network card, which comprises the following steps:
acquiring an eviction history record set, calculating a validation set matrix from it, and constructing a training set D and a validation set V from the validation set matrix. The eviction history record set is acquired from the time-series database and contains the performance parameters of the cluster node machine at the time of eviction, specifically the soft eviction threshold, the hard eviction threshold, CPU utilization, memory utilization, network card utilization and the eviction signal. A validation set matrix X is generated from this information, where X[0] is the CPU utilization of the cluster machine at eviction, X[1] is the memory utilization at eviction, X[2] is the network card utilization at eviction, and X[3] is the ratio of evicted pods to the total number of pods. In a specific implementation, a disk read/write program, a traffic stress-test program and a CPU-intensive program are used to adjust the utilization of the CPU, memory and network card; the number y1 of pods running at that moment, the number y0 of pods running on the machine before the stress test, and the CPU, memory and network card utilization are recorded. The number of evicted pods is y2 = y0 - y1, and X[3] = y2/y0. Combined with the CPU, memory and network card utilization at that moment, this forms a 4-tuple of the eviction record matrix. A specific example of a generated validation set matrix is as follows:
[Figure BDA0003949781100000081: example validation set matrix]
After the validation set matrix X is generated, the training set
[Formula image BDA0003949781100000091]
and the validation set
[Formula image BDA0003949781100000092]
are constructed with the soft and hard eviction thresholds of the corresponding evictions, where X is the validation set matrix and y is the eviction ratio of pods under the corresponding CPU, memory and network card utilization. After the training set is generated, it is deduplicated to reduce the amount of convolution computation in the convolutional neural network and speed up training.
The optimized training set is input into the convolutional neural network and trained by stochastic gradient descent combined with the back-propagation algorithm. Inputting the training set D into the neural network model gives the network output
[Formula image BDA0003949781100000093]
and the loss function is assumed to be
[Formula image BDA0003949781100000094]
Parameter learning is carried out by calculating the derivative of the loss function with respect to each parameter. The specific training steps are as follows:
A1: randomly initialize the weight matrix W and the bias b;
A2: randomly reorder the samples in the training set;
A3: select a sample $(x^{(n)}, y^{(n)})$ from the training set D, with n = 0 initially;
A4: compute by feed-forward the net input $z^{(l)}$ and the activation value $a^{(l)}$ of each layer, up to the last layer;
A5: compute by back propagation the error $\delta^{(l)}$ of each layer; the partial derivative of
[Formula image BDA0003949781100000095]
with respect to the weights $W^{(l)}$ of layer l is:
[Formula image BDA0003949781100000096]
A6: compute the partial derivative of
[Formula image BDA0003949781100000097]
with respect to the bias $b^{(l)}$ of layer l, which is:
[Formula image BDA0003949781100000098]
A7: update the parameters W and b by the formulas:
[Formula image BDA0003949781100000099]
$b^{(l)} \leftarrow b^{(l)} - \alpha \delta^{(l)}$;
A8: add 1 to n and repeat steps A3-A7 until n = N, that is, until all N samples have been trained on;
A9: repeat steps A2-A8 until the error rate of the convolutional neural network model on the validation set V no longer decreases.
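Steps A1 to A9 can be illustrated with a small pure-Python sketch. It is not the patent's network: it assumes a fully connected network with one tanh hidden layer instead of a convolutional one, a squared-error loss, and synthetic stand-in data. The function name `train_sgd` and the values of `hidden`, `alpha` and `max_epochs` are hypothetical choices for the example.

```python
import math
import random


def train_sgd(train, valid, hidden=4, alpha=0.05, max_epochs=200, seed=0):
    """Per-sample SGD with backpropagation for a one-hidden-layer network."""
    rng = random.Random(seed)
    n_in = len(train[0][0])
    # A1: randomly initialize weights W and biases b.
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = rng.uniform(-0.5, 0.5)

    def forward(x):
        # A4: net input z and activation a of each layer, up to the output.
        z1 = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
        a1 = [math.tanh(z) for z in z1]
        y_hat = sum(w * a for w, a in zip(W2, a1)) + b2
        return a1, y_hat

    def validation_error():
        return sum((forward(x)[1] - y) ** 2 for x, y in valid) / len(valid)

    best = validation_error()
    samples = list(train)
    for _ in range(max_epochs):
        rng.shuffle(samples)                      # A2: reorder the samples
        for x, y in samples:                      # A3 and A8: one sample at a time
            a1, y_hat = forward(x)
            # A5: layer errors for the loss 0.5 * (y_hat - y) ** 2.
            delta2 = y_hat - y
            delta1 = [delta2 * w * (1.0 - a * a) for w, a in zip(W2, a1)]
            # A6 and A7: dL/dW = delta * previous activation, dL/db = delta.
            for j in range(hidden):
                W2[j] -= alpha * delta2 * a1[j]
                b1[j] -= alpha * delta1[j]
                for i in range(n_in):
                    W1[j][i] -= alpha * delta1[j] * x[i]
            b2 -= alpha * delta2
        err = validation_error()
        if err >= best:                           # A9: stop when V stops improving
            break
        best = err
    # Returns a predictor and the lowest validation error seen.
    return (lambda x: forward(x)[1]), best


# Synthetic stand-in data: (cpu, mem, nic) utilization -> eviction ratio.
data_rng = random.Random(1)
data = []
for _ in range(60):
    x = [data_rng.random(), data_rng.random(), data_rng.random()]
    data.append((x, 0.2 * x[0] + 0.3 * x[1] + 0.5 * x[2]))
predict, best = train_sgd(data[:48], data[48:])
```

Stopping at the first epoch without improvement is the simplest reading of step A9; a patience window would be a common variation.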
Because the volume of collected network card data is huge, a general training method would be inefficient, whereas training the model by stochastic gradient descent combined with the back-propagation algorithm greatly improves training efficiency. The trained parameter-adjusting model is obtained; it dynamically calculates the soft and hard eviction thresholds from the current data on the cluster node machine, and when the network card traffic meets the soft or hard eviction threshold, soft eviction or hard eviction, respectively, is executed on the pod.
Specifically, the soft eviction threshold eviction-soft includes: cpu.available (CPU usage threshold), memory.available (memory usage threshold) and network.available (network card usage threshold); the hard eviction threshold eviction-hard includes: cpu.available (CPU usage threshold), memory.available (memory usage threshold) and network.available (network card usage threshold). CPU, memory and network card usage thresholds are set for both the soft eviction and the hard eviction of K8s, so that the CPU, memory and network card resources can each be monitored. When the current CPU, memory or network card utilization exceeds the corresponding soft or hard eviction threshold, eviction is executed on the pod. Judging on these three factors makes pod eviction more intelligent.
Through the trained parameter-adjusting model, the eviction threshold is calculated dynamically from the state parameters of the cluster machine, so that the machine can intelligently adjust the eviction conditions for its network card resources. Meanwhile, training the model by stochastic gradient descent combined with the back-propagation algorithm greatly improves training efficiency.
Example 2
The embodiment provides an application of the model for monitoring a network card, comprising:
configuring the acquisition module 1 to monitor and collect data of the cluster node machine, and preprocessing and storing the collected data. The acquisition module 1 uses cAdvisor to collect real-time information such as CPU, memory and network card traffic of the cluster node machine, where the network card traffic data contains accumulated received traffic (rx_bytes), accumulated received error traffic (rx_errors), accumulated transmitted traffic (tx_bytes) and accumulated transmitted error traffic (tx_errors). The acquisition period parameter node-status-update-frequency is set to 5 s. These values are stored in memory, the accumulated values are each divided by the acquisition period (5 s) to calculate the received traffic per second (rx_bytes_persec) and the transmitted traffic per second (tx_bytes_persec), and the results are stored in the time-series database.
The network card eviction algorithm model is improved on the basis of K8s by the following steps:
S1: setting initial values for the soft eviction threshold eviction-soft and the hard eviction threshold eviction-hard; the initial value of eviction-soft is set to: eviction-soft = network.available <20%; the initial value of eviction-hard is set to: eviction-hard = network.
A K8s model is configured for the cluster node machine and improved on the basis of network card traffic monitoring: the monitoring and eviction that K8s performs for system CPU, memory and disk capacity is rewritten as monitoring and eviction for network card traffic resources, and a network card eviction threshold startup parameter is added. Corresponding initial soft and hard eviction thresholds are set, and when the network card traffic data meets the soft or hard eviction threshold, a certain number of pods in the node cluster are evicted to release network card resources for other pods that need them more. In this embodiment network.available<20% is set, that is, when network traffic exceeds 80% (1 - 20%), an eviction signal is generated to evict pods, and the eviction signal is either soft eviction or hard eviction.
S2: loading the soft eviction threshold and the hard eviction threshold into the eviction manager;
S3: starting a coroutine monitoring service threshold dNote, wherein the monitoring service acquires the preprocessed data and combines it with the soft eviction threshold and the hard eviction threshold from S2 to form a data set thresholds;
specifically, the eviction manager comprises three components, namely notifier, monitor and synchronize. The eviction manager is started first; it loads the soft eviction threshold and the hard eviction threshold and shares them with each of its components. The notifier is rewritten to start the coroutine monitoring service threshold dNote, which receives from cAdvisor the collected and preprocessed performance values, such as transmitted traffic per second and received traffic per second, together with the soft and hard eviction thresholds previously loaded by the eviction manager, and combines them to generate the data set thresholds;
S4: configuring a network card traffic judging unit, wherein the data set thresholds generated by the monitoring service is passed to the network card traffic judging unit, which performs a matching operation on it and judges whether the soft eviction threshold or the hard eviction threshold is met. If so, an eviction signal is issued: the signal is sent to the channel message channel to trigger the eviction work of the synchronize component, and the eviction signal is recorded in the time-series database; if the eviction condition is not met, the data is ignored;
S5: evicting active pods according to the eviction signal issued in step S4; specifically, after receiving the signal, synchronize obtains the resource usage of the current node cluster machine and all active pods, sorts all active pods by priority, evicts the low-priority pods in the sorted order, and records the evicted pods in the time-series database, thereby releasing network card resources for other pods that need them more.
S6: steps S3-S5 are performed in a loop.
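The loop of steps S3-S5 can be sketched roughly as follows. This is a minimal illustration, not the rewritten kubelet code: the `Pod` fields, the function names, the distinct 10% hard threshold, and the rule of stopping once availability is back above the soft threshold are all assumptions introduced here (the embodiment sets both initial thresholds to network.available<20% and only says that "a certain number" of pods are evicted).

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    priority: int            # assumed convention: lower value = evicted first
    tx_bytes_persec: float   # hypothetical per-pod traffic attribution


def available_fraction(used_persec: float, capacity_persec: float) -> float:
    """Fraction of network card bandwidth still available, in [0, 1]."""
    used = min(max(used_persec, 0.0), capacity_persec)
    return 1.0 - used / capacity_persec


def judge(available: float, soft: float, hard: float):
    """Match against thresholds of the form network.available < x.

    Returns "hard", "soft" or None (no eviction signal).
    """
    if available < hard:
        return "hard"
    if available < soft:
        return "soft"
    return None


def pods_to_evict(pods, used_persec, capacity_persec, soft):
    """Pick low-priority pods until availability recovers (assumed stop rule)."""
    evicted = []
    for pod in sorted(pods, key=lambda p: p.priority):
        if available_fraction(used_persec, capacity_persec) >= soft:
            break
        used_persec -= pod.tx_bytes_persec
        evicted.append(pod.name)
    return evicted
```

With a card capacity of 1000 units per second and 850 in use, `judge(available_fraction(850, 1000), 0.20, 0.10)` reports a soft eviction, and `pods_to_evict` then walks the pods from lowest priority upward.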
After the eviction algorithm model is configured, an intelligent parameter adjusting component 4 is constructed, which dynamically updates the soft eviction threshold and the hard eviction threshold using the parameter adjusting model trained by the model training method for monitoring a network card described in embodiment 1. Specifically, the trained model of embodiment 1 takes the real-time CPU, memory and network card traffic information, calculates the optimal soft and hard eviction thresholds for that state, and updates them in real time; the soft and hard eviction thresholds in force before the update are stored in the time-series database together with the state parameter information as new historical data for model training.
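One update cycle of the intelligent parameter adjusting component might look like the sketch below. The function name `retune`, the dictionary keys, and the use of a plain list in place of the time-series database are hypothetical; `model` stands in for the trained parameter adjusting model and is assumed to map a state to a (soft, hard) pair.

```python
import time


def retune(model, state, thresholds, history, now=time.time):
    """Archive the thresholds in force, then apply the model's new ones.

    model:      callable mapping a state dict to (soft, hard); a stand-in
                for the trained parameter adjusting model
    state:      real-time CPU, memory and network card readings
    thresholds: dict with keys "soft" and "hard", updated in place
    history:    list standing in for the time-series database
    """
    # Store the pre-update thresholds together with the state parameters,
    # so that they can serve as new historical data for later training.
    history.append({
        "ts": now(),
        "state": dict(state),
        "thresholds": dict(thresholds),
    })
    soft, hard = model(state)
    thresholds["soft"], thresholds["hard"] = soft, hard
    return thresholds
```

The snapshot is taken before the update so that each history entry pairs a node state with the thresholds that were actually active at that moment.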
Example 3
This embodiment provides a system for monitoring a network card, where the system employs the application of the model for monitoring a network card in embodiment 2, and the system includes: an acquisition module 1, a storage module 2, an eviction management module 3 and an intelligent parameter adjusting component 4;
the acquisition module 1 is used for monitoring and acquiring data of the cluster node machines;
the storage module 2 comprises a time-series database and is used for storing the data acquired by the acquisition module 1 and the data generated in other processing steps;
the eviction management module 3 is used for loading the K8s eviction manager and its corresponding components, setting the soft eviction threshold and the hard eviction threshold, analyzing and calculating the data acquired by the acquisition module 1, and judging whether the soft eviction threshold or the hard eviction threshold condition is met; if so, eviction is executed, otherwise the data is ignored;
the intelligent parameter adjusting component 4 is used for dynamically updating the soft eviction threshold and the hard eviction threshold.
Example 4
The embodiment provides an electronic device for monitoring a network card, which includes:
a memory and a processor;
the memory stores computer-readable instructions that, when executed by the processor, implement the application of the model for monitoring a network card of embodiment 2 or the system for monitoring a network card of embodiment 3.
It should be understood that the above-mentioned embodiments of the present invention are only examples for clearly illustrating the technical solutions of the present invention, and are not intended to limit the specific embodiments of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and principle of the claims of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (10)

1. A model training method for monitoring a network card is characterized by comprising the following steps:
acquiring an eviction history record set, calculating and generating a validation set matrix X, and constructing a training set D and a validation set V by using the validation set matrix X;
inputting the training set into a convolutional neural network and training by stochastic gradient descent combined with the back propagation algorithm to obtain a trained parameter adjusting model, wherein the parameter adjusting model is used for dynamically calculating a soft eviction threshold and a hard eviction threshold according to the data on the current node cluster machine, and when the network card traffic meets the soft eviction threshold or the hard eviction threshold, soft eviction or hard eviction, respectively, is performed on pods;
the training by stochastic gradient descent combined with the back propagation algorithm comprises the following steps:
inputting the training set D into a neural network model to obtain a network output of
Figure FDA0003949781090000011
Assuming a loss function of
Figure FDA0003949781090000012
Parameter learning is performed by calculating a derivative of the loss function with respect to each parameter, and the specific steps are as follows:
A1: randomly initializing the weight matrix W and the bias b;
A2: randomly reordering the samples in the training set D;
A3: selecting a sample $(x^{(n)}, y^{(n)})$ from the training set D, with n initially 0, where $y^{(n)}$ is paired with $x^{(n)}$;
A4: computing, by feed-forward, the net input $z^{(l)}$ and the activation value $a^{(l)}$ of each layer up to the last layer;
A5: computing, by back propagation, the error $\delta^{(l)}$ of each layer, which gives
Figure FDA0003949781090000013
The gradient with respect to the weight $W^{(l)}$ of layer $l$ is:
Figure FDA0003949781090000014
A6: computing
Figure FDA0003949781090000015
The gradient with respect to the bias $b^{(l)}$ of layer $l$ is:
Figure FDA0003949781090000016
A7: updating the parameters W and b by the formulas $W^{(l)} \leftarrow W^{(l)} - \alpha\left(\delta^{(l)} (a^{(l-1)})^{T} + \lambda W^{(l)}\right)$ and $b^{(l)} \leftarrow b^{(l)} - \alpha\,\delta^{(l)}$;
A8: adding 1 to n and repeating steps A3-A7 until n = N, that is, until every sample has been trained;
A9: repeating steps A2-A8 until the error rate of the convolutional neural network model on the validation set V no longer decreases;
the eviction history records are the performance parameter indicators of the node cluster machines at the time of historical evictions and the corresponding soft eviction thresholds and hard eviction thresholds.
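Steps A1-A9 can be illustrated with the following pure-Python sketch. Several things here are assumptions made only for illustration: a small fully connected network with one tanh hidden layer stands in for the convolutional neural network, the loss is taken to be the squared error (the claimed loss function survives only as a figure reference), and the function name, hyperparameter defaults and mean-squared validation error are hypothetical.

```python
import math
import random


def train_sgd(D, V, hidden=4, alpha=0.05, lam=1e-4, max_epochs=200, seed=0):
    """SGD with back propagation; D and V are lists of (x, y) pairs."""
    rng = random.Random(seed)
    n_in = len(D[0][0])
    # A1: randomly initialise the weights; biases start at zero.
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        # A4: hidden activations a1 = tanh(z1), then linear output.
        a1 = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
        return a1, sum(w * a for w, a in zip(W2, a1)) + b2

    def val_error():
        return sum((forward(x)[1] - y) ** 2 for x, y in V) / len(V)

    errors = [val_error()]
    D = list(D)
    for _ in range(max_epochs):
        rng.shuffle(D)                        # A2: reorder the samples
        for x, y in D:                        # A3 and A8: visit each sample
            a1, y_hat = forward(x)            # A4: feed-forward pass
            d2 = y_hat - y                    # A5: output error, squared loss
            d1 = [(1.0 - a * a) * W2[j] * d2  # A5: hidden error, tanh' = 1 - a^2
                  for j, a in enumerate(a1)]
            for j in range(hidden):           # A6 and A7: gradient and update
                W2[j] -= alpha * (d2 * a1[j] + lam * W2[j])
                for i in range(n_in):
                    W1[j][i] -= alpha * (d1[j] * x[i] + lam * W1[j][i])
                b1[j] -= alpha * d1[j]
            b2 -= alpha * d2
        errors.append(val_error())            # A9: stop when V stops improving
        if errors[-1] >= errors[-2]:
            break
    return (lambda x: forward(x)[1]), errors
```

The inner updates follow the rule in step A7: the weight gradient is the layer error times the previous layer's activation plus the regularization term, and the bias gradient is the layer error alone. The sketch keeps the weights of the last epoch rather than rolling back to the best one, which is a simplification.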
2. The model training method for monitoring a network card according to claim 1, wherein calculating and generating the validation set matrix X and constructing the training set D and the validation set V by using the validation set matrix X specifically comprises:
extracting the soft eviction threshold, the hard eviction threshold, the CPU, memory and network card indicators, the eviction signals and the eviction records from the obtained records, combining the extracted data to form the validation set matrix, and then generating the training set D and the validation set V from the validation set matrix;
training set
Figure FDA0003949781090000021
wherein X is the validation set matrix, X0 represents the CPU utilization, X1 represents the memory utilization, X2 represents the network card utilization, and y is the pod eviction ratio under the corresponding CPU utilization, memory utilization and network card utilization;
validation set
Figure FDA0003949781090000022
The data format is consistent with the training set.
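A rough sketch of how the matrix and the two sets might be assembled from eviction history records is given below. The record field names, the computation of the eviction ratio as evicted pods over active pods, and the 80/20 split are assumptions for illustration only; the claim does not specify them.

```python
import random


def build_sets(records, val_fraction=0.2, seed=0):
    """Build training set D and validation set V from eviction records.

    Each record is a dict with hypothetical keys "cpu", "mem", "net"
    (utilizations in [0, 1]) and "evicted_pods" / "active_pods" (counts).
    """
    # Matrix rows: [x0 (CPU), x1 (memory), x2 (network card), y (eviction ratio)]
    X = [
        [r["cpu"], r["mem"], r["net"], r["evicted_pods"] / r["active_pods"]]
        for r in records
        if r["active_pods"] > 0
    ]
    rng = random.Random(seed)
    rng.shuffle(X)
    n_val = max(1, int(len(X) * val_fraction))

    def to_pairs(rows):
        return [(row[:3], row[3]) for row in rows]

    # Both sets share the same (features, label) format.
    return to_pairs(X[n_val:]), to_pairs(X[:n_val])
```

Keeping the validation rows out of the training set is what allows the stopping criterion of claim 1 to measure generalization rather than fit.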
3. The model training method for monitoring a network card according to claim 1, wherein
the soft eviction threshold eviction-soft comprises: available (CPU usage threshold), memory.available (memory usage threshold) and network.available (network card usage threshold);
the hard eviction threshold eviction-hard comprises: available (CPU usage threshold), memory.available (memory usage threshold) and network.available (network card usage threshold).
4. An application of a model for monitoring a network card, comprising:
configuring an acquisition module to monitor and acquire data of the cluster node machines;
preprocessing and storing the acquired data;
improving an eviction algorithm model of the network card based on K8s, comprising the following steps: setting initial values of a soft eviction threshold and a hard eviction threshold for the network card, setting the initial values into a monitoring service, and sending an eviction signal to evict the pods corresponding to a network card node when the traffic occupancy of the network card node is smaller than the soft eviction threshold or the hard eviction threshold;
constructing an intelligent parameter adjusting component, wherein the intelligent parameter adjusting component adopts the parameter adjusting model trained by the model training method for monitoring a network card according to any one of claims 1 to 3, and dynamically updates the soft eviction threshold and the hard eviction threshold.
5. The application of the model for monitoring a network card according to claim 4, wherein the preprocessing and storing comprises:
storing the raw data, namely the cumulative received traffic (rx_bytes), the cumulative receive errors (rx_errors), the cumulative transmitted traffic (tx_bytes) and the cumulative transmit errors (tx_errors), in memory, and calculating the received traffic per second (rx_bytes_persec) and the transmitted traffic per second (tx_bytes_persec);
and recording the calculated received traffic per second and transmitted traffic per second in the time-series database.
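The per-second values can be derived from two consecutive samples of the cumulative counters, as in the sketch below. The sample layout (a dict with a `ts` timestamp in seconds) and the policy of returning `None` when a counter has been reset are assumptions; only the counter names come from the claim.

```python
def per_second(prev, curr):
    """Compute rx_bytes_persec and tx_bytes_persec from cumulative counters.

    prev and curr are samples such as
    {"ts": 100.0, "rx_bytes": 1000, "tx_bytes": 5000}, taken in time order.
    """
    dt = curr["ts"] - prev["ts"]
    if dt <= 0:
        raise ValueError("samples must be in increasing time order")
    rates = {}
    for counter in ("rx_bytes", "tx_bytes"):
        delta = curr[counter] - prev[counter]
        # A negative delta means the counter was reset between samples;
        # that interval is skipped (assumed policy).
        rates[counter + "_persec"] = delta / dt if delta >= 0 else None
    return rates
```

For example, samples taken 10 seconds apart with rx_bytes growing from 1000 to 21000 give a received traffic of 2000 bytes per second.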
6. The application of the model for monitoring a network card according to claim 5, wherein the specific steps for improving the eviction algorithm model of the network card based on K8s are as follows:
S1: setting initial values for the soft eviction threshold eviction-soft and the hard eviction threshold eviction-hard;
S2: loading the soft eviction threshold and the hard eviction threshold into the eviction manager;
S3: starting a coroutine monitoring service threshold dNote, wherein the monitoring service acquires the preprocessed data and combines it with the soft eviction threshold and the hard eviction threshold from S2 to form a data set thresholds;
S4: configuring a network card traffic judging unit, performing a matching operation according to the data set from step S3, and judging whether the soft eviction threshold or the hard eviction threshold is met; if so, sending an eviction signal and recording the eviction signal in the time-series database, otherwise ignoring the data;
S5: evicting active pods according to the eviction signal sent in step S4;
S6: steps S3-S5 are performed in a loop.
7. The method for monitoring a network card based on a K8s cluster according to claim 6, wherein
the initial value of the soft eviction threshold eviction-soft is set to:
network.available<20%;
the initial value of the hard eviction threshold eviction-hard is set to:
network.available<20%.
8. The application of the model for monitoring a network card according to claim 7, wherein evicting the active pods specifically comprises:
after receiving the eviction signal, the eviction manager obtains the resource usage of the current node and all active pods, sorts all active pods by priority, evicts the low-priority pods in the sorted order, and records the evicted pods in the time-series database.
9. A system for monitoring a network card, the system employing the application of the model for monitoring a network card according to any one of claims 4 to 8, the system comprising: an acquisition module, a storage module, an eviction management module and an intelligent parameter adjusting component;
the acquisition module is used for monitoring and acquiring data of the cluster node machines;
the storage module comprises a time-series database for storing the data acquired by the acquisition module and the data generated in other processing steps;
the eviction management module is used for loading the K8s eviction manager and its corresponding components, setting the soft eviction threshold and the hard eviction threshold, analyzing and calculating the data acquired by the acquisition module, and judging whether the soft eviction threshold or the hard eviction threshold condition is met; if so, eviction is executed, otherwise the data is ignored;
the intelligent parameter adjusting component is used for dynamically updating the soft eviction threshold and the hard eviction threshold.
10. An electronic device for monitoring a network card, comprising:
a memory and a processor;
the memory stores computer-readable instructions which, when executed by the processor, implement the application of the model for monitoring a network card according to any one of claims 4 to 8 or the system for monitoring a network card according to claim 9.
CN202211453132.0A 2022-11-18 2022-11-18 Model training method for monitoring network card, application and system thereof, and electronic equipment Pending CN115714692A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211453132.0A CN115714692A (en) 2022-11-18 2022-11-18 Model training method for monitoring network card, application and system thereof, and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211453132.0A CN115714692A (en) 2022-11-18 2022-11-18 Model training method for monitoring network card, application and system thereof, and electronic equipment

Publications (1)

Publication Number Publication Date
CN115714692A true CN115714692A (en) 2023-02-24

Family

ID=85233889

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211453132.0A Pending CN115714692A (en) 2022-11-18 2022-11-18 Model training method for monitoring network card, application and system thereof, and electronic equipment

Country Status (1)

Country Link
CN (1) CN115714692A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116628508A (en) * 2023-07-20 2023-08-22 科大讯飞股份有限公司 Model training process anomaly detection method, device, equipment and storage medium
CN116628508B (en) * 2023-07-20 2023-12-01 科大讯飞股份有限公司 Model training process anomaly detection method, device, equipment and storage medium
CN117251551A (en) * 2023-11-06 2023-12-19 联通(广东)产业互联网有限公司 Natural language processing system and method based on large language model
CN117251551B (en) * 2023-11-06 2024-05-07 联通(广东)产业互联网有限公司 Natural language processing system and method based on large language model

Similar Documents

Publication Publication Date Title
CN115714692A (en) Model training method for monitoring network card, application and system thereof, and electronic equipment
JP4912401B2 (en) System and method for adaptively collecting performance and event information
CN109246229A (en) A kind of method and apparatus of distribution resource acquisition request
CN104584524B (en) It polymerize the data in intermediary system
CN107103068A (en) The update method and device of service buffer
CN103152393A (en) Charging method and charging system for cloud computing
CN109788489A (en) A kind of base station planning method and device
CN108632309A (en) A kind of method and device of upgrading narrowband internet-of-things terminal
CN107544926A (en) Processing system and its access method
CN109981702A (en) File storage method and system
CN107977167A (en) Optimization method is read in a kind of degeneration of distributed memory system based on correcting and eleting codes
CN111901405B (en) Multi-node monitoring method and device, electronic equipment and storage medium
CN106055271B (en) A kind of repeated data based on cloud computing removes reselection method and device
CN108875035A (en) The date storage method and relevant device of distributed file system
EP4189542A1 (en) Sharing of compute resources between the virtualized radio access network (vran) and other workloads
CN118355366A (en) Database simulation modeling framework
CN113902128B (en) Asynchronous federal learning method, device and medium for improving utilization efficiency of edge device
CN109995834A (en) Massive dataflow processing method, calculates equipment and storage medium at device
CN111324644B (en) Method and device for monitoring database connection storm under large-scale micro-service architecture
CN117332881A (en) Distributed training method and electronic equipment
CN110502495A (en) A kind of log collecting method and device of application server
CN109753225A (en) A kind of date storage method and equipment
CN109976896A (en) Business re-scheduling treating method and apparatus
CN110399095A (en) A kind of statistical method and device of memory space
CN114546610B (en) Mass data distributed desensitization device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination