CN112463531A - File transmission early warning method, device, equipment and storage medium - Google Patents


Info

Publication number
CN112463531A
Authority
CN
China
Legal status
Pending
Application number
CN202011328957.0A
Other languages
Chinese (zh)
Inventor
谢凌杰
王立新
Current Assignee
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Application filed by China Construction Bank Corp
Priority to CN202011328957.0A
Publication of CN112463531A
Legal status: Pending (current)

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F11/00 Error detection; Error correction; Monitoring
            • G06F11/30 Monitoring
              • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
              • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
                • G06F11/3447 Performance evaluation by modeling
                • G06F11/3452 Performance evaluation by statistical analysis
                • G06F11/3466 Performance evaluation by tracing or monitoring
                  • G06F11/3476 Data logging
                  • G06F11/3495 Performance evaluation by tracing or monitoring for systems
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00 Computing arrangements based on biological models
            • G06N3/02 Neural networks
              • G06N3/04 Architecture, e.g. interconnection topology
                • G06N3/044 Recurrent networks, e.g. Hopfield networks
                • G06N3/045 Combinations of networks
              • G06N3/08 Learning methods
                • G06N3/084 Backpropagation, e.g. using gradient descent
    • H ELECTRICITY
      • H04 ELECTRIC COMMUNICATION TECHNIQUE
        • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
          • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
            • H04L41/06 Management of faults, events, alarms or notifications
              • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
                • H04L41/064 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis involving time analysis
          • H04L67/00 Network arrangements or protocols for supporting network services or applications
            • H04L67/01 Protocols
              • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
            • H04L67/14 Session management
              • H04L67/146 Markers for unambiguous identification of a particular session, e.g. session cookie or URL-encoding

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention relates to a file transmission early warning method, apparatus, device and storage medium. The method comprises: acquiring file state data of a preset file in a current statistical period and in at least one historical statistical period before the current statistical period, wherein the file state data are associated with the period identifier of the corresponding statistical period and include file arrival state data; processing the file state data according to the period identifiers to obtain target historical time series data; inputting the target historical time series data into a preset time series prediction model to obtain future time series data of the preset file in a future preset time period; and, if the future time series data satisfy a preset condition, generating arrival early warning information for the preset file. The invention addresses the technical problem of poor alarm timeliness in the prior art.

Description

File transmission early warning method, device, equipment and storage medium
Technical Field
The present invention relates to the field of file transmission, and in particular to a file transmission early warning method, apparatus, device and storage medium.
Background
The IT systems of a commercial bank exchange large volumes of data. Every link, from front-office business entry and product and report queries to back-office accounting and batch analysis, depends on data exchange, and the timeliness requirements are high. In day-to-day operations, data exchange delays occur frequently and can seriously affect important services such as regulatory reporting and business processing.
The current mainstream solution is to monitor important data files, for example by scanning the file directory or the transfer logs at fixed times or on event triggers, and to raise an alarm when a file is abnormal so that operations staff can respond. For example, if the file ABC_yymmdd.dat normally arrives at 7:00 every day, a Filewatch (file monitoring) program is configured to check the file's receiving directory at 7:00 and raise an alarm if no file matching ABC_yymmdd.dat is found. This approach is passive, however, and it is difficult to guarantee timely alarms.
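For comparison with the predictive approach, the fixed-time Filewatch check described above can be sketched as follows. This is a hypothetical illustration: the function name `filewatch_check`, its parameters, and the use of a two-digit year for the `yymmdd` date stamp are assumptions, and scheduling the check at 7:00 and sending the alarm are left out.

```python
import os
from datetime import date


def filewatch_check(receive_dir: str, day: date, prefix: str = "ABC") -> bool:
    """Fixed-time check: is the day's <prefix>_yymmdd.dat already in the receiving directory?

    Hypothetical sketch of a Filewatch-style check; the naming pattern is assumed.
    """
    expected = f"{prefix}_{day.strftime('%y%m%d')}.dat"  # e.g. ABC_201124.dat
    return os.path.isfile(os.path.join(receive_dir, expected))
```

Run at 7:00 by a scheduler, a `False` result is what would trigger the alarm. The check can only report the problem once the deadline has passed, which is the passivity the invention sets out to address.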
Disclosure of Invention
The invention provides a file transmission early warning method, apparatus, device and storage medium, to solve the technical problem of poor alarm timeliness in the prior art.
In one aspect, an embodiment of the present invention provides a file transmission early warning method, comprising: acquiring file state data of a preset file in a current statistical period and in at least one historical statistical period before the current statistical period, wherein the file state data are associated with the period identifier of the corresponding statistical period and include file arrival state data; processing the file state data according to the period identifiers to obtain target historical time series data; inputting the target historical time series data into a preset time series prediction model to obtain future time series data of the preset file in a future preset time period, wherein the preset time series prediction model is obtained by training an initial time series model with file state sample data of the preset file from historical statistical periods; and, if the future time series data satisfy a preset condition, generating arrival early warning information for the preset file.
Further, the future time series data include a future arrival time of the preset file, and correspondingly, generating the arrival early warning information for the preset file if the future time series data satisfy the preset condition comprises: if the difference between the future arrival time and the preset arrival time of the preset file is greater than or equal to a preset time threshold, generating first arrival early warning information for the preset file.
Further, the future time series data include a future arrival file size of the preset file, and correspondingly, generating the arrival early warning information for the preset file if the future time series data satisfy the preset condition comprises: if the difference between the future arrival file size and the preset arrival file size of the preset file is greater than or equal to a preset size threshold, generating second arrival early warning information for the preset file.
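The two preset conditions can be sketched as a single check on one prediction. This is an illustrative helper, not the patent's implementation: the name `arrival_warnings`, the warning strings, and the interpretation of "difference" are assumptions. The time check is signed (only a predicted late arrival counts) while the size check uses the absolute difference (both unusually large and unusually small files count).

```python
from typing import List


def arrival_warnings(pred_arrival_min: float, pred_size: float,
                     preset_arrival_min: float, preset_size: float,
                     time_threshold_min: float, size_threshold: float) -> List[str]:
    """Return the arrival warnings triggered by one prediction (hypothetical helper).

    Times are minutes after 00:00. Signed time difference and absolute size
    difference are assumptions about how "difference" is meant.
    """
    warnings: List[str] = []
    # First arrival warning: predicted arrival is at least the threshold later than the preset time
    if pred_arrival_min - preset_arrival_min >= time_threshold_min:
        warnings.append("first arrival warning: late arrival predicted")
    # Second arrival warning: predicted size deviates from the preset size by at least the threshold
    if abs(pred_size - preset_size) >= size_threshold:
        warnings.append("second arrival warning: abnormal file size predicted")
    return warnings
```

For a file due at 7:00 (420 minutes) with a 30-minute threshold, a predicted arrival of 7:30 (450 minutes) triggers the first warning, while a predicted arrival of 7:10 does not.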
Further, the file state data also include link transmission state data, and the file state sample data include file arrival state sample data and link transmission state sample data; correspondingly, processing the file state data according to the period identifiers to obtain the target historical time series data comprises: processing the file arrival state data and the link transmission state data according to the period identifiers to obtain the target historical time series data.
Further, processing the file state data according to the period identifiers to obtain the target historical time series data comprises: processing the file state data according to the period identifiers to obtain initial historical time series data; determining whether the initial historical time series data contain a vacancy; and, if not, using the initial historical time series data as the target historical time series data.
Further, processing the file state data according to the period identifiers to obtain the target historical time series data also comprises: if the initial historical time series data contain a vacancy, estimating the value to fill the vacancy from the values on both sides of it, and filling that value into the initial historical time series data to obtain the target historical time series data.
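The vacancy check and filling step can be sketched with linear interpolation, one way to estimate a missing value from the values on both sides (the embodiment below uses linear interpolation via pandas). This standard-library sketch assumes one value per statistical period with `None` marking a vacancy; the helper names are hypothetical, and pandas' `interpolate` may treat the ends of a series differently from this sketch.

```python
from typing import List, Optional


def has_vacancy(series: List[Optional[float]]) -> bool:
    """True if any period in the series has no value (None)."""
    return any(v is None for v in series)


def fill_vacancies(series: List[Optional[float]]) -> List[Optional[float]]:
    """Fill each interior vacancy by linear interpolation between its nearest known neighbours.

    Only vacancies with known values on both sides are filled; leading or
    trailing vacancies are left as None.
    """
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):  # empty when left and right are adjacent
            frac = (i - left) / span  # relative distance from the left neighbour
            filled[i] = series[left] + frac * (series[right] - series[left])
    return filled
```

For example, a missing arrival time between 632 and 612 minutes is estimated as 622 minutes.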
In another aspect, an embodiment of the present invention provides a file transmission early warning apparatus, comprising: a data acquisition module, configured to acquire file state data of a preset file in a current statistical period and in at least one historical statistical period before the current statistical period, the file state data being associated with the period identifier of the corresponding statistical period and including file arrival state data; a first data processing module, configured to process the file state data according to the period identifiers to obtain target historical time series data; a second data processing module, configured to input the target historical time series data into a preset time series prediction model to obtain future time series data of the preset file in a future preset time period, the preset time series prediction model being obtained by training an initial time series model with file state sample data of the preset file from historical statistical periods; and an early warning information generating module, configured to generate arrival early warning information for the preset file when the future time series data satisfy a preset condition.
Further, the future time series data include a future arrival time of the preset file, and correspondingly the early warning information generating module comprises a first early warning information generating submodule, configured to generate first arrival early warning information for the preset file when the difference between the future arrival time and the preset arrival time of the preset file is greater than or equal to a preset time threshold; and/or the future time series data include a future arrival file size of the preset file, and correspondingly the early warning information generating module comprises a second early warning information generating submodule, configured to generate second arrival early warning information for the preset file when the difference between the future arrival file size and the preset arrival file size of the preset file is greater than or equal to a preset size threshold.
In another aspect, an embodiment of the present invention provides a device comprising a processor and a memory, the memory storing at least one instruction, at least one program, a code set or an instruction set, which is loaded and executed by the processor to implement any of the file transmission early warning methods described above.
In another aspect, an embodiment of the present invention provides a computer-readable storage medium storing at least one instruction, at least one program, a code set or an instruction set, which causes a computer to execute any of the file transmission early warning methods described above.
The technical solution above gives the invention the following beneficial effects:
File state data of a preset file are acquired for a current statistical period and at least one historical statistical period before it, the file state data being associated with the period identifier of the corresponding statistical period and including file arrival state data. The file state data are processed according to the period identifiers to obtain target historical time series data, which are input into a preset time series prediction model to obtain future time series data of the preset file in a future preset time period, and arrival early warning information for the preset file is generated when the future time series data satisfy a preset condition. The arrival state of a data file can therefore be predicted in advance and a warning raised when it meets the preset condition, reducing the impact of a delayed emergency response.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings used in the description of the embodiment or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
FIG. 1 is a schematic diagram of the structure of an RNN (recurrent neural network);
FIG. 2 is a schematic diagram of the structure of an LSTM (long short-term memory network);
FIG. 3 is a schematic flowchart of a file transmission early warning method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a time axis according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of the data exchange links traversed by a preset file during transmission according to an embodiment of the present invention;
FIG. 6 is a full view of a data link according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a file transmission early warning apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a file transmission early warning device according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions.
Time series model: time series prediction uses the characteristics of an event over a past period to predict its characteristics over a future period. A time series model depends on the order of events: feeding the model the same set of values in a different order produces different results. In a data exchange scenario, the daily arrival time of each data file can be collected and processed into an initial time series for analysis.
Neural network: an arithmetic mathematical model that mimics the behavior of animal neural networks and processes information in a distributed, parallel manner; it is mainly used to estimate or approximate functions. A neural network consists of an input layer, one or more (usually several) hidden layers, and an output layer.
RNN: whereas the computations of an ordinary neural network on successive inputs are independent of one another, the result of each hidden-layer computation in an RNN depends on both the current input and the previous hidden state, as shown in FIG. 1. Because of this property, RNNs are widely used to model time series.
The right side of FIG. 1 shows the network unrolled over time to make the memory in the computation easier to follow. In brief, x is the input layer, o is the output layer, s is the hidden layer, and t is the time step (the index of the computation); V, W and U are weights. The hidden state at step t is computed as $S_t = f(U X_t + W S_{t-1})$, which ties the current input to the result of the previous computation.
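The recurrence $S_t = f(U X_t + W S_{t-1})$ can be illustrated with a scalar RNN, where each weight is a single number instead of a matrix. This is a minimal sketch under stated assumptions: f is taken to be tanh (the text does not name f), the initial state is 0, and the output weight V is omitted because only the hidden state is computed.

```python
import math
from typing import List


def rnn_hidden_states(xs: List[float], U: float, W: float, s0: float = 0.0) -> List[float]:
    """Scalar RNN: S_t = tanh(U * X_t + W * S_{t-1}), starting from S_0 = s0."""
    states: List[float] = []
    s = s0
    for x in xs:
        # the new state mixes the current input with the previous hidden state
        s = math.tanh(U * x + W * s)
        states.append(s)
    return states
```

With W = 0 the recurrence disappears and each state depends only on its own input. With W = 1, `rnn_hidden_states([0.0, 1.0], 1.0, 1.0)` and `rnn_hidden_states([1.0, 0.0], 1.0, 1.0)` end in different final states, which is the order dependence noted in the time series model definition.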
LSTM (long short-term memory) is a variant of the RNN that adds gate nodes to each layer of the RNN structure: a forget gate, an input gate and an output gate. By opening or closing these gates and setting thresholds, the network decides whether the output of a layer is carried into the layer's current computation. This mitigates a major problem of RNNs, namely vanishing (and exploding) gradients. The model is shown in FIG. 2.
Here g is the input gate function, h is the output gate function, and s is the forget gate function. The gate functions typically use the sigmoid activation, which regulates the values flowing through the network by always constraining them to the range between 0 and 1.
[Formula image not reproduced in this text version]
W is a weight matrix and net is the input matrix of each layer; the output value y of each layer is obtained by multiplying W and net.
S is the hidden-layer state value, whose inputs are the current state and the historical state.
Each gate node takes the memory state of the network as input and applies the sigmoid function. If the gate output reaches the threshold, it is multiplied element-wise by the current layer's result and the product is passed to the next layer as input; if it does not reach the threshold, the output is forgotten. The weights of each layer are updated during back-propagation training.
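For reference, one time step of an LSTM cell in the common textbook formulation is sketched below for scalar inputs. It uses continuous sigmoid gate values rather than a hard threshold, and its gate names (f, i, o, g) and parameter dictionary are illustrative assumptions; they do not map one-to-one onto the g, h and s labels of FIG. 2.

```python
import math
from typing import Dict, Tuple


def sigmoid(z: float) -> float:
    # squashes any real number into (0, 1), so it can act as a gate
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x: float, h_prev: float, c_prev: float, p: Dict[str, float]) -> Tuple[float, float]:
    """One step of a scalar LSTM cell (standard formulation, biases included).

    p holds input weights w_*, recurrent weights u_* and biases b_* for the
    forget (f), input (i) and output (o) gates and the candidate memory (g).
    Returns the new hidden state h and the new memory (cell) state c.
    """
    f = sigmoid(p["w_f"] * x + p["u_f"] * h_prev + p["b_f"])    # forget gate
    i = sigmoid(p["w_i"] * x + p["u_i"] * h_prev + p["b_i"])    # input gate
    o = sigmoid(p["w_o"] * x + p["u_o"] * h_prev + p["b_o"])    # output gate
    g = math.tanh(p["w_g"] * x + p["u_g"] * h_prev + p["b_g"])  # candidate memory
    c = f * c_prev + i * g  # keep part of the old memory, add part of the new candidate
    h = o * math.tanh(c)    # expose part of the memory as the hidden state
    return h, c
```

With all parameters set to zero every gate outputs 0.5, so the cell keeps half of its previous memory and adds nothing new; training adjusts the weights so that the gates open or close depending on the input.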
Referring to FIG. 3, FIG. 3 is a schematic flowchart of a file transmission early warning method according to an embodiment of the present invention. This specification provides the method steps described in the embodiment or flowchart, but the method may include more or fewer steps based on conventional or non-inventive work. The order of steps recited in the embodiment is only one of many possible execution orders and does not represent the only order. In practice, a device or apparatus may execute the method shown in the embodiment or figure sequentially or in parallel (for example, in a parallel-processor or multi-threaded environment). Specifically, as shown in FIG. 3, the method may include:
Step S301: acquiring file state data of a preset file in a current statistical period and in at least one historical statistical period before the current statistical period, wherein the file state data are associated with the period identifier of the corresponding statistical period and include file arrival state data;
In the embodiment of the present invention, as shown in FIG. 4, the time axis may be divided into a number of statistical periods, i.e. consecutive time intervals of the same length T. T can be chosen according to practical requirements. For example, the ABC_yymmdd.dat file described in the background is due to arrive at 7:00 every day, so the statistical period T can be set to one day, i.e. 24 hours.
The current statistical period is the statistical period in which a file state data acquisition event occurs; it runs from the start of the period (e.g., 0:00) to its end (e.g., 24:00). Note that file state data are not necessarily available at the moment an acquisition event occurs, so an acquisition event does not guarantee that file state data for the current statistical period can be acquired: if the file state data have not yet been generated when the event occurs, no data for the current period can be acquired; if they have been generated, they can be.
A historical statistical period is a complete statistical period before the current one. For example, in FIG. 4, historical statistical period 1 precedes the current statistical period and runs from its start time to its end time, so it is a historical statistical period; likewise, historical statistical period 2 is also a historical statistical period.
The period identifier distinguishes different statistical periods. For example, if historical statistical period 1 is given the period identifier T1 and historical statistical period 2 the period identifier T2, the two periods can be told apart by T1 and T2.
Associating file state data with a period identifier records the statistical period in which the data were generated. For example, file state data 1 associated with period identifier T1 were generated in historical statistical period 1, which corresponds to T1; likewise, file state data 2 associated with period identifier T2 were generated in historical statistical period 2, which corresponds to T2.
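A minimal sketch of how a timestamp can be mapped to the period identifier of its statistical period, assuming equal-length periods of duration T counted from a chosen origin of the time axis. The origin, the 1-based numbering and the name `period_id` are illustrative assumptions.

```python
from datetime import datetime, timedelta


def period_id(ts: datetime, origin: datetime, period: timedelta = timedelta(days=1)) -> int:
    """1-based identifier of the equal-length statistical period that contains ts.

    origin is the start of period 1 on the time axis; period k covers
    [origin + (k - 1) * period, origin + k * period).
    """
    if ts < origin:
        raise ValueError("timestamp precedes the origin of the time axis")
    return (ts - origin) // period + 1  # timedelta // timedelta gives an int
```

With T = 1 day and the origin at 00:00 on the first day, a file that arrives at 07:00 on the third day is tagged with period identifier 3.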
File arrival state data describe the arrival of the preset file at the current device; they may include the time at which the preset file arrives at the current device and the size of the file on arrival.
Step S303: processing the file state data according to the period identifiers to obtain target historical time series data;
In the embodiment of the present invention, since the file state data are associated with period identifiers, the file state data generated in different statistical periods can be processed based on those identifiers to obtain the target historical time series data.
Specifically, the file state data may be sorted chronologically by period identifier, and the sorted data stored in a data matrix to obtain the target historical time series data of the preset file.
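The ordering step can be sketched as follows, assuming each period contributes one (arrival time in minutes, file size) record keyed by an integer period identifier. The function name, the record layout and the use of `[None, None]` to mark a period with no data (a vacancy, to be filled before the data reach the model) are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Row = List[Optional[float]]


def to_time_series(records: Dict[int, Tuple[float, float]], first: int, last: int) -> List[Row]:
    """Arrange (arrival minutes, size) records into one matrix row per period.

    records maps a period identifier to the file state data observed in that
    period. Periods in [first, last] with no record become [None, None] rows.
    """
    rows: List[Row] = []
    for pid in range(first, last + 1):  # chronological order of period identifiers
        if pid in records:
            arrival, size = records[pid]
            rows.append([arrival, size])
        else:
            rows.append([None, None])  # no data collected for this period
    return rows
```

Records collected out of order, such as periods 2, 1, 3, come out as rows for periods 1, 2, 3.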
Step S305: inputting the target historical time series data into a preset time series prediction model to obtain future time series data of the preset file in a future preset time period, wherein the preset time series prediction model is obtained by training an initial time series model with file state sample data of the preset file from historical statistical periods;
In the embodiment of the present invention, the preset time series prediction model is trained with file state sample data, which are obtained through data collection and processing.
Specifically, the file state sample data are collected by deploying a collection program (implemented as a shell script) on each host to gather metrics related to data transmission, which may include:
Data file information: file name, file path, file arrival time, file arrival size, etc.
The file state sample data may cover multiple historical statistical periods. For example, with each historical statistical period set to one day, at least one natural month of file state data can be collected, so that any periodicity and temporal dependence in the data are taken into account and banking patterns such as month-start and month-end peaks are uncovered as far as possible.
The data processing process of the file state sample data may include:
(1) Standardization: the collected raw times (HH:MM format) are converted into numbers. Specifically, each time may be converted to the number of minutes after 00:00; for example, 1:01 becomes 61 and 6:30 becomes 390.
(2) Missing-value filling: null values caused by collection failures and the like are filled by linear interpolation. Specifically, the interpolate method of the pandas package in Python can be used with method='linear', as shown below (d is a DataFrame holding the collected values in its 'val' column):
d['val']=d['val'].interpolate(method='linear')
(3) Raw data set generation: a raw data set is generated as shown in Table 1 below, where day is the period identifier, arr is the file arrival time (in minutes after 00:00, as produced by step (1)), and size is the file arrival size;
TABLE 1
No day arr size
1 1 632 9813
2 2 598 9469
3 3 612 8823
(4) Training/test split: because supervised machine learning is used, part of the data must be held out as a test set, and the model is refined iteratively by evaluating it on that set. The split ratio may be set to 0.7, i.e. 70% of the data for training and 30% for testing. For example, 30 days of historical data can be selected, with the first 21 days used for training and the last 9 days for testing. Note that for time series the test set must not be sampled randomly, since that would break the temporal order of the data.
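Steps (1) and (4) can be sketched with the standard library, together with a sliding-window framing that the Keras code below appears to expect (`input_shape=(step, X.shape[2])`) but that the text does not spell out. The window construction, in which `step` consecutive periods predict the following period, and all function names are assumptions.

```python
from typing import Sequence, Tuple


def to_minutes(hhmm: str) -> int:
    """Step (1): convert an HH:MM time to minutes after 00:00, e.g. '6:30' -> 390."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def chronological_split(rows: Sequence, train_ratio: float = 0.7) -> Tuple[list, list]:
    """Step (4): earliest rows for training, latest rows for testing; no shuffling."""
    cut = round(len(rows) * train_ratio)
    return list(rows[:cut]), list(rows[cut:])


def make_windows(rows: Sequence[Sequence[float]], step: int) -> Tuple[list, list]:
    """Assumed framing: each sample is `step` consecutive periods, its target the next period."""
    X, y = [], []
    for start in range(len(rows) - step):
        X.append([list(r) for r in rows[start:start + step]])  # shape (step, features)
        y.append(list(rows[start + step]))                      # shape (features,)
    return X, y
```

Applied to the arr and size columns of Table 1 with `step=2`, `make_windows` yields one sample whose input is days 1 and 2 and whose target is day 3. Converting with `numpy.asarray(X)` then gives an array of shape (samples, step, features), so `X.shape[2]` is the number of features.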
After obtaining the file state sample data, model training may be performed, and the preset time sequence model may be implemented using an LSTM function of a Keras framework (Keras is a deep learning framework based on therano, and its design refers to Torch, is written in Python language, is a highly modular neural network library, and supports GPU and CPU), and specifically includes:
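(1) Modeling
from keras.models import Sequential
from keras.layers import LSTM, Dense
def seq2seqModel(X, step):
    # X: array of shape (samples, step, features); step: look-back window length
    model = Sequential()
    model.add(LSTM(units=30, return_sequences=True, input_shape=(step, X.shape[2])))
    model.add(LSTM(units=50, dropout=0.2, recurrent_dropout=0.2))
    model.add(Dense(X.shape[2]))  # one output per feature (e.g. arr, size)
    model.compile(optimizer='adam', loss='mse')
    return model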
(2) Model training
Once the initial model is built, it is trained on the training data. The epochs and batch_size parameters are set by passing them to the fit() function: epochs is the number of complete passes over the training data set, and batch_size is the number of samples used for each gradient update.
model=seq2seqModel(X_train,step)
model.fit(X_train,y_train,epochs=100,batch_size=30,verbose=0)
(3) Model evaluation
import math
train_score = model.evaluate(X_train, y_train, verbose=0)
print('Train result: %.2f MSE (%.2f RMSE)' % (train_score, math.sqrt(train_score)))
validation_score = model.evaluate(X_validation, y_validation, verbose=0)
print('Validation result: %.2f MSE (%.2f RMSE)' % (validation_score, math.sqrt(validation_score)))
This yields the mean squared error on the training data and on the validation data; the smaller the value, the more accurate the prediction.
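For reference, the score that evaluate() returns for a model compiled with loss='mse' is essentially the mean squared error computed below. Its square root (RMSE) expresses the error in the unit of the targets, e.g. minutes for arrival times, assuming the targets were not rescaled. This is a minimal pure-Python sketch, and the actual and predicted values are made up for illustration.

```python
import math
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error: the average of the squared prediction errors."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, expressed in the same unit as the targets."""
    return math.sqrt(mse(y_true, y_pred))


# Hypothetical actual vs. predicted arrival times, in minutes after midnight.
actual = [632, 598, 612]
predicted = [630, 601, 616]
print(round(mse(actual, predicted), 2), round(rmse(actual, predicted), 2))  # 9.67 3.11
```

Here the errors are 2, -3 and -4 minutes, so the MSE is (4 + 9 + 16) / 3 and the RMSE of about 3.1 minutes is the more readable figure.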
After the preset time series prediction model is obtained, the collected file state sample data is stored in a data matrix, and the following statement returns a two-dimensional array containing, for each file, the predicted future arrival time and the predicted file size.
y_pre=model.predict(data,verbose=0)
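The data matrix passed to predict() must be three-dimensional, with shape (samples, step, features), to match input_shape=(step, X.shape[2]) in the model definition. A common way to build such inputs from a daily feature table is a sliding window. The sketch below assumes that each window of step consecutive periods is used to predict the next period. The helper make_windows is hypothetical, and in practice its output would be converted with numpy.array before calling fit() or predict().

```python
from typing import List, Sequence, Tuple

Row = List[float]


def make_windows(series: Sequence[Sequence[float]], step: int) -> Tuple[List[List[Row]], List[Row]]:
    """Turn a time-ordered feature table into supervised (X, y) pairs.

    X[i] contains `step` consecutive periods (a step x features block) and
    y[i] is the feature vector of the period that immediately follows them.
    """
    if not 1 <= step < len(series):
        raise ValueError("step must satisfy 1 <= step < len(series)")
    n = len(series) - step
    X = [[list(row) for row in series[i:i + step]] for i in range(n)]
    y = [list(series[i + step]) for i in range(n)]
    return X, y


# Rows are (arr, size) pairs taken from Table 1.
table1 = [[632, 9813], [598, 9469], [612, 8823]]
X, y = make_windows(table1, step=2)
print(X)  # [[[632, 9813], [598, 9469]]]
print(y)  # [[612, 8823]]
```

Each target row has one value per feature, which is consistent with the final Dense(X.shape[2]) layer producing a predicted arrival time and size per sample.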
Step S307: If the future time series data meets a preset condition, arrival early warning information of the preset file is generated.
In this embodiment of the invention, file state data of a preset file is acquired for the current statistical period and for at least one historical statistical period before it. The file state data is associated with the period identifiers of the corresponding statistical periods and includes file arrival state data. The file state data is processed according to the period identifiers to obtain target historical time series data, which is input into the preset time series prediction model to obtain future time series data of the preset file for a future preset time period. When the future time series data meets a preset condition, arrival early warning information of the preset file is generated. The arrival state of a data file can therefore be predicted in advance and a warning issued when that state meets the preset condition, which improves emergency-response efficiency, reduces the impact of delayed emergency responses, and supports intelligent operation and maintenance.
In addition, using an LSTM model for time series prediction makes it easier to discover temporal patterns in the data, and according to this embodiment it performs better than traditional algorithms such as ARIMA and plain RNNs.
In some embodiments, the future time series data includes a future arrival time of the preset file. Correspondingly, generating the arrival early warning information of the preset file when the future time series data meets the preset condition may include:
if the difference between the future arrival time and the preset arrival time of the preset file is greater than or equal to a preset time threshold, generating first arrival early warning information of the preset file.
For example, if the predicted value obtained in y_pre shows a delay exceeding 60 minutes (i.e., 1 hour), an alarm is triggered. The alarm can be sent as a short message to notify operation and maintenance personnel in advance.
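The first-arrival rule can be sketched as a simple threshold check. This is a minimal illustration under stated assumptions: times are in minutes after midnight (as produced by the standardization step), the difference is signed so that only late arrivals trigger a warning, and the function name, file name and expected time are hypothetical. Sending the short message is not shown.

```python
from typing import Optional


def arrival_delay_warning(file_name: str, predicted_min: float, expected_min: float,
                          threshold_min: float = 60) -> Optional[str]:
    """Return a first-arrival warning if the predicted arrival time is later than
    the expected time by at least `threshold_min` minutes, else None.

    Times are minutes after midnight, as produced by the standardization step.
    """
    delay = predicted_min - expected_min
    if delay >= threshold_min:
        return f"[ARRIVAL WARNING] {file_name}: predicted {delay:.0f} min late"
    return None


# Hypothetical file expected at 10:00 (600 minutes after midnight).
print(arrival_delay_warning("example_file.dat", predicted_min=665, expected_min=600))
print(arrival_delay_warning("example_file.dat", predicted_min=630, expected_min=600))  # None
```

The comparison uses >= to match the "greater than or equal to a preset time threshold" wording; the returned string would then be handed to whatever notification channel is in use.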
In some embodiments, using a file directly without monitoring and checking its size may have unpredictable effects on file processing. For example, an excessively large file may prolong the corresponding processing time and delay data supply or service access downstream, while an unusually small file may indicate data loss or quality issues that lead to job errors. Accordingly, the future time series data may include a future arrival file size of the preset file. Correspondingly, generating the arrival early warning information of the preset file when the future time series data meets the preset condition may include:
if the difference between the future arrival file size and the preset arrival file size of the preset file is greater than or equal to a preset size threshold, generating second arrival early warning information of the preset file.
In practical applications, by predicting the size of the arriving file, the risk of job delay or failure due to data quality problems or large data volume can be reduced.
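A sketch of the second-arrival rule is shown below. Because the text treats both unusually large and unusually small files as risky, this version assumes the size difference is compared in absolute value. The function name, file name, expected size and threshold are all hypothetical.

```python
from typing import Optional


def size_deviation_warning(file_name: str, predicted_size: float, expected_size: float,
                           threshold: float) -> Optional[str]:
    """Return a second-arrival warning if the predicted file size deviates from
    the expected size by at least `threshold` in either direction, else None."""
    deviation = predicted_size - expected_size
    if abs(deviation) >= threshold:
        direction = "larger" if deviation > 0 else "smaller"
        return (f"[SIZE WARNING] {file_name}: predicted size is "
                f"{abs(deviation):g} {direction} than expected")
    return None


# Hypothetical expected size and threshold, in the same unit as the size column.
for predicted in (12000, 8000, 9000):
    print(predicted, size_deviation_warning("example_file.dat", predicted, 9500, threshold=1000))
```

With these values, 12000 is flagged as larger, 8000 as smaller, and 9000 stays within the threshold.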
In some embodiments, as shown in FIG. 5, system A is the most upstream source system and generates multiple data files, some of which are supplied to system B1 and others to system B2. As intermediate systems, B1 and B2 apply their own processing to the received data files through a number of different jobs and then transmit the results to the downstream system C, which generates the data files it needs through its own job processing. In other words, a problem upstream is likely to affect the downstream systems. Therefore, the file state data may further include link transmission state data, and the file state sample data includes file arrival state sample data and link transmission state sample data. Correspondingly, processing the file state data according to the period identifier to obtain the target historical time series data may include:
processing the file arrival state data and the link transmission state data according to the period identifier to obtain the target historical time series data.
In this embodiment of the present invention, the file arrival state data includes at least one of the file arrival time and the file arrival size, and the link transmission state data includes at least one of the following for the servers involved: CPU utilization, memory utilization, disk I/O response time, database state, job execution status, and transmission queue status.
Specifically, the file state sample data is collected by deploying a collection program (implemented as a shell script) on each host to gather indicator information related to data transmission. This indicator information may include:
data file information: file name, file path, file arrival time, file arrival size, etc.;
file transmission information: the sending and receiving systems and directories for each data file, transmission start and end times, transmission queue length, etc.;
system resources: CPU utilization, memory utilization, disk I/O response time, file system utilization, etc.;
database indicators: long-running SQL (Structured Query Language) statements, large transactions, whether sessions are blocked, deadlock detection, etc.;
job execution status: job execution state, job execution duration, etc.;
transmission queue status: file transfer task return values, transmission queue length (if the number of files waiting to be transferred exceeds a threshold, the transfer may be abnormal), etc.
The collected information can be used for both input sample generation for machine learning and generation of a full view of the data link as shown in fig. 6.
The data link full view is built as follows: once all data file information has been collected automatically, the upstream and downstream dependencies of each data file are generated through automatic inference followed by manual confirmation. All links together form a directed graph (the data link full view), from which the dependencies among all systems and data files on the data link can be identified.
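The directed-graph view makes impact analysis a reachability question: everything reachable from a node is potentially affected when that node is late or fails. The sketch below uses breadth-first search over an adjacency list. For brevity it models only systems, using the FIG. 5 topology, whereas the full view described above also contains data-file nodes. The helper names build_graph and affected_downstream are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Adjacency list mapping each node to its direct downstream nodes."""
    graph: Dict[str, List[str]] = defaultdict(list)
    for upstream, downstream in edges:
        graph[upstream].append(downstream)
    return graph


def affected_downstream(graph: Dict[str, List[str]], source: str) -> Set[str]:
    """All nodes reachable from `source` (breadth-first), i.e. everything a
    delay or failure at `source` could propagate to."""
    seen: Set[str] = set()
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Topology of the FIG. 5 example: A feeds B1 and B2, which both feed C.
links = [("A", "B1"), ("A", "B2"), ("B1", "C"), ("B2", "C")]
g = build_graph(links)
print(sorted(affected_downstream(g, "A")))   # ['B1', 'B2', 'C']
print(sorted(affected_downstream(g, "B1")))  # ['C']
```

The seen set ensures C is visited once even though it is reachable through both B1 and B2.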
After data acquisition is completed, data is processed, and the data processing process may include:
(1) Standardization and (2) missing-value filling are performed in the same way as for the file arrival state sample data: times in HH:MM format are converted into minutes elapsed since midnight (e.g., 1:01 becomes 61 and 6:30 becomes 390), and null values caused by collection anomalies are filled by linear interpolation, for example with d['val']=d['val'].interpolate(method='linear') in pandas.
(3) Original data set generation: in addition to the file arrival time and file size (the file arrival state sample data), one or more of six server indicators may be selected: CPU (central processing unit) utilization, memory utilization, disk I/O (input/output) response time, database state, job execution status, and transmission queue status (the link transmission state sample data). The first three are numerical values, and the last three are Boolean values (1 indicates normal, 0 indicates abnormal).
Because the transmission link involves multiple systems and servers, the data from multiple servers is combined by averaging or by a logical AND to simplify processing, as follows:
for CPU utilization, memory utilization and disk I/O response time, the average over the first half hour of data transmission is taken and placed into the matrix;
for the database state, job execution status and transmission queue status, a logical AND is applied to the status codes of all related servers, so that if any one of them is abnormal, the overall indicator value is abnormal.
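The aggregation rules above can be sketched as follows. The sketch assumes each reading is one server's sample from the averaging window, with keys named after the Table 2 columns. The function name and the two sample readings are hypothetical.

```python
from statistics import mean
from typing import Dict, List

NUMERIC_KEYS = ("cpu", "mem", "io")   # averaged across readings
STATUS_KEYS = ("db", "job", "trans")  # 1 = normal, 0 = abnormal; combined with AND


def aggregate_link_state(readings: List[Dict[str, float]]) -> Dict[str, float]:
    """Collapse readings from several servers into one row of link features."""
    if not readings:
        raise ValueError("at least one reading is required")
    row: Dict[str, float] = {k: mean(r[k] for r in readings) for k in NUMERIC_KEYS}
    # Logical AND: the indicator is normal (1) only if every server reports normal.
    row.update({k: int(all(r[k] == 1 for r in readings)) for k in STATUS_KEYS})
    return row


# Hypothetical readings from two servers on the same transmission link.
readings = [
    {"cpu": 30, "mem": 20, "io": 3.0, "db": 1, "job": 1, "trans": 1},
    {"cpu": 34, "mem": 24, "io": 3.2, "db": 1, "job": 0, "trans": 1},
]
print(aggregate_link_state(readings))
```

Because the second server reports an abnormal job status, the aggregated job indicator becomes 0, while the numeric indicators are simply averaged.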
Table 2 below shows the original data set finally generated from the file arrival time, the file size, and the six server indicators, where cpu is CPU utilization, mem is memory utilization, io is disk I/O response time, db is the database state, job is job execution status, and trans is transmission queue status:
TABLE 2
No day arr size cpu mem io db job trans
1 1 632 9813 32 22 3.11 1 1 1
2 2 598 9469 14 19 2.20 1 1 1
3 3 612 8823 53 41 1.93 1 0 1
After the file state sample data is obtained, model training can be performed in the same way as when training with the file arrival state sample data alone, so the process is not repeated here.
In practical applications, the file arrival state data and the link transmission state data are processed together into the target historical time series data, which is input into the preset time series prediction model, and arrival early warning information of the preset file is generated when the future time series data meets the preset condition. This fully accounts for how upstream problems affect downstream systems, so the arrival state of data files can be predicted more accurately and warnings can be issued more precisely when that state meets the preset condition.
In some embodiments, processing the file state data according to the period identifier to obtain the target historical time series data may include:
processing the file state data according to the period identifier to obtain initial historical time series data;
determining whether the initial historical time series data contains a vacancy (a missing value);
if not, using the initial historical time series data as the target historical time series data.
In some embodiments, processing the file state data according to the period identifier to obtain the target historical time series data may further include:
if the initial historical time series data contains a vacancy, estimating the value to fill the vacancy from the values on both sides of it;
filling the estimated value into the vacancy in the initial historical time series data to obtain the target historical time series data.
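The vacancy check and fill described above can be sketched in plain Python as follows. The sketch treats None and NaN as vacancies and estimates each interior vacancy by linear interpolation between its nearest known neighbours. As an assumption of this sketch, leading and trailing vacancies, which lack a value on one side, are left unchanged. All function names are hypothetical, and the sample series are illustrative.

```python
import math
from typing import List, Optional, Sequence

Value = Optional[float]


def is_missing(v: Value) -> bool:
    """Treat None and NaN as vacancies."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def has_vacancy(series: Sequence[Value]) -> bool:
    return any(is_missing(v) for v in series)


def fill_vacancies(series: Sequence[Value]) -> List[Value]:
    """Fill each interior vacancy by linear interpolation between the nearest
    known values on its left and right; edge vacancies are left as-is."""
    out = list(series)
    known = [i for i, v in enumerate(out) if not is_missing(v)]
    for left, right in zip(known, known[1:]):
        lv, rv = out[left], out[right]
        for i in range(left + 1, right):
            out[i] = lv + (rv - lv) * (i - left) / (right - left)
    return out


def to_target_series(initial: Sequence[Value]) -> List[Value]:
    """Return the series unchanged if it is complete, else with vacancies filled."""
    return fill_vacancies(initial) if has_vacancy(initial) else list(initial)


print(to_target_series([632, None, 612]))        # [632, 622.0, 612]
print(to_target_series([600, None, None, 630]))  # [600, 610.0, 620.0, 630]
```

Each filled value lies on the straight line between its two known neighbours. This matches the "values on both sides of the vacancy" wording and is the same idea as the pandas interpolate(method='linear') call used during preprocessing.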
The embodiment further provides a file transmission early warning device, as shown in fig. 7, the device may include:
a data obtaining module 710, configured to obtain file state data of a preset file in a current statistical period and at least one historical statistical period before the current statistical period, where the file state data includes file arrival state data;
a first data processing module 720, configured to process the file state data according to the period identifier to obtain target historical time series data;
a second data processing module 730, configured to input the target historical time series data into a preset time series prediction model to obtain future time series data of the preset file in a future preset time period, where the preset time series model is obtained by training an initial time series model with file state sample data of the preset file in historical statistical periods;
and an early warning information generating module 740, configured to generate arrival early warning information of the preset file when the future time series data meets a preset condition.
In some embodiments, the future time series data may include a future arrival time of the preset file, and the early warning information generating module may include:
a first early warning information generation submodule, configured to generate first arrival early warning information of the preset file when the difference between the future arrival time and the preset arrival time of the preset file is greater than or equal to a preset time threshold.
In some embodiments, the future time series data may further include a future arrival file size of the preset file, and the early warning information generating module may further include:
a second early warning information generation submodule, configured to generate second arrival early warning information of the preset file when the difference between the future arrival file size and the preset arrival file size of the preset file is greater than or equal to a preset size threshold.
The device provided in the above embodiments can execute the method provided in any embodiment of the present application, and has the functional modules and beneficial effects corresponding to that method. For technical details not described in the above embodiments, refer to the method provided in any embodiment of the present application.
The present embodiments also provide a computer-readable storage medium having stored therein at least one instruction, at least one program, set of codes, or set of instructions that is loaded by a processor and performs any of the methods described above in the present embodiments.
Referring to FIG. 8, the apparatus 800 may include one or more Central Processing Units (CPUs) (e.g., one or more processors) and a memory, and one or more storage media (e.g., one or more mass storage devices) for storing applications or data. The memory and storage medium may be transient or persistent storage. The program stored on the storage medium may include one or more modules (not shown), each of which may include a sequence of instruction operations for the device. Further, the central processor may be configured to communicate with the storage medium and execute the series of instruction operations in the storage medium on the device. The apparatus 800 may also include one or more power supplies, one or more wired or wireless network interfaces, one or more input-output interfaces, and/or one or more operating systems, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, and so forth. Any of the methods described above in this embodiment can be implemented on the apparatus shown in FIG. 8.
The present specification provides method steps as described in the embodiments or flowcharts, but more or fewer steps may be included based on routine or non-inventive work. The order of steps listed in the embodiments is only one of many possible execution orders and does not represent the only order of execution. When executed in an actual system or terminal product, the steps may be performed sequentially or in parallel (e.g., in a parallel-processor or multi-threaded environment) according to the embodiments or the methods shown in the figures.
The configurations shown in the present embodiment are only partial configurations related to the present application, and do not constitute a limitation on the devices to which the present application is applied, and a specific device may include more or less components than those shown, or combine some components, or have an arrangement of different components. It should be understood that the methods, apparatuses, and the like disclosed in the embodiments may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a division of one logic function, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or unit modules.
Based on such understanding, the technical solution of the present application in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A file transmission early warning method is characterized by comprising the following steps:
acquiring file state data of a preset file in a current statistical period and at least one historical statistical period before the current statistical period, wherein the file state data is associated with period identification of the corresponding statistical period and comprises file arrival state data;
processing the file state data according to the cycle identifier to obtain target historical time sequence data;
inputting the target historical time sequence data into a preset time sequence prediction model to obtain future time sequence data of the preset file in a future preset time period, wherein the preset time sequence model is obtained by utilizing file state sample data of the preset file in a historical statistical period to train an initial time sequence model;
and if the future time sequence data meet a preset condition, generating arrival early warning information of the preset file.
2. The method according to claim 1, wherein the future time sequence data includes a future arrival time of the predetermined file, and accordingly, if the future time sequence data satisfies a predetermined condition, the generating the arrival warning information of the predetermined file includes:
and if the difference value between the future arrival time and the preset arrival time of the preset file is greater than or equal to a preset time threshold value, generating first arrival early warning information of the preset file.
3. The method according to claim 1, wherein the future time series data includes a size of a future arrival file of the predetermined file, and accordingly, if the future time series data satisfies a predetermined condition, the generating the arrival warning information of the predetermined file includes:
and if the difference value between the size of the future arrival file and the preset arrival file size of the preset file is larger than or equal to a preset size threshold value, generating second arrival early warning information of the preset file.
4. The method according to claim 1, wherein the file status data further includes link transmission status data, the file status sample data includes file arrival status sample data and link transmission status sample data, and correspondingly, the processing the file status data according to the cycle identifier to obtain the target historical time series data includes:
and processing the file arrival state data and the link transmission state data according to the cycle identifier to obtain target historical time sequence data.
5. The file transmission early warning method according to claim 1, wherein the processing the file status data according to the cycle identifier to obtain the target historical time series data comprises:
processing the file state data according to the cycle identifier to obtain initial historical time sequence data;
judging whether the initial historical time sequence data contains a vacancy or not;
and if not, determining the initial historical time sequence data as target historical time sequence data.
6. The method for file transmission early warning according to claim 5, wherein the processing the file status data according to the cycle identifier to obtain the target historical time series data further comprises:
if the initial historical time sequence data contains a vacancy, estimating a vacancy value to be filled in the vacancy by utilizing numerical values distributed on two sides of the vacancy;
and filling the vacancy value into the initial historical time sequence data to obtain the target historical time sequence data.
7. A file transfer early warning device, characterized in that the device comprises:
the data acquisition module is used for acquiring file state data of a preset file in a current statistical period and at least one historical statistical period before the current statistical period, the file state data is associated with a period identifier of the corresponding statistical period, and the file state data comprises file arrival state data;
the first data processing module is used for processing the file state data according to the cycle identifier to obtain target historical time sequence data;
the second data processing module is used for inputting the target historical time sequence data into a preset time sequence prediction model to obtain future time sequence data of the preset file in a future preset time period, and the preset time sequence model is obtained by utilizing file state sample data of the preset file in a historical statistical period to train an initial time sequence model;
and the early warning information generating module is used for generating the arrival early warning information of the preset file when the future time sequence data meets a preset condition.
8. The apparatus according to claim 7, wherein the future time-series data includes a future arrival time of the predetermined file, and the warning information generating module comprises:
the first early warning information generation submodule is used for generating first arrival early warning information of the preset file when the difference value between the future arrival time and the preset arrival time of the preset file is greater than or equal to a preset time threshold value; and/or the future time series data comprise the size of a future arrival file of the preset file, and correspondingly, the early warning information generating module comprises:
and the second early warning information generation submodule is used for generating second arrival early warning information of the preset file when the difference value between the size of the future arrival file and the preset arrival file size of the preset file is greater than or equal to a preset size threshold value.
9. An apparatus comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement the file transfer warning method according to any one of claims 1 to 6.
10. A computer-readable storage medium having stored thereon at least one instruction, at least one program, a set of codes, or a set of instructions for causing a computer to perform the file transfer warning method according to any one of claims 1-6.
CN202011328957.0A 2020-11-24 2020-11-24 File transmission early warning method, device, equipment and storage medium Pending CN112463531A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011328957.0A CN112463531A (en) 2020-11-24 2020-11-24 File transmission early warning method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112463531A true CN112463531A (en) 2021-03-09

Family

ID=74799810

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination