CN116366628A - File transmission prediction method, device, equipment, storage medium and product - Google Patents
- Publication number
- CN116366628A (application No. CN202211597582.7A)
- Authority
- CN
- China
- Prior art keywords
- transmission
- preset
- target
- feature
- data
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/06—Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a file transmission prediction method, device, equipment, storage medium and product, and relates to the technical field of artificial intelligence. The method comprises: obtaining historical transmission features corresponding to a data table to be predicted, wherein the historical transmission features include a first transmission behavior feature of the preset type file corresponding to the data table to be predicted in a preset historical transmission period, a second transmission behavior feature of the preset type files corresponding to the data tables in a target preset service system, and a third transmission behavior feature of the preset type files corresponding to the data tables in a plurality of preset service systems; constructing input features according to the historical transmission features; and inputting the input features into a target preset prediction model to predict whether the preset type file corresponding to the data table to be predicted will experience a transmission delay in the current transmission period. With this technical solution, whether the preset type file corresponding to the data table to be predicted will be delayed in the current period can be accurately predicted.
Description
Technical Field
Embodiments of the invention relate to the technical field of artificial intelligence, and in particular to a file transmission prediction method, device, equipment, storage medium and product.
Background
A data warehouse (DW or DWH) is an enterprise's collection of data stores. It is subject-oriented and layered, integrates data from within the enterprise, reduces processing redundancy, and reflects historical changes.
In general, a data warehouse is docked with a plurality of upstream service systems. Each upstream service system exports tables from its database into table files and transmits them to the data warehouse, which may receive thousands of table files each day. Whether the data warehouse's batch jobs finish successfully depends on the arrival of these table files: all of them must reach the data warehouse before a specified time point for the batch jobs to finish before the desired time point. If even one file fails to arrive on time, one or more data warehouse jobs are left waiting, the batch jobs finish late, and every reporting system downstream of the data warehouse is affected.
In practice, upstream service systems often fail, for various reasons, to transmit files to the data warehouse before the specified time point. If an undelivered table file is discovered only once the specified time point has arrived, and the problem in the upstream service system is analyzed and fixed only after that, normal use of the data warehouse is seriously affected.
Disclosure of Invention
Embodiments of the invention provide a file transmission prediction method, device, equipment, storage medium and product that can accurately predict whether a file transmitted by a service system to a data warehouse will experience a transmission delay.
In a first aspect, an embodiment of the present invention provides a file transfer prediction method, including:
acquiring historical transmission features corresponding to a data table to be predicted, wherein the data table to be predicted is the data table corresponding to a preset type file that a target preset service system docked with the data warehouse transmits to the data warehouse; the data warehouse is docked with a plurality of preset service systems; and the historical transmission features include a first transmission behavior feature of the preset type file corresponding to the data table to be predicted in a preset historical transmission period, a second transmission behavior feature of the preset type files corresponding to a plurality of data tables in the target preset service system in the preset historical transmission period, and a third transmission behavior feature of the preset type files corresponding to a plurality of data tables in the plurality of preset service systems in the preset historical transmission period;
constructing input features according to the historical transmission features; and
inputting the input features into a target preset prediction model corresponding to the target preset service system, and predicting, according to the output of the target preset prediction model, whether the preset type file corresponding to the data table to be predicted will experience a transmission delay in the current transmission period.
In a second aspect, an embodiment of the present invention further provides a file transfer prediction apparatus, including:
a historical transmission feature acquisition module, configured to acquire historical transmission features corresponding to a data table to be predicted, wherein the data table to be predicted is the data table corresponding to a preset type file that a target preset service system docked with the data warehouse transmits to the data warehouse; the data warehouse is docked with a plurality of preset service systems; and the historical transmission features include a first transmission behavior feature of the preset type file corresponding to the data table to be predicted in a preset historical transmission period, a second transmission behavior feature of the preset type files corresponding to a plurality of data tables in the target preset service system in the preset historical transmission period, and a third transmission behavior feature of the preset type files corresponding to a plurality of data tables in the plurality of preset service systems in the preset historical transmission period;
an input feature construction module, configured to construct input features according to the historical transmission features; and
a transmission delay prediction module, configured to input the input features into a target preset prediction model corresponding to the target preset service system, and to predict, according to the output of the target preset prediction model, whether the preset type file corresponding to the data table to be predicted will experience a transmission delay in the current transmission period.
In a third aspect, an embodiment of the present invention further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the file transmission prediction method according to any embodiment of the present invention.
In a fourth aspect, embodiments of the present invention further provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a file transfer prediction method according to any of the embodiments of the present invention.
In a fifth aspect, embodiments of the present invention further provide a computer program product comprising a computer program which, when executed by a processor, implements a file transfer prediction method according to any of the embodiments of the present invention.
In the file transmission prediction scheme provided by the embodiments of the invention, the historical transmission features corresponding to the data table to be predicted are acquired, where the data table to be predicted is the data table corresponding to a preset type file that a target preset service system, one of the plurality of preset service systems docked with the data warehouse, transmits to the data warehouse. The historical transmission features include the first transmission behavior feature of the preset type file corresponding to the data table to be predicted, the second transmission behavior feature of the preset type files corresponding to the data tables in the target preset service system, and the third transmission behavior feature of the preset type files corresponding to the data tables in the plurality of preset service systems, all within the preset historical transmission period. Input features are constructed from the historical transmission features and fed into the target preset prediction model corresponding to the target preset service system, and whether the preset type file corresponding to the data table to be predicted will be delayed in the current transmission period is predicted from the model's output.
By adopting this technical solution, the historical transmission features of the data table to be predicted, which corresponds to a preset type file that a service system docked with the data warehouse transmits to the data warehouse, are represented comprehensively by three different types of transmission behavior features, and the preset prediction model accurately predicts whether that file will be delayed in the current period. Likely delays can then be addressed in advance according to the prediction, reducing how often delays occur so that the data warehouse's batch jobs complete more smoothly.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. The following drawings show only some embodiments of the invention and should not be regarded as limiting its scope; a person skilled in the art may derive other related drawings from them without inventive effort.
FIG. 1 is a flowchart of a file transmission prediction method according to an embodiment of the present invention;
FIG. 2 is a flowchart of another file transmission prediction method according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a file transmission prediction apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an electronic device implementing a file transmission prediction method according to an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Meanwhile, in the description of the present invention, the terms "first", "second", and the like are used only to distinguish the description, and are not to be construed as indicating or implying relative importance. The data acquisition, storage, use, processing and the like in the technical scheme meet the relevant regulations of national laws and regulations.
To facilitate understanding of the embodiments of the present invention, the related art is first described below.
A data warehouse is a collection of data stores within an enterprise that gathers and manages data from different sources and provides relevant information to downstream consumers. Data warehouses are commonly used to connect and analyze business data from heterogeneous sources. Built for data analysis and reporting, a data warehouse combines technologies and components that support the effective use of data: it is an enterprise's electronic store of large amounts of information, intended for query and analysis rather than transaction processing, and it turns data into information and delivers it to users in time. A data warehouse is subject-oriented; its data come from databases or files, are transformed according to certain rules, and are used for analysis. Data warehouses typically store historical data, and they are usually designed with a star schema that accepts data redundancy to make query analysis easier.
Data warehouses are usually built in layers, and the layering approach varies with the business. Layering reduces repeated development, shields downstream processing from anomalies in the raw data, and makes data lineage traceable. A classic layering divides the warehouse into an operational data store (Operation Data Store, ODS, also called the source-aligned layer), a data detail layer (Data Warehouse Details, DWD), a data middle layer (Data Warehouse Middle, DWM) and a data service layer (Data Warehouse Service, DWS). The ODS is the first layer of the data warehouse; data in the other layers are derived from ODS data through step-by-step processing. The data tables of a source system are typically copied as-is into the ODS layer, which serves as the data source for processing in subsequent layers. ODS data come mainly from business databases or event-tracking logs and are collected with data acquisition techniques, of which file transmission is a common one. Typically, the data warehouse has a file storage system; each upstream service system docked with the data warehouse exports a table from its database (such as Oracle or MySQL) into a file and transmits the file to the file storage system with a file transmission tool; once the files arrive, a loading script loads them into the corresponding ODS-layer tables.
Data warehouse processing is typically run in batches. A data warehouse is docked with many upstream service systems, typically tens or even hundreds. Each service system contains tens or hundreds of tables, and each table produces one file or several subfiles that are transmitted to the data warehouse, so the data warehouse receives thousands of ODS table files every day. Whether the daily batch jobs complete successfully depends on the arrival of these ODS table files: thousands of files must reach the data warehouse's file server before a specified time point for the batch jobs to finish before the desired time point. If a single file fails to arrive on time, one or more data warehouse jobs are left waiting, the batch jobs finish late, and every reporting system downstream of the data warehouse is affected.
In practice, upstream service systems often fail, for various reasons, to transmit files to the data warehouse's file server before the specified time point. Operation and maintenance staff usually respond to such delays with after-the-fact emergency handling: on finding that a table's file transmission is delayed, they urgently contact the personnel responsible for that table's upstream system to investigate; those personnel must first get to the machine room and then, through troubleshooting, shorten the delay as much as possible to limit its adverse effects. Some data warehouse administrators make simple predictions with statistics, for example counting which service systems or ODS tables have been delayed most often recently and then warning the relevant service systems to guard against delays. However, such statistics are too crude to accurately predict whether a particular file in each service system will be delayed.
In the embodiments of the invention, a prediction model predicts whether the data table to be predicted is about to experience a transmission delay. Three different types of transmission behavior features are considered so as to comprehensively represent the historical transmission features of the file to be predicted, and the model's input features are constructed from these historical transmission features, enabling accurate prediction.
FIG. 1 is a flowchart of a file transmission prediction method according to an embodiment of the present invention. This embodiment is applicable to predicting whether a file transmitted from a service system to a data warehouse will experience a transmission delay. The method may be performed by a file transmission prediction apparatus, which may be implemented in software and/or hardware and, optionally, by an electronic device. The electronic device may be a mobile terminal such as a mobile phone, smart watch, tablet computer or personal digital assistant, or a device such as a personal computer (Personal Computer, PC) or a server. As shown in FIG. 1, the method includes the following steps.
Step 101: acquire the historical transmission features corresponding to the data table to be predicted.
Illustratively, the data warehouse is docked with a plurality of preset service systems, namely the upstream service systems it connects to; a preset service system may include a database, among other things. The target preset service system is the source system of the data table currently to be predicted for transmission delay, and may be any one of the preset service systems. The preset type file may include a file generated from a data table; the data table may be, for example, an ODS table, in which case the preset type file is an ODS table file, and this case is used as the example below. The preset type file corresponding to the data table to be predicted in the current transmission period may be a file that the target preset service system is about to transmit to the data warehouse or one it is currently transmitting; that is, the prediction may be made in advance or during transmission.
For example, while a preset service system operates, the same ODS table generates a separate ODS table file in each transmission period. The ODS table can be uniquely identified by the system identifier of the preset service system (such as its number, name or ID) together with the table identifier of the ODS table (such as its table name or table ID), and so can its ODS table file in the current transmission period. For instance, preset service system A is responsible for ODS table a and generates an ODS table file for it in every transmission period; the contents of these files differ between periods, but all of them can be uniquely identified as Aa.
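The identification scheme above can be sketched as follows. This is a minimal illustration, assuming hypothetical field names (`system_id`, `table_id`, `period`) and a plain-string key format; the patent does not specify how identifiers are stored or combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OdsFileKey:
    """Hypothetical key for one ODS table file produced in one transmission period."""
    system_id: str  # identifier of the preset service system, e.g. "A"
    table_id: str   # identifier of the ODS table, e.g. "a"
    period: str     # transmission period, e.g. a business date

    def table_key(self) -> str:
        # System ID + table ID identifies the ODS table across all periods ("Aa").
        # Plain concatenation mirrors the "Aa" example; real IDs of varying
        # length would need a delimiter to avoid collisions.
        return f"{self.system_id}{self.table_id}"

    def file_key(self) -> str:
        # Adding the period distinguishes the files the table produces over time.
        return f"{self.table_key()}@{self.period}"


k1 = OdsFileKey("A", "a", "2022-12-01")
k2 = OdsFileKey("A", "a", "2022-12-02")
print(k1.table_key(), k2.table_key())  # Aa Aa
print(k1.file_key(), k2.file_key())    # Aa@2022-12-01 Aa@2022-12-02
```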
In the embodiments of the invention, to comprehensively characterize the historical transmission features of the data table to be predicted, the historical transmission features include three types of transmission behavior features. Optionally, they may be obtained from log files: the transmission behavior features may be read directly from the log data, or derived by further processing (such as calculation or statistics) of the log data.
First, the historical transmission features include a first transmission behavior feature of the preset type file corresponding to the data table to be predicted in a preset historical transmission period. The preset historical transmission period may be one or more transmission periods before the current one. Optionally, it includes several consecutive historical transmission periods, which reflects the transmission history of the file more comprehensively and accurately. The length of a single transmission period may be set according to the actual needs of the data warehouse, e.g. 1 day. The first transmission behavior feature may include, for example, at least one of the file size, the file transmission speed, whether a transmission delay occurred, and the transmission delay duration; these accurately characterize the transmission behavior of a single file and thereby improve prediction accuracy. Optionally, it may also include, for example, the maximum and minimum transmission speeds. Combining several of these items characterizes the behavior of a single file more fully and further improves prediction accuracy.
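A minimal sketch of computing the first transmission behavior feature from transmission logs. It assumes a hypothetical log record (`TransferRecord`) and assumes that per-period values are summarized over the historical periods by averages, a delay count and speed extremes; the patent lists candidate features but not how several periods are aggregated.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TransferRecord:
    """One historical transfer of a preset type file (hypothetical log schema)."""
    system_id: str
    table_id: str
    period: int        # index of the transmission period, e.g. a day number
    size_mb: float     # file size in MB
    duration_s: float  # seconds taken to transfer the file
    delay_min: float   # minutes past the deadline; 0.0 if on time

    @property
    def speed(self) -> float:
        return self.size_mb / self.duration_s  # MB/s


def first_features(records, system_id, table_id, periods):
    """Per-table features of one data table over the given historical periods."""
    rows = [r for r in records
            if r.system_id == system_id and r.table_id == table_id and r.period in periods]
    if not rows:
        raise ValueError("no history for this table in the given periods")
    speeds = [r.speed for r in rows]
    return {
        "avg_size_mb": mean(r.size_mb for r in rows),
        "avg_speed": mean(speeds),
        "max_speed": max(speeds),
        "min_speed": min(speeds),
        "delay_count": sum(1 for r in rows if r.delay_min > 0),
        "avg_delay_min": mean(r.delay_min for r in rows),
    }


records = [
    TransferRecord("A", "a", 1, 100.0, 50.0, 0.0),   # 2.0 MB/s, on time
    TransferRecord("A", "a", 2, 120.0, 40.0, 15.0),  # 3.0 MB/s, 15 min late
    TransferRecord("A", "b", 1, 30.0, 10.0, 0.0),    # another table, ignored
]
print(first_features(records, "A", "a", {1, 2}))
```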
Second, the historical transmission features include a second transmission behavior feature of the preset type files corresponding to a plurality of data tables in the target preset service system in the preset historical transmission period. A preset service system usually contains many data tables, each of which generates preset type files that are transmitted to the data warehouse, so their transmissions may affect one another. Moreover, the transmission behavior of these tables also reflects factors such as the performance of the target preset service system (e.g. server performance), the code quality of its file transmission scripts, and the network quality between it and the data warehouse, all of which may affect the transmission of the preset type file corresponding to the data table to be predicted. Including the second transmission behavior feature therefore improves prediction accuracy. Optionally, the plurality of data tables may be all data tables in the target preset service system that need to be transmitted to the data warehouse, and the second transmission behavior feature may include, for example, the average file size, average file transmission speed and average transmission delay duration.
Third, the historical transmission features include a third transmission behavior feature of the preset type files corresponding to a plurality of data tables in the plurality of preset service systems in the preset historical transmission period. Because all preset service systems transmit files to the same data warehouse, the file transmissions of different systems may also affect one another; in addition, the transmission behavior of data tables across systems reflects indicators such as network performance on the data warehouse side, which may affect the transmission of the preset type file corresponding to the data table to be predicted. Including the third transmission behavior feature therefore improves prediction accuracy. Optionally, the plurality of data tables may be all data tables, across all preset service systems docked with the data warehouse, that need to be transmitted to it, and the third transmission behavior feature may include, for example, the average file size, average file transmission speed and average transmission delay duration.
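The second and third transmission behavior features differ only in scope (the target preset service system versus every docked system), so one averaging helper can serve both. A minimal sketch, assuming a hypothetical log row layout and assuming averages are taken over all files in scope rather than first per table.

```python
from statistics import mean

# Hypothetical log rows: (system_id, table_id, period, size_mb, speed_mb_s, delay_min)
LOG = [
    ("A", "a", 1, 100.0, 2.0, 0.0),
    ("A", "a", 2, 120.0, 3.0, 15.0),
    ("A", "b", 1, 30.0, 3.0, 0.0),
    ("B", "c", 1, 400.0, 4.0, 30.0),
]


def average_features(rows, prefix):
    """Average file size, transfer speed and delay duration over the given rows."""
    return {
        f"{prefix}_avg_size_mb": mean(r[3] for r in rows),
        f"{prefix}_avg_speed": mean(r[4] for r in rows),
        f"{prefix}_avg_delay_min": mean(r[5] for r in rows),
    }


def second_features(log, system_id, periods):
    # Scope: all tables of the target preset service system in the historical periods.
    rows = [r for r in log if r[0] == system_id and r[2] in periods]
    return average_features(rows, "sys")


def third_features(log, periods):
    # Scope: all tables of all preset service systems docked with the data warehouse.
    rows = [r for r in log if r[2] in periods]
    return average_features(rows, "dw")


print(second_features(LOG, "A", {1, 2}))
print(third_features(LOG, {1, 2}))  # {'dw_avg_size_mb': 162.5, 'dw_avg_speed': 3.0, 'dw_avg_delay_min': 11.25}
```

Prefixing the keys keeps the three feature groups distinguishable once they are merged into a single historical feature record.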
Step 102: construct input features according to the historical transmission features.
For example, after the historical transmission features are obtained, they may be further processed, for instance normalized, to construct the input features fed into the prediction model.
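The patent names normalization only as an example. One common choice is min-max scaling with ranges learned from training data, sketched below under that assumption; the function names are hypothetical.

```python
def fit_min_max(rows):
    """Learn per-column minimum and maximum from training feature vectors."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def transform(row, mins, maxs):
    """Scale each feature by its training range; values inside it land in [0, 1].

    Constant columns (max == min) are mapped to 0.0 to avoid division by zero.
    """
    out = []
    for x, lo, hi in zip(row, mins, maxs):
        out.append((x - lo) / (hi - lo) if hi > lo else 0.0)
    return out


# Columns: file size (MB), transfer speed (MB/s), delay duration (min)
train = [[100.0, 2.0, 0.0], [120.0, 3.0, 15.0], [30.0, 3.0, 0.0]]
mins, maxs = fit_min_max(train)
print(transform([75.0, 2.5, 7.5], mins, maxs))  # [0.5, 0.5, 0.5]
```

Reusing the training ranges at prediction time keeps the model's inputs on the same scale it was trained on.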
Step 103: input the input features into the target preset prediction model corresponding to the target preset service system, and predict, according to the model's output, whether the preset type file corresponding to the data table to be predicted will experience a transmission delay in the current transmission period.
For example, a preset prediction model may be trained in advance for each preset service system and stored in association with that system's identifier. When a prediction is needed for the data table to be predicted, the system identifier (such as the system number) of the target preset service system is obtained, the corresponding trained target preset prediction model is found and loaded, and the model's prediction interface is called with the constructed input features to predict whether the preset type file corresponding to the data table to be predicted will be delayed in the current transmission period.
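The per-system model lookup can be sketched as a registry keyed by system identifier. This is an illustration only: `ModelRegistry` is hypothetical, and any callable returning a delay probability stands in for a trained model here.

```python
from typing import Callable, Dict, List

# A "model" is any callable mapping an input feature vector to a delay probability.
Model = Callable[[List[float]], float]


class ModelRegistry:
    """Hypothetical store pairing each preset service system ID with its trained model."""

    def __init__(self) -> None:
        self._models: Dict[str, Model] = {}

    def register(self, system_id: str, model: Model) -> None:
        self._models[system_id] = model

    def predict(self, system_id: str, features: List[float]) -> float:
        # Find the target preset prediction model by system ID, then call it.
        try:
            model = self._models[system_id]
        except KeyError:
            raise LookupError(f"no trained model for system {system_id!r}") from None
        return model(features)


registry = ModelRegistry()
registry.register("A", lambda x: 0.8 if x[0] > 0.5 else 0.1)  # stand-in models
registry.register("B", lambda x: 0.3)
print(registry.predict("A", [0.9, 0.2]))  # 0.8
```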
Optionally, the preset prediction model may be a machine learning model, for example a logistic regression model or a neural network model, chosen according to actual needs.
For example, the output of the target preset prediction model may be the probability that the preset type file corresponding to the data table to be predicted will be delayed in the current transmission period. This probability is compared with a preset threshold (e.g. 0.5): if it is greater than or equal to the threshold, a transmission delay is predicted; otherwise, no delay is predicted.
Optionally, the method may further include: when a transmission delay is predicted, performing a transmission delay reminder operation to notify the relevant personnel to handle it.
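For the logistic regression case, the probability output and the threshold rule can be sketched as below. The weights and bias are hypothetical placeholders for trained parameters; the 0.5 threshold is the example value from the text.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def delay_probability(x, weights, bias):
    """Logistic regression output: sigmoid of a weighted sum of the input features."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)


def will_delay(prob, threshold=0.5):
    """Predict a transmission delay when the probability reaches the preset threshold."""
    return prob >= threshold


weights, bias = [2.0, -1.0], -0.5                 # hypothetical trained parameters
p = delay_probability([1.0, 0.5], weights, bias)  # z = -0.5 + 2.0 - 0.5 = 1.0
print(round(p, 3), will_delay(p))                 # 0.731 True
```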
In summary, the file transmission prediction method of this embodiment acquires the historical transmission features of the data table to be predicted, namely the first (per-table), second (target-system-level) and third (warehouse-level) transmission behavior features within the preset historical transmission period; constructs input features from them; and feeds the input features into the target preset prediction model corresponding to the target preset service system to predict, from the model's output, whether the preset type file corresponding to the data table to be predicted will be delayed in the current transmission period.
Because three different types of transmission behavior features jointly represent the table's transmission history, the prediction is accurate, likely delays can be handled in advance, fewer delays occur, and the data warehouse's batch jobs complete more smoothly.
In some embodiments, the target preset prediction model comprises a model trained on the basis of a logistic regression model, and constructing the input features from the historical transmission features comprises: selecting at least two target features from the historical transmission features; computing the at least two target features in a preset calculation manner to obtain extended features; and constructing the input features from the historical transmission features and the extended features. This raises the feature dimension, mitigates underfitting of the linear model, and improves model accuracy and prediction precision.
Illustratively, logistic regression is essentially a linear model; in the embodiments of the present invention it is used for a classification problem, whereas linear regression addresses regression problems. On a low-dimensional data set a linear model often underfits, and expanding the data set with polynomial features can alleviate this to some extent. If the training data are not linearly separable, the trained model underfits and its classification accuracy is low. In the embodiment of the invention, selecting at least two target features from the historical transmission features and transforming them maps the low-dimensional feature space into a higher-dimensional one, which increases the chance that data that are not linearly separable in the low-dimensional space become linearly separable in the higher-dimensional space.
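The effect of feature expansion on linear separability can be seen on a toy XOR-shaped data set. The sketch below is an illustrative assumption, not the embodiment's model: it trains a plain logistic regression by full-batch gradient descent, first on two raw features and then with their product appended as an extended feature. The helper names (`train_logreg`, `accuracy`) are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=1.0, epochs=5000):
    """Full-batch gradient descent on the mean cross-entropy loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def accuracy(X, y, w, b):
    preds = [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) > 0.5 else 0 for xi in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


# XOR-shaped toy data: no straight line separates the two classes.
X_raw = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0, 1, 1, 0]

w, b = train_logreg(X_raw, y)
acc_linear = accuracy(X_raw, y, w, b)

# Append the product x1 * x2 as an extended feature; the data become linearly separable.
X_ext = [[x1, x2, x1 * x2] for x1, x2 in X_raw]
w, b = train_logreg(X_ext, y)
acc_extended = accuracy(X_ext, y, w, b)

print(f"raw features: {acc_linear:.2f}, with product feature: {acc_extended:.2f}")
```

No linear classifier can fit all four raw points, while a single product feature is enough to make the classes separable.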
In some embodiments, selecting at least two target features from the historical transmission features includes: selecting at least one target feature from each of two of the first, second and third transmission behavior features, to obtain at least one target feature pair. Computing the at least two target features in a preset calculation manner to obtain extended features then includes: for each target feature pair, computing the two target features of the current pair in the preset calculation manner to obtain the extended feature corresponding to that pair. Because the target features are drawn from different types of transmission behavior features, each extended feature carries information from more than one type, which further improves the accuracy of model prediction.
For example, the number of extended features may be one or more, and each extended feature may be calculated from a pair of two target features. For example, the first transmission behavior feature includes b, c and d, the second transmission behavior feature includes e, f and g, the third transmission behavior feature includes h, i and j, b and e may be selected as a target feature pair, an expansion feature may be calculated according to b and e, g and j may be selected as a target feature pair, an expansion feature may be calculated according to g and j, and the like. The preset calculation mode is not limited, and may be, for example, calculating a sum, calculating an average, calculating a product, or the like.
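The pairing rule above can be sketched as follows, assuming features are held in plain dictionaries keyed by group and name. The group labels, the feature values for `b` to `j` and the `extend_features` helper are illustrative only; the operation table covers the sum, average and product modes the text lists as examples.

```python
from typing import Callable, Dict, List, Tuple

# Candidate "preset calculation modes" mentioned in the text (illustrative).
PRESET_OPS: Dict[str, Callable[[float, float], float]] = {
    "sum": lambda a, b: a + b,
    "mean": lambda a, b: (a + b) / 2,
    "product": lambda a, b: a * b,
}

FeatureRef = Tuple[str, str]  # (group name, feature name)


def extend_features(
    groups: Dict[str, Dict[str, float]],
    pairs: List[Tuple[FeatureRef, FeatureRef]],
    op: str = "product",
) -> Dict[str, float]:
    """Compute one extended feature per target feature pair.

    Each pair must take its two features from different behavior groups.
    """
    fn = PRESET_OPS[op]
    extended = {}
    for (group_a, name_a), (group_b, name_b) in pairs:
        if group_a == group_b:
            raise ValueError("a target feature pair must span two different groups")
        extended[f"{name_a}_{op}_{name_b}"] = fn(groups[group_a][name_a], groups[group_b][name_b])
    return extended


groups = {
    "first": {"b": 2.0, "c": 1.0, "d": 5.0},
    "second": {"e": 3.0, "f": 0.5, "g": 4.0},
    "third": {"h": 1.5, "i": 2.5, "j": 0.5},
}
pairs = [(("first", "b"), ("second", "e")), (("second", "g"), ("third", "j"))]
print(extend_features(groups, pairs))         # product mode
print(extend_features(groups, pairs, "sum"))  # sum mode
```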
In some embodiments, the selecting at least one target feature from two of the first transmission behavior feature, the second transmission behavior feature and the third transmission behavior feature, respectively, to obtain at least one target feature pair includes: and respectively selecting at least one target feature from the second transmission behavior feature and the third transmission behavior feature to obtain at least one target feature pair. The method has the advantages that the information quantity contained in the expansion feature can be improved, and the accuracy of model prediction is further improved.
In some embodiments, the second transmission behavior feature comprises a first average transmission delay duration and/or a first transmission delay rate; the third transmission behavior feature comprises a second average transmission delay duration and/or a second transmission delay rate. The advantage of this is that the second transmission behavior feature as well as the third transmission behavior feature are determined reasonably accurately.
The first average transmission delay duration may be understood as the average of the transmission delay durations of the preset type files corresponding to a plurality of data tables in the target preset service system over the preset historical transmission period (which may be several consecutive historical transmission periods, or a single period among them); the average may be taken over the whole window or separately for each historical transmission period. The first transmission delay rate may be understood as the ratio of the number of delayed transmissions to the total number of transmissions of the preset type files corresponding to a plurality of data tables in the target preset service system within the preset historical transmission period. The second average transmission delay duration may be understood as the average of the transmission delay durations of the preset type files corresponding to a plurality of data tables in the plurality of preset service systems within the preset historical transmission period. The second transmission delay rate may be understood as the ratio of the number of delayed transmissions to the total number of transmissions of the preset type files corresponding to a plurality of data tables in the plurality of preset service systems within the preset historical transmission period.
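A minimal sketch of these group-level statistics, assuming each transfer record already carries its system, period, a signed delay duration (expected completion time minus actual end time, so a negative value means the transfer was late) and a 0/1 delay flag. Averages are computed per historical transmission period, which is one of the options allowed here. The record layout and `group_delay_stats` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def group_delay_stats(records: List[dict], system: Optional[str] = None) -> Dict[str, Tuple[float, float]]:
    """Return {period: (average delay duration, delay rate)}.

    With `system` set, only that system's tables are pooled (second feature);
    with `system=None`, tables of all systems are pooled (third feature).
    """
    totals = defaultdict(lambda: [0.0, 0, 0])  # period -> [sum of durations, delayed count, transfer count]
    for r in records:
        if system is not None and r["system"] != system:
            continue
        acc = totals[r["period"]]
        acc[0] += r["delay_duration"]
        acc[1] += r["delayed"]
        acc[2] += 1
    return {p: (dur / n, late / n) for p, (dur, late, n) in totals.items()}


records = [
    {"system": "S1", "table": "t1", "period": "T-1", "delay_duration": -10.0, "delayed": 1},
    {"system": "S1", "table": "t2", "period": "T-1", "delay_duration": 30.0, "delayed": 0},
    {"system": "S2", "table": "t3", "period": "T-1", "delay_duration": -20.0, "delayed": 1},
    {"system": "S2", "table": "t4", "period": "T-1", "delay_duration": -40.0, "delayed": 1},
]
print(group_delay_stats(records, system="S1"))  # first average delay duration / first delay rate
print(group_delay_stats(records))               # second average delay duration / second delay rate
```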
In some embodiments, the target feature pair includes a first average transmission delay duration and a second average transmission delay duration, and/or a first transmission delay rate and a second transmission delay rate. The method has the advantages that the target characteristics can be reasonably selected, and the model prediction accuracy is improved.
In some embodiments, for each of the at least one target feature pair, calculating, based on a preset calculation manner, two target features in a current target feature pair to obtain an extended feature corresponding to the current target feature pair, including: and calculating the product of two target features in the current target feature pair aiming at each target feature pair in the at least one target feature pair to obtain the corresponding expansion feature of the current target feature pair. The method has the advantages that the expansion characteristic can be obtained through rapid calculation, the model operation speed is improved under the condition that the model prediction accuracy is guaranteed, namely, the prediction efficiency is improved, and whether the transmission delay of the preset type file which is required to be transmitted currently or is being transmitted can be predicted more timely.
For example, if the target feature pair includes the first average transmission delay duration and the second average transmission delay duration, the corresponding extended feature may be the product of the two durations; if the target feature pair includes the first transmission delay rate and the second transmission delay rate, the corresponding extended feature may be the product of the two rates.
Fig. 2 is a flowchart of another file transmission prediction method according to an embodiment of the present invention. This embodiment refines the alternative embodiments above: the historical transmission features and the extended features are normalized to obtain features to be filled, and these are filled into the corresponding fields of a preset data structure to obtain the input features, which can improve model prediction accuracy.
As shown in fig. 2, the method may include:
The historical transmission characteristics comprise first transmission behavior characteristics of a preset type file corresponding to a data table to be predicted in each historical transmission period of a plurality of continuous historical transmission periods, second transmission behavior characteristics of a preset type file corresponding to a plurality of data tables in a target preset service system in each historical transmission period of the plurality of continuous historical transmission periods, and third transmission behavior characteristics of a preset type file corresponding to a plurality of data tables in the plurality of preset service systems in each historical transmission period of the plurality of continuous historical transmission periods. The first transmission behavior characteristic comprises a file size, a file transmission speed, whether transmission delay exists or not and a transmission delay time length; the second transmission behavior feature includes a first average transmission speed, a first average transmission delay duration, and a first transmission delay rate; the third transmission behavior feature includes a second average transmission speed, a second average transmission delay duration, and a second transmission delay rate.
Illustratively, the data for the past 7 days (7 consecutive historical transmission periods) related to the data table to be predicted are read from the file transmission log and processed. The log data may include, for example, the upstream system number (the number of the target preset service system), the ODS table name, the expected file transfer completion time, the file size, the file transfer start time and the file transfer end time. The upstream system number and the ODS table name together identify a unique table. The expected completion time is the deadline by which the file should be transferred to the data warehouse. The file size is the storage space the file occupies. The start and end times mark when the transfer of the file began and finished. Some ODS table files are transferred to the data warehouse as several subfiles each day; the subfiles of the same ODS table file may first be merged, taking the sum of the subfile sizes as the file size, the earliest start time as the transfer start time and the latest end time as the transfer end time. Optionally, the data are processed with Spark SQL by calling the Spark batch processing interface to load and process the merged log data. The file transfer duration is the end time minus the start time, and the file transfer speed is the file size divided by the transfer duration. All records of the target preset service system are retrieved by the upstream system number, and the average file transfer speed over all ODS tables of that system is calculated as the first average transmission speed.
In addition, the average file transfer speed over all ODS tables of all preset service systems is calculated for each day as the second average transmission speed. For each record (or each merged record, in the multi-subfile case), the file transfer end time is subtracted from the expected completion time to give the transmission delay duration. If the delay duration is greater than 0, the transfer finished ahead of the deadline and the delay flag is 0; if it is less than 0, the transfer finished late and the delay flag is 1. The average delay duration over all ODS tables in the target preset service system gives the first average transmission delay duration, and the proportion of delayed transfers among those tables gives the first transmission delay rate. Likewise, the daily average delay duration over all ODS tables gives the second average transmission delay duration, and the daily proportion of delayed transfers over all ODS tables gives the second transmission delay rate.
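The log-processing steps above can be sketched in plain Python; the text suggests Spark SQL for production, and this sketch does not use Spark. It assumes each log row has hypothetical keys `system`, `table`, `deadline` (expected completion time), `size`, `start` and `end`, merges subfiles that share system, table and deadline, and derives duration, speed, signed delay duration and the delay flag. A delay duration of exactly 0 is treated as on time here, which the text leaves unspecified.

```python
from datetime import datetime
from typing import Dict, List


def merge_subfiles(rows: List[dict]) -> List[dict]:
    """Merge subfiles of one ODS table file: sum sizes, earliest start, latest end."""
    merged: Dict[tuple, dict] = {}
    for r in rows:
        key = (r["system"], r["table"], r["deadline"])
        if key not in merged:
            merged[key] = dict(r)
        else:
            m = merged[key]
            m["size"] += r["size"]
            m["start"] = min(m["start"], r["start"])
            m["end"] = max(m["end"], r["end"])
    return list(merged.values())


def transfer_features(row: dict) -> dict:
    duration = (row["end"] - row["start"]).total_seconds()
    # Speed = size / duration (e.g. MB per second when size is in MB).
    speed = row["size"] / duration if duration > 0 else float("inf")
    # Signed delay duration: deadline minus actual end; negative means late.
    delay = (row["deadline"] - row["end"]).total_seconds()
    return {
        "duration_s": duration,
        "speed": speed,
        "delay_duration_s": delay,
        "delayed": 1 if delay < 0 else 0,
    }


day = datetime(2022, 12, 1)  # hypothetical date
rows = [
    {"system": "S01", "table": "ods_acct", "deadline": day.replace(hour=2), "size": 100.0,
     "start": day.replace(hour=1), "end": day.replace(hour=1, minute=20)},
    {"system": "S01", "table": "ods_acct", "deadline": day.replace(hour=2), "size": 200.0,
     "start": day.replace(hour=1, minute=10), "end": day.replace(hour=1, minute=30)},
]
merged = merge_subfiles(rows)
print(merged[0]["size"], transfer_features(merged[0]))
```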
Illustratively, the target feature pair includes a first average transmission delay duration and a second average transmission delay duration, and a first transmission delay rate and a second transmission delay rate.
Illustratively, the extended features are: the product of the first average transmission delay duration and the second average transmission delay duration, and the product of the first transmission delay rate and the second transmission delay rate.
Step 204: Normalize the historical transmission features and the extended features to obtain the features to be filled.
Illustratively, normalization scales the data into the [0, 1] or [-1, 1] interval so that the distributions of the different dimensions are similar, preventing the model parameters from being dominated by features with a particularly large or small range. Machine learning works by iteratively minimizing a loss function; when features are on very different scales, the gradient direction during gradient descent deviates from the direction toward the minimum and the updates follow a zigzag path, whereas after normalization the contours of the loss function become closer to circular, which benefits gradient descent. Normalization therefore speeds up gradient descent and loss convergence, removes the effect of differing units, facilitates comprehensive evaluation across indicators and can improve classification accuracy. Because normalization is applied during model training, the same normalization must also be applied to the data in the model application stage.
The embodiment of the invention may adopt min-max normalization, with the formula:
$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$
where x' is the normalized feature value, x is the current feature value (a historical transmission feature or an extended feature), and min(x) and max(x) are the minimum and maximum values of that feature.
Min-max normalization linearly transforms the original data into the range [0, 1], scaling it proportionally. By using the minimum and maximum values of each variable to map the raw data into a fixed range, it removes the effects of units and magnitude and thus resolves the problem of features measured on different scales, changing the weight each variable carries in the analysis.
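A minimal sketch of min-max normalization that is fitted on training data and then reused unchanged at prediction time, as the text requires. Rows are plain lists of feature values (historical and extended features side by side); the function names and sample values are hypothetical. A constant column is mapped to 0 to avoid division by zero, which is an assumption not stated in the text.

```python
from typing import List, Tuple


def fit_min_max(rows: List[List[float]]) -> List[Tuple[float, float]]:
    """Record (min, max) for every feature column of the training rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def transform_min_max(rows: List[List[float]], stats: List[Tuple[float, float]]) -> List[List[float]]:
    """Apply x' = (x - min) / (max - min) column by column."""
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, (lo, hi) in zip(row, stats)]
        for row in rows
    ]


# Columns: file size (MB), transfer speed (MB/s), extended feature (illustrative values).
train = [[100.0, 2.0, 0.04], [300.0, 6.0, 0.16], [200.0, 4.0, 0.10]]
stats = fit_min_max(train)
print(transform_min_max(train, stats))
# At prediction time, reuse the training statistics rather than refitting.
print(transform_min_max([[250.0, 5.0, 0.07]], stats))
```

Values outside the training range map outside [0, 1]; whether to clip them is a design choice the text does not address.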
For example, the preset data structure may be set according to actual requirements, for example, related fields of the same historical transmission period may be set adjacently, or the same feature index may be set sequentially according to the order of the historical transmission periods. Assuming that the present transmission period is T, the predetermined history transmission periods may be denoted as T-1, T-2, T-3, T-4, T-5, T-6, and T-7, respectively. The input data and output result formats may be represented in table 1 below:
Table 1 input data and output result formats
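As a hypothetical illustration of the two layouts described for the preset data structure (not the exact format of Table 1), the sketch below flattens per-period features for periods T-1 to T-7 into one input vector, either listing each feature across the periods in order (feature-major) or placing all fields of one period next to each other (period-major). Feature names are placeholders.

```python
from typing import Dict, List


def build_input(
    per_period: Dict[int, Dict[str, float]],
    features: List[str],
    n_periods: int = 7,
    period_major: bool = False,
) -> List[float]:
    """Flatten per-period features into a fixed-order vector.

    per_period maps lag k (period T-k, k = 1..n_periods) to that period's features.
    """
    lags = range(1, n_periods + 1)
    if period_major:
        # All fields of T-1, then all fields of T-2, ...
        return [per_period[k][f] for k in lags for f in features]
    # One feature for T-1..T-n in order, then the next feature, ...
    return [per_period[k][f] for f in features for k in lags]


per_period = {1: {"speed": 1.0, "delayed": 0.0}, 2: {"speed": 3.0, "delayed": 1.0}}
print(build_input(per_period, ["speed", "delayed"], n_periods=2))
print(build_input(per_period, ["speed", "delayed"], n_periods=2, period_major=True))
```

Whichever layout is chosen, training and prediction must use the same field order.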
Optionally, the target preset prediction model is obtained through training in the following manner:
1) Obtain sample historical transmission features and sample labels corresponding to the data table to be predicted to form sample data, where the sample historical transmission features comprise a first sample transmission behavior feature of the preset type file corresponding to the data table to be predicted in a sample preset historical transmission period, a second sample transmission behavior feature of the preset type files corresponding to a plurality of data tables in the target preset service system in the sample preset historical transmission period, and a third sample transmission behavior feature of the preset type files corresponding to a plurality of data tables in the plurality of preset service systems in the sample preset historical transmission period, and the sample label indicates whether the preset type file corresponding to the data table to be predicted was delayed in the sample transmission period.
In the embodiment of the invention, a model may be trained for each preset service system; the training of the target preset prediction model corresponding to the target preset service system is described as an example. For instance, taking log data from the past year, each ODS table yields one piece of training data for every day of that year, i.e. each ODS table can correspond to 365 pieces of training data. The sample historical transmission features of the data table to be predicted are determined from the log data: the first, second and third sample transmission behavior features correspond to the first, second and third transmission behavior features described above and are determined in the same way, which is not repeated here. The data of the current day together with the data of the past 7 days form one complete training sample. Denoting the current day as T, the day-T data contain only the field indicating whether the file transmission was delayed, and this field serves as the machine learning label, i.e. the sample label.
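The sliding-window sample construction can be sketched as below, assuming one table's daily records are stored oldest first and each day's dict has a `delayed` field. Day T provides only the label, and days T-1 to T-7 provide the history. With N days of log data this yields N - 7 samples per table, so 365 samples per table presumes the log reaches 7 days further back. The function and field names are hypothetical.

```python
from typing import Dict, List, Tuple


def make_samples(daily: List[dict], window: int = 7) -> List[Tuple[Dict[int, dict], int]]:
    """Return (history, label) pairs, with history keyed by lag k for day T-k."""
    samples = []
    for t in range(window, len(daily)):
        history = {k: daily[t - k] for k in range(1, window + 1)}
        label = daily[t]["delayed"]  # day T contributes only the label
        samples.append((history, label))
    return samples


daily = [{"day": d, "delayed": d % 2, "speed": 1.0 + d} for d in range(10)]
samples = make_samples(daily)
print(len(samples), samples[0][1], samples[0][0][1]["day"])
```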
2) Construct sample input features from the sample historical transmission features.
The sample input features are constructed in a similar manner to the input features described above, and will not be described in detail herein.
3) Input the sample input features into the to-be-trained prediction model corresponding to the target preset service system, determine a loss relation from the output of the to-be-trained prediction model and the sample label, and train the to-be-trained prediction model based on the loss relation.
For example, the loss relation may be computed with a preset loss function. The preset loss function may be the sum of a cross-entropy loss function and the product of a regularization coefficient λ and an L2 regularization term, i.e. $J(w) + \lambda L_2$.
The cross-entropy loss function J(w) is:
$$J(w) = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log p(x_i) + (1 - y_i)\log\left(1 - p(x_i)\right)\right]$$
where w denotes the model parameters, n the number of samples, y_i the sample label of the i-th sample, x_i the input features of the i-th sample, and p(x_i) the output value (predicted delay probability) of the model for x_i.
The L2 regularization term is:
$$L_2 = \|w\|_2^2 = \sum_{j} w_j^2$$
Adding L2 regularization generally shrinks the weight of every dimension, reducing each weight by a fixed proportion at every update, which smooths the weights.
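The regularized loss $J(w) + \lambda L_2$ can be evaluated directly. This is a sketch that assumes a single bias term that is not regularized (the text does not say how a bias is treated) and $L_2 = \sum_j w_j^2$; the function name is hypothetical.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def regularized_loss(X: List[List[float]], y: List[int], w: List[float], b: float, lam: float) -> float:
    """Mean cross-entropy J(w) plus lam * sum(w_j ** 2); the bias b is not penalized."""
    eps = 1e-12  # keeps log() away from 0
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
        p = min(max(p, eps), 1.0 - eps)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return -total / len(y) + lam * sum(wj * wj for wj in w)


# With all-zero parameters every prediction is 0.5, so J(w) = log 2.
print(regularized_loss([[0.0], [1.0]], [0, 1], [0.0], 0.0, lam=0.1))
```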
Illustratively, stochastic gradient descent (a first-order method) is used during training to search for the optimal solution: the descent direction is obtained from the first derivative of the loss function with respect to w, and the model parameters are updated iteratively to obtain the trained target preset prediction model.
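A minimal stochastic gradient descent sketch for L2-regularized logistic regression, consistent with the loss $J(w) + \lambda L_2$ under the assumptions of an unregularized bias and $L_2 = \sum_j w_j^2$. Each step uses one shuffled sample and the per-sample gradient $(p - y)x + 2\lambda w$. The learning rate, λ, epoch count and toy data are illustrative, not values from the text.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sgd_logreg_l2(X, y, lr=0.1, lam=0.01, epochs=1000, seed=0):
    """Per-sample SGD on mean cross-entropy + lam * sum(w_j ** 2)."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b) - y[i]
            # Gradient of CE_i + lam * sum(w_j^2): err * x_j + 2 * lam * w_j; bias gets err only.
            w = [wj - lr * (err * xj + 2 * lam * wj) for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def predict(X, w, b, threshold=0.5):
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) > threshold else 0 for xi in X]


# Normalized one-feature toy data (e.g. a scaled delay rate); labels are delay flags.
X = [[0.0], [0.1], [0.2], [0.8], [0.9], [1.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = sgd_logreg_l2(X, y)
print(w, b, predict(X, w, b))
```

The `2 * lam * wj` term is what produces the proportional shrinkage of every weight at each update.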
With the file transmission prediction method provided by the embodiment of the invention, extended features are generated on top of the comprehensive historical transmission features of the data table to be predicted to raise the feature dimension, and both the historical transmission features and the extended features are normalized. This mitigates underfitting of the logistic-regression-based linear prediction model, removes the effects of units and magnitude, and effectively improves model accuracy and prediction precision. It helps relevant users (such as data warehouse administrators) predict quickly and accurately whether an ODS table is likely to be delayed, so that they learn of file transmission risks in advance and can notify the personnel of the relevant service system about high-risk ODS tables beforehand; those personnel can then troubleshoot faults or prepare contingency plans in advance. Predicting the risk in this way reduces the occurrence of delays and allows the batch jobs of the data warehouse to complete more smoothly.
Optionally, the method may further include scoring the data table to be predicted according to the output of the target preset prediction model. For example, the output is a probability value that is converted into a corresponding score: the lower the score, the better the file transmission performance of the data table to be predicted, and the higher the score, the worse. The corresponding preset service system or its administrator may further be scored from multiple such scores, so that each preset service system and the data warehouse can be continuously improved.
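One possible conversion from the model's delay probability to a score, sketched under the assumption of a simple linear mapping to 0-100 (the text does not specify the mapping), with a system-level score taken as the mean of its tables' scores. Higher scores mean worse expected transmission performance. The function names and table names are hypothetical.

```python
from statistics import mean
from typing import Dict


def table_score(delay_prob: float, scale: int = 100) -> int:
    """Map a delay probability in [0, 1] to an integer score; higher means riskier."""
    if not 0.0 <= delay_prob <= 1.0:
        raise ValueError("delay probability must lie in [0, 1]")
    return round(delay_prob * scale)


def system_score(table_probs: Dict[str, float]) -> float:
    """Aggregate the per-table scores of one service system into a system score."""
    return mean(table_score(p) for p in table_probs.values())


probs = {"ods_acct": 0.87, "ods_txn": 0.13, "ods_cust": 0.5}
print({t: table_score(p) for t, p in probs.items()}, system_score(probs))
```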
Fig. 3 is a schematic structural diagram of a file transfer prediction apparatus according to an embodiment of the present invention, where, as shown in fig. 3, the apparatus includes:
the historical transmission feature obtaining module 301 is configured to obtain historical transmission features corresponding to a data table to be predicted, where the data table to be predicted is the data table corresponding to a preset type file transmitted to a data warehouse by a target preset service system docked with the data warehouse, the data warehouse is docked with a plurality of preset service systems, and the historical transmission features include a first transmission behavior feature of the preset type file corresponding to the data table to be predicted in a preset historical transmission period, a second transmission behavior feature of the preset type files corresponding to a plurality of data tables in the target preset service system in the preset historical transmission period, and a third transmission behavior feature of the preset type files corresponding to a plurality of data tables in the plurality of preset service systems in the preset historical transmission period;
an input feature construction module 302, configured to construct an input feature according to the historical transmission feature;
and the transmission delay prediction module 303 is configured to input the input feature into a target preset prediction model corresponding to the target preset service system, and predict, according to an output result of the target preset prediction model, whether transmission delay will occur in a preset type file corresponding to the data table to be predicted in the present transmission period.
With the file transmission prediction device provided by the embodiment of the invention, for the preset type files corresponding to the data tables that a business system docked with the data warehouse transmits to the data warehouse, three different types of transmission behavior features jointly characterize the transmission history of the data table to be predicted, and a preset prediction model is used to predict accurately whether the preset type file corresponding to that data table will be delayed in the present period. Possible transmission delays can thus be addressed in advance based on the prediction result, reducing delays so that the batch jobs of the data warehouse complete more smoothly.
Optionally, the target preset prediction model comprises a model obtained based on logistic regression model training;
wherein, the input feature construction module includes:
a target feature selection unit for selecting at least two target features from the historical transmission features;
the extended feature calculation unit is used for calculating the at least two target features based on a preset calculation mode to obtain extended features;
and the input feature construction unit is used for constructing input features according to the historical transmission features and the expansion features.
Optionally, the target feature selecting unit is specifically configured to: select at least one target feature from each of two of the first transmission behavior feature, the second transmission behavior feature and the third transmission behavior feature, to obtain at least one target feature pair;
the extended feature calculation unit is specifically configured to: and calculating two target features in the current target feature pair based on a preset calculation mode aiming at each target feature pair in the at least one target feature pair to obtain an extension feature corresponding to the current target feature pair.
Optionally, the target feature selection unit is specifically configured to:
and respectively selecting at least one target feature from the second transmission behavior feature and the third transmission behavior feature to obtain at least one target feature pair.
Optionally, the second transmission behavior feature includes a first average transmission delay duration and/or a first transmission delay rate; the third transmission behavior feature comprises a second average transmission delay duration and/or a second transmission delay rate.
Optionally, the target feature pair includes a first average transmission delay duration and a second average transmission delay duration, and/or a first transmission delay rate and a second transmission delay rate.
Optionally, the extended feature calculation unit is specifically configured to:
and calculating the product of two target features in the current target feature pair aiming at each target feature pair in the at least one target feature pair to obtain the corresponding expansion feature of the current target feature pair.
Optionally, the first transmission behavior feature includes at least one of a file size, a file transmission speed, a flag indicating whether a transmission delay occurred, and a transmission delay duration.
Optionally, the preset historical transmission period includes a plurality of continuous historical transmission periods.
Optionally, the input feature construction unit includes:
the normalization processing subunit is used for carrying out normalization processing on the historical transmission characteristics and the expansion characteristics to obtain characteristics to be filled;
and the feature filling subunit is used for filling the features to be filled into corresponding fields in a preset data structure to obtain input features.
Optionally, the target preset prediction model is obtained through training in the following manner:
obtaining sample data by acquiring sample historical transmission features and sample labels corresponding to the data table to be predicted, wherein the sample historical transmission features comprise a first sample transmission behavior feature of the preset type file corresponding to the data table to be predicted in a sample preset historical transmission period, a second sample transmission behavior feature of the preset type files corresponding to a plurality of data tables in the target preset service system in the sample preset historical transmission period, and a third sample transmission behavior feature of the preset type files corresponding to a plurality of data tables in the plurality of preset service systems in the sample preset historical transmission period, and the sample label indicates whether the preset type file corresponding to the data table to be predicted was delayed in the sample transmission period;
Constructing sample input features according to the sample historical transmission features;
and inputting the sample input characteristics into a to-be-trained prediction model corresponding to the target preset service system, determining a loss relation by utilizing an output result of the to-be-trained prediction model and the sample label, and training the to-be-trained prediction model based on the loss relation.
The file transmission prediction device provided by the embodiment of the invention can execute the file transmission prediction method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Fig. 4 shows a schematic diagram of the structure of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 4, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the respective methods and processes described above, such as the file transfer prediction method.
In some embodiments, the file transfer prediction method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the file transfer prediction method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the file transfer prediction method in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor. The programmable processor may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer-readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer-readable storage medium may be a machine-readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the Internet.
The computing system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system; it overcomes the difficult management and poor service scalability of traditional physical hosts and virtual private server (VPS) services.
Embodiments of the present invention also provide a computer program product comprising a computer program which, when executed by a processor, implements a file transfer prediction method as provided by any of the embodiments of the present application.
In implementing the computer program product, the computer program code for carrying out operations of the present invention may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The file transmission prediction device, the storage medium and the product provided in the above embodiments can execute the file transmission prediction method provided in any embodiment of the present application, and have the corresponding functional modules and beneficial effects of executing the method. Technical details not described in detail in the above embodiments may be found in the file transfer prediction method provided in any embodiment of the present application.
Note that the above describes only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions can be made without departing from the scope of the invention. Therefore, although the invention has been described in detail through the above embodiments, it is not limited to them and may include many other equivalent embodiments without departing from the spirit of the invention; its scope is determined by the appended claims.
Claims (15)
1. A file transfer prediction method, comprising:
acquiring historical transmission features corresponding to a data table to be predicted, wherein the data table to be predicted is the data table corresponding to a preset type file transmitted to a data warehouse by a target preset service system connected to the data warehouse, the data warehouse is connected to a plurality of preset service systems, and the historical transmission features comprise a first transmission behavior feature of the preset type file corresponding to the data table to be predicted in a preset historical transmission period, a second transmission behavior feature of the preset type files corresponding to a plurality of data tables in the target preset service system in the preset historical transmission period, and a third transmission behavior feature of the preset type files corresponding to a plurality of data tables in the plurality of preset service systems in the preset historical transmission period;
constructing input features according to the historical transmission features;
and inputting the input features into a target preset prediction model corresponding to the target preset service system, and predicting, according to the output result of the target preset prediction model, whether the preset type file corresponding to the data table to be predicted will be delayed in the transmission period.
2. The method according to claim 1, wherein the target preset prediction model comprises a model trained based on a logistic regression model;
wherein said constructing an input feature from said historical transmission features comprises:
selecting at least two target features from the historical transmission features;
calculating the at least two target features based on a preset calculation mode to obtain extension features;
and constructing the input features according to the historical transmission features and the extension features.
3. The method of claim 2, wherein said selecting at least two target features from said historical transmission features comprises:
selecting at least one target feature from each of two of the first transmission behavior feature, the second transmission behavior feature and the third transmission behavior feature, to obtain at least one target feature pair;
and wherein said calculating the at least two target features based on a preset calculation mode to obtain extension features comprises:
for each target feature pair of the at least one target feature pair, calculating the two target features of the current target feature pair based on the preset calculation mode to obtain the extension feature corresponding to the current target feature pair.
4. The method according to claim 3, wherein said selecting at least one target feature from each of two of the first transmission behavior feature, the second transmission behavior feature and the third transmission behavior feature, to obtain at least one target feature pair, comprises:
selecting at least one target feature from each of the second transmission behavior feature and the third transmission behavior feature, to obtain the at least one target feature pair.
5. The method according to claim 4, wherein the second transmission behavior characteristic comprises a first average transmission delay duration and/or a first transmission delay rate; the third transmission behavior feature comprises a second average transmission delay duration and/or a second transmission delay rate.
6. The method of claim 5, wherein the target feature pair comprises a first average transmission delay duration and a second average transmission delay duration, and/or a first transmission delay rate and a second transmission delay rate.
7. The method according to any one of claims 3-6, wherein said calculating, for each of the at least one target feature pair, the two target features of the current target feature pair based on the preset calculation mode to obtain the extension feature corresponding to the current target feature pair comprises:
for each target feature pair of the at least one target feature pair, calculating the product of the two target features of the current target feature pair to obtain the extension feature corresponding to the current target feature pair.
8. The method of claim 1, wherein the first transmission behavior characteristic comprises at least one of a file size, a file transmission speed, a transmission delay, and a transmission delay duration.
9. The method of claim 1, wherein the preset historical transmission period comprises a plurality of consecutive historical transmission periods.
10. The method of claim 2, wherein said constructing the input features according to the historical transmission features and the extension features comprises:
normalizing the historical transmission features and the extension features to obtain features to be filled;
and filling the features to be filled into the corresponding fields of a preset data structure to obtain the input features.
11. The method according to claim 1, wherein the target preset predictive model is trained by:
obtaining sample data by acquiring sample historical transmission features and a sample label corresponding to the data table to be predicted, wherein the sample historical transmission features comprise a first sample transmission behavior feature of the preset type file corresponding to the data table to be predicted in a sample preset historical transmission period, a second sample transmission behavior feature of the preset type files corresponding to a plurality of data tables in the target preset service system in the sample preset historical transmission period, and a third sample transmission behavior feature of the preset type files corresponding to a plurality of data tables in the plurality of preset service systems in the sample preset historical transmission period, and the sample label indicates whether the preset type file corresponding to the data table to be predicted was delayed in the sample transmission period;
constructing sample input features according to the sample historical transmission features;
and inputting the sample input characteristics into a to-be-trained prediction model corresponding to the target preset service system, determining a loss relation by utilizing an output result of the to-be-trained prediction model and the sample label, and training the to-be-trained prediction model based on the loss relation.
12. A file transfer prediction apparatus, comprising:
the historical transmission feature acquisition module is configured to acquire historical transmission features corresponding to a data table to be predicted, wherein the data table to be predicted is the data table corresponding to a preset type file transmitted to a data warehouse by a target preset service system connected to the data warehouse, the data warehouse is connected to a plurality of preset service systems, and the historical transmission features comprise a first transmission behavior feature of the preset type file corresponding to the data table to be predicted in a preset historical transmission period, a second transmission behavior feature of the preset type files corresponding to a plurality of data tables in the target preset service system in the preset historical transmission period, and a third transmission behavior feature of the preset type files corresponding to a plurality of data tables in the plurality of preset service systems in the preset historical transmission period;
the input feature construction module is configured to construct input features according to the historical transmission features;
and the transmission delay prediction module is configured to input the input features into a target preset prediction model corresponding to the target preset service system, and to predict, according to the output result of the target preset prediction model, whether the preset type file corresponding to the data table to be predicted will be delayed in the transmission period.
13. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable by the processor, wherein the processor implements the method of any one of claims 1-11 when executing the computer program.
14. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any one of claims 1-11.
15. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-11.
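For illustration only, the feature pipeline of claims 1, 2, 5-7, 9 and 10 and the training step of claim 11 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation disclosed in this application. Everything concrete here is hypothetical: the record schema (`TransferRecord`), taking the `window` periods immediately before the current period as the preset historical transmission period, the definitions of delay rate and average delay duration, the omission of file size and transfer speed (claim 8), min-max normalization with caller-supplied bounds, an alphabetical field order standing in for the preset data structure, and training by batch gradient descent on binary cross-entropy.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class TransferRecord:
    """One historical file transfer (hypothetical schema)."""
    system: str        # preset service system that sent the file
    table: str         # data warehouse table the file feeds
    period: int        # transmission period index, e.g. a day number
    delay_min: float   # minutes late; 0.0 means on time


def behavior(records):
    """Delay rate and average delay duration (on-time transfers count as 0)."""
    if not records:
        return {"delay_rate": 0.0, "avg_delay_min": 0.0}
    return {"delay_rate": sum(r.delay_min > 0 for r in records) / len(records),
            "avg_delay_min": mean(r.delay_min for r in records)}


def historical_features(history, system, table, period, window):
    """First/second/third behavior features over the `window` periods before `period`."""
    recent = [r for r in history if period - window <= r.period < period]
    first = behavior([r for r in recent if (r.system, r.table) == (system, table)])
    second = behavior([r for r in recent if r.system == system])  # target system
    third = behavior(recent)                                       # all systems
    return first, second, third


# Target feature pairs: (system-level feature, warehouse-level feature).
PAIRS = (("avg_delay_min", "avg_delay_min"), ("delay_rate", "delay_rate"))


def build_input(first, second, third, bounds):
    """Add product extension features, min-max normalize, fill a fixed field order."""
    raw = {}
    for prefix, feats in (("t", first), ("s", second), ("w", third)):
        raw.update({f"{prefix}_{k}": v for k, v in feats.items()})
    for a, b in PAIRS:
        raw[f"x_{a}_{b}"] = second[a] * third[b]
    vector = []
    for name in sorted(raw):  # sorted names stand in for the preset data structure
        lo, hi = bounds.get(name, (0.0, 1.0))
        span = hi - lo
        vector.append(min(max((raw[name] - lo) / span, 0.0), 1.0) if span > 0 else 0.0)
    return vector


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, weights, bias):
    """Logistic regression: estimated probability that the file will be delayed."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train(samples, labels, lr=0.5, epochs=1000):
    """Batch gradient descent on mean binary cross-entropy (label 1 = delayed)."""
    weights, bias, n = [0.0] * len(samples[0]), 0.0, len(samples)
    for _ in range(epochs):
        errs = [predict_proba(x, weights, bias) - y for x, y in zip(samples, labels)]
        weights = [w - lr * sum(e * x[j] for e, x in zip(errs, samples)) / n
                   for j, w in enumerate(weights)]
        bias -= lr * sum(errs) / n
    return weights, bias
```

A prediction for the current period would then compare `predict_proba(build_input(...), weights, bias)` against a threshold such as 0.5; that threshold is likewise an assumption. The product features of claim 7 matter for this model family because the log-odds of a logistic regression model is a linear function of its inputs, so an interaction between system-level and warehouse-level delay behavior can only be captured if it is supplied explicitly as an extra input.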
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211597582.7A CN116366628A | 2022-12-12 | 2022-12-12 | File transmission prediction method, device, equipment, storage medium and product |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211597582.7A CN116366628A | 2022-12-12 | 2022-12-12 | File transmission prediction method, device, equipment, storage medium and product |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116366628A | 2023-06-30 |
Family ID: 86939507
Family Applications (1)
Application Number | Status | Publication Number | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202211597582.7A | Pending | CN116366628A | 2022-12-12 | 2022-12-12 | File transmission prediction method, device, equipment, storage medium and product |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116366628A |
- 2022-12-12 CN CN202211597582.7A (CN116366628A), active, pending
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |