CN114610590A

CN114610590A - Method, apparatus, device and storage medium for determining job running duration

Info

Publication number: CN114610590A
Application number: CN202210234459.2A
Authority: CN
Inventors: 蒙权; 林金泉; 马天
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2022-03-10
Filing date: 2022-03-10
Publication date: 2022-06-10

Abstract

Embodiments of the present application provide a method, apparatus, device and storage medium for determining a job running duration, applicable at least to the technical field of artificial intelligence. The method includes: acquiring historical data of a target job in a preset historical time period; determining feature information belonging to a preset type from the historical data; performing data statistics on the feature information to obtain statistical feature information; performing distributed gradient enhancement analysis on the feature information and the statistical feature information, based on the historical running duration in the historical data, to obtain a correspondence between feature information and running duration; and determining, based on the correspondence, the job running duration of a job to be analyzed under target feature information. In this way, the influence of different feature information on the job running duration can be analyzed accurately and efficiently, improving both analysis efficiency and accuracy.

Description

Method, apparatus, device and storage medium for determining job running duration

Technical Field

The embodiment of the application relates to the technical field of artificial intelligence, and relates to but is not limited to a method, a device and equipment for determining operation running time and a computer readable storage medium.

Background

A job is the collection of work that a user asks a computer system to perform in the course of solving a problem or handling a transaction. The running duration of a job is affected by various factors, such as the number of machines in the cluster running the job, the submission time of the job, and the running time of the job.

At present, the influence of different factors on the job running duration is usually analyzed statistically: the number of machines in each time period is adjusted manually, and the effect of the adjusted machine count on the job running duration is observed. In other words, the values of the influencing factors must be adjusted by hand, data must be collected comprehensively, and the influencing factors must be controlled and adjusted at every time of every day before the analysis can be carried out.

Clearly, with the manual analysis method of the related art, the data volume is too large and the analysis is difficult, so the accuracy of analyzing the job running duration under the influence of different factors is low.

Disclosure of Invention

The embodiment of the application provides a method, a device, equipment and a storage medium for determining operation running time, which are at least applied to the technical field of artificial intelligence and can obtain an accurate corresponding relation between characteristic information and the operation running time, so that the influence of different characteristic information on the operation running time can be accurately and efficiently analyzed based on the corresponding relation, and the analysis efficiency and accuracy are improved.

The technical scheme of the embodiment of the application is realized as follows:

the embodiment of the application provides a method for determining operation duration, which comprises the following steps:

acquiring historical data of a target job in a preset historical time period; determining feature information belonging to a preset type from the historical data; performing data statistics on the characteristic information to obtain statistical characteristic information; performing distributed gradient enhancement analysis on the characteristic information and the statistical characteristic information based on historical operating time in the historical data to obtain a corresponding relation between the characteristic information and the operating time; and determining the operation duration of the operation to be analyzed under the target characteristic information based on the corresponding relation.

An embodiment of the present application provides an apparatus for determining a job running duration, the apparatus including:

the acquisition module is used for acquiring historical data of the target job in a preset historical time period; the first determining module is used for determining the characteristic information belonging to a preset type from the historical data; the data statistics module is used for carrying out data statistics on the characteristic information to obtain statistical characteristic information; the analysis module is used for carrying out distributed gradient enhancement analysis on the characteristic information and the statistical characteristic information based on historical operating duration in the historical data to obtain the corresponding relation between the characteristic information and the operating duration; and the second determining module is used for determining the operation running time of the operation to be analyzed under the target characteristic information based on the corresponding relation.

An embodiment of the present application provides a device for determining a running time of a job, including:

a memory for storing executable instructions; and the processor is used for realizing the method for determining the operation running time length when executing the executable instructions stored in the memory.

The embodiment of the application provides a computer program product or a computer program, wherein the computer program product or the computer program comprises executable instructions, and the executable instructions are stored in a computer readable storage medium; the processor of the job running time length determining device reads the executable instruction from the computer readable storage medium and executes the executable instruction, so that the job running time length determining method is realized.

The embodiment of the application provides a computer-readable storage medium, which stores executable instructions and is used for causing a processor to execute the executable instructions to realize the method for determining the operation running time length.

The embodiment of the application has the following beneficial effects:

the characteristic information belonging to the preset type is extracted from the acquired historical data of the target operation in the preset historical time period, and the statistical characteristic information is obtained through statistics, so that the distributed gradient enhancement analysis is performed on the characteristic information and the statistical characteristic information based on the historical operating time in the historical data, the corresponding relation between the characteristic information and the operating time can be accurately obtained, the influence of different characteristic factors on the operating time can be accurately and efficiently analyzed based on the corresponding relation, and the analysis efficiency and accuracy are improved.

Drawings

Fig. 1 is an overall frame diagram of determining a job run length in the related art;

fig. 2 is an alternative architecture diagram of a job run-time length determining system according to an embodiment of the present application;

fig. 3 is a schematic structural diagram of a job running time length determination device according to an embodiment of the present application;

fig. 4 is an alternative flowchart of a job running time length determining method according to an embodiment of the present application;

fig. 5 is a schematic flow chart of another alternative job running time length determining method provided in the embodiment of the present application;

fig. 6 is a schematic flowchart of yet another alternative job running time length determining method according to an embodiment of the present application;

fig. 7 is a flowchart of a method for determining a job running time length according to an embodiment of the present application;

fig. 8 is a schematic diagram of an implementation process of extracting feature information from historical data corresponding to a cluster according to an embodiment of the present application;

fig. 9 is a schematic diagram of a prediction implementation process of a model provided in an embodiment of the present application.

Detailed Description

In order to make the purpose, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the accompanying drawings, the described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without making creative efforts fall within the protection scope of the present application.

In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the embodiments of the present application belong. The terminology used in the embodiments of the present application is for the purpose of describing the embodiments of the present application only and is not intended to be limiting of the present application.

Before explaining the method of determining the job running time length of the embodiment of the present application, a method in the related art is first explained.

In the related art, engineers usually take a statistical approach: they manually adjust the number of machines in the cluster in each time period and observe how the machine count affects the job running duration. Once a large number of mappings between machine count and job running duration have been collected, the machine count is adjusted according to the collected data, and the job running duration is then determined. Fig. 1 is an overall framework diagram for determining the job running duration in the related art. As shown in Fig. 1, N different jobs (job 1, job 2, ..., job N) are run on a cluster 101 at different times, where the original cluster 101 includes n machines, and the running duration 102 of each job is finally obtained.

For example, at a certain time t1, the number of machines in cluster 101 is manually adjusted to n-1, and the running duration of each job is recorded and checked against an acceptable range. If it is within the range, the step is repeated and the number of machines in cluster 101 is reduced further, until the running duration of a job becomes unacceptable. The number of machines that can be removed at time t1 is thus obtained by statistical analysis. By repeating this process, the number of machines that can be removed at different times of the day can be obtained.
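The manual related-art procedure above amounts to a simple search loop. The following is a minimal sketch only: `measure_duration` is a hypothetical callback standing in for actually running the jobs on a cluster of a given size, and `acceptable_limit` is a hypothetical threshold; neither name comes from the application.

```python
from typing import Callable


def machines_removable(n: int, measure_duration: Callable[[int], float],
                       acceptable_limit: float) -> int:
    """Return how many machines can be removed before jobs become too slow.

    Mirrors the manual procedure: shrink the cluster one machine at a time
    and stop once the observed running duration is no longer acceptable.
    """
    removed = 0
    while n - removed > 1:
        candidate = n - removed - 1
        if measure_duration(candidate) > acceptable_limit:
            break  # running duration is unacceptable; stop shrinking
        removed += 1
    return removed


# Toy stand-in for a real cluster: duration is inversely proportional to size.
def toy_duration(machines: int) -> float:
    return 600.0 / machines


removable = machines_removable(10, toy_duration, acceptable_limit=100.0)
```

Every call to `measure_duration` corresponds to a real change to the production cluster, which is the cost the following paragraphs object to.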

However, the method in the related art has the following disadvantages. First, data must be collected comprehensively in a live system: at every time of every day, variables must be controlled while the number of machines is adjusted in order to analyze its influence on the job running duration. The amount of data is too large and the analysis is difficult. Second, the cluster needs to run stably in daily operation, and continually changing the number of machines just to collect data can seriously affect the business and even the company's revenue, so the cost outweighs the benefit. Third, the method can only summarize historical patterns and cannot take interference at the current time into account; for example, if the number of jobs at the current time is much larger than the number at the same time in the past, accurate analysis is impossible.

In view of the related-art method and its problems, the embodiment of the present application provides a method for determining a job running duration that uses artificial intelligence learning. Only part of the historical data needs to be sampled and collected, rather than all of it, and the historical data is not analyzed manually but learned automatically by artificial intelligence. Because only part of the data is sampled, the impact on the business is greatly reduced. In addition, the method addresses interference at the current time by adding, to the features of a job, the state of the other jobs at the moment the job is submitted to the machines.

In the method for determining the operation duration of the job, firstly, historical data of a target job in a preset historical time period is obtained; determining feature information belonging to a preset type from historical data; performing data statistics on the characteristic information to obtain statistical characteristic information; then, based on the historical operation duration in the historical data, performing distributed gradient enhancement analysis on the characteristic information and the statistical characteristic information to obtain a corresponding relation between the characteristic information and the operation duration; and finally, determining the operation running time of the operation to be analyzed under the target characteristic information based on the corresponding relation. Therefore, the distributed gradient enhancement analysis is carried out on the characteristic information and the statistical characteristic information, so that the corresponding relation between the characteristic information and the operation time length can be accurately obtained, the influence of different characteristic factors on the operation time length can be accurately and efficiently analyzed based on the corresponding relation, and the analysis efficiency and accuracy are improved.

An exemplary application of the job running time length determining apparatus according to the embodiment of the present application is described below, and the job running time length determining apparatus according to the embodiment of the present application may be implemented as a terminal or a server. In one implementation manner, the device for determining the operation duration provided by the embodiment of the present application may be implemented as any terminal having operation requirements and data processing functions, such as a notebook computer, a tablet computer, a desktop computer, a mobile device (e.g., a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, a portable game device), an intelligent robot, an intelligent household appliance, and an intelligent vehicle-mounted device; in another implementation manner, the device for determining the operation running time provided in this embodiment may also be implemented as a server, where the server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), and a big data and artificial intelligence platform. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the embodiment of the present application is not limited. Next, an exemplary application when the job run-time determining apparatus is implemented as a server will be described.

Referring to Fig. 2, Fig. 2 is an optional architecture diagram of the job running duration determining system according to an embodiment of the present application. To support normal running of any job and to accurately predict the job running duration of a job under target feature information, at least an operating system is installed on the terminal; in the operating system, a job is an execution unit that a computer operator (or a program called a job scheduler) hands to the operating system. In the embodiment of the present application, the job running duration determining system 10 includes at least a terminal 100, a network 200 and a server 300, where the server 300 constitutes the job running duration determining device of the embodiment. The terminal 100 is connected to the server 300 through the network 200, which may be a wide area network, a local area network, or a combination of the two. While a job is running, the terminal 100 can collect and store historical data of the run.

When the job running duration of a job to be analyzed under target feature information needs to be determined, the server 300 may acquire, via the network 200, historical data of a target job in a preset historical time period sent by the terminal 100, where the target job and the job to be analyzed may be the same job or different jobs. After acquiring the historical data, the server 300 determines feature information belonging to a preset type from the historical data; performs data statistics on the feature information to obtain statistical feature information; performs distributed gradient enhancement analysis on the feature information and the statistical feature information, based on the historical running duration in the historical data, to obtain a correspondence between feature information and running duration; and determines, based on the correspondence, the job running duration of the job to be analyzed under the target feature information. After obtaining the job running duration, the server 300 may compile it into a job running duration analysis table and send the table to the terminal 100 through the network 200, or may return the job running duration to the terminal 100 through the network 200 as the response to a job running duration analysis request.

The method for determining the operation running time provided by the embodiment of the application can be further implemented based on a cloud platform and through a cloud technology, for example, the server 300 may be a cloud server, the characteristic information belonging to a preset type is determined from historical data through the cloud server, or data statistics is performed on the characteristic information through the cloud server to obtain statistical characteristic information, or distributed gradient enhancement analysis is performed on the characteristic information and the statistical characteristic information through the cloud server based on the historical running time in the historical data to obtain a corresponding relation between the characteristic information and the running time, or the operation running time of the operation to be analyzed under the target characteristic information is determined through the cloud server based on the corresponding relation.

In some embodiments, cloud storage may also be provided, and the correspondence between feature information and running duration, or the job running duration of the job to be analyzed under the target feature information, may be stored in it. In this way, when the job running duration of the same job under the target feature information is analyzed again later, it can be read directly from the cloud storage; and when the job running duration of the job under other feature information is analyzed later, the correspondence can be read from the cloud storage and the job running duration under the other feature information determined from it.
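The reuse pattern described above can be sketched as a small cache. This is an illustration under stated assumptions: an in-memory dictionary stands in for the cloud storage, and the class name `DurationCache`, the `predict` callable (representing the stored correspondence) and the feature name `machines` are all hypothetical.

```python
from typing import Callable, Dict, Tuple

FeatureKey = Tuple[str, Tuple[Tuple[str, float], ...]]


class DurationCache:
    """In-memory stand-in for the cloud storage (an assumption of this sketch)."""

    def __init__(self, predict: Callable[[Dict[str, float]], float]) -> None:
        self._predict = predict  # the stored correspondence (hypothetical)
        self._results: Dict[FeatureKey, float] = {}
        self.computed = 0  # how many times the correspondence was evaluated

    def duration(self, job_id: str, features: Dict[str, float]) -> float:
        key = (job_id, tuple(sorted(features.items())))
        if key not in self._results:
            # Not analyzed before: apply the correspondence and store the result.
            self._results[key] = self._predict(features)
            self.computed += 1
        return self._results[key]


cache = DurationCache(lambda f: 600.0 / f["machines"])
first = cache.duration("job-1", {"machines": 6.0})
again = cache.duration("job-1", {"machines": 6.0})  # served from storage
other = cache.duration("job-1", {"machines": 4.0})  # new features, recomputed
```

The second request for the same job and the same target feature information is answered from storage, while a request under other feature information reuses only the stored correspondence.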

It should be noted that cloud technology refers to a hosting technology that unifies a series of resources such as hardware, software and networks within a wide area network or a local area network to implement the computation, storage, processing and sharing of data. Cloud technology is a general term for the network technology, information technology, integration technology, management platform technology, application technology and the like applied under the cloud computing business model; these can form a resource pool that is used on demand, flexibly and conveniently. Cloud computing technology will become an important support for it. Background services of technical network systems, such as video websites, picture websites and web portals, require large amounts of computing and storage resources. As the internet industry develops, each item may come to have its own identifier that must be transmitted to a background system for logical processing; data at different levels will be processed separately, and all kinds of industry data require strong back-end system support, which can only be provided through cloud computing.

The method for determining the operation running duration further relates to the technical field of artificial intelligence, and distributed gradient enhancement analysis is performed on the characteristic information and the statistical characteristic information through the artificial intelligence technology, namely the composite tree model can be obtained through artificial intelligence technology training. Or the operation running time of the operation to be analyzed under the target characteristic information can be predicted by adopting artificial intelligence, namely the operation running time of the operation to be analyzed under the target characteristic information is predicted by adopting the trained composite tree model. The implementation process of training the composite tree model and the process of performing distributed gradient enhancement analysis on the feature information and the statistical feature information will be described in detail below.

Fig. 3 is a schematic structural diagram of a job operation time length determination device according to an embodiment of the present application, and the job operation time length determination device shown in fig. 3 includes: at least one processor 310, memory 350, at least one network interface 320, and a user interface 330. The various components in the job run length determination device are coupled together by a bus system 340. It will be appreciated that the bus system 340 is used to enable connected communication between these components. The bus system 340 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 340 in fig. 3.

The Processor 310 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.

The user interface 330 includes one or more output devices 331 that enable presentation of media content, and one or more input devices 332.

The memory 350 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 350 optionally includes one or more storage devices physically located remote from processor 310. The memory 350 may include volatile memory or nonvolatile memory, or both. The nonvolatile memory may be a Read Only Memory (ROM), and the volatile memory may be a Random Access Memory (RAM). The memory 350 described in the embodiments herein is intended to comprise any suitable type of memory. In some embodiments, memory 350 is capable of storing data to support various operations; examples of such data include programs, modules and data structures, or subsets or supersets thereof, as exemplified below.

An operating system 351 including system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;

a network communication module 352 for communicating with other computing devices via one or more (wired or wireless) network interfaces 320, exemplary network interfaces 320 including Bluetooth, Wireless Fidelity (WiFi), Universal Serial Bus (USB), and the like;

an input processing module 353 for detecting one or more user inputs or interactions from one of the one or more input devices 332 and translating the detected inputs or interactions.

In some embodiments, the apparatus provided by the embodiments of the present application may be implemented in software. Fig. 3 shows a job running duration determining apparatus 354 stored in the memory 350; it is the apparatus within the job running duration determining device and may be software in the form of programs, plug-ins and the like, including the following software modules: an obtaining module 3541, a first determining module 3542, a data statistics module 3543, an analysis module 3544 and a second determining module 3545. These modules are logical, and may therefore be combined arbitrarily or split further depending on the functions implemented. The functions of the respective modules are explained below.
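One way to picture the five logical modules is as interchangeable stages of a pipeline. The sketch below is an assumption about how such modules could be wired together, not the structure used in the application; the class `JobDurationPipeline`, its field names and the toy stages are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Correspondence = Callable[[Record], float]


@dataclass
class JobDurationPipeline:
    """Chains five stages that mirror modules 3541 to 3545."""
    acquire: Callable[[], List[Record]]                       # obtaining module 3541
    select_features: Callable[[List[Record]], List[Record]]   # first determining module 3542
    summarize: Callable[[List[Record]], Record]               # data statistics module 3543
    analyze: Callable[[List[Record], Record, List[float]], Correspondence]  # analysis module 3544

    def run(self, target_features: Record) -> float:
        history = self.acquire()
        features = self.select_features(history)
        stats = self.summarize(features)
        durations = [r["duration"] for r in history]
        correspondence = self.analyze(features, stats, durations)
        # Second determining module 3545: apply the correspondence to the target.
        return correspondence(target_features)


def toy_analyze(features: List[Record], stats: Record,
                durations: List[float]) -> Correspondence:
    # Toy rule: total work is constant, so duration scales with 1 / machines.
    work = sum(f["machines"] * d for f, d in zip(features, durations)) / len(durations)
    return lambda target: work / target["machines"]


pipeline = JobDurationPipeline(
    acquire=lambda: [{"machines": 4, "duration": 150.0},
                     {"machines": 6, "duration": 100.0}],
    select_features=lambda hist: [{"machines": r["machines"]} for r in hist],
    summarize=lambda feats: {"machines_avg": sum(f["machines"] for f in feats) / len(feats)},
    analyze=toy_analyze,
)
estimate = pipeline.run({"machines": 3})
```

Because each stage is just a callable, stages can be merged or split without changing `run`, which matches the remark that the modules are logical.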

In other embodiments, the apparatus provided in this embodiment may be implemented in hardware, and for example, the apparatus provided in this embodiment may be a processor in the form of a hardware decoding processor, which is programmed to execute the job running time determination method provided in this embodiment, for example, the processor in the form of the hardware decoding processor may be implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.

The method for determining the operation duration provided by each embodiment of the present application may be executed by an operation duration determining device, where the operation duration determining device may be any terminal having operation requirements and a data processing function, or may also be a server, that is, the method for determining the operation duration of the operation according to each embodiment of the present application may be executed by a terminal, may also be executed by a server, or may also be executed by interaction between a terminal and a server.

Referring to fig. 4, fig. 4 is an optional flowchart of a method for determining a job running time length according to an embodiment of the present application, which will be described below with reference to steps shown in fig. 4, and it should be noted that the method for determining a job running time length in fig. 4 is described by taking a server as an execution subject as an example.

In step S401, history data of the target job in a preset history time period is acquired.

The target job may be any execution unit in the terminal's operating system, that is, any job that can be executed in the operating system. Here a job is the set of work that a computer user asks the operating system to do in the course of one computation. A job comprises a group of commands, each command is a job step, and a job step is each relatively independent piece of work within the job.

A job consists of an ordered series of job steps. From start to finish, a job passes through four stages: job submission, job accommodation, job execution and job completion. Executing a job may run multiple different processes. In the embodiment of the present application, the job running duration to be determined is the duration of the job execution stage.

The method for determining the job running duration in the embodiment of the application can also be applied to a batch processing system, in which the job is the basic unit that occupies memory. That is, a batch system loads programs and data into memory for execution one job at a time.

In some embodiments, the target job is a job that has already run in the operating system and has historical running data, and the historical running data of the target job in a preset historical time period may be collected to obtain the historical data. The preset historical time period may be any time period before the current time, and may be, for example, the previous week, the previous day, the previous hour, or the like.
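Selecting the historical data of a preset historical time period can be sketched as a simple time-window filter. The record layout, including the field names `submitted_at` and `duration_s`, is a hypothetical example and not taken from the application.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List

Record = Dict[str, Any]


def history_in_window(records: List[Record], now: datetime,
                      window: timedelta) -> List[Record]:
    """Keep records whose submission time lies in [now - window, now)."""
    start = now - window
    return [r for r in records if start <= r["submitted_at"] < now]


now = datetime(2022, 3, 10, 12, 0)
records = [
    {"submitted_at": datetime(2022, 3, 1, 9, 0), "duration_s": 420.0},
    {"submitted_at": datetime(2022, 3, 9, 13, 0), "duration_s": 380.0},
    {"submitted_at": datetime(2022, 3, 10, 11, 30), "duration_s": 455.0},
]
last_day = history_in_window(records, now, timedelta(days=1))
last_hour = history_in_window(records, now, timedelta(hours=1))
```

Changing `window` switches between the previous week, the previous day or the previous hour without touching the rest of the processing.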

It should be noted that the target job and the job to be analyzed, for which the job running time length needs to be predicted, in the embodiment of the present application may be the same job, or may be different jobs, or may be similar jobs. When the target job and the job to be analyzed are the same job, the embodiments of the present application do not strictly distinguish the target job from the job to be analyzed, and it is considered that the job indicated in any embodiment of the present application may be either the target job or the job to be analyzed.

In step S402, feature information belonging to a preset type is determined from the history data.

Here, the preset type may be a data type preset manually, for example the target job's own attribute features, the submission time of the target job, the features of the cluster that runs the target job, and the features of other jobs during the running time of the target job.

In some embodiments, the preset type is not restricted: any feature information that characterizes the running process of the target job, or that distinguishes the running process of the target job from that of other jobs, may be treated as feature information belonging to the preset type.

In the embodiment of the application, a plurality of pieces of feature information belonging to preset types can be determined from historical data. The feature information belonging to the preset type constitutes a feature of the target job, and each feature information includes at least one feature data.
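Determining feature information by preset type can be illustrated as grouping the fields of one historical record. The four preset types follow the examples given in the text; the concrete field names (`input_size_gb`, `submit_hour`, `machine_count`, `concurrent_jobs`, `cores_in_use`) are hypothetical placeholders.

```python
from typing import Any, Dict, List

# Hypothetical mapping from each preset type to the record fields it covers.
PRESET_TYPES: Dict[str, List[str]] = {
    "job_attributes": ["input_size_gb"],
    "submission_time": ["submit_hour"],
    "cluster": ["machine_count"],
    "other_jobs": ["concurrent_jobs", "cores_in_use"],
}


def extract_features(record: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Group the fields of one historical record by preset type.

    Fields that belong to no preset type (for example the owner or the
    measured duration) are left out of the feature information.
    """
    return {
        preset: {name: record[name] for name in fields if name in record}
        for preset, fields in PRESET_TYPES.items()
    }


record = {
    "owner": "team-a", "input_size_gb": 12.5, "submit_hour": 3,
    "machine_count": 8, "concurrent_jobs": 5, "cores_in_use": 96,
    "duration_s": 410.0,
}
features = extract_features(record)
```

Each entry of `features` is one piece of feature information, and each value inside it is one item of feature data.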

Step S403, performing data statistics on the feature information to obtain statistical feature information.

Here, the data statistics means calculating a numerical value having statistical characteristics such as an average value, a maximum value, and a minimum value of feature data belonging to the same type in the feature information. For example, the average, maximum, and minimum values of the number of other jobs over a period of time may be counted, or the average, maximum, and minimum values of the number of machine cores that have been used by other jobs over a period of time may be counted.

In the embodiment of the present application, the statistical characteristic information constitutes a statistical characteristic of the target job, and the statistical characteristic information includes at least one statistical value.

Step S404, based on the historical operation duration in the historical data, performing distributed gradient enhancement analysis on the characteristic information and the statistical characteristic information to obtain the corresponding relation between the characteristic information and the operation duration.

In the embodiment of the application, the historical data of the target job in the preset historical time period further includes the historical running duration of the target job under the feature information. It should be noted that the running duration of the target job differs when the feature information differs. Multiple pieces of historical data of the target job in the preset historical time period may therefore be collected, each with its own set of feature information belonging to the preset type. The distributed gradient enhancement analysis can be performed cyclically on each piece of historical data in turn, so as to obtain the final correspondence between the feature information and the running duration.

In the embodiment of the application, when the distributed gradient enhancement analysis is performed on the feature information and the statistical feature information, the historical running duration in the historical data serves as the basis for judging the analysis result, that is, for verifying the correspondence obtained by the analysis. Specifically, the distributed gradient enhancement analysis is performed on the feature information and the statistical feature information to obtain a correspondence between the feature information and the running duration, and the currently obtained correspondence is verified against the historical running duration in the historical data. If the verification shows that the currently obtained correspondence is accurate, the cyclic analysis based on the historical data is stopped; if the verification shows that it is not accurate, the cyclic analysis is performed again based on another piece of historical data.

The distributed gradient enhancement analysis uses a plurality of distributed analysis units to judge, one by one, whether the feature information and the statistical feature information meet preset conditions, so that each analysis unit obtains an analysis result; the analysis results of the analysis units are then superimposed to enhance the accuracy of the analysis and obtain a final result. For example, the distributed gradient enhancement analysis may be implemented with a composite tree model that includes a plurality of decision trees, where each decision tree constitutes one analysis unit. Each decision tree judges whether at least one piece of feature data in the feature information and the statistical feature information meets a preset condition, so as to split the root node of the decision tree into two leaf nodes. A weight value is obtained for each leaf node during splitting and is taken as the analysis result of that leaf node; that is, the running duration corresponding to the target job at a leaf node can be determined according to the weight value of that leaf node.

It should be noted that the composite tree model grows by continuously adding decision trees and continuously performing feature splitting. Each added decision tree in effect learns a new function that fits the residual of the previous prediction. Which decision tree processes which feature data in the composite tree model may be randomly assigned. The decision trees are ordered: if the previous decision tree does not predict well, the next decision tree is adjusted according to the prediction result of the previous one so as to improve on it.
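The residual-fitting idea described above can be illustrated with a minimal sketch. It assumes that the analysis corresponds to ordinary gradient boosting with a squared-error objective, uses one-split trees (stumps) on a single feature, and relies on hypothetical sample data and names (`machines`, `durations`, `boost`); it is not the implementation of the embodiment.

```python
def fit_stump(xs, residuals):
    """Find the threshold split on one feature that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # drop the largest value so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lw, rw = sum(left) / len(left), sum(right) / len(right)  # leaf weight values
        err = sum((r - lw) ** 2 for r in left) + sum((r - rw) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, threshold, lw, rw)
    _, threshold, lw, rw = best
    return threshold, lw, rw


def predict(base, trees, x, lr):
    """Base value plus the shrunken leaf weight contributed by every tree."""
    return base + sum(lr * (lw if x <= t else rw) for t, lw, rw in trees)


def boost(xs, ys, n_trees=20, lr=0.5):
    """Add trees one at a time; each new tree fits the residual of the current prediction."""
    base = sum(ys) / len(ys)
    trees = []
    for _ in range(n_trees):
        residuals = [y - predict(base, trees, x, lr) for x, y in zip(xs, ys)]
        trees.append(fit_stump(xs, residuals))
    return base, trees


def mse(model, xs, ys, lr=0.5):
    base, trees = model
    return sum((y - predict(base, trees, x, lr)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical history: number of machines in the cluster vs. running duration in seconds.
machines = [10, 20, 30, 40]
durations = [120.0, 70.0, 50.0, 40.0]
model = boost(machines, durations)
```

With this toy data the training error after twenty trees is lower than after one tree, which is the behaviour the paragraph above describes.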

Step S405, determining the job running duration of the job to be analyzed under the target feature information based on the corresponding relation.

In the embodiment of the application, after the corresponding relation between the feature information and the running duration is determined, the job running duration of the job to be analyzed under the target feature information can be determined based on the corresponding relation. Here, the target feature information may be manually set feature information that differs from the feature information in the historical data.

For example, the feature information in the history data may be acquired, the number of machines in the cluster in the feature information may be modified to obtain a new number of machines, the new number of machines is used as the target feature information, and the job running time of the job to be analyzed is determined based on the new number of machines, that is, the job running time of the job to be analyzed is analyzed under the condition of the new number of machines.

According to the method for determining the operation running time, the characteristic information belonging to the preset type is extracted from the acquired historical data of the target operation in the preset historical time period, the statistical characteristic information is obtained through statistics, distributed gradient enhancement analysis is conducted on the characteristic information and the statistical characteristic information based on the historical running time in the historical data, the corresponding relation between the characteristic information and the running time can be accurately obtained, therefore, the influence of different characteristic factors on the operation running time can be accurately and efficiently analyzed based on the corresponding relation, and the analysis efficiency and accuracy are improved. In addition, when the corresponding relation is analyzed, the statistical characteristic information is taken into consideration, so that the rule information in the characteristic information of the target operation can be taken into consideration, and the accuracy of the determined corresponding relation can be improved by combining the relation between the rule information and the data in the characteristic information, so that the accuracy of the determined operation running time of the operation to be analyzed is further improved.

In some embodiments, the job running duration determination system includes at least a terminal and a server. A user can run either the target job or the job to be analyzed through the terminal; after the terminal runs the target job, the running data of the current run can be collected to obtain the historical data of the target job for that run. The server may be the server of a job running duration determination application, and is configured to analyze the collected historical data to obtain the correspondence between the feature information of the job and the running duration. The following describes the method for determining the job running duration according to the embodiment of the present application, with reference to a scenario in which the job running duration determination system includes a terminal and a server.

Fig. 5 is another optional flowchart of the method for determining the job running time length according to the embodiment of the present application, and as shown in fig. 5, the method includes the following steps:

in step S501, the terminal runs a target job in a preset history time period to generate history data.

Here, the history data is all operation data of the terminal operating the target job within a preset history period.

And step S502, the terminal collects historical data and stores the historical data.

In the process that the terminal runs the target operation, the historical data can be collected and stored in the storage unit, so that the historical data can be acquired from the storage unit when the historical data is needed to be used subsequently. In the embodiment of the application, each kind of characteristic data in the historical data may have a data tag, and whether the corresponding characteristic data belongs to a preset type or not may be distinguished through the data tag.

In step S503, the server receives the job running time analysis request sent by the terminal.

Here, the job running time length analysis request is used for analyzing the job running time length of the job to be analyzed under the target characteristic information, wherein the job running time length analysis request at least comprises the job identification and the target characteristic information of the job to be analyzed.

In step S504, the server receives the history data sent by the terminal in response to the job running time analysis request.

In the embodiment of the application, the server responds to the operation running time analysis request to determine the operation running time of the operation to be analyzed under the target characteristic information, and the method comprises the following two implementation scenarios:

In the first implementation scenario, the correspondence between the feature information and the running duration has already been determined, that is, the correspondence already exists. The job running duration of the job to be analyzed under the target feature information can therefore be determined directly based on the existing correspondence.

In the second implementation scenario, the correspondence between the feature information and the operation duration is not determined, that is, there is no correspondence between the feature information and the operation duration. Therefore, the historical data sent by the terminal can be received, the corresponding relation between the characteristic information and the running time length is further determined based on the historical data, and then the running time length of the operation to be analyzed under the target characteristic information is determined based on the corresponding relation between the characteristic information and the running time length.

In addition, the embodiment of the present application is described by taking the second implementation scenario as an example.
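The two implementation scenarios can be sketched as a lookup that only falls back to analysis when no correspondence exists yet. All names here (`correspondences`, `get_correspondence`, `fetch_history`, `analyze`) are hypothetical, and the `analyze` stand-in returns a toy function rather than a real trained model.

```python
# Hypothetical in-memory store of already determined correspondences, keyed by job identifier.
correspondences = {}


def get_correspondence(job_id, fetch_history, analyze):
    """Return the stored correspondence, analyzing historical data only when none exists."""
    if job_id not in correspondences:  # second implementation scenario
        correspondences[job_id] = analyze(fetch_history(job_id))
    return correspondences[job_id]  # first implementation scenario


calls = []


def fetch_history(job_id):
    calls.append(job_id)  # record each time the terminal is asked for historical data
    return [{"machine_count": 40, "duration_s": 60.0}]


def analyze(history):
    # Stand-in for the distributed gradient enhancement analysis (assumption).
    return lambda target: 2400.0 / target["machine_count"]


first = get_correspondence("job-1", fetch_history, analyze)
second = get_correspondence("job-1", fetch_history, analyze)
```

In this sketch the second request reuses the stored correspondence, so the historical data is fetched only once.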

In step S505, the server screens out the self attribute feature of the target job, the submission time of the target job, the cluster feature of the running target job, and the features of other jobs during the running time of the target job from the history data. Here, the self attribute feature, the submission time, the cluster feature, and the feature of the other job constitute the above-mentioned feature information belonging to the preset type.

In some embodiments, each feature data in the history data has a data tag, and the embodiment of the present application may distinguish whether the corresponding feature data belongs to a preset type through the data tag. That is, feature data such as self-attribute features, submission time, cluster features and features of other jobs are screened out from the historical data through the data tags.
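A minimal sketch of tag-based screening follows. The tag names, the record layout and the function `select_features` are assumptions made for illustration; the embodiment does not specify how data tags are encoded.

```python
# Hypothetical tags for the four preset types named in step S505.
PRESET_TYPES = {"job_attribute", "submit_time", "cluster", "other_jobs"}

# Hypothetical historical record: each piece of feature data carries a data tag.
history_record = [
    {"tag": "job_attribute", "name": "input_size_gb", "value": 120},
    {"tag": "submit_time", "name": "submit_ts", "value": "2022-03-07T09:15:00"},
    {"tag": "cluster", "name": "machine_count", "value": 40},
    {"tag": "other_jobs", "name": "running_jobs", "value": 17},
    {"tag": "debug", "name": "log_level", "value": "INFO"},
]


def select_features(record, preset_types=PRESET_TYPES):
    """Keep only the feature data whose tag belongs to a preset type."""
    return [item for item in record if item["tag"] in preset_types]


feature_info = select_features(history_record)
```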

In step S506, the server acquires, for a specific time period in the running time of the target job, a feature set corresponding to a plurality of other jobs in the specific time period.

The jobs in a cluster show some periodic regularity: for example, the trend is roughly the same from day to day, while weekends and non-weekends differ slightly. Therefore, in the embodiment of the present application, some new feature information needs to be constructed on the basis of the original feature information belonging to the preset type, so that the correspondence between the feature information and the running duration can be better determined by taking the relationships within the feature data into account.

In the embodiment of the application, a feature set corresponding to a plurality of other jobs in a specific time period is obtained, where the specific time period refers to any sub-time period in a preset historical time period, that is, the specific time period belongs to the preset historical time period, and a duration of the specific time period is less than or equal to a duration of the preset historical time period.

Step S507, the server performs statistical analysis on the feature data in the feature set to obtain a number statistical value of other jobs, a machine core number statistical value used by other jobs, and a memory statistical value used by other jobs.

Here, the statistical analysis includes, but is not limited to, calculating the average value, the maximum value and the minimum value. The number statistical value of other jobs, the machine core number statistical value used by other jobs, and the memory statistical value used by other jobs constitute the statistical feature information. The number statistical value of other jobs includes, but is not limited to, the average, maximum, and minimum of the number of other jobs in the specific time period; the machine core number statistical value used by other jobs includes, but is not limited to, the average, maximum, and minimum of the number of machine cores used by other jobs in the specific time period; and the memory statistical value used by other jobs includes, but is not limited to, the average, maximum, and minimum of the memory used by other jobs in the specific time period.
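Steps S506 and S507 can be sketched as follows, assuming that the feature set of other jobs is available as periodic cluster snapshots. The snapshot layout, the sample values and the name `window_stats` are hypothetical.

```python
from statistics import mean

# Hypothetical cluster snapshots:
# (minute offset, number of other jobs, machine cores used, memory used in GB)
snapshots = [
    (0, 10, 40, 128),
    (5, 14, 64, 200),
    (10, 12, 56, 176),
    (15, 20, 96, 320),
]


def window_stats(snapshots, start, end):
    """Average, maximum and minimum of each quantity within the specific time period."""
    rows = [s for s in snapshots if start <= s[0] <= end]
    stats = {}
    for idx, name in ((1, "other_job_count"), (2, "used_cores"), (3, "used_memory_gb")):
        column = [row[idx] for row in rows]
        stats[name] = {"avg": mean(column), "max": max(column), "min": min(column)}
    return stats


stat_features = window_stats(snapshots, start=0, end=10)
```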

In some embodiments, the submission time of the target job includes at least one of: a submission date value, a submission hour value, and a submission minute value. Here, the submission date value indicates which day of the week the submission time of the target job falls on, the submission hour value indicates which hour of the day the submission time falls in, and the submission minute value indicates which minute the submission time falls in. Correspondingly, the method may further include the following steps S11 to S14 (not shown in the figure):

Step S11, sequentially carrying out aggregation processing on the self attribute feature, the submission time, the cluster feature and the features of other jobs based on the submission date value, the submission hour value and the submission minute value respectively, to obtain the aggregated self attribute feature, the aggregated submission time, the aggregated cluster feature and the aggregated features of other jobs.

Aggregating based on the submission date value means grouping together the self attribute features, the submission times, the cluster features and the features of other jobs that correspond to the same submission date value. This correspondingly forms the aggregated self attribute feature, the aggregated submission time, the aggregated cluster feature and the aggregated features of other jobs, that is, a self attribute feature set, a submission time set, a cluster feature set and a feature set of other jobs corresponding to the same submission date value.

Similarly, aggregating based on the submission hour value means grouping together the self attribute features, the submission times, the cluster features and the features of other jobs that correspond to the same submission hour value, which yields a self attribute feature set, a submission time set, a cluster feature set and a feature set of other jobs corresponding to the same submission hour value.

Aggregating based on the submission minute value means grouping together the self attribute features, the submission times, the cluster features and the features of other jobs that correspond to the same submission minute value, which yields a self attribute feature set, a submission time set, a cluster feature set and a feature set of other jobs corresponding to the same submission minute value.

Step S12, based on the submission date value, performing statistical analysis on the aggregated self attribute feature, the aggregated submission time, the aggregated cluster feature and the aggregated features of other jobs in sequence, and correspondingly obtaining first change rules of the aggregated self attribute feature, the aggregated submission time, the aggregated cluster feature and the aggregated features of other jobs in the date dimension respectively.

Here, the statistical analysis includes, but is not limited to, calculating an average value, a maximum value, and a minimum value. Performing the statistical analysis based on the submission date value may proceed as follows: performing statistical analysis on the self attribute feature set corresponding to the same submission date value to obtain the change rule of the aggregated self attribute feature in the date dimension; performing statistical analysis on the submission time set corresponding to the same submission date value to obtain the change rule of the aggregated submission time in the date dimension; performing statistical analysis on the cluster feature set corresponding to the same submission date value to obtain the change rule of the aggregated cluster feature in the date dimension; and performing statistical analysis on the feature set of other jobs corresponding to the same submission date value to obtain the change rule of the aggregated features of other jobs in the date dimension. These four change rules constitute the first change rule.

Step S13, based on the submission hour value, performing statistical analysis on the aggregated self attribute feature, the aggregated submission time, the aggregated cluster feature and the aggregated features of other jobs in sequence, and correspondingly obtaining second change rules of the aggregated self attribute feature, the aggregated submission time, the aggregated cluster feature and the aggregated features of other jobs in the hour dimension respectively.

Here, the statistical analysis also includes, but is not limited to, calculating an average value, a maximum value, and a minimum value. Performing the statistical analysis based on the submission hour value may proceed as follows: performing statistical analysis on the self attribute feature set corresponding to the same submission hour value to obtain the change rule of the aggregated self attribute feature in the hour dimension; performing statistical analysis on the submission time set corresponding to the same submission hour value to obtain the change rule of the aggregated submission time in the hour dimension; performing statistical analysis on the cluster feature set corresponding to the same submission hour value to obtain the change rule of the aggregated cluster feature in the hour dimension; and performing statistical analysis on the feature set of other jobs corresponding to the same submission hour value to obtain the change rule of the aggregated features of other jobs in the hour dimension. These four change rules constitute the second change rule.

Step S14, based on the submission minute value, performing statistical analysis on the aggregated self attribute feature, the aggregated submission time, the aggregated cluster feature and the aggregated features of other jobs in sequence, and correspondingly obtaining third change rules of the aggregated self attribute feature, the aggregated submission time, the aggregated cluster feature and the aggregated features of other jobs in the minute dimension respectively.

Here, the statistical analysis also includes, but is not limited to, calculating an average value, a maximum value, and a minimum value. Performing the statistical analysis based on the submission minute value may proceed as follows: performing statistical analysis on the self attribute feature set corresponding to the same submission minute value to obtain the change rule of the aggregated self attribute feature in the minute dimension; performing statistical analysis on the submission time set corresponding to the same submission minute value to obtain the change rule of the aggregated submission time in the minute dimension; performing statistical analysis on the cluster feature set corresponding to the same submission minute value to obtain the change rule of the aggregated cluster feature in the minute dimension; and performing statistical analysis on the feature set of other jobs corresponding to the same submission minute value to obtain the change rule of the aggregated features of other jobs in the minute dimension. These four change rules constitute the third change rule.

In the embodiment of the present application, at least one of the first change rule, the second change rule, and the third change rule may also form part of the statistical feature information. That is, the statistical feature information may include only the number statistical value of other jobs, the machine core number statistical value used by other jobs, and the memory statistical value used by other jobs; alternatively, the statistical feature information may include at least one of the number statistical value of other jobs, the machine core number statistical value used by other jobs, the memory statistical value used by other jobs, the first change rule, the second change rule, and the third change rule.
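Steps S11 to S14 can be sketched for a single feature as a group-by on the submission date, hour or minute value followed by the same average, maximum and minimum statistics. The sample history, the `KEYS` mapping and the function `change_rule` are hypothetical; the day of the week is taken from `datetime.weekday()`, where Monday is 0.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical history: (submission time, value of one feature such as machines in the cluster).
history = [
    (datetime(2022, 3, 7, 9, 0), 40),   # Monday
    (datetime(2022, 3, 7, 9, 30), 44),  # Monday
    (datetime(2022, 3, 8, 14, 0), 60),  # Tuesday
    (datetime(2022, 3, 14, 9, 0), 48),  # Monday
]

# How each aggregation dimension is derived from the submission time.
KEYS = {
    "date": lambda t: t.weekday(),  # submission date value: day of the week
    "hour": lambda t: t.hour,       # submission hour value
    "minute": lambda t: t.minute,   # submission minute value
}


def change_rule(history, dimension):
    """Aggregate values that share the same key, then summarize each group."""
    groups = defaultdict(list)
    for submitted_at, value in history:
        groups[KEYS[dimension](submitted_at)].append(value)
    return {k: {"avg": mean(v), "max": max(v), "min": min(v)} for k, v in groups.items()}


first_rule = change_rule(history, "date")
second_rule = change_rule(history, "hour")
third_rule = change_rule(history, "minute")
```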

In some embodiments, after obtaining the feature information and the statistical feature information, invalid feature culling is required because there may be a portion of the features that are invalid features. In the embodiment of the application, the invalid feature can be removed in the following way:

determining a fluctuation value and a variance of each feature data in the feature information and the statistical feature information; determining the characteristic data with the fluctuation value smaller than the fluctuation threshold and the variance smaller than the variance threshold as target elimination data; and removing target removing data from the characteristic information and the statistical characteristic information.
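The removal rule above can be sketched as follows. The embodiment does not define the fluctuation value, so this sketch assumes it is the range (maximum minus minimum) of the feature data; the thresholds, the sample features and the name `cull_invalid` are likewise hypothetical.

```python
from statistics import pvariance


def cull_invalid(features, fluctuation_threshold, variance_threshold):
    """Drop feature data whose fluctuation value and variance are both below their thresholds."""
    kept = {}
    for name, values in features.items():
        fluctuation = max(values) - min(values)  # assumed definition of the fluctuation value
        variance = pvariance(values)             # population variance
        if fluctuation < fluctuation_threshold and variance < variance_threshold:
            continue  # target removal data: nearly constant, so it carries little information
        kept[name] = values
    return kept


features = {
    "machine_count": [40, 44, 60, 48],
    "queue_id": [3, 3, 3, 3],
}
kept = cull_invalid(features, fluctuation_threshold=1, variance_threshold=0.01)
```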

Step S508, the server performs distributed gradient enhancement analysis on the characteristic information and the statistical characteristic information based on the historical operating duration in the historical data to obtain the corresponding relation between the characteristic information and the operating duration.

In step S509, the server obtains the job to be analyzed and the target feature information sent by the terminal.

Step S510, the server determines the job running duration of the job to be analyzed under the target feature information based on the corresponding relationship.

In the embodiment of the application, the statistical characteristic information is taken into consideration when the corresponding relation is analyzed, so that the rule information in the characteristic information of the target operation can be taken into consideration, and the accuracy of the determined corresponding relation can be improved by combining the relation between the rule information and the data in the characteristic information, so that the accuracy of the determined operation running time of the operation to be analyzed is improved.

Fig. 6 is a schematic flowchart of yet another alternative method for determining a job running time length according to an embodiment of the present application, where as shown in fig. 6, the method includes the following steps:

in step S601, the terminal runs the target job within a preset history time period, and generates history data.

And step S602, the terminal collects historical data and stores the historical data.

In step S603, the server receives a job running time analysis request sent by the terminal.

In step S604, the server receives the history data transmitted from the terminal in response to the job operation duration analysis request.

In step S605, the server determines the feature information belonging to the preset type from the history data.

Step S606, the server performs data statistics on the feature information to obtain statistical feature information.

In step S607, the server performs vectorization processing on the feature information and the statistical feature information to obtain a sample feature vector.

Here, any vectorization processing method may be adopted to perform vectorization processing on the feature information and the statistical feature information respectively, so as to obtain a feature information vector and a statistical feature information vector correspondingly, and then perform vector splicing on the feature information vector and the statistical feature information vector to form a sample feature vector.
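As one possible vectorization, the sketch below maps each piece of information to a list of floats in a fixed key order and then concatenates the two lists. The feature names and values are hypothetical, and only numeric features are handled; categorical features would need an encoding step that is not shown.

```python
def vectorize(info, keys):
    """Turn a mapping of numeric features into a vector with a fixed key order."""
    return [float(info[k]) for k in keys]


# Hypothetical feature information and statistical feature information.
feature_info = {"input_size_gb": 120, "machine_count": 40, "submit_hour": 9}
stat_info = {"other_jobs_avg": 12.0, "other_jobs_max": 14, "other_jobs_min": 10}

feature_vector = vectorize(feature_info, sorted(feature_info))
stat_vector = vectorize(stat_info, sorted(stat_info))

# Vector splicing: the sample feature vector is the concatenation of both vectors.
sample_vector = feature_vector + stat_vector
```

Sorting the keys is a simple way to keep the position of each feature stable across samples.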

In step S608, the server inputs the sample feature vector into the composite tree model.

After the server inputs the sample feature vector into the composite tree model, the predicted operation time of the target operation is determined through the composite tree model, and model parameters in the composite tree model are further updated reversely, so that the trained composite tree model is obtained, and the corresponding relation between the feature information and the operation time is obtained. That is to say, the trained composite tree model is obtained by training the composite tree model, and the trained composite tree model can represent the corresponding relationship between the characteristic information and the operation duration, that is, the operation duration of the operation to be analyzed under the target characteristic information can be accurately predicted through the trained composite tree model.

In some embodiments, the characteristic information and the statistical characteristic information may be determined as sample data, and the sample data is subjected to vectorization processing to obtain a sample characteristic vector; and determining the historical running time length as a sample label of the sample data. The sample feature vectors and sample labels may eventually be input into the composite tree model.

Next, a training process of the composite tree model is described, wherein the training process of the composite tree model includes the following steps S609 to S611:

and step S609, performing distributed gradient enhancement analysis through the composite tree model based on the characteristic information and the statistical characteristic information to obtain the predicted running time of the target operation.

In some embodiments, the feature information and the statistical feature information include at least two pieces of sub-feature information. For example, the sub-feature information may be any one of the self attribute feature, the submission time, the cluster feature, the features of other jobs, the number statistical value of machine cores used by other jobs, the memory statistical value used by other jobs, the first change rule, the second change rule, and the third change rule. In addition, the composite tree model includes at least one decision tree. Correspondingly, step S609 can be realized by the following steps S6091 to S6092 (not shown in the figure):

Step S6091, node splitting is carried out on the nodes in each decision tree in the composite tree model based on each piece of sub-feature information in the at least two pieces of sub-feature information in sequence, to obtain two leaf nodes and the weight value of each leaf node.

Step S6092, determining the predicted running duration of the target job according to the weight values of all leaf nodes to which the target job belongs.

In the embodiment of the application, the weight values of all leaf nodes to which the target job belongs can be added to obtain the total weight of the target job, and the running duration corresponding to the total weight is then calculated according to the correspondence between weight and running duration; this is the predicted running duration of the target job.
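The summation of leaf weights can be sketched as follows. The tree layout, the weight values and the `base_duration` offset are hypothetical, and the mapping from total weight to running duration is assumed here to be a simple addition to a base value.

```python
# Each hypothetical tree is either a split node {"feature", "threshold", "left", "right"}
# or a leaf node {"weight": ...}.
trees = [
    {"feature": "machine_count", "threshold": 30,
     "left": {"weight": 25.0}, "right": {"weight": -15.0}},
    {"feature": "other_jobs_avg", "threshold": 10,
     "left": {"weight": -5.0}, "right": {"weight": 8.0}},
]


def leaf_weight(node, sample):
    """Walk from the root to the leaf node that the sample belongs to."""
    while "weight" not in node:
        branch = "left" if sample[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["weight"]


def predict_duration(trees, sample, base_duration=70.0):
    """Add the weights of all leaf nodes the sample falls into."""
    total_weight = sum(leaf_weight(tree, sample) for tree in trees)
    return base_duration + total_weight  # assumed mapping from total weight to duration


sample = {"machine_count": 40, "other_jobs_avg": 12}
predicted = predict_duration(trees, sample)
```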

And step S610, inputting the predicted operation duration and the historical operation duration into a preset loss model to obtain a loss result.

Step S611, according to the loss result, modifying the model parameters in the composite tree model to obtain a trained composite tree model, where the trained composite tree model can represent the corresponding relationship between the feature information and the running duration.

The training process of the composite tree model comprises a process of node splitting of nodes in a decision tree in the composite tree model, and in order to guarantee the prediction effect (such as computational efficiency and prediction accuracy) of the composite tree model, the process of node splitting needs to be effectively controlled. In the embodiment of the present application, the node splitting process in the decision tree may be controlled in the following manner:

determining a height threshold value of each decision tree, a leaf number threshold value of each decision tree and a learning rate threshold value of the composite tree model in the composite tree model by using a grid search method based on at least two pieces of sub-feature information; and stopping node splitting of the nodes in the decision tree when at least one of the following conditions is met: the height of any decision tree in the composite tree model is larger than or equal to a height threshold value, the leaf number of any decision tree in the composite tree model is larger than or equal to a leaf number threshold value, and the learning rate of the composite tree model is larger than or equal to a learning rate threshold value.
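A minimal sketch of the grid search and the stop condition follows. The candidate grid, the scoring function `toy_validation_error` and all names are hypothetical; a real evaluation would train and score the composite tree model for every combination instead of using a stand-in score.

```python
from itertools import product


def grid_search(grid, evaluate):
    """Try every combination in the grid and keep the one with the lowest score."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for combo in product(*(grid[n] for n in names)):
        params = dict(zip(names, combo))
        score = evaluate(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params


def should_stop_splitting(height, leaf_count, learning_rate, thresholds):
    """Stop node splitting when at least one threshold is reached."""
    return (height >= thresholds["max_height"]
            or leaf_count >= thresholds["max_leaves"]
            or learning_rate >= thresholds["learning_rate"])


grid = {"max_height": [3, 5, 7], "max_leaves": [8, 16], "learning_rate": [0.05, 0.1]}


def toy_validation_error(p):
    # Stand-in for a validation error (assumption, not a real model evaluation).
    return (abs(p["max_height"] - 5)
            + abs(p["max_leaves"] - 16) / 8
            + abs(p["learning_rate"] - 0.1))


thresholds = grid_search(grid, toy_validation_error)
```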

Step S612, the server obtains the job to be analyzed and the target feature information sent by the terminal.

In step S613, the server determines the job running time of the job to be analyzed under the target feature information based on the correspondence relationship.

In some embodiments, step S613 may be implemented by any one of the following ways:

The first way: target feature information of the job to be analyzed is obtained, the target feature information is then input into the composite tree model, and the job running duration of the job to be analyzed under the target feature information is determined through the composite tree model.

In the embodiment of the application, the composite tree model is trained, and the operation running time of the operation to be analyzed under the target characteristic information can be accurately predicted through the trained composite tree model. When the composite tree model is trained, node splitting and control of a decision tree in the composite tree model are carried out based on the characteristic information and the statistical characteristic information of the target node, so that the operation running time of the operation to be analyzed under the target characteristic information can be accurately predicted based on the composite tree model obtained by learning the characteristic information and the statistical characteristic information.

The second way: the job to be analyzed and the target job are the same job, and the feature information and the statistical feature information constitute a feature data set of the target job. The feature data in the feature data set can be modified to obtain modified feature data, wherein the modified feature data form a test data set; then, the modified feature data in the test data set is determined as the target feature information of the job to be analyzed; and the target feature information is input into the composite tree model, and the job running duration of the job to be analyzed under the target feature information is determined through the composite tree model.

In this embodiment, the modifying of the feature data in the feature data set may be modifying the number of machines in the feature data set to obtain a new number of machines, and then determining the new number of machines as the test data in the test data set.

In the embodiment of the application, the operation running time of the operation to be analyzed under the test data in the test data set is accurately predicted through the pre-trained composite tree model. When the composite tree model is trained, node splitting and control of a decision tree in the composite tree model are carried out based on the characteristic information and the statistical characteristic information of the target node, so that the operation running time of the operation to be analyzed under the test data in the test data set can be accurately predicted based on the composite tree model obtained by learning of the characteristic information and the statistical characteristic information.

Next, an exemplary application of the embodiment of the present application in a practical application scenario will be described. The embodiment of the application provides a method for determining the job running duration, which predicts, based on an artificial intelligence technology, the job running duration of a job to be analyzed under different numbers of machines and at different moments, and analyzes the influence of different numbers of machines on the job running duration at different moments, so that the optimal number of machines that keeps the job running duration within a tolerable range at each moment can be found, thereby reducing the number of machines and the machine operation cost.

The scheme of the embodiment of the application extracts the feature information of the target job and the label of the target job (i.e., the historical running duration) from historical data, and, after appropriate processing, inputs the feature information and the label into a specific model (for example, a composite tree model) to learn the relationship between the feature information and the label of the job. After learning is completed, the feature information of a certain job can be modified; for example, the submission time of the job is modified to sub2, and the number of machines in the cluster at that time is modified to q2. The model is then used to predict the job running duration t2 of the job under the new feature information. By comparing t2 with the original running duration t1 of the job, an engineer can decide whether to adjust the number of machines to q2 at time sub2, so that the number of machines is reduced and the operation cost is lowered.

The scheme of the embodiment of the application can be applied to cluster machine operation, the operation running time under different machine numbers and different moments is predicted based on an artificial intelligence technology, the influence of the different machine numbers on the operation running time under different moments is analyzed, and the optimal machine number within a tolerable range of the operation running time under different moments is found.

Fig. 7 is a flowchart of a method for determining a job running time length according to an embodiment of the present application, and as shown in fig. 7, the method includes the following steps:

Step S701, feature information and a label of a job (here, the job may be a target job) are extracted from existing history data, and a data set related to the job is obtained.

In step S701, the feature information and the label of the job are extracted; in implementation, the feature information may be extracted from the historical data of the cluster. Fig. 8 is a schematic diagram of the implementation process for extracting the feature information from the historical data corresponding to the cluster provided in the embodiment of the present application. When a specific job is submitted at a specific moment to run in a cluster of a specific fixed size, and the situation of the other jobs at that moment is taken into account, the running duration of the job can be considered learnable and predictable. The feature information influencing the running duration of the job is therefore roughly classified into the following four types:

The first type: the job's own features 801 (i.e., self attribute features), including the number of machine cores required to run the job (core_command) and the required memory size (memory_command);

The second type: job submission time 802, including the day of the week (week), the hour (hour), and the minute (minute);

The third type: cluster features 803, including the number of machines in the cluster (cnt_machine), the total number of cores in the cluster (core_total), and the total memory in the cluster (memory_total);

The fourth type: features 804 of the other jobs at this time, including the number of other jobs (cnt_jobs), the number of machine cores already used by the other jobs (core_used), and the amount of memory already used (memory_used).
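The four feature types listed above can be held in one record per job. The following is a minimal sketch: the field names follow the identifiers given in the text, while the class name `JobFeatures`, the field types, the units, and the example values are assumptions made for illustration.

```python
from dataclasses import dataclass, astuple
from typing import List


@dataclass(frozen=True)
class JobFeatures:
    # Type 1: the job's own features
    core_command: int      # cores required by the job
    memory_command: float  # memory required by the job (unit assumed)
    # Type 2: submission time
    week: int              # day of week, assumed to be encoded 0..6
    hour: int
    minute: int
    # Type 3: cluster features
    cnt_machine: int
    core_total: int
    memory_total: float
    # Type 4: features of the other jobs at submission time
    cnt_jobs: int
    core_used: int
    memory_used: float

    def to_vector(self) -> List[float]:
        """Flatten the record into a numeric vector in field order."""
        return [float(v) for v in astuple(self)]


# Illustrative values only; they are not taken from the source.
example = JobFeatures(
    core_command=8, memory_command=32.0,
    week=2, hour=14, minute=30,
    cnt_machine=120, core_total=3840, memory_total=15360.0,
    cnt_jobs=57, core_used=2900, memory_used=11000.0,
)
```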

In some embodiments, in addition to the four types of existing feature information above, the jobs in the cluster show some periodic regularity; for example, the daily trends are approximately consistent, while weekends differ slightly from non-weekends. Therefore, the following two types of feature information are newly constructed on the basis of the original features, so that the model can better learn these relationships:

The fifth type: historical information over a short period of time in the past. For example, statistics of the features of other jobs, including the mean, maximum, and minimum, during the first two hours of job operation. That is, this step adds a total of 3 × 3 = 9 features: the average, maximum, and minimum of the number of other jobs (cnt_job), of the number of machine cores already used by other jobs (core_used), and of the amount of memory already used (memory_used);

The sixth type: aggregation features, comprising: a. aggregating (for example, performing a group by operation) on {week, hour, minute}, computing the average, maximum, and minimum of the four types of original features, and taking the regularity information of the historical data of the original features (namely the first change rule, the second change rule, and the third change rule) as new features;

b. aggregating (for example, performing a group by operation) on {week, hour}, computing the average, maximum, and minimum of the four types of original features, and taking the regularity information of the historical data of the original features as new features.
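The two kinds of derived features above can be sketched with the standard library alone. This is an illustration under assumptions: the helper names `window_stats` and `grouped_stats` are hypothetical, timestamps are taken to be seconds, and the short-term window is assumed to be the two hours immediately before the job's submission.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


def window_stats(samples: Iterable[Tuple[int, float]], submit_ts: int,
                 window_s: int = 2 * 3600) -> Optional[Dict[str, float]]:
    """Mean/max/min of the values sampled in [submit_ts - window_s, submit_ts)."""
    values = [v for ts, v in samples if submit_ts - window_s <= ts < submit_ts]
    if not values:
        return None  # no history available in the window
    return {"mean": mean(values), "max": max(values), "min": min(values)}


def grouped_stats(rows: Iterable[dict], keys: Sequence[str],
                  feature: str) -> Dict[tuple, Dict[str, float]]:
    """Group rows by the given keys (a 'group by') and summarize one feature per group."""
    groups: Dict[tuple, List[float]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row[feature])
    return {g: {"mean": mean(v), "max": max(v), "min": min(v)}
            for g, v in groups.items()}


# Illustrative data: (timestamp, core_used) samples and per-job rows.
samples = [(0, 30), (3600, 50), (7000, 70), (7200, 90)]
rows = [
    {"week": 0, "hour": 9, "minute": 0, "core_used": 40},
    {"week": 0, "hour": 9, "minute": 30, "core_used": 60},
    {"week": 5, "hour": 9, "minute": 0, "core_used": 10},
]
recent = window_stats(samples, submit_ts=7200)
by_week_hour = grouped_stats(rows, ("week", "hour"), "core_used")
```

Calling `grouped_stats` once with `("week", "hour", "minute")` and once with `("week", "hour")` corresponds to aggregations a and b respectively.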

Finally, after adding new feature information, there may exist a part of feature information that is invalid, so feature elimination is required: for some feature information with small fluctuation and small variance, the model considers that the influence of the feature information on the result is almost 0, and therefore the feature information is directly eliminated.
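Feature elimination of this kind can be sketched as below. The function name `drop_uninformative`, the threshold values, and the choice of "fluctuation" as the range (maximum minus minimum) are assumptions for illustration; the source does not define how fluctuation is measured.

```python
from statistics import pvariance
from typing import Dict, List


def drop_uninformative(columns: Dict[str, List[float]],
                       fluctuation_threshold: float = 1e-6,
                       variance_threshold: float = 1e-6) -> Dict[str, List[float]]:
    """Keep a feature unless both its fluctuation and its variance are below the thresholds."""
    kept = {}
    for name, values in columns.items():
        fluctuation = max(values) - min(values)  # assumed definition of fluctuation
        variance = pvariance(values)             # population variance
        if fluctuation < fluctuation_threshold and variance < variance_threshold:
            continue  # near-constant feature: eliminated
        kept[name] = values
    return kept


# Illustrative columns: cnt_machine never changes, core_used does.
columns = {
    "cnt_machine": [100.0, 100.0, 100.0],
    "core_used": [10.0, 50.0, 90.0],
}
filtered = drop_uninformative(columns)
```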

Step S702, a model is built, a data set of the operation is input into the model for supervised learning, and the relation between the characteristic information and the label (namely the corresponding relation between the characteristic information and the operation duration) is determined to obtain a trained model.

In the embodiment of the application, in the model selection process, a composite tree model in machine learning can be selected for supervised learning. For example, the composite tree model can be an XGBoost model, which uses training data $x_i$ containing a plurality of pieces of feature information to predict the target variable $y_i$; namely, the relation between the feature information of the job and the running duration of the job is learned according to the feature information of the job. The model can be expressed as the following formula (1):

$$\hat{y}_i = \sum_j \theta_j x_{ij} \tag{1}$$

wherein $\hat{y}_i$ is the predicted value of the target variable; $\theta$ denotes the parameters learned by the model; $x_i$ represents the training data corresponding to the $i$-th type of feature information, for example, if $i$ can take any positive integer from 1 to 6, then $x_i$ corresponds to any one of the six types of feature information; each type of feature information has a plurality of training data, and $x_{ij}$ represents the $j$-th training data in the $i$-th type of feature information.

The objective function optimized during model learning is the following formula (2):

$$\mathrm{Obj}(\theta) = L(\theta) + \Omega(\theta) \tag{2}$$

where $L(\theta)$ is a loss function, measured by the mean square error, i.e., the following formula (3):

$$L(\theta) = \sum_i \left(y_i - \hat{y}_i\right)^2 \tag{3}$$

wherein $\Omega(\theta)$ is a regular term used to control the complexity of the model and prevent overfitting of the model; $y_i$ is the label of the job, i.e., the real job running duration; and $\hat{y}_i$ is the job running duration predicted by the model.
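The objective of formula (2) can be illustrated numerically. This is a sketch under assumptions: the loss is taken as the sum of squared errors (dividing by the sample count would give the mean and leave the minimizer unchanged), and the regular term uses the form `gamma * T + 0.5 * lam * sum(w^2)` over leaf weights, which is how the XGBoost documentation defines tree complexity; the source itself does not spell out the regular term. The function names are hypothetical.

```python
from typing import Sequence


def squared_error_loss(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Sum of squared differences between labels and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred))


def regular_term(leaf_weights: Sequence[float], gamma: float, lam: float) -> float:
    """Assumed complexity penalty: gamma per leaf plus an L2 penalty on leaf weights."""
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)


def objective(y_true: Sequence[float], y_pred: Sequence[float],
              leaf_weights: Sequence[float], gamma: float = 1.0,
              lam: float = 1.0) -> float:
    """Objective = loss + regular term."""
    return squared_error_loss(y_true, y_pred) + regular_term(leaf_weights, gamma, lam)


# Illustrative values: two jobs with real durations 10 and 20, predicted 12 and 19.
value = objective([10.0, 20.0], [12.0, 19.0], leaf_weights=[1.0, -1.0],
                  gamma=0.5, lam=2.0)
```

With these inputs the loss is 4 + 1 = 5 and the penalty is 0.5 × 2 + 0.5 × 2 × (1 + 1) = 3, so `value` is 8.0.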

In the embodiment of the present application, the model may be a composite tree model; that is, a single job sample travels down a tree from the root, branching to the left or to the right at each node, and finally reaches a leaf node, which yields a prediction. The prediction process of the model is shown in fig. 9.

In the model prediction process shown in fig. 9, there are 5 job samples on the left (job 1, job 2, job 3, job 4, and job 5), and the job running durations of these 5 jobs are to be predicted. The 5 jobs are each assigned to leaf nodes, and different leaf nodes carry different weights: a positive weight represents the duration to be added to the job running duration under that condition, and a negative weight represents the duration to be subtracted. The running duration required by a job can therefore be judged comprehensively from the leaf nodes it falls into and their weights. For example, the weight of the leaf node of tree 1 in which job 3 is located is +1. However, a single decision tree generally does not perform well, so an ensemble approach is generally used: if one tree is not good enough, another tree is added, such as tree 2 on the right of the example. Tree 2 differs from tree 1 on the left in that it uses the additional feature information week, so that, in addition to the feature information used by tree 1, the submission time can also be considered as a splitting attribute. The job running duration is decided by the two trees together: the weight of job 3 is +1 in tree 1 and +0.9 in tree 2, so the final weight of job 3 is +1.9, that is, its predicted job running duration.
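The way two trees jointly decide a prediction can be sketched as follows. Only the two leaf weights for job 3 (+1 and +0.9) come from the text; the split features, split thresholds, the other leaf weights, and the feature values of job 3 are hypothetical, since fig. 9 is not reproduced here.

```python
from typing import Callable, Dict, Sequence

Job = Dict[str, float]
Tree = Callable[[Job], float]


def tree_1(job: Job) -> float:
    """Hypothetical tree splitting on job and cluster features; returns a leaf weight."""
    if job["core_command"] < 16:
        return 1.0 if job["cnt_machine"] >= 100 else 2.0
    return -1.0


def tree_2(job: Job) -> float:
    """Hypothetical tree that additionally splits on the day of week (0..4 = weekday)."""
    return 0.9 if job["week"] < 5 else -0.9


def predict(job: Job, trees: Sequence[Tree]) -> float:
    """The ensemble prediction is the sum of the leaf weights the job reaches."""
    return sum(tree(job) for tree in trees)


# Feature values chosen so that job 3 reaches the +1 leaf and the +0.9 leaf.
job3 = {"core_command": 8, "cnt_machine": 120, "week": 2}
total_weight = predict(job3, [tree_1, tree_2])
```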

Fig. 9 is illustrated with only two decision trees as an example; in practice, more, and more complex, weak classifiers can be provided and combined to form a strong classifier. The iterative relationship between the decision trees is shown in the following formula (4), in which each tree is superimposed on the result of the preceding trees so that the effect keeps improving:

$$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i) \tag{4}$$

wherein $\hat{y}_i^{(t)}$ is the prediction after $t$ trees have been superimposed, and $f_t(x_i)$ is the $t$-th superimposed tree; each time a tree is added, the model tries to reduce the loss function, until convergence.
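The superposition of trees can be illustrated with a deliberately small gradient boosting loop for squared error, in which each new tree is fitted to the residuals left by the trees before it. This is a teaching sketch, not XGBoost itself: it uses depth-1 trees on a single feature, omits the regular term, and applies a learning rate (shrinkage) to each tree; the function names and the data are hypothetical.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Stump = Callable[[float], float]


def fit_stump(xs: Sequence[float], residuals: Sequence[float]) -> Stump:
    """Fit a depth-1 regression tree on one feature by minimizing squared error."""
    best = None
    for threshold in sorted(set(xs))[1:]:  # both sides of every split are non-empty
        left = [r for x, r in zip(xs, residuals) if x < threshold]
        right = [r for x, r in zip(xs, residuals) if x >= threshold]
        lw, rw = mean(left), mean(right)
        err = sum((r - lw) ** 2 for r in left) + sum((r - rw) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, threshold, lw, rw)
    if best is None:  # all x values identical: a single leaf
        w = mean(residuals)
        return lambda x: w
    _, threshold, lw, rw = best
    return lambda x: lw if x < threshold else rw


def boost(xs: Sequence[float], ys: Sequence[float], rounds: int = 5,
          learning_rate: float = 0.5) -> Tuple[List[Stump], List[float]]:
    """Add one tree per round; record the squared-error loss after each round."""
    preds = [0.0] * len(ys)  # prediction before any tree is added
    trees: List[Stump] = []
    losses: List[float] = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
        losses.append(sum((y - p) ** 2 for y, p in zip(ys, preds)))
    return trees, losses


# Illustrative data: machine count versus job running duration.
xs = [50.0, 100.0, 150.0, 200.0]
ys = [40.0, 30.0, 22.0, 20.0]
trees, losses = boost(xs, ys)
```

With `learning_rate=1.0` each round adds the fitted tree unscaled; smaller values shrink each tree's contribution, which usually needs more rounds but is less prone to overfitting.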

In the embodiment of the application, the model training process comprises the following steps:

The first step: preprocess the feature information data. Numerical features are normalized, which is more beneficial to model learning, and categorical feature information is discretized; for example, the week feature needs to be discretized, since otherwise the model may wrongly learn a numerical relationship among the values of that feature.

The second step: after the feature information data is preprocessed, vectorize the feature information and input the feature vectors (namely sample feature vectors) and the corresponding labels into the model. Each input to the model is one feature vector together with the label corresponding to that feature vector, and the model learns a group of model parameters from the relationship between the feature information and the labels.

The third step: determine the optimal values of the height of the decision trees, the number of leaves, and the learning rate in the model by using a grid search method.

The fourth step: use a cross validation method so that every piece of data is used to validate the model, making the measured performance of the optimized model as reliable as possible.

Through the four steps of training, a trained model is obtained, and the model can predict the operation running time according to the input characteristic information.
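The preprocessing and cross validation steps can be sketched with the standard library. The helper names are hypothetical, min-max scaling is one possible choice of normalization, one-hot encoding is one possible way to discretize the week feature, and the day of week is assumed to be encoded as 0..6.

```python
from typing import Iterator, List, Sequence, Tuple


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale a numeric feature to [0, 1]; a constant feature maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot_week(week: int, n_days: int = 7) -> List[float]:
    """Encode the day of week as a one-hot vector so no numeric order is implied."""
    if not 0 <= week < n_days:
        raise ValueError("week must be in 0..n_days-1")
    return [1.0 if d == week else 0.0 for d in range(n_days)]


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, validation) index lists; each sample is validated exactly once."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, valid in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, valid


# Illustrative usage.
scaled = min_max_normalize([50.0, 100.0, 150.0])
wednesday = one_hot_week(2)
splits = list(k_fold_indices(n_samples=10, k=5))
```

In a grid search, each candidate parameter combination would be trained and scored on every split in `splits`, and the combination with the best average validation score would be kept.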

In step S703, characteristics of a job (the job may be a job to be analyzed, where the job to be analyzed and the target job may be the same job) such as the number of machines are modified to obtain a test set.

The aim of the embodiment of the application is to obtain the job running duration after the number of machines is changed. Therefore, after the parameters of the model are learned, the feature information concerning the number of machines of the job can be modified (after the number of machines is changed, the total number of cores and the total memory change correspondingly), and the modified feature information is then input into the trained model, so that the model predicts the job running duration of the job after the number of machines is changed.

In this step, the cluster characteristics corresponding to each job, including the number of machines (cnt _ machine), the total number of cores (core _ total), and the total memory amount (memory _ total), are modified. If the operation running time of the operation under the n cluster characteristics needs to be known, n pieces of characteristic information are generated. Finally, each job generates different n characteristics, and a test set of the jobs is obtained.
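Generating the n variants of a job can be sketched as follows. The function name `make_test_set` is hypothetical, and the sketch assumes homogeneous machines, so that the total core count and total memory scale linearly with the number of machines; the source only states that they change correspondingly.

```python
from typing import Dict, List, Sequence


def make_test_set(job: Dict[str, float], machine_counts: Sequence[int],
                  cores_per_machine: int,
                  memory_per_machine: float) -> List[Dict[str, float]]:
    """Return one copy of the job's features per candidate machine count."""
    variants = []
    for n in machine_counts:
        variant = dict(job)  # copy so the original record is left untouched
        variant["cnt_machine"] = n
        variant["core_total"] = n * cores_per_machine        # assumed linear scaling
        variant["memory_total"] = n * memory_per_machine     # assumed linear scaling
        variants.append(variant)
    return variants


# Illustrative job record and candidate machine counts.
job = {"core_command": 8, "week": 2, "hour": 14,
       "cnt_machine": 200, "core_total": 6400, "memory_total": 25600.0}
test_set = make_test_set(job, machine_counts=[200, 150, 100],
                         cores_per_machine=32, memory_per_machine=128.0)
```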

Step S704, inputting the test set of step S703 into the trained model as the features of the job, and predicting the label corresponding to the job (i.e., the running duration of the job under the new features). In this way, it can be determined whether the job running duration remains within a tolerable range after the number of machines is modified.

In this step, the test set generated in step S703 is input into the trained model, and the job running duration corresponding to each piece of feature information is obtained. The relation between the number of machines in the cluster and the job running duration is thereby obtained, and it can be determined below which number of machines the job running duration would exceed the tolerable range.
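Once the model has produced a predicted duration for each candidate machine count, choosing the smallest acceptable count is a simple filter. The sketch below assumes that "tolerable" means the predicted duration exceeds the original duration by no more than a fixed ratio; that definition, the function name, and the numbers are illustrative and not taken from the source.

```python
from typing import Dict, Optional


def smallest_tolerable_machine_count(predicted: Dict[int, float],
                                     baseline_duration: float,
                                     tolerance: float = 0.10) -> Optional[int]:
    """Return the smallest machine count whose predicted duration stays within tolerance."""
    limit = baseline_duration * (1.0 + tolerance)
    acceptable = [n for n, duration in predicted.items() if duration <= limit]
    return min(acceptable) if acceptable else None


# Illustrative model outputs: machine count -> predicted job running duration.
predicted = {200: 20.0, 150: 21.5, 100: 26.0}
best = smallest_tolerable_machine_count(predicted, baseline_duration=20.0)
```

Here the limit is 22.0, so 150 machines are acceptable while 100 are not.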

According to the method for determining the job running duration provided by the embodiment of the application, an artificial intelligence technology is introduced to learn the relation between the feature information of the jobs in the system and the job running duration, so that an engineer can intuitively see the influence of the number of machines in the cluster on the job running duration, which facilitates operating and managing the number of machines and helps reduce machine cost. Moreover, historical information in the cluster is combined with data from a short period of time in the past, which, compared with methods in the related art, supplements the information more comprehensively and provides reference value in terms of historical regularity. Meanwhile, the influence of the other jobs at the current moment is included when the job running duration is predicted, so that the model can correct for this interference in time and the obtained data has higher reference value.

It should be noted that, in the feature extraction part of step S701 in the embodiment of the present application, only part of the relevant feature information is used because the data that can be obtained is limited; when new relevant features can be collected, they may also be added to the solution of the embodiment of the present application. In the model selection of step S702, learning is not limited to the XGBoost model; other models may be used instead, such as a LightGBM model in machine learning, a multilayer perceptron (MLP) neural network, or a Long Short-Term Memory (LSTM) deep neural network.

It is understood that, in the embodiments of the present application, where data related to user information or business information is involved, for example, the feature information and statistical feature information of the target job and the job running duration of the job to be analyzed, user permission or consent needs to be obtained when the embodiments of the present application are applied to a specific product or technology, and the collection, use, and processing of the related data need to comply with the relevant laws, regulations, and standards of the relevant countries and regions.

Continuing on with the exemplary structure in which the job run-time-length determination means 354 provided in the embodiment of the present application is implemented as a software module, in some embodiments, as shown in fig. 3, the job run-time-length determination means 354 includes: an obtaining module 3541, configured to obtain history data of a target job in a preset history time period; a first determining module 3542, configured to determine feature information belonging to a preset type from the historical data; a data statistics module 3543, configured to perform data statistics on the feature information to obtain statistical feature information; an analysis module 3544, configured to perform distributed gradient enhancement analysis on the feature information and the statistical feature information based on historical operating time in the historical data, so as to obtain a corresponding relationship between the feature information and the operating time; a second determining module 3545, configured to determine, based on the correspondence, a job running time length of the job to be analyzed under the target feature information.

In some embodiments, the first determining module is further configured to: screening out self attribute characteristics of the target operation, the submission time of the target operation, the cluster characteristics of the target operation and the characteristics of other operations in the running time of the target operation from the historical data; and determining the self attribute feature, the submission time, the cluster feature and the features of other jobs as feature information belonging to a preset type.

In some embodiments, the data statistics module is further to: acquiring a feature set corresponding to a plurality of other jobs in a specific time period in the running time of the target job; performing statistical analysis on the feature data in the feature set to obtain a number statistical value of other operations, a machine core number statistical value used by other operations and a memory statistical value used by other operations; wherein the statistical value of the number of other jobs, the statistical value of the number of machine cores used by the other jobs, and the statistical value of the memory used by the other jobs constitute the statistical characteristic information.

In some embodiments, the submission time of the target job comprises at least one of: a submission date value, a submission hour value, and a submission minute value; the data statistics module is further configured to: sequentially perform aggregation processing on the self attribute feature, the submission time, the cluster feature and the features of other jobs based on the submission date value, the submission hour value and the submission minute value to obtain an aggregated self attribute feature, an aggregated submission time, an aggregated cluster feature and aggregated features of other jobs; on the basis of the submission date value, perform statistical analysis on the aggregated self attribute feature, the aggregated submission time, the aggregated cluster feature and the aggregated features of other jobs in sequence to correspondingly obtain first change rules of the aggregated self attribute feature, the aggregated submission time, the aggregated cluster feature and the aggregated features of other jobs in the date dimension respectively; on the basis of the submission hour value, perform statistical analysis on the aggregated self attribute feature, the aggregated submission time, the aggregated cluster feature and the aggregated features of other jobs in sequence to correspondingly obtain second change rules of the aggregated self attribute feature, the aggregated submission time, the aggregated cluster feature and the aggregated features of other jobs in the hour dimension respectively; on the basis of the submission minute value, perform statistical analysis on the aggregated self attribute feature, the aggregated submission time, the aggregated cluster feature and the aggregated features of other jobs in sequence to correspondingly obtain third change rules of the aggregated self attribute feature, the aggregated submission time, the aggregated cluster feature and the aggregated features of other jobs in the minute dimension respectively;
wherein at least one of the first change rule, the second change rule, and the third change rule constitutes the statistical characteristic information.

In some embodiments, the apparatus further comprises: a data determining module, configured to determine a fluctuation value and a variance of each feature data in the feature information and the statistical feature information; determining the characteristic data of which the fluctuation value is smaller than a fluctuation threshold value and the variance is smaller than a variance threshold value as target elimination data; and the data removing module is used for removing the target removing data from the characteristic information and the statistical characteristic information.

In some embodiments, the analysis module is further to: vectorizing the characteristic information and the statistical characteristic information to obtain a sample characteristic vector; inputting the sample feature vector into a composite tree model; performing distributed gradient enhancement analysis on the basis of the characteristic information and the statistical characteristic information through the composite tree model to obtain the predicted operation duration of the target operation; inputting the predicted operation duration and the historical operation duration into a preset loss model to obtain a loss result; and correcting model parameters in the composite tree model according to the loss result to obtain a trained composite tree model, wherein the trained composite tree model can represent the corresponding relation between the characteristic information and the operation duration.

In some embodiments, the feature information and the statistical feature information include at least two sub-feature information; the composite tree model comprises at least one decision tree; the analysis module is further to: sequentially splitting nodes in each decision tree in the composite tree model based on each piece of sub-feature information in the at least two pieces of sub-feature information to obtain two leaf nodes and a weight value of each leaf node; and determining the predicted running time of the target operation according to the weight values of all leaf nodes to which the target operation belongs.

In some embodiments, the apparatus further comprises: a threshold determination module, configured to determine, based on the at least two sub-feature information, a height threshold of each decision tree in the composite tree model, a leaf number threshold of each decision tree, and a learning rate threshold of the composite tree model using a grid search method; a control module, configured to stop performing node splitting on nodes in the decision tree when at least one of the following conditions is met: the height of any decision tree in the composite tree model is larger than or equal to the height threshold, the number of leaves of any decision tree in the composite tree model is larger than or equal to the leaf number threshold, and the learning rate of the composite tree model is larger than or equal to the learning rate threshold.

In some embodiments, the second determination module is further configured to: acquiring target characteristic information of the operation to be analyzed; and inputting the target characteristic information into the composite tree model, and determining the operation running time of the operation to be analyzed under the target characteristic information through the composite tree model.

In some embodiments, the job to be analyzed and the target job are the same job; the characteristic information and the statistical characteristic information form a characteristic data set of the target operation; the second determination module is further to: modifying the characteristic data in the characteristic data set to obtain modified characteristic data; wherein the modified feature data form a test data set; determining the modified characteristic data in the test data set as target characteristic information of the operation to be analyzed; and inputting the target characteristic information into the composite tree model, and determining the operation running time of the operation to be analyzed under the target characteristic information through the composite tree model.

It should be noted that the description of the apparatus in the embodiment of the present application is similar to the description of the method embodiment, and has similar beneficial effects to the method embodiment, and therefore, the description is not repeated. For technical details not disclosed in the embodiments of the apparatus, reference is made to the description of the embodiments of the method of the present application for understanding.

Embodiments of the present application provide a computer program product or computer program comprising executable instructions, which are computer instructions; the executable instructions are stored in a computer readable storage medium. When the processor of the job operation duration determination device reads the executable instructions from the computer-readable storage medium, and the processor executes the executable instructions, the job operation duration determination device is caused to execute the method described above in the embodiment of the present application.

Embodiments of the present application provide a storage medium having stored therein executable instructions, which when executed by a processor, will cause the processor to perform a method provided by embodiments of the present application, for example, the method as illustrated in fig. 4.

In some embodiments, the storage medium may be a computer-readable storage medium, such as a Ferroelectric Random Access Memory (FRAM), a Read Only Memory (ROM), a Programmable Read Only Memory (PROM), an Erasable Programmable Read Only Memory (EPROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a flash Memory, a magnetic surface Memory, an optical disc, or a Compact disc Read Only Memory (CD-ROM), among other memories; or may be various devices including one or any combination of the above memories.

In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.

By way of example, executable instructions may, but need not, correspond to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). By way of example, executable instructions may be deployed to be executed on one computing device (which may be a job running duration determination device), or on multiple computing devices located at one site, or distributed across multiple sites and interconnected by a communication network.

The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims

1. A method for determining a job running time length is characterized by comprising the following steps:

acquiring historical data of a target job in a preset historical time period;

determining feature information belonging to a preset type from the historical data;

performing data statistics on the feature information to obtain statistical feature information;

performing distributed gradient enhancement analysis on the feature information and the statistical feature information based on a historical running duration in the historical data to obtain a correspondence between the feature information and the running duration;

and determining, based on the correspondence, the job running duration of a job to be analyzed under target feature information.

2. The method of claim 1, wherein the determining feature information belonging to a preset type from the historical data comprises:

screening out, from the historical data, a self attribute feature of the target job, the submission time of the target job, a cluster feature of the target job, and features of other jobs within the running time of the target job;

and determining the self attribute feature, the submission time, the cluster feature and the features of other jobs as feature information belonging to a preset type.

3. The method according to claim 2, wherein performing data statistics on the feature information to obtain statistical feature information comprises:

acquiring a feature set corresponding to a plurality of other jobs in a specific time period in the running time of the target job;

performing statistical analysis on the feature data in the feature set to obtain a statistical value of the number of other jobs, a statistical value of the number of machine cores used by the other jobs, and a statistical value of the memory used by the other jobs;

wherein the statistical value of the number of other jobs, the statistical value of the number of machine cores used by the other jobs, and the statistical value of the memory used by the other jobs constitute the statistical characteristic information.
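The statistics over concurrently running jobs recited in claim 3 can be illustrated with a minimal sketch. It is not the claimed implementation: the `OtherJob` record, its field names, and the use of the mean and maximum as the "statistical values" are assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class OtherJob:
    """Hypothetical record for a job that may overlap the target job's run."""
    start: float      # start time, in seconds
    end: float        # end time, in seconds
    cores: int        # machine cores used
    memory_gb: float  # memory used, in GB


def concurrent_job_stats(jobs, window_start, window_end):
    """Summarize the other jobs that overlap [window_start, window_end)."""
    active = [j for j in jobs if j.start < window_end and j.end > window_start]
    if not active:
        return {"job_count": 0, "cores_mean": 0.0, "cores_max": 0,
                "memory_mean": 0.0, "memory_max": 0.0}
    return {
        "job_count": len(active),
        "cores_mean": mean(j.cores for j in active),
        "cores_max": max(j.cores for j in active),
        "memory_mean": mean(j.memory_gb for j in active),
        "memory_max": max(j.memory_gb for j in active),
    }
```

The returned dictionary plays the role of the statistical feature information for one time window within the running time of the target job.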

4. The method of claim 3, wherein the submission time of the target job comprises at least one of: a submission date value, a submission hour value, and a submission minute value;

the performing data statistics on the feature information to obtain statistical feature information further includes:

sequentially performing aggregation processing on the self attribute feature, the submission time, the cluster feature and the features of other jobs based on the submission date value, the submission hour value and the submission minute value to obtain an aggregated self attribute feature, an aggregated submission time, an aggregated cluster feature and aggregated features of other jobs;

on the basis of the submission date value, performing statistical analysis on the aggregated self attribute feature, the aggregated submission time, the aggregated cluster feature and the aggregated features of other jobs in sequence to correspondingly obtain first change rules of the aggregated self attribute feature, the aggregated submission time, the aggregated cluster feature and the aggregated features of other jobs in a date dimension respectively;

on the basis of the submission hour value, performing statistical analysis on the aggregated self attribute feature, the aggregated submission time, the aggregated cluster feature and the aggregated features of other jobs in sequence to correspondingly obtain second change rules of the aggregated self attribute feature, the aggregated submission time, the aggregated cluster feature and the aggregated features of other jobs in an hour dimension respectively;

on the basis of the submission minute value, performing statistical analysis on the aggregated self attribute feature, the aggregated submission time, the aggregated cluster feature and the aggregated features of other jobs in sequence to correspondingly obtain third change rules of the aggregated self attribute feature, the aggregated submission time, the aggregated cluster feature and the aggregated features of other jobs in a minute dimension respectively;

wherein at least one of the first change rule, the second change rule, and the third change rule constitutes the statistical characteristic information.
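The aggregation by submission date, hour, and minute recited in claim 4 can be sketched as a group-by over the submission timestamp. The record layout (`submit_time` plus numeric feature fields) and the use of the per-bucket mean as the "change rule" are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def aggregate_by_time(records, feature):
    """Group one numeric feature by submission date, hour, and minute.

    `records` is a list of dicts holding a `submit_time` datetime and numeric
    feature values (hypothetical field names). Returns, for each dimension,
    a dict mapping each bucket to the mean feature value in that bucket.
    """
    buckets = {"date": defaultdict(list),
               "hour": defaultdict(list),
               "minute": defaultdict(list)}
    for record in records:
        t = record["submit_time"]
        buckets["date"][t.date().isoformat()].append(record[feature])
        buckets["hour"][t.hour].append(record[feature])
        buckets["minute"][t.minute].append(record[feature])
    return {dim: {key: mean(values) for key, values in groups.items()}
            for dim, groups in buckets.items()}
```

Each of the three returned mappings corresponds to one dimension (date, hour, minute); any one of them, or a combination, could serve as statistical feature information.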

5. The method of claim 1, further comprising:

determining a fluctuation value and a variance of each item of feature data in the feature information and the statistical feature information;

determining the feature data whose fluctuation value is smaller than a fluctuation threshold and whose variance is smaller than a variance threshold as target elimination data;

and eliminating the target elimination data from the feature information and the statistical feature information.
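The elimination step of claim 5 can be sketched as a filter over feature columns. The claim does not define the fluctuation value; the sketch assumes it is the range (maximum minus minimum) and uses the population variance, and the function and parameter names are hypothetical.

```python
from statistics import pvariance


def drop_low_information_features(columns, fluctuation_threshold, variance_threshold):
    """Return a copy of `columns` without near-constant features.

    `columns` maps a feature name to its list of observed values.
    Fluctuation is taken here as max - min, which is an assumption.
    """
    kept = {}
    for name, values in columns.items():
        fluctuation = max(values) - min(values)
        variance = pvariance(values)
        # Eliminate a feature only when BOTH measures fall below their thresholds.
        if fluctuation < fluctuation_threshold and variance < variance_threshold:
            continue
        kept[name] = values
    return kept
```

A feature that barely changes across the historical data carries little signal about the running duration, so removing it reduces the input size without changing what the model can learn.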

6. The method according to claim 1, wherein the performing distributed gradient enhancement analysis on the feature information and the statistical feature information based on the historical running duration in the historical data to obtain the correspondence between the feature information and the running duration comprises:

vectorizing the feature information and the statistical feature information to obtain a sample feature vector;

inputting the sample feature vector into a composite tree model;

performing distributed gradient enhancement analysis on the basis of the feature information and the statistical feature information through the composite tree model to obtain a predicted running duration of the target job;

inputting the predicted running duration and the historical running duration into a preset loss model to obtain a loss result;

and correcting the model parameters in the composite tree model according to the loss result to obtain a trained composite tree model, wherein the trained composite tree model represents the correspondence between the feature information and the running duration.
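The predict, compute loss, and correct cycle of claim 6 can be illustrated with a toy gradient-boosting loop. This is a simplified single-machine sketch, not the claimed distributed analysis: it assumes a squared-error loss, uses depth-one trees (stumps), and all function names and default values are hypothetical.

```python
def fit_stump(rows, residuals):
    """Find the single-feature threshold split that minimizes squared error."""
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [res for r, res in zip(rows, residuals) if r[f] <= threshold]
            right = [res for r, res in zip(rows, residuals) if r[f] > threshold]
            if not left or not right:
                continue
            lw, rw = sum(left) / len(left), sum(right) / len(right)
            err = (sum((x - lw) ** 2 for x in left)
                   + sum((x - rw) ** 2 for x in right))
            if best is None or err < best[0]:
                best = (err, f, threshold, lw, rw)
    return best[1:] if best else None


def stump_predict(stump, row):
    """Return the weight of the leaf that `row` falls into."""
    f, threshold, lw, rw = stump
    return lw if row[f] <= threshold else rw


def train(rows, durations, n_trees=50, learning_rate=0.3):
    """Boost stumps on the residuals of a squared-error loss."""
    base = sum(durations) / len(durations)
    trees = []
    preds = [base] * len(rows)
    for _ in range(n_trees):
        # For squared error, the negative gradient of the loss is the residual.
        residuals = [y - p for y, p in zip(durations, preds)]
        stump = fit_stump(rows, residuals)
        if stump is None:
            break
        trees.append(stump)
        preds = [p + learning_rate * stump_predict(stump, r)
                 for p, r in zip(preds, rows)]
    return base, trees, learning_rate


def predict(model, row):
    """Predicted duration: base value plus the scaled sum of leaf weights."""
    base, trees, learning_rate = model
    return base + learning_rate * sum(stump_predict(t, row) for t in trees)
```

Each round fits a new tree to what the current model still gets wrong, which is the sense in which the loss result corrects the model. A production system would more plausibly rely on an existing gradient-boosting library than on hand-written trees.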

7. The method according to claim 6, wherein the feature information and the statistical feature information comprise at least two pieces of sub-feature information; the composite tree model comprises at least one decision tree;

the performing distributed gradient enhancement analysis on the basis of the feature information and the statistical feature information through the composite tree model to obtain the predicted running duration of the target job comprises:

sequentially performing node splitting on nodes in each decision tree in the composite tree model based on each piece of sub-feature information in the at least two pieces of sub-feature information to obtain two leaf nodes and a weight value of each leaf node;

and determining the predicted running duration of the target job according to the weight values of all leaf nodes to which the target job belongs.
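The node splitting and leaf weighting of claim 7 can be sketched using the closed-form leaf weight common to second-order gradient-boosting objectives, w = -G / (H + lambda), where G and H are the sums of first and second derivatives of the loss over the samples in the leaf. The claim does not state this formula; its use here, the squared-error loss, and all names are assumptions.

```python
def leaf_weight(grads, hessians, reg_lambda=1.0):
    """Leaf weight for a second-order boosting objective: -G / (H + lambda)."""
    # Note: an empty leaf with reg_lambda == 0 would divide by zero.
    return -sum(grads) / (sum(hessians) + reg_lambda)


def split_node(values, grads, hessians, threshold, reg_lambda=1.0):
    """Split one node on a sub-feature and return the two leaf weights."""
    left = [i for i, v in enumerate(values) if v <= threshold]
    right = [i for i, v in enumerate(values) if v > threshold]
    w_left = leaf_weight([grads[i] for i in left],
                         [hessians[i] for i in left], reg_lambda)
    w_right = leaf_weight([grads[i] for i in right],
                          [hessians[i] for i in right], reg_lambda)
    return w_left, w_right


def predict_duration(base, leaf_weights, learning_rate=1.0):
    """Sum the weights of the leaves the job falls into, one leaf per tree."""
    return base + learning_rate * sum(leaf_weights)
```

For a squared-error loss of the form (1/2)(prediction - actual)^2, the gradient of each sample is prediction minus actual and the Hessian is 1, so with lambda set to 0 the leaf weight reduces to the mean residual of the samples in the leaf.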

8. The method of claim 7, further comprising:

determining, by using a grid search method based on the at least two pieces of sub-feature information, a height threshold of each decision tree in the composite tree model, a leaf number threshold of each decision tree in the composite tree model, and a learning rate threshold of the composite tree model;

stopping node splitting of nodes in the decision tree when at least one of the following conditions is met:

the height of any decision tree in the composite tree model is greater than or equal to the height threshold, the number of leaves of any decision tree in the composite tree model is greater than or equal to the leaf number threshold, and the learning rate of the composite tree model is greater than or equal to the learning rate threshold.
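The grid search and stopping test of claim 8 can be sketched generically. The `evaluate` callback (for example, a validation error for a model trained with the candidate thresholds) and the dictionary keys are hypothetical; the claim does not specify how candidates are scored.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every combination in `param_grid`; return the one with the lowest score."""
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for combo in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, combo))
        score = evaluate(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def should_stop_splitting(height, leaf_count, learning_rate, limits):
    """True when at least one of the three conditions in the claim holds."""
    return (height >= limits["max_height"]
            or leaf_count >= limits["max_leaves"]
            or learning_rate >= limits["max_learning_rate"])
```

Grid search is exhaustive, so its cost grows with the product of the candidate list lengths; three short lists, as in the claim, keep that product small.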

9. The method according to claim 6, wherein the determining the job running duration of the job to be analyzed under the target feature information based on the correspondence comprises:

acquiring the target feature information of the job to be analyzed;

and inputting the target feature information into the trained composite tree model, and determining the job running duration of the job to be analyzed under the target feature information through the trained composite tree model.

10. The method of claim 6, wherein the job to be analyzed and the target job are the same job; the feature information and the statistical feature information form a feature data set of the target job;

the determining the job running duration of the job to be analyzed under the target feature information based on the correspondence comprises:

modifying the feature data in the feature data set to obtain modified feature data, wherein the modified feature data form a test data set;

determining the modified feature data in the test data set as the target feature information of the job to be analyzed;
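The modification step of claim 10 amounts to a what-if analysis: one feature of the target job is varied while the others are held fixed, and each modified copy becomes a row of the test data set. In the sketch below, `model_predict` stands in for any trained model that maps feature data to a running duration; it and the feature names are hypothetical.

```python
def build_test_set(feature_row, feature_name, candidate_values):
    """Copy one job's feature data once per candidate value of a single feature."""
    return [{**feature_row, feature_name: v} for v in candidate_values]


def what_if(model_predict, feature_row, feature_name, candidate_values):
    """Predict the running duration for each modified copy of the feature data."""
    test_set = build_test_set(feature_row, feature_name, candidate_values)
    return [(row[feature_name], model_predict(row)) for row in test_set]
```

Comparing the predicted durations across the candidate values shows how strongly that single feature influences the running duration of the job.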

11. An apparatus for determining a job running duration, the apparatus comprising:

an acquisition module, configured to acquire historical data of a target job in a preset historical time period;

a first determining module, configured to determine feature information belonging to a preset type from the historical data;

a data statistics module, configured to perform data statistics on the feature information to obtain statistical feature information;

an analysis module, configured to perform distributed gradient enhancement analysis on the feature information and the statistical feature information based on a historical running duration in the historical data to obtain a correspondence between the feature information and the running duration;

and a second determining module, configured to determine, based on the correspondence, the job running duration of a job to be analyzed under target feature information.

12. A device for determining a job running duration, comprising:

a memory for storing executable instructions; and a processor configured to execute the executable instructions stored in the memory to implement the method for determining a job running duration according to any one of claims 1 to 10.

13. A computer-readable storage medium having stored thereon executable instructions which, when executed, cause a processor to implement the method for determining a job running duration according to any one of claims 1 to 10.