CN115495251B

CN115495251B - Intelligent control method and system for computing resources in data integration operation

Info

Publication number: CN115495251B
Application number: CN202211440650.9A
Authority: CN
Inventors: 曹源
Original assignee: Beijing Deepexi Technology Co Ltd
Current assignee: Beijing Deepexi Technology Co Ltd
Priority date: 2022-11-17
Filing date: 2022-11-17
Publication date: 2023-02-07
Anticipated expiration: 2042-11-17
Also published as: CN115495251A

Abstract

The invention provides a method and a system for intelligently controlling computing resources in data integration operations. The method comprises: while a data integration task is executing, monitoring the system log and the resident memory usage of the DEC; determining a computing resource allocation strategy based on the system log and the resident memory usage; and executing the computing resource allocation strategy. The method and system can effectively reduce abnormal task interruptions caused by OOM (out-of-memory) errors in data integration tasks. Through a mechanism that dynamically increases memory resources, they ensure the stable operation of high-priority tasks and reduce the probability that a task interrupted midway affects downstream services. In addition, because allocation is performed intelligently within a single server node, the resource allocation pressure on the whole cluster can be effectively relieved and enterprise operation and maintenance costs are reduced.

Description

Intelligent control method and system for computing resources in data integration operation

Technical Field

The invention relates to the field of computer software, in particular to a method and a system for intelligently controlling computing resources in data integration operation.

Background

At present, data-driven services are used in more and more big-data scenarios, and the demands for computation around data keep growing. Across the whole data processing chain, data integration (ETL or ELT tasks) is the first step, and a very important one, for any enterprise seeking to extract value from its data. Data integration jobs either extract data in real time to keep real-time services running, or act as upstream or downstream dependencies of offline computing tasks, so the requirements on their stability are high. Within a single compute node, a key factor in whether a process can run stably is memory, and whether memory is allocated appropriately has a critical influence on whether a data integration job runs stably. Existing data integration products, such as DataX and Canal, generally focus only on extracting from data sources and writing to target databases. Their resource control stops at allocating resources to a given node, which is relatively coarse-grained. With Docker container technology, resource allocation is relatively fixed and inflexible once completed: the resource quota does not change for the entire lifetime of the task. If the preset resources are too high, resources are wasted; if they are too low, the task often fails with OOM errors, which affects downstream services.

Therefore, a solution is needed.

Disclosure of Invention

One of the objectives of the present invention is to provide an intelligent control method for computing resources in data integration operations that can effectively reduce abnormal task interruptions caused by OOM errors in data integration tasks. Through a mechanism that dynamically increases memory resources, the method ensures the stable operation of high-priority tasks and reduces the probability that a task interrupted midway affects downstream services. In addition, because allocation is performed intelligently within a single server node, the resource allocation pressure on the whole cluster can be effectively relieved and enterprise operation and maintenance costs are reduced.

The embodiment of the invention provides an intelligent control method for computing resources in data integration operation, which comprises the following steps:

while the data integration task is executing, monitoring the system log and the resident memory usage of the DEC;

determining a computing resource allocation strategy based on the system log and the resident memory usage;

executing the computing resource allocation strategy.
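The three steps above form a monitor, determine, execute cycle. The following is a minimal Python sketch of one such cycle, assuming the monitoring, decision, and enforcement logic are supplied as callables; the names `Observation`, `control_cycle`, `monitor`, `determine`, and `execute` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Observation:
    """One monitoring sample for a DEC (hypothetical field names)."""
    dec_id: str
    log_lines: List[str] = field(default_factory=list)  # new system-log lines
    res_bytes: int = 0                                  # resident memory (RES)


def control_cycle(
    monitor: Callable[[], Observation],
    determine: Callable[[Observation], Optional[str]],
    execute: Callable[[str, str], None],
) -> Optional[str]:
    """Run one monitor -> determine -> execute cycle and return the chosen action."""
    observation = monitor()              # step 1: monitor system log and RES usage
    action = determine(observation)      # step 2: determine a strategy (or none)
    if action is not None:
        execute(observation.dec_id, action)  # step 3: execute the strategy
    return action
```

In a long-running controller, this cycle would presumably be repeated at a fixed sampling interval for every registered DEC.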

Preferably, determining a computing resource allocation strategy based on the system log and the resident memory usage comprises:

determining whether an OOM event occurs based on the system log;

if yes, determining that the computing resource allocation strategy is: if the DEC is at level P1, restarting the data integration task and increasing the memory resource allocation of the DEC; if the DEC is at level P2, restarting the data integration task;

if the DEC is at level P0, determining whether the RES occupancy of the DEC is close to cgroup.limits;

if yes, determining that the computing resource allocation strategy is: actively increasing the memory resource allocation of the DEC;

if the DEC is at level P2 or P3, determining, based on the resident memory usage, whether the RES occupancy of the DEC stays continuously below a preset percentage;

if yes, determining that the computing resource allocation strategy is: actively reducing the memory resource allocation of the DEC.
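The first step above, detecting an OOM event from the system log, can be sketched as a pattern match over kernel log lines. This sketch assumes the Linux OOM killer's usual message formats ("Out of memory: Killed process ..." on newer kernels, "Memory cgroup out of memory: Killed process ..." when a cgroup limit is hit, and "Out of memory: Kill process ..." on older kernels). Exact wording varies by kernel version, so the regular expression is an assumption. The sketch also assumes that the PID in the log is the one the controller knows for the DEC process.

```python
import re
from typing import Iterable, Set

# Assumed kernel OOM-killer message shapes (wording varies by kernel version):
#   "Out of memory: Killed process 1234 (java) ..."
#   "Memory cgroup out of memory: Killed process 1234 (java) ..."
#   "Out of memory: Kill process 1234 (java) score 900 or sacrifice child"
OOM_PATTERN = re.compile(r"out of memory: kill(?:ed)? process (\d+)", re.IGNORECASE)


def oom_killed_pids(log_lines: Iterable[str]) -> Set[int]:
    """Return every PID the OOM killer reports as killed in the given lines."""
    pids = set()
    for line in log_lines:
        match = OOM_PATTERN.search(line)
        if match:
            pids.add(int(match.group(1)))
    return pids


def dec_hit_oom(log_lines: Iterable[str], dec_pid: int) -> bool:
    """True if the (single-process) DEC identified by dec_pid was OOM-killed."""
    return dec_pid in oom_killed_pids(log_lines)
```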

Preferably, the P0 level is an active guarantee level;

the P1 level is a failure retry guarantee level;

the P2 level is a conventional level;

the P3 level is a low priority level.

Preferably, the method for intelligently controlling computing resources in data integration operation further comprises:

when a user submits a manual remote computing resource control request, acquiring task information of the data integration task;

performing feature extraction on the task information to obtain a plurality of task features;

constructing a first task description vector based on the plurality of task features;

acquiring a preset computing resource control expert database, wherein the expert database comprises multiple pairs of expert nodes and second task description vectors in one-to-one correspondence;

acquiring node information of each expert node, wherein the node information comprises the customer evaluation records obtained from manual remote computing resource control performed by the expert within the most recent preset first time period;

selecting preferred expert nodes from the expert nodes based on the node information;

calculating a first vector similarity between the first task description vector and the second task description vector corresponding to each preferred expert node;

taking the preferred expert node with the maximum first vector similarity as the suitable expert node;

continuously delivering the system log and the resident memory usage to the suitable expert node;

acquiring the computing resource control strategy replied by the suitable expert node;

executing the computing resource control strategy.

Preferably, selecting preferred expert nodes from the expert nodes based on the node information comprises:

preprocessing the node information;

performing feature extraction on the preprocessing result to obtain a plurality of node information features;

selecting preferred expert nodes from the expert nodes based on the plurality of node information features.

Preferably, preprocessing the node information comprises:

extracting, from the customer evaluation records, the gaze movement trajectory recorded within a preset second time period before and after the customer selects each evaluation option while filling in the evaluation questionnaire;

traversing the evaluation options in reverse order of their position in the evaluation questionnaire;

during each traversal, extracting the content structure of the traversed evaluation option;

acquiring a preset first trajectory description vector corresponding to the content structure;

performing feature extraction on the gaze movement trajectory within the preset second time period before and after the customer selects the traversed evaluation option to obtain a plurality of trajectory features;

constructing a second trajectory description vector based on the plurality of trajectory features;

calculating a second vector similarity between the second trajectory description vector and the first trajectory description vector;

if the second vector similarity is less than or equal to a preset vector similarity threshold, rejecting the corresponding customer evaluation record;

completing the preprocessing once all customer evaluation records in the node information that need to be rejected have been rejected;

and/or,

extracting, from the customer evaluation records, the dwell time when the customer selects each evaluation option while filling in the evaluation questionnaire;

traversing the evaluation options in reverse order of their position in the evaluation questionnaire;

during each traversal, acquiring a preset dwell time threshold corresponding to the traversed evaluation option;

if the dwell time when the customer selects the traversed evaluation option while filling in the evaluation questionnaire is less than or equal to the dwell time threshold, rejecting the corresponding customer evaluation record;

completing the preprocessing once all customer evaluation records in the node information that need to be rejected have been rejected.
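The gaze-based preprocessing above can be sketched as a per-record filter. This is a minimal sketch under several assumptions. The "vector similarity" is taken to be cosine similarity, which the text does not specify. Each record is a dict with hypothetical keys `structures` (the content structure of each option, in questionnaire order) and `gaze` (the already-extracted second trajectory description vector per option). The reference table maps a content structure to its first trajectory description vector.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (assumed metric); 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def gaze_check_passes(record: Dict, references: Dict[str, Sequence[float]],
                      threshold: float) -> bool:
    """Return False as soon as one option's gaze trajectory looks careless."""
    structures = record["structures"]  # hypothetical key: content structure per option
    gaze_vectors = record["gaze"]      # hypothetical key: second trajectory vectors
    # Traverse the options in reverse questionnaire order, as the method specifies.
    for structure, gaze in zip(reversed(structures), reversed(gaze_vectors)):
        if cosine_similarity(gaze, references[structure]) <= threshold:
            return False  # similarity <= threshold: reject this evaluation record
    return True


def preprocess(records: List[Dict], references: Dict[str, Sequence[float]],
               threshold: float) -> List[Dict]:
    """Keep only the evaluation records whose every option passes the gaze check."""
    return [r for r in records if gaze_check_passes(r, references, threshold)]
```

The trajectory features themselves (for example, vertical progression and fixation counts) are not defined in the source and are treated here as pre-computed vectors.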

Preferably, selecting preferred expert nodes from the expert nodes based on the plurality of node information features comprises:

constructing a first node description vector based on the plurality of node information features;

acquiring a preset node evaluation library, wherein the node evaluation library comprises multiple pairs of second node description vectors and first evaluation values in one-to-one correspondence;

matching the first node description vector against each second node description vector;

if a match is found and the first evaluation value corresponding to the matched second node description vector is greater than or equal to a preset first evaluation threshold, taking the corresponding expert node as a preferred expert node.

An embodiment of the invention provides an intelligent control system for computing resources in data integration operations, comprising:

a monitoring module, configured to monitor the system log and the resident memory usage of the DEC while the data integration task is executing;

a determining module, configured to determine a computing resource allocation strategy based on the system log and the resident memory usage;

an execution module, configured to execute the computing resource allocation strategy.

Preferably, the determining module determining the computing resource allocation strategy based on the system log and the resident memory usage comprises:

determining whether an OOM event occurs based on the system log;

if the DEC is at level P0, determining whether the RES occupancy of the DEC is close to cgroup.limits.

Preferably, the P0 level is an active guarantee level;

the P1 level is a failure retry guarantee level;

the P2 level is a conventional level;

the P3 level is a low priority level.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:

FIG. 1 is a diagram illustrating a method for intelligently controlling computing resources in a data integration operation according to an embodiment of the present invention;

FIG. 2 is a diagram illustrating an exemplary application of a method for intelligent control of computing resources in data integration operations according to an embodiment of the present invention;

FIG. 3 is a diagram of an intelligent control system for computing resources in data integration operations according to an embodiment of the present invention.

Detailed Description

The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it should be understood that they are presented herein only to illustrate and explain the present invention and not to limit the present invention.

An embodiment of the present invention provides an intelligent control method for computing resources in data integration operations, as shown in fig. 1, including:

Step 1: while the data integration task is executing, monitoring the system log and the resident memory usage of the DEC;

Step 2: determining a computing resource allocation strategy based on the system log and the resident memory usage;

Step 3: executing the computing resource allocation strategy.

Determining a computing resource allocation strategy based on the system log and the resident memory usage comprises:

determining whether an OOM event occurs based on the system log;

The P0 level is an active guarantee level;

the P1 level is a failure retry guarantee level;

the P2 level is a conventional level;

the P3 level is a low priority level.

The working principle and the beneficial effects of the technical scheme are as follows:

The minimum unit controlled by the control method is the DEC (Deepexi Data Extract Container), a program bundle independently developed by Deepexi Technology for the data integration field. It supports data integration among multiple heterogeneous data sources and is delivered as a professional, out-of-the-box container image. A DEC is essentially a set of packages that can be run directly according to a configuration file, and at runtime it is a single independent process.

Four levels of task queues are defined, each corresponding to a different resource control policy, to handle data integration tasks in different scenarios. Generally, a data integration task intended for testing is set to the low priority level P3. The DEC levels are shown in Table 1 below:

Resource control of a DEC is based on the cgroup mechanism, with memory as the key resource controlled by the method. Control is triggered either actively or passively and is implemented mainly by combining system log monitoring with RES (resident memory) monitoring of each DEC; the structure and the control policies are shown in FIG. 2.
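Because control rests on cgroups, raising or lowering a DEC's memory allocation ultimately means rewriting its cgroup memory limit. The description calls this limit `cgroup.limits`. On a Linux cgroup v2 hierarchy, the corresponding interface file is `memory.max` (cgroup v1 uses `memory.limit_in_bytes`). The following sketch assumes cgroup v2, a known per-DEC cgroup directory, and an operator-configured step size and minimum limit; these parameter names are hypothetical. Lowering `memory.max` below current usage makes the kernel reclaim memory and can invoke the OOM killer, which is why the sketch clamps the new limit to a floor.

```python
from pathlib import Path
from typing import Optional


def read_limit(cgroup_dir: Path) -> Optional[int]:
    """Read a cgroup v2 memory limit in bytes; None means unlimited ("max")."""
    raw = (cgroup_dir / "memory.max").read_text().strip()
    return None if raw == "max" else int(raw)


def adjust_limit(cgroup_dir: Path, delta_bytes: int, floor_bytes: int) -> int:
    """Add delta_bytes (negative to shrink) to memory.max, never going below floor_bytes."""
    current = read_limit(cgroup_dir)
    if current is None:
        raise ValueError("cgroup has no memory limit to adjust")
    new_limit = max(floor_bytes, current + delta_bytes)
    (cgroup_dir / "memory.max").write_text(f"{new_limit}\n")
    return new_limit
```

Writing to a real cgroup directory requires appropriate privileges. The test below uses a temporary directory that stands in for the cgroup.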

DEC-C (hereinafter, the Controller) is the physical implementation of the control method. It is an independently running process that receives the basic startup information of each DEC and controls the resources of the actual controlled units associated with that DEC according to the control level assigned to it.

Computing resource allocation policies are divided into active policies and passive policies. Active policies are based on monitoring the resident memory usage of each DEC and are triggered according to the DEC level. When the RES occupancy of a P0-level DEC is detected to be close to cgroup.limits, an active increase of that DEC's resource allocation is triggered. When the RES occupancy of a P2- or P3-level DEC is detected to remain continuously below a preset percentage, an active reduction of that DEC's resource allocation is triggered. Passive policies are based on monitoring the system log and are triggered according to the DEC level when an OOM (out-of-memory) event is found. At level P1, the task is restarted and the memory resource allocation of the DEC is increased. At level P2, the task is restarted without increasing memory. At level P3, the task is not restarted. The amount by which the resource allocation of a DEC is increased or decreased can be configured in advance by operations staff.
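The active and passive policies above can be combined into one decision function. The following sketch follows the level rules as described. Its thresholds are assumptions, because the description gives no figures: "close to the limit" is taken as RES ≥ 90% of the limit, the "preset percentage" as 30%, and "continuously" as the last five samples. The description also does not say what happens when a P0-level DEC hits OOM, so the sketch returns no action in that case.

```python
from enum import Enum
from typing import List, Sequence


class Level(Enum):
    P0 = "active guarantee"
    P1 = "failure retry guarantee"
    P2 = "regular"
    P3 = "low priority"


def decide(
    level: Level,
    oom_detected: bool,
    res_history: Sequence[int],  # recent RES samples in bytes, oldest first
    limit_bytes: int,            # current cgroup memory limit of the DEC
    near_ratio: float = 0.9,     # assumed meaning of "close to the limit"
    low_ratio: float = 0.3,      # assumed "preset percentage"
    low_samples: int = 5,        # assumed length of "continuously"
) -> List[str]:
    """Return the actions the controller should take for one DEC."""
    # Passive policy: triggered by an OOM event found in the system log.
    if oom_detected:
        if level is Level.P1:
            return ["restart", "increase_memory"]
        if level is Level.P2:
            return ["restart"]  # restart, memory is not increased
        return []               # P3: no restart; P0 on OOM is not specified
    # Active policy: triggered by RES monitoring.
    if not res_history:
        return []
    if level is Level.P0 and res_history[-1] >= near_ratio * limit_bytes:
        return ["increase_memory"]
    if level in (Level.P2, Level.P3):
        recent = res_history[-low_samples:]
        if len(recent) == low_samples and all(r < low_ratio * limit_bytes for r in recent):
            return ["decrease_memory"]
    return []
```

Returning a list of actions, rather than acting directly, keeps the decision separate from enforcement. This separation mirrors the split between the determining module and the execution module.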

The key parameters of a DEC are shown in Table 2 below:

The key parameters of the Controller are shown in the following table:

When the computing resources of a task are found to be insufficient or about to be fully occupied, they are automatically increased, which avoids failures caused by insufficient resource allocation. When a task's resource utilization is found to be low and its computing resources are over-allocated, the excess preset resources are automatically reclaimed. When a task is found to have failed because of insufficient resources, it is restarted automatically and its resource allocation can be raised at the same time. Within a single compute node, this avoids task delays, failures, and similar problems caused by mismatched preset resource allocations, achieves effective resource allocation, and improves the computing efficiency of the node. Because each node uses its resources more efficiently, the total resources required by the whole job cluster can be reduced appropriately, which saves hardware cost. After initialization, resource allocation within each node is fully automatic, which reduces manual operation and maintenance costs. With this control method, data integration tasks can be managed effectively at the memory level, which reduces problems caused by resource allocations that no longer match fluctuating data volumes. The method can effectively reduce abnormal task interruptions caused by OOM errors in data integration tasks. Through a mechanism that dynamically increases memory resources, it ensures the stable operation of high-priority tasks and reduces the probability that a task interrupted midway affects downstream services. In addition, because allocation is performed intelligently within a single server node, the resource allocation pressure on the whole cluster can be effectively relieved and enterprise operation and maintenance costs are reduced.

In one embodiment, the method for intelligently controlling computing resources in data integration operation further comprises the following steps:

constructing a first task description vector based on a plurality of task features;

executing the computing resource control strategy.

Generally, some data integration customers require manual remote computing resource control. Manual control is more user-friendly because the system can communicate with the customer in real time during the control process. This raises the problem of how to assign manual remote computing resource control to engineers.

When a user submits such a request, task information of the data integration task is acquired; the task information may include, for example, the task type and the data volume. Task features, such as the task type and the data volume, are extracted from the task information, and a first task description vector is constructed. The one-to-one correspondence between expert nodes and second task description vectors is as follows. Each expert node is a network node connected to the operating terminal of a back-end engineer who can serve customers with manual remote computing resource control. The corresponding second task description vector is constructed from features extracted from the task information of the data integration tasks whose computing resource control the engineer specializes in. Preferred expert nodes with better evaluations are screened out based on the node information of the expert nodes. The first vector similarity between the first task description vector and the second task description vector of each preferred expert node is then calculated. The greater the first vector similarity, the better the corresponding engineer is at controlling the computing resources of this kind of data integration task, so the preferred expert node with the maximum first vector similarity is taken as the suitable expert node. The system log and resident memory usage are continuously delivered to the suitable expert node. The engineer can view them through the operating terminal and reply with a computing resource control strategy, which is finally executed.
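The expert-selection step can be sketched as building a task vector and taking the preferred expert node with the highest similarity. The following sketch makes several assumptions. The task-type vocabulary in `TASK_TYPES` is hypothetical. The data volume is encoded as log10(rows + 1) / 10 only so that it stays on a scale comparable to the one-hot part. Cosine similarity stands in for the unspecified "vector similarity".

```python
import math
from typing import Dict, List, Sequence

TASK_TYPES = ["batch", "realtime", "cdc"]  # hypothetical task-type vocabulary


def task_vector(task_type: str, data_rows: int) -> List[float]:
    """Build a task description vector: one-hot task type plus scaled data volume."""
    one_hot = [1.0 if t == task_type else 0.0 for t in TASK_TYPES]
    return one_hot + [math.log10(data_rows + 1) / 10.0]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0.0 or nb == 0.0 else dot / (na * nb)


def pick_expert(first_vec: Sequence[float], preferred: Dict[str, Sequence[float]]) -> str:
    """Return the preferred expert node whose second vector is most similar."""
    if not preferred:
        raise ValueError("no preferred expert nodes available")
    return max(preferred, key=lambda node: cosine(first_vec, preferred[node]))
```

For example, a real-time task of about one million rows is routed to an engineer whose specialty vector describes real-time tasks rather than to one who specializes in batch tasks of the same size.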

When manual remote computing resource control is assigned, the engineer who has good evaluations and is most skilled at the customer's kind of data integration task is selected to perform the control. This improves both the suitability and the efficiency of manual assignment.

In one embodiment, selecting preferred expert nodes from the expert nodes based on the node information comprises:

preprocessing the node information;

The purpose of the preprocessing is to remove customer evaluations of low authenticity from the node information. Node information features, such as the numbers of positive, neutral, and negative reviews, are then extracted from the preprocessing result, and preferred expert nodes are selected from the expert nodes based on these features. This makes engineer selection more reasonable.
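Extracting node information features can be sketched as counting review categories over the records that fall within the most recent preset time window. This sketch assumes each evaluation record carries a timestamp and a single overall rating on the scale of the sample questionnaire (1 = excellent, 2 = average, 3 = poor). It also assumes that the records passed in have already been preprocessed. The record layout and label names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class EvaluationRecord:
    timestamp: float  # seconds since epoch (hypothetical field)
    rating: int       # 1 = excellent, 2 = average, 3 = poor


LABELS = {1: "positive", 2: "neutral", 3: "negative"}


def node_features(records: Iterable[EvaluationRecord], now: float,
                  window_seconds: float) -> Dict[str, int]:
    """Count positive/neutral/negative reviews within the most recent time window."""
    recent = [r for r in records if now - r.timestamp <= window_seconds]
    counts = Counter(LABELS[r.rating] for r in recent if r.rating in LABELS)
    return {label: counts.get(label, 0) for label in ("positive", "neutral", "negative")}
```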

In one embodiment, the node information is preprocessed, including:

if the second vector similarity is less than or equal to a preset vector similarity threshold, rejecting the corresponding customer evaluation record;

and/or,

if the dwell time when the customer selects the traversed evaluation option while filling in the evaluation questionnaire is less than or equal to the dwell time threshold, rejecting the corresponding customer evaluation record;

Typically, a customer fills in an evaluation questionnaire when giving an evaluation. The questionnaire has several evaluation options, for example: "How much did the engineer's remote assistance contribute to the stability of your data integration? 1. Excellent; 2. Average; 3. Poor." Evaluations of low authenticity are identified from how the customer filled in each evaluation option, in one of two ways:

First, the gaze movement trajectory within a preset second time period before and after the customer selects each evaluation option while filling in the questionnaire is extracted from the customer evaluation record. The preset second time period may be, for example, 12 seconds; gaze-trajectory acquisition is prior art and is not described in detail here. The content structure of the evaluation option is extracted, and the preset first trajectory description vector corresponding to that content structure is introduced. For example, if the content is laid out as text running from top to bottom, a user who reads it carefully produces a gaze trajectory that moves from top to bottom with a number of fixation points, and the first trajectory description vector is built from the features of such a careful-reading trajectory. Trajectory features are extracted from the actual gaze movement trajectory to construct a second trajectory description vector, and the second vector similarity between the second and first trajectory description vectors is calculated. The greater the second vector similarity, the more carefully the customer read the evaluation option and the more genuine the answer. If the second vector similarity is less than or equal to the preset vector similarity threshold, the customer did not read the option carefully, and the evaluation record should be rejected.

Second, the dwell time when the customer selects each evaluation option while filling in the questionnaire is extracted from the customer evaluation record. The dwell time can be determined from how long the page showing that evaluation option was continuously displayed. A preset dwell time threshold corresponding to the evaluation option is introduced, representing the minimum time a user needs to read the option carefully. If the dwell time is less than or equal to this threshold, the customer did not read the option carefully, and the evaluation record should be rejected.

This application preprocesses customer evaluations in these two ways to reject evaluations of low authenticity. This indirectly improves the accuracy of engineer selection while also ensuring fairness.

In addition, users generally fill in the earlier evaluation options carefully but may answer the later ones perfunctorily, for example from impatience. Traversing the evaluation options in reverse questionnaire order therefore tends to reach careless answers first, which improves rejection efficiency.
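The dwell-time check, including the reverse traversal just described, can be sketched as follows. The sketch assumes that each record is a dict with hypothetical keys `options` (option identifiers in questionnaire order) and `dwell` (seconds the option's page was continuously displayed). It also assumes that thresholds are looked up per option identifier. Returning at the first failing option is what lets the reverse order pay off: a rushed final answer rejects the record after a single comparison.

```python
from typing import Dict, List, Sequence


def dwell_check_passes(option_ids: Sequence[str], dwell_seconds: Sequence[float],
                       thresholds: Dict[str, float]) -> bool:
    """Return False at the first option whose dwell time is at or below its threshold."""
    # Walk from the last option to the first: later answers are the ones most
    # likely to be rushed, so a rejection is usually found after few checks.
    for option, dwell in zip(reversed(option_ids), reversed(dwell_seconds)):
        if dwell <= thresholds[option]:
            return False
    return True


def reject_rushed(records: List[Dict], thresholds: Dict[str, float]) -> List[Dict]:
    """Keep only evaluation records in which every option was read long enough."""
    return [r for r in records
            if dwell_check_passes(r["options"], r["dwell"], thresholds)]
```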

In one embodiment, selecting preferred expert nodes from the expert nodes based on the plurality of node information features comprises:

matching the first node description vector against each second node description vector;

A first node description vector is constructed based on the plurality of node information features. The one-to-one correspondence between second node description vectors and first evaluation values is as follows. Each second node description vector is constructed in advance from a particular set of node information features. The first evaluation value is assigned in advance according to how good the expert's evaluations are, as reflected by that node information; the larger the first evaluation value, the better the evaluation. For example, a second node description vector constructed from the node information feature "20 negative reviews" has a first evaluation value of 0. The first node description vector is matched against each second node description vector. A match means that the situation reflected by the node information corresponds to that of the matched second node description vector, and the corresponding first evaluation value is output. If the first evaluation value is greater than or equal to a preset first evaluation threshold, the node's evaluations are good and it is taken as a preferred expert node. Building and matching vectors in this way quickly determines how well each node is evaluated, which improves evaluation efficiency.
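The text does not say how "matching" is performed. One simple reading is an exact lookup of a discretized feature vector in the node evaluation library, which the following sketch assumes. The bucket boundaries (0, 1 to 9, 10 or more reviews), the library contents, and the threshold are all hypothetical. The "20 negative reviews with first evaluation value 0" case from the example above appears in the test.

```python
from typing import Dict, Tuple

Vector = Tuple[int, int, int]


def bucket(count: int) -> int:
    """Hypothetical discretization: 0 = none, 1 = 1-9 reviews, 2 = 10 or more."""
    if count == 0:
        return 0
    return 1 if count < 10 else 2


def node_vector(features: Dict[str, int]) -> Vector:
    """First node description vector from positive/neutral/negative review counts."""
    return (bucket(features["positive"]), bucket(features["neutral"]),
            bucket(features["negative"]))


def is_preferred(features: Dict[str, int], library: Dict[Vector, float],
                 threshold: float) -> bool:
    """Match against the library; preferred only if matched and value >= threshold."""
    value = library.get(node_vector(features))  # None means no match was found
    return value is not None and value >= threshold
```

A nearest-neighbour match would be an equally plausible reading of the text; exact lookup is used here only because it is the simplest form of matching.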

An embodiment of the present invention provides an intelligent control system for computing resources in data integration operations, as shown in fig. 3, including:

the monitoring module 1 is configured to monitor the system log and the resident memory usage of the DEC while the data integration task is executing;

the determining module 2 is configured to determine a computing resource allocation strategy based on the system log and the resident memory usage;

the execution module 3 is configured to execute the computing resource allocation strategy.

The determining module 2 determines a computing resource allocation strategy based on the system log and the resident memory usage, and comprises the following steps:

determining whether an OOM event occurs based on the system log;

The P0 level is an active guarantee level;

the P1 level is a failure retry guarantee level;

the P2 level is a conventional level;

the P3 level is a low priority level.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. An intelligent control method for computing resources in data integration operation is characterized by comprising the following steps:

while the data integration task is executing, monitoring the system log and the resident memory usage of the DEC;

determining a computing resource allocation strategy based on the system log and the resident memory usage;

executing the computing resource allocation strategy;

further comprising:

when a user submits a manual remote computing resource control request, acquiring task information of the data integration task;

selecting preferred expert nodes from the expert nodes based on the node information;

calculating a first vector similarity between the first task description vector and the second task description vector corresponding to each preferred expert node;

continuously delivering the system log and the resident memory usage to the suitable expert node;

and executing the computing resource control strategy.

2. The method as claimed in claim 1, wherein determining a computing resource allocation policy based on the system log and the resident memory usage comprises:

determining whether an OOM event occurs based on the system log;

if yes, determining that the computing resource allocation strategy is as follows: if the DEC is in a P1 level, restarting the data integration task, and increasing the memory resource allocation of the DEC; if the DEC is in a P2 level, restarting the data integration task;

if the DEC is at level P0, determining whether the RES occupancy of the DEC is close to cgroup.limits;

if yes, determining that the computing resource allocation strategy is: actively increasing the memory resource allocation of the DEC;

if the DEC is in a P2 level or a P3 level, determining whether the RES occupation of the DEC is continuously lower than a preset percentage or not based on the use condition of the resident memory;

if yes, determining that the computing resource allocation strategy is as follows: actively reducing memory resource allocation of the DEC.

3. The method of claim 2, wherein the P0 level is an active guarantee level;

the P1 level is a failure retry guarantee level;

the P2 level is a regular level;

the P3 level is a low priority level.

4. The method as claimed in claim 1, wherein the step of selecting a better expert node from the expert nodes based on the node information comprises:

preprocessing the node information;

and selecting a better expert node from the expert nodes based on the plurality of node information characteristics.

5. The method as claimed in claim 4, wherein the preprocessing of the node information comprises:

extracting the content structure in the traversed evaluation option during each traversal;

acquiring a preset first track description vector corresponding to the content structure;

if the second vector similarity is smaller than or equal to a preset vector similarity threshold value, rejecting the corresponding customer evaluation record;

completing the preprocessing once all customer evaluation records in the node information that need to be rejected have been rejected;

and/or,

if the dwell time when the customer selects the traversed evaluation option while filling in the evaluation questionnaire is less than or equal to the dwell time threshold, rejecting the corresponding customer evaluation record;

completing the preprocessing once all customer evaluation records in the node information that need to be rejected have been rejected.

6. The method as claimed in claim 4, wherein selecting preferred expert nodes from the expert nodes based on the plurality of node information features comprises:

constructing a first node description vector based on the plurality of node information characteristics;

matching the first node description vector with any of the second node description vectors;

7. An intelligent control system for computing resources in data integration operation is characterized by comprising:

a monitoring module, configured to monitor the system log and the resident memory usage of the DEC while the data integration task is executing;

an execution module to execute the computing resource allocation policy;

the execution module further comprises:

acquiring node information of the expert nodes, wherein the node information comprises the customer evaluation records obtained from manual remote computing resource control performed by the experts within the most recent preset first time period;

and executing the computing resource control strategy.

8. The system of claim 7, wherein said determining module determines a computing resource allocation policy based on said system log and said resident memory usage, comprising:

determining whether an OOM event occurs based on the system log;

if yes, determining that the computing resource allocation strategy is as follows: if the DEC is in the P1 level, restarting the data integration task and increasing the memory resource allocation of the DEC; if the DEC is in a P2 level, restarting the data integration task;

9. The system as claimed in claim 8, wherein the P0 level is an active guarantee level;

the P1 level is a failure retry guarantee level;

the P2 level is a regular level;

the P3 level is a low priority level.