CN111522843B - Control method, system, equipment and storage medium of data platform - Google Patents

Control method, system, equipment and storage medium of data platform

Info

Publication number
CN111522843B
Authority
CN
China
Prior art keywords
platform
data platform
data
working
groups
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010485281.XA
Other languages
Chinese (zh)
Other versions
CN111522843A (en)
Inventor
崔国祥
李飞
黄健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Chuangxin Journey Network Technology Co ltd
Original Assignee
Beijing Chuangxin Journey Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Chuangxin Journey Network Technology Co ltd
Priority to CN202010485281.XA
Publication of CN111522843A
Application granted
Publication of CN111522843B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24532 Query optimisation of parallel queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a control method, a control system, a control device and a storage medium of a data platform. The method comprises the following steps: receiving data platform use applications of a plurality of working groups; acquiring data platform records corresponding to a plurality of working groups; determining priorities of the plurality of work groups according to the data platform records of the plurality of work groups; the control data platform provides services for each work group in the order of priority of the plurality of work groups from high to low. The method and the device realize reasonable utilization of the data platform and reduce resource waste of the data platform.

Description

Control method, system, equipment and storage medium of data platform
Technical Field
The present disclosure relates to big data technologies, and in particular, to a method, a system, an apparatus, and a storage medium for controlling a data platform.
Background
With the development of internet technology, more and more companies use big data platforms to support their business development.
The big data platform provides powerful data processing capability, through which a user can perform computing tasks or use the data stored on the platform. Users can submit use applications individually, or a plurality of users can submit use applications according to the business department or work group to which they belong, and the big data platform provides services according to the received applications.
Because a work group may include multiple users, individual users may have poor usage habits. For example, some users often occupy the data platform for long periods without good reason, so that other work groups are unable to use it.
Disclosure of Invention
The application provides a control method, a control system, a control device and a storage medium for a data platform so as to realize reasonable utilization of the data platform.
In a first aspect, the present application provides a control method for a data platform, including:
receiving a data platform use application of a plurality of work groups, wherein each work group comprises a plurality of users;
acquiring data platform records corresponding to the plurality of working groups, wherein the data platform records comprise: application information submitted to the data platform by the working group and corresponding use information;
determining priorities of the plurality of work groups according to the data platform records of the plurality of work groups;
and controlling the data platform to provide service for the plurality of working groups according to the order of the priorities of the plurality of working groups from high to low.
Optionally, the determining the priorities of the plurality of working groups according to the data platform records of the plurality of working groups includes:
determining the platform utilization rate of each working group according to the data platform records of the plurality of working groups;
determining the priority of each working group according to the platform utilization rate of each working group; the higher the platform usage, the higher the corresponding priority.
Optionally, the determining the platform usage rate of each working group according to the data platform records of the plurality of working groups includes:
determining the platform utilization rate of each user according to the data platform record of each user in each working group;
the platform usage of each workgroup is determined based on the platform usage of the plurality of users in each workgroup.
Optionally, the data platform use application is used for submitting a computing task;
the determining the platform usage rate of each work group according to the data platform records of the plurality of work groups comprises the following steps:
and determining the platform utilization rate of each user according to the computing resources applied when each working group submits the computing task to the data platform in the data platform record and the computing resources used when the computing task is executed.
Optionally, the data platform use application is used for submitting a storage task;
the determining the platform usage rate of each work group according to the data platform records of the plurality of work groups comprises the following steps:
and determining the platform utilization rate of each user according to the number of times that the data assets stored in the data platform by each working group are called in computing tasks of the data platform.
Optionally, the method further comprises:
outputting the resource utilization rate of each work group to a user interface to prompt the user.
Optionally, the method further comprises:
and submitting the task of the target working group to a target resource queue corresponding to the target working group in the data platform according to the corresponding relation between the working group and the resource queue.
Optionally, the method further comprises:
and determining the corresponding relation between the work group and the resource queue according to the type of the work group and the type of the resource.
In a second aspect, the present application provides a control system for a data platform, including: an account management subsystem and a data quality control subsystem;
the account management subsystem is used for receiving application of data platform use of a plurality of working groups, each working group comprises a plurality of users, and controlling the data platform to provide use services for the plurality of working groups according to the order of priority of the plurality of working groups from high to low;
the data quality control subsystem is configured to obtain data platform records corresponding to the plurality of work groups, where the data platform records include: application information submitted to the data platform by the working group and corresponding use information; and determining the priority of the plurality of work groups according to the data platform records of the plurality of work groups.
Optionally, the account management subsystem is further configured to manage a work group and a correspondence between the work group and the resource queue.
Optionally, the account management subsystem is further configured to determine a target resource queue corresponding to the target work group according to a correspondence between the work group and the resource queue;
and submitting the task of the target work group to the server cluster with the label corresponding to the target resource queue according to the corresponding relation between the target resource queue and the label of the server cluster.
Optionally, the account management subsystem is further configured to synchronize user information to a server cluster of the data platform.
Optionally, the system further comprises: a data asset management subsystem;
the data asset management subsystem is used for controlling the operation authority of the user on the data assets in the data platform according to the type of the user.
Optionally, the system further comprises: a metadata database;
the metadata database is used for storing metadata information of the account management subsystem, the data asset management subsystem and the data quality control subsystem.
In a third aspect, the present application provides an electronic device comprising a memory and a processor;
the memory is connected with the processor;
the memory is used for storing a computer program;
the processor is configured to implement the method according to any of the first aspects when the computer program is executed.
In a fourth aspect, the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method according to any of the first aspects.
In the present application, data resource application requests of a plurality of users are received, and the priority with which data resources are allocated to the users is determined according to historical data of the users' use of the data platform. This ensures that users with a high resource utilization rate obtain resources preferentially, avoids the situation in which other users cannot use the resources because users with a low resource utilization rate occupy them for a long time, avoids resource waste on the data platform, and reduces the cost of the data platform. In addition, the management system of the data platform realizes unified storage management of the metadata of the data platform, and realizes unified management and authentication of accounts through the account management subsystem; unified authorization and security protection of data resources are realized through the data asset management subsystem and the account management subsystem; business isolation of computing resources and data assets is realized through resource queue management in the account management subsystem; and quality control of data resources is realized through the data quality control subsystem, which promotes the optimization of computing and storage resources. The management system thus provides a unified solution for managing the users and data resources of the data platform.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, a brief description will be given below of the drawings that are needed in the embodiments or the prior art descriptions, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort to a person skilled in the art.
FIG. 1 is a schematic diagram of a management system for a data platform provided in the present application;
FIG. 2 is a flow chart of a resource allocation method of a data platform provided in the present application;
FIG. 3 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
The big data platform provides powerful data processing capability, through which a user can perform computing tasks or use the data stored on the platform. Currently, the open source components used to build a data platform are numerous and inconsistent, and the big data industry has long lacked a complete scheme that uniformly addresses account management, data security protection, isolation of storage and computing resources, and data quality monitoring. Most data platforms implement these management functions by relying on the built-in features of the various open source components, which causes many problems in use.
For example, data platform account management is often implemented with operating-system users, in which case users are not authenticated, and user management and synchronization are chaotic. For computing resource isolation, current practice is to isolate computation through the queues of a scheduler; however, users can choose the resource queue to which they submit, which leads to disorderly use of the resource queues, so that non-core computing tasks affect the execution of core computing tasks. In addition, existing data platforms have no concrete scheme for quality control. For example, users can submit use applications individually, or a plurality of users can submit use applications according to the business department or work group to which they belong, and the big data platform provides services according to the received applications; some users often occupy the data platform for long periods without good reason, so that other work groups cannot use it and resources are seriously wasted.
Based on the problems, the control system of the data platform can realize quality control of the data platform and reasonable utilization of the data platform. In addition, the data platform can realize unified management of users and data resources of the data platform. Fig. 1 is a schematic diagram of a control system of a data platform provided in the present application. As shown in fig. 1, the control system of the data platform includes an account management subsystem, a data quality control subsystem, a data asset management subsystem, and a metadata database. It should be noted that, the control system of the data platform provided in the present application may provide a unified management control function of the data platform as a whole system, and meanwhile, each subsystem included therein may also independently operate to implement a corresponding function of the subsystem, or a part of subsystems may be combined to implement a corresponding function of the part of subsystems. The following first describes the various parts of the control system of the data platform.
The account management subsystem can be used for account-related services such as user management, work group management, role management, user synchronization, computing resource management, task scheduling, and resource authorization, and can also be used for managing the correspondence between work groups and resource queues.
Users can be divided into individual users and tenants. Optionally, individual users are used for development and testing, whereas tenants, unlike individual users, are formal accounts for running online tasks and can be customized for specific business teams. Through the individual user management module of the account management subsystem, an administrator of the data platform can import personnel information and/or perform custom create, read, update, and delete (CRUD) operations on individual users; through the tenant management module of the account management subsystem, the administrator can perform CRUD operations on tenants customized for specific business teams. Users imported or added by the administrator are authenticated users. During user authentication, the account management subsystem can call a user synchronization interface in real time to synchronize the user information to the server cluster of the data platform, which realizes unified management and authentication synchronization of users and avoids the poor data platform security caused by chaotic user management.
A work group is a set of users from a specific business team on the data platform. Work groups can be divided into job-level work groups and project work groups. Each individual user belongs to one job-level work group, and each work group uniquely corresponds to one tenant. Through the work group management module of the account management subsystem, existing job-level information can be imported and/or CRUD operations can be performed on work groups customized for specific projects.
User permissions are generally managed on the basis of roles. To avoid confusion in role authorization, roles are defined according to users: because users are divided into individual users and tenants, roles are likewise divided into individual roles and tenant roles. CRUD operations on roles are performed through the role management module of the account management subsystem.
Computing resource management mainly concerns the reasonable allocation of execution plans for the tasks that users submit to the data platform. The computing resource management module of the account management subsystem implements CRUD operations on computing resource queues and manages the correspondence between work groups and resource queues. In practice, different work groups often have different requirements for the data platform; for example, because the work attributes of different work groups differ, the data platform resources they need also differ. Resource queues can therefore be divided according to the types of the work groups, and the correspondence between work groups and resource queues determined accordingly. Computing resource queues are divided mainly along business lines, and a plurality of work groups can correspond to one resource queue, which realizes business isolation of computing resources.
The task scheduling interface of the account management subsystem is mainly a component provided for a peripheral scheduling system or a cluster client of a data platform and the like to submit a calculation task, and when a user submits the calculation task through the peripheral scheduling system or the cluster client, the account management subsystem judges a work group where the user is located, acquires a calculation resource queue corresponding to the work group, and further realizes the business isolation of the calculation resources. In addition, for the server clusters of the data platform, labels can be added for the server clusters according to different cluster types, and different resource queues correspond to the server clusters with different labels, so that tasks of a target work group can be submitted to the server clusters with the labels corresponding to the target resource queues according to the corresponding relation between the target resource queues and the labels of the server clusters, and physical isolation is realized.
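The routing described above can be pictured as two lookups: work group to resource queue, then resource queue to server-cluster label. The following Python sketch is illustrative only. The group names, queue names, labels, and the `route_task` helper are assumptions made for the example and do not appear in the application.

```python
from dataclasses import dataclass

# Hypothetical correspondences; the application only states that they exist.
USER_TO_GROUP = {"alice": "ads_team", "bob": "intern_project"}
GROUP_TO_QUEUE = {
    "ads_team": "queue_core",
    "search_team": "queue_core",    # several work groups may share one queue
    "intern_project": "queue_adhoc",
}
QUEUE_TO_CLUSTER_LABEL = {
    "queue_core": "label_production",
    "queue_adhoc": "label_test",
}


@dataclass(frozen=True)
class Placement:
    queue: str
    cluster_label: str


def route_task(user: str) -> Placement:
    """Resolve where a task submitted by `user` should run."""
    group = USER_TO_GROUP[user]            # work group the user belongs to
    queue = GROUP_TO_QUEUE[group]          # business isolation via resource queues
    label = QUEUE_TO_CLUSTER_LABEL[queue]  # physical isolation via labeled clusters
    return Placement(queue, label)
```

In this sketch the submitting user never names a queue; the queue is derived from the user's work group, which mirrors the goal of preventing users from choosing the resource queue themselves.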
In addition, the resource authorization interface of the account management subsystem provides the data asset management subsystem with authorization management of the data asset.
The data asset management subsystem provides functions for core data including ownership claiming, browsing and viewing, bringing online/offline, disabling/enabling, life cycle management, data desensitization, security level control, permission approval, and permission management, and can control a user's operation permissions on the data assets in the data platform according to the type of the user. The users of the data asset management subsystem are based on the users of the account management subsystem. According to their permissions, user roles can be divided into data responsible persons, business responsible persons, and data users; users with different roles have different operation permissions on the data resources.
For example, a data responsible person is a user who processes and produces data. While being responsible for data accuracy, the data responsible person can use the data asset management subsystem to perform operations such as bringing online/offline, disabling/enabling, setting the data life cycle, desensitizing fields, and setting security levels on the data that he or she has processed. A business responsible person claims all data assets of his or her business team under his or her own name, so that data can be isolated by business and secure access to the data can be realized through authentication control. A data user can browse data and view data details, but can use only data that the user has processed or for which the user has been granted permission. To use other data, the data user must initiate a permission application flow for the data asset in the data asset management subsystem. The approval steps involve the data responsible person and the business responsible person, and the data can be used only after approval. After final approval, the data asset management subsystem calls the resource authorization interface of the account management subsystem to authorize the data to the user who initiated the application flow. A user of the data asset management subsystem may hold one or more of the roles of data responsible person, business responsible person, and data user.
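The permission application flow can be sketched as a request that collects approvals and is authorized only once every required approver has signed off. This is a minimal Python illustration under stated assumptions: the class name, role strings, and the in-memory `granted` set (standing in for the resource authorization interface of the account management subsystem) are all hypothetical.

```python
from dataclasses import dataclass, field

# Approval steps named in the text; the string identifiers are assumed.
REQUIRED_APPROVERS = ("data_responsible", "business_responsible")

# Stand-in for authorization state held by the account management subsystem.
granted = set()


@dataclass
class PermissionRequest:
    applicant: str
    asset: str
    approvals: set = field(default_factory=set)

    def approve(self, approver_role: str) -> None:
        """Record the approval of one required approver."""
        if approver_role not in REQUIRED_APPROVERS:
            raise ValueError(f"unknown approver role: {approver_role}")
        self.approvals.add(approver_role)

    def is_approved(self) -> bool:
        """True only when every required approver has approved."""
        return all(role in self.approvals for role in REQUIRED_APPROVERS)


def authorize_if_approved(request: PermissionRequest) -> bool:
    """Grant access to the asset only after final approval."""
    if request.is_approved():
        granted.add((request.applicant, request.asset))
        return True
    return False
```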
To realize reasonable utilization of the data platform, the data quality control subsystem acquires the data platform records corresponding to the work groups and determines the priority of each work group on the data platform according to those records, which avoids the situation in which users in some work groups occupy the data platform for a long time and other work groups cannot use it. If the platform usage rate in the data platform record of a work group is low, the quality score of the work group is low and its priority is low; if the platform usage rate is high, the quality score is high and the priority is high. The account management subsystem can therefore provide services and allocate resources to the work groups currently requesting to use the data platform according to the priority of each work group, allocating resources to high-priority work groups first. This realizes reasonable utilization of the data platform and prevents low-priority work groups that occupy excessive resources for a long time from affecting the task execution of high-priority work groups.
Specifically, the data quality control subsystem may determine the platform usage rate of each user according to the data platform record of each user in each work group, and further determine the platform usage rate of each work group according to the platform usage rates of a plurality of users in each work group, for example, perform weighted average processing on the platform usage rates of a plurality of users in each work group to obtain the platform usage rate of each work group, that is, the data quality control subsystem may perform quality scoring on each user, and then synthesize the quality scores of all users in each work group to obtain the quality score of the work group. Or, the data quality control subsystem may directly calculate the platform usage rate of each working group according to the application information and the usage information included in the data platform records of all the users in each working group.
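As an illustration of the weighted-average option, the short Python sketch below combines per-user usage rates into a work group rate. The application does not say how the weights are chosen, so the equal-weight default and the function name `group_usage_rate` are assumptions.

```python
def group_usage_rate(user_rates, weights=None):
    """Weighted average of per-user platform usage rates.

    user_rates: dict of user -> usage rate, each assumed to lie in [0, 1].
    weights:    dict of user -> non-negative weight; equal weights if omitted.
    """
    if not user_rates:
        raise ValueError("work group has no scored users")
    if weights is None:
        weights = {user: 1.0 for user in user_rates}
    total_weight = sum(weights[user] for user in user_rates)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(rate * weights[user] for user, rate in user_rates.items())
    return weighted_sum / total_weight
```

One plausible weighting, offered only as an example, is each user's share of the resources the work group applied for, so that heavy users influence the group score more than occasional ones.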
The users of the data quality control subsystem are also based on the users of the account management subsystem, but not all users in the account management subsystem need to be scored. A user needs quality assessment based on the user's data platform records only when the user has computing tasks on the data platform or is the data responsible person for data assets in the data asset management subsystem.
For computing resource quality assessment, the data quality control subsystem can acquire, every day, all computing tasks of a user on the data platform and analyze the resource usage of each computing task, including the total resources applied for, the resources used, and the resources wasted. These data are stored in the metadata database and displayed in the system by dimension, and the platform usage rate of all of the user's historical computing tasks, that is, the quality score or health score, is calculated. In other words, the data quality control subsystem can determine both the resource usage of each computing task of the user and the overall resource usage of all of the user's historical tasks. The data quality control subsystem can also send a daily email notifying the data user of computing tasks with serious resource waste so that the tasks can be optimized.
For storage resource quality assessment, the data quality control subsystem can acquire, every day, all data assets in the data asset management subsystem for which the user is the data responsible person, analyze how frequently each of these data assets is called in computing tasks of the data platform, calculate a heat value for each of the user's data assets, and store these data in the metadata database for display in the system. The more frequently a data asset is called, the higher its heat value, which means that the storage resources occupied by the data asset are not wasted; the quality score of the user, that is, the health score, is therefore higher. The overall quality score, or overall health score, of the user's historical data asset storage can also be calculated. For data assets with low heat values, the data quality control subsystem sends the user a daily email recommending that the life cycle of those data assets be modified, so that data assets with low heat values are not stored for a long time. By continuously prompting users to optimize their computing resources and the storage periods of their data assets, the data quality control subsystem raises users' overall computing and storage health scores and thereby achieves reasonable use of the data platform resources.
After determining the platform usage rate of a work group from the platform usage rates of its users, the data quality control subsystem determines the priority of the work group: the higher the platform usage rate of the work group, the higher its priority, and the lower the platform usage rate, the lower its priority. Accordingly, when allocating resources to work groups, the account management subsystem allocates resources preferentially to work groups with higher priority. The execution of computing tasks and the storage of data assets of work groups with a high resource utilization rate are thus guaranteed first, and the problem that other work groups cannot use the platform because the computing tasks or data resources of work groups with a low resource utilization rate occupy excessive resources for a long time is avoided.
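A minimal Python sketch of this computing-resource assessment follows. It assumes a single scalar resource measure per task and defines the usage rate as used divided by applied; the application does not fix a formula, so the record fields, the 0.3 notification threshold, and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRecord:
    task_id: str
    applied: float  # resources requested at submission (unit assumed, e.g. core-hours)
    used: float     # resources actually consumed during execution


def wasted(task: TaskRecord) -> float:
    """Resources applied for but not used by one task."""
    return max(task.applied - task.used, 0.0)


def task_usage_rate(task: TaskRecord) -> float:
    """Usage rate of a single task, capped at 1.0."""
    if task.applied <= 0:
        return 0.0
    return min(task.used / task.applied, 1.0)


def user_usage_rate(tasks) -> float:
    """Overall rate over all historical tasks: total used / total applied."""
    total_applied = sum(t.applied for t in tasks)
    if total_applied <= 0:
        return 0.0
    total_used = sum(min(t.used, t.applied) for t in tasks)
    return total_used / total_applied


def tasks_to_flag(tasks, threshold=0.3):
    """IDs of tasks whose usage rate falls below the (assumed) threshold."""
    return [t.task_id for t in tasks if task_usage_rate(t) < threshold]
```

The flagged task IDs correspond to the tasks with serious resource waste that would be reported to the user in the daily email.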
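The heat-value idea can be sketched in Python as a call count per data asset. The application does not define the exact heat formula or how heat maps to a score, so the raw call count, the `min_calls` cutoff, and the share-of-hot-assets score below are assumptions chosen for simplicity.

```python
from collections import Counter


def heat_values(owned_assets, task_inputs):
    """Count how often each owned asset is read by computing tasks.

    owned_assets: asset names for which the user is the data responsible person.
    task_inputs:  one list of asset names per executed computing task.
    """
    calls = Counter(asset for inputs in task_inputs for asset in inputs)
    return {asset: calls.get(asset, 0) for asset in owned_assets}


def cold_assets(heat, min_calls=1):
    """Assets called fewer than min_calls times; candidates for a shorter life cycle."""
    return sorted(asset for asset, count in heat.items() if count < min_calls)


def storage_score(heat, min_calls=1):
    """Share of owned assets called at least min_calls times (assumed scoring rule)."""
    if not heat:
        return 0.0
    hot = sum(1 for count in heat.values() if count >= min_calls)
    return hot / len(heat)
```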
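The priority rule can be illustrated with a short Python sketch that orders requesting work groups by platform usage rate and grants resources in that order. Treating the usage rate directly as the priority, defaulting unknown groups to 0.0, and the single-capacity allocation model are simplifying assumptions for the example.

```python
def serve_order(requesting_groups, usage_rates):
    """Requesting work groups sorted by priority, highest first.

    Priority is taken to be the platform usage rate; groups without a
    record default to 0.0. Ties keep their original request order.
    """
    return sorted(requesting_groups, key=lambda g: usage_rates.get(g, 0.0), reverse=True)


def allocate(requests, usage_rates, capacity):
    """Grant requests in priority order until capacity runs out.

    requests: dict of work group -> requested amount.
    Returns a dict of work group -> granted amount.
    """
    granted = {}
    remaining = capacity
    for group in serve_order(list(requests), usage_rates):
        amount = min(requests[group], remaining)
        granted[group] = amount
        remaining -= amount
    return granted
```

With limited capacity, a work group with a low usage rate receives only what is left after higher-priority groups have been served.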
Metadata information for the account management subsystem, the data asset management subsystem, and the data quality control subsystem may all be stored in a metadata database. In particular, the metadata repository may include user information, workgroup information, role information, computing resource queue information, data asset information, data quality information, and various other associated data information. The user information mainly comprises personal user and tenant information, and the work group information comprises job level work groups and project work group information. The persona information includes personal user persona and tenant persona information. The data asset information mainly comprises core database table resource information, database table resource attribution and database table resource approval authorization information. The data quality information store mainly includes storage quality score information of the data resources, data resource calculation quality score information, and the like.
The control system of the data platform realizes unified storage management of metadata of the data platform, and realizes unified management authentication of accounts through the account management subsystem; the unified authorization and the safety protection of the data resources are realized through the data asset management subsystem and the account management subsystem; business isolation of computing resources and data assets is achieved through resource queue management in an account management subsystem; meanwhile, the quality control of the data resources is realized through the data quality control subsystem, so that the optimization calculation and storage resources are promoted, and the total cost of the data platform is reduced.
Based on the control system of the data platform described above, the method for controlling the data platform through the data quality control subsystem and the account management subsystem is described below. Fig. 2 is a flow chart of a control method of a data platform provided in the present application. As shown in fig. 2, the method includes:
S201, receiving data platform use applications of a plurality of working groups.
Wherein each working group comprises a plurality of users.
In practical use of the data platform, a plurality of working groups request to use the data platform, and a control system of the data platform provides corresponding services for each working group after receiving the request. The data resources requested by the workgroup may be computing resources and/or storage resources. The application for use of the workgroup may be submitted by any user in the workgroup.
S202, acquiring data platform records corresponding to a plurality of working groups.
Wherein the data platform record comprises: application information submitted to the data platform by the working group and corresponding use information.
Use of the data platform by a working group may include executing computing tasks with the computing resources of the data platform or storing data with the storage resources of the data platform. The data platform record of a working group includes the application information submitted with each use application and the corresponding usage, that is, how the applied-for resources were actually used.
S203, determining the priority of the plurality of working groups according to the data platform records of the plurality of working groups.
Whether a working group wastes resources can be determined from the usage of data resources recorded in its data platform record. For example, if a working group applies for many resources but actually uses few of them, the working group is determined to be wasting resources. Likewise, if a working group applies for storage resources for a long period but the stored data is never or rarely used, the storage resources are wasted. The priority of a working group is determined from the resource usage in its historical data: the less a working group wastes resources, the higher its priority; conversely, the more it wastes, the lower its priority. It will be appreciated that, since a working group includes a plurality of users, the more users in the working group who waste resources when using the data platform, the lower the priority of the working group. Conversely, if few or no users in the working group waste resources when using the data platform, the working group is given a higher priority.
S204, controlling the data platform to provide use services for the plurality of working groups in order of the priorities of the plurality of working groups from high to low.
When a plurality of working groups apply to use the data platform, services are provided first to the working groups with high priority, according to the priority order of the working groups. A lower priority indicates that the working group has tended to use little of the resources it applied for, that is, its resource utilization rate is low, so resources are occupied and wasted for no reason and other working groups cannot use them. Therefore, in this embodiment, the working groups with high priority are satisfied first, so that the data platform is used reasonably and the utilization rate of the data resources is higher.
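The ordering in S204 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `UseApplication` type, the group names, the numeric priority values, and the default priority of 0.0 for a group without a record are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class UseApplication:
    group: str    # work group that submitted the application (hypothetical name)
    request: str  # free-text description of the requested resource


def serve_by_priority(applications, priority):
    """Return the applications ordered from highest to lowest group priority.

    `priority` maps a group name to a number; a larger number means a higher
    priority. Groups without an entry default to 0.0 (an assumption).
    sorted() is stable, so applications of equal priority keep arrival order.
    """
    return sorted(
        applications,
        key=lambda app: priority.get(app.group, 0.0),
        reverse=True,
    )


apps = [
    UseApplication("group_x", "64G memory"),
    UseApplication("group_y", "2T storage"),
    UseApplication("group_z", "32G memory"),
]
# Hypothetical priorities, e.g. taken directly from platform utilization rates.
priority = {"group_x": 0.31, "group_y": 0.78, "group_z": 0.55}

order = [app.group for app in serve_by_priority(apps, priority)]
print(order)
```

Using the utilization rate itself as the priority value is only one possible choice; any mapping that increases with the utilization rate would preserve the ordering described in this embodiment.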
In the resource allocation method provided in this embodiment, S201 and S204 may be steps implemented by the account management subsystem in the foregoing embodiment, and S202 and S203 may be steps implemented by the data quality control subsystem. The priority of each working group is determined from the record of its use of the data platform, so that working groups with a high platform utilization rate can use the data platform first. This avoids the problem that other working groups cannot use the platform because working groups with a low platform utilization rate occupy resources for a long time, avoids resource waste on the data platform, and reduces the cost of the data platform.
It should be noted that, in the above embodiment, the step of determining the priority of the working group in S202 and S203 may also be performed before S201, that is, the data quality control subsystem may determine the priority of each working group in advance according to the data platform record of each working group, so that when the account management subsystem receives the application for using the data platform, the account management subsystem may directly provide services for each working group according to the priority of each working group.
Based on the above embodiments, how to determine the priority of a working group according to the data platform record is further explained. Specifically, the platform utilization rate of each working group is determined according to the data platform records of the plurality of working groups, and the priority of each working group is determined according to its platform utilization rate; the higher the platform utilization rate, the higher the corresponding priority.
When a working group's data platform use application is for submitting a computing task, the platform utilization rate of each working group is determined, with respect to computing resources, from the computing resources applied for when the working group submitted computing tasks to the data platform, as recorded in the data platform record, and the computing resources actually used when those tasks were executed.
Specifically, the platform utilization rate of each user in each work group can be calculated first, and then the platform utilization rate of each work group can be further determined according to the platform utilization rates of a plurality of users in each work group. For example, a weighted average process may be performed on platform usage for a plurality of users in each work group. The platform usage rate for each user is described below.
For example, when user A submits computing task a to the data platform, 64G of memory is applied for, and the data platform record of user A shows that task a actually used 20G of memory during execution, so the platform utilization rate of task a of user A is 31.25%. When user B submits computing task b to the data platform, 64G of memory is applied for, and the historical data of user B shows that task b actually used 50G of memory during execution, so the platform utilization rate of task b of user B is 78.125%. By aggregating all computing tasks in a user's data platform record, for example by weighting the resource utilization rate of each task, the overall computing resource utilization rate of the user can be determined, as can the several tasks with the highest or lowest resource utilization rates among the user's historical tasks.
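The arithmetic in this example, and the weighted averaging mentioned for users and work groups, can be sketched in Python as follows. The function names are hypothetical, and weighting by the amount of memory applied for is an assumption: the text only says that a weighted average may be used and does not specify the weights. Treating users A and B as members of the same work group is likewise an assumption for illustration.

```python
def task_utilization(requested_gb, used_gb):
    """Fraction of the applied-for memory that a task actually used."""
    if requested_gb <= 0:
        raise ValueError("requested_gb must be positive")
    return used_gb / requested_gb


def weighted_utilization(pairs):
    """Weighted average over (utilization, weight) pairs."""
    total_weight = sum(weight for _, weight in pairs)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    return sum(util * weight for util, weight in pairs) / total_weight


util_a = task_utilization(64, 20)  # task a of user A: 20G used of 64G applied
util_b = task_utilization(64, 50)  # task b of user B: 50G used of 64G applied

# Assumed weighting: each user's rate is weighted by the memory applied for.
group_util = weighted_utilization([(util_a, 64), (util_b, 64)])

print(f"task a: {util_a:.3%}, task b: {util_b:.3%}, group: {group_util:.4%}")
```

With equal weights the group rate is simply the mean of the two task rates; other weights (for example, task duration) would shift the result toward the heavier tasks.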
When a user's data platform use application is for submitting a storage task, the platform utilization rate of each user is determined, with respect to storage resources, from the number of times the data assets that the user stored in the data platform, as recorded in the data platform record, are called in computing tasks of the data platform.
For example, a user stored data file c in the data platform one year ago, and data file c has been called only once in the year since; the utilization of the storage resources occupied by data file c is clearly low. In practical applications, the platform utilization rate may be determined jointly from the size of the storage resources occupied by the data asset, the period for which they are occupied, and the frequency with which the asset is called. Aggregating the platform utilization rates of all data resources stored by the user gives the user's overall storage resource platform utilization rate.
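The text does not give a formula for combining size, storage period, and call frequency, so the following Python sketch is only one possible heuristic under stated assumptions: an asset's score is its observed call rate relative to an assumed expected rate (`expected_calls_per_day`, a hypothetical parameter), capped at 1, and a user's overall score is the size-weighted average of the asset scores.

```python
def storage_utilization(call_count, days_stored, expected_calls_per_day=1.0):
    """Score in [0, 1]: observed call rate relative to an assumed expected rate."""
    if days_stored <= 0 or expected_calls_per_day <= 0:
        raise ValueError("days_stored and expected_calls_per_day must be positive")
    rate = call_count / days_stored
    return min(1.0, rate / expected_calls_per_day)


def user_storage_utilization(assets):
    """Size-weighted average score over (size_gb, days_stored, call_count) tuples."""
    total_size = sum(size for size, _, _ in assets)
    if total_size <= 0:
        raise ValueError("total size must be positive")
    weighted = sum(
        size * storage_utilization(calls, days) for size, days, calls in assets
    )
    return weighted / total_size


# Data file c from the example: stored for one year and called once.
# Its size (100 GB) and the second asset are hypothetical.
file_c = (100, 365, 1)
hot_table = (10, 30, 60)

score_c = storage_utilization(call_count=1, days_stored=365)
user_score = user_storage_utilization([file_c, hot_table])
print(round(score_c, 5), round(user_score, 5))
```

Because the large, rarely called file dominates the size-weighted average, the user's overall score stays low even though the smaller asset is used heavily, which matches the intent of penalizing long-lived, unused storage.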
For users and working groups with a low platform utilization rate, in addition to lowering their priority, the users can be reminded to optimize in other ways. For example, the platform utilization rates of users and working groups are sent to users by mail, or output to a user interface to prompt the users. Users with low computing resource utilization are reminded to apply for fewer resources when submitting computing tasks; users with low storage resource utilization are reminded to shorten the life cycle of their data assets. Reasonable use of data platform resources is achieved through the system allocating resources by priority and through users, once reminded, optimizing their own resource usage.
Furthermore, when the user submits the calculation task to the data platform through the work group, the management system of the data platform not only allocates the resources according to the priority, but also submits the task of the target work group to the target resource queue corresponding to the target work group in the data platform according to the corresponding relation between the work group and the resource queue.
Specifically, computing resource queues are divided mainly by business line, and the users in each working group have a corresponding resource queue. For example, the types of resources required by computing tasks submitted by users in working groups of different business lines may differ. The correspondence between working groups and resource queues can therefore be determined from the type of the working group and the type of the resource queue, and the tasks of a target working group are submitted to the target resource queue corresponding to that working group in the data platform. This achieves business isolation of computing resources, avoids problems such as disorderly use of resource queues and core tasks being affected by non-core tasks, and improves resource utilization.
In addition, for the server clusters of the data platform, labels can be added for the server clusters according to different cluster types, and different resource queues correspond to the server clusters with different labels, so that tasks of a target work group can be submitted to the server clusters with the labels corresponding to the target resource queues according to the corresponding relation between the target resource queues and the labels of the server clusters, and physical isolation of resources is realized.
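The two correspondences described above (work group to resource queue, and resource queue to server cluster label) can be sketched as simple lookup tables in Python. All group, queue, label, and cluster names below are hypothetical, and a real platform would presumably load these mappings from the metadata database rather than hard-code them.

```python
# Hypothetical correspondence tables; names are illustrative only.
GROUP_TO_QUEUE = {
    "ads_team": "queue_ads",
    "search_team": "queue_search",
}
QUEUE_TO_LABEL = {
    "queue_ads": "label_cpu",
    "queue_search": "label_mem",
}
CLUSTERS_BY_LABEL = {
    "label_cpu": ["cluster-1"],
    "label_mem": ["cluster-2", "cluster-3"],
}


def route_task(group):
    """Resolve the target queue, cluster label, and labeled clusters for a group.

    Raises KeyError if the group, queue, or label has no correspondence entry.
    """
    queue = GROUP_TO_QUEUE[group]       # work group -> target resource queue
    label = QUEUE_TO_LABEL[queue]       # resource queue -> server cluster label
    clusters = CLUSTERS_BY_LABEL[label] # label -> clusters carrying that label
    return queue, label, clusters


print(route_task("search_team"))
```

Keeping the two mappings separate mirrors the text: the first gives business isolation at the queue level, and the second gives physical isolation by restricting each queue to clusters with the matching label.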
Fig. 3 is a schematic structural diagram of an electronic device provided in the present application. As shown in fig. 3, the electronic device 30 includes a memory 301 and a processor 302. Optionally, the memory 301 and the processor 302 are connected by a bus 303.
The memory 301 is used to store a computer program. The processor 302 is configured to implement the method of the above-described method embodiments when the computer program is executed.
The present application also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method as in the method embodiments described above.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the method embodiments described above may be performed by hardware associated with program instructions. The foregoing program may be stored in a computer readable storage medium. The program, when executed, performs steps including the method embodiments described above; and the aforementioned storage medium includes: various media that can store program code, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.

Claims (8)

1. A method for controlling a data platform, comprising:
receiving a data platform use application of a plurality of work groups, wherein each work group comprises a plurality of users;
acquiring data platform records corresponding to the plurality of working groups, wherein the data platform records comprise: application information submitted to the data platform by the working group and corresponding use information;
determining the platform utilization rate of each working group according to the data platform records of the plurality of working groups;
determining the priority of each working group according to the platform utilization rate of each working group; the higher the platform utilization rate is, the higher the corresponding priority is; the platform utilization rate of the work group is sent to a user through mail, or the platform utilization rate is output to a user interface to prompt the user;
controlling the data platform to provide service for the plurality of working groups according to the order of the priority of the plurality of working groups from high to low;
determining a target resource queue corresponding to the target work group according to the corresponding relation between the work group and the resource queue;
and submitting the task of the target work group to the server cluster with the label corresponding to the target resource queue according to the corresponding relation between the target resource queue and the label of the server cluster.
2. The method of claim 1, wherein said determining platform usage for each workgroup from the data platform records for the plurality of workgroups comprises:
determining the platform utilization rate of each user according to the data platform record of each user in each working group;
the platform usage of each workgroup is determined based on the platform usage of the plurality of users in each workgroup.
3. The method of claim 1, wherein the data platform uses an application for submitting a computing task;
the determining the platform usage rate of each work group according to the data platform records of the plurality of work groups comprises the following steps:
and determining the platform utilization rate of each user according to the computing resources applied when each working group submits the computing task to the data platform in the data platform record and the computing resources used when the computing task is executed.
4. The method of claim 1, wherein the data platform uses an application for submitting a storage task;
the determining the platform usage rate of each work group according to the data platform records of the plurality of work groups comprises the following steps:
and determining the platform utilization rate of each user according to the number of times the data assets stored in the data platform by each working group are called in computing tasks of the data platform.
5. The method according to claim 1, wherein the method further comprises:
and determining the corresponding relation between the work group and the resource queue according to the type of the work group and the type of the resource.
6. A control system for a data platform, comprising: an account management subsystem and a data quality control subsystem;
the account management subsystem is used for receiving application of data platform use of a plurality of working groups, each working group comprises a plurality of users, and controlling the data platform to provide use services for the plurality of working groups according to the order of priority of the plurality of working groups from high to low;
the data quality control subsystem is configured to obtain data platform records corresponding to the plurality of work groups, where the data platform records include: application information submitted to the data platform by the working group and corresponding use information; determining the priority of the plurality of working groups according to the data platform records of the plurality of working groups; the determining the priority of the plurality of work groups according to the data platform records of the plurality of work groups comprises: determining the platform utilization rate of each working group according to the data platform records of the plurality of working groups; determining the priority of each working group according to the platform utilization rate of each working group; the higher the platform utilization rate is, the higher the corresponding priority is;
the management system of the data platform determines a target resource queue corresponding to the target work group according to the corresponding relation between the work group and the resource queue;
and submitting the task of the target work group to the server cluster with the label corresponding to the target resource queue according to the corresponding relation between the target resource queue and the label of the server cluster.
7. An electronic device comprising a memory and a processor;
the memory is connected with the processor;
the memory is used for storing a computer program;
the processor is configured to implement the method of any of claims 1-5 when the computer program is executed.
8. A computer readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, implements the method according to any of claims 1-5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010485281.XA CN111522843B (en) 2020-06-01 2020-06-01 Control method, system, equipment and storage medium of data platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010485281.XA CN111522843B (en) 2020-06-01 2020-06-01 Control method, system, equipment and storage medium of data platform

Publications (2)

Publication Number Publication Date
CN111522843A CN111522843A (en) 2020-08-11
CN111522843B true CN111522843B (en) 2023-06-27

Family

ID=71909284

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010485281.XA Active CN111522843B (en) 2020-06-01 2020-06-01 Control method, system, equipment and storage medium of data platform

Country Status (1)

Country Link
CN (1) CN111522843B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112256588A (en) * 2020-11-10 2021-01-22 广州掌动智能科技有限公司 Resource allocation method for application program test, computer readable storage medium and tester

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014109886A (en) * 2012-11-30 2014-06-12 Seiko Precision Inc Load distribution device, load distribution method and program
US9367354B1 (en) * 2011-12-05 2016-06-14 Amazon Technologies, Inc. Queued workload service in a multi tenant environment
US9852011B1 (en) * 2009-06-26 2017-12-26 Turbonomic, Inc. Managing resources in virtualization systems
CN110619002A (en) * 2019-09-12 2019-12-27 北京百度网讯科技有限公司 Data processing method, device and storage medium

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2519792B2 (en) * 1988-12-26 1996-07-31 富士通株式会社 Job priority setting method
JP4121132B2 (en) * 2005-01-04 2008-07-23 インターナショナル・ビジネス・マシーンズ・コーポレーション Service processing allocation apparatus, control method, and program
CN101346696B (en) * 2005-12-28 2013-10-02 国际商业机器公司 Load distribution in client server system
US8522240B1 (en) * 2006-10-19 2013-08-27 United Services Automobile Association (Usaa) Systems and methods for collaborative task management
WO2012047906A2 (en) * 2010-10-04 2012-04-12 Sempras Software, Inc. Methods and apparatus for integrated management of structured data from various sources and having various formats
TW201407476A (en) * 2012-08-06 2014-02-16 Hon Hai Prec Ind Co Ltd System and method for allocating resource of virtual machine
CN104239518B (en) * 2014-09-17 2017-09-29 华为技术有限公司 Data de-duplication method and device
CN104507169B (en) * 2014-12-15 2017-12-22 东南大学 Reduce the three dimensional resource dynamic allocation method and device of system uplink propagation delay time
CN106851747B (en) * 2015-12-03 2022-04-22 中兴通讯股份有限公司 Dynamic resource allocation method and device in mobile communication system
CN106922028A (en) * 2015-12-28 2017-07-04 中兴通讯股份有限公司 A kind of method of discrimination and device of wireless telecommunication system User Priority
US10956415B2 (en) * 2016-09-26 2021-03-23 Splunk Inc. Generating a subquery for an external data system using a configuration file
JP6940761B2 (en) * 2017-09-01 2021-09-29 富士通株式会社 Information processing equipment, virtual machine monitoring programs, and information processing systems
CN111078404B (en) * 2019-12-09 2023-07-11 腾讯科技(深圳)有限公司 Computing resource determining method and device, electronic equipment and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Multi-Channel MAC Protocols for Mobile Ad Hoc Networks; Liu Yong; China Excellent Master's Theses; full text *

Also Published As

Publication number Publication date
CN111522843A (en) 2020-08-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant