CN111522843A - Control method, system, equipment and storage medium of data platform - Google Patents

Control method, system, equipment and storage medium of data platform

Info

Publication number
CN111522843A
Authority
CN
China
Prior art keywords
data platform
data
platform
working
working group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010485281.XA
Other languages
Chinese (zh)
Other versions
CN111522843B (en)
Inventor
崔国祥
李飞
黄健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Chuangxin Journey Network Technology Co ltd
Original Assignee
Beijing Chuangxin Journey Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Chuangxin Journey Network Technology Co ltd
Priority to CN202010485281.XA
Publication of CN111522843A
Application granted
Publication of CN111522843B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24532 Query optimisation of parallel queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a control method, system, equipment and storage medium for a data platform. The method comprises: receiving data platform use applications from a plurality of working groups; acquiring the data platform records corresponding to the plurality of working groups; determining the priorities of the plurality of working groups according to their data platform records; and controlling the data platform to provide services to the working groups in descending order of priority. The method achieves reasonable utilization of the data platform and reduces waste of its resources.

Description

Control method, system, equipment and storage medium of data platform
Technical Field
The present application relates to big data technologies, and in particular, to a method, a system, a device, and a storage medium for controlling a data platform.
Background
With the development of internet technology, more and more companies use big data platforms to support their business.
A big data platform provides powerful data processing capability: a user can execute computing tasks on the platform or use the data stored on it. Users can submit use applications individually, or a plurality of users can submit use applications according to the business departments or working groups to which they belong, and the big data platform provides services according to the applications it receives.
Because a working group includes a plurality of users, the poor usage habits of individual users can affect everyone else. For example, some users often occupy the data platform for a long time without reason, so that other working groups cannot use it.
Disclosure of Invention
The application provides a control method, a system, equipment and a storage medium of a data platform, so as to realize reasonable utilization of the data platform.
In a first aspect, the present application provides a method for controlling a data platform, including:
receiving data platform use applications of a plurality of working groups, wherein each working group comprises a plurality of users;
obtaining data platform records corresponding to the plurality of workgroups, the data platform records including: application information and corresponding use information submitted by a working group to the data platform;
determining the priorities of the plurality of working groups according to the data platform records of the plurality of working groups;
and controlling the data platform to provide the use services for the plurality of working groups according to the sequence of the priorities of the plurality of working groups from high to low.
Optionally, the determining the priorities of the plurality of work groups according to the data platform records of the plurality of work groups includes:
determining the platform utilization rate of each working group according to the data platform records of the plurality of working groups;
determining the priority of each working group according to the platform utilization rate of each working group; the higher the platform utilization, the higher the corresponding priority.
Optionally, the determining the platform utilization rate of each working group according to the data platform records of the plurality of working groups includes:
determining the platform utilization rate of each user according to the data platform record of each user in each working group;
and determining the platform utilization rate of each working group according to the platform utilization rates of a plurality of users in each working group.
Optionally, the data platform use application is used for submitting a computing task;
the determining the platform utilization rate of each working group according to the data platform records of the plurality of working groups comprises:
determining the platform utilization rate of each user according to the computing resources applied for when each working group submits computing tasks to the data platform, as recorded in the data platform records, and the computing resources actually used when the computing tasks are executed.
Optionally, the data platform use application is used for submitting a storage task;
the determining the platform utilization rate of each working group according to the data platform records of the plurality of working groups comprises:
determining the platform utilization rate of each user according to the number of times the data assets stored in the data platform by each working group are called in the computing tasks of the data platform.
Optionally, the method further includes:
and outputting the resource utilization rate of each work group to a user interface to prompt a user.
Optionally, the method further includes:
and submitting the tasks of the target working group to a target resource queue corresponding to the target working group in the data platform according to the corresponding relation between the working group and the resource queue.
Optionally, the method further includes:
and determining the corresponding relation between the work group and the resource queue according to the type of the work group and the type of the resource.
In a second aspect, the present application provides a control system for a data platform, comprising: the account management subsystem and the data quality control subsystem;
the account management subsystem is used for receiving data platform use applications of a plurality of working groups, each working group comprises a plurality of users, and controlling the data platform to provide use services for the working groups according to the sequence of the priorities of the working groups from high to low;
the data quality control subsystem is configured to obtain data platform records corresponding to the plurality of workgroups, the data platform records including: application information and corresponding use information submitted by a working group to the data platform; and determining the priorities of the plurality of working groups according to the data platform records of the plurality of working groups.
Optionally, the account management subsystem is further configured to manage a work group and a corresponding relationship between the work group and the resource queue.
Optionally, the account management subsystem is further configured to determine, according to a correspondence between a work group and a resource queue, a target resource queue corresponding to the target work group;
and submitting the tasks of the target working group to the server cluster with the label corresponding to the target resource queue according to the corresponding relation between the label of the target resource queue and the label of the server cluster.
Optionally, the account management subsystem is further configured to synchronize the user information to a server cluster of the data platform.
Optionally, the system further includes: a data asset management subsystem;
and the data asset management subsystem is used for controlling the operation authority of the user on the data asset in the data platform according to the type of the user.
Optionally, the system further includes: a metadata database;
the metadata database is used for storing metadata information of the account management subsystem, the data asset management subsystem and the data quality control subsystem.
In a third aspect, the present application provides an electronic device comprising a memory and a processor;
the memory is connected with the processor;
the memory is used for storing a computer program;
the processor is adapted to carry out the method of any of the first aspects when the computer program is executed.
In a fourth aspect, the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method according to any one of the first aspect.
In the present application, data resource application requests of a plurality of users are received, and the priority of data resource allocation for each user is determined according to the user's historical data on use of the data platform. This ensures that users with a high resource utilization rate obtain resources first, prevents users with a low resource utilization rate from occupying resources for a long time so that other users cannot use them, avoids waste of data platform resources, and reduces the cost of the data platform. In addition, the management system of the data platform realizes unified storage and management of the metadata of the data platform; unified account management and authentication through the account management subsystem; unified authorization and security protection of data resources through the data asset management subsystem and the account management subsystem; and service isolation of computing resources and data assets through resource queue management in the account management subsystem. Meanwhile, the data quality control subsystem realizes quality control of data resources and promotes optimization of computing and storage resources, so that the management system provides a unified solution for managing data platform users and data resources.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description show some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of a management system of a data platform provided in the present application;
FIG. 2 is a schematic flowchart of a resource allocation method of a data platform according to the present application;
FIG. 3 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
A big data platform provides powerful data processing capability: a user can execute computing tasks on the platform or use the data stored on it. At present, the open source components used to build data platforms are numerous and follow no unified specification. The big data industry has long lacked a complete solution for unified account management, data security protection, isolation of storage and computing resources, and data quality monitoring on a data platform. Most data platforms rely on the inherent features of individual open source components to implement these management functions, which leads to many problems in use.
For example, data platform account management is implemented with operating-system users, so users are not authenticated and user management and synchronization are disordered. For computing resource isolation, the current practice is to isolate computation through the queues of a scheduler, but the user can choose the resource queue to which a task is submitted. Resource queues are therefore used in a disordered way, and non-core computing tasks affect the execution of core computing tasks. In addition, existing data platforms have no specific scheme for quality control. For example, a user may submit a use application individually, or a plurality of users may submit use applications according to their business departments or working groups, and the big data platform provides services according to the applications it receives; some users often occupy the data platform for a long time without reason, so that other working groups cannot use it and resources are seriously wasted.
In view of these problems, the control system of the data platform in the present application realizes quality control of the data platform and thereby its reasonable utilization. The control system can also realize unified management of the users and data resources of the data platform. By way of example, FIG. 1 is a schematic diagram of a control system of a data platform provided in the present application. As shown in FIG. 1, the control system of the data platform includes an account management subsystem, a data quality control subsystem, a data asset management subsystem, and a metadata database. It should be noted that the control system may provide unified management and control of the data platform as an integrated system; each subsystem may also run independently to implement its own functions, or some of the subsystems may be combined to implement the functions of that combination. The parts of the control system of the data platform are described first below.
The account management subsystem can be used for account-related services such as user management, working group management, role management, user synchronization, computing resource management, task scheduling and resource authorization, and can also be used for managing the correspondence between working groups and resource queues.
Users can be divided into individual users and tenants. Optionally, individual users are used for development and testing, whereas tenants are formal accounts for running online tasks and can be customized for specific business teams. Through the individual user management module of the account management subsystem, an administrator of the data platform can import personnel information and/or perform custom create, retrieve, update and delete (CRUD) operations on individual users; through the tenant management module, the administrator can perform custom CRUD operations on tenants according to specific business teams. Users imported or added by the administrator are treated as authenticated users. During user authentication, the account management subsystem can call a user synchronization interface in real time to synchronize the user information to the server cluster of the data platform, which realizes unified management and authentication synchronization of users and avoids the poor data platform security caused by disordered user management.
A working group is a set of users of a specific business team on the data platform. By way of example, working groups can be divided into job-level working groups and project working groups. Each individual user belongs to one job-level working group, and each working group corresponds to exactly one tenant. Through the job management module of the account management subsystem, existing job-level information can be imported and/or custom CRUD operations can be performed on working groups according to specific projects.
User permissions are usually managed on the basis of roles. To prevent role authorization from becoming disordered and hard to manage, roles in the present application are defined per user. Because users are divided into individual users and tenants, roles are likewise divided into individual roles and tenant roles, and CRUD operations on roles are performed through the role management module of the account management subsystem.
Computing resource management mainly concerns reasonably allocating the execution plans of tasks that users submit to the data platform. The computing resource management module of the account management subsystem performs CRUD operations on computing resource queues and manages the correspondence between working groups and resource queues. In practice, different working groups often have different requirements on the data platform; for example, working groups with different work attributes need different data platform resources. Resource queues can therefore be divided according to the types of the working groups, and the correspondence between working groups and resource queues determined accordingly. Computing resource queues are divided mainly by business line, and a plurality of working groups can correspond to one resource queue, which realizes service isolation of computing resources.
The task scheduling interface of the account management subsystem is mainly provided to components that submit computing tasks to the data platform, such as a peripheral scheduling system or a cluster client. When a user submits a computing task through the peripheral scheduling system or the cluster client, the account management subsystem determines the working group to which the user belongs and obtains the computing resource queue corresponding to that working group, thereby isolating computing resources by service. In addition, labels can be added to the server clusters of the data platform according to cluster type, with different resource queues corresponding to server clusters with different labels. The tasks of a target working group can then be submitted to the server cluster whose label corresponds to the target resource queue, according to the correspondence between the target resource queue and the server cluster labels, which realizes physical isolation.
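The routing described above can be sketched as two table lookups. This is only an illustrative sketch, not the implementation of the present application: the table names, the example users, working groups, queue names and cluster labels are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical lookup tables; every name and value here is illustrative only.
USER_TO_WORKGROUP = {"alice": "ads_team", "bob": "bi_team", "carol": "report_team"}
# Several working groups may share one resource queue.
WORKGROUP_TO_QUEUE = {
    "ads_team": "queue_core",
    "bi_team": "queue_adhoc",
    "report_team": "queue_adhoc",
}
# Each resource queue corresponds to server clusters carrying one label.
QUEUE_TO_CLUSTER_LABEL = {"queue_core": "label_core", "queue_adhoc": "label_adhoc"}


@dataclass(frozen=True)
class Route:
    workgroup: str
    queue: str
    cluster_label: str


def route_task(user: str) -> Route:
    """Resolve the resource queue and cluster label for a task submitted by `user`.

    The user does not choose the queue; it follows from the user's working group.
    """
    workgroup = USER_TO_WORKGROUP[user]
    queue = WORKGROUP_TO_QUEUE[workgroup]
    return Route(workgroup, queue, QUEUE_TO_CLUSTER_LABEL[queue])
```

Because the queue is derived from the working group rather than supplied by the submitter, a user in this sketch has no way to place a non-core task on a core queue.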
In addition, the resource authorization interface of the account management subsystem is provided for the data asset management subsystem to carry out authorization management on the data assets.
The data asset management subsystem provides functions for core data including ownership claiming, browsing and viewing, bringing online/offline, disabling/enabling, life cycle management, data desensitization, security level control, permission approval and authorization control, and can control a user's operation permissions on data assets in the data platform according to the type of the user. The users of the data asset management subsystem are based on the users of the account management subsystem. For example, user roles can be divided into data responsible persons, business responsible persons and data users, and users with different roles have different operation permissions on data resources.
For example, the data responsible person is a user who processes and produces data. While being responsible for data accuracy, the data responsible person can use the data asset management subsystem to bring the data they process online or offline, disable or enable it, set its life cycle, desensitize fields, set its security level, and so on. The business responsible person claims all data assets of the business team under their own name, which isolates data by business, and secure access to the data can be achieved through authentication control. The data user can browse data and view data details, but can only use data that they have processed themselves or for which they have been granted permission. If other data is needed, the data user must initiate a permission application flow for the data asset in the data asset management subsystem. The approval chain involves the data responsible person and the business responsible person; once the application is finally approved, the data asset management subsystem calls the resource authorization interface of the account management subsystem to authorize the data to the user who initiated the application flow, after which that user can use the data. A user of the data asset management subsystem may hold one or more of the roles of data responsible person, business responsible person and data user.
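The permission application flow above can be sketched as a small approval record. This is a minimal sketch under stated assumptions: the role identifiers, the `AccessRequest` class and the `grants` dictionary (standing in for the resource authorization interface of the account management subsystem) are hypothetical and do not come from the present application.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Hypothetical identifiers for the two approver roles named in the text.
REQUIRED_APPROVERS = frozenset({"data_responsible", "business_responsible"})


@dataclass
class AccessRequest:
    user: str
    asset: str
    approvals: Set[str] = field(default_factory=set)

    def approve(self, role: str) -> None:
        """Record an approval from one of the required approver roles."""
        if role not in REQUIRED_APPROVERS:
            raise ValueError(f"{role!r} is not an approver role")
        self.approvals.add(role)

    @property
    def approved(self) -> bool:
        # The request passes only when every required role has approved.
        return self.approvals == REQUIRED_APPROVERS


def grant_if_approved(request: AccessRequest, grants: Dict[str, Set[str]]) -> bool:
    """Authorize the asset to the requesting user once approval is complete.

    `grants` maps a user to the set of assets the user may use; it is a
    stand-in for a call to the resource authorization interface.
    """
    if not request.approved:
        return False
    grants.setdefault(request.user, set()).add(request.asset)
    return True
```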
To realize reasonable utilization of the data platform, and to prevent users in some working groups from occupying the data platform for a long time so that other working groups cannot use it, the data quality control subsystem acquires the data platform records corresponding to the working groups and determines the priority of each working group on the data platform according to those records. If the platform utilization rate in a working group's data platform records is low, the working group's quality score and priority are low; if it is high, the quality score and priority are high. Among the working groups currently requesting the data platform, the account management subsystem can therefore allocate resources and provide services according to each working group's priority, allocating resources to high-priority working groups first. The data platform is thus reasonably utilized, and low-priority working groups are prevented from occupying too many resources for a long time and affecting the task execution of high-priority working groups.
Specifically, the data quality control subsystem may determine the platform utilization rate of each user according to the data platform records of each user in each working group, and then determine the platform utilization rate of each working group according to the platform utilization rates of the users in that working group. For example, the platform utilization rates of the users in a working group may be combined by a weighted average to obtain the platform utilization rate of the working group. That is, the data quality control subsystem may score the quality of each user and then combine the quality scores of all users in a working group to obtain the quality score of the working group. Alternatively, the data quality control subsystem may directly calculate the platform utilization rate of each working group according to the application information and the usage information included in the data platform records of all users in the working group.
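The weighted-average aggregation can be written out as follows. This is an illustrative sketch: the function name and the choice of weights are assumptions, since the text does not say how the weights are chosen; with no weights supplied the sketch falls back to a plain mean.

```python
from typing import Dict, Optional


def group_utilization(
    user_rates: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted average of per-user platform utilization rates for one working group.

    `weights` is a hypothetical per-user weight (for example, the share of
    resources each user applied for). Equal weights are used if it is omitted.
    """
    if not user_rates:
        raise ValueError("a working group needs at least one scored user")
    if weights is None:
        weights = {user: 1.0 for user in user_rates}
    total_weight = sum(weights[user] for user in user_rates)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(user_rates[user] * weights[user] for user in user_rates)
    return weighted_sum / total_weight
```

For example, two users with rates 0.5 and 1.0 give a group rate of 0.75 with equal weights, and 0.625 if the first user carries three times the weight of the second.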
The users of the data quality control subsystem are based on the users of the account management subsystem, but not every user in the account management subsystem needs a quality score. A user's quality is evaluated from the user's data platform records only when the user has computing tasks on the data platform, or is the data responsible person for data assets in the data asset management subsystem.
For the quality evaluation of computing resources, the data quality control subsystem can acquire all of a user's computing tasks on the data platform every day, analyze the resource usage of each computing task (including total resources applied for, resources used and resources wasted), store the data in the metadata database, display it by dimension in the system, and calculate the platform utilization rate across all of the user's historical computing tasks, that is, a quality score or health score. In other words, the data quality control subsystem can determine the resource utilization of each of the user's computing tasks and the overall resource utilization of all of the user's historical tasks. The data quality control subsystem can also send a daily email notifying data users of computing tasks that seriously waste resources, so that they can optimize those tasks.
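One way to turn applied and used resources into a utilization rate is sketched below. The text does not give a formula, so this sketch assumes that utilization is the ratio of used to applied resources, capped at 1; the `TaskRecord` class and the resource unit are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TaskRecord:
    applied: float  # resources applied for at submission (hypothetical unit, e.g. core-hours)
    used: float     # resources actually used during execution

    @property
    def wasted(self) -> float:
        # Resources applied for but never used; never negative.
        return max(self.applied - self.used, 0.0)


def task_utilization(record: TaskRecord) -> float:
    """Assumed per-task utilization: used / applied, capped at 1."""
    if record.applied <= 0:
        return 0.0
    return min(record.used / record.applied, 1.0)


def overall_utilization(records: Iterable[TaskRecord]) -> float:
    """Assumed overall utilization across all historical tasks of one user."""
    records = list(records)
    total_applied = sum(r.applied for r in records)
    if total_applied <= 0:
        return 0.0
    total_used = sum(min(r.used, r.applied) for r in records)
    return total_used / total_applied
```

Summing over all tasks before dividing, rather than averaging per-task ratios, means a large wasteful task lowers the overall score more than a small one does; this is a design choice of the sketch, not something stated in the text.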
For the quality evaluation of storage resources, the data quality control subsystem can acquire, every day, all data assets in the data asset management subsystem for which the user is the data responsible person, analyze how often each of those data assets is called in the computing tasks of the data platform, and calculate a heat value for each asset; the data is stored in the metadata database and displayed in the system. The more often a data asset is called, the higher its heat value, which also means that the storage resources it occupies are not wasted, so the user's quality score (health score) is higher. An overall quality score, or overall health score, for the user's historical data asset storage can also be calculated. The data quality control subsystem also emails the user every day about data assets with low heat values and recommends modifying their life cycle, so that data assets with low heat values are not stored for a long time. By continuously prompting users to optimize computing resources and the storage period of data assets, the data quality control subsystem raises users' overall computing and storage health scores and thereby achieves reasonable use of data platform resources.
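The heat value can be illustrated by simply counting calls. This sketch assumes the heat value equals the raw number of times an asset appears among the inputs of computing tasks, and that "low heat" means below a caller-chosen threshold; both are assumptions, as are the function and asset names.

```python
from collections import Counter
from typing import Dict, Iterable, List


def heat_values(
    owned_assets: Iterable[str],
    task_inputs: Iterable[Iterable[str]],
) -> Dict[str, int]:
    """Count how often each owned data asset is called by computing tasks.

    `task_inputs` holds, for each computing task, the assets that task read.
    Assets the user owns but no task called get a heat value of 0.
    """
    calls = Counter(asset for inputs in task_inputs for asset in inputs)
    return {asset: calls[asset] for asset in owned_assets}


def cold_assets(heat: Dict[str, int], threshold: int) -> List[str]:
    """Assets whose heat value is below `threshold` (a hypothetical cut-off).

    These are the candidates for a life cycle reminder email.
    """
    return sorted(asset for asset, value in heat.items() if value < threshold)
```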
After determining the platform utilization rate of a working group from the platform utilization rates of its users, the data quality control subsystem determines the priority of the working group: the higher the platform utilization rate of the working group, the higher its priority, and the lower the utilization rate, the lower its priority. Correspondingly, when the account management subsystem allocates resources to working groups, it allocates them first to working groups with higher priorities. This ensures the execution of computing tasks and the storage of data assets for working groups with a high resource utilization rate, and prevents the computing tasks or data resources of working groups with a low resource utilization rate from occupying too many resources for a long time so that other working groups cannot use them.
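Ordering working groups by utilization rate reduces to a sort. The sketch below is illustrative only; breaking ties by group name is an assumption added to make the order deterministic, and the group names are hypothetical.

```python
from typing import Dict, List


def serve_order(group_rates: Dict[str, float]) -> List[str]:
    """Return working groups in the order they should be served.

    A higher platform utilization rate means a higher priority. Ties are
    broken alphabetically by group name, which is an assumption of this sketch.
    """
    return sorted(group_rates, key=lambda group: (-group_rates[group], group))
```

For example, `serve_order({"a": 0.4, "b": 0.9, "c": 0.9})` returns `["b", "c", "a"]`.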
The metadata information of the account management subsystem, the data asset management subsystem and the data quality control subsystem can be stored in a metadata database. Specifically, the metadata base may include user information, workgroup information, role information, computing resource queue information, data asset information, data quality information, and other various associated data information. The user information mainly comprises personal user and tenant information, and the workgroup information comprises job level workgroup and project workgroup information. The role information includes individual user role and tenant role information. The data asset information mainly comprises core database table resource information, and base table resource attribution and base table resource examination and approval authorization information. The data quality information storage mainly comprises storage quality grading information of data resources, data resource calculation quality grading information and the like.
The control system of the data platform realizes unified storage and management of the metadata of the data platform, and realizes unified management and authentication of accounts through the account management subsystem; unified authorization and security protection of data resources are realized through the data asset management subsystem and the account management subsystem; service isolation of computing resources and data assets is realized through resource queue management in the account management subsystem; meanwhile, quality control of data resources is realized through the data quality control subsystem, which promotes optimization of computing and storage resources and reduces the total cost of the data platform.
A method for controlling the data platform by the data quality control subsystem and the account management subsystem will be described based on the control system of the data platform. Fig. 2 is a schematic flowchart of a control method of a data platform according to the present application. As shown in fig. 2, the method includes:
S201, receiving data platform use applications of a plurality of working groups.
Wherein each workgroup comprises a plurality of users.
In the actual use of the data platform, a plurality of working groups request to use the data platform, and a control system of the data platform provides corresponding services for each working group after receiving the requests. The data resources requested by the workgroup may be computing resources and/or storage resources. The application for use of the workgroup may be submitted by any user in the workgroup.
S202, acquiring data platform records corresponding to a plurality of working groups.
Wherein the data platform records include the application information submitted to the data platform by the working group and the corresponding use information.
Use of the data platform by a working group may include executing computing tasks with the computing resources of the data platform, or storing data with the storage resources of the data platform. The data platform record of a working group includes the application information from when the working group submitted its use application and the corresponding usage, that is, how the applied-for resources were actually used.
S203, determining the priorities of the plurality of working groups according to the data platform records of the plurality of working groups.
According to the usage of data resources in the data platform records of a workgroup, it can be determined whether the workgroup wastes resources. For example, if a workgroup applies for a large amount of resources but actually uses only a small amount, the workgroup is determined to be wasting resources. Alternatively, if the storage period requested by the workgroup is long but the data it stores is never or rarely used, the storage resources are wasted. The priority of a workgroup is determined by the resource usage in its historical data. The less a workgroup wastes resources, the higher its priority; conversely, the more a workgroup wastes resources, the lower its priority. It can be understood that, since a workgroup includes a plurality of users, the priority of the workgroup is low if many of its users waste resources when using the data platform. Conversely, if few or none of the users in the workgroup waste resources when using the data platform, the workgroup is given a higher priority.
S204, controlling the data platform to provide the use service for the plurality of working groups in descending order of the priorities of the plurality of working groups.
When a plurality of working groups apply to use the data platform, service is provided first to the working groups with high priority, according to the priority order of the working groups. A low priority means that the working group uses little of the resources it applies for, that is, its resource utilization rate is low, so that resources are occupied and wasted for no reason while other working groups cannot use them. Therefore, in this embodiment, the working groups with high priority are satisfied first, so that the data platform is used reasonably and the utilization rate of data resources is higher.
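Steps S203 and S204 can be illustrated with a short sketch. It assumes, purely for illustration, that each working group's platform utilization rate is used directly as its priority score and that ties are served in arrival order; the group names and the function `service_order` are hypothetical.

```python
def service_order(applications, utilization):
    """Return workgroups in the order they should be served.

    applications: workgroup names in arrival order (S201).
    utilization: workgroup -> platform utilization rate in [0, 1],
                 used directly as the priority score (an assumption).
    """
    # sorted() is stable, so groups with equal priority keep arrival order.
    return sorted(applications, key=lambda group: utilization[group], reverse=True)


apps = ["group_x", "group_y", "group_z"]
util = {"group_x": 0.3125, "group_y": 0.78125, "group_z": 0.5}
order = service_order(apps, util)
```

With these figures the platform would serve `group_y` first and `group_x` last.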
In the method provided in this embodiment, S201 and S204 may be implemented by the account management subsystem of the above embodiment, and S202 and S203 may be implemented by the data quality control subsystem. The priority of each working group is determined from the record of its use of the data platform, so that working groups with high platform utilization rates can use the data platform preferentially. This avoids the problem that other working groups cannot use the data platform because working groups with low platform utilization rates occupy resources for a long time, avoids wasting the resources of the data platform, and reduces the cost of the data platform.
It should be noted that, in the above embodiment, the step of determining the priority of the working group in S202 and S203 may also be performed before S201, that is, the data quality control subsystem may determine the priority of each working group in advance according to the data platform record of each working group, so that when receiving the data platform use application, the account management subsystem may directly provide services for each working group according to the priority of each working group.
On the basis of the above embodiments, how to determine the priority of a working group according to the data platform record is further explained. Specifically, the platform utilization rate of each working group is determined according to the data platform records of the plurality of working groups, and the priority of each working group is determined according to its platform utilization rate; the higher the platform utilization rate, the higher the corresponding priority.
When the data platform use application of a working group is used to submit a computing task, the platform utilization rate of each working group is determined, for computing resources, according to the computing resources applied for when the working group submitted computing tasks to the data platform, as recorded in the data platform record, and the computing resources actually used when the computing tasks were executed.
Specifically, the platform utilization rate of each user in each work group may be calculated first, and then the platform utilization rate of each work group may be determined according to the platform utilization rates of the users in each work group. For example, the platform usage of multiple users in each workgroup may be weighted and averaged. The platform usage by each user is described below.
For example, user A applies for 64 GB of memory when submitting computing task a to the data platform, and the data platform record of user A shows that task a actually used 20 GB of memory during execution, so the platform utilization rate of task a of user A is 31.25%. User B applies for 64 GB of memory when submitting computing task b to the data platform, and the historical data of user B shows that task b actually used 50 GB of memory during execution, so the platform utilization rate of task b of user B is 78.125%. By aggregating all computing tasks in a user's data platform record, for example by weighting the resource utilization rate of each task, the user's overall computing resource utilization rate can be determined, as can the several historical tasks of the user with the highest or lowest resource utilization rates.
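The arithmetic in this example, and one possible way of aggregating it, can be written out as a sketch. The text does not specify the weights of the weighted average, so weighting each task by the amount of memory it requested is an assumption, as are the function names.

```python
def task_utilization(requested_gb, used_gb):
    """Utilization of a single task: memory used divided by memory requested."""
    return used_gb / requested_gb


def weighted_utilization(tasks):
    """Average utilization over tasks, weighted by requested memory.

    tasks: iterable of (requested_gb, used_gb) pairs.
    Weighting by the requested amount (an assumption) makes the result
    equal to total used / total requested.
    """
    tasks = list(tasks)
    total_requested = sum(requested for requested, _ in tasks)
    total_used = sum(used for _, used in tasks)
    return total_used / total_requested


rate_a = task_utilization(64, 20)                      # task a of user A
rate_b = task_utilization(64, 50)                      # task b of user B
group_rate = weighted_utilization([(64, 20), (64, 50)])
```

The same `weighted_utilization` helper could be applied first to each user's tasks and then across the users of a working group.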
When the data platform use application of the user is used to submit a storage task, the platform utilization rate of each user is determined, for storage resources, according to the number of times that the data assets stored in the data platform by the user, as recorded in the data platform record, are called in the computing tasks of the data platform.
For example, if user C stored data file c in the data platform one year ago and data file c has been called only once during that year, the utilization rate of the storage resources occupied by data file c is clearly low. In practical applications, the platform utilization rate can be determined jointly from the size of the storage resources occupied by a data asset, the storage period, and the frequency with which the data asset is called. By aggregating the platform utilization rates of all data resources stored by the user, the platform utilization rate of the user's overall storage resources can be determined.
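One way to combine size, storage period and call frequency into a single figure is sketched below. The text does not give a formula, so the metric used here (calls per GB-day of occupied storage), the 10 GB file size, and the function name are all assumptions chosen for illustration.

```python
def storage_utilization(calls, size_gb, days_stored):
    """Calls per GB-day of occupied storage (a hypothetical metric).

    A large, long-lived, rarely called asset scores low; a small or
    frequently called asset scores high.
    """
    return calls / (size_gb * days_stored)


# Data file c: stored for a year, called once (size assumed to be 10 GB).
file_c = storage_utilization(calls=1, size_gb=10, days_stored=365)
# A table of the same size and age that is called twice a day on average.
hot_table = storage_utilization(calls=730, size_gb=10, days_stored=365)
```

Under this metric `file_c` scores far lower than `hot_table`, matching the intuition in the example.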
For users and working groups with low platform utilization rates, their priorities can be reduced, and the users can also be reminded in other ways to optimize their usage. For example, the platform utilization rates of the user and the working group may be sent to the user by email or output to a user interface as a prompt. Users with low computing resource utilization rates are reminded to apply for fewer resources when submitting computing tasks, and users with low storage resource utilization rates are reminded to shorten the life cycle of their data assets. The system thus allocates resources according to priority and reminds users to optimize and adjust their own resources, so that data platform resources are used reasonably.
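The reminder logic can be sketched as follows. The threshold of 0.5, the data layout, and the function name `build_reminders` are hypothetical; actual delivery by email or user interface is left out.

```python
def build_reminders(users, threshold=0.5):
    """Return (user, advice) pairs for users below the utilization threshold.

    users: name -> {"compute": rate, "storage": rate}, rates in [0, 1].
    threshold: hypothetical cut-off below which a reminder is generated.
    """
    reminders = []
    for name, rates in sorted(users.items()):
        if rates["compute"] < threshold:
            reminders.append(
                (name, "apply for fewer resources when submitting computing tasks"))
        if rates["storage"] < threshold:
            reminders.append(
                (name, "shorten the life cycle of rarely used data assets"))
    return reminders


users = {
    "user_a": {"compute": 0.3125, "storage": 0.9},
    "user_b": {"compute": 0.78125, "storage": 0.1},
}
reminders = build_reminders(users)
```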
Furthermore, when a user submits a computing task to the data platform through a working group, the control system of the data platform not only allocates resources according to priority, but also submits the task of the target working group to the target resource queue corresponding to that working group in the data platform, according to the correspondence between working groups and resource queues.
Specifically, the computing resource queues are divided mainly according to service lines, and the working group to which each user belongs has a corresponding resource queue. For example, the computing tasks submitted by users in the working groups of different service lines may require different types of resources, so the correspondence between working groups and resource queues may be determined according to the type of the working group and the type of the resource queue. Submitting the tasks of the target working group to its corresponding target resource queue in the data platform isolates computing resources by service, avoids problems such as disorderly use of resource queues and non-core tasks affecting core tasks, and improves the resource utilization rate.
In addition, labels can be added to the server clusters of the data platform according to cluster type, with different resource queues corresponding to server clusters with different labels. The tasks of a target working group can then be submitted to the server cluster whose label corresponds to the target resource queue, according to the correspondence between the target resource queue and the labels of the server clusters, so that physical isolation of resources is achieved.
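The two-level routing described above (working group to resource queue, then resource queue to labeled server cluster) can be sketched with plain lookup tables. All group, queue, label and cluster names below are invented for illustration; a real deployment would hold these mappings in the scheduler's configuration.

```python
# Hypothetical correspondence tables.
WORKGROUP_TO_QUEUE = {"ads_team": "queue_ads", "bi_team": "queue_bi"}
QUEUE_TO_LABEL = {"queue_ads": "gpu", "queue_bi": "highmem"}
CLUSTER_LABELS = {"cluster-1": "gpu", "cluster-2": "highmem", "cluster-3": "gpu"}


def route_task(workgroup):
    """Return (target queue, clusters carrying the queue's label)."""
    queue = WORKGROUP_TO_QUEUE[workgroup]   # working group -> resource queue
    label = QUEUE_TO_LABEL[queue]           # resource queue -> cluster label
    clusters = sorted(c for c, l in CLUSTER_LABELS.items() if l == label)
    return queue, clusters


ads_route = route_task("ads_team")
bi_route = route_task("bi_team")
```

Because each queue resolves to clusters with a single label, tasks from different service lines never land on the same machines in this sketch.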
Fig. 3 is a schematic structural diagram of an electronic device provided in the present application. As shown in fig. 3, the electronic device 30 includes a memory 301 and a processor 302. Optionally, the memory 301 and the processor 302 are connected by a bus 303.
The memory 301 is used to store computer programs. The processor 302 is adapted to implement the method in the above-described method embodiments when the computer program is executed.
The present application also provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method in the above-mentioned method embodiments.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the method embodiments described above. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical disks.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A method for controlling a data platform, comprising:
receiving data platform use applications of a plurality of working groups, wherein each working group comprises a plurality of users;
obtaining data platform records corresponding to the plurality of workgroups, the data platform records including: application information and corresponding use information submitted by a working group to the data platform;
determining the priorities of the plurality of working groups according to the data platform records of the plurality of working groups;
and controlling the data platform to provide the use services for the plurality of working groups according to the sequence of the priorities of the plurality of working groups from high to low.
2. The method of claim 1, wherein said determining the priority of the plurality of workgroups from the data platform records of the plurality of workgroups comprises:
determining the platform utilization rate of each working group according to the data platform records of the plurality of working groups;
determining the priority of each working group according to the platform utilization rate of each working group; the higher the platform utilization, the higher the corresponding priority.
3. The method of claim 2, wherein determining platform usage for each workgroup from the data platform records for the plurality of workgroups comprises:
determining the platform utilization rate of each user according to the data platform record of each user in each working group;
and determining the platform utilization rate of each working group according to the platform utilization rates of a plurality of users in each working group.
4. The method of claim 2, wherein the data platform usage application is used to submit a computing task;
the determining the platform utilization rate of each working group according to the data platform records of the plurality of working groups comprises:
and determining the platform utilization rate of each user according to the computing resources applied when each working group submits the computing tasks to the data platform in the data platform records and the computing resources used when the computing tasks are executed.
5. The method of claim 2, wherein the data platform usage application is used to submit storage tasks;
the determining the platform utilization rate of each working group according to the data platform records of the plurality of working groups comprises:
and determining the platform utilization rate of each user according to the number of times of the data assets stored in the data platform by each working group are called in the computing tasks of the data platform.
6. The method according to claim 1 or 2, characterized in that the method further comprises:
determining a target resource queue corresponding to the target working group according to the corresponding relation between the working group and the resource queue;
and submitting the tasks of the target working group to the server cluster with the label corresponding to the target resource queue according to the corresponding relation between the label of the target resource queue and the label of the server cluster.
7. The method of claim 6, further comprising:
and determining the corresponding relation between the work group and the resource queue according to the type of the work group and the type of the resource.
8. A control system for a data platform, comprising: the account management subsystem and the data quality control subsystem;
the account management subsystem is used for receiving data platform use applications of a plurality of working groups, each working group comprises a plurality of users, and controlling the data platform to provide use services for the working groups according to the sequence of the priorities of the working groups from high to low;
the data quality control subsystem is configured to obtain data platform records corresponding to the plurality of workgroups, the data platform records including: application information and corresponding use information submitted by a working group to the data platform; and determining the priorities of the plurality of working groups according to the data platform records of the plurality of working groups.
9. An electronic device comprising a memory and a processor;
the memory is connected with the processor;
the memory is used for storing a computer program;
the processor is adapted to carry out the method of any one of claims 1-7 when the computer program is executed.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-7.
CN202010485281.XA 2020-06-01 2020-06-01 Control method, system, equipment and storage medium of data platform Active CN111522843B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010485281.XA CN111522843B (en) 2020-06-01 2020-06-01 Control method, system, equipment and storage medium of data platform

Publications (2)

Publication Number Publication Date
CN111522843A true CN111522843A (en) 2020-08-11
CN111522843B CN111522843B (en) 2023-06-27

Family

ID=71909284

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010485281.XA Active CN111522843B (en) 2020-06-01 2020-06-01 Control method, system, equipment and storage medium of data platform

Country Status (1)

Country Link
CN (1) CN111522843B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112256588A (en) * 2020-11-10 2021-01-22 广州掌动智能科技有限公司 Resource allocation method for application program test, computer readable storage medium and tester

Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02171932A (en) * 1988-12-26 1990-07-03 Fujitsu Ltd Job priority setting system
US20070143290A1 (en) * 2005-01-04 2007-06-21 International Business Machines Corporation Priority Determination Apparatus, Service Processing Allocation Apparatus, Control Method and Program
US20090006541A1 (en) * 2005-12-28 2009-01-01 International Business Machines Corporation Load Distribution in Client Server System
US8522240B1 (en) * 2006-10-19 2013-08-27 United Services Automobile Association (Usaa) Systems and methods for collaborative task management
US20140040895A1 (en) * 2012-08-06 2014-02-06 Hon Hai Precision Industry Co., Ltd. Electronic device and method for allocating resources for virtual machines
JP2014109886A (en) * 2012-11-30 2014-06-12 Seiko Precision Inc Load distribution device, load distribution method and program
CN104507169A (en) * 2014-12-15 2015-04-08 东南大学 Three-dimensional resource dynamic allocation method and device for reducing system uplink transmission time delay
US9367354B1 (en) * 2011-12-05 2016-06-14 Amazon Technologies, Inc. Queued workload service in a multi tenant environment
CN106851747A (en) * 2015-12-03 2017-06-13 中兴通讯股份有限公司 Dynamic resource allocation method and device in a kind of GSM
CN106922028A (en) * 2015-12-28 2017-07-04 中兴通讯股份有限公司 A kind of method of discrimination and device of wireless telecommunication system User Priority
US9852011B1 (en) * 2009-06-26 2017-12-26 Turbonomic, Inc. Managing resources in virtualization systems
JP2019046163A (en) * 2017-09-01 2019-03-22 富士通株式会社 Information processing device, virtual machine monitoring program, and information processing system
US20190258631A1 (en) * 2016-09-26 2019-08-22 Splunk Inc. Query scheduling based on a query-resource allocation and resource availability
US20190317944A1 (en) * 2010-10-04 2019-10-17 Sempras Software, Inc. Methods and apparatus for integrated management of structured data from various sources and having various formats
EP3564844A1 (en) * 2014-09-17 2019-11-06 Huawei Technologies Co., Ltd. Data deduplication method and apparatus
CN110619002A (en) * 2019-09-12 2019-12-27 北京百度网讯科技有限公司 Data processing method, device and storage medium
CN111078404A (en) * 2019-12-09 2020-04-28 腾讯科技(深圳)有限公司 Computing resource determination method and device, electronic equipment and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Liu Yong: "Research on Multi-Channel MAC Protocols for Mobile Ad Hoc Networks", China Excellent Master's Theses *

Also Published As

Publication number Publication date
CN111522843B (en) 2023-06-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant