CN116827939A

CN116827939A - Cluster resource management allocation method, device and storage medium based on big data

Info

Publication number: CN116827939A
Application number: CN202310787801.6A
Authority: CN
Inventors: 李飞; 李耀; 彭磊
Original assignee: Wuhan Zhongbang Bank Co Ltd
Current assignee: Wuhan Zhongbang Bank Co Ltd
Priority date: 2023-06-28
Filing date: 2023-06-28
Publication date: 2023-09-29

Abstract

The invention provides a big-data-based cluster resource management and allocation method, a device, and a storage medium. The resource management and allocation method comprises the following steps: step 1, establishing a complete set of personnel, role, and permission relationships through the interfaces provided by Apache Ranger, and authorizing users to access the corresponding resources and data; and step 2, determining whether the logged-in user is authorized: if so, the authorized user may legitimately access the authorized resources and data; if not, the unauthorized user's login is rejected outright. The invention can allocate different big data cluster resource components according to the different resource information required by different users, thereby addressing the intense resource contention and uneven resource allocation of existing big data platforms.

Description

Cluster resource management allocation method, device and storage medium based on big data

Technical Field

The present invention relates to the field of big data, and in particular, to a method, an apparatus, and a storage medium for managing and allocating cluster resources based on big data.

Background

With the growth of the big data industry, the rapid development of big data services, and the expansion of the market, big data clusters have become increasingly common. In the prior art, the management and allocation of big data storage media are usually controlled manually, which leads to problems such as high management difficulty, low resource utilization, and frequent resource contention. As the demand for big data storage increases, conventional cluster resource management can no longer meet service needs, and an efficient, intelligent cluster resource management and allocation scheme is required.

Disclosure of Invention

In view of the technical problems in the prior art, the invention provides a big-data-based cluster resource management and allocation method, a device, and a storage medium, so as to address the intense resource contention and uneven resource allocation of existing big data platforms.

According to a first aspect of the present invention, there is provided a big-data-based cluster resource management and allocation method using Apache Ranger, the method including the following steps:

step 1, establishing a complete set of personnel, role, and permission relationships through the interfaces provided by Apache Ranger, and authorizing users to access the corresponding resources and data;

step 2, determining whether the logged-in user is authorized; if so, the authorized user may legitimately access the authorized resources and data; if not, the unauthorized user's login is rejected outright.

On the basis of the above technical solution, the invention may also be improved as follows.

Optionally, in step 1, establishing a complete set of personnel, role, and permission relationships through the interfaces provided by Apache Ranger includes:

creating different users in the operating system, and synchronizing the permission data of the operating system users/groups into the Ranger database through user synchronization; in the Ranger Admin console, selecting an existing HBase service on the "Service Manager" page and, when the "Policy List" page appears, creating a new policy or editing an existing policy, specifying in the policy the users or user groups to be authorized and the resources and operations they may access, and saving the policy changes.
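The policy described above can also be written down as data. The following is a minimal sketch, assuming the JSON policy layout commonly used with Ranger's public REST interface; the service name `hbase_dev`, the table `orders`, and the user and group names are hypothetical, and the field names should be checked against the deployed Ranger version before use.

```python
import json


def build_hbase_policy(service, name, table, users, groups, access_types):
    """Build a Ranger-style policy document for one HBase table.

    Assumption: the field layout mirrors the JSON that Ranger Admin
    accepts for policies; verify it against your Ranger version.
    """
    wildcard = {"values": ["*"], "isExcludes": False, "isRecursive": False}
    return {
        "service": service,
        "name": name,
        "isEnabled": True,
        "resources": {
            "table": {"values": [table], "isExcludes": False, "isRecursive": False},
            "column-family": dict(wildcard),
            "column": dict(wildcard),
        },
        "policyItems": [
            {
                "users": list(users),
                "groups": list(groups),
                "accesses": [{"type": t, "isAllowed": True} for t in access_types],
                "delegateAdmin": False,
            }
        ],
    }


policy = build_hbase_policy(
    service="hbase_dev",        # hypothetical service name
    name="orders-read-write",   # hypothetical policy name
    table="orders",
    users=["alice"],
    groups=["analysts"],
    access_types=["read", "write"],
)
print(json.dumps(policy, indent=2))
```

The sketch only builds the document; sending it to Ranger Admin, or entering the same values on the "Policy List" page, is left to the deployment.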

Optionally, in step 1, a temporary policy is created and used to grant temporary authorization to other users; after the temporarily authorized user completes the related operations, the temporary policy is deleted, thereby implementing temporary authorization.
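The create-then-delete life cycle of a temporary policy can be illustrated as follows. This is a sketch only: `PolicyStore` is a hypothetical in-memory stand-in for the Ranger policy database, and `temporary_grant` is an illustrative helper, not part of Ranger.

```python
from contextlib import contextmanager


class PolicyStore:
    """In-memory stand-in for the Ranger policy database (hypothetical)."""

    def __init__(self):
        self._policies = {}

    def create(self, name, users, resource, operations):
        self._policies[name] = {
            "users": set(users),
            "resource": resource,
            "operations": set(operations),
        }

    def delete(self, name):
        self._policies.pop(name, None)

    def is_allowed(self, user, resource, operation):
        return any(
            user in p["users"]
            and p["resource"] == resource
            and operation in p["operations"]
            for p in self._policies.values()
        )


@contextmanager
def temporary_grant(store, name, users, resource, operations):
    """Create a policy, yield to the caller, and always delete it afterwards."""
    store.create(name, users, resource, operations)
    try:
        yield
    finally:
        store.delete(name)


store = PolicyStore()
with temporary_grant(store, "tmp-bob", ["bob"], "hbase:orders", ["read"]):
    during = store.is_allowed("bob", "hbase:orders", "read")
after = store.is_allowed("bob", "hbase:orders", "read")
print(during, after)
```

Tying the deletion to the end of the `with` block means the grant is withdrawn even if the user's operation raises an error.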

Optionally, step 2 includes the following steps:

confirming that the user is registered as a user or as a member of a user group in Ranger Admin; selecting the service and policy type to be checked in the Ranger Admin console; and searching for a policy that matches the resources and operations the user wants to access; if the policy authorizes the user or a user group containing the user, the user is authorized.
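The check described above reduces to a lookup over policies. The following sketch assumes a simplified policy record (`Policy`, with hypothetical resource names such as `hbase:orders`); it is not Ranger's own evaluation engine.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    resource: str
    operations: set
    users: set = field(default_factory=set)
    groups: set = field(default_factory=set)


def is_authorized(user, user_groups, resource, operation, policies):
    """Return True if a policy for the resource and operation names the
    user directly or names a group the user belongs to."""
    for policy in policies:
        if policy.resource != resource or operation not in policy.operations:
            continue
        if user in policy.users or policy.groups & set(user_groups):
            return True
    return False  # no matching policy: deny by default


policies = [
    Policy("hbase:orders", {"read"}, users={"alice"}),
    Policy("hbase:orders", {"read", "write"}, groups={"etl"}),
]

print(is_authorized("alice", [], "hbase:orders", "read", policies))
print(is_authorized("carol", ["etl"], "hbase:orders", "write", policies))
print(is_authorized("mallory", ["guests"], "hbase:orders", "read", policies))
```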

Optionally, the cluster resource management and allocation method is applicable to Hadoop ecosystem components, including but not limited to HDFS, Hive, HBase, and YARN, for fine-grained data access control.

According to a second aspect of the present invention, there is further provided a big-data-cluster-based resource allocation method, applied to a YARN system, the method comprising the following steps:

step 1, different users submit applications to the system;

step 2, the system allocates a first container for the application, communicates with the corresponding node, and starts the application's ApplicationMaster in the first container;

step 3, the ApplicationMaster first registers with the ResourceManager, so that a user can check the running state of the application through the ResourceManager; the ApplicationMaster then applies for resources for each task and monitors the running state of each task until the run finishes, that is, steps 4 to 7 are repeated;

step 4, the ApplicationMaster applies for and retrieves resources from the ResourceManager through an RPC protocol by polling;

step 5, once the ApplicationMaster has obtained resources, it communicates with the corresponding NodeManager and requests that the NodeManager start the task;

step 6, after the NodeManager sets up the running environment for the task, it writes the task start command into a corresponding script and starts the task by running the script;

step 7, each task reports its state and progress to the ApplicationMaster through the RPC protocol, so that the ApplicationMaster can track the running state of each task at any time and restart a task when it fails;

step 8, after the application finishes running, the ApplicationMaster unregisters from the ResourceManager and shuts itself down.
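The control loop in steps 3 to 8 can be illustrated with a small simulation. This is a sketch under stated assumptions: the `ResourceManager` class and `run_application` function below are toy stand-ins written for illustration and do not use the real YARN client API; containers are counted rather than modelled, tasks run synchronously, and step 2 (launching the first container) is not modelled.

```python
class ResourceManager:
    """Toy stand-in for the resource manager (not the real YARN API)."""

    def __init__(self, free_containers):
        self.free_containers = free_containers
        self.registered = set()

    def register(self, app_id):
        self.registered.add(app_id)

    def allocate(self, wanted):
        # Grant at most what is free; the caller polls again for the rest.
        granted = min(wanted, self.free_containers)
        self.free_containers -= granted
        return granted

    def release(self, count):
        self.free_containers += count

    def unregister(self, app_id):
        self.registered.discard(app_id)


def run_application(rm, app_id, tasks, run_task, max_attempts=3):
    """Register, poll for containers, run tasks, retry failures, unregister."""
    rm.register(app_id)                               # step 3
    pending = [(task, 1) for task in tasks]
    results = {}
    while pending:
        granted = rm.allocate(len(pending))           # step 4: polling
        if granted == 0:
            raise RuntimeError("no containers available")
        batch, pending = pending[:granted], pending[granted:]
        for task, attempt in batch:                   # steps 5 and 6: start task
            succeeded = run_task(task, attempt)       # step 7: task reports status
            if succeeded:
                results[task] = "SUCCEEDED"
            elif attempt < max_attempts:
                pending.append((task, attempt + 1))   # restart the failed task
            else:
                results[task] = "FAILED"
        rm.release(granted)
    rm.unregister(app_id)                             # step 8
    return results


def flaky(task, attempt):
    # Hypothetical task behaviour: "t2" fails on its first attempt only.
    return not (task == "t2" and attempt == 1)


rm = ResourceManager(free_containers=2)
results = run_application(rm, "app-1", ["t1", "t2", "t3"], flaky)
print(results)
```

With two free containers and three tasks, the loop needs two polling rounds; the failed first attempt of `t2` is retried in the second round.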

Optionally, the running environment includes environment variables, JAR packages, and binary programs.

Optionally, while the application is running, the user can query the ApplicationMaster for the application's current running state through RPC at any time.

According to a third aspect of the present invention, there is provided an electronic device including a memory and a processor, where the processor, when executing a computer program stored in the memory, implements the steps of the above big-data-based cluster resource management and allocation method and/or the above big-data-cluster-based resource allocation method.

According to a fourth aspect of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above big-data-based cluster resource management and allocation method and/or the above big-data-cluster-based resource allocation method.

The technical effects and advantages of the invention are as follows:

compared with the prior art, the invention provides a big-data-based cluster resource management and allocation method, a device, and a storage medium that can allocate different big data cluster resource components according to the different resource information required by different users. Different users therefore hold different big data cluster resource components, which achieves resource isolation. As a result, resource contention and preemption among different users are avoided, which improves the request response rate of the big data platform for each user.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

Drawings

FIG. 1 is a flowchart of a first embodiment of the big data cluster resource allocation method according to the present invention;

FIG. 2 is a flowchart of a second embodiment of the big data cluster resource allocation method according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort are intended to fall within the scope of the invention.

Note that, in this embodiment, Apache Ranger is a Hadoop cluster permission framework used to operate, monitor, and manage complex data permissions; it provides a centralized mechanism for managing all data permissions of the YARN-based Hadoop ecosystem.

HDFS: a distributed file system based on a streaming data access model, designed around write-once, read-many access; it provides high-throughput, highly fault-tolerant data access and is well suited to storing massive amounts of data.

Hive: a Hadoop-based data warehouse tool used to extract, transform, and load data; it provides a mechanism for storing, querying, and analyzing large-scale data stored in Hadoop. Hive can map a structured data file to a database table, provide SQL query functionality, and convert SQL statements into MapReduce tasks for execution. Its advantages are a low learning cost and the ability to perform MapReduce statistics quickly through SQL-like statements, which makes MapReduce simpler to use without developing a dedicated MapReduce application. Hive is well suited to statistical analysis in a data warehouse.

MapReduce: a programming model for parallel computation over large-scale data sets. It allows programmers to run their own programs on a distributed system without having to write distributed parallel code. In current implementations, the programmer specifies a Map function that maps a set of key-value pairs to a new set of key-value pairs, and a concurrent Reduce function that processes together all mapped key-value pairs sharing the same key.
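As an illustration of the Map and Reduce roles, the following single-process sketch counts words. It assumes nothing about Hadoop itself: `map_phase`, `shuffle`, and `reduce_phase` are illustrative helper names, and a real job would distribute these phases across the cluster.

```python
from collections import defaultdict


def map_phase(records, map_fn):
    """Apply map_fn to every (key, value) record and collect the emitted pairs."""
    emitted = []
    for key, value in records:
        emitted.extend(map_fn(key, value))
    return emitted


def shuffle(pairs):
    """Group values by key so that each reduce call sees exactly one key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reduce_fn):
    """Call reduce_fn once per key with all values mapped to that key."""
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def word_map(_line_no, line):
    # Emit (word, 1) for every word in the line.
    return [(word.lower(), 1) for word in line.split()]


def word_reduce(_word, counts):
    return sum(counts)


lines = [(0, "big data cluster"), (1, "big data platform")]
counts = reduce_phase(shuffle(map_phase(lines, word_map)), word_reduce)
print(counts)
```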

HBase: a highly reliable, high-performance, column-oriented, scalable distributed storage system; HBase can be used to build large-scale structured storage clusters on low-cost PC servers.

YARN (Yet Another Resource Negotiator): a new Hadoop resource manager. It is a general-purpose resource management system that provides unified resource management and scheduling for upper-layer applications; its introduction brings significant benefits to clusters in terms of utilization, unified resource management, and data sharing.

Example 1

It can be understood that, in view of the shortcomings described in the background, this embodiment of the invention provides a big-data-based cluster resource management and allocation system. FIG. 1 shows the specific flow of the big data cluster resource allocation method of the invention; as shown in FIG. 1, the invention uses the following steps:

step 1, installing Apache Ranger, and establishing a complete set of personnel, role, and permission relationships through the interfaces provided by Apache Ranger;

It should be noted that step 1 specifically includes: installing Apache Ranger; creating different users in the operating system, and synchronizing the permission data of the operating system users/groups (Users/Groups) into the Ranger database through user synchronization; in the Ranger Admin console, selecting an existing HBase service on the "Service Manager" page and, when the "Policy List" page appears, creating a new policy or editing an existing policy, specifying in the policy the users or user groups to be authorized and the resources and operations they may access, and saving the policy changes.

Step 1 also supports creating a temporary policy to grant temporary authorization to other users; when the temporarily authorized user completes the related operations, the temporary policy is deleted, so that temporary authorization is implemented conveniently and quickly.

It should be noted that step 2 specifically includes: confirming that the user is registered as a user or as a member of a user group in Ranger Admin; selecting the service and policy type to be checked in the Ranger Admin console; and searching for a policy that matches the resources and operations the user wants to access; if the policy authorizes the user or a user group containing the user, the user is authorized.

Specifically, in this embodiment of the invention, a complete set of personnel, role, and permission relationships is established through the interfaces provided by Apache Ranger; authorized users can legitimately access the authorized resources and data, while unauthorized users are rejected outright; this realizes fine-grained data access control over Hadoop ecosystem components such as HDFS, Hive, HBase, and YARN.
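Fine-grained control across several components can be pictured as one rule table keyed by service. The sketch below is illustrative only: the rule tuples, resource patterns, and operation names are hypothetical, and wildcard matching uses Python's `fnmatchcase` rather than Ranger's own resource matchers.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: (service, resource pattern, allowed operations, users)
POLICIES = [
    ("hdfs", "/warehouse/sales/*", {"read"}, {"alice"}),
    ("hive", "sales.orders", {"select"}, {"alice", "bob"}),
    ("hbase", "orders:*", {"read", "write"}, {"bob"}),
    ("yarn", "root.analytics", {"submit-app"}, {"alice"}),
]


def check_access(user, service, resource, operation):
    """Allow only if some rule for the service matches the resource, the
    operation, and the user; everything else is denied."""
    for rule_service, pattern, operations, users in POLICIES:
        if (
            rule_service == service
            and fnmatchcase(resource, pattern)
            and operation in operations
            and user in users
        ):
            return True
    return False


print(check_access("alice", "hdfs", "/warehouse/sales/2023/part-0", "read"))
print(check_access("alice", "hdfs", "/warehouse/hr/salaries", "read"))
print(check_access("bob", "hbase", "orders:cf1", "write"))
print(check_access("bob", "yarn", "root.analytics", "submit-app"))
```

Because the function falls through to `False`, a user with no matching rule is refused on every component, which mirrors the deny-by-default behaviour described above.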

Example 2

If target key information matching the current key information exists, a first big data cluster resource component corresponding to the first required resource information is allocated to the tenant corresponding to the target key information. As a further improvement of the present invention, the first big data cluster resource component includes, but is not limited to, HDFS, YARN, Hive, and HBase. Taking YARN as an example, FIG. 2 illustrates the specific flow of allocating big data cluster resources in YARN according to this embodiment of the invention. This embodiment of the invention also provides a big data cluster resource allocation method which, in this example, comprises the following steps:

step 1, different users submit applications to the system;

step 2, the system allocates a first container for the application, communicates with the corresponding node, and requires it to start the application's ApplicationMaster in that container;

step 3, the ApplicationMaster first registers with the ResourceManager, so that a user can check the running state of the application directly through the ResourceManager; the ApplicationMaster then applies for resources for each task and monitors each task's running state until the run finishes, that is, steps 4 to 7 are repeated;

step 5, once the ApplicationMaster has obtained resources, it communicates with the corresponding NodeManager and requests that the NodeManager start the task;

step 6, after the NodeManager sets up the running environment for the task, it writes the task start command into a script and starts the task by running the script;

step 7, each task reports its state and progress to the ApplicationMaster through an RPC protocol, so that the ApplicationMaster can track the running state of each task at any time and restart a task when it fails.

Specifically, in step 6, the running environment includes environment variables, JAR packages, and binary programs.
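Step 6 can be illustrated by rendering such a script from the task's environment. This is a minimal sketch, not the script format YARN itself generates: `build_launch_script` is a hypothetical helper, and the environment values, JAR names, and command are placeholders.

```python
import shlex


def build_launch_script(env, jars, command):
    """Render a POSIX shell script that sets up the task environment and
    then runs the task start command."""
    lines = ["#!/bin/sh", "set -e"]
    for name, value in sorted(env.items()):
        lines.append(f"export {name}={shlex.quote(value)}")
    if jars:
        # JAR packages are exposed to the task through the classpath.
        lines.append(f"export CLASSPATH={shlex.quote(':'.join(jars))}")
    lines.append("exec " + " ".join(shlex.quote(part) for part in command))
    return "\n".join(lines) + "\n"


script = build_launch_script(
    env={"TASK_ID": "task_0001", "JAVA_HOME": "/usr/lib/jvm/default"},  # placeholders
    jars=["app.jar", "lib/util.jar"],
    command=["java", "com.example.Task", "--attempt", "1"],
)
print(script)
```

Quoting every value with `shlex.quote` keeps the generated script safe when an environment value or argument contains spaces or shell metacharacters.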

Specifically, in step 7, while the application is running, the user may query the ApplicationMaster for the application's current running state through RPC at any time.

It should be noted that, for other details of the technical solution implemented by each module of the terminal in the above embodiment, reference may be made to the description of the big data cluster resource allocation method in the foregoing embodiment, which is not repeated here.

In summary, the big-data-based cluster resource management and allocation method provided by this embodiment of the invention can allocate different big data cluster resource components according to the different resource information required by different users. Different users therefore hold different big data cluster resource components, which achieves resource isolation. As a result, resource contention and preemption among different users are avoided, which improves the request response rate of the big data platform for each user.
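The per-user allocation summarized above, together with the key matching mentioned at the start of the second example, can be sketched as a pair of lookups. Everything in this sketch is hypothetical: the key registry, the mapping from required resource kinds to components, and the function name `allocate_components` are introduced only for illustration.

```python
def allocate_components(current_key, required, registry, catalog):
    """Return the cluster components assigned to the tenant whose key matches.

    registry maps a key to a tenant name; catalog maps a required resource
    kind to a cluster component. Both mappings are hypothetical.
    """
    tenant = registry.get(current_key)
    if tenant is None:
        return None  # no matching key: nothing is allocated
    components = sorted({catalog[kind] for kind in required if kind in catalog})
    return {"tenant": tenant, "components": components}


registry = {"key-a1": "tenant-a", "key-b7": "tenant-b"}
catalog = {"storage": "HDFS", "compute": "YARN", "sql": "HIVE", "kv": "HBase"}

grant_a = allocate_components("key-a1", ["storage", "compute"], registry, catalog)
grant_b = allocate_components("key-b7", ["sql"], registry, catalog)
unknown = allocate_components("key-zz", ["kv"], registry, catalog)
print(grant_a, grant_b, unknown)
```

Each tenant receives only the components that its own request maps to, which is the isolation property the summary describes.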

Furthermore, an embodiment of the present invention provides an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:

step 1, installing Apache Ranger; creating different users in the operating system, and synchronizing the permission data of the operating system users/groups (Users/Groups) into the Ranger database through user synchronization; in the Ranger Admin console, selecting an existing HBase service on the "Service Manager" page and, when the "Policy List" page appears, creating a new policy or editing an existing policy, specifying in the policy the users or user groups to be authorized and the resources and operations they may access, and saving the policy changes;

step 2, confirming that the user is registered as a user or as a member of a user group in Ranger Admin; selecting the service and policy type to be checked in the Ranger Admin console; and searching for a policy that matches the resources and operations the user wants to access; if the policy authorizes the user or a user group containing the user, the user is authorized.

The present embodiment provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:

step 1, different users submit applications to the system;

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.

It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Finally, it should be noted that: the foregoing description is only illustrative of the preferred embodiments of the present invention, and although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the embodiments described, or equivalents may be substituted for elements thereof, and any modifications, equivalents, improvements or changes may be made without departing from the spirit and principles of the present invention.

Claims

1. A big-data-based cluster resource management and allocation method, characterized in that the method uses Apache Ranger and comprises the following steps:

2. The big-data-based cluster resource management and allocation method according to claim 1, wherein in step 1, establishing a complete set of personnel, role, and permission relationships through the interfaces provided by Apache Ranger includes:

creating different users in the operating system, and synchronizing the permission data of the operating system users/groups into the Ranger database through user synchronization; in the Ranger Admin console, selecting an existing HBase service on the "Service Manager" page; when the "Policy List" page appears, creating a new policy or editing an existing policy, specifying in the policy the users or user groups to be authorized and the resources and operations they may access, and saving the policy changes.

3. The big-data-based cluster resource management and allocation method according to claim 2, wherein in step 1, a temporary policy is created and used to grant temporary authorization to other users, and when the temporarily authorized user completes the related operations, the temporary policy is deleted, thereby implementing temporary authorization.

4. The big-data-based cluster resource management and allocation method according to claim 1, wherein in step 2, determining whether the logged-in user is authorized comprises:

confirming that the user is registered as a user or as a member of a user group in Ranger Admin; selecting the service and policy type to be checked in the Ranger Admin console; and searching for a policy that matches the resources and operations the user wants to access; if the policy authorizes the user or a group containing the user, the user is authorized.

5. The big-data-based cluster resource management and allocation method according to claim 1, wherein the method is applicable to Hadoop ecosystem components, including but not limited to HDFS, Hive, HBase, and YARN, for fine-grained data access control.

6. A big-data-cluster-based resource allocation method, characterized in that the method is applied to a YARN system and specifically comprises the following steps:

step 1, different users submit applications to the system;

step 2, the system allocates a first container for the application, communicates with the corresponding node, and starts the application's ApplicationMaster in the first container;

step 3, the ApplicationMaster first registers with the ResourceManager, so that a user can check the running state of the application through the ResourceManager; the ApplicationMaster then applies for resources for each task and monitors the running state of each task until the run finishes, that is, steps 4 to 7 are repeated;

7. The big-data-cluster-based resource allocation method according to claim 6, wherein in step 6, the running environment includes environment variables, JAR packages, and binary programs.

8. The big-data-cluster-based resource allocation method according to claim 6, wherein, while the application is running, a user can query the ApplicationMaster for the application's current running state through RPC at any time.

9. An electronic device, comprising a memory and a processor, the processor being configured to implement, when executing a computer program stored in the memory, the steps of the big-data-based cluster resource management and allocation method according to any one of claims 1-5 or of the big-data-cluster-based resource allocation method according to any one of claims 6-8.

10. A computer-readable storage medium having stored thereon a computer management program which, when executed by a processor, implements the steps of the big-data-based cluster resource management and allocation method according to any one of claims 1-5 or of the big-data-cluster-based resource allocation method according to any one of claims 6-8.