Disclosure of Invention
Embodiments of the invention aim to provide a multi-tenant isolation method and device, so as to solve the following problems of the prior art: the Hadoop multi-tenant mechanism (Yarn) depends on the underlying operating system, achieves only inaccurate resource isolation, and cannot achieve platform-level isolation; and a multi-tenant mechanism that utilizes IaaS-layer capability must depend on that capability and requires IaaS management platform software to be deployed, which increases cost, while deploying a plurality of Hadoop clusters makes sharing of underlying data impossible.
In order to achieve the above object, an embodiment of the present invention provides a multi-tenant isolation method, where the method includes:
receiving resource scheduling service requests of a plurality of tenants; the resource scheduling service request at least carries a service requested by a tenant and a service deployment node requested by the tenant;
deploying n copies of the local data on a service deployment node of a tenant request needing the local data according to the resource scheduling service request and a target copy distribution strategy; wherein n is an integer greater than or equal to 2;
and respectively deploying resource scheduling service strategies corresponding to the services requested by the tenants on the service deployment nodes requested by the tenants according to the resource scheduling service requests.
The resource scheduling service request also carries indication information indicating whether a tenant needs local data or not;
the step of deploying, according to the resource scheduling service requests, the n copies of the local data on the service deployment nodes requested by the tenants needing the local data in accordance with the target copy distribution strategy comprises the following steps:
determining the number N of tenants needing local data according to the indication information;
configuring a target copy distribution strategy according to the number N of the tenants needing local data and service deployment nodes requested by the tenants needing local data; wherein the target replica distribution policy comprises: the number n of the copies and deployment nodes of the copies;
and respectively deploying the copies of the local data on the deployment nodes of the copies.
The step of configuring a target copy distribution policy according to the number N of tenants requiring local data and a service deployment node requested by the tenants requiring local data includes:
if the service deployment nodes requested by a first tenant needing the local data comprise all the service deployment nodes requested by a second tenant needing the local data, acquiring the number of such second tenants;
configuring the number n of copies to be equal to the number N of tenants needing the local data minus the number of second tenants;
dividing the service deployment nodes requested by each tenant needing the local data, other than the second tenants, into one logic group each, so as to obtain m logic groups, wherein m is equal to n;
and configuring one copy in each logic group, wherein any service deployment node in the logic group may serve as the deployment node of the copy.
If the number n of copies is less than 3, the method further comprises the following step:
deploying the remaining 3-n copies on any nodes among all the service deployment nodes requested by the tenants needing the local data.
After the step of respectively deploying the resource scheduling service policies corresponding to the services requested by the tenants on the service deployment node requested by each tenant according to the resource scheduling service requests, the method further includes:
and correspondingly sending the target copy distribution strategy and the resource scheduling service strategy to the plurality of tenants.
The embodiment of the invention also provides a multi-tenant isolation device, which comprises:
the device comprises a request receiving module, a replica deployment module and a scheduling deployment module, wherein the request receiving module is used for receiving resource scheduling service requests of a plurality of tenants; the resource scheduling service request at least carries a service requested by a tenant and a service deployment node requested by the tenant;
the replica deployment module is used for deploying n replicas of the local data on a service deployment node of a tenant request needing the local data according to the resource scheduling service request and a target replica distribution strategy; wherein n is an integer greater than or equal to 2;
and the scheduling and deploying module is used for respectively deploying the resource scheduling service strategies corresponding to the services requested by the tenants on the service deployment nodes requested by the tenants according to the resource scheduling service requests.
The resource scheduling service request also carries indication information indicating whether a tenant needs local data or not;
the replica deployment module comprises:
the first determining submodule is used for determining the number N of tenants needing local data according to the indication information;
the policy configuration submodule is used for configuring a target copy distribution policy according to the number N of the tenants needing local data and the service deployment node requested by the tenants needing local data; wherein the target replica distribution policy comprises: the number n of the copies and deployment nodes of the copies;
and the copy deployment submodule is used for respectively deploying the copies of the local data on the deployment nodes of the copies.
Wherein the policy configuration sub-module comprises:
the acquiring unit is used for acquiring the number of second tenants if the service deployment nodes requested by a first tenant needing the local data comprise all the service deployment nodes requested by a second tenant needing the local data;
the configuration unit is used for configuring the number n of copies to be equal to the number N of tenants needing the local data minus the number of second tenants;
the dividing unit is used for dividing the service deployment nodes requested by each tenant needing the local data, other than the second tenants, into one logic group each to obtain m logic groups, wherein m is equal to n;
and the setting unit is used for configuring one copy stored in each logic group, wherein any service deployment node in the logic group may serve as the deployment node of the copy.
Wherein the apparatus further comprises:
and the deployment unit is used for deploying the remaining 3-n copies on any nodes among all the service deployment nodes requested by the tenants needing the local data if the number n of copies is less than 3.
Wherein the apparatus further comprises:
and the sending module is used for correspondingly sending the target copy distribution strategy and the resource scheduling service strategy to the plurality of tenants.
The technical scheme of the embodiment of the invention has the following beneficial effects:
in the scheme of the embodiments of the invention, after each tenant puts forward a resource scheduling service request, a target copy distribution strategy is set, and copies are deployed, for the tenants needing local data according to whether each tenant needs local data, so that data sharing is achieved without being constrained by the Yarn bottleneck; meanwhile, different resource scheduling service strategies are deployed for each tenant to achieve resource isolation. Platform-level tenant isolation is thus achieved with the existing Hadoop technology without relying on IaaS-platform capability, so the existing Hadoop investment is fully utilized and the IaaS-platform investment budget is not increased.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1, an embodiment of the present invention provides a multi-tenant isolation method, where the method includes:
step 11, receiving resource scheduling service requests of a plurality of tenants; the resource scheduling service request at least carries a service requested by a tenant and a service deployment node requested by the tenant.
In this step, the resource scheduling service request may be initiated after the user logs in the multi-tenant manager, or may be initiated by the user through some signaling or information, which is not specifically limited herein.
The services requested by the tenant include, but are not limited to, the distributed computation framework MapReduce (MR), the distributed storage system Hbase, the big data query engine Hive, the in-memory iterative computation framework Spark, the big data query engine Impala, the resource coordinator Yarn (Yet Another Resource Negotiator), and the like.
Step 12, deploying, according to the resource scheduling service requests, n copies of the local data on the service deployment nodes requested by the tenants needing the local data, in accordance with a target copy distribution strategy; wherein n is an integer greater than or equal to 2.
In this step, the copy is used to ensure data security, and further ensure that each tenant requiring local data can access the local data, thereby implementing data sharing between different tenants. It should be noted that n copies need to be deployed on n service deployment nodes.
And step 13, respectively deploying resource scheduling service strategies corresponding to the services requested by the tenants on the service deployment nodes requested by the tenants according to the resource scheduling service requests.
In this step, a corresponding resource scheduling service policy is deployed for the service requested by each tenant, so that data isolation between different tenants is realized.
In summary, in the above embodiment of the present invention, after each tenant makes a resource scheduling service request, a target copy distribution policy is set, and copies are deployed, for the tenants needing local data according to whether each tenant needs local data, so that data sharing is achieved without being constrained by the Yarn bottleneck; meanwhile, different resource scheduling service strategies are deployed for each tenant to achieve resource isolation. Platform-level tenant isolation is thus achieved with the existing Hadoop technology without relying on IaaS-platform capability, so the existing Hadoop investment is fully utilized and the IaaS-platform investment budget is not increased.
Specifically, in the above embodiment of the present invention, the resource scheduling service request further carries indication information indicating whether the tenant needs the local data.
For example, after a plurality of tenants make resource scheduling service requests, the specific requirements of each tenant are summarized and stored in an array of ServiceRequest objects, where ServiceRequest is defined as follows:
class ServiceRequest {
// services required, including but not limited to MR, Hbase, Hive, Spark, Impala
String[] serviceName;
// indication information of whether the tenant needs local data
Boolean isLocalData;
// on which service deployment nodes the services need to be deployed, usually identified by hostname or IP
String[] nodes;
}
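For illustration, a self-contained sketch of this structure can be populated as follows (a minimal example; the class and field values here are hypothetical and merely mirror the definition above):

```java
class ServiceRequestDemo {
    // Mirrors the ServiceRequest structure carried by a resource scheduling request
    static class ServiceRequest {
        String[] serviceName; // requested services, e.g. Yarn, Hive
        Boolean isLocalData;  // whether the tenant needs local data
        String[] nodes;       // requested service deployment nodes (hostname or IP)
    }

    // Build a sample request for a tenant deploying Yarn and Hive on two nodes
    static ServiceRequest sample() {
        ServiceRequest r = new ServiceRequest();
        r.serviceName = new String[]{"Yarn", "Hive"};
        r.isLocalData = Boolean.TRUE;
        r.nodes = new String[]{"10.1.1.1", "10.1.1.2"};
        return r;
    }
}
```

In practice one such object would be filled in per tenant and collected into an array for the scheduler to process.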
Accordingly, step 12 comprises:
step 121, determining the number N of tenants requiring local data according to the indication information;
step 122, configuring a target copy distribution strategy according to the number N of the tenants needing the local data and the service deployment nodes requested by the tenants needing the local data; wherein the target replica distribution policy comprises: the number n of the copies and deployment nodes of the copies;
and step 123, respectively deploying the copies of the local data on the deployment nodes of the copies.
Preferably, step 122 in the above embodiment of the present invention includes:
If the service deployment nodes requested by a first tenant needing the local data comprise all the service deployment nodes requested by a second tenant needing the local data, the number of such second tenants is acquired; that is, the service deployment nodes requested by the second tenant are a subset of those requested by the first tenant. For example, the number of second tenants is determined by comparing the service deployment nodes requested by each tenant needing local data in ascending (or descending) order of node count.
For another example, if the service deployment nodes of the first tenant are nodes No. 1-5 and the service deployment nodes of the second tenant are nodes No. 1-3, then the service deployment nodes requested by the second tenant are a subset of the service deployment nodes requested by the first tenant.
For example, a specific algorithm for determining n is as follows:
1.1: set n to 1 (counting the tenant with the fewest requested nodes);
1.2: loop over the remaining tenants needing local data, in ascending order of the number of requested service deployment nodes;
1.2.1: if the service deployment nodes requested by the current tenant comprise all the service deployment nodes requested by any previous tenant, that is, the previous tenant's nodes are a subset of the current tenant's nodes, the current iteration ends and the loop continues at 1.2 with the next tenant;
1.3: otherwise, n++.
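The counting loop above can be sketched as follows. This is our illustrative reading, not the embodiment's literal code: the class and method names are invented, and a strict-subset test stands in for the "comprises all nodes" check, so each tenant whose node set is contained in another tenant's contributes no copy.

```java
import java.util.List;
import java.util.Set;

class ReplicaCount {
    // n = N - (number of "second tenants"), where a second tenant is one whose
    // requested node set is strictly contained in another tenant's node set.
    static int computeReplicaCount(List<Set<String>> nodeSets) {
        int n = 0;
        for (int i = 0; i < nodeSets.size(); i++) {
            boolean subsumed = false;
            for (int j = 0; j < nodeSets.size(); j++) {
                if (i != j
                        && nodeSets.get(j).size() > nodeSets.get(i).size()
                        && nodeSets.get(j).containsAll(nodeSets.get(i))) {
                    subsumed = true; // tenant i is a second tenant
                    break;
                }
            }
            if (!subsumed) {
                n++; // each non-subsumed tenant contributes one copy
            }
        }
        return n;
    }
}
```

With two disjoint node sets this returns 2; adding a third tenant whose nodes are a subset of the first still returns 2, matching the N minus second-tenant count described above.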
The number n of copies is configured to be equal to the number N of tenants needing the local data minus the number of second tenants; if the number of second tenants is 0, the number n of copies is equal to the number N of tenants needing the local data.
The service deployment nodes requested by each tenant needing the local data, other than the second tenants, are divided into one logic group each, so as to obtain m logic groups, wherein m is equal to n;
and one copy is configured in each logic group, wherein any service deployment node in the logic group may serve as the deployment node of the copy.
Specifically, n copies of each piece of local data are generated; all the service deployment nodes requested by tenants needing the local data are divided into m logic groups (m equal to n), each logic group contains all the service deployment nodes requested by the corresponding tenant, and at least one copy of the local data is placed in each logic group during deployment.
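The per-group placement can be sketched as follows (class and method names are ours; since any member of a group may host that group's copy, this sketch simply picks the first node the iterator yields):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class ReplicaPlacement {
    // One logic group per non-subsumed tenant; place one copy per group by
    // picking an arbitrary node (here, the first returned by the iterator).
    static List<String> placeReplicas(List<Set<String>> logicGroups) {
        List<String> deploymentNodes = new ArrayList<>();
        for (Set<String> group : logicGroups) {
            deploymentNodes.add(group.iterator().next());
        }
        return deploymentNodes;
    }
}
```

Each returned node belongs to its logic group, so every tenant's requested nodes hold at least one copy of the local data.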
It should be noted that, in order to ensure the data security of the local data, the number of copies needs to be greater than or equal to 3. However, if the number n of copies obtained by the above method is less than 3 (that is, the number N of tenants needing local data minus the number of second tenants is less than 3), the method includes:
deploying the remaining 3-n copies on any nodes among all the service deployment nodes requested by the tenants needing the local data; that is, the further 3-n copies are randomly distributed over any of those service deployment nodes.
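The padding step can be sketched like this (a minimal illustration with hypothetical names; the random choice mirrors the "any node" wording above):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class ReplicaPadding {
    // If fewer than 3 copies were placed, distribute the remaining 3 - n copies
    // at random over all service deployment nodes of tenants needing local data.
    static List<String> padToThree(List<String> placed, List<String> allNodes, Random rnd) {
        List<String> result = new ArrayList<>(placed);
        while (result.size() < 3) {
            result.add(allNodes.get(rnd.nextInt(allNodes.size())));
        }
        return result;
    }
}
```

Starting from two placed copies, one more node is drawn at random, bringing the total to the 3-copy minimum.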
Further, after step 13 in the above embodiment of the present invention, the method further includes:
and correspondingly sending the target copy distribution strategy and the resource scheduling service strategy to the plurality of tenants.
Sending the corresponding policies to each tenant enables the tenant to know which data can be shared and which resources are isolated.
In conclusion, the scheme utilizes the existing Hadoop technology and achieves platform-level tenant isolation without relying on IaaS-platform capability. The scheme fully utilizes the existing Hadoop investment and does not increase the IaaS-platform investment budget. Moreover, the scheme achieves platform-level resource isolation independently of the Yarn bottleneck. In addition, the scheme fully shares data, avoiding the situation in which the isolated data of a plurality of clusters cannot be shared.
As shown in fig. 2, an example of an application flow of the embodiment of the present invention is as follows.
Step 201, the user logs in.
Step 202, select whether local data is needed.
Step 203, select the required service.
Step 204, specify the specific service deployment nodes according to the selected service deployment configuration type.
Step 205, the application is successful, and the user obtains the access entry address URL of the cluster, the client and the policy list of various services.
The overall flow of the embodiment of the present invention is exemplified as follows.
Assume three tenants respectively representing the B, O and M domains share one Hadoop cluster. To achieve data sharing while strictly isolating the components used by each tenant, the requirements of the three tenants are first summarized, according to the flow described herein, as follows:
// The Bss tenant deploys Yarn, Hive, Impala and Spark, needs local data, and its resources are strictly limited to nodes No. 1-5
BssServiceRequest {
String[] serviceName = {"Yarn", "Hive", "Impala", "Spark"};
Boolean isLocalData = true;
String[] nodes = {"10.1.1.1", "10.1.1.2", "10.1.1.3", "10.1.1.4", "10.1.1.5"}; }
// The Oss tenant additionally deploys Hbase compared with Bss, needs local data, and its resources are strictly limited to nodes No. 6-11
OssServiceRequest {
String[] serviceName = {"Yarn", "Hive", "Impala", "Spark", "Hbase"};
Boolean isLocalData = true;
String[] nodes = {"10.1.1.6", "10.1.1.7", "10.1.1.8", "10.1.1.9", "10.1.1.10", "10.1.1.11"}; }
// The Mss tenant deploys only Yarn and Hive, needs no local data, and its resources are strictly limited to nodes No. 1-3
MssServiceRequest {
String[] serviceName = {"Yarn", "Hive"};
Boolean isLocalData = false;
String[] nodes = {"10.1.1.1", "10.1.1.2", "10.1.1.3"}; }
First, according to the method provided by the embodiment of the invention, it is determined that only the Bss tenant and the Oss tenant need local data. The service deployment nodes requested by the Oss tenant and those requested by the Bss tenant are not subsets of each other, so the number n of copies is 2 and the number m of logic groups is also 2: one logic group comprises nodes No. 1-5 and the other comprises nodes No. 6-11. The first copy is deployed on any one of nodes No. 1-5 and the second copy on any one of nodes No. 6-11. Since the number of copies needs to be greater than or equal to 3, the 3-copy strategy is adopted when n is less than 3; specifically, the third copy is deployed on any one of nodes No. 1-11, chosen at random, which also tends to distribute the third copies uniformly.
Secondly, respectively deploying the services required by each tenant according to the node requirements provided by each tenant.
After the deployment is completed, the data of the three tenants can be shared, and meanwhile, the components are not influenced by each other and are independent of each other. Meanwhile, each tenant still has a Yarn mechanism, and resources can be allocated to each user inside the tenant.
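The first scenario can be replayed end to end in a short sketch (class and helper names are ours; because neither tenant's node set contains the other's, the second-tenant elimination step is skipped, and placement plus 3-copy padding follow our reading of the steps above):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Set;

class ScenarioOne {
    // Full pipeline for tenants whose node sets are not subsets of each other:
    // one copy per logic group, then random padding up to the 3-copy minimum.
    static List<String> deployCopies(List<Set<String>> tenantNodes, Random rnd) {
        List<String> placed = new ArrayList<>();
        List<String> allNodes = new ArrayList<>();
        for (Set<String> group : tenantNodes) {
            placed.add(group.iterator().next()); // one copy per logic group
            allNodes.addAll(group);
        }
        while (placed.size() < 3) { // pad to the 3-copy minimum
            placed.add(allNodes.get(rnd.nextInt(allNodes.size())));
        }
        return placed;
    }
}
```

For the Bss and Oss node sets this yields three copies: one inside each logic group and a third drawn at random from all eleven nodes.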
As a second example, assume again that three tenants respectively representing the B, O and M domains share one Hadoop cluster. To achieve data sharing while strictly isolating the components used by each tenant, the requirements of the three tenants are first summarized, according to the flow described herein, as follows:
// The Bss tenant deploys Yarn, Hive, Impala and Spark, needs local data, and its resources are strictly limited to nodes No. 1-5
BssServiceRequest {
String[] serviceName = {"Yarn", "Hive", "Impala", "Spark"};
Boolean isLocalData = true;
String[] nodes = {"10.1.1.1", "10.1.1.2", "10.1.1.3", "10.1.1.4", "10.1.1.5"}; }
// The Oss tenant additionally deploys Hbase compared with Bss, needs local data, and its resources are strictly limited to nodes No. 6-11
OssServiceRequest {
String[] serviceName = {"Yarn", "Hive", "Impala", "Spark", "Hbase"};
Boolean isLocalData = true;
String[] nodes = {"10.1.1.6", "10.1.1.7", "10.1.1.8", "10.1.1.9", "10.1.1.10", "10.1.1.11"}; }
// The Mss tenant deploys only Yarn and Hive, needs local data, and its resources are strictly limited to nodes No. 1-3
MssServiceRequest {
String[] serviceName = {"Yarn", "Hive"};
Boolean isLocalData = true;
String[] nodes = {"10.1.1.1", "10.1.1.2", "10.1.1.3"}; }
First, according to the method provided by the embodiment of the invention, it is determined that the Bss, Oss and Mss tenants all need local data; the service deployment nodes requested by the Mss tenant are a subset of those requested by the Bss tenant, so the number of second tenants is 1. The number n of copies is the number N of tenants needing local data minus the number of second tenants, that is, 3 - 1 = 2. The number m of logic groups is also 2: one logic group comprises nodes No. 1-5 and the other comprises nodes No. 6-11. The first copy is deployed on any one of nodes No. 1-5 and the second copy on any one of nodes No. 6-11. Since the number of copies needs to be greater than or equal to 3, the 3-copy strategy is adopted when n is less than 3; specifically, the third copy is deployed on any one of nodes No. 1-11, chosen at random, which also tends to distribute the third copies uniformly.
Secondly, respectively deploying the services required by each tenant according to the node requirements provided by each tenant.
After the deployment is completed, the data of the three tenants can be shared, and meanwhile, the components are not influenced by each other and are independent of each other. Meanwhile, each tenant still has a Yarn mechanism, and resources can be allocated to each user inside the tenant.
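The subset relation that makes Mss a second tenant in this scenario can be checked with a small sketch (class and method names are ours):

```java
import java.util.Set;

class SubsetCheck {
    // A tenant is a "second tenant" relative to another if the other tenant's
    // requested nodes comprise all of its requested nodes.
    static boolean isSecondTenant(Set<String> candidate, Set<String> other) {
        return other.size() > candidate.size() && other.containsAll(candidate);
    }
}
```

Applied to the requests above, Mss is a second tenant relative to Bss, while Bss and Oss are not second tenants relative to each other, so n = 3 - 1 = 2.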
As shown in fig. 3, an embodiment of the present invention further provides a multi-tenant isolation apparatus, where the apparatus includes:
a request receiving module 31, configured to receive resource scheduling service requests of multiple tenants; the resource scheduling service request at least carries a service requested by a tenant and a service deployment node requested by the tenant;
the replica deploying module 32 is configured to deploy, according to the resource scheduling service request, n replicas of the local data on a service deployment node of a tenant request that needs the local data according to a target replica distribution policy; wherein n is an integer greater than or equal to 2;
and the scheduling deployment module 33 is configured to deploy, according to the resource scheduling service request, a resource scheduling service policy corresponding to the service requested by the tenant on the service deployment node requested by each tenant.
Specifically, in the above embodiment of the present invention, the resource scheduling service request further carries indication information indicating whether the tenant needs local data;
the replica deployment module comprises:
the first determining submodule is used for determining the number N of tenants needing local data according to the indication information;
the policy configuration submodule is used for configuring a target copy distribution policy according to the number N of the tenants needing local data and the service deployment node requested by the tenants needing local data; wherein the target replica distribution policy comprises: the number n of the copies and deployment nodes of the copies;
and the copy deployment submodule is used for respectively deploying the copies of the local data on the deployment nodes of the copies.
Specifically, in the foregoing embodiment of the present invention, the policy configuration sub-module includes:
the acquiring unit is used for acquiring the number of second tenants if the service deployment nodes requested by a first tenant needing the local data comprise all the service deployment nodes requested by a second tenant needing the local data;
the configuration unit is used for configuring the number n of copies to be equal to the number N of tenants needing the local data minus the number of second tenants;
the dividing unit is used for dividing the service deployment nodes requested by each tenant needing the local data, other than the second tenants, into one logic group each to obtain m logic groups, wherein m is equal to n;
and the setting unit is used for configuring one copy stored in each logic group, wherein any service deployment node in the logic group may serve as the deployment node of the copy.
Specifically, in the above embodiment of the present invention, the apparatus further includes:
and the deployment unit is used for deploying the remaining 3-n copies on any nodes among all the service deployment nodes requested by the tenants needing the local data if the number n of copies is less than 3.
Specifically, in the above embodiment of the present invention, the apparatus further includes:
and the sending module is used for correspondingly sending the target copy distribution strategy and the resource scheduling service strategy to the plurality of tenants.
In the scheme of the embodiments of the invention, after each tenant puts forward a resource scheduling service request, a target copy distribution strategy is set, and copies are deployed, for the tenants needing local data according to whether each tenant needs local data, so that data sharing is achieved without being constrained by the Yarn bottleneck; meanwhile, different resource scheduling service strategies are deployed for each tenant to achieve resource isolation. Platform-level tenant isolation is thus achieved with the existing Hadoop technology without relying on IaaS-platform capability, so the existing Hadoop investment is fully utilized and the IaaS-platform investment budget is not increased.
It should be noted that the multi-tenant isolation apparatus provided in the embodiment of the present invention is an apparatus capable of executing the above multi-tenant isolation method, and all embodiments of the multi-tenant isolation method are applicable to the apparatus and can achieve the same or similar technical effects.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.