CN113836143A

CN113836143A - Index creation method and device

Info

Publication number: CN113836143A
Application number: CN202111143947.4A
Authority: CN
Inventors: 李长青; 王佳佳
Original assignee: New H3C Big Data Technologies Co Ltd
Current assignee: New H3C Big Data Technologies Co Ltd
Priority date: 2021-09-28
Filing date: 2021-09-28
Publication date: 2021-12-24
Anticipated expiration: 2041-09-28
Also published as: CN113836143B

Abstract

The application provides an index creation method and apparatus. In the method, different data directories are configured for different users through permission policies, and the data directories belonging to different users are mounted on different disks, so each user's data is ultimately stored on the disks mounted at the data directories allocated to that user. The permission policy configuration therefore defines physical storage isolation between users. When a user's request to create an index is received, the request is matched against the configured permission policies to verify both that the user is permitted to create the index and that the data directories specified by the user are permitted. Only a request that passes this verification is executed; that is, only a legitimate user may create an index on a legitimate data directory. As a result, the service data of different users is physically isolated, and a single disk failure does not affect the services of multiple users.

Description

Index creation method and device

Technical Field

The present application relates to the field of computer technologies, and in particular, to an index creation method and apparatus.

Background

Elasticsearch is an open-source distributed search engine that is scalable and capable of real-time search and data analysis. It is currently a mainstream enterprise-level search engine and provides storage management and full-text retrieval for massive data.

An Elasticsearch cluster consists of at least one Elasticsearch node (hereinafter referred to as a node), and each node includes at least one disk for storing data. When storing data, nodes may store related data in the same index according to service requirements. An index is a logical namespace that points to one or more physical shards; in other words, one index may be divided into multiple shards, and the data is ultimately stored on disk in those shards.

In practical use, it is found that, for a cluster supporting multi-user use, when a certain disk fails, the service of multiple users is often affected.

Disclosure of Invention

In view of this, the present application provides an index creating method and apparatus, so as to avoid the influence of a disk failure on multiple user services as much as possible.

In order to achieve the purpose of the application, the application provides the following technical scheme:

In a first aspect, the present application provides an index creation method applied to a node in an Elasticsearch cluster, where the Elasticsearch cluster includes at least one node, each node includes at least one disk, and the Elasticsearch cluster supports multi-user usage. The method includes:

receiving an index creation request sent by a target user, wherein the index creation request comprises an identifier of the target user, an index name of a target index to be created by the target user, and at least one first data directory specified by the target user for storing the shards corresponding to the target index;

searching at least one configured permission policy for a target permission policy containing the index name of the target index, wherein a permission policy grants a user permission to create an index under at least one second data directory configured for that user, the second data directories corresponding to different users are different, and the second data directories belonging to different users are mounted on different disks;

and if the user identifier included in the target permission policy is the same as the identifier of the target user, and the at least one second data directory included in the target permission policy includes the at least one first data directory specified by the target user, allocating the shards corresponding to the target index to the at least one first data directory.

Optionally, allocating the shards corresponding to the target index to the at least one first data directory includes:

performing the following processing for each to-be-allocated shard corresponding to the target index:

estimating the size of the current shard;

selecting, from the at least one first data directory, at least one third data directory whose remaining capacity is not smaller than the estimated size of the current shard;

counting the number of existing shards belonging to the target index in each third data directory;

selecting, from the at least one third data directory, a fourth data directory with the smallest counted number of shards;

allocating the current shard to the fourth data directory.

Optionally, allocating the current shard to the fourth data directory includes:

if there are a plurality of fourth data directories, counting the total number of existing shards in each fourth data directory;

selecting, from the plurality of fourth data directories, a fifth data directory with the smallest counted total number;

allocating the current shard to the fifth data directory.

Optionally, the method further includes:

if no third data directory exists in the at least one first data directory, selecting the data directory with the largest remaining capacity from the at least one first data directory;

and allocating the current shard to the data directory with the largest remaining capacity.

Optionally, estimating the size of the current shard includes:

acquiring the total remaining capacity of the at least one first data directory;

acquiring a target capacity equal to a preset percentage of the total remaining capacity;

taking the quotient of the total size of all existing shards in the current Elasticsearch cluster and the total number of all existing shards as the average size of the shards corresponding to the target index;

and selecting the larger of the target capacity and the average size as the estimated size of the current shard.

In a second aspect, the present application provides an index creation apparatus applied to a node in an Elasticsearch cluster, where the Elasticsearch cluster includes at least one node, each node includes at least one disk, and the Elasticsearch cluster supports multi-user usage. The apparatus includes:

a receiving unit, configured to receive an index creation request sent by a target user, where the index creation request includes an identifier of the target user, an index name of a target index to be created by the target user, and at least one first data directory specified by the target user for storing the shards corresponding to the target index;

a searching unit, configured to search at least one configured permission policy for a target permission policy containing the index name of the target index, where a permission policy grants a user permission to create an index under at least one second data directory configured for that user, the second data directories corresponding to different users are different, and the second data directories belonging to different users are mounted on different disks;

and an allocating unit, configured to allocate the shards corresponding to the target index to the at least one first data directory if the user identifier included in the target permission policy is the same as the identifier of the target user and the at least one second data directory included in the target permission policy includes the at least one first data directory specified by the target user.

Optionally, the allocating unit allocating the shards corresponding to the target index to the at least one first data directory includes:

estimating the size of the current shard;

allocating the current shard to the fourth data directory.

Optionally, the allocating unit allocating the current shard to the fourth data directory includes:

allocating the current shard to the fifth data directory.

Optionally, the allocating unit is further configured to select, if no third data directory exists in the at least one first data directory, the data directory with the largest remaining capacity from the at least one first data directory, and to allocate the current shard to the data directory with the largest remaining capacity.

Optionally, the allocating unit estimating the size of the current shard includes:

As can be seen from the above description, in the embodiments of the present application, different data directories are configured for different users through permission policies, and the data directories belonging to different users are mounted on different disks, so each user's data is ultimately stored on the disks mounted at the data directories allocated to that user. The permission policy configuration therefore defines physical storage isolation between users. When a user's request to create an index is received, the request is matched against the configured permission policies to verify both that the user is permitted to create the index and that the data directories specified by the user are permitted. Only a request that passes this verification is executed; that is, only a legitimate user may create an index on a legitimate data directory. The service data of different users is thus physically isolated, and a single disk failure does not affect the services of multiple users.

Drawings

To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on them without creative effort.

FIG. 1 is a flow chart of an index creation method according to an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of a rights policy configuration page according to an embodiment of the present application;

FIG. 3 is a flowchart illustrating an implementation of step 103 according to an embodiment of the present application;

FIG. 4 is a flowchart illustrating an implementation of step 305 according to an embodiment of the present application;

FIG. 5 is a flowchart illustrating an implementation of step 301 according to an embodiment of the present application;

FIG. 6 is a schematic diagram of an index creating apparatus according to an embodiment of the present application.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application.

The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the embodiments of the present application. As used in the embodiments of the present application, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present application to describe various information, the information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the embodiments of the present application. The word "if" as used herein may be interpreted as "upon", "when", or "in response to a determination", depending on the context.

FIG. 1 is a flowchart of an index creation method according to an embodiment of the present application. The method is applied to a node in an Elasticsearch cluster.

The Elasticsearch cluster may comprise at least one node, each node comprising at least one disk. The Elasticsearch cluster may provide storage management and retrieval services for multiple users.

As shown in fig. 1, the process may include the following steps:

Step 101, receiving an index creation request sent by a target user.

Here, the user who sends the index creation request is referred to as a target user. It is to be understood that the term target user is used only for ease of distinction and is not intended to be limiting.

The index creation request includes an identifier of the target user, an index name of a target index to be created by the target user, and at least one first data directory specified by the target user for storing the shards corresponding to the target index.

Here, the index to be created by the target user is referred to as a target index. It is to be understood that the term target index is used only for ease of distinction and is not intended to be limiting.

The data directory specified by the target user for storing the shards corresponding to the target index is referred to as a first data directory. It is to be understood that the term first data directory is used only for ease of distinction and is not intended to be limiting.

Step 102, searching a target authority policy containing the index name of the target index from at least one configured authority policy.

In the embodiment of the present application, an administrator can configure permission policies through a permission management platform. As one embodiment, the permission management platform may be Ranger.

Ranger is a unified framework that integrates monitoring and permission management. An administrator can configure permission policies through the web user interface (Web UI) pages provided by Ranger.

FIG. 2 is a schematic diagram of a permission policy configuration page according to an embodiment of the present application. A permission policy configured through the page includes, but is not limited to: a policy name (e.g., P1), the index name of the index to which the policy applies (e.g., Index1), the user name of the user who has operation permission on the index (e.g., User1), the data directories configured for the user to store the shards corresponding to the index (e.g., /path1, /path2, /path3, /path4), and the operations that may be performed on the index (e.g., create).

In the embodiment of the present application, configuring a permission policy grants a user permission to create an index, specifically permission to create the index under the data directories configured for that user. For example, the permission policy shown in FIG. 2 grants User1 permission to create Index1 under /path1, /path2, /path3, and /path4.

In the embodiment of the present application, the data directory configured for the user in the permission policy is referred to as a second data directory. It is to be understood that the term second data directory is used only for ease of distinction and is not intended to be limiting.

In addition, in the embodiment of the present application, the second data directories configured for different users are different, and the disks mounted in the second data directories belonging to different users are different.

For example, the second data directories configured for User1 are /path1, /path2, /path3, and /path4, and the disks mounted at these directories are Disk1, Disk2, Disk3, and Disk4, respectively; the second data directories configured for User2 are /path5 and /path6, and the disks mounted at these directories are Disk5 and Disk6, respectively.

That is, when configuring the permission policy, it is defined to physically isolate the storage spaces corresponding to different users.
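The isolation constraint described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the `PermissionPolicy` class, the `users_are_isolated` function, the policy name `P2`, and the index name `Index2` are hypothetical names introduced only for illustration, and the directory-to-disk mapping is assumed to be available as a plain dictionary.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PermissionPolicy:
    """Hypothetical in-memory form of one permission policy row from FIG. 2."""
    name: str
    index_name: str
    user: str
    data_dirs: Tuple[str, ...]            # the "second data directories"
    operations: Tuple[str, ...] = ("create",)


def users_are_isolated(policies: List[PermissionPolicy],
                       mounts: Dict[str, str]) -> bool:
    """Return True if no data directory and no disk is shared by two users."""
    dir_owner: Dict[str, str] = {}
    disk_owner: Dict[str, str] = {}
    for policy in policies:
        for data_dir in policy.data_dirs:
            disk = mounts[data_dir]
            # setdefault records the first owner and returns the existing one.
            if dir_owner.setdefault(data_dir, policy.user) != policy.user:
                return False
            if disk_owner.setdefault(disk, policy.user) != policy.user:
                return False
    return True


# Values taken from the example in the text; P2 and Index2 are assumed names.
P1 = PermissionPolicy("P1", "Index1", "User1",
                      ("/path1", "/path2", "/path3", "/path4"))
P2 = PermissionPolicy("P2", "Index2", "User2", ("/path5", "/path6"))
MOUNTS = {"/path1": "Disk1", "/path2": "Disk2", "/path3": "Disk3",
          "/path4": "Disk4", "/path5": "Disk5", "/path6": "Disk6"}
```

With the example configuration, `users_are_isolated([P1, P2], MOUNTS)` holds; if /path5 were mounted on Disk1 instead, the check would fail because Disk1 already belongs to User1.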

By running a Ranger plug-in, the nodes of the Elasticsearch cluster can periodically pull the configured permission policies from the Ranger permission management platform. When a node obtains, in step 101, the index name of the target index to be created by the target user, it can query the pulled permission policies by that index name for the permission policy containing it, that is, the permission policy configured for the target index.

For example, if the Index name of the target Index obtained in step 101 is Index1, the authority policy P1 including Index1 can be queried in this step.

In the embodiment of the application, the authority policy containing the index name of the target index is called a target authority policy. It is to be understood that the reference to target permission policy is merely a nomenclature for ease of distinction and is not intended to be limiting.

Step 103, if the user identifier included in the target permission policy is the same as the identifier of the target user, and the at least one second data directory included in the target permission policy includes the at least one first data directory specified by the target user, allocating the shards corresponding to the target index to the at least one first data directory.

Here, the user identifier included in the target permission policy being the same as the identifier of the target user indicates that the target user is a legitimate user with permission to create the target index. The at least one second data directory included in the target permission policy including the at least one first data directory specified by the target user indicates that the first data directories specified by the user for storing the shards of the target index fall within the range of legitimate data directories (second data directories) configured by the administrator.

Once this step has determined that the user identity and the specified data directories are both legitimate, that is, once the request has passed permission verification, the target index is allowed to be created under the first data directories specified by the user; in other words, the shards corresponding to the target index may be allocated under the first data directories.

For example, User1 requests to create Index1 and specifies /path1, /path2, and /path3 as the data directories for storing the shards of Index1. Matching the request against permission policy P1, which corresponds to Index1, shows that User1 is a legitimate user and that /path1, /path2, and /path3 are permitted data directories, so Index1 can be created under /path1, /path2, and /path3.

The process of allocating the shards corresponding to the target index to the first data directories is described in detail below and is not elaborated here.

Thus, the flow shown in fig. 1 is completed.
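Steps 102 and 103 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: policies are modeled as plain dictionaries, the function names `find_target_policy` and `may_create_index` are hypothetical, and the directory condition is interpreted as requiring every directory in the request to be among the directories configured in the policy.

```python
from typing import Dict, List, Optional

# Hypothetical policy shape: {"name", "index", "user", "data_dirs"}.
Policy = Dict[str, object]


def find_target_policy(policies: List[Policy], index_name: str) -> Optional[Policy]:
    """Step 102: return the policy configured for index_name, if any."""
    for policy in policies:
        if policy["index"] == index_name:
            return policy
    return None


def may_create_index(policies: List[Policy], user: str, index_name: str,
                     requested_dirs: List[str]) -> bool:
    """Step 103 precondition: the user and the requested directories are both permitted."""
    policy = find_target_policy(policies, index_name)
    if policy is None:
        return False                      # no policy covers this index
    if policy["user"] != user:
        return False                      # requester is not the authorized user
    allowed = set(policy["data_dirs"])    # second data directories
    # Assumption: all first data directories must lie inside the allowed set.
    return len(requested_dirs) > 0 and set(requested_dirs) <= allowed


POLICIES: List[Policy] = [
    {"name": "P1", "index": "Index1", "user": "User1",
     "data_dirs": ["/path1", "/path2", "/path3", "/path4"]},
]
```

For the example in the text, `may_create_index(POLICIES, "User1", "Index1", ["/path1", "/path2", "/path3"])` passes, while the same request from another user, or a request naming a directory outside P1, is rejected.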

As can be seen from the flow shown in FIG. 1, in the embodiment of the present application, different data directories are configured for different users through permission policies, and the data directories belonging to different users are mounted on different disks, so each user's data is ultimately stored on the disks mounted at the data directories allocated to that user. The permission policy configuration therefore defines physical storage isolation between users. When a user's request to create an index is received, the request is matched against the configured permission policies to verify both that the user is permitted to create the index and that the data directories specified by the user are permitted. Only a request that passes this verification is executed; that is, only a legitimate user may create an index on a legitimate data directory. The service data of different users is thus physically isolated, and a single disk failure does not affect the services of multiple users.

The following describes the process of allocating the shards corresponding to the target index to the at least one first data directory in step 103. Specifically, the flow shown in FIG. 3 may be executed in turn for each to-be-allocated shard corresponding to the target index. As shown in FIG. 3, the process may include the following steps:

Step 301, estimating the size of the current shard.

Because the actual size of a shard cannot be determined at the pre-allocation stage, the size of the to-be-allocated shard needs to be estimated, so that a data directory (disk) able to hold the shard can subsequently be determined according to the estimated size.

The process of estimating the shard size is described below and is not elaborated here.

Step 302, selecting, from the at least one first data directory, at least one third data directory whose remaining capacity is not less than the estimated size of the current shard.

Here, a third data directory is a first data directory whose remaining capacity is not smaller than the estimated size of the current shard. That is, the first data directories with enough remaining storage space to hold the current shard are found. It is to be understood that the term third data directory is used only for ease of distinction and is not intended to be limiting.

Still taking /path1, /path2, and /path3 specified by User1 as an example, where the remaining capacity of /path1 is 100 GB, the remaining capacity of /path2 is 200 GB, and the remaining capacity of /path3 is 18 GB: if the estimated size of the current shard is 20 GB, then /path1 and /path2 are determined to be third data directories whose remaining capacity is not less than the size of the current shard.

Step 303, for each third data directory, counting the number of existing shards in the third data directory that belong to the target index.

After the third data directories with sufficient remaining storage space are selected in step 302, the number of shards already existing (already allocated) in each third data directory that belong to the same index (the target index) as the current shard is counted.

Still taking Index1 as an example, after /path1 and /path2 are selected in step 302, the numbers of existing shards belonging to Index1 under /path1 and under /path2 are counted, respectively.

Step 304, selecting, from the at least one third data directory, a fourth data directory with the smallest counted number of shards belonging to the target index.

Here, a fourth data directory is the third data directory that currently contains the smallest number of shards belonging to the target index. That is, the data directory currently storing the fewest shards of the target index is selected from the third data directories. It is to be understood that the term fourth data directory is used only for ease of distinction and is not intended to be limiting.

Step 305, allocating the current shard to the fourth data directory.

Still taking the case where the current shard belongs to Index1 as an example, if step 303 counts 1 existing shard belonging to Index1 under /path1 and 2 existing shards belonging to Index1 under /path2, the current shard is allocated to /path1, which holds the fewest existing shards of Index1.

In the embodiment of the present application, the shards corresponding to the same index are distributed evenly across the data directories, which can effectively improve the efficiency of subsequently retrieving the data of the index and avoids, as far as possible, hot data concentrating in a particular data directory.

The flow shown in FIG. 3 is thus completed.

As can be seen from the flow shown in FIG. 3, the embodiment of the present application considers both the storage space of the data directories and the existing allocation of the same index's shards in each data directory, and allocates the shards of the index in a balanced manner, so that hot data is avoided as far as possible.
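Steps 302 to 304 can be sketched as a short Python function. This is an illustrative sketch only: the function name `select_fourth_directories` and the dictionary inputs are hypothetical, capacities are assumed to be expressed in GB, and a directory missing from the per-index count is assumed to hold zero shards of the target index.

```python
from typing import Dict, List


def select_fourth_directories(first_dirs: List[str],
                              free_gb: Dict[str, float],
                              index_shard_count: Dict[str, int],
                              estimated_gb: float) -> List[str]:
    """Return the candidate directories holding the fewest shards of the target index."""
    # Step 302: keep directories with enough remaining capacity ("third" directories).
    third = [d for d in first_dirs if free_gb[d] >= estimated_gb]
    if not third:
        return []                         # caller must fall back to another rule
    # Step 303: count existing shards of the target index in each third directory.
    # Step 304: keep the directories with the smallest count ("fourth" directories).
    fewest = min(index_shard_count.get(d, 0) for d in third)
    return [d for d in third if index_shard_count.get(d, 0) == fewest]


# Values from the worked example in the text.
FIRST_DIRS = ["/path1", "/path2", "/path3"]
FREE_GB = {"/path1": 100, "/path2": 200, "/path3": 18}
INDEX1_SHARDS = {"/path1": 1, "/path2": 2}
```

With an estimated shard size of 20 GB, /path3 is filtered out in step 302 and `select_fourth_directories(FIRST_DIRS, FREE_GB, INDEX1_SHARDS, 20)` yields only /path1. The function returns a list because several directories may tie on the per-index count.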

The process of allocating the current shard to the fourth data directory in step 305 is described below. FIG. 4 shows a flow for implementing step 305 according to an embodiment of the present application.

As shown in FIG. 4, the process may include the following steps:

Step 401, if there are multiple fourth data directories, for each fourth data directory, counting the total number of existing shards in the fourth data directory.

When multiple fourth data directories are selected in step 304, one data directory needs to be chosen from them to receive the current shard.

As one embodiment, one fourth data directory may be selected from the multiple fourth data directories, and the current shard is allocated to the selected fourth data directory.

As another embodiment, the total number of existing shards in each fourth data directory may be further counted.

Still taking the case where the current shard belongs to Index1 as an example, if step 303 finds that /path1 and /path2 hold the same number of existing shards belonging to Index1, for example 2 shards each, the number of all existing shards under /path1 and the number of all existing shards under /path2 are then counted.

Step 402, selecting, from the multiple fourth data directories, a fifth data directory with the smallest counted total number of existing shards.

Here, a fifth data directory is the fourth data directory with the smallest total number of existing shards. That is, the data directory holding the fewest existing shards is selected from the fourth data directories. It is to be understood that the term fifth data directory is used only for ease of distinction and is not intended to be limiting.

Step 403, allocating the current shard to the fifth data directory.

For example, if step 401 counts a total of 20 existing shards under /path1 and a total of 16 existing shards under /path2, the current shard is allocated to /path2, which holds the fewest existing shards.

The flow shown in FIG. 4 is thus completed.

As can be seen from the flow shown in FIG. 4, in the embodiment of the present application, when the already allocated shards of the current index are evenly distributed across the data directories (the fourth data directories), the overall load of each data directory is further considered, and the data directory with the lighter load (the smaller total number of shards) is selected for allocation. In other words, the number of shards in each data directory is balanced as far as possible, which improves cluster retrieval efficiency.

In addition, it should be noted that when no data directory satisfying the conditions is found through the flows of FIG. 3 and FIG. 4 (that is, no third data directory exists in the at least one first data directory), the data directory with the largest remaining capacity can be selected directly from the first data directories specified by the user, and the current shard is allocated to that data directory, so that the storage space is used reasonably.
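The tie-break of FIG. 4 and the fallback rule can be sketched as two small Python helpers. The names `break_tie` and `fallback_directory` are hypothetical, and one behavior is an assumption not stated in the text: when several directories still tie, Python's `min` and `max` return the first one in list order.

```python
from typing import Dict, List


def break_tie(fourth_dirs: List[str], total_shard_count: Dict[str, int]) -> str:
    """Steps 401 to 403: among tied directories, pick the one with the fewest shards overall."""
    # Assumption: a directory missing from the mapping holds no shards.
    return min(fourth_dirs, key=lambda d: total_shard_count.get(d, 0))


def fallback_directory(first_dirs: List[str], free_gb: Dict[str, float]) -> str:
    """Fallback: no directory can hold the estimated shard, so take the one with most free space."""
    return max(first_dirs, key=lambda d: free_gb[d])


# Values from the worked examples in the text.
TOTAL_SHARDS = {"/path1": 20, "/path2": 16}
FREE_GB = {"/path1": 100, "/path2": 200, "/path3": 18}
```

For the example totals, `break_tie(["/path1", "/path2"], TOTAL_SHARDS)` selects /path2, and `fallback_directory(["/path1", "/path2", "/path3"], FREE_GB)` selects /path2 as the directory with the largest remaining capacity.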

The process of estimating the size of the current shard in step 301 is described below. FIG. 5 shows a flow for implementing step 301 according to an embodiment of the present application.

As shown in FIG. 5, the process may include the following steps:

Step 501, obtaining the total remaining capacity of the at least one first data directory.

That is, the total remaining capacity of all the first data directories specified by the user for storing the shards of the target index is computed.

Still taking /path1, /path2, and /path3 specified by User1 for Index1 as an example, where the remaining capacity of /path1 is 100 GB, the remaining capacity of /path2 is 200 GB, and the remaining capacity of /path3 is 18 GB, the current total remaining capacity is 318 GB.

Step 502, obtaining a target capacity equal to a preset percentage of the total remaining capacity.

For example, if the preset percentage is 5%, the target capacity currently obtained is 318 GB × 5% = 15.9 GB.

Here, the capacity equal to a preset percentage of the total remaining capacity is referred to as the target capacity. It is to be understood that the term target capacity is used only for ease of distinction and is not intended to be limiting.

Step 503, taking the quotient of the total size of all existing shards in the current Elasticsearch cluster and the total number of all existing shards as the average size of the shards corresponding to the target index.

For example, if the total size of all existing shards in the current Elasticsearch cluster is 2000 GB and the total number of all existing shards is 100, the average shard size determined by this step is 2000 GB / 100 = 20 GB.

Step 504, selecting the larger of the target capacity and the average size as the estimated size of the current shard.

Still taking a target capacity of 15.9 GB and an average shard size of 20 GB as an example, 20 GB is selected as the estimated size of the current shard in this step.

The flow shown in FIG. 5 is thus completed, and with it the estimation of the shard size.
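The estimation in steps 501 to 504 reduces to a few lines of arithmetic, sketched below in Python. The function name `estimate_shard_size_gb` and its parameters are hypothetical; treating the average as zero when the cluster holds no shards yet is an assumption added only to avoid division by zero, since the text does not cover that case.

```python
from typing import Dict


def estimate_shard_size_gb(free_gb: Dict[str, float],
                           preset_percent: float,
                           cluster_shard_total_gb: float,
                           cluster_shard_count: int) -> float:
    """Estimate the size of a to-be-allocated shard, in GB."""
    # Step 501: total remaining capacity of the first data directories.
    remaining_total = sum(free_gb.values())
    # Step 502: target capacity as a preset percentage of that total.
    target_capacity = remaining_total * preset_percent / 100
    # Step 503: average size of all existing shards in the cluster.
    # Assumption: with no existing shards, the average is taken as 0.
    if cluster_shard_count > 0:
        average = cluster_shard_total_gb / cluster_shard_count
    else:
        average = 0.0
    # Step 504: the larger of the two values is the estimate.
    return max(target_capacity, average)


# Values from the worked example in the text.
FREE_GB = {"/path1": 100, "/path2": 200, "/path3": 18}
```

For the example values (318 GB remaining, 5%, 2000 GB across 100 shards), the target capacity is 15.9 GB and the average is 20 GB, so the estimate is 20 GB.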

The method provided by the embodiment of the present application is described above, and the apparatus provided by the embodiment of the present application is described below:

referring to fig. 6, an index creating apparatus shown for the embodiment of the present application is applied to nodes included in an Elasticsearch cluster, where the Elasticsearch cluster includes at least one node, each node includes at least one disk, and the Elasticsearch cluster supports multi-user usage, and the apparatus includes: a receiving unit 601, a searching unit 602, and an allocating unit 603, wherein:

a receiving unit 601, configured to receive an index creation request sent by a target user, where the index creation request includes an identifier of the target user, an index name of a target index to be created by the target user, and at least one first data directory specified by the target user and used for storing a segment corresponding to the target index;

a searching unit 602, configured to search, from at least one configured permission policy, a target permission policy that includes an index name of the target index, where the permission policy is used to grant a user permission to create an index under at least one second data directory configured for the user, where second data directories corresponding to each user are different, and disks mounted in the second data directories belonging to different users are different;

an allocating unit 603, configured to allocate the shards corresponding to the target index to the at least one first data directory if the user identifier included in the target permission policy is the same as the identifier of the target user and the at least one second data directory included in the target permission policy includes the at least one first data directory specified by the target user.
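The check performed jointly by the searching unit and the allocating unit can be sketched in Python as below. This is a minimal sketch under stated assumptions: the class names, field names, and directory paths are hypothetical; the policy lookup is shown as an exact match on the index name; and "includes the at least one first data directory" is interpreted as requiring every requested directory to appear in the policy.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PermissionPolicy:
    user_id: str            # user the policy is configured for
    index_name: str         # index the user may create
    data_dirs: frozenset    # "second data directories" granted to the user


@dataclass(frozen=True)
class IndexCreationRequest:
    user_id: str            # identifier of the target user
    index_name: str         # name of the target index
    data_dirs: frozenset    # "first data directories" named in the request


def find_target_policy(policies: Iterable[PermissionPolicy],
                       index_name: str) -> Optional[PermissionPolicy]:
    """Return the first policy that names the index, or None."""
    for policy in policies:
        if policy.index_name == index_name:  # assumption: exact match
            return policy
    return None


def is_request_allowed(policies: Iterable[PermissionPolicy],
                       request: IndexCreationRequest) -> bool:
    """Allow the request only for a legal user on legal data directories."""
    policy = find_target_policy(policies, request.index_name)
    if policy is None:
        return False
    if policy.user_id != request.user_id:
        return False
    # Every requested directory must be one granted by the policy.
    return bool(request.data_dirs) and request.data_dirs <= policy.data_dirs


policies = [
    PermissionPolicy("user-a", "logs-a", frozenset({"/data/a1", "/data/a2"})),
    PermissionPolicy("user-b", "logs-b", frozenset({"/data/b1"})),
]
ok = IndexCreationRequest("user-a", "logs-a", frozenset({"/data/a1"}))
wrong_user = IndexCreationRequest("user-b", "logs-a", frozenset({"/data/a1"}))
wrong_dir = IndexCreationRequest("user-a", "logs-a", frozenset({"/data/b1"}))
print(is_request_allowed(policies, ok))          # True
print(is_request_allowed(policies, wrong_user))  # False
print(is_request_allowed(policies, wrong_dir))   # False
```

Because each user's directories are mounted on different disks, rejecting a request that names another user's directory is what keeps the users' data physically separate.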

As an embodiment, the allocating unit 603 allocating the shards corresponding to the target index to the at least one first data directory includes:

estimating the size of the current shard;

assigning the current shard to the fourth data directory.

As an embodiment, the allocating unit 603 allocating the current shard to the fourth data directory includes:

assigning the current shard to the fifth data directory.

As an embodiment, the allocating unit 603 is further configured to: if the third data directory does not exist among the at least one first data directory, select the data directory with the largest remaining capacity from the at least one first data directory, and allocate the current shard to that data directory.
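The fallback selection in this embodiment can be sketched as follows. The sketch covers only the fallback itself (choosing the directory with the largest remaining capacity); how the third data directory is identified is not restated here, so it is left out. The function name, the dictionary layout, and the example paths and capacities are hypothetical.

```python
from typing import Mapping


def pick_fallback_directory(remaining_gb_by_dir: Mapping[str, float]) -> str:
    """Return the data directory with the largest remaining capacity.

    remaining_gb_by_dir maps each first data directory named in the
    request to its remaining capacity in GB. If several directories tie,
    the first one in iteration order is returned.
    """
    if not remaining_gb_by_dir:
        raise ValueError("at least one data directory is required")
    return max(remaining_gb_by_dir, key=remaining_gb_by_dir.get)


remaining = {"/data/a1": 120.0, "/data/a2": 340.5, "/data/a3": 80.0}
chosen = pick_fallback_directory(remaining)
print(chosen)  # /data/a2
```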

As an embodiment, the allocating unit 603 estimating the size of the current shard includes:

This completes the description of the apparatus shown in fig. 6.

The above describes only preferred embodiments of the present application and is not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within its scope of protection.

Claims

1. An index creation method applied to a node included in an Elasticsearch cluster, wherein the Elasticsearch cluster includes at least one node, each node includes at least one disk, and the Elasticsearch cluster supports use by multiple users, the method comprising:

2. The method of claim 1, wherein said assigning the shards corresponding to the target index to the at least one first data directory comprises:

estimating the size of the current shard;

assigning the current shard to the fourth data directory.

3. The method of claim 2, wherein said assigning the current shard to the fourth data directory comprises:

assigning the current shard to the fifth data directory.

4. The method of claim 2, wherein the method further comprises:

5. The method of claim 2, wherein said estimating the size of the current shard comprises:

6. An index creation apparatus applied to a node included in an Elasticsearch cluster, wherein the Elasticsearch cluster includes at least one node, each node includes at least one disk, and the Elasticsearch cluster supports use by multiple users, the apparatus comprising:

7. The apparatus of claim 6, wherein the allocating unit assigning the shards corresponding to the target index to the at least one first data directory comprises:

estimating the size of the current shard;

assigning the current shard to the fourth data directory.

8. The apparatus of claim 7, wherein the allocating unit assigning the current shard to the fourth data directory comprises:

assigning the current shard to the fifth data directory.

9. The apparatus of claim 7, wherein:

the allocating unit is further configured to: if the third data directory does not exist among the at least one first data directory, select the data directory with the largest remaining capacity from the at least one first data directory, and assign the current shard to that data directory.

10. The apparatus of claim 7, wherein the allocating unit estimating the size of the current shard comprises: