CN113836143A - Index creation method and device - Google Patents


Info

Publication number
CN113836143A
Authority
CN
China
Prior art keywords
data directory
target
index
data
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111143947.4A
Other languages
Chinese (zh)
Other versions
CN113836143B (en)
Inventor
李长青
王佳佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New H3C Big Data Technologies Co Ltd
Original Assignee
New H3C Big Data Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New H3C Big Data Technologies Co Ltd
Priority to CN202111143947.4A
Publication of CN113836143A
Application granted
Publication of CN113836143B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides an index creation method and device. In the method, permission policies assign different data directories to different users, and the data directories of different users are mounted on different disks, so each user's data is ultimately stored on the disks mounted by the data directories allocated to that user. The permission policy configuration therefore defines physical storage isolation between users. When a user's request to create an index is received, the configured permission policies are used to verify both that the user is entitled to create the index and that the data directories specified by the user are permitted. Only a request that passes this verification is executed; that is, only an authorized user may create an index on an authorized data directory. The service data of different users is thus physically isolated, and a single disk failure does not affect the services of multiple users.

Description

Index creation method and device
Technical Field
The present application relates to the field of computer technologies, and in particular, to an index creation method and apparatus.
Background
Elasticsearch is an open-source distributed search engine. It is distributed and scalable, supports real-time search and data analysis, is a mainstream enterprise-level search engine, and provides storage management and full-text retrieval for massive data.
An Elasticsearch cluster consists of at least one Elasticsearch node (hereinafter referred to as a node), and each node includes at least one disk for storing data. When storing data, a node may store related data in the same index according to service requirements. An index is a logical namespace that points to one or more physical shards; in other words, an index may be divided into multiple shards, and the data is ultimately stored on disk in those shards.
In practical use, it is found that, for a cluster supporting multi-user use, when a certain disk fails, the service of multiple users is often affected.
Disclosure of Invention
In view of this, the present application provides an index creating method and apparatus, so as to avoid the influence of a disk failure on multiple user services as much as possible.
In order to achieve the purpose of the application, the application provides the following technical scheme:
In a first aspect, the present application provides an index creation method applied to a node included in an Elasticsearch cluster, where the Elasticsearch cluster includes at least one node, each node includes at least one disk, and the Elasticsearch cluster supports multi-user usage, and the method includes:
receiving an index creation request sent by a target user, where the index creation request includes an identifier of the target user, an index name of a target index to be created by the target user, and at least one first data directory specified by the target user for storing the shards corresponding to the target index;
searching at least one configured permission policy for a target permission policy containing the index name of the target index, where a permission policy grants a user the permission to create an index under at least one second data directory configured for that user, the second data directories corresponding to different users are different, and the disks mounted by the second data directories belonging to different users are different;
and if the identifier of the user included in the target permission policy is the same as the identifier of the target user, and the at least one second data directory included in the target permission policy includes the at least one first data directory specified by the target user, allocating the shards corresponding to the target index to the at least one first data directory.
Optionally, allocating the shards corresponding to the target index to the at least one first data directory includes:
performing the following processing for each to-be-allocated shard corresponding to the target index:
estimating the size of the current shard;
selecting, from the at least one first data directory, at least one third data directory whose remaining capacity is not less than the estimated size of the current shard;
counting, for each third data directory, the number of existing shards in it that belong to the target index;
selecting, from the at least one third data directory, a fourth data directory with the smallest counted number of shards;
allocating the current shard to the fourth data directory.
Optionally, allocating the current shard to the fourth data directory includes:
if a plurality of fourth data directories exist, counting the total number of existing shards in each fourth data directory;
selecting, from the plurality of fourth data directories, a fifth data directory with the smallest counted total;
allocating the current shard to the fifth data directory.
Optionally, the method further includes:
if no third data directory exists among the at least one first data directory, selecting the data directory with the largest remaining capacity from the at least one first data directory;
and allocating the current shard to the data directory with the largest remaining capacity.
Optionally, estimating the size of the current shard includes:
acquiring the total remaining capacity of the at least one first data directory;
acquiring a target capacity equal to a preset percentage of the total remaining capacity;
taking the quotient of the total size of all existing shards in the current Elasticsearch cluster and the total number of all existing shards as the average size of the shards corresponding to the target index;
and selecting the larger of the target capacity and the average value as the estimated size of the current shard.
In a second aspect, the present application provides an index creation apparatus applied to a node included in an Elasticsearch cluster, where the Elasticsearch cluster includes at least one node, each node includes at least one disk, and the Elasticsearch cluster supports multi-user usage, and the apparatus includes:
a receiving unit, configured to receive an index creation request sent by a target user, where the index creation request includes an identifier of the target user, an index name of a target index to be created by the target user, and at least one first data directory specified by the target user for storing the shards corresponding to the target index;
a searching unit, configured to search at least one configured permission policy for a target permission policy containing the index name of the target index, where a permission policy grants a user the permission to create an index under at least one second data directory configured for that user, the second data directories corresponding to different users are different, and the disks mounted by the second data directories belonging to different users are different;
and an allocation unit, configured to allocate the shards corresponding to the target index to the at least one first data directory if the identifier of the user included in the target permission policy is the same as the identifier of the target user and the at least one second data directory included in the target permission policy includes the at least one first data directory specified by the target user.
Optionally, the allocation unit allocating the shards corresponding to the target index to the at least one first data directory includes:
performing the following processing for each to-be-allocated shard corresponding to the target index:
estimating the size of the current shard;
selecting, from the at least one first data directory, at least one third data directory whose remaining capacity is not less than the estimated size of the current shard;
counting, for each third data directory, the number of existing shards in it that belong to the target index;
selecting, from the at least one third data directory, a fourth data directory with the smallest counted number of shards;
allocating the current shard to the fourth data directory.
Optionally, the allocation unit allocating the current shard to the fourth data directory includes:
if a plurality of fourth data directories exist, counting the total number of existing shards in each fourth data directory;
selecting, from the plurality of fourth data directories, a fifth data directory with the smallest counted total;
allocating the current shard to the fifth data directory.
Optionally, the allocation unit is further configured to: if no third data directory exists among the at least one first data directory, select the data directory with the largest remaining capacity from the at least one first data directory; and allocate the current shard to the data directory with the largest remaining capacity.
Optionally, the allocation unit estimating the size of the current shard includes:
acquiring the total remaining capacity of the at least one first data directory;
acquiring a target capacity equal to a preset percentage of the total remaining capacity;
taking the quotient of the total size of all existing shards in the current Elasticsearch cluster and the total number of all existing shards as the average size of the shards corresponding to the target index;
and selecting the larger of the target capacity and the average value as the estimated size of the current shard.
As can be seen from the above description, in the embodiments of the present application, permission policies assign different data directories to different users, and the data directories of different users are mounted on different disks, so each user's data is ultimately stored on the disks mounted by the data directories allocated to that user. The permission policy configuration therefore defines physical storage isolation between users. When a user's request to create an index is received, the configured permission policies are used to verify both that the user is entitled to create the index and that the data directories specified by the user are permitted. Only a request that passes this verification is executed; that is, only an authorized user may create an index on an authorized data directory. The service data of different users is thus physically isolated, and a single disk failure does not affect the services of multiple users.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of an index creation method according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a rights policy configuration page according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating an implementation of step 103 according to an embodiment of the present application;
FIG. 4 is a flowchart illustrating an implementation of step 305 according to an embodiment of the present application;
FIG. 5 is a flowchart illustrating an implementation of step 301 according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an index creating apparatus according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application.
The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the embodiments of the present application. As used in the embodiments of the present application, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present application to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the embodiments of the present application. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to a determination".
Referring to FIG. 1, a flowchart of an index creation method according to an embodiment of the present application is shown. The method is applied to a node included in an Elasticsearch cluster.
The Elasticsearch cluster may comprise at least one node, each node comprising at least one disk. The Elasticsearch cluster may provide storage management and retrieval services for multiple users.
As shown in fig. 1, the process may include the following steps:
Step 101, receiving an index creation request sent by a target user.
Here, the user who sends the index creation request is referred to as the target user. It is to be understood that the name target user is used merely for ease of distinction and is not intended to be limiting.
The index creation request includes an identifier of the target user, an index name of a target index to be created by the target user, and at least one first data directory specified by the target user for storing the shards corresponding to the target index.
Here, the index to be created by the target user is referred to as the target index. It is to be understood that the name target index is used merely for ease of distinction and is not intended to be limiting.
A data directory specified by the target user for storing the shards corresponding to the target index is referred to as a first data directory. It is to be understood that the name first data directory is used merely for ease of distinction and is not intended to be limiting.
Step 102, searching at least one configured permission policy for a target permission policy containing the index name of the target index.
In the embodiment of the application, an administrator can configure permission policies through a permission management platform. As one embodiment, the permission management platform may be Ranger.
Ranger is a unified framework integrating monitoring and permission management, and an administrator can configure permission policies through the Web user interface (Web UI) pages provided by Ranger.
Referring to FIG. 2, a schematic diagram of a permission policy configuration page according to an embodiment of the present application is shown. A permission policy configured through the page includes, but is not limited to, the following: a policy name (e.g., P1), the index name (e.g., Index1) of the index to which the policy applies, the user name (e.g., User1) of the user having operation permission on the index, the data directories (e.g., /path1, /path2, /path3, /path4) configured for the user to store the shards corresponding to the index, and the operations (e.g., create) that can be performed on the index.
In the embodiment of the application, a user is granted the permission to create an index by configuring a permission policy; specifically, the user is granted the permission to create the index under the data directories configured for that user. For example, the permission policy shown in FIG. 2 grants User1 the permission to create Index1 under /path1, /path2, /path3, and /path4.
In the embodiment of the present application, a data directory configured for a user in a permission policy is referred to as a second data directory. It is to be understood that the name second data directory is used merely for ease of distinction and is not intended to be limiting.
In addition, in the embodiment of the present application, the second data directories configured for different users are different, and the disks mounted on the second data directories belonging to different users are different.
For example, the second data directories configured for user 1 (User1) are /path1, /path2, /path3, and /path4, and the disks mounted on them are Disk1, Disk2, Disk3, and Disk4, respectively; the second data directories configured for user 2 (User2) are /path5 and /path6, and the disks mounted on them are Disk5 and Disk6, respectively.
That is, when configuring the permission policy, it is defined to physically isolate the storage spaces corresponding to different users.
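As a rough illustration of the isolation constraint above, the following Python sketch models a permission policy and checks that no disk is shared by the data directories of different users. The `PermissionPolicy` class, the `MOUNTS` mapping, and the `shared_disks` helper are hypothetical names introduced here for illustration; they are not Ranger or Elasticsearch APIs, and the policy P2 for an index named Index2 is an assumed example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class PermissionPolicy:
    """Hypothetical in-memory form of one configured permission policy."""
    name: str
    index: str
    user: str
    data_dirs: Tuple[str, ...]
    operations: Tuple[str, ...] = ("create",)


# Assumed mapping from each data directory to the disk mounted on it.
MOUNTS: Dict[str, str] = {
    "/path1": "Disk1", "/path2": "Disk2", "/path3": "Disk3",
    "/path4": "Disk4", "/path5": "Disk5", "/path6": "Disk6",
}


def shared_disks(policies: Iterable[PermissionPolicy],
                 mounts: Dict[str, str]) -> Dict[str, Set[str]]:
    """Return the disks used by more than one user (empty if users are isolated)."""
    users_by_disk: Dict[str, Set[str]] = {}
    for policy in policies:
        for data_dir in policy.data_dirs:
            users_by_disk.setdefault(mounts[data_dir], set()).add(policy.user)
    return {disk: users for disk, users in users_by_disk.items() if len(users) > 1}


P1 = PermissionPolicy("P1", "Index1", "User1", ("/path1", "/path2", "/path3", "/path4"))
P2 = PermissionPolicy("P2", "Index2", "User2", ("/path5", "/path6"))
print(shared_disks([P1, P2], MOUNTS))  # empty dict: the two users share no disk
```

A configuration tool could run such a check before saving a policy, so that a directory mounted on another user's disk is rejected at configuration time rather than discovered after a disk failure.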
Nodes of the Elasticsearch cluster can periodically pull the configured permission policies from the Ranger permission management platform by running a Ranger plug-in. After a node obtains, in step 101, the index name of the target index to be created by the target user, it can use that index name to query the pulled permission policies for the permission policy containing the index name, that is, the permission policy configured for the target index.
For example, if the index name of the target index obtained in step 101 is Index1, the permission policy P1 containing Index1 is found in this step.
In the embodiment of the application, the permission policy containing the index name of the target index is called the target permission policy. It is to be understood that the name target permission policy is used merely for ease of distinction and is not intended to be limiting.
Step 103, if the identifier of the user included in the target permission policy is the same as the identifier of the target user, and the at least one second data directory included in the target permission policy includes the at least one first data directory specified by the target user, allocating the shards corresponding to the target index to the at least one first data directory.
Here, the identifier of the user included in the target permission policy being the same as the identifier of the target user indicates that the target user is a legitimate user with permission to create the target index. The at least one second data directory included in the target permission policy including the at least one first data directory specified by the target user indicates that the first data directories specified by the user for storing the shards of the target index fall within the range of permitted data directories (second data directories) configured by the administrator.
Once this step has determined that both the user identity and the specified data directories are legitimate, that is, once the request has passed authorization, the target index is allowed to be created under the first data directories specified by the user; in other words, the shards corresponding to the target index are allowed to be allocated under the first data directories.
For example, User1 requests to create Index1 and specifies /path1, /path2, and /path3 as the data directories for storing the shards of Index1. Matching the request against the permission policy P1 corresponding to Index1 shows that User1 is a legitimate user and that /path1, /path2, and /path3 are permitted data directories, so Index1 can be created under /path1, /path2, and /path3.
The process of allocating the shards corresponding to the target index to the first data directories is described later and is not detailed here.
Thus, the flow shown in fig. 1 is completed.
As can be seen from the flow shown in FIG. 1, in the embodiment of the present application, permission policies assign different data directories to different users, and the data directories of different users are mounted on different disks, so each user's data is ultimately stored on the disks mounted by the data directories allocated to that user. The permission policy configuration therefore defines physical storage isolation between users. When a user's request to create an index is received, the configured permission policies are used to verify both that the user is entitled to create the index and that the data directories specified by the user are permitted. Only a request that passes this verification is executed; that is, only an authorized user may create an index on an authorized data directory. The service data of different users is thus physically isolated, and a single disk failure does not affect the services of multiple users.
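The check performed in steps 102 and 103 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the dictionary layout of `POLICIES` and the function names are hypothetical, the directory condition is read as "every requested directory must be among the configured ones", and a request for an index with no matching policy is assumed to be rejected, which the text does not state explicitly.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical in-memory form of the policies a node has pulled.
POLICIES: List[Dict] = [
    {"name": "P1", "index": "Index1", "user": "User1",
     "data_dirs": {"/path1", "/path2", "/path3", "/path4"}},
]


def find_target_policy(policies: Iterable[Dict], index_name: str) -> Optional[Dict]:
    """Step 102: find the policy that contains the target index name."""
    for policy in policies:
        if policy["index"] == index_name:
            return policy
    return None


def may_create_index(policies: Iterable[Dict], user: str, index_name: str,
                     requested_dirs: Iterable[str]) -> bool:
    """Step 103 precondition: user matches and requested dirs are all permitted."""
    policy = find_target_policy(policies, index_name)
    if policy is None:
        return False  # assumption: no policy for the index means no permission
    requested = set(requested_dirs)
    return (bool(requested)
            and policy["user"] == user
            and requested <= policy["data_dirs"])


print(may_create_index(POLICIES, "User1", "Index1", ["/path1", "/path2", "/path3"]))  # True
print(may_create_index(POLICIES, "User2", "Index1", ["/path1"]))  # False
```

The first call mirrors the User1 and Index1 example in the text; the second shows a request rejected because the requesting user is not the one named in policy P1.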
The process of allocating the shards corresponding to the target index to the at least one first data directory in step 103 is described below. Specifically, the flow shown in FIG. 3 may be executed in turn for each to-be-allocated shard corresponding to the target index. As shown in FIG. 3, the process may include the following steps:
Step 301, estimating the size of the current shard.
Because the actual size of a shard cannot be determined at the pre-allocation stage, the size of the shard to be allocated needs to be estimated, so that a data directory (disk) that can hold the shard can subsequently be determined from the estimated size.
The process of estimating the shard size is described later and is not detailed here.
Step 302, selecting, from the at least one first data directory, at least one third data directory whose remaining capacity is not less than the estimated size of the current shard.
Here, a third data directory is a first data directory whose remaining capacity is not less than the estimated size of the current shard. That is, the data directories with enough remaining storage space to hold the current shard are found among the first data directories. It is to be understood that the name third data directory is used merely for ease of distinction and is not intended to be limiting.
Still taking /path1, /path2, and /path3 specified by User1 as an example, where the remaining capacity of /path1 is 100 GB, that of /path2 is 200 GB, and that of /path3 is 18 GB, and the estimated size of the current shard is 20 GB, /path1 and /path2 are determined to be third data directories, because their remaining capacity is not less than the estimated size of the current shard.
Step 303, for each third data directory, counting the number of existing shards in the third data directory that belong to the target index.
After the third data directories whose remaining storage space meets the requirement are selected in step 302, the number of shards already existing (allocated) in each third data directory that belong to the same index (the target index) as the current shard is counted.
Still taking Index1 as an example, after /path1 and /path2 are selected in step 302, the numbers of existing shards belonging to Index1 under /path1 and under /path2 are counted respectively.
Step 304, selecting, from the at least one third data directory, a fourth data directory with the smallest counted number of shards belonging to the target index.
Here, a fourth data directory is a third data directory that currently contains the smallest number of shards belonging to the target index. That is, the data directory currently storing the fewest shards of the target index is selected from the third data directories. It is to be understood that the name fourth data directory is used merely for ease of distinction and is not intended to be limiting.
Step 305, allocating the current shard to the fourth data directory.
Still taking Index1, to which the current shard belongs, as an example, if step 303 counts 1 existing shard belonging to Index1 under /path1 and 2 existing shards belonging to Index1 under /path2, the current shard is allocated to /path1, which holds the fewest existing shards of Index1.
In the embodiment of the application, the shards corresponding to the same index are distributed evenly across the data directories, which can improve the efficiency of subsequent searches of the data corresponding to the index and helps prevent a hotspot from forming in any one data directory.
The flow shown in fig. 3 is completed.
As can be seen from the flow shown in FIG. 3, the embodiment of the present application considers both the storage space of the data directories and the existing allocation of shards of the same index in each data directory, and allocates the shards corresponding to the index evenly, so that hotspots are avoided as far as possible.
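Steps 302 to 304 can be sketched in Python as follows, using the capacities and shard counts from the example above. The function name `least_loaded_dirs` and the dictionary inputs are hypothetical; how a node actually obtains remaining capacity and per-index shard counts is not described here and is assumed to be available.

```python
from typing import Dict, List


def least_loaded_dirs(first_dirs: List[str],
                      remaining_gb: Dict[str, float],
                      index_shards: Dict[str, int],
                      estimated_gb: float) -> List[str]:
    """Return the fourth data directories for one shard (may contain ties)."""
    # Step 302: keep directories whose remaining capacity fits the estimate.
    third_dirs = [d for d in first_dirs if remaining_gb[d] >= estimated_gb]
    if not third_dirs:
        return []
    # Step 303: count existing shards of the target index in each candidate.
    counts = {d: index_shards.get(d, 0) for d in third_dirs}
    # Step 304: keep the candidates with the smallest count.
    fewest = min(counts.values())
    return [d for d in third_dirs if counts[d] == fewest]


remaining = {"/path1": 100, "/path2": 200, "/path3": 18}
index1_shards = {"/path1": 1, "/path2": 2}
print(least_loaded_dirs(["/path1", "/path2", "/path3"], remaining, index1_shards, 20))
# prints ['/path1']
```

Returning a list rather than a single directory keeps ties visible, so that a separate tie-breaking rule can be applied when several directories hold the same number of shards of the target index.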
The process of assigning the current shard to the fourth data directory in step 305 is described below. Referring to fig. 4, a flow of implementing step 305 according to an embodiment of the present application is shown.
As shown in fig. 4, the process may include the following steps:
Step 401, if there are multiple fourth data directories, for each fourth data directory, counting the total number of existing shards in the fourth data directory.
When multiple fourth data directories are selected in step 304, one data directory needs to be chosen from among them to receive the current shard.
As one embodiment, any one of the fourth data directories may be selected, and the current shard is allocated to the selected fourth data directory.
As another embodiment, the total number of existing shards in each fourth data directory may be further counted.
Still taking Index1, to which the current shard belongs, as an example, if step 303 counts the same number of existing shards belonging to Index1 under /path1 and /path2, for example 2 shards each, the total number of existing shards under /path1 and the total number of existing shards under /path2 are then counted.
Step 402, selecting, from the multiple fourth data directories, a fifth data directory with the smallest counted total number of existing shards.
Here, a fifth data directory is a fourth data directory with the smallest total number of existing shards. That is, the data directory holding the fewest existing shards is selected from the fourth data directories. It is to be understood that the name fifth data directory is used merely for ease of distinction and is not intended to be limiting.
Step 403, allocating the current shard to the fifth data directory.
For example, if step 401 counts a total of 20 existing shards under /path1 and 16 existing shards under /path2, the current shard is allocated to /path2, which holds the fewest existing shards.
The flow shown in fig. 4 is completed.
As can be seen from the flow shown in FIG. 4, when the already-allocated shards of the current index are evenly distributed across the candidate data directories (the fourth data directories), the embodiment of the present application further considers the overall load of each data directory and selects the more lightly loaded one (the one with the smaller total number of shards). The number of shards in each data directory is thereby balanced as far as possible, which improves cluster retrieval efficiency.
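The tie-break of steps 401 to 403 can be sketched in Python as follows. The function name `pick_fifth_dir` and the `total_shards` mapping are hypothetical. If several directories also tie on the total count, this sketch simply returns the first of them, which is an assumption; the text does not say how such a second tie is resolved.

```python
from typing import Dict, List


def pick_fifth_dir(fourth_dirs: List[str], total_shards: Dict[str, int]) -> str:
    """Steps 401-402: among tied directories, pick the one with the fewest shards overall."""
    # min() returns the first directory encountered when totals are equal.
    return min(fourth_dirs, key=lambda d: total_shards.get(d, 0))


print(pick_fifth_dir(["/path1", "/path2"], {"/path1": 20, "/path2": 16}))  # prints /path2
```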
In addition, when no data directory meeting the conditions is screened out through the flows of fig. 3 and fig. 4, the data directory with the largest remaining capacity can be selected directly from the first data directories specified by the user, and the current shard is allocated to that data directory, so as to make reasonable use of the storage space.
The process of estimating the size of the current shard in step 301 is described below. Referring to fig. 5, a flow for implementing step 301 according to the embodiment of the present application is shown.
As shown in fig. 5, the process may include the following steps:
Step 501, obtaining the total remaining capacity of the at least one first data directory.
That is, the total remaining capacity of all the first data directories specified by the user for storing the shards corresponding to the target index is computed.
Continuing the example of /path1, /path2 and /path3 specified by User1 for Index1: if the remaining capacity of /path1 is 100 GB, that of /path2 is 200 GB, and that of /path3 is 18 GB, the total remaining capacity is currently 100 GB + 200 GB + 18 GB = 318 GB.
Step 502, obtaining a target capacity which accounts for a preset percentage of the remaining total capacity.
For example, if the preset percentage is 5%, the currently obtained target capacity is 318 GB × 5% = 15.9 GB.
Here, the capacity that occupies a preset percentage of the remaining total capacity is referred to as a target capacity. It is to be understood that the reference to target capacity is merely a nomenclature for ease of distinction and is not intended to be limiting.
Step 503, taking the quotient of the total size of all existing shards in the current Elasticsearch cluster and the total number of all existing shards as the average value of the sizes of the corresponding shards of the target index.
For example, if the total size of all existing shards in the current Elasticsearch cluster is 2000 GB, and the total number of all existing shards is 100, the average shard size determined by this step is 2000 GB / 100 = 20 GB.
Step 504, selecting the larger of the target capacity and the average value as the estimated size of the current shard.
Continuing the example in which the target capacity is 15.9 GB and the average shard size is 20 GB, 20 GB is selected as the estimated size of the current shard in this step.
The flow shown in fig. 5 is completed. The estimation of the shard size is completed by the flow shown in fig. 5.
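Steps 501 to 504 can be sketched as a single function. This is a minimal illustration under stated assumptions: the function and parameter names are hypothetical, sizes are expressed in GB, and the handling of a cluster with no existing shards is an assumption that the text does not address.

```python
def estimate_shard_size(remaining_gb, preset_percentage,
                        cluster_shard_total_gb, cluster_shard_count):
    """Estimate the size of the current shard, in GB.

    remaining_gb: mapping of each first data directory -> remaining capacity in GB.
    preset_percentage: the preset percentage as a fraction, e.g. 0.05 for 5%.
    """
    # Step 501: total remaining capacity of all first data directories.
    remaining_total = sum(remaining_gb.values())
    # Step 502: target capacity is a preset percentage of that total.
    target_capacity = remaining_total * preset_percentage
    # Step 503: average size of all existing shards in the cluster.
    # Assumption: with no existing shards, the average is treated as 0.
    average = cluster_shard_total_gb / cluster_shard_count if cluster_shard_count else 0.0
    # Step 504: the larger of the two values is the estimate.
    return max(target_capacity, average)


# Values from the example in the text.
size = estimate_shard_size({"/path1": 100, "/path2": 200, "/path3": 18}, 0.05, 2000, 100)
print(size)  # 20.0
```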
The method provided by the embodiment of the present application has been described above; the apparatus provided by the embodiment of the present application is described below.
Referring to fig. 6, an index creation apparatus according to the embodiment of the present application is shown. The apparatus is applied to a node included in an Elasticsearch cluster, where the Elasticsearch cluster includes at least one node, each node includes at least one disk, and the Elasticsearch cluster supports multi-user usage. The apparatus includes a receiving unit 601, a searching unit 602, and an allocating unit 603, wherein:
a receiving unit 601, configured to receive an index creation request sent by a target user, where the index creation request includes an identifier of the target user, an index name of a target index to be created by the target user, and at least one first data directory specified by the target user for storing the shards corresponding to the target index;
a searching unit 602, configured to search, from at least one configured permission policy, a target permission policy that includes an index name of the target index, where the permission policy is used to grant a user permission to create an index under at least one second data directory configured for the user, where second data directories corresponding to each user are different, and disks mounted in the second data directories belonging to different users are different;
an allocating unit 603, configured to allocate the shards corresponding to the target index to the at least one first data directory if the identifier of the user included in the target permission policy is the same as the identifier of the target user, and the at least one second data directory included in the target permission policy includes the at least one first data directory specified by the target user.
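The check performed by the searching unit and the allocating unit can be sketched as follows. This is a minimal illustration, not the apparatus itself: the `PermissionPolicy` type, its field names, and `is_request_allowed` are hypothetical, index names are assumed to match exactly, and the directory condition is read here as "every requested first data directory is among the user's configured second data directories", which is one interpretation of the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PermissionPolicy:
    """Hypothetical shape of one configured permission policy."""
    user_id: str
    index_name: str
    data_dirs: frozenset  # second data directories configured for the user


def is_request_allowed(policies, user_id, index_name, requested_dirs):
    """Return True only if a policy for this index matches the user and directories."""
    requested = set(requested_dirs)
    if not requested:
        return False  # the request must specify at least one first data directory
    for policy in policies:
        # Searching unit: look for a target policy containing the index name.
        if policy.index_name != index_name:
            continue
        # Allocation precondition: same user, and requested directories are covered.
        if policy.user_id == user_id and requested <= policy.data_dirs:
            return True
    return False


policies = [PermissionPolicy("User1", "Index1", frozenset({"/path1", "/path2", "/path3"}))]
print(is_request_allowed(policies, "User1", "Index1", ["/path1", "/path2"]))  # True
print(is_request_allowed(policies, "User2", "Index1", ["/path1"]))            # False
```

Rejecting the request before any shard is allocated keeps a user from writing into a directory whose disk belongs to another user.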
As an embodiment, the allocating unit 603 allocating the shards corresponding to the target index to the at least one first data directory includes:
executing the following processing for each to-be-allocated shard corresponding to the target index:
estimating the size of the current shard;
screening out, from the at least one first data directory, at least one third data directory whose remaining capacity is not smaller than the estimated size of the current shard;
counting the number of existing shards belonging to the target index in each third data directory;
selecting, from the at least one third data directory, a fourth data directory with the smallest counted number of shards;
assigning the current shard to the fourth data directory.
As an embodiment, the allocating unit 603 assigning the current shard to the fourth data directory includes:
if a plurality of fourth data directories exist, counting the total number of existing shards in each fourth data directory;
selecting, from the plurality of fourth data directories, a fifth data directory with the smallest counted total number;
assigning the current shard to the fifth data directory.
As an embodiment, the allocating unit 603 is further configured to: if no third data directory exists in the at least one first data directory, select the data directory with the largest remaining capacity from the at least one first data directory, and allocate the current shard to the data directory with the largest remaining capacity.
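The per-shard processing of the allocating unit, including the tie-break and the fallback, can be sketched in one function. This is a minimal illustration under stated assumptions: `choose_directory` and its parameters are hypothetical names, and the two-stage selection (fourth, then fifth data directory) is expressed here as a single lexicographic comparison, which yields the same result.

```python
def choose_directory(first_dirs, remaining, index_shards, total_shards, estimated_size):
    """Pick the data directory for the current shard.

    remaining:    directory -> remaining capacity (same unit as estimated_size).
    index_shards: directory -> existing shards of the target index.
    total_shards: directory -> existing shards of all indices.
    """
    # Third data directories: enough remaining capacity for the estimated shard.
    third = [d for d in first_dirs if remaining[d] >= estimated_size]
    if not third:
        # Fallback: no directory is large enough, take the largest remaining capacity.
        return max(first_dirs, key=lambda d: remaining[d])
    # Fourth data directory: fewest shards of the target index.
    # Fifth data directory (tie-break): fewest shards overall.
    return min(third, key=lambda d: (index_shards.get(d, 0), total_shards.get(d, 0)))


dirs = ["/path1", "/path2", "/path3"]
remaining = {"/path1": 100, "/path2": 200, "/path3": 18}
# The counts for /path3 are made up for illustration.
index_shards = {"/path1": 2, "/path2": 2, "/path3": 0}
total_shards = {"/path1": 20, "/path2": 16, "/path3": 5}

print(choose_directory(dirs, remaining, index_shards, total_shards, 20))   # /path2
print(choose_directory(dirs, remaining, index_shards, total_shards, 500))  # /path2
```

In the first call /path3 is filtered out because 18 is smaller than 20, and /path2 wins the tie-break with 16 shards in total. In the second call no directory can hold 500, so the directory with the largest remaining capacity is returned.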
As an embodiment, the allocating unit 603 estimating the size of the current shard includes:
acquiring the total remaining capacity of the at least one first data directory;
acquiring a target capacity accounting for a preset percentage of the total remaining capacity;
taking the quotient of the total size of all existing shards in the current Elasticsearch cluster and the total number of all existing shards as the average value of the sizes of the shards corresponding to the target index;
and selecting the larger of the target capacity and the average value as the estimated size of the current shard.
The description of the apparatus shown in fig. 6 is thus completed.
As can be seen from the above description, in the embodiment of the present application, different data directories are configured for different users through permission policy configuration, and the disks mounted on the data directories belonging to different users are different, so that a user's data is ultimately stored on the disks mounted on the data directories allocated to that user. The physical space isolation of different users is therefore already defined by the permission policy configuration. When a request from a user to create an index is received, the legitimacy of the user creating the index and the legitimacy of the data directories specified by the user are verified against the configured permission policies, and only a request that passes the verification is executed. That is, only a legitimate user is allowed to create an index on legitimate data directories, so that the service data of different users are physically isolated and a fault of a single disk does not affect the services of multiple users.
The above description is only a preferred embodiment of the present application, and should not be taken as limiting the present application, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present application shall be included in the scope of the present application.

Claims (10)

1. An index creation method applied to a node included in an Elasticsearch cluster, wherein the Elasticsearch cluster includes at least one node, each node includes at least one disk, and the Elasticsearch cluster supports multi-user usage, the method comprising:
receiving an index creation request sent by a target user, wherein the index creation request comprises an identifier of the target user, an index name of a target index to be created by the target user, and at least one first data directory specified by the target user for storing the shards corresponding to the target index;
searching, from at least one configured permission policy, for a target permission policy containing the index name of the target index, wherein a permission policy is used for granting a user the permission to create an index under at least one second data directory configured for the user, the second data directories corresponding to each user are different, and the disks mounted on the second data directories belonging to different users are different;
and if the identifier of the user included in the target permission policy is the same as the identifier of the target user, and the at least one second data directory included in the target permission policy includes the at least one first data directory specified by the target user, allocating the shards corresponding to the target index to the at least one first data directory.
2. The method of claim 1, wherein said allocating the shards corresponding to the target index to the at least one first data directory comprises:
executing the following processing for each to-be-allocated shard corresponding to the target index:
estimating the size of the current shard;
screening out, from the at least one first data directory, at least one third data directory whose remaining capacity is not smaller than the estimated size of the current shard;
counting the number of existing shards belonging to the target index in each third data directory;
selecting, from the at least one third data directory, a fourth data directory with the smallest counted number of shards;
assigning the current shard to the fourth data directory.
3. The method of claim 2, wherein said assigning the current shard to the fourth data directory comprises:
if a plurality of fourth data directories exist, counting the total number of existing shards in each fourth data directory;
selecting, from the plurality of fourth data directories, a fifth data directory with the smallest counted total number;
assigning the current shard to the fifth data directory.
4. The method of claim 2, wherein the method further comprises:
if no third data directory exists in the at least one first data directory, selecting the data directory with the largest remaining capacity from the at least one first data directory;
and allocating the current shard to the data directory with the largest remaining capacity.
5. The method of claim 2, wherein said estimating the size of the current shard comprises:
acquiring the total remaining capacity of the at least one first data directory;
acquiring a target capacity accounting for a preset percentage of the total remaining capacity;
taking the quotient of the total size of all existing shards in the current Elasticsearch cluster and the total number of all existing shards as the average value of the sizes of the shards corresponding to the target index;
and selecting the larger of the target capacity and the average value as the estimated size of the current shard.
6. An index creation apparatus applied to a node included in an Elasticsearch cluster, wherein the Elasticsearch cluster includes at least one node, each node includes at least one disk, and the Elasticsearch cluster supports multi-user usage, the apparatus comprising:
a receiving unit, configured to receive an index creation request sent by a target user, wherein the index creation request includes an identifier of the target user, an index name of a target index to be created by the target user, and at least one first data directory specified by the target user for storing the shards corresponding to the target index;
a searching unit, configured to search, from at least one configured permission policy, for a target permission policy containing the index name of the target index, wherein a permission policy is used for granting a user the permission to create an index under at least one second data directory configured for the user, the second data directories corresponding to each user are different, and the disks mounted on the second data directories belonging to different users are different;
and an allocating unit, configured to allocate the shards corresponding to the target index to the at least one first data directory if the identifier of the user included in the target permission policy is the same as the identifier of the target user and the at least one second data directory included in the target permission policy includes the at least one first data directory specified by the target user.
7. The apparatus of claim 6, wherein the allocating unit allocating the shards corresponding to the target index to the at least one first data directory comprises:
executing the following processing for each to-be-allocated shard corresponding to the target index:
estimating the size of the current shard;
screening out, from the at least one first data directory, at least one third data directory whose remaining capacity is not smaller than the estimated size of the current shard;
counting the number of existing shards belonging to the target index in each third data directory;
selecting, from the at least one third data directory, a fourth data directory with the smallest counted number of shards;
assigning the current shard to the fourth data directory.
8. The apparatus of claim 7, wherein the allocating unit assigning the current shard to the fourth data directory comprises:
if a plurality of fourth data directories exist, counting the total number of existing shards in each fourth data directory;
selecting, from the plurality of fourth data directories, a fifth data directory with the smallest counted total number;
assigning the current shard to the fifth data directory.
9. The apparatus of claim 7, wherein:
the allocating unit is further configured to select the data directory with the largest remaining capacity from the at least one first data directory if no third data directory exists in the at least one first data directory, and to allocate the current shard to the data directory with the largest remaining capacity.
10. The apparatus of claim 7, wherein the allocating unit estimating the size of the current shard comprises:
acquiring the total remaining capacity of the at least one first data directory;
acquiring a target capacity accounting for a preset percentage of the total remaining capacity;
taking the quotient of the total size of all existing shards in the current Elasticsearch cluster and the total number of all existing shards as the average value of the sizes of the shards corresponding to the target index;
and selecting the larger of the target capacity and the average value as the estimated size of the current shard.
CN202111143947.4A 2021-09-28 2021-09-28 Index creation method and device Active CN113836143B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111143947.4A CN113836143B (en) 2021-09-28 2021-09-28 Index creation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111143947.4A CN113836143B (en) 2021-09-28 2021-09-28 Index creation method and device

Publications (2)

Publication Number Publication Date
CN113836143A true CN113836143A (en) 2021-12-24
CN113836143B CN113836143B (en) 2024-02-27

Family

ID=78967067

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111143947.4A Active CN113836143B (en) 2021-09-28 2021-09-28 Index creation method and device

Country Status (1)

Country Link
CN (1) CN113836143B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114461745A (en) * 2021-12-27 2022-05-10 天翼云科技有限公司 Index establishing method and device and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090083403A1 (en) * 2006-06-02 2009-03-26 Huawei Technologies Co., Ltd. Method, device and system for implementing vpn configuration service
CN106874383A (en) * 2017-01-10 2017-06-20 清华大学 A kind of decoupling location mode of metadata of distributed type file system
CN109582758A (en) * 2018-12-06 2019-04-05 重庆邮电大学 A kind of Elasticsearch index fragment optimization method
CN112612751A (en) * 2020-12-25 2021-04-06 北京浪潮数据技术有限公司 Asynchronous directory operation method, device, equipment and system
CN112965935A (en) * 2021-02-20 2021-06-15 北京星网锐捷网络技术有限公司 Data processing method and distributed file system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
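Liao Xiaofei; Yin Jiangpei; Cheng Bin: "Research on data caching strategies in P2P-based VOD systems", Journal of Huazhong University of Science and Technology (Natural Science Edition), no. 08, 15 August 2007 (2007-08-15) *
Xu Xuping; Li Xiaoyong: "Research on metadata management based on MongoDB", Information Technology, no. 08, 23 August 2018 (2018-08-23) *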

Also Published As

Publication number Publication date
CN113836143B (en) 2024-02-27

Similar Documents

Publication Publication Date Title
CN109684092B (en) Resource allocation method and device
JP4139675B2 (en) Virtual volume storage area allocation method, apparatus and program thereof
US10466899B2 (en) Selecting controllers based on affinity between access devices and storage segments
CN110209490B (en) Memory management method and related equipment
US7392261B2 (en) Method, system, and program for maintaining a namespace of filesets accessible to clients over a network
US9477503B2 (en) Resource management server, resource management method and storage medium for identifying virtual machines satisfying resource requirements
US7788233B1 (en) Data store replication for entity based partition
US9052962B2 (en) Distributed storage of data in a cloud storage system
RU2710860C1 (en) Method for limiting the scope of automatic selection of a virtual protection machine
CN106302702A (en) Burst storage method, the Apparatus and system of data
US9135041B2 (en) Selecting provisioning targets for new virtual machine instances
US8417929B2 (en) System for selecting a server from a plurality of server groups to provide a service to a user terminal based on a boot mode indicated in a boot information from the user terminal
US10379834B2 (en) Tenant allocation in multi-tenant software applications
US20120072459A1 (en) Distributed data storage and access systems
KR101714412B1 (en) Method and apparatus for organizing database system in cloud environment
CN109542861B (en) File management method, device and system
CN106296530B (en) Trust coverage for non-converged infrastructure
WO2010006134A2 (en) Distributed data storage and access systems
CN110569302A (en) method and device for physical isolation of distributed cluster based on lucene
US20150278543A1 (en) System and Method for Optimizing Storage of File System Access Control Lists
CN113836143B (en) Index creation method and device
CN107948229B (en) Distributed storage method, device and system
JP5758449B2 (en) Data rearrangement apparatus, method and program
KR101386161B1 (en) Apparatus and method for managing compressed image file in cloud computing system
CN110515564B (en) Method and device for determining input/output (I/O) path

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant