CN111221814B

CN111221814B - Method, device and equipment for constructing secondary index

Info

Publication number: CN111221814B
Application number: CN201811426358.5A
Authority: CN
Inventors: Liu Yang (刘洋)
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2018-11-27
Filing date: 2018-11-27
Publication date: 2023-06-27
Anticipated expiration: 2038-11-27
Also published as: CN111221814A

Abstract

The application discloses a method for constructing a secondary index, comprising: reading original data in a service node through a mapping task; selecting a non-primary key column in the original data as the primary key column of a secondary index table, and constructing the data of the secondary index table according to that primary key column; and writing the data of the secondary index table into the secondary index table. The method addresses the prior-art problem that building a secondary index occupies the resources of the existing service.

Description

Method, device and equipment for constructing secondary index

Technical Field

The application relates to the technical field of distributed databases, and in particular to a method and a device for constructing a secondary index, an electronic device, and a storage device.

Background

In a distributed database, each product generally provides the query mode that works best for a particular scenario. Take HBase (an open-source distributed non-relational storage system) as an example: HBase is a NoSQL (non-relational) database for KV queries, supporting primary key (rowkey) queries, range (scan) queries, and full-table queries, of which primary key queries are the most common. Each service must determine its primary key in advance and specify it when writing and querying, so the available query modes are limited. To query by a non-primary key column, HBase can only perform a full-table scan, which consumes huge resources and performs extremely poorly.
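The following minimal Python sketch makes the cost difference concrete. It models a table whose rows are kept sorted by rowkey, as HBase does. The class SortedTable and its methods are hypothetical illustrations, not an HBase API. A rowkey lookup is a binary search, while a query on a non-primary key column such as name has to read every row.

```python
from bisect import bisect_left


class SortedTable:
    """Toy stand-in for an HBase table: rows kept sorted by rowkey (hypothetical)."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.keys = sorted(self.rows)  # HBase stores rows in rowkey order

    def get(self, rowkey):
        """Primary key (rowkey) query: binary search over the sorted keys."""
        i = bisect_left(self.keys, rowkey)
        if i < len(self.keys) and self.keys[i] == rowkey:
            return self.rows[rowkey]
        return None

    def scan_where(self, column, value):
        """Query on a non-primary key column: without an index, every row is read."""
        rows_read = 0
        matches = []
        for key in self.keys:
            rows_read += 1
            if self.rows[key].get(column) == value:
                matches.append(key)
        return matches, rows_read


table = SortedTable({
    "100": {"name": "Zhang San", "commodity": "Television set"},
    "101": {"name": "Li Si", "commodity": "Refrigerator"},
    "110": {"name": "Wang Wu", "commodity": "Washing machine"},
})
print(table.get("101"))                     # {'name': 'Li Si', 'commodity': 'Refrigerator'}
print(table.scan_where("name", "Wang Wu"))  # (['110'], 3)
```

With three rows the difference is trivial. The rows read by scan_where, however, grow linearly with table size, and that growth is what a secondary index avoids.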

Therefore, a secondary index scheme for querying by non-primary key columns is needed: a secondary index is built on a non-primary key column so that queries on that column become fast. Faced with massive historical data, the first problem to solve is how to build the secondary index quickly.

In the prior art, Apache Phoenix, an open-source SQL engine, provides SQL capability for HBase. It supports creating secondary indexes on HBase non-primary key columns and also supports building indexes over historical data. The scheme works as follows. If a synchronous build is selected when the index is created, multiple threads read the data in the main table (source data table) concurrently, construct the index data, and write it into the secondary index table. If an asynchronous build is selected, the index can later be built in batch by submitting an MR (MapReduce) task.

Building the secondary index in the prior art has the following disadvantage: when the index is built synchronously, the threads run inside the region server and occupy its resources, such as its Handlers and memory. This affects normal read and write requests to other tables.

Disclosure of Invention

The application provides a method for constructing a secondary index, aiming to solve the prior-art problem that constructing a secondary index occupies the resources of the existing service.

The application provides a method for constructing a secondary index, comprising the following steps:

reading original data in a service node through a mapping task, selecting a non-primary key column in the original data as a primary key column of a secondary index table, and constructing data of the secondary index table according to the primary key column of the secondary index table;

and writing the data of the secondary index table into the secondary index table.

Optionally, the mapping task includes a local region server thread corresponding to a main table partition in the service node;

reading the original data in the service node through the mapping task comprises: reading, through the local region server thread in the mapping task, the original data of the main table partition in the service node that corresponds to that thread.

Optionally, the method further comprises: performing snapshot operation on the original data in the service node to generate snapshot data of the original data;

reading the original data in the service node through the mapping task comprises: reading snapshot data of the original data in the service node through the mapping task;

selecting the non-primary key column in the original data as the primary key column of the secondary index table comprises: selecting a non-primary key column in the snapshot data of the original data as the primary key column of the secondary index table.

Optionally, the method further comprises: obtaining indication information of the primary key column of the secondary index table;

selecting the non-primary key column in the original data as the primary key column of the secondary index table comprises: searching the original data, according to the indication information, for the non-primary key column matching the primary key column of the secondary index table, and determining the matching non-primary key column as the primary key column of the secondary index table.

Optionally, constructing the data of the secondary index table according to the primary key column of the secondary index table comprises:

constructing the following as the non-primary key columns of the secondary index table: the primary key column of the original data, together with the non-primary key columns other than the one used as the primary key column of the secondary index table.

Optionally, writing the data of the secondary index table into the secondary index table comprises:

aggregating and sorting the data of the secondary index table according to the data structure of the secondary index table, to obtain aggregated and sorted data of the secondary index table;

and writing the aggregated and sorted data of the secondary index table into the secondary index table.

Optionally, aggregating and sorting the data of the secondary index table according to its data structure, to obtain the aggregated and sorted data, comprises:

obtaining feature requirement information of the primary key column of the secondary index table;

and aggregating and sorting the data of the secondary index table according to that feature requirement information, to obtain the aggregated and sorted data of the secondary index table.

Optionally, writing the data of the secondary index table into the secondary index table comprises: generating, through a summarizing task, a file comprising the data of the secondary index table, and loading the file into the secondary index table.

Optionally, the number of the summarizing tasks is the same as the number of the partitions of the secondary index table.

Optionally, the summarizing task includes an index file writing thread;

generating the file comprising the data of the secondary index table through the summarizing task, and loading it into the secondary index table, comprises: generating the file through the index file writing thread in the summarizing task, and loading the file into the secondary index table.

Optionally, the mapping task is a mapping task running in a service node.

Optionally, the secondary index table is a secondary index table in a non-relational database.

The application also provides a device for constructing the secondary index, which comprises:

an original data reading unit, configured to read original data in the service node through a mapping task;

a secondary index table data construction unit, configured to select a non-primary key column in the original data as a primary key column of a secondary index table, and construct data of the secondary index table according to the primary key column of the secondary index table;

and an index table data writing unit, configured to write the data of the secondary index table into the secondary index table.

The application also provides an electronic device comprising:

a processor; and

a memory storing a program of the method for constructing a secondary index; after the device is powered on and the processor runs the program, the following steps are performed:

reading original data in the service node through a mapping task;

selecting a non-primary key column in the original data as a primary key column of a secondary index table, and constructing data of the secondary index table according to the primary key column of the secondary index table;

and writing the data of the secondary index table into the secondary index table.

The present application also provides a storage device storing a program of a construction method of a secondary index, the program being executed by a processor to perform the steps of:

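reading original data in the service node through a mapping task;

selecting a non-primary key column in the original data as a primary key column of a secondary index table, and constructing data of the secondary index table according to the primary key column of the secondary index table;

and writing the data of the secondary index table into the secondary index table.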

Compared with the prior art, the application has the following advantages:

The application provides a method, a device, an electronic device and a storage device for constructing a secondary index. Original data in a service node is read through a mapping task, so constructing the secondary index does not occupy the resources of the existing region server service.

In a preferred scheme of the application, a snapshot operation is performed on the original data in the service node to generate snapshot data of the original data. This freezes the original data in the service node, so real-time writes do not interfere with the index build.

In a preferred scheme of the application, a file comprising the data of the secondary index table is generated through the summarizing task and then loaded into the secondary index table. This avoids writing the data of the secondary index table directly into the secondary index table across the network, and reduces interference with the service node hosting the secondary index table.

Drawings

Fig. 1 is a flowchart of a method for constructing a secondary index according to a first embodiment of the present application.

Fig. 2 is a flowchart of an example of secondary index construction provided in the first embodiment of the present application.

Fig. 3 is a schematic diagram of a secondary index construction scenario provided in the first embodiment of the present application.

Fig. 4 is a schematic diagram of an apparatus for constructing a secondary index according to a second embodiment of the present application.

Fig. 5 is a schematic diagram of an electronic device according to a third embodiment of the present application.

Fig. 6 is a flowchart of a method for offline construction of a secondary index according to a fifth embodiment of the present application.

Detailed Description

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The present invention may be embodied in many other forms than those herein described, and those skilled in the art will readily appreciate that the present invention may be similarly embodied without departing from the spirit or essential characteristics thereof, and therefore the present invention is not limited to the specific embodiments disclosed below.

A first embodiment of the present application provides a method for constructing a secondary index, which is described in detail below with reference to fig. 1, 2, and 3.

As shown in fig. 1, in step S101, original data in a service node is read through a mapping task.

The service node is a physical node that stores the original data. The service nodes include one or more region server nodes; when there are several, each region server node stores a portion of the original data.

The original data in the service node includes original data of a distributed database.

The distributed database includes NoSQL (non-relational) databases such as HBase (an open-source distributed NoSQL storage system).

The mapping task may run in the service node and may include a local region server thread corresponding to a main table partition in the service node. The mapping task may be a Map task in MapReduce, the Hadoop offline execution engine, which processes massive data by divide and conquer. The main table is the source data table being indexed; in fig. 3, the table with the user id as its primary key is the main table. The main table includes at least one main table partition. Because its data volume is usually large, it is generally divided into multiple main table partitions.

If the main table contains multiple main table partitions, different partitions may be stored on different service nodes. As shown in fig. 2, the main table is distributed over 3 region server nodes, each storing a different main table partition. Each main table partition corresponds to one mapping task, and each mapping task contains one local region server thread. That thread corresponds to the main table partition on its region server node and reads that partition's original data.

If the main table contains multiple main table partitions, several of them may also be stored on the same service node. In that case one mapping task includes multiple local region server threads. Each thread corresponds to one main table partition in the service node and reads the original data of that partition.
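The partition-to-thread assignment described above can be sketched as follows. This is a minimal illustration under assumed inputs: plan_map_tasks, MapTaskPlan and the partition-location map are hypothetical names. The sketch follows the variant in which one mapping task is started per service node and runs one local region server thread for each main table partition hosted there.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MapTaskPlan:
    """One mapping task on one service node (hypothetical structure)."""
    node: str
    # Each listed partition is read by its own local region server thread.
    partitions: list = field(default_factory=list)


def plan_map_tasks(partition_locations):
    """Group main table partitions by hosting node: one mapping task per node.

    partition_locations: dict mapping partition id -> region server node name.
    """
    by_node = defaultdict(list)
    for partition, node in sorted(partition_locations.items()):
        by_node[node].append(partition)
    return [MapTaskPlan(node, parts) for node, parts in sorted(by_node.items())]


# Fig. 2 layout: three partitions on three different region servers.
print(plan_map_tasks({"p1": "rs1", "p2": "rs2", "p3": "rs3"}))
# Two partitions co-located on rs1: that mapping task runs two threads.
print(plan_map_tasks({"p1": "rs1", "p2": "rs1", "p3": "rs2"}))
```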

In this application, a mapping task is started for each service node and reads that node's data through a local region server thread, which avoids reading the source data through multiple layers. The local region server thread (LocalRegionServer) can read data in the same way as the original region server. It therefore largely reuses the region server's read path, and no separate HFile read path has to be built.

Furthermore, to avoid interference from real-time writes, a snapshot operation may be performed on the original data of the service node before it is read through the mapping task, generating snapshot data of the original data. A snapshot captures the state of the data without copying it, which makes it an efficient backup mechanism. As shown in fig. 2, a snapshot operation is performed on the main table partition of each service node storing the main table. This generates a main table snapshot (the snapshot data of the main table partition). Once the main table snapshot exists, the original data of the main table partition is frozen, and real-time writes cannot interfere with the construction of the secondary index.
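The no-copy property of a snapshot can be illustrated with a toy model. Region data lives in immutable files, and a snapshot records only the names of the current files. ToyRegion is hypothetical and greatly simplified; it is not how HBase implements snapshots internally. HBase snapshots do, however, likewise reference existing HFiles rather than copying them.

```python
class ToyRegion:
    """Toy region whose data lives in immutable files (hypothetical model, not HBase code)."""

    def __init__(self):
        self.files = {}   # file name -> tuple of (rowkey, value); never modified after flush
        self.live = []    # names of the files that currently make up the region
        self._next_id = 0

    def flush(self, cells):
        """Real-time writes end up in a new immutable file."""
        name = f"file-{self._next_id}"
        self._next_id += 1
        self.files[name] = tuple(sorted(cells))
        self.live.append(name)

    def snapshot(self):
        """Record references to the current files; no cell data is copied."""
        return tuple(self.live)

    def read(self, file_names):
        """Merge the cells of the given files (later files win on equal rowkeys)."""
        rows = {}
        for name in file_names:
            rows.update(self.files[name])
        return rows


region = ToyRegion()
region.flush([("100", "Zhang San")])
snap = region.snapshot()
region.flush([("101", "Li Si")])   # a write arriving after the snapshot
print(region.read(snap))           # {'100': 'Zhang San'}
print(region.read(region.live))    # {'100': 'Zhang San', '101': 'Li Si'}
```

Reading through the snapshot sees a frozen view, while the live region keeps accepting writes. This is the property the index build relies on.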

In this case, reading the original data in the service node through the mapping task comprises: reading the snapshot data of the original data in the service node through the mapping task.

After the snapshot data of the original data is generated, each local region server thread may read it from its corresponding service node, that is, the physical node on which the thread runs. As shown in fig. 2, region servers 1, 2 and 3 each correspond to one local region server thread. Each thread reads the snapshot data of the main table partition from its own region server node.

It should be noted that there are usually multiple main table partitions, each corresponding to one local region server thread. The threads can therefore read the data of the main table partitions concurrently, which ensures data-read throughput and shortens the time needed to construct the secondary index.
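The concurrent read described above can be sketched with Python's standard ThreadPoolExecutor, using one worker per main table partition. read_partition and read_all_partitions are hypothetical stand-ins for the local region server threads, and the snapshot contents are plain dicts. The sketch shows only the one-reader-per-partition structure, not real I/O.

```python
from concurrent.futures import ThreadPoolExecutor


def read_partition(snapshot_rows):
    """Stand-in for one local region server thread scanning one partition's snapshot."""
    return sorted(snapshot_rows.items())


def read_all_partitions(partition_snapshots):
    """Read every main table partition concurrently, one worker per partition.

    partition_snapshots: dict partition id -> {rowkey: row} (hypothetical input shape).
    """
    workers = max(1, len(partition_snapshots))  # ThreadPoolExecutor needs at least 1 worker
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pid: pool.submit(read_partition, rows)
                   for pid, rows in partition_snapshots.items()}
        return {pid: future.result() for pid, future in futures.items()}


snapshots = {
    "p1": {"100": {"name": "Zhang San"}},
    "p2": {"101": {"name": "Li Si"}},
    "p3": {"110": {"name": "Wang Wu"}},
}
print(read_all_partitions(snapshots))
```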

As shown in fig. 1, in step S102, a non-primary key column in the original data is selected as a primary key column of a secondary index table, and data of the secondary index table is constructed according to the primary key column of the secondary index table.

The secondary index table may refer to an index table constructed according to a metadata format of the secondary index. The secondary index can improve the efficiency of searching, adding and deleting data.

The data of the secondary index table is the index data stored in the secondary index table. In fig. 3, the data with "name" as the primary key column is the data of the constructed secondary index table. Its primary key column is "name", and its non-primary key columns are "user id" and "commodity".

In implementation, any non-primary key column in the original data can serve as the primary key column of the secondary index table, with the other columns serving as its non-primary key columns. Alternatively, indication information of the primary key column of the secondary index table can be obtained. According to that information, the non-primary key column matching the primary key column of the secondary index table is found in the original data and determined as the primary key column of the secondary index table. For example, if the column name in the indication information is "name", the "name" column is taken as the primary key column, as shown in fig. 3.
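The column-selection step can be sketched as below. It assumes the main table's column names and the indication information are available as plain strings; choose_index_key is a hypothetical helper. When indication information is given, it is checked against the main table's non-primary key columns. Otherwise the first non-primary key column is used, consistent with the statement that any non-primary key column may serve.

```python
def choose_index_key(main_columns, main_key, indicated=None):
    """Return the column that becomes the secondary index table's primary key.

    main_columns: column names of the main table; main_key: its primary key column;
    indicated: column named by the indication information, if any (hypothetical inputs).
    """
    candidates = [c for c in main_columns if c != main_key]
    if not candidates:
        raise ValueError("main table has no non-primary key column to index")
    if indicated is None:
        # Without indication information, any non-primary key column may be chosen.
        return candidates[0]
    if indicated not in candidates:
        raise ValueError(f"{indicated!r} is not a non-primary key column of the main table")
    return indicated


columns = ["user id", "name", "commodity"]
print(choose_index_key(columns, "user id", "name"))  # name
```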

When the snapshot data of the original data in the service node is read through the mapping task, selecting a non-primary key column in the original data as the primary key column of the secondary index table comprises: selecting a non-primary key column in the snapshot data of the original data as the primary key column of the secondary index table.

Constructing the data of the secondary index table according to its primary key column comprises:

constructing the following as the non-primary key columns of the secondary index table: the primary key column of the original data, together with the non-primary key columns other than the one used as the primary key column of the secondary index table. For example, as shown in fig. 3, if the column name in the indication information is "name", the "user id" column and the "commodity" column become the non-primary key columns.

As shown in fig. 3, the original data in the main table partition is shown in table 1:

user id	name	commodity
100	Zhang San	Television set
101	Li Si	Refrigerator
110	Wang Wu	Washing machine

Table 1

After step S102, the generated data of the secondary index table is as shown in Table 2:

name	user id	commodity
Zhang San	100	Television set
Li Si	101	Refrigerator
Wang Wu	110	Washing machine

Table 2
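The transformation from Table 1 to Table 2 can be sketched as follows, with rows modeled as Python dicts. build_index_rows is a hypothetical name, and this is a minimal sketch rather than the patented implementation. Like Table 2, it assumes name values are unique; if they were not, two index rows would share the same primary key. A common practice in secondary-index designs, not stated in this application, is to append the main table's primary key to the index row key.

```python
def build_index_rows(main_rows, main_key, index_key):
    """Turn main table rows into secondary index table rows.

    The chosen column becomes the index primary key; the main table's primary key
    and the remaining columns become the index table's non-primary key columns.
    """
    index_rows = []
    for row in main_rows:
        index_row = {index_key: row[index_key], main_key: row[main_key]}
        for column, value in row.items():
            if column not in (main_key, index_key):
                index_row[column] = value
        index_rows.append(index_row)
    return index_rows


table_1 = [
    {"user id": "100", "name": "Zhang San", "commodity": "Television set"},
    {"user id": "101", "name": "Li Si", "commodity": "Refrigerator"},
    {"user id": "110", "name": "Wang Wu", "commodity": "Washing machine"},
]
for row in build_index_rows(table_1, main_key="user id", index_key="name"):
    print(list(row.values()))
# ['Zhang San', '100', 'Television set']
# ['Li Si', '101', 'Refrigerator']
# ['Wang Wu', '110', 'Washing machine']
```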

As shown in fig. 1, in step S103, the data of the secondary index table is written into the secondary index table.

After the mapping task constructs the data of the secondary index table from the original data read in the service node, it can write that data directly into the secondary index table.

Because the secondary index table may include multiple secondary index table partitions, the data of the secondary index table may first be aggregated and sorted. This ensures that data belonging to the same key range is written into the same secondary index table partition.

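Writing the data of the secondary index table into the secondary index table comprises: aggregating and sorting the data of the secondary index table according to the data structure of the secondary index table to obtain aggregated and sorted data, and writing the aggregated and sorted data into the secondary index table.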

The data structure of the secondary index table refers to its metadata format. For example, in fig. 3 the columns of the secondary index table are ordered as follows: primary key column: name; non-primary key columns: user id, commodity.

For example, suppose the primary key of the secondary index table is numeric, with values in the range 1-100. A shuffle operation can aggregate the data whose primary key falls in 1-50 into one group, yielding the aggregated and sorted data of one secondary index table partition. It can likewise aggregate the data whose primary key falls in 51-100 into another group, yielding the data of the other partition. The two groups are then written into their corresponding secondary index table partitions, which ensures that data in the same range lands in the same partition.
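The range-based aggregation in this example can be sketched as a simple range partitioner followed by a per-partition sort. The function names and the bounds list are hypothetical. In Hadoop this role is usually played by a Partitioner in the shuffle, which the sketch only imitates in memory.

```python
from bisect import bisect_left
from collections import defaultdict


def range_partition(key, upper_bounds):
    """Index of the first partition whose inclusive upper bound is >= key."""
    i = bisect_left(upper_bounds, key)
    if i == len(upper_bounds):
        raise ValueError(f"key {key} exceeds the last partition bound")
    return i


def shuffle_and_sort(index_rows, key_column, upper_bounds):
    """Group index rows by target partition, then sort each group by index key."""
    groups = defaultdict(list)
    for row in index_rows:
        groups[range_partition(row[key_column], upper_bounds)].append(row)
    return {p: sorted(rows, key=lambda r: r[key_column]) for p, rows in sorted(groups.items())}


bounds = [50, 100]  # partition 0 holds keys 1-50, partition 1 holds keys 51-100
rows = [{"k": 73}, {"k": 5}, {"k": 50}, {"k": 51}]
print(shuffle_and_sort(rows, "k", bounds))
# {0: [{'k': 5}, {'k': 50}], 1: [{'k': 51}, {'k': 73}]}
```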

As another example, suppose the primary key of the secondary index table consists of Chinese characters, such as a person's name. The data whose primary key has a pinyin initial in the range a-h can be aggregated into one partition's data, and the data whose pinyin initial is in the range i-z into the other partition's data. Each group is then written into its corresponding secondary index table partition. Besides the pinyin initial, Chinese-character primary keys can also be aggregated by their strokes.

As shown in fig. 3, records whose primary key has a pinyin initial in the range a-w are aggregated together: the records named "Chen Er", "Li Si", "Liu Yi", "Sun Qi", "Wang Wu" and "Wu Jiu" in the original data go to one secondary index table partition. Records whose pinyin initial is in the range x-z, namely those named "Zhang San", "Zhao Liu", "Zheng Shi" and "Zhou Ba", go to the other secondary index table partition.

Aggregating and sorting the data of the secondary index table according to its data structure, to obtain the aggregated and sorted data, comprises: obtaining the feature requirement information of the primary key column of the secondary index table, and aggregating and sorting the data of the secondary index table according to that information.

The feature requirement information of the primary key column of the secondary index table is the rule, set according to the characteristics of the primary key column, by which the data of the secondary index table is aggregated and sorted. For example, if the primary key column is a name, the feature requirement information may be: records whose primary key has a pinyin initial in a-w are aggregated together, and records whose pinyin initial is in x-z are aggregated together.
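The fig. 3 rule can be expressed as a list of initial-letter ranges and applied as below. This sketch assumes the names have already been romanized to pinyin; converting Chinese characters to pinyin is outside its scope. FIG3_RULES, initial_partition and aggregate_and_sort are hypothetical names, and the stroke-based alternative mentioned above is not shown.

```python
from collections import defaultdict

# Feature requirement information from fig. 3, expressed as initial-letter ranges.
FIG3_RULES = [("a", "w"), ("x", "z")]


def initial_partition(name, rules):
    """Partition index for a romanized (pinyin) name, chosen by its first letter."""
    initial = name.strip()[0].lower()
    for i, (low, high) in enumerate(rules):
        if low <= initial <= high:
            return i
    raise ValueError(f"no partition rule covers {name!r}")


def aggregate_and_sort(names, rules):
    """Aggregate names into partitions by rule, then sort each partition."""
    groups = defaultdict(list)
    for name in names:
        groups[initial_partition(name, rules)].append(name)
    return {p: sorted(members) for p, members in sorted(groups.items())}


names = ["Zhang San", "Li Si", "Wang Wu", "Chen Er", "Zhao Liu", "Liu Yi",
         "Sun Qi", "Zhou Ba", "Wu Jiu", "Zheng Shi"]
print(aggregate_and_sort(names, FIG3_RULES))
# {0: ['Chen Er', 'Li Si', 'Liu Yi', 'Sun Qi', 'Wang Wu', 'Wu Jiu'], 1: ['Zhang San', 'Zhao Liu', 'Zheng Shi', 'Zhou Ba']}
```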

Further, writing the data of the secondary index table directly into the secondary index table would occupy resources of the service node hosting the secondary index table. To avoid this, the data may be sent to a summarizing task, which generates a file containing the data of the secondary index table; the file is then loaded into the secondary index table.

It should be noted that the number of summarizing tasks is determined by the number of secondary index table partitions and may equal it. The number of partitions can be set as required, and the secondary index table includes at least one partition. When the secondary index table has two or more partitions, each partition corresponds to one summarizing task. As shown in fig. 2, the secondary index table includes two partitions, each corresponding to one summarizing task.

The summarizing task may be a Reduce task in MapReduce and may include an index file writing thread (e.g., IndexHFileWriter). Generating the file comprising the data of the secondary index table through the summarizing task, and loading it into the secondary index table, comprises: generating the file through the index file writing thread in the summarizing task, and loading the file into the secondary index table.

If the non-relational database is HBase, the file comprising the data of the secondary index table is an HFile of the secondary index table. Generating it through the summarizing task means writing the constructed index data into that HFile through an IndexHFileWriter thread. The reduce task writes the index data into the file through IndexHFileWriter rather than through the native HBase write API. This avoids the excessively deep call stack the native API produces and reduces interference with the original service nodes. It also avoids writing the index data directly into the secondary index table across the network, which reduces interference with the service node hosting the secondary index table.
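The write-then-load flow can be sketched as below. This is a toy under stated assumptions: it writes a tab-separated text file instead of an HFile, and write_index_file, bulk_load and the directory layout are hypothetical. HBase's bulk load tooling moves completed HFiles into the table's regions; the sketch mimics that with os.replace, which is an atomic rename when source and destination are on the same filesystem.

```python
import os
import tempfile


def write_index_file(rows, key_column, out_dir, partition_id):
    """Summarizing-task side: sort one partition's rows by index key and write a file.

    A tab-separated text file stands in for an HFile (hypothetical format).
    """
    path = os.path.join(out_dir, f"part-{partition_id:05d}.tsv")
    with open(path, "w", encoding="utf-8") as f:
        for row in sorted(rows, key=lambda r: r[key_column]):
            others = [f"{c}={row[c]}" for c in sorted(row) if c != key_column]
            f.write("\t".join([str(row[key_column])] + others) + "\n")
    return path


def bulk_load(file_path, table_dir, partition_id):
    """Move a finished file into the index partition's directory in one step."""
    dest_dir = os.path.join(table_dir, f"partition-{partition_id}")
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, os.path.basename(file_path))
    os.replace(file_path, dest)  # atomic rename when both paths are on one filesystem
    return dest


with tempfile.TemporaryDirectory() as root:
    staging = os.path.join(root, "staging")
    os.makedirs(staging)
    partition_0 = [{"name": "Wang Wu", "user id": "110"}, {"name": "Li Si", "user id": "101"}]
    staged = write_index_file(partition_0, "name", staging, 0)
    loaded = bulk_load(staged, os.path.join(root, "index_table"), 0)
    staged_still_exists = os.path.exists(staged)
    with open(loaded, encoding="utf-8") as f:
        loaded_lines = f.read().splitlines()

print(staged_still_exists)  # False
print(loaded_lines)         # ['Li Si\tuser id=101', 'Wang Wu\tuser id=110']
```

Writing the whole sorted file offline and then moving it in a single step means the serving side never sees a half-written index partition.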

Because the file comprising the data of the secondary index table is generated offline by the summarizing task, it must then be loaded into the corresponding secondary index table partition.

An application scenario of the first embodiment of the present application is described below with reference to fig. 2 and 3. The main table includes three main table partitions stored on three region server nodes: region server1, region server2 and region server3. The main table uses "user id" as its primary key, and the secondary index table uses "name" as its primary key. Each of the three region server nodes starts an independent local region server thread (LocalRegionServer) to read the snapshot data of its main table partition. A shuffle then aggregates and sorts the data and passes it to two IndexFileWriter tasks, which generate HFiles containing the secondary index table data. Finally, the HFiles are loaded into the corresponding index table partitions (the secondary index table partitions), in which the data is keyed by name.
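The steps of this scenario can be put together in the following sketch, which runs the whole flow in memory on the three rows of Table 1. Placing one row in each main table partition is an assumption made for illustration. The sketch is a hypothetical single-process model of the map, shuffle and summarizing (reduce) stages, not Hadoop or HBase code; snapshotting and file loading are omitted.

```python
from collections import defaultdict

INDEX_RULES = [("a", "w"), ("x", "z")]  # fig. 3: two index partitions by pinyin initial


def map_task(partition_rows, main_key, index_key):
    """Map side: build one index row per main table row."""
    for row in partition_rows:
        index_row = {index_key: row[index_key], main_key: row[main_key]}
        index_row.update({c: v for c, v in row.items() if c not in (main_key, index_key)})
        yield index_row


def partition_of(index_value, rules):
    """Shuffle side: choose the target index partition from the key's first letter."""
    initial = index_value[0].lower()
    for i, (low, high) in enumerate(rules):
        if low <= initial <= high:
            return i
    raise ValueError(f"no index partition covers {index_value!r}")


def build_secondary_index(main_partitions, main_key, index_key, rules):
    shuffled = defaultdict(list)
    for rows in main_partitions:  # one map task per main table partition
        for index_row in map_task(rows, main_key, index_key):
            shuffled[partition_of(index_row[index_key], rules)].append(index_row)
    # One summarizing task per index partition sorts its rows by the index key.
    return [sorted(shuffled[p], key=lambda r: r[index_key]) for p in range(len(rules))]


main_partitions = [
    [{"user id": "100", "name": "Zhang San", "commodity": "Television set"}],
    [{"user id": "101", "name": "Li Si", "commodity": "Refrigerator"}],
    [{"user id": "110", "name": "Wang Wu", "commodity": "Washing machine"}],
]
index_partitions = build_secondary_index(main_partitions, "user id", "name", INDEX_RULES)
for p, rows in enumerate(index_partitions):
    print(p, [r["name"] for r in rows])
# 0 ['Li Si', 'Wang Wu']
# 1 ['Zhang San']
```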

This completes the detailed description of the method for constructing a secondary index provided in the first embodiment of the present application. In the first embodiment, a snapshot operation is performed on the main table partition of each service node storing the main table, freezing its original data so that real-time writes cannot interfere with the index build. Running a mapping task on each service node avoids occupying the resources of the region server. Multiple local region server threads read the partition data concurrently, ensuring read throughput and shortening the time needed to construct the secondary index. Because the summarizing task generates the file comprising the secondary index table data through the index file writing thread, writing through the native HBase API and its excessively deep call stack is avoided, reducing interference with the original service nodes.

Corresponding to the method for constructing the secondary index provided in the first embodiment of the present application, the second embodiment of the present application further provides a device for constructing the secondary index.

As shown in fig. 4, the device for constructing the secondary index includes:

an original data reading unit 401, configured to read original data in the service node through a mapping task;

a secondary index table data construction unit 402, configured to select a non-primary key column in the original data as a primary key column of a secondary index table, and construct data of the secondary index table according to the primary key column of the secondary index table;

an index table data writing unit 403, configured to write data of the secondary index table into the secondary index table.

Optionally, the mapping task includes a local region server thread corresponding to a main table partition in the service node;

the original data reading unit is specifically configured to: read, through the local region server thread in the mapping task, the original data of the main table partition in the service node that corresponds to that thread.

Optionally, the apparatus further includes: the original data snapshot unit is used for carrying out snapshot operation on the original data in the service node and generating snapshot data of the original data;

The original data reading unit is specifically configured to: reading snapshot data of the original data in the service node through the mapping task;

Optionally, the apparatus further includes: a primary key column indication information obtaining unit for obtaining indication information of a primary key column of the secondary index table;

Optionally, the secondary index table data construction unit is specifically configured to:

Optionally, the writing unit of the index table data is specifically configured to:

Optionally, the index table data writing unit is specifically configured to:

Optionally, the summarizing task includes an index file writing thread;

Optionally, the mapping task is a mapping task running in a service node.
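The form of the indication information obtained by the primary key column indication information obtaining unit is not specified further in this section. As a hypothetical illustration, it could be a small index definition naming the main table, the index table and the non-primary-key column to promote; `IndexSpec` and `validate_spec` below are invented for this sketch, not part of the described device.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class IndexSpec:
    """Hypothetical shape of the 'indication information' for one index."""

    main_table: str
    index_table: str
    index_column: str  # non-primary-key column that becomes the index primary key


def validate_spec(spec: IndexSpec, columns: FrozenSet[str], main_key: str) -> None:
    """Reject specs that name the main table's primary key or an unknown column."""
    if spec.index_column == main_key:
        raise ValueError("index column must not be the main table's primary key")
    if spec.index_column not in columns:
        raise ValueError(f"unknown column: {spec.index_column}")


spec = IndexSpec(main_table="user", index_table="user_by_city", index_column="city")
validate_spec(spec, frozenset({"city", "name"}), main_key="user_id")  # passes silently
```

Validating the spec up front means a misconfigured index fails before any mapping task is launched.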

It should be noted that, for the detailed description of the device for constructing the secondary index provided in the second embodiment of the present application, reference may be made to the related description of the first embodiment of the present application, which is not repeated here.

Corresponding to the method for constructing the secondary index provided in the first embodiment of the present application, the third embodiment of the present application further provides an electronic device.

As shown in fig. 5, the electronic device includes:

a processor 501; and

a memory 502, configured to store a program of the method for constructing a secondary index; after the device is powered on and the processor runs the program, the following steps are performed:

reading original data in the service node through a mapping task;

Optionally, the electronic device further performs the following steps: performing snapshot operation on the original data in the service node to generate snapshot data of the original data;

Optionally, the electronic device further performs the following step: obtaining indication information of a primary key column of a secondary index table;

Optionally, the summarizing task includes an index file writing thread;

Optionally, the mapping task is a mapping task running in a service node.

It should be noted that, for the detailed description of the electronic device provided in the third embodiment of the present application, reference may be made to the related description of the first embodiment of the present application, which is not repeated here.

Corresponding to the method for constructing a secondary index provided in the first embodiment of the present application, the fourth embodiment of the present application further provides a storage device that stores a program of the method for constructing a secondary index; when executed by a processor, the program performs the following steps:

reading original data in the service node through a mapping task;

It should be noted that, for the detailed description of the storage device provided in the fourth embodiment of the present application, reference may be made to the related description of the first embodiment of the present application, which is not repeated here.

A fifth embodiment of the present application provides an offline construction method of a secondary index, which is described below with reference to fig. 6.

As shown in fig. 6, in step S601, original offline data in a service node is read by a mapping task.

The original offline data in the service node includes original offline data of a distributed database.

The mapping task may be a mapping task running in the service node, and may include a local region service thread corresponding to a primary table partition in the service node. For example, the mapping task may be a Map task in MapReduce.
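To illustrate per-partition reading, the sketch below assumes that one node hosts several primary table partitions, models each partition as a Python dict, and uses a thread pool to stand in for the local region service threads. `read_partition` and `read_node` are hypothetical names; a real Map task would scan region data on the node rather than in-memory dicts.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Assumed model: a partition maps rowkey -> {column: value}.
Partition = Dict[str, Dict[str, str]]
Record = Tuple[str, str, Dict[str, str]]  # (partition_id, rowkey, columns)


def read_partition(partition_id: str, partition: Partition) -> List[Record]:
    """Stand-in for one local region service thread scanning its own partition."""
    return [(partition_id, rowkey, cols) for rowkey, cols in sorted(partition.items())]


def read_node(partitions: Dict[str, Partition], max_threads: int = 4) -> List[Record]:
    """Read all partitions hosted on one node, up to max_threads at a time."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        futures = [pool.submit(read_partition, pid, p) for pid, p in partitions.items()]
        records: List[Record] = []
        for future in futures:  # gather in submission order
            records.extend(future.result())
    return records


node = {
    "region-1": {"u1": {"city": "Hangzhou"}},
    "region-2": {"u3": {"city": "Beijing"}, "u2": {"city": "Hangzhou"}},
}
print([rowkey for _, rowkey, _ in read_node(node)])  # ['u1', 'u2', 'u3']
```

Gathering results in submission order keeps the output deterministic even though the partitions are scanned concurrently.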

As shown in fig. 6, in step S602, a non-primary key column in the original offline data is selected as a primary key column of a secondary index table, and the offline data of the secondary index table is constructed according to the primary key column of the secondary index table.
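A common way to turn a non-primary-key column into the primary key of an index table is to place the indexed value first and append the original rowkey, so that rows sharing the same value still get distinct index rowkeys. The sketch below assumes that layout; the `SEPARATOR` byte, the `main_rowkey` column, and the functions `build_index_rows` and `lookup` are hypothetical, and column values are assumed never to contain the separator.

```python
from typing import Dict, Iterator, List, Tuple

Row = Dict[str, str]  # assumed model: column name -> value

# Hypothetical delimiter between the indexed value and the original rowkey.
SEPARATOR = "\x00"


def build_index_rows(rows: Dict[str, Row], index_column: str) -> Iterator[Tuple[str, Row]]:
    """Turn main-table rows into (index_rowkey, index_row) pairs."""
    for rowkey, columns in rows.items():
        value = columns.get(index_column)
        if value is None:
            continue  # rows lacking the indexed column get no index entry
        # Indexed value first so a prefix scan finds it; the original rowkey is
        # appended so rows sharing the same value get distinct index rowkeys.
        yield value + SEPARATOR + rowkey, {"main_rowkey": rowkey}


def lookup(index: Dict[str, Row], value: str) -> List[str]:
    """Return main-table rowkeys whose indexed column equals value."""
    prefix = value + SEPARATOR
    return sorted(r["main_rowkey"] for k, r in index.items() if k.startswith(prefix))


rows = {
    "u1": {"city": "Hangzhou"},
    "u2": {"city": "Beijing"},
    "u3": {"city": "Hangzhou"},
    "u4": {"name": "x"},
}
index = dict(build_index_rows(rows, "city"))
print(lookup(index, "Hangzhou"))  # ['u1', 'u3']
```

A query on the indexed column then becomes a prefix scan of the index table followed by point reads on the main table, instead of a full-table scan.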

This step is implemented in the same way as step S102 of the first embodiment of the present application, except that the data of the secondary index table in step S102 is replaced by the offline data of the secondary index table; details are not repeated here.

As shown in fig. 6, in step S603, the offline data of the secondary index table is written into the secondary index table.

Writing the offline data of the secondary index table into the secondary index table comprises the following step:

generating, through a summary task, a file comprising the offline data of the secondary index table, and loading the file into the secondary index table.
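The "generate a file, then load it" pattern can be sketched as follows, assuming a plain tab-separated text file instead of an HBase HFile and a local directory standing in for the secondary index table. `write_index_file` and `load_into_table` are hypothetical; in HBase the analogous load step is a bulk load, which moves finished HFiles into the table's regions instead of issuing individual writes.

```python
import os
import tempfile
from typing import Iterable, Tuple


def write_index_file(cells: Iterable[Tuple[str, str]], staging_dir: str) -> str:
    """Write (index_rowkey, value) cells, sorted by key, to a new staging file."""
    fd, path = tempfile.mkstemp(dir=staging_dir, suffix=".idx")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        for key, value in sorted(cells):
            # Assumes keys and values contain no tabs or newlines.
            f.write(f"{key}\t{value}\n")
    return path


def load_into_table(path: str, table_dir: str) -> str:
    """'Load' a finished file by moving it into the table directory.

    os.replace is atomic when both paths are on the same filesystem, so a
    reader listing table_dir never sees a half-written file.
    """
    target = os.path.join(table_dir, os.path.basename(path))
    os.replace(path, target)
    return target
```

Writing to a staging location first and publishing with a single move keeps the index table's online write path idle during construction.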

The summary task may be a Reduce task in the MapReduce framework, and may include an index file writing thread (e.g., indexHFileWriter). Generating, through the summary task, the file comprising the offline data of the secondary index table and loading the file into the secondary index table comprises: generating the file comprising the offline data of the secondary index table through the index file writing thread in the summary task, and loading the file into the secondary index table.
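One way to read "an index file writing thread in a summary task" is as a producer/consumer split: the Reduce task sorts cells and feeds them to a dedicated writer thread through a bounded queue. The sketch below assumes that design; `IndexFileWriterThread` and `reduce_task` are hypothetical (the class is not the indexHFileWriter named above), and a Python list stands in for the HFile. The sort order (row, then column family, then qualifier) reflects the ordering HBase requires of cells written to an HFile.

```python
import queue
import threading
from typing import List, Tuple

Cell = Tuple[str, str, str, str]  # (rowkey, family, qualifier, value), assumed layout
_DONE = object()  # sentinel telling the writer thread to stop


class IndexFileWriterThread(threading.Thread):
    """Hypothetical writer thread that drains a queue of cells into an output list."""

    def __init__(self) -> None:
        super().__init__(daemon=True)
        self.inbox: "queue.Queue[object]" = queue.Queue(maxsize=1024)
        self.written: List[Cell] = []  # stands in for the index file

    def run(self) -> None:
        while True:
            item = self.inbox.get()
            if item is _DONE:
                break
            self.written.append(item)


def reduce_task(cells: List[Cell]) -> List[Cell]:
    """Sort cells by (row, family, qualifier) and hand them to the writer thread."""
    writer = IndexFileWriterThread()
    writer.start()
    for cell in sorted(cells, key=lambda c: (c[0], c[1], c[2])):
        writer.inbox.put(cell)  # blocks if the writer falls behind (bounded queue)
    writer.inbox.put(_DONE)
    writer.join()  # after join, reading writer.written is safe
    return writer.written


cells = [("b", "f", "q", "2"), ("a", "f", "q2", "1"), ("a", "f", "q1", "0")]
print(reduce_task(cells))
# [('a', 'f', 'q1', '0'), ('a', 'f', 'q2', '1'), ('b', 'f', 'q', '2')]
```

The bounded queue applies back-pressure, so a slow disk throttles the Reduce task instead of letting buffered cells grow without limit.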

According to the fifth embodiment of the present application, the offline data of the secondary index table can be sent to the summarizing task, which generates a file comprising that offline data; the file is then loaded into the secondary index table. The secondary index is thus constructed offline, which avoids occupying the resources of the service node where the secondary index table is located, as happens when the data of the secondary index table is written into the secondary index table online.
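The claims further state that the number of summary tasks can equal the number of partitions of the secondary index table (claim 8). One common way to achieve this, and the approach HBase's own bulk-load helper `HFileOutputFormat2.configureIncrementalLoad` takes, is a total-order partitioner driven by the index table's partition start keys. The sketch below shows only the routing logic; `partition_for`, `route` and the example split keys are hypothetical.

```python
import bisect
from typing import List


def partition_for(index_rowkey: str, split_keys: List[str]) -> int:
    """Return the index-table partition, and so the summary task, owning a rowkey.

    split_keys holds the sorted start keys of partitions 1..n-1; partition 0
    starts at the empty key. A start key is inclusive, hence bisect_right.
    """
    return bisect.bisect_right(split_keys, index_rowkey)


def route(keys: List[str], split_keys: List[str]) -> List[List[str]]:
    """Group keys into one bucket per partition: len(split_keys) + 1 buckets."""
    buckets: List[List[str]] = [[] for _ in range(len(split_keys) + 1)]
    for key in keys:
        buckets[partition_for(key, split_keys)].append(key)
    return buckets


print(route(["apple", "grape", "melon", "zebra", "pear"], ["g", "p"]))
# [['apple'], ['grape', 'melon'], ['zebra', 'pear']]
```

Because each summary task then receives only the keys of one index partition, each file it writes normally falls inside a single partition's key range and can be loaded without being split.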

Although the present application has been disclosed above by way of preferred embodiments, they are not intended to limit it. Any person skilled in the art may make variations and modifications without departing from the spirit and scope of the present invention; therefore, the scope of protection of the present invention shall be defined by the appended claims.

In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM), and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Claims

1. A method for constructing a secondary index, comprising:

reading original data in the service node through a mapping task;

writing the data of the secondary index table into the secondary index table;

wherein writing the data of the secondary index table into the secondary index table comprises: generating a file comprising the data of the secondary index table through an index file writing thread in a summary task, and loading the file into the secondary index table.

2. The method of claim 1, wherein the mapping task comprises a local region service thread corresponding to a primary table partition in a service node;

3. The method as recited in claim 1, further comprising: performing snapshot operation on the original data in the service node to generate snapshot data of the original data;

4. The method as recited in claim 1, further comprising: obtaining indication information of a primary key column of a secondary index table;

5. The method of claim 1, wherein constructing the data of the secondary index table from the primary key columns of the secondary index table comprises:

6. The method of claim 1, wherein writing the data of the secondary index table into the secondary index table comprises:

7. The method of claim 6, wherein aggregating and sorting the data of the secondary index table according to the data structure of the secondary index table to obtain the aggregated and sorted data of the secondary index table comprises:

8. The method of claim 1, wherein the number of summary tasks is the same as the number of partitions of the secondary index table.

9. The method of claim 1, wherein the mapping task is a mapping task running in a service node.

10. The method of claim 1, wherein the secondary index table is a secondary index table in a non-relational database.

11. A secondary index building apparatus, comprising:

an index table data writing unit, configured to write the data of the secondary index table into the secondary index table, wherein the writing comprises: generating a file comprising the data of the secondary index table through an index file writing thread in a summary task, and loading the file into the secondary index table.

12. An electronic device, comprising:

a processor; and

reading original data in the service node through a mapping task;

writing the data of the secondary index table into the secondary index table;

13. A storage device, characterized in that:

it stores a program of a method for constructing a secondary index, and the program, when executed by a processor, performs the following steps:

writing the data of the secondary index table into the secondary index table;

14. An offline construction method of a secondary index, comprising the steps of:

reading original offline data in the service node through a mapping task;

selecting a non-primary key column in the original offline data as a primary key column of a secondary index table, and constructing the offline data of the secondary index table according to the primary key column of the secondary index table;

writing the offline data of the secondary index table into the secondary index table;

wherein writing the offline data of the secondary index table into the secondary index table comprises: generating a file comprising the offline data of the secondary index table through an index file writing thread in a summary task, and loading the file into the secondary index table.