WO2015161550A1

WO2015161550A1 - Index management method and device, and computer storage medium

Info

Publication number: WO2015161550A1
Application number: PCT/CN2014/079517
Authority: WO
Inventors: 谢东; 喻红宇
Original assignee: 中兴通讯股份有限公司
Priority date: 2014-04-24
Filing date: 2014-06-09
Publication date: 2015-10-29
Also published as: CN105022743A

Abstract

Provided are an index management method and device, and a computer storage medium. The method comprises: after an index management command is received, acquiring corresponding index data from existing index data, and if the index data is not acquired, reading data from a database table and acquiring the corresponding index data through calculation and sorting; and conducting a management operation on the acquired index data according to the index management command.

Description

Method and device for managing an index, and computer storage medium

Technical field

The present invention relates to database technology, and more particularly to a method, apparatus, and computer storage medium for managing an index.

Background art

A database is a data processing facility developed to meet the needs of data processing. Database systems first appeared around 1960. In 1970, the concept of the relational model was proposed, and relational databases were built on that basis. With the development of information technology, data has penetrated into many industries and applications, and relational databases are now widely used. In a relational database, an index is a data structure that sorts the values of one or more columns of a database table, allowing the corresponding Structured Query Language (SQL) statements to execute faster. Indexes are defined by application developers and are commonly used in database development. Index maintenance is an important task, and the database system performs it automatically. However, data characteristics have changed greatly since the database concept was first introduced. Data sets whose structure is complex and whose volume is large are collectively referred to as big data. For such data, index maintenance becomes increasingly difficult and has become an important problem that urgently needs to be solved.

The index management methods of the related art have at least the following disadvantages:

A. The index maintenance process consumes a lot of system resources;

B. In the process of maintaining the index, the existing index data is not fully utilized.

Summary of the invention

Embodiments of the present invention provide a method, an apparatus, and a computer storage medium for managing an index, which can reduce system resource consumption.

The technical solution of the embodiment of the present invention is implemented as follows: An embodiment of the present invention provides a method for managing an index, including:

After a management index command is received, the corresponding index data is obtained from the existing index data; if it is not obtained, data is read from the database table, and the corresponding index data is obtained through calculation and sorting;

Performing a management operation on the acquired index data according to the management index command.

Preferably, the obtaining the corresponding index data from the existing index data includes: determining the index data of the new index by analyzing the calculation similarity of the multiple indexes to be merged and the range of the index data corresponding to the multiple indexes to be merged.

Preferably, the performing the management operation on the obtained index data according to the management index command includes:

The obtained index data is merged, and the repeated index data in the merged index data is eliminated, and the plurality of indexes to be merged are merged into one new index.

Preferably, the obtaining the corresponding index data from the existing index data includes: acquiring index data corresponding to the index to be split from the existing index data;

Splitting the index data corresponding to the index to be split into multiple parts according to the specified splitting method, and splitting the index to be split into multiple indexes; if the index data ranges of the split indexes overlap, the overlapping index data is copied.

Preferably, the management index command includes any one of the following:

Create index commands; modify index definition commands; insert data commands; update data commands; merge index commands; split index commands.

An embodiment of the present invention further provides an apparatus for managing an index, including:

The obtaining module is configured to: after receiving the management index command, obtain the corresponding index data from the existing index data, and if it is not obtained, read the data from the database table and obtain the corresponding index data through calculation and sorting;

The management module is configured to perform management operations on the obtained index data according to the management index command.

Preferably, the acquiring module is further configured to: after receiving the merge index command, determine the index data of the new index by analyzing the calculation similarity of the multiple indexes to be merged and the range of index data corresponding to the multiple indexes to be merged.

Preferably, the management module is further configured to merge the acquired index data, eliminate duplicate index data in the merged index data, and merge the multiple indexes into one new index.

Preferably, the acquiring module is further configured to: after receiving the split index command, obtain index data corresponding to the index to be split from the existing index data.

Preferably, the management module is further configured to divide the index data into a plurality of parts according to the specified splitting method, and split the index to be split into multiple indexes; if the index data ranges of the split indexes overlap, the overlapping index data is copied.

Preferably, the management index command received by the acquiring module includes any one of the following: a create index command, a modify index definition command, an insert data command, an update data command, a merge index command, and a split index command.

The embodiment of the invention further provides a computer storage medium, wherein the computer storage medium stores computer executable instructions, and the computer executable instructions are used to execute the method for managing the index described above.

In summary, the embodiments of the present invention solve the problems that, in the current database index management process, existing index data is not fully utilized and index maintenance consumes a large amount of system resources. Obtaining index data from existing index data is especially suitable for techniques that add selection conditions to index definition statements and for big data lifecycle management.

Brief description of the drawings

FIG. 1 is a flowchart of a method for managing an index according to an embodiment of the present invention;

FIG. 2 is a flowchart of a method for creating a new index according to an embodiment of the present invention;

FIG. 3 is a flowchart of a method for modifying an index definition according to an embodiment of the present invention;

FIG. 4 is a flowchart of a method for managing index data when data is inserted according to an embodiment of the present invention;

FIG. 5 is a flowchart of a method for managing index data when data is updated according to an embodiment of the present invention;

FIG. 6 is a flowchart of a method for merging multiple indexes according to an embodiment of the present invention;

FIG. 7 is a flowchart of a method for splitting an index into multiple indexes according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of an apparatus for managing an index according to an embodiment of the present invention.

Detailed description

In order to make the objects, the technical solutions and the advantages of the present invention more comprehensible, the embodiments of the present invention will be described in detail below. It should be noted that, in the case of no conflict, the features in the embodiments and the embodiments in the present application may be arbitrarily combined with each other.

In the process of implementing the present invention, the inventors found that, in the database index management process of the related art, the index data is obtained by calculation from the table data. The index data is maintained as follows:

First, the database begins to maintain index data. This action may be triggered by the user sending a create command, or it may be caused by a user addition, deletion, or modification.

Then, according to the definition of the index, the database reads the corresponding data from the table and calculates the index data, which includes pointers to the indexed column values and is sorted in the specified order.

Finally, the calculated index data is written to the index in the order specified by the index definition. For an update operation, the old index data must also be deleted.
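As an illustrative sketch only, the related-art maintenance steps above can be modeled as follows. The in-memory table, the use of a row's list position as its pointer, and the name build_index_from_table are assumptions made for this example and do not come from the source.

```python
def build_index_from_table(table, column):
    """Related-art style: read every row, compute the entries, then sort them."""
    # Each index entry is (indexed column value, row pointer).
    entries = [(row[column], pointer) for pointer, row in enumerate(table)]
    entries.sort()  # ascending order stands in for "the specified order"
    return entries


t_log = [
    {"callno": "138", "calltime": "20140103"},
    {"callno": "135", "calltime": "20140101"},
    {"callno": "139", "calltime": "20140102"},
]
callno_index = build_index_from_table(t_log, "callno")
```

Every call reads the whole table and sorts all entries, which is the cost the embodiments aim to avoid when suitable index data already exists.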

A. The index maintenance process consumes a lot of system resources;

In the big data era, index maintenance becomes increasingly difficult as table data keeps growing and indexes become larger. The index maintenance process consumes a large amount of system resources: the stage of reading table data mainly consumes system input/output (I/O) resources, and the stages of calculating and sorting index data mainly consume central processing unit (CPU) resources. In some large systems, the amount of table data may reach the order of TB, PB, or even ZB, and creating a new index takes a long time.

B. In the process of maintaining the index, the existing index data is not fully utilized;

Indexes are created based on the actual needs of the application. Typically, there are multiple indexes on the same table. Some indexes may have the same or similar index data, and such data should be shared among them during index maintenance. The relational database has a solid mathematical foundation, and index data is very reliable: unless extreme conditions occur, such as hardware damage caused by a natural disaster, index data is unlikely to be corrupted. In the related art, however, the database does not make full use of existing index data when managing indexes; instead, it recalculates the index data from the table data every time, so data of the same field is calculated repeatedly.

In the database index management process, the embodiment of the present invention proceeds in two stages: first, the index data is looked up in the existing index data; if it is not found, data is read from the table (that is, the database table), and the index data is obtained through calculation and sorting. With the technique of adding a selection condition to the index definition statement (for example, where the syntax for creating an index is create index idx_t_log_1 on t_log (callno), the syntax with an added selection condition can be written as create index idx_t_log_1 on t_log (callno) where calltime > '20140101000000') and with big data lifecycle management, indexes become very flexible, and database application developers may create a rich variety of indexes to meet actual needs, so managing index data is especially important. In-depth research into the shortcomings of the related art shows that, if index data is obtained from existing index data during index management, the result is obtained quickly and system resource consumption is reduced.
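As an illustrative sketch only, suppose an existing index already holds sorted (calltime, row pointer) entries for the whole table, and a new index is defined on the same column with the selection condition calltime > '20140101000000'. The entries of the new index can then be sliced out of the existing index instead of being recomputed from the table. The function names, the list-based index representation, and the choice of calltime as the indexed column are assumptions made for this example.

```python
import bisect


def index_from_table(table, column, lower_bound):
    """Fallback stage: scan the table, apply the condition, and sort."""
    entries = [(row[column], ptr) for ptr, row in enumerate(table)
               if row[column] > lower_bound]
    entries.sort()
    return entries


def index_from_existing(existing, lower_bound):
    """First stage: slice a sorted existing index; no table read, no sort."""
    # The sentinel sorts after every entry whose key equals lower_bound,
    # so the slice starts at the first key strictly greater than the bound.
    start = bisect.bisect_right(existing, (lower_bound, float("inf")))
    return existing[start:]


t_log = [
    {"callno": "138", "calltime": "20131231235959"},
    {"callno": "135", "calltime": "20140102080000"},
    {"callno": "139", "calltime": "20140101000000"},
    {"callno": "136", "calltime": "20140105120000"},
]
existing_index = sorted((row["calltime"], ptr) for ptr, row in enumerate(t_log))
bound = "20140101000000"
reused = index_from_existing(existing_index, bound)
recomputed = index_from_table(t_log, "calltime", bound)
```

Both routes yield the same entries; the first needs only a binary search over data that is already sorted.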

Several terms involved in the embodiments of the present invention are explained as follows:

A database is a warehouse that organizes, stores, and manages data according to a data structure. There are many types of databases, from the simplest tables storing various data to large database systems capable of massive data storage. A relational database is a database built on the relational model, which processes data by means of mathematical concepts and methods such as set algebra. In 1970, IBM researcher Dr. Edgar Frank Codd proposed the concept of the relational model and laid its theoretical foundation. The relational database has a solid mathematical foundation and, with the development of information technology and the market, is widely used in many industries.

Big data contains not only "massive data" but also data of complex types. Big data covers all data sets, including transactional and interactive data sets, whose size or complexity exceeds the ability of common technologies to capture, manage, and process them at reasonable cost and within a reasonable time. The big data concept is essentially about making effective use of massive data, where both the data volume and the transfer speed are very high.

Index: in a relational database, an index is a data structure that sorts the values of one or more columns in a database table. The index provides pointers to these column values, sorted in the specified order. An index can make the corresponding SQL statements execute faster. The role of an index is similar to a book's table of contents: the required content can be found quickly from the page number listed there. After the index is defined, its maintenance is done automatically by the database system. Common index maintenance tasks include creating new indexes, updating index data, and deleting indexes.
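As an illustrative sketch only, an index in the sense defined above can be modeled as a sorted list of (column value, row pointer) pairs that is searched by binary search. The list representation and the name lookup are assumptions made for this example; real database systems typically use tree or hash structures.

```python
import bisect

# Sorted (column value, row pointer) pairs for one indexed column.
callno_index = [("135", 1), ("138", 0), ("138", 4), ("139", 2)]


def lookup(index, value):
    """Return the row pointers whose indexed value equals `value`."""
    # (value,) sorts before every (value, pointer) pair, so this finds
    # the first entry for `value` without scanning the whole index.
    pos = bisect.bisect_left(index, (value,))
    pointers = []
    while pos < len(index) and index[pos][0] == value:
        pointers.append(index[pos][1])
        pos += 1
    return pointers
```

For example, lookup(callno_index, "138") returns the pointers 0 and 4, playing the role of the page numbers in a table of contents.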

FIG. 1 is a flowchart of a method for managing an index according to an embodiment of the present invention. As shown in FIG. 1, the method in this embodiment includes the following steps:

S11. After receiving the management index command, obtain the corresponding index data from the existing index data. If it is not obtained, read the data from the database table and obtain the index data through calculation and sorting.

S12. Perform a management operation on the obtained index data according to the management index command.

The embodiments of the present invention have the following technical effects:

Index maintenance consumes fewer system resources: the embodiment of the present invention obtains index data from existing index data. Compared with the related-art method of calculating index data from the table data, it consumes less I/O and CPU, uses fewer system resources, and shortens index maintenance time.

SQL statements execute more efficiently: in practice, a table often has multiple indexes. When an SQL statement that adds or modifies data is executed on a table, the index data must be added or modified as well. If multiple indexes require the same index data, the related art repeats the calculation and sorting of the same index data for each index, for example each time a required index is created. With the method of the embodiment of the present invention, the calculation and sorting are performed only once, and the other indexes obtain the data by copying. The same SQL statement consumes less CPU and I/O and executes more efficiently.
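As an illustrative sketch only, steps S11 and S12 and the "calculate once, then copy" effect can be modeled with a toy database object. The class Database, its table_scans counter, and the single-column index definitions are assumptions made for this example.

```python
class Database:
    """Toy model: one table plus named indexes, each defined by one column."""

    def __init__(self, table):
        self.table = table
        self.indexes = {}      # index name -> (column, sorted entries)
        self.table_scans = 0   # how many times the table had to be read

    def acquire(self, column):
        """S11: reuse existing index data if possible, else compute it."""
        for existing_column, entries in self.indexes.values():
            if existing_column == column:
                return list(entries)  # obtained by copying
        self.table_scans += 1
        return sorted((row[column], ptr) for ptr, row in enumerate(self.table))

    def create_index(self, name, column):
        """S12: perform the management operation on the acquired data."""
        self.indexes[name] = (column, self.acquire(column))


db = Database([{"callno": "138"}, {"callno": "135"}])
db.create_index("idx_a", "callno")
db.create_index("idx_b", "callno")
```

Only the first create_index call reads and sorts the table; the second obtains its data from idx_a.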

FIG. 2 is a flowchart of creating a new index in an embodiment of the present invention. The technical solution adopted by the embodiment of the present invention is: determining whether an existing index includes the required index data, and preferentially obtaining the required data from the existing index data. As shown in FIG. 2, the process of creating a new index includes the following steps:

Step S110: The database receives a create index command issued by the user.

The create index command can include the following information: index type, index name, table name, field names, and so on. If the technique of adding a selection condition to the index definition statement is used, the command also includes a description of the selection condition, which may be a where condition.

Step S120: Analyze whether it is necessary to recalculate and sort, if necessary, execute step S130; otherwise, execute step S140.

The database analyzes the current status, including: which indexes exist on the table, which data range each index covers, and which index data range needs to be created. It then determines whether existing index data can be used. If so, recalculation and sorting are not required, and step S140 is performed; if not, step S130 is performed to recalculate and sort.

Step S130, calculating index data.

Read the table data, calculate the sort, and get the index data.

Step S140, finding the pointer position on the index and writing the index data.

Step S150: Determine whether processing is finished. If there is still data to be processed, return to step S120 and process the remaining data until processing is complete.

Step S160, ending.
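As an illustrative sketch only, the loop of steps S120 to S150 can be modeled by processing the key range of the new index segment by segment. The division into ascending, disjoint [lo, hi) segments, the covers helper, and the list-based indexes are assumptions made for this example.

```python
import bisect


def covers(index_range, segment):
    """True if the index's key range fully contains the segment [lo, hi)."""
    return index_range[0] <= segment[0] and segment[1] <= index_range[1]


def create_index(table, column, segments, existing_indexes):
    """existing_indexes: list of ((lo, hi), sorted entries) on the same column.

    `segments` must be ascending and disjoint so the result stays sorted.
    """
    new_entries, reused = [], 0
    for lo, hi in segments:                              # S150: until done
        source = next((entries for rng, entries in existing_indexes
                       if covers(rng, (lo, hi))), None)  # S120: analyze
        if source is not None:
            start = bisect.bisect_left(source, (lo,))
            end = bisect.bisect_left(source, (hi,))
            part = source[start:end]                     # reuse index data
            reused += 1
        else:                                            # S130: read and sort
            part = sorted((row[column], ptr)
                          for ptr, row in enumerate(table)
                          if lo <= row[column] < hi)
        new_entries.extend(part)                         # S140: write
    return new_entries, reused


table = [{"day": d} for d in [5, 12, 1, 25, 18, 9]]
existing = [((0, 10), [(1, 2), (5, 0), (9, 5)])]
new_entries, reused = create_index(table, "day", [(0, 10), (10, 30)], existing)
```

Here the segment [0, 10) is served from the existing index, and only the segment [10, 30) requires a table read and a sort.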

FIG. 3 is a flowchart of an implementation of modifying an index definition according to an embodiment of the present invention. The technical solution adopted by the embodiment of the present invention is: determining whether an existing index includes required index data, and if so, preferentially obtaining data required from existing index data. As shown in Figure 3, the process of modifying the index definition includes the following steps:

Step S210: The database receives the modification command of the index issued by the user, and needs to maintain the index data of the index.

Modifying the index definition means, for example, modifying the data range that an index needs to cover.

Step S220: Analyze whether it is necessary to recalculate and sort; if necessary, execute step S230; otherwise, execute step S240.

The database analyzes the current status, including: which indexes exist on the table (including the index whose definition is being modified), which data range each index covers, and which index data range needs to be created. It then determines whether existing index data can be used. If so, recalculation and sorting are not required, and step S240 is performed; if not, step S230 is performed to recalculate and sort.

Step S230: Calculate index data.

Read the table data, calculate the sort, and get the index data.

Step S240, find the location, and update the index.

Delete all index data of the modified index, find the new pointer position on the index, and write the new index data.

Step S250: Determine whether there is still data to be processed. If so, return to step S220 and process the remaining data until processing is complete; otherwise, execute step S260.

Step S260, ending.

FIG. 4 is a flowchart of managing index data when data is inserted, in an embodiment of the present invention. The technical solution adopted by the embodiment of the present invention is as follows: if multiple indexes require the same index data (part of the index data of multiple indexes may be the same), the calculation and sorting are performed only once, and the other indexes do not need to recalculate and re-sort. As shown in FIG. 4, the process of managing index data when data is inserted includes the following steps:

Step S310: The database receives the insert data command issued by the user, and the index data of the table needs to be maintained.

Step S320: Analyze whether it is necessary to recalculate and sort, if necessary, execute step S330; otherwise, execute step S340.

For an index whose index data needs to be maintained, analyze the current situation to determine whether the required index data has already been calculated and sorted. If yes, the index data can be used directly, and step S340 is performed; if not, step S330 is performed to calculate and sort.

Step S330, calculating index data.

According to the index definition, the current inserted data is calculated and sorted to obtain index data.

Step S340: Find a pointer position on the index, and write index data.

Step S350: Determine whether another index on the table still needs to be maintained. If yes, return to step S320; if not, proceed to step S360.

Step S360, ending.
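As an illustrative sketch only, steps S320 to S350 can be modeled by caching the index entry computed for each index definition, so that indexes sharing a definition reuse it. The function insert_row, the representation of a definition as a tuple of column names, and the list-based indexes are assumptions made for this example.

```python
import bisect


def insert_row(table, indexes, row):
    """indexes: index name -> (column tuple, sorted entries).

    Returns the number of index entries that actually had to be calculated.
    """
    table.append(row)
    pointer = len(table) - 1
    computed = {}                                   # definition -> entry
    for columns, entries in indexes.values():       # S350: next index
        if columns not in computed:                 # S320: calculated yet?
            key = tuple(row[c] for c in columns)    # S330: calculate
            computed[columns] = (key, pointer)
        bisect.insort(entries, computed[columns])   # S340: find position, write
    return len(computed)


table = [{"callno": "138", "area": "021"}]
indexes = {
    "idx_a": (("callno",), [(("138",), 0)]),
    "idx_b": (("callno",), [(("138",), 0)]),
    "idx_c": (("area", "callno"), [(("021", "138"), 0)]),
}
calculations = insert_row(table, indexes, {"callno": "135", "area": "010"})
```

Three indexes are maintained, but only two entries are calculated, because idx_a and idx_b share one definition.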

FIG. 5 is a flowchart of managing index data when data is updated, in an embodiment of the present invention. The technical solution adopted by the embodiment of the present invention is as follows: if multiple indexes require the same index data, the calculation and sorting are performed only once, and the other indexes do not need to recalculate and re-sort. As shown in FIG. 5, the process of managing index data when data is updated includes the following steps:

Step S410: The database receives the update data command sent by the user, and needs to maintain the index data corresponding to the table.

Step S420, analyzing whether it is necessary to recalculate and sort, if necessary, executing step S430; otherwise, executing step S440.

For an index whose index data needs to be maintained, analyze the current situation to determine whether the required index data has already been calculated and sorted. If yes, the index data can be used directly, and step S440 is performed; if not, step S430 is performed to calculate and sort.

Step S430: Calculate the currently updated data according to the index definition to obtain new index data, and then go to step S440.

Step S440, find the location, and write the index.

Find the pointer position on the index, delete the index data (that is, the old index data) at the pointer position, and write the new index data.

If the index data has already been sorted and is consistent with the existing entry, it may be deleted and rewritten, or the delete and rewrite may be skipped.

Step S450: Determine whether another index on the table still needs to be maintained. If yes, return to step S420; if not, execute step S460.

Step S460, ending.
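As an illustrative sketch only, the update flow can be modeled as deleting the old index entry and writing the new one, and skipping both when the entry is unchanged. The function update_row, the single-column index definitions, and the list-based indexes are assumptions made for this example.

```python
import bisect


def update_row(table, indexes, pointer, new_row):
    """indexes: index name -> (column, sorted (key, pointer) entries).

    Returns the number of indexes whose data was actually rewritten.
    """
    old_row = table[pointer]
    table[pointer] = new_row
    rewritten = 0
    for column, entries in indexes.values():          # S450: next index
        old_entry = (old_row[column], pointer)
        new_entry = (new_row[column], pointer)        # S430: new index data
        if old_entry == new_entry:
            continue                                  # consistent: skip rewrite
        pos = bisect.bisect_left(entries, old_entry)  # S440: find position
        del entries[pos]           # delete old data (assumed to be present)
        bisect.insort(entries, new_entry)             # write new index data
        rewritten += 1
    return rewritten


table = [{"callno": "138", "area": "021"}, {"callno": "135", "area": "010"}]
indexes = {
    "idx_callno": ("callno", [("135", 1), ("138", 0)]),
    "idx_area": ("area", [("010", 1), ("021", 0)]),
}
rewritten = update_row(table, indexes, 0, {"callno": "131", "area": "021"})
```

Only idx_callno is rewritten; idx_area is left untouched because the updated row keeps the same area value.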

FIG. 6 is a flowchart of merging multiple indexes in an embodiment of the present invention. The technical solution adopted by the embodiment of the present invention is as follows: the index data of the new index is obtained by analyzing the calculation similarity of the multiple indexes to be merged and the ranges of the corresponding index data; the obtained index data is then merged, the duplicate index data is removed, and the multiple indexes are merged into one new index. As shown in FIG. 6, the process of merging multiple indexes includes the following steps:

Step S510: The database receives a merge index command issued by the user, that is, a command to merge multiple indexes into one index. The index data of these indexes should have the same or similar calculation methods.

For example, the indexes include the same field, and their calculation methods are the same or similar.

Step S520: Combine the index data.

Since the indexes are ordered, you can combine multiple indexes into one index in the specified order.

For the case where indexes are stored as linked lists, the merge may be performed by modifying the linked list pointers to connect the multiple indexes end to end; then, by analyzing the data range of each index, the duplicate index data is removed.

Step S530, updating the system table.

Modify the database data dictionary, delete the information of the previous multiple indexes, and insert the information of the new index.

Step S540, ending.
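As an illustrative sketch only, the merge of step S520 can be modeled as a k-way merge of already sorted entry lists followed by removal of duplicates. This example uses Python lists and heapq.merge rather than the linked list pointer manipulation mentioned in the text, and the name merge_indexes and the sample data are assumptions.

```python
import heapq


def merge_indexes(*indexes):
    """Merge sorted (key, pointer) lists and drop duplicate entries."""
    merged = []
    for entry in heapq.merge(*indexes):       # inputs are already sorted
        if not merged or merged[-1] != entry:
            merged.append(entry)              # keep one copy of each entry
    return merged


idx_first = [(101, 0), (115, 3), (210, 1)]
idx_second = [(210, 1), (305, 2)]
new_index = merge_indexes(idx_first, idx_second)
```

Because each input is already ordered, no sort over the combined data is needed, and the entry (210, 1) that lies in both ranges appears once in the result.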

FIG. 7 is a flowchart of splitting an index into multiple indexes in an embodiment of the present invention. The technical solution adopted by the embodiment of the present invention is: index data corresponding to the index to be split is obtained from the existing index data; the index data is then divided into multiple parts according to the specified splitting method, and the index to be split is split into multiple indexes. If the index data ranges of the split indexes overlap, the overlapping index data is copied. As shown in FIG. 7, the process of splitting an index into multiple indexes includes the following steps:

Step S610: The database receives a split index command issued by the user, that is, a command to split one index into multiple indexes.

The index can be split by data range (for example, on a time field). The sub-indexes keep the original calculation method, and their index data ranges may be mutually exclusive.

Step S620, splitting the index data.

Since the index is ordered, it only needs to be traversed once in the specified order. For the case where the index is stored as a linked list, the split may be performed by modifying the linked list pointers to break the linked list; if the sub-index ranges overlap, the overlapping index data needs to be copied once.

Step S630, updating the system table.

Modify the database data dictionary, delete the previous index information, and insert the information of the split indexes.

Step S640, ending.
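As an illustrative sketch only, the split of step S620 can be modeled by cutting a sorted entry list at the boundaries of each requested key range. This example locates the boundaries by binary search on a Python list rather than by traversing and breaking a linked list, and the name split_index and the sample ranges are assumptions.

```python
import bisect


def split_index(entries, ranges):
    """Split sorted (key, pointer) entries into one sub-index per [lo, hi)."""
    sub_indexes = []
    for lo, hi in ranges:
        start = bisect.bisect_left(entries, (lo,))
        end = bisect.bisect_left(entries, (hi,))
        sub_indexes.append(entries[start:end])  # slicing copies the entries
    return sub_indexes


index = [(101, 0), (115, 3), (210, 1), (305, 2)]
first, second = split_index(index, [(100, 250), (200, 400)])
```

The two ranges overlap on [200, 250), so the entry (210, 1) is copied into both sub-indexes, matching the rule that overlapping index data is copied.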

FIG. 8 is a schematic diagram of an apparatus for managing an index according to an embodiment of the present invention. The apparatus may run the database as described above. As shown in FIG. 8, the apparatus for managing an index includes:

The obtaining module 81 is configured to: after receiving the management index command, obtain the corresponding index data from the existing index data, and if not obtained, read the data from the database table, and obtain the corresponding index data by calculating the sorting;

The management module 82 is configured to perform a management operation on the obtained index data according to the management index command.

As an implementation manner, the acquiring module 81 may be further configured to: after receiving the merge index command, analyze the calculation similarity of the multiple indexes to be merged and the range of the index data corresponding to the multiple indexes to be merged, to obtain the index data of the new index;

The management module 82 may be further configured to merge the acquired index data, cull the duplicate index data in the merged index data, and merge the multiple indexes into one new index.

As an implementation manner, the acquiring module 81 may be configured to: after receiving the split index command, obtain index data corresponding to the index to be split from the existing index data;

The management module 82 may be configured to divide the index data into a plurality of parts according to the specified splitting method, and split the index to be split into multiple indexes; if the index data ranges of the split indexes overlap, the overlapping index data is copied.

The management index command received by the obtaining module 81 may include any one of the following: an index creation command, a modification index definition command, an insert data command, an update data command, a merge index command, and a split index command.

In an actual application, the obtaining module 81 and the management module 82 may be implemented by a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP), or a field programmable gate array (FPGA) in the device that manages the index.

The embodiment of the invention further describes a computer storage medium, wherein the computer storage medium stores computer executable instructions, and the computer executable instructions are configured to perform the method for managing an index described above.

Those skilled in the art will appreciate that embodiments of the invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention can take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Moreover, the invention can take the form of a computer program product embodied on one or more computer usable storage media (including but not limited to disk storage and optical storage) in which computer usable program code is embodied.

The present invention has been described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

The computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising an instruction device, and the instruction device implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions can also be loaded onto a computer or other programmable data processing device such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several improvements and refinements without departing from the principles of the present invention, and these improvements and refinements should also be considered to fall within the scope of protection of the present invention.

Claims

1. A method of managing indexes, including:

After receiving the management index command, the corresponding index data is obtained from the existing index data. If it is not obtained, the data is read from the database table, and the corresponding index data is obtained after calculation and sorting;

Perform management operations on the acquired index data according to the management index command.

2. The method of claim 1, wherein obtaining corresponding index data from existing index data includes:

The index data of the new index is determined by analyzing the calculated similarities of the multiple indexes to be merged and the range of index data corresponding to the multiple indexes to be merged.

3. The method of claim 2, wherein the management operation on the acquired index data according to the management index command includes:

Merge the obtained index data, remove duplicate index data from the merged index data, and merge the multiple indexes to be merged into a new index.

4. The method of claim 1, wherein obtaining corresponding index data from existing index data includes:

Obtain the index data corresponding to the index to be split from the existing index data.

5. The method of claim 4, wherein the management operation on the acquired index data according to the management index command includes:

Divide the index data corresponding to the index to be split into multiple parts according to the specified splitting method, and split the index to be split into multiple indexes. If the index data ranges of the split indexes overlap, the overlapping index data is copied.

6. The method according to any one of claims 1 to 5, wherein the management index command includes any one of the following:

Create index command; Modify index definition command; Insert data command; Update data command; Merge index command; Split index command.

7. A device for managing indexes, including:

The acquisition module is configured to obtain the corresponding index data from the existing index data after receiving the management index command. If it is not obtained, the data is read from the database table, and the corresponding index data is obtained after sorting by calculation;

A management module configured to perform management operations on the acquired index data according to the management index command.

8. The device of claim 7, wherein,

The acquisition module is also configured to, after receiving the merge index command, determine the index data of the new index by analyzing the calculated similarities of the multiple indexes to be merged and the range of index data corresponding to the multiple indexes to be merged.

9. The device of claim 8, wherein,

The management module is also configured to merge the acquired index data, eliminate duplicate index data from the merged index data, and merge the multiple indexes into a new index.

10. The device of claim 7, wherein,

The acquisition module is also configured to acquire the index data corresponding to the index to be split from the existing index data after receiving the split index command.

11. The device of claim 10, wherein,

The management module is also configured to divide the index data into multiple parts according to the specified splitting method, and split the index to be split into multiple indexes. If the index data ranges of the split indexes overlap, the overlapping index data is copied.

12. The device according to any one of claims 7 to 11, wherein,

The management index commands received by the acquisition module include any one of the following: create index command, modify index definition command, insert data command, update data command, merge index command and split index command.

13. A computer storage medium, wherein the computer storage medium stores computer executable instructions, and the computer executable instructions are used to execute the method of managing an index according to any one of claims 1 to 6.