CN111831622A - Data index generation method and device, electronic equipment and readable storage medium - Google Patents

Info

Publication number
CN111831622A
Authority
CN
China
Prior art keywords
data
index
file
files
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010244790.3A
Other languages
Chinese (zh)
Inventor
赵锐
余汶龙
李鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Didi Infinity Technology and Development Co Ltd
Original Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Didi Infinity Technology and Development Co Ltd
Priority to CN202010244790.3A
Publication of CN111831622A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21: Design, administration or maintenance of databases
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22: Indexing; Data structures therefor; Storage structures
    • G06F16/2228: Indexing structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a data index generation method, a data index generation device, electronic equipment and a readable storage medium.

Description

Data index generation method and device, electronic equipment and readable storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data index generation method and apparatus, an electronic device, and a readable storage medium.
Background
A database can build indexes on its data to support fast queries. When a database system serves a production environment, new indexes often have to be added online. The traditional scheme traverses the data in all files, generates the data for which the index is to be built, and finally rewrites that data into the database to build the index. The processing time of this approach is long and hard to control, and the disk load caused by traversing and rewriting the data further threatens the stability of online services.
Disclosure of Invention
In view of this, embodiments of the present invention provide a data index generation method, an apparatus, an electronic device, and a readable storage medium, so as to improve the speed of establishing a data index.
In a first aspect, an embodiment of the present invention provides a data index generation method, where the method includes:
acquiring a plurality of data files in a database stored based on an LSM storage engine;
traversing and analyzing the plurality of data files in parallel to generate analysis data, wherein the analysis data comprises index data of each data in the data files;
determining at least one index file according to the analysis data, wherein index data in the index file are ordered;
and loading each index file to a corresponding database.
Optionally, the data file and the index file are stored in a physically isolated manner.
Optionally, traversing and parsing the plurality of data files in parallel, and generating the parsed data includes:
traversing a plurality of data files in parallel, and determining an index column in each data file, wherein at least one column in the index column is identification information;
and encoding the data values of the index column to generate index data of each data in the data file so as to determine the analysis data.
Optionally, traversing and parsing the plurality of data files in parallel, and generating the parsed data includes:
analyzing a plurality of data files in parallel through a Map algorithm to generate the analyzed data;
determining at least one index file from the parsed data comprises:
and generating at least one index file according to the analysis data through a Reduce algorithm.
Optionally, obtaining a plurality of data files in a database stored by the LSM-based storage engine includes:
and receiving a plurality of data files sent by the equipment where the database is located.
Optionally, loading each index file into a corresponding database includes:
and sending each index file to the equipment where the database is located, so that the equipment loads each index file into the database.
Optionally, the size of the data file is a first preset value, and the size of the index file is a second preset value.
Optionally, the first preset value is 64M, and the second preset value is 64M.
In a second aspect, an embodiment of the present invention provides a data index generating apparatus, where the apparatus includes:
a data file acquisition unit configured to acquire a plurality of data files in a database stored based on the LSM storage engine;
the analysis unit is configured to traverse and analyze the plurality of data files in parallel to generate analysis data, and the analysis data comprises index data of each data in the data files;
an index file determining unit configured to determine at least one index file according to the parsing data, wherein index data in the index file are ordered;
and the loading unit is configured to load each index file to a corresponding database.
Optionally, the data file and the index file are stored in a physically isolated manner.
Optionally, the parsing unit includes:
the index column determining subunit is configured to traverse a plurality of data files in parallel and determine an index column in each data file, wherein at least one column in the index columns is identification information;
and the encoding subunit is configured to encode the data values of the index column to generate index data of each data in the data file so as to determine the analysis data.
Optionally, the parsing unit is further configured to parse a plurality of the data files in parallel through a Map algorithm, so as to generate the parsed data;
the index file determining unit is further configured to generate at least one index file according to the analysis data through a Reduce algorithm.
Optionally, the data file obtaining unit includes:
and the receiving subunit is configured to receive a plurality of data files sent by the equipment where the database is located.
Optionally, the loading unit includes:
the sending subunit is configured to send each index file to the device where the database is located, so that the device loads each index file into the database.
Optionally, the size of the data file is a first preset value, and the size of the index file is a second preset value.
Optionally, the first preset value is 64M, and the second preset value is 64M.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory and a processor, where the memory is used to store one or more computer program instructions, where the one or more computer program instructions are executed by the processor to implement the method described above.
In a fourth aspect, embodiments of the present invention provide a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement a method as described above.
In the embodiments of the present invention, a plurality of data files in a database stored based on an LSM storage engine are acquired, the data files are traversed and parsed in parallel to generate parsed data, at least one index file is determined according to the parsed data, and each index file is loaded into the corresponding database, wherein the parsed data comprises index data of each piece of data in the data files and the index data in the index files is ordered. The speed of building the data index can thereby be improved.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent from the following description of the embodiments of the present invention with reference to the accompanying drawings, in which:
FIG. 1 is a schematic diagram of a prior art data index generation process;
FIG. 2 is a flow chart of a data index generation method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a data storage process of an embodiment of the present invention;
FIG. 4 is a diagram illustrating a data index generation process according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of another data index generation process according to an embodiment of the invention;
FIG. 6 is a flow chart of another data index generation method according to an embodiment of the present invention;
FIG. 7 is a diagram of a data index generation apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of another data index generation apparatus according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of an electronic device of an embodiment of the invention.
Detailed Description
The present invention will be described below based on examples, but the present invention is not limited to only these examples. In the following detailed description of the present invention, certain specific details are set forth. It will be apparent to one skilled in the art that the present invention may be practiced without these specific details. Well-known methods, procedures, components and circuits have not been described in detail so as not to obscure the present invention.
Further, those of ordinary skill in the art will appreciate that the drawings provided herein are for illustrative purposes and are not necessarily drawn to scale.
Unless the context clearly requires otherwise, throughout the description, the words "comprise", "comprising", and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is, what is meant is "including, but not limited to".
In the description of the present invention, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In addition, in the description of the present invention, "a plurality" means two or more unless otherwise specified.
FIG. 1 is a schematic diagram of a prior art data index generation process. Database services often require a full index to be built online, that is, index data must be computed for the historical data. The prior art usually relies on the storage engine: a full-scan routine is built into the program, the data is traversed one record at a time to find the data that needs to be indexed, that data is parsed to obtain parsed data, the parsed data is rearranged, and a new data file and an index file are regenerated through the corresponding storage engine. The parsed data may include information that uniquely identifies the data (such as a primary key) as well as the data information itself. As shown in fig. 1, the data in a data file 11 in a database 1 is traversed record by record, the data for which an index needs to be built is determined and parsed to obtain parsed data 2, the parsed data 2 is rearranged, and a new data file 11 and an index file 12 are obtained through the corresponding storage engine. The prior art therefore has to traverse every piece of data in the data file 11 one by one, and if the traversal and parsing are parallelized, a large amount of system resources is occupied, which may affect the execution of other applications in the system.
Therefore, the embodiment of the invention provides a data index generation method that traverses and parses data in parallel with the data file as the unit of work. This achieves high concurrency in traversal and parsing and thereby improves the efficiency of index creation.
Fig. 2 is a flowchart of a data index generation method according to an embodiment of the present invention. As shown in fig. 2, the data index generating method of the present embodiment includes the following steps:
Step S110, acquiring a plurality of data files in a database stored based on an LSM storage engine. In an alternative implementation, the size of the data file is a first preset value, and optionally, the first preset value is 64M.
The LSM storage engine is a storage engine based on an LSM-Tree (Log-Structured Merge-Tree). It provides a file loading mode, so that files (SST files) generated according to certain rules can be quickly imported into a database. The core idea of the LSM storage engine is to keep the incremental modifications to the data in memory and to write them to disk in bulk once a specified size limit is reached.
FIG. 3 is a schematic diagram of a data storage process of an embodiment of the present invention. As shown in fig. 3, the LSM storage engine 3 mainly includes a MemTable file 311 and a frozen MemTable file 312 in the memory 31, and files on the disk 32, such as an SST file 321, an operation log file (not shown in the figure), and the like. When a record is written based on the storage engine 3, the modified increment is written into the operation log file, and then the modified increment is written into the MemTable file 311 in the memory, and after the memory occupied by the MemTable file 311 reaches the upper limit value, the data in the memory needs to be dumped into the external memory file. Specifically, first, the MemTable file 311 is frozen into the immutable frozen MemTable file 312, and then the data of the frozen MemTable file 312 is sorted and then dumped to the disk 32, thereby forming a new SST file. Wherein the data in the SST file is ordered, such as based on primary key. Thus, the data files in the database stored based on the LSM storage engine are ordered and have a predetermined size. Alternatively, the size of the data file may be set by setting an upper limit value of the memory occupied by the MemTable file 311. It should be understood that the LSM storage engine based data storage process in fig. 3 is only exemplary, and other LSM storage engine based data storage methods can be applied to the present embodiment.
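The write path described for FIG. 3 can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not the storage engine of the embodiment: the class name `TinyLSM`, the entry-count limit `memtable_limit` (used here in place of a memory limit in bytes), and the in-memory lists standing in for the operation log and the SST files are all hypothetical.

```python
import json
from typing import Dict, List, Tuple


class TinyLSM:
    """Toy model of the write path: log -> MemTable -> frozen MemTable -> SST."""

    def __init__(self, memtable_limit: int = 3) -> None:
        # Hypothetical limit counted in entries; a real engine limits bytes.
        self.memtable_limit = memtable_limit
        self.log: List[str] = []            # stands in for the operation log file
        self.memtable: Dict[str, str] = {}  # mutable in-memory table
        # Each "SST file" is modelled as a list of (key, value) sorted by key.
        self.sst_files: List[List[Tuple[str, str]]] = []

    def put(self, key: str, value: str) -> None:
        self.log.append(json.dumps([key, value]))  # 1. record the change in the log
        self.memtable[key] = value                 # 2. apply it in memory
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self) -> None:
        frozen = self.memtable   # 3. freeze: later writes go to a fresh MemTable
        self.memtable = {}
        # 4. sort the frozen data by key and dump it as a new SST file
        self.sst_files.append(sorted(frozen.items()))


db = TinyLSM(memtable_limit=3)
for k, v in [("s03", "Chen"), ("s01", "Li"), ("s02", "Wang"), ("s05", "Zhao")]:
    db.put(k, v)
```

After the third write the MemTable reaches its limit and is flushed as one key-ordered file, while the fourth write stays in the new MemTable. This is why each data file handed to the index builder is already ordered and bounded in size.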
Step S120, traversing and analyzing the plurality of data files in parallel to generate analysis data. The analysis data comprises index data of each data in the data file.
In an alternative implementation, step S120 may include:
A1: Traverse a plurality of data files in parallel and determine an index column in each data file, wherein at least one of the index columns is identification information. Optionally, a plurality of data files are traversed in parallel, and the data in the data files that needs to be indexed and the index column of each data file are determined. For example, if the data file contains a student score table, the student number column, the name column, and the like may be determined as index columns, where the student number column is the identification information, that is, it uniquely identifies a student.
A2: Encode the data values of the index column to generate index data of each piece of data in the data file, so as to determine the parsed data. The parsed data comprises the index data of each piece of data in the data file. Optionally, the data values of the index column are encoded according to a predetermined encoding rule to generate the index data of the data file. In an alternative implementation, index data of the form [primary key, data] is created according to the predetermined encoding rule, so that during a lookup the corresponding data is obtained directly from the primary key, where the primary key is information that uniquely identifies the data, such as a student number. In another optional implementation, index data of the form [index, primary key] is created according to the predetermined encoding rule, so that during a lookup the primary key is first found from the index value and the corresponding data is then obtained from the primary key.
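As an illustration of the [index, primary key] form, the following sketch encodes the student example. The source does not specify the predetermined encoding rule, so the rule used here is an assumption: index column values and the primary key are joined with a NUL byte so that entries are unique and sort by index value first. The names `encode_index_entry`, `lookup`, `student_id` and `name` are hypothetical.

```python
import bisect
from typing import Dict, List, Tuple

Row = Dict[str, str]
Entry = Tuple[bytes, bytes]  # (encoded index key, primary key)


def encode_index_entry(row: Row, index_cols: List[str], pk_col: str) -> Entry:
    """Assumed rule: index values, then the primary key, separated by NUL bytes."""
    index_part = b"\x00".join(row[c].encode("utf-8") for c in index_cols)
    pk = row[pk_col].encode("utf-8")
    return index_part + b"\x00" + pk, pk


def lookup(entries: List[Entry], value: str) -> List[bytes]:
    """Return the primary keys whose (single-column) index value equals `value`."""
    prefix = value.encode("utf-8") + b"\x00"
    keys = [k for k, _ in entries]
    i = bisect.bisect_left(keys, prefix)  # binary search works because entries are sorted
    found = []
    while i < len(keys) and keys[i].startswith(prefix):
        found.append(entries[i][1])
        i += 1
    return found


rows = [
    {"student_id": "s01", "name": "Li", "score": "90"},
    {"student_id": "s02", "name": "Wang", "score": "85"},
    {"student_id": "s03", "name": "Li", "score": "77"},
]
entries = sorted(encode_index_entry(r, ["name"], "student_id") for r in rows)
```

A query by name first finds the matching primary keys in the ordered index entries; the rows themselves would then be fetched by primary key from the data files, which this sketch does not model.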
Step S130, determining at least one index file according to the analysis data. Wherein the index data in the index file is ordered.
In an optional implementation manner, the embodiment adopts a Map/Reduce programming model to implement parallel processing of multiple data files. Specifically, a plurality of data files are analyzed in parallel through a Map algorithm to determine index data and data information of each data in each data file, analysis data are generated, and at least one ordered index file is generated according to the analysis data through a Reduce algorithm. Optionally, the Reduce algorithm writes the parsed data through the LSM storage engine to generate at least one index file, and thus, the index file is also an SST type file.
FIG. 4 is a diagram illustrating a data index generation process according to an embodiment of the present invention. As shown in FIG. 4, a plurality of data files D1-DN are analyzed in parallel through a Map algorithm to obtain analyzed data X, and a plurality of ordered index files I1-IM are generated through a Reduce algorithm. Wherein N is greater than or equal to 1, and M is greater than or equal to 1. Optionally, the size of the data file is a first preset value, and the size of the index file is a second preset value, where the first preset value and the second preset value may be the same or different. Optionally, the first preset value and the second preset value are both 64M. It should be appreciated that during Reduce processing, a plurality of ordered index files are generated from the index data of each of the parsed data X. That is, the index data in the index file is ordered, and thus, the data file and the index file may not have a one-to-one correspondence.
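The Map and Reduce stages of FIG. 4 can be sketched as follows. This is a single-process approximation under stated assumptions, not a distributed Map/Reduce job: a thread pool stands in for the parallel Map tasks, each data file is modelled as a list of rows, and the hypothetical parameter `max_entries` replaces the second preset value (a byte size such as 64M) as the bound on each index file.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import chain
from typing import Dict, List, Tuple

Row = Dict[str, str]
Entry = Tuple[str, str]  # (index value, primary key)


def map_file(rows: List[Row]) -> List[Entry]:
    """Map: parse one data file into index entries (hypothetical column names)."""
    return [(r["name"], r["student_id"]) for r in rows]


def reduce_entries(parts: List[List[Entry]], max_entries: int) -> List[List[Entry]]:
    """Reduce: order all entries globally, then cut them into bounded index files."""
    ordered = sorted(chain.from_iterable(parts))
    return [ordered[i:i + max_entries] for i in range(0, len(ordered), max_entries)]


def build_index_files(data_files: List[List[Row]], max_entries: int) -> List[List[Entry]]:
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(map_file, data_files))  # one Map task per data file
    return reduce_entries(parts, max_entries)


data_files = [
    [{"student_id": "s01", "name": "Li"}, {"student_id": "s04", "name": "Zhao"}],
    [{"student_id": "s02", "name": "Wang"}],
    [{"student_id": "s03", "name": "Chen"}, {"student_id": "s05", "name": "Li"}],
]
index_files = build_index_files(data_files, max_entries=3)
```

In this run three data files yield two index files, which illustrates that data files and index files need not correspond one to one: the index files are cut from the globally ordered entries, not from individual data files.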
Step S140, loading each index file to the corresponding database to complete the creation process of the data index.
In an optional implementation, the data files and the index files of this embodiment are stored in physical isolation from each other. It is therefore unnecessary to rearrange the parsed data and regenerate new data files and index files through the storage engine, which further saves system resources and data index creation time.
FIG. 5 is a diagram illustrating another data index generation process according to an embodiment of the invention. As shown in fig. 5, in this embodiment, a plurality of data files 51 in the database 5 are traversed and parsed in parallel by Map/Reduce to determine an index column in each data file, the data values of the index column are encoded to generate index data of each piece of data in the data file, the parsed data is determined, at least one index file 52 is generated according to the parsed data, and the index file 52 is imported into the database to complete the creation of the data index. Because the data files and the index files are stored in physical isolation, the parsed data does not have to be rearranged and no new data files and index files have to be regenerated through the storage engine, which saves considerable system resources and data index creation time.
In this embodiment, a plurality of data files in a database stored based on an LSM storage engine are acquired, the data files are traversed and parsed in parallel to generate parsed data, at least one index file is determined according to the parsed data, and each index file is loaded into the corresponding database, wherein the parsed data comprises index data of each piece of data in the data files and the index data in the index files is ordered. The speed of building the data index can thereby be improved. Meanwhile, because the data traversal in this embodiment does not depend on the storage engine, the concurrency of the traversal is easy to increase, which further shortens the traversal and parsing time. In addition, because the data files and the index files are stored in physical isolation, the parsed data does not have to be rearranged and no new data files and index files have to be regenerated through the storage engine, which saves a large amount of system resources and data index creation time.
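The effect of keeping data files and index files separate can be sketched as below. The class `Database` and its method `load_index` are hypothetical and only model the idea that loading registers already-ordered index files next to the data files without touching them; they do not represent the loading interface of any particular storage engine.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, str]
SortedFile = List[Entry]


class Database:
    def __init__(self, data_files: List[SortedFile]) -> None:
        self.data_files = data_files                        # (primary key, value) files
        self.index_files: Dict[str, List[SortedFile]] = {}  # kept apart from the data

    def load_index(self, index_name: str, files: List[SortedFile]) -> None:
        """Register pre-built index files; the data files are never rewritten."""
        for f in files:
            if f != sorted(f):
                raise ValueError("index file must be ordered before loading")
        self.index_files.setdefault(index_name, []).extend(files)


data_files = [[("s01", "Li"), ("s02", "Wang")], [("s03", "Chen")]]
db = Database(data_files)
db.load_index("by_name", [[("Chen", "s03"), ("Li", "s01")], [("Wang", "s02")]])
```

Loading here is a cheap registration step because the index files arrive already ordered; an unordered file is rejected instead of being re-sorted inside the database.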
FIG. 6 is a flow chart of another data index generation method according to an embodiment of the invention. As shown in fig. 6, the data index generating method according to the embodiment of the present invention includes the following steps:
Step S1, the index creation device receives a plurality of data files in the database sent by the device where the database is located. The plurality of data files in the database are stored based on the LSM storage engine. Each data file is an ordered SST type data file.
Step S2, on the index creation device, traversing and parsing the plurality of data files in parallel to generate parsed data. The parsed data comprises index data of each piece of data in the data file. In an optional implementation, a plurality of data files are traversed in parallel, an index column in each data file is determined, and the data values of the index column are encoded to generate index data of each piece of data in the data file, so as to determine the parsed data. At least one of the index columns is identification information. For example, if the data file contains a student score table, the student number column, the name column, and the like may be determined as index columns, where the student number column is the identification information, that is, it uniquely identifies a student.
Optionally, the data values of the index column are encoded according to a predetermined encoding rule to generate the index data of the data file, so as to determine the parsed data. In an alternative implementation, index data of the form [primary key, data] is created according to the predetermined encoding rule, so that during a lookup the corresponding data is obtained directly from the primary key, where the primary key is information that uniquely identifies the data, such as a student number. In another optional implementation, index data of the form [index, primary key] is created according to the predetermined encoding rule, so that during a lookup the primary key is first found from the index value and the corresponding data is then obtained from the primary key.
Step S3, determining, on the index creating device, at least one index file according to the parsed data. Wherein the index data in the index file is ordered. In an optional implementation manner, the embodiment adopts a Map/Reduce programming model to implement parallel processing of multiple data files. Specifically, a plurality of data files are analyzed in parallel through a Map algorithm to determine index data and data information of each data in each data file, analysis data are generated, and at least one ordered index file is generated according to the analysis data through a Reduce algorithm. Optionally, the Reduce algorithm writes the parsed data through the LSM storage engine to generate at least one index file, and thus, the index file is also an SST type file.
Step S4, the index creation device sends each index file to the device where the database is located.
Step S5, the device in which the database is located loads each index file to the corresponding database to complete the creation process of the data index.
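Steps S1 to S5 can be summarized in a short sketch of the exchange between the two devices. The classes `DatabaseDevice` and `IndexCreationDevice`, their method names, and the `max_entries` bound are hypothetical; network transfer is replaced by plain function calls, and the parallelism of step S2 is omitted to keep the flow readable.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (primary key, name)
Entry = Tuple[str, str]   # (index value, primary key)


class DatabaseDevice:
    def __init__(self, data_files: List[List[Record]]) -> None:
        self.data_files = data_files
        self.index_files: List[List[Entry]] = []

    def send_data_files(self) -> List[List[Record]]:
        # S1: hand copies of the ordered data files to the index creation device
        return [list(f) for f in self.data_files]

    def load_index_files(self, files: List[List[Entry]]) -> None:
        # S5: load the received index files into the database
        self.index_files.extend(files)


class IndexCreationDevice:
    def build(self, data_files: List[List[Record]], max_entries: int) -> List[List[Entry]]:
        # S2: parse every data file into (index value, primary key) entries
        entries = [(name, pk) for f in data_files for pk, name in f]
        # S3: order the entries and cut them into bounded index files
        entries.sort()
        return [entries[i:i + max_entries] for i in range(0, len(entries), max_entries)]


db_device = DatabaseDevice([[("s01", "Li"), ("s03", "Chen")], [("s02", "Wang")]])
builder = IndexCreationDevice()
received = db_device.send_data_files()                 # S1
index_files = builder.build(received, max_entries=2)   # S2 and S3
db_device.load_index_files(index_files)                # S4 (send) and S5 (load)
```

All parsing and sorting runs on the index creation device, so the device hosting the database only exports its files and loads the finished index files.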
In an optional implementation, the data files and the index files of this embodiment are stored in physical isolation from each other. It is therefore unnecessary to rearrange the parsed data and regenerate new data files and index files through the storage engine, which further saves system resources and data index creation time.
By moving the data index creation process to another device, the embodiment of the invention does not occupy the system resources of the device where the database is located, and thus avoids the impact of the full index build on other applications on that device. In the embodiment of the invention, a plurality of data files in a database are acquired, the data files are traversed and parsed in parallel to generate parsed data, at least one index file is determined according to the parsed data, and each index file is loaded into the corresponding database, wherein the parsed data comprises index data of each piece of data in the data files and the index data in the index files is ordered. The speed of building the data index can thereby be improved. Meanwhile, because the data traversal in this embodiment does not depend on the storage engine, the concurrency of the traversal is easy to increase, which further shortens the traversal and parsing time. In addition, because the data files and the index files are stored in physical isolation, the data does not have to be rewritten into the database to create the index, which saves a large amount of system resources and data index creation time.
Fig. 7 is a schematic diagram of a data index generating apparatus according to an embodiment of the present invention. As shown in fig. 7, the data index generating device 7 of the embodiment of the present invention includes a data file acquiring unit 71, a parsing unit 72, an index file determining unit 73, and a loading unit 74.
The data file acquiring unit 71 is configured to acquire a plurality of data files in a database stored based on the LSM storage engine.
The parsing unit 72 is configured to traverse and parse the plurality of data files in parallel, generating parsed data, which includes index data of each data in the data files. In an alternative implementation, the parsing unit 72 is further configured to parse a plurality of the data files in parallel through a Map algorithm to generate the parsed data. In an alternative implementation, the parsing unit 72 includes an index column determination subunit 721 and an encoding subunit 722. The index column determination subunit 721 is configured to traverse a plurality of the data files in parallel, and determine an index column in each of the data files, where at least one of the index columns is identification information. The encoding subunit 722 is configured to encode the data values of the index column to generate index data of each data in the data file to determine the parsed data.
The index file determining unit 73 is configured to determine at least one index file from the parsed data, index data in the index file being ordered. In an optional implementation manner, the index file determining unit 73 is further configured to generate at least one index file according to the parsed data through a Reduce algorithm.
The loading unit 74 is configured to load each of the index files into a corresponding database. In an alternative implementation, the data file and the index file are stored in a physically separate manner. In an optional implementation manner, the size of the data file is a first preset value, and the size of the index file is a second preset value. Optionally, the first preset value is 64M, and the second preset value is 64M.
In this embodiment, a plurality of data files in a database stored based on an LSM storage engine are acquired, the data files are traversed and parsed in parallel to generate parsed data, at least one index file is determined according to the parsed data, and each index file is loaded into the corresponding database, wherein the parsed data comprises index data of each piece of data in the data files and the index data in the index files is ordered. The speed of building the data index can thereby be improved.
Fig. 8 is a schematic diagram of another data index generating apparatus according to an embodiment of the present invention. As shown in fig. 8, the data index generating device 8 of the present embodiment includes a data file acquiring unit 81, a parsing unit 82, an index file determining unit 83, and a loading unit 84.
The data file acquiring unit 81 is configured to acquire a plurality of data files in a database stored based on the LSM storage engine. In an alternative implementation, the data file obtaining unit 81 includes a receiving subunit 811. The receiving subunit 811 is configured to receive a plurality of data files transmitted by the device in which the database is located.
The parsing unit 82 is configured to traverse and parse the plurality of data files in parallel, and generate parsed data, which includes index data of each data in the data files. In an alternative implementation manner, the parsing unit 82 is further configured to parse a plurality of the data files in parallel through a Map algorithm to generate the parsed data. In an alternative implementation, the parsing unit 82 includes an index column determination subunit 821 and an encoding subunit 822. The index column determination subunit 821 is configured to traverse a plurality of the data files in parallel, and determine an index column in each of the data files, where at least one of the index columns is identification information. The encoding subunit 822 is configured to encode the data values of the index column to generate index data of each data in the data file to determine the parsing data.
The index file determining unit 83 is configured to determine at least one index file from the parsed data, the index data in the index file being ordered. In an optional implementation manner, the index file determining unit 83 is further configured to generate at least one index file according to the parsed data through a Reduce algorithm.
The loading unit 84 is configured to load each of the index files into a corresponding database. In an alternative implementation, the loading unit 84 includes a sending subunit 841. The sending subunit 841 is configured to send each index file to the device where the database is located, so that the device loads each index file into the database.
In an alternative implementation, the data file and the index file are stored in a physically separate manner. In an optional implementation manner, the size of the data file is a first preset value, and the size of the index file is a second preset value. Optionally, the first preset value is 64M, and the second preset value is 64M.
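One way to keep index files at a preset size is to cut the ordered entries into chunks whose total byte length stays within a limit. The sketch below assumes that "64M" means 64 MiB and that entries are byte strings; both are interpretations rather than statements from the text, and `split_by_size` is a hypothetical helper.

```python
def split_by_size(entries, max_bytes=64 * 1024 * 1024):
    """Group ordered entries into chunks of at most max_bytes each.

    An entry larger than max_bytes is placed in a chunk of its own.
    """
    chunks, current, current_size = [], [], 0
    for entry in entries:
        size = len(entry)
        # Start a new chunk when adding this entry would exceed the limit.
        if current and current_size + size > max_bytes:
            chunks.append(current)
            current, current_size = [], 0
        current.append(entry)
        current_size += size
    if current:
        chunks.append(current)
    return chunks


# Small limit used only to make the behaviour visible.
chunks = split_by_size([b"aaaa", b"bbbb", b"cc"], max_bytes=8)
```

Because the input is already ordered, each chunk covers a contiguous key range, so the resulting files stay ordered relative to one another.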
By acquiring a plurality of data files from a database stored based on an LSM storage engine, traversing and parsing the data files in parallel to generate parsed data that includes index data for each data item in the data files, determining from the parsed data at least one index file whose index data are ordered, and loading each index file into the corresponding database, the embodiment of the present invention can increase the speed at which a data index is built.
Fig. 9 is a schematic diagram of an electronic device of an embodiment of the invention. As shown in fig. 9, the electronic device is a general address query device with a general-purpose computer hardware structure that includes at least a processor 91 and a memory 92. The processor 91 and the memory 92 are connected by a bus 93. The memory 92 is adapted to store instructions or programs executable by the processor 91. The processor 91 may be a stand-alone microprocessor or a collection of one or more microprocessors. The processor 91 thus processes data and controls other devices by executing the instructions stored in the memory 92, thereby performing the method flows of the embodiments of the present invention described above. The bus 93 connects the above components together and also connects them to a display controller 94, a display device, and input/output (I/O) devices 95. The input/output (I/O) devices 95 may be a mouse, keyboard, modem, network interface, touch input device, motion-sensing input device, printer, or other devices known in the art. Typically, the input/output devices 95 are coupled to the system through input/output (I/O) controllers 99.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, apparatus (device) or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may employ a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations of methods, apparatus (devices) and computer program products according to embodiments of the application. It will be understood that each flow in the flow diagrams can be implemented by computer program instructions.
These computer program instructions may be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows.
These computer program instructions may also be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows.
Another embodiment of the invention is directed to a non-transitory storage medium storing a computer-readable program for causing a computer to perform some or all of the above-described method embodiments.
That is, as those skilled in the art will understand, all or part of the steps of the methods in the embodiments described above may be implemented by a program instructing the relevant hardware. The program is stored in a storage medium and includes several instructions that enable a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (18)

1. A method for generating a data index, the method comprising:
acquiring a plurality of data files in a database stored based on an LSM storage engine;
traversing and parsing the plurality of data files in parallel to generate parsed data, wherein the parsed data comprises index data for each data item in the data files;
determining at least one index file according to the parsed data, wherein the index data in the index file are ordered; and
loading each index file into a corresponding database.
2. The method of claim 1, wherein the data file and the index file are stored in physical isolation.
3. The method of claim 1, wherein traversing and parsing the plurality of data files in parallel to generate parsed data comprises:
traversing the plurality of data files in parallel and determining the index columns in each data file, wherein at least one of the index columns is identification information; and
encoding the data values of the index columns to generate the index data for each data item in the data files, thereby determining the parsed data.
4. The method of claim 1, wherein traversing and parsing the plurality of data files in parallel to generate parsed data comprises:
parsing the plurality of data files in parallel through a Map algorithm to generate the parsed data; and
wherein determining at least one index file according to the parsed data comprises:
generating at least one index file according to the parsed data through a Reduce algorithm.
5. The method of claim 1, wherein acquiring the plurality of data files in the database stored based on the LSM storage engine comprises:
receiving the plurality of data files sent by the device on which the database is located.
6. The method of claim 5, wherein loading each index file into a corresponding database comprises:
sending each index file to the device on which the database is located, so that the device loads each index file into the database.
7. The method of claim 1, wherein the size of the data file is a first preset value and the size of the index file is a second preset value.
8. The method of claim 7, wherein the first preset value is 64M and the second preset value is 64M.
9. An apparatus for generating a data index, the apparatus comprising:
a data file acquiring unit configured to acquire a plurality of data files in a database stored based on an LSM storage engine;
a parsing unit configured to traverse and parse the plurality of data files in parallel to generate parsed data, wherein the parsed data comprises index data for each data item in the data files;
an index file determining unit configured to determine at least one index file according to the parsed data, wherein the index data in the index file are ordered; and
a loading unit configured to load each index file into a corresponding database.
10. The apparatus of claim 9, wherein the data file and the index file are stored in physical isolation.
11. The apparatus of claim 9, wherein the parsing unit comprises:
an index column determination subunit configured to traverse the plurality of data files in parallel and determine the index columns in each data file, wherein at least one of the index columns is identification information; and
an encoding subunit configured to encode the data values of the index columns to generate the index data for each data item in the data files, thereby determining the parsed data.
12. The apparatus of claim 9, wherein the parsing unit is further configured to parse the plurality of data files in parallel through a Map algorithm to generate the parsed data; and
the index file determining unit is further configured to generate at least one index file according to the parsed data through a Reduce algorithm.
13. The apparatus of claim 9, wherein the data file acquiring unit comprises:
a receiving subunit configured to receive the plurality of data files sent by the device on which the database is located.
14. The apparatus of claim 13, wherein the loading unit comprises:
a sending subunit configured to send each index file to the device on which the database is located, so that the device loads each index file into the database.
15. The apparatus of claim 9, wherein the size of the data file is a first preset value and the size of the index file is a second preset value.
16. The apparatus of claim 15, wherein the first preset value is 64M and the second preset value is 64M.
17. An electronic device comprising a memory and a processor, wherein the memory is configured to store one or more computer program instructions, wherein the one or more computer program instructions are executed by the processor to implement the method of any of claims 1-8.
18. A computer-readable storage medium on which computer program instructions are stored, wherein the computer program instructions, when executed by a processor, implement the method according to any one of claims 1-8.
CN202010244790.3A 2020-03-31 2020-03-31 Data index generation method and device, electronic equipment and readable storage medium Pending CN111831622A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010244790.3A CN111831622A (en) 2020-03-31 2020-03-31 Data index generation method and device, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN111831622A true CN111831622A (en) 2020-10-27

Family

ID=72913964

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010244790.3A Pending CN111831622A (en) 2020-03-31 2020-03-31 Data index generation method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN111831622A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102426609A (en) * 2011-12-28 2012-04-25 厦门市美亚柏科信息股份有限公司 Index generation method and index generation device based on MapReduce programming architecture
CN104408128A (en) * 2014-11-26 2015-03-11 上海爱数软件有限公司 Read optimization method for asynchronously updating indexes based on B+ tree
CN104809237A (en) * 2015-05-12 2015-07-29 百度在线网络技术(北京)有限公司 LSM-tree (The Log-Structured Merge-Tree) index optimization method and LSM-tree index optimization system
CN105117415A (en) * 2015-07-30 2015-12-02 西安交通大学 Optimized SSD data updating method
US20170344588A1 (en) * 2016-05-25 2017-11-30 Eliot Horowitz Systems and methods for generating partial indexes in distributed databases
CN107220285A (en) * 2017-04-24 2017-09-29 中国科学院计算技术研究所 Towards the temporal index construction method of magnanimity track point data
CN109947759A (en) * 2017-07-17 2019-06-28 中国移动通信集团吉林有限公司 A kind of data directory method for building up, indexed search method and device
CN108052643A (en) * 2017-12-22 2018-05-18 北京奇虎科技有限公司 Date storage method, device and storage engines based on LSM Tree structures
CN108021702A (en) * 2017-12-26 2018-05-11 百度在线网络技术(北京)有限公司 Classification storage method, device, OLAP database system and medium based on LSM-tree

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112749125A (en) * 2021-01-13 2021-05-04 北京明朝万达科技股份有限公司 Text processing method and device and text processing system
CN112749125B (en) * 2021-01-13 2024-05-03 北京明朝万达科技股份有限公司 Text processing method and device and text processing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination