CN116303417A - Method for reconstructing database index and electronic equipment - Google Patents

Method for reconstructing database index and electronic equipment Download PDF

Info

Publication number
CN116303417A
CN116303417A CN202211614518.5A CN202211614518A CN116303417A CN 116303417 A CN116303417 A CN 116303417A CN 202211614518 A CN202211614518 A CN 202211614518A CN 116303417 A CN116303417 A CN 116303417A
Authority
CN
China
Prior art keywords
index
database
server
reconstruction
time period
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211614518.5A
Other languages
Chinese (zh)
Inventor
李园园
孙培
刘晓棣
杨杰
李玉林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fulian Yuzhan Technology Shenzhen Co Ltd
Original Assignee
Fulian Yuzhan Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fulian Yuzhan Technology Shenzhen Co Ltd filed Critical Fulian Yuzhan Technology Shenzhen Co Ltd
Priority to CN202211614518.5A priority Critical patent/CN116303417A/en
Publication of CN116303417A publication Critical patent/CN116303417A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2272Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/252Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a method for reconstructing a database index and electronic equipment, wherein the method comprises the following steps: under the condition that the reconstruction judging model judges that the database index reconstruction is needed, obtaining the time consumption of the server predicted by the reconstruction time length prediction model for the prediction of the database index reconstruction, and obtaining the prediction result of the subsequent change of the front end service interface connection number of the database output by the traffic trough time period prediction model; predicting a plurality of operable time periods of the server according to the prediction result, and selecting a reconstructable operation time period meeting the prediction time consumption from the predicted operable time periods; the server is instructed to reorganize the index or regenerate the index during the reconstructable job period. According to the method, whether the database index needs to be rebuilt or not and the operation time period of the database index is optimized can be rapidly judged, timeliness of the database index rebuilding is improved, and other businesses of the execution database are effectively prevented from being influenced by the database index rebuilding.

Description

Method for reconstructing database index and electronic equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method for reconstructing a database index and an electronic device.
Background
Rebuilding a database index is a common operation in database maintenance operations. After the database runs for a long time, the database index is damaged, or after the data in the database is updated in a large amount, the database index is possibly mismatched with the content in the database, and then the database index needs to be rebuilt to improve the retrieval efficiency of the database.
Generally, whether a database index needs to be rebuilt depends primarily on the experience of the database administrator. In order to avoid the reduction of the retrieval efficiency of the database, a database manager needs to reconstruct the database index in time after the database index is damaged or when the database index is not matched with the content in the database. This requires the database manager to monitor the database status at any time and requires the database manager to have accurate judgment experience.
However, in the manual operation state, the timeliness of monitoring the database state and the correctness of the judgment depending on the experience of the database manager cannot be well ensured, and the situation that the database index is not immediately rebuilt after the database index is damaged often occurs. Thus, there is a need for a method of rebuilding a database index to accurately determine when to initiate a rebuild of the database index.
Disclosure of Invention
Aiming at the problem of determining when to initiate reconstruction of the database index, the application provides a method and electronic equipment for reconstructing the database index. According to the method provided by the application, when the reconstruction of the database index is initiated can be determined, the interference of the reconstruction of the database index to other database business behaviors is avoided, and the timeliness of the reconstruction of the database index is improved.
The application adopts the following technical scheme:
in a first aspect, the present application provides a method of reconstructing a database index, the method being applied to an electronic device, the method comprising:
collecting relevant information of a database, wherein the relevant information of the database comprises the following steps: the method comprises the steps of monitoring information of the index size of a database, the index scanning times of the database, the index fragmentation rate of the database, the index density of the database, the processor configuration of a server for realizing the database, the number of read-write operations which can be performed by the server per second, the memory size of the server, the memory parameter setting of the server and the front-end service interface connection number of the database;
invoking a reconstruction judgment model, wherein the reconstruction judgment model is used for judging whether database index reconstruction is needed according to the index fragment rate and the index scanning times;
Inputting the index scanning times and the index fragment rate into the reconstruction judgment model to obtain a judgment result output by the reconstruction judgment model, wherein the judgment result comprises a database index reconstruction required or a database index reconstruction not required;
ending the flow of the method for reconstructing the database index when the judgment result is that the database index reconstruction is not needed;
when the judging result is that the database index reconstruction is needed, a reconstruction time length prediction model is called, wherein the reconstruction time length prediction model is used for predicting the prediction time consumption of the server for carrying out the database index reconstruction according to the index size, the index density, the processor configuration of the server, the number of read-write operations which can be carried out per second by the server, the memory size of the server and the memory parameter setting of the server;
inputting the index size, the index density, the processor configuration, the number of read-write operations which can be performed by the server per second, the memory size of the server and the memory parameter setting of the server in the related information of the database into the reconstruction duration prediction model, and obtaining the predicted time consumption output from the reconstruction duration prediction model;
When the judging result is that the database index reconstruction is needed, a flow low-valley time period prediction model is called, wherein the flow low-valley time period prediction model is used for generating a prediction result of the follow-up change of the front-end service interface connection number of the database according to the monitoring information of the front-end service interface connection number of the database;
inputting monitoring information of the front-end service interface connection number of the database into the flow off-peak time period prediction model, and obtaining the prediction result output from the flow off-peak time period prediction model;
predicting a plurality of operable time periods of the server according to the prediction result, wherein the time period of which the front-end service interface connection number of the database is smaller than a preset connection number threshold value is the operable time period of the server;
selecting a reconstructable operation time period from the predicted plurality of operation time periods, wherein the reconstructable operation time period is a time-consuming operation time period with a first duration greater than or equal to the predicted time-consuming operation time period after the current time;
and indicating the server to reconstruct the database index in the reconstructable operation time period.
According to the method, whether the database index needs to be rebuilt or not is judged by the rebuilding judgment model, and the operation time period for rebuilding the database index is optimized based on the rebuilding duration prediction model and the flow low-valley time period prediction model, so that the fact that the rebuilding of the database index affects the execution of other business of the database can be effectively avoided, and timeliness of rebuilding the database index is improved.
In one implementation manner of the first aspect, the database index rebuilding includes:
reorganizing an index, wherein the reorganizing the index comprises reordering data in an outermost data page of the database and compressing an index page;
or,
regenerating an index, wherein regenerating the index comprises deleting an original index of the database and generating a new index.
In an implementation manner of the first aspect, the reconstruction and judgment model is a decision tree algorithm model;
the method further comprises the steps of:
acquiring historical data of the database, wherein the historical data of the database comprises index scanning frequency historical records and index fragment rate historical records of the database and/or other databases, the index scanning frequency historical records comprise index scanning frequency record values and corresponding time periods, and the index scanning frequency historical records comprise index fragment rate record values and corresponding time periods;
generating a plurality of characteristic values according to the historical data of the database, wherein each characteristic value comprises an index scanning frequency record value and an index fragment rate record value which correspond to the same time period aiming at the same database;
Adding a mark for executing the database index reconstruction to the characteristic value, or adding a mark for not executing the database index reconstruction to the characteristic value to generate a plurality of training samples;
and performing model training by using a decision tree algorithm and using a machine learning platform by using the plurality of training samples to obtain the reconstruction judgment model.
In one implementation manner of the first aspect, the method further includes:
acquiring the database index reconstruction history record, wherein the database index reconstruction history record comprises occurrence time corresponding to the execution of the database index reconstruction;
the adding a flag for the eigenvalue that needs to perform the database index rebuilding, or adding a flag for the eigenvalue that does not need to perform the database index rebuilding, to generate a plurality of training samples includes:
based on the database index rebuilding history record, confirming whether the database index rebuilding is executed or not in a time period corresponding to the characteristic value;
when the database index reconstruction is executed in the time period corresponding to the characteristic value, adding a mark for executing the database index reconstruction for the characteristic value;
And when the database index reconstruction is not executed within the time period corresponding to the characteristic value, adding a mark which does not need to execute the database index reconstruction to the characteristic value.
In an implementation manner of the first aspect, the adding a flag that needs to perform the database index rebuilding to the feature value, or adding a flag that does not need to perform the database index rebuilding to the feature value, to generate a plurality of training samples includes:
when the index fragment rate record value is larger than a preset fragment rate threshold value and the index scanning frequency record value is larger than a preset scanning frequency threshold value, adding a mark for executing the database index reconstruction for the characteristic value;
and when the index fragment rate record value is smaller than or equal to a preset fragment rate threshold value and/or the index scanning frequency record value is smaller than or equal to a preset scanning frequency threshold value, adding a mark which does not need to execute the database index reconstruction to the characteristic value.
In an implementation manner of the first aspect, the reconstruction duration prediction model is a multiple linear regression algorithm model;
the method further comprises the steps of:
acquiring database index reconstruction histories of the database and/or other databases, wherein the database index reconstruction histories comprise database index reconstruction time length record values and occurrence time of executing database index reconstruction;
Acquiring database historical data, wherein the database historical data comprises index size historical records, index density historical records, processor configuration records of a server, read-write operation times records which can be performed by the server every second, server memory size records and server memory parameter setting records, the index size historical records comprise index size record values and corresponding time periods, the index density historical records comprise index density record values and corresponding time periods, the processor configuration records of the server comprise processor configuration record values and corresponding time periods of the server, the read-write operation times records which can be performed by the server every second comprise read-write operation times records which can be performed by the server every second and corresponding time periods, the server memory size records comprise server memory size record values and corresponding time periods, and the server memory parameter setting records comprise server memory parameter setting record values and corresponding time periods;
generating a plurality of training samples according to the database index reconstruction history record, wherein each training sample comprises a database index reconstruction duration record value;
Adding feature values to the plurality of training samples according to the database history data, wherein: the characteristic values comprise the index density record value, the index size record value, a processor configuration record value of the server, a read-write operation number record value which can be carried out by the server every second, a memory size record value of the server and a memory parameter setting record value of the server; in the same training sample, the database index rebuilding time length record value, the index density record value, the index size record value, the server processor configuration record value, the server reading and writing operation times record value which can be carried out every second, the server memory size record value and the server memory parameter setting record value correspond to the same database and the same time period;
and performing model training by using a multiple linear regression algorithm and a machine learning platform by using the plurality of training samples to obtain the reconstruction duration prediction model.
In an implementation manner of the first aspect, the traffic trough period prediction model is a K nearest neighbor regression algorithm model;
the method further comprises the steps of:
Acquiring a monitoring history of the front-end service interface connection number of the database and/or other databases, wherein the monitoring history of the front-end service interface connection number comprises a front-end service interface connection number record value and a corresponding time period;
acquiring host access history data of the database and/or other databases, wherein the host access history data comprises host addresses for accessing the database and/or other databases and occurrence time of the access;
generating a training sample set comprising a plurality of training samples according to the monitoring history of the front-end service interface connection number, wherein each training sample comprises a front-end service interface connection number record value;
adding a characteristic value to the training sample according to the host access history data, wherein: the characteristic value comprises the host address and the occurrence time of the access; the host address and the front-end service interface connection number record value in the same training sample correspond to the same database, and the access time is within a time period corresponding to the front-end service interface connection number record value;
and performing model training by using a K nearest neighbor regression algorithm and a machine learning platform by using the training sample set to obtain the traffic trough time period prediction model.
In an implementation manner of the first aspect, the performing model training using the training sample set to obtain the traffic trough period prediction model includes:
splitting the training sample set into a plurality of sub-training sample sets based on different services of the database, wherein each sub-training sample set corresponds to one service of the database;
model training is performed by using the sub-training sample set to obtain the traffic trough period prediction model corresponding to the traffic of the database.
In an implementation manner of the first aspect, the instructing the server to perform the database index rebuild after the rebuildable job period further includes:
acquiring the current load condition of the server at the starting time of the reconstructable operation time period;
and when the current load condition of the server meets the execution requirement of the rebuilding database index, rebuilding the database index in the rebuildable operation time period.
In a second aspect, the present application provides an electronic device comprising a memory for storing computer program instructions and a processor for executing the computer program instructions, wherein the computer program instructions, when executed by the processor, trigger the electronic device to perform the method steps as described in the first aspect.
Drawings
FIG. 1 is a schematic diagram of an electronic device according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a logic structure of a model center of the electronic device shown in FIG. 1;
FIG. 3 is a flow chart illustrating a method of rebuilding a database index according to an embodiment of the present application;
FIG. 4 is a flow chart illustrating a method of training a reconstruction judgment model according to the method of reconstructing a database index of FIG. 2;
FIG. 5 is a flow chart illustrating a method of training a reconstruction duration prediction model of the method of reconstructing a database index of FIG. 2;
FIG. 6 is a training and predictive trend graph of a prediction model of the method of reconstructing a database index of FIG. 2;
FIG. 7 is a flow chart illustrating a method of training a traffic trough period prediction model of the method of reconstructing a database index of FIG. 2.
Detailed Description
For the purposes, technical solutions and advantages of the present application, the technical solutions of the present application will be clearly and completely described below with reference to specific embodiments of the present application and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
The terminology used in the description section of the present application is for the purpose of describing particular embodiments of the present application only and is not intended to be limiting of the present application.
In order to solve the problem that how to reconstruct a database index in time in the prior art, the application provides a method for reconstructing the database index and electronic equipment.
Fig. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
As shown in fig. 1, the electronic device includes a memory 1310 for storing computer program instructions and a processor 1320 for executing the program instructions, wherein the computer program instructions stored in the memory 1310, when executed by the processor 1320, trigger the electronic device to execute a method flow corresponding to the computer program instructions.
Specifically, the processor 1320 is configured with the data acquisition module 110, the index reconstruction determination module 120, and the model center 140.
The data collection module 110 is used for collecting related information of the database.
The model center 140 is used to load the computational model.
Fig. 2 is a schematic diagram of the logic structure of the model center 140 shown in fig. 1.
As shown in fig. 2, the model center 140 is configured to store and load the reconstruction judgment model, the reconstruction duration prediction model, and the traffic trough period prediction model.
The index reconstruction determination module 120 is configured to obtain related information of the database collected by the data collection module 110; calling a reconstruction judgment model, a reconstruction duration prediction model and a flow valley time period prediction model loaded by the model center 140; and calculating by using a reconstruction judging model, a reconstruction duration prediction model and a flow trough time period prediction model according to the related information of the database, and selecting a first time period for carrying out database index reconstruction according to a calculation result.
Specifically, fig. 3 is a flowchart illustrating a method for reconstructing a database index according to an embodiment of the present application.
The electronic device shown in fig. 1 performs the following method for reconstructing the database index shown in fig. 3 to instruct the server to reconstruct the database index.
In step S200, the data acquisition module 110 acquires related information of the database.
Specifically, in step S200, the relevant information of the database includes: index size of database
(index size), index scan number of database (idx_scan), index fragment rate of database (leaf_fragment), index density of database (leaf_density), processor configuration of server implementing database (e.g. number of processor cores), number of read/write operations that can be performed per second by server (Input/Output Operations Per Second, IOPS), memory size of server (work_mem), memory parameter setting of server (main management_work_mem), and monitoring information of front end service interface connection number of database.
The memory parameter setting of the server refers to the available memory setting maintained by the server for the database. For example, in an application scenario, the memory size of the server is 8G, and setting 4G therein as the available memory when the server performs database maintenance.
Further, in step S200, the data sources of the relevant information of the database include, but are not limited to, one or any combination of the following: database index and table information, CMBD server configuration information, platform traffic monitoring data.
Specifically, in step S200, relevant information of the database is obtained through a structured query language (Structured Query Language, SQL) SQL query. The specific acquisition steps comprise:
creating a database table (index_info) comprising the columns: id (number), creator, create_date (creation time), update_date (update time), hostinfo (host address), dbname (database name), CPU, IOPS, work_mem, main name_work_mem, table name, index size, leaf_density, leaf_fragment (index fragment rate), idx_scan (index scan number), is_demand (whether to reconstruct), status, esti_duration (prediction time), esti_start, esti_end time, starttime, endtime;
Entering a database address and a database name of a database;
according to the entered database address and database name, invoking SQL script to query index information, wherein the queried index information comprises an index name (index name corresponding to a database table), an index size (index size corresponding to the database table), index scanning times (idx_scan corresponding to the database table), index fragment rate (leaf_fragment corresponding to the database table), and index density (leaf_density corresponding to the database table);
writing the queried index information into a database table (index_info);
the call configuration management database (Configuration Management Database, CMDB) platform queries the server information of the database, the server information including the CPU core number of the server (CPU corresponding to the database table), the hard disk IOPS of the server (IOPS corresponding to the database table), the memory size of the server (work_mem corresponding to the database table), the memory parameter setting of the server (main_work_mem corresponding to the database table).
In step S200, the relevant information of the database may be collected for one index of one database (the database table includes only one row of content); the relevant information of the database may be collected for a plurality of indexes of a plurality of databases, or for a plurality of indexes of one database (the database table only includes a plurality of rows of contents).
Table 1 shows partial data of a database table (index_info) according to an embodiment of the present application.
TABLE 1
Figure BDA0003996255800000071
As shown in table 1, the column of the database table (index_info) includes a table name, an index size, an index density, an index fragmentation rate, and an index scan number.
Further, to improve data accuracy. In step S200, data cleansing and data updating are also performed on the collected related information of the database.
Specifically, in one embodiment, index information or server information in which null data exists is deleted. For example, if one or more of the index name, index size, index scan number, index fragment rate, and index density of the obtained index information is null, the index information needs to be deleted, and the index information is re-obtained; for another example, if one or more of the CPU core number, the hard disk IOPS, the memory size, and the memory parameter setting in the obtained server information is null, the corresponding null information needs to be deleted, and the server information is re-obtained.
Specifically, in one embodiment, index information or server information in which error data exists is deleted. For example, assuming that the index fragment rate of the obtained index information is 100% (which cannot theoretically occur in an actual application scenario), the index information needs to be deleted and re-obtained; for another example, if the memory parameter set in the obtained server information corresponds to a memory size larger than the memory size, the corresponding server information needs to be deleted and the server information is obtained again.
In step S210, the index rebuilding determination module 120 invokes the rebuilding determination model loaded by the model center 140, where the rebuilding determination model is used to determine whether the database index rebuilding is needed.
In an application scenario, there are many factors for determining whether to reconstruct the database index, in an embodiment of the present application, according to analysis of database maintenance experience, significant feature index fragment rate, index usage times for determining whether to reconstruct the database index are screened out through model training and score comparison, and non-significant features such as index density, index size, etc. are discarded.
Specifically, in an embodiment of the present application, the reconstruction determination model determines whether to reconstruct the database index according to the index fragment rate and the index reference number.
Specifically, in one embodiment, the reconstruction judgment model is a decision tree algorithm model. Preferably, the reconstruction judgment model is a decision tree algorithm model in a monitoring learning-classifying algorithm.
FIG. 4 is a flow chart of a method of training a reconstructed decision model in accordance with an embodiment of the present application.
Prior to step S210, the electronic device trains the reconstruction judgment model based on the following flow shown in fig. 4.
Step S510, obtaining database historical data, wherein the database historical data comprises index scanning time historical records and index fragment rate historical records of a database and/or other databases, the index scanning time historical records comprise index scanning time record values and corresponding time periods, and the index scanning time historical records comprise index fragment rate record values and corresponding time periods.
Step S520, generating a plurality of characteristic values according to the historical data of the database, wherein each characteristic value comprises an index scanning frequency record value and an index fragment rate record value corresponding to the same time period aiming at one database;
for example, in the index scan number history for the database A1, the index scan number record value is 9 in a period of recording 2022, 10, 9, 13 minutes, 15 seconds to 2022, 10, 9, 13 minutes, 31 minutes, 25 seconds.
For another example, in the index patch rate history for the database A1, the index patch rate record value is 13% in a period of 30 minutes 25 seconds at 13 th 10 th month 9 th day of 2022 to 33 minutes 15 seconds at 13 th month 9 th day of 2022.
A characteristic value T1 may be generated, the characteristic value T1 including an index fragmentation rate record value of 13% and an index scan number record value of 9, the characteristic value T1 corresponding to a period of time of 2022, 10 month, 9 days, 13 points 30 minutes 25 seconds to 2022, 10 months, 9 days, 13 points 31 minutes 25 seconds.
In step S530, a flag (True, T) that needs to perform database index reconstruction is added to the feature value, or a flag (False, F) that does not need to perform database index reconstruction is added to the feature value, so as to generate a plurality of training samples.
Specifically, in step S530, it is confirmed that the index scan number record value and the index fragment rate record value included in the feature value are used to add a flag for performing the database index rebuilding to the feature value, or a flag for not performing the database index rebuilding to the feature value is added.
For example, when the index fragmentation rate record value is greater than a preset fragmentation rate threshold (e.g., 20%), and the index scan number record value is greater than a preset scan number threshold (e.g., 10), adding a flag for the feature value that requires the database index reconstruction to be performed;
and when the index fragment rate record value is smaller than or equal to a preset fragment rate threshold value and/or the index scanning frequency record value is smaller than or equal to a preset scanning frequency threshold value, adding a mark for the characteristic value, wherein the mark does not need to execute database index reconstruction.
Further, in another embodiment, a database index rebuilding history is obtained, and according to the database index rebuilding history, it is confirmed that a flag for executing the database index rebuilding is added to the feature value, or a flag for not executing the database index rebuilding is added to the feature value.
Specifically, the database index rebuilding history record includes occurrence time corresponding to execution of database index rebuilding. Based on the database index reconstruction history, confirming whether the database index reconstruction is executed in a time period corresponding to the characteristic value; under the condition that the database index reconstruction is executed at the occurrence time corresponding to the characteristic value, adding a mark for executing the database index reconstruction for the characteristic value; and adding a mark which does not need to execute the database index reconstruction for the characteristic value under the condition that the database index reconstruction is not executed at the occurrence time corresponding to the characteristic value.
For example, in the database index rebuilding history for the database A1, since the database index rebuilding is executed at the 13 th day of 10 months of 2022 and the 13 th day of 9 minutes of 30 minutes of 55 seconds, the occurrence time of the database index rebuilding is within the period corresponding to the feature value T1, the mark for executing the database index rebuilding is added to the feature value T1, and the training sample X1 is generated.
Step S540, using a decision tree algorithm, using a machine learning platform, performing model training using a plurality of training samples to obtain a reconstructed judgment model.
Further, after step S540, the reconstruction judgment model is checked using the training sample. Specifically, the characteristic value of the training sample is input into the reconstruction judgment model, whether the output of the reconstruction judgment model is consistent with the characteristic value mark of the training sample is compared, if so, the output of the reconstruction judgment model is correct, and if not, the output of the reconstruction judgment model is incorrect. And using a plurality of training samples to test the reconstruction judgment model, obtaining the output accuracy of the reconstruction judgment model, and completing the training of the reconstruction judgment model when the output accuracy of the reconstruction judgment model reaches a preset standard (for example, 90%). If the output accuracy of the reconstruction judgment model does not reach the preset standard, continuing to train the reconstruction judgment model.
Table 2 shows sample data and test results of a training portion of a reconstruction judgment model according to an embodiment of the present application.
TABLE 2
ID Index fragmentation rate Index scan times Label (Label) Test results
1 9.98 1304 F F
2 1.8 34673 F F
3 34.16 525881304 T T
4 30.77 1735836929 T T
5 35.37 15116277 T T
6 32.78 29405877 T T
7 33.88 0 F F
8 16.73 52764284 F F
9 30.98 1065862400 T T
19 31.64 85170 T T
In an embodiment, as shown in table 2, the training samples include index fragment rates and usage times (index scanning times) of feature values in different rows, and labels in the same row are labels added to the feature values, and the test result is a determination result output by the reconstruction determination model after the feature values are input to the reconstruction determination model. And when the mark added for the characteristic value is consistent with the judgment result output by the reconstruction judgment model, indicating that the reconstruction judgment model is output correctly.
After step S210, step S211 is performed.
Step S211, the index scanning times and the index fragment rate in the related information of the database are input into the reconstruction judgment model, and the judgment result output by the reconstruction judgment model is obtained.
Specifically, in step S211, the index scan number (idx_scan) and the index fragment rate (leaf_fragment) of the database table (index_info) are input to the reconstruction judgment model, and the judgment result output by the reconstruction judgment model is obtained, where the judgment result includes that the database index reconstruction (T) needs to be performed or that the database index reconstruction (F) does not need to be performed.
Recording the judgment result output by the reconstruction judgment model into a database table (index_info) (corresponding to the is_redox of the database table). Specifically, when the reconstruction judgment model outputs T, recording T in the is_redox of the database table; when the reconstruction judgment model outputs F, F is recorded in the is_redox of the database table.
Further, in the case that the database table (index_info) includes a plurality of lines of contents, the database table is
The index scan number (idx_scan) and the index fragment rate (leaf_fragment) of each line (index_info) are input to the reconstruction judgment model, and the judgment result corresponding to each line is acquired and recorded.
Table 3 shows a portion of data obtained by filling the database table (index_info) shown in table 1 with the determination result according to an embodiment of the present application.
TABLE 3 Table 3
Figure BDA0003996255800000101
In one embodiment, as shown in Table 3, the columns of the database table (index_info) include table name, index size, index density, index fragmentation rate, index scan number, and whether the database index needs to be rebuilt.
According to the method, whether the database index needs to be rebuilt or not is judged based on the rebuilding judgment model, and timeliness of rebuilding the database index is improved.
Further, after step S211, the row with is_redox F in the database table (index_info) is deleted to reduce the data amount of the database table (index_info).
After step S211, when the determination result output by the reconstruction determination model is that the database index reconstruction is not required (i_redox of each line in the database table (index_info) is F (or the database table (index_info) is empty)), step S212 is performed. Step S212, the process of the method for reconstructing the database index is ended.
After step S211, when the determination result output by the reconstruction determination model is that the database index reconstruction is required (i_redox of a certain row or a few rows in the database table (index_info is T), step S220 is performed. In step S220, the index reconstruction determination module 120 invokes the reconstruction duration prediction model loaded by the model center 140, where the reconstruction duration prediction model is used to predict the time consumed by the prediction server for database index reconstruction.
In an application scenario, there are many time-consuming factors that affect the server's reconstruction of the database index. In an embodiment of the present application, factors affecting the index rebuilding duration are analyzed according to database maintenance experience, and index size, index density, CPU core number, server IOPS, database work_mem parameter, main work_mem parameter are screened out as significant feature values affecting time consumption of the server for performing database index rebuilding, and non-significant features such as table size, table number, etc. are discarded.
Specifically, in one embodiment, the rebuilding duration prediction model predicts the time consumed by the database index rebuilding by the server according to the index size, the index density, the server processor configuration, the number of read-write operations that can be performed by the server per second, the memory size of the server, and the memory parameter setting of the server.
Specifically, in one embodiment, the reconstruction duration prediction model is a multiple linear regression algorithm model. Specifically, the reconstruction duration prediction model is a supervised learning-regression algorithm-multiple linear regression algorithm model.
FIG. 5 is a flow chart of a method of training a reconstructed time duration prediction model according to an embodiment of the present application.
Prior to S220, the electronic device trains a reconstructed duration prediction model based on the following procedure shown in fig. 5.
In step S710, a database index rebuilding history of the database and/or other databases is obtained, where the database index rebuilding history includes a database index rebuilding duration record value and an occurrence time of executing the database index rebuilding.
Step S720, obtaining database history data, wherein the database history data comprises index size history records, index density history records, processor configuration records of a server, read-write operation times records which can be performed by the server every second, memory size records of the server and memory parameter setting records of the server of a database and/or other databases.
The index size history record includes index size record values and corresponding time periods. Specifically, in the process of running the database by the server, when the index size of the database is changed due to the change of the database data, the server records the latest size of the index into the index size history record and records the occurrence time of the change of the index size record value. The time period between the occurrence time of the change of the index size before and after the index size record value is the time period corresponding to the index size record value.
The index density history record includes index density record values and corresponding time periods. Specifically, in the process of running the database by the server, when the index density of the database changes due to the change of the database data, the server records the latest density of the index into the index size history record and records the occurrence time of the change of the index density record value. The time period between the occurrence time of the change of the index density before and after the index density record value is the time period corresponding to the index density record value.
The processor configuration record of the server includes the processor configuration record value of the server and the corresponding time period. Specifically, when the processor configuration of the server changes (for example, upgrading the processor with 2 cores to the processor with 3 cores), the server records the latest processor configuration into the processor configuration record and simultaneously records the occurrence time of the change of the processor configuration record value. The time period between the occurrence time of the change of the processor configuration record value before and after the processor configuration record value is the time period corresponding to the processor configuration record value.
The record of the number of the read-write operations which can be performed by the server per second comprises a record value of the number of the read-write operations which can be performed by the server per second and a corresponding time period. Specifically, when the number of times of the read/write operations that can be performed per second of the server changes (for example, a hard disk is upgraded, and a hard disk with a higher read/write speed is configured), the server records the latest number of times of the read/write operations that can be performed per second into the configuration record of the processor, and simultaneously records the occurrence time of the change of the recording value of the number of times of the read/write operations that can be performed per second. The time period between the occurrence time of the change of the recording value of the number of the read-write operations which can be performed every second before and after the recording value of the number of the read-write operations which can be performed every second is the time period corresponding to the recording value of the number of the read-write operations which can be performed every second.
The server memory size record includes a server memory size record value and a corresponding time period. Specifically, when the memory size of the server changes (for example, the memory is upgraded, and a larger capacity memory is configured), the server records the latest memory size into the processor configuration record and records the occurrence time of the change of the memory size record value. The time period between the occurrence time of the change of the memory size recorded values before and after the memory size recorded value is the time period corresponding to the memory size recorded value.
The server memory parameter setting record comprises a server memory parameter setting record value and a corresponding time period. Specifically, when the memory parameter setting of the server changes (e.g., more memory is configured for maintaining the database), the server records the latest memory parameter setting into the processor configuration record and simultaneously records the occurrence time of the change of the memory parameter setting record value. The time period between the occurrence time of the change of the two memory parameter setting record values before and after the memory parameter setting record value is the time period corresponding to the memory parameter setting record value.
Further, in some application scenarios, one or more of the processor configuration, the number of read/write operations that can be performed per second, the memory size, and the memory parameter setting of the server are intrinsic attributes of the server, and no consideration is given to the scenario in which the change occurs, and in this application scenario, the non-changing items in the processor configuration, the number of read/write operations that can be performed per second, the memory size, and the memory parameter setting do not have a corresponding time period (or the corresponding time period is valid for a long period).
Further, since the index size, the index density, the server processor configuration, the number of read/write operations that can be performed by the server per second, the memory size of the server, and the memory parameter setting of the server predict that the data level between the servers is large (for example, the index size is MB level, the memory size and the memory parameter are GB level), in step S720, the record data in the database history data is normalized, for example, the values of the index size, the memory size, and the memory parameter setting fall within the interval of-1 to 0.
For example, in an actual scenario, assuming that the physical upper limit of the server memory size is 32G, the corresponding standardized interval of the server memory size is set to-1 to 0 (-1 corresponds to 0G,0 corresponds to 32G). If the memory size of the server is 16G, the memory is standardized, and the standardized value is-0.5.
For example, in an actual scene, assuming that the upper limit of the index size is 500MB, the corresponding standardized section of the index size is set to-1 to 0 (-1 corresponds to 0MB, and 0 corresponds to 500 MB). If the index size is 20MB, it is normalized to a value of-0.96.
In step S730, a plurality of training samples are generated according to the database index reconstruction history, and each training sample includes a database index reconstruction duration record value.
Step S740, adding characteristic values for a plurality of training samples according to the historical data of the database, wherein: the characteristic values comprise an index density record value, an index size record value, a processor configuration record value of the server, a record value of the number of times of reading and writing operations which can be performed by the server every second, a memory size record value of the server and a memory parameter setting record value of the server.
In the same training sample, the database index rebuilding time length record value, the index density record value, the index size record value, the server processor configuration record value, the server reading and writing operation times record value which can be carried out every second, the server memory size record value and the server memory parameter setting record value correspond to the same database and the same time period.
For example, in the database index rebuilding history for the database A1, the database index rebuilding is performed for 30 minutes and 55 seconds at 13 th 10/9/2022, and the execution time period of the database index rebuilding is 3 seconds.
In the index size history for the database A1, the index size record value is 50MB in a period of 30 minutes 14 seconds at 13 th 10 th month 9 th year 2022 to 31 minutes 26 seconds at 13 th 10 th month 9 th year 2022.
In the index density history for the database A1, the index density record value was 65.15 in a period of 30 minutes 15 seconds at 13 on 10 month and 9 days of 2022 to 31 minutes 27 seconds at 13 on 10 month and 9 days of 2022.
In the processor configuration record for the server of database A1, the processor configuration record value is 4 cores, 1.33GHZ. And, the processor configuration is valid for a long period of time, without a corresponding period of time.
In the number of times of writing operations that can be performed per second for the server of the database A1, the number of times of writing operations that can be performed per second is recorded to be 1000. In addition, the number of times of the read-write operation which can be performed per second is effective for a long time, and no corresponding time period exists.
In the memory size record for the server of database A1, the memory size record value is 16G. And the memory size is valid for a long time, and no corresponding time period exists.
In the memory parameter setting record for the server of the database A1, the memory parameter setting record value is 8G. And, the memory parameter setting is valid for a long time, and no corresponding time period exists.
A training sample X2 is created, the training sample X2 comprising a database index rebuilding execution time period of 3 seconds. And adding a characteristic value T2 to the training sample X2, wherein the characteristic value T2 comprises an index size record value 50MB, an index density record value 65.15, a processor configuration record value 4 kernel/1.33 GHZ, a read-write operation number record value 1000 which can be performed per second, a memory size record value 16G and a memory parameter setting record value 8G.
Step S750, using a multiple linear regression algorithm, performing model training by using a machine learning platform and using a plurality of training samples to obtain a reconstruction duration prediction model.
Further, after step S750, the reconstructed time length prediction model is checked using the training samples. Specifically, the characteristic value of the training sample is input into the reconstruction time length prediction model, whether the difference value between the prediction time consumption of the output of the reconstruction judgment model and the database index reconstruction time length record value of the training sample is within a preset range is compared, if the difference value is within the preset range, the output of the reconstruction judgment model is correct, and if the difference value is outside the preset range, the output of the reconstruction judgment model is wrong. Specifically, when the difference between the predicted time consumption and the database index reconstruction time length record value is smaller than 10% of the database index reconstruction time length record value, the difference between the predicted time consumption and the database index reconstruction time length record value is within a preset range.
FIG. 6 is a training and predictive trend graph of a prediction model for a reconstructed time period according to one embodiment of the present application.
As shown in fig. 6, the prediction result of the reconstruction time length prediction model and the database index reconstruction time length record value of the training sample tend to be consistent.
Table 4 shows a portion of training data for a reconstructed duration prediction model according to an embodiment of the present application.
TABLE 4 Table 4
Figure BDA0003996255800000131
Figure BDA0003996255800000141
As shown in table 4, the index size, index density, CPU, IOPS, memory size (mem 1), and memory parameter setting (mem 2) are normalized data. time is the database index reconstruction duration record value of the training sample. The index size, the index density, the CPU, the IOPS, the memory size (mem 1) and the memory parameter setting (mem 2) are input into a rebuilding duration prediction model, and the rebuilding duration prediction model outputs a test result.
As shown in table 4, the deviation between the test result and the time is large, and training of the reconstruction duration prediction model needs to be continued.
And using a plurality of training samples to check the reconstruction time length prediction model, obtaining the output accuracy of the reconstruction time length prediction model, and finishing the training of the reconstruction time length prediction model when the output accuracy of the reconstruction time length prediction model reaches a preset standard (for example, 90%). If the output accuracy of the reconstruction duration prediction model does not reach the preset standard, continuing to train the reconstruction duration prediction model.
After step S220, step S221 is performed.
In step S221, the index size, index density, processor configuration, number of read/write operations that can be performed by the server per second, the memory size of the server, and the memory parameter setting of the server are input into the reconstruction duration prediction model, and the prediction time consumption (in seconds in one embodiment) output by the reconstruction duration prediction model is obtained.
Specifically, in step S221, indexsize, leaf _ density, CPU, IOPS, work _mem and main_work_mem of the row recorded as T in the database table (index_info) are input to the reconstruction duration prediction model, the prediction time consumption output by the reconstruction duration prediction model is obtained, and the prediction time consumption is recorded to the esti_duration.
Further, in the case where the database table (index_info) includes a plurality of lines of is_redox as T, indexsize, leaf _ density, CPU, IOPS, work _mem and main_work_mem of each line of the database table (index_info) whose is_redox as T are input to the reconstruction period prediction model, and prediction time consumption of each line is acquired and recorded to the esti_duration of the corresponding line, respectively.
After step S211, when the determination result output by the reconstruction determination model indicates that the database index reconstruction is required, step S230 is performed in addition to the predicted time consumption for obtaining the reconstruction duration prediction model output in step S220 and step S221.
In step S230, the index rebuilding determination module 120 invokes the traffic trough period prediction model loaded by the model center 140, where the traffic trough period prediction model is used to predict the subsequent change of the front end service interface connection number of the database according to the monitoring information of the front end service interface connection number of the database.
Specifically, in one embodiment, the traffic trough period prediction model is a K nearest neighbor regression algorithm (K-NearestNeighbor, KNN) model.
FIG. 7 is a flow chart illustrating a method of training a traffic trough period prediction model according to an embodiment of the present application.
Prior to step S230, the electronic device trains the traffic trough period prediction model based on the following flow shown in fig. 7.
Step S810, a monitoring history record of the front-end service interface connection number of the database and/or other databases is obtained, wherein the monitoring history record of the front-end service interface connection number comprises a front-end service interface connection number record value and a corresponding time period.
Specifically, in step S810, monitoring data about the number of front-end service interfaces in the monitoring platform (e.g., zabbix) is queried, and monitoring data of the last year is obtained.
In step S820, host access history data of the database and/or other databases is obtained, where the host access history data includes host addresses of the access databases and/or other databases and occurrence time of the access.
In step S830, a training sample set including a plurality of training samples is generated according to the monitoring history of the front-end service interface connection number, and each training sample in the training sample set includes a front-end service interface connection number record value.
Step S840, adding feature values to the training samples in the training sample set according to the host access history data, wherein:
the characteristic value comprises a host address of a host for accessing and the occurrence time of the access;
the host address and the front-end service interface connection number record value in the same training sample correspond to the same database;
the access time in the same training sample is within the time period corresponding to the front-end service interface connection number record value.
For example, in the monitoring history for the front-end service interface connection number of the database A1, the front-end service interface connection number record value is 5 in a period of 2022, 10, 9, 12 minutes, 55 seconds to 2022, 10, 9, 12 minutes, 35 minutes, 27 seconds.
In the host access history data for the database A1, the 2022, 10, 9, 12 points, 32 minutes, and 14 seconds are recorded, and the host with the host address Z1 accesses the database.
And establishing a training sample X3, wherein the training sample X3 comprises a front-end service interface connection number record value of 5. A feature value T3 is added to the training sample X3, and the feature value T3 includes the host address Z1 and the access time 2022, 10, 9, 12 points, 32 minutes and 14 seconds.
Further, in an embodiment, a period of time corresponding to a record value of the front end service interface connection number in the monitoring history of the access time of the host access history data and the front end service interface connection number is converted into a periodic description mode with a period of years.
Specifically, the time is converted into what week the week is (52 weeks a year), what day the weekday is (7 days a week), what hour the hour is (24 hours a day), and what minute the minutes are (60 minutes a hour). For example, 5 points 10 at 1 month 6 of 2022 are converted into week (1) weekday (6) hours (5) minutes (10).
Step S850, using a K nearest neighbor regression algorithm, performing model training by using a training sample set by using a machine learning platform to obtain a flow valley time period prediction model.
Further, after step S850, the traffic trough period prediction model is checked using the training sample. Specifically, the characteristic value of the training sample is input into a traffic low-valley time period prediction model, whether the difference value between the predicted value of the front end service interface connection number of the output of the traffic low-valley time period prediction model and the recorded value of the front end service interface connection number of the training sample is within a preset range is compared, if the difference value is within the preset range, the output of the traffic low-valley time period prediction model is correct, and if the difference value is outside the preset range, the output of the traffic low-valley time period prediction model is incorrect. Specifically, when the difference between the predicted value and the front-end service interface connection number record value is smaller than 10% of the front-end service interface connection number record value, the difference between the predicted value and the front-end service interface connection number record value is within a preset range.
Table 5 shows a portion of training data for a traffic trough period prediction model according to an embodiment of the present application.
host week weekday hour minute value Test results
0 13 6 23 5 152.0367 159.20088
0 13 6 23 5 155.8981 159.20088
0 13 6 23 5 162.8581 159.20088
0 13 6 23 5 168.9614 159.20088
0 13 6 23 4 233.772 293.86074
0 13 6 23 4 220.3243 293.86074
0 13 6 23 4 241.0623 293.86074
0 13 6 23 4 215.5407 293.86074
0 13 6 23 3 261.03 296.36378
0 13 6 23 3 255.8694 296.36378
0 13 6 23 3 271.6318 296.36378
0 13 6 23 3 260.4238 296.36378
0 13 6 23 2 262.7971 283.79818
0 13 6 23 2 271.9616 283.79818
As shown in Table 5, the feature values (access occurrence time) of the training samples are converted into
The format of week/weekday/hour/minute, 2 nd to 5 minutes of 23 rd hour on the sixth day of week 13. The front-end service interface connection number record value (value) and the output value (test result) of the traffic low-valley time period prediction model tend to be consistent, and the traffic low-valley time period prediction model can be used.
Furthermore, in the practical application scenario, for different services of the database, the periodic variation rule of the connection number of the front-end service interface is also different due to different service characteristics. In view of the above features, in an embodiment, a training sample set is split into a plurality of sub-training sample sets based on different database services, each sub-training sample set corresponding to one database service; model training is performed by using the sub-training sample set to obtain a traffic trough period prediction model corresponding to the database service.
After step S230, step S231 is performed.
Step S231, the monitoring information of the front end service interface connection number of the database in the related information of the database is input into the flow low-valley time period prediction model, and the prediction result of the subsequent change of the front end service interface connection number output by the flow low-valley time period prediction model is obtained.
Specifically, in one embodiment, the host address of the host currently accessing the database is obtained according to the monitoring information of the front end service interface connection number of the database, and is recorded in the hostinfo of the database table (index_info). And inputting the host address recorded by the hostingfo of the database table (index_info) into a traffic trough time period prediction model, and obtaining a front-end service interface connection number predicted value and a time period corresponding to the predicted value output by the traffic trough time period prediction model.
Step S240, predicting a plurality of operable time periods of the server according to the prediction result, wherein the time period of which the front end service interface connection number of the database is smaller than the preset connection number threshold value is the operable time period of the server.
Specifically, comparing the predicted value of the connection number of the front-end service interface with a preset threshold value of the connection number, and screening the predicted value of the connection number of the front-end service interface smaller than the preset threshold value of the connection number from the predicted values of the connection number of the front-end service interface.
The preset connection number threshold is a value set according to the processing resources of the server, and when the connection number of the front-end service interface is larger than or equal to the connection number threshold, the processing resources of the server are insufficient for reconstructing the database index; only when the connection number of the front-end service interface is less than or equal to the connection number threshold value, the processing resource of the server is enough to reconstruct the database index. Specifically, the connection number threshold takes a value between 50 and 300.
Predicting the operable time period of the server according to the time period corresponding to the front-end service interface connection number predicted value smaller than the preset connection number threshold, wherein when a plurality of time periods corresponding to the front-end service interface connection number predicted value smaller than the preset connection number threshold are adjacent time periods, the adjacent time periods are spliced to form the operable time period; when the time period corresponding to the front-end service interface connection number predicted value smaller than the preset connection number threshold is an independent time period, the independent time period is taken as an operable time period.
After steps S220 to S221 and steps S230 to S240 are performed, the predicted time consumption outputted by the reconstruction duration prediction model is obtained, and a plurality of operable time periods of the server are predicted, step S250 is performed.
Step S250, selecting a reconstructable operation time period from the predicted plurality of operation time periods according to the predicted time consumption output by the reconstruction time period prediction model, wherein the reconstructable operation time period is an operation time period with the first time period more than or equal to the predicted time consumption after the current time.
For example, in one embodiment, the predicted required duration for rebuilding the database index is TS, and the predicted period of time available for operational use for rebuilding the database index includes period T1, period T2, period T3, and period T4 from the current time. Wherein, the time period T1 and the time period T4 are smaller than the time length TS. The time periods T2, T3 are greater than the time period TS. To avoid database index rebuild failure, the database index rebuild needs to be performed continuously. Thus, the period T2 or the period T3 is selected as a job period for performing the rebuilding of the database index. Further, in order to avoid database traffic errors, it is necessary to reconstruct the database index as soon as possible, and therefore, the period T2 is selected as the working period for reconstructing the database index.
Further, in another embodiment, when no predicted time-consuming operable period exists among the predicted operable periods available for performing the database index rebuilding, the index rebuilding determination module 120 outputs a reminder to the administrator, and the administrator designates a period for performing the index rebuilding.
According to the method, the operation time period for reconstructing the database index can be optimized, and the influence of the reconstruction of the database index on the execution of other business of the database is effectively avoided.
Step S260 instructs the server to perform database index rebuilding in the rebuildable operation time period.
Specifically, in step S260, the index rebuilding determination module 120 formulates (time-task) a planned job according to the selected period of time that can be worked. For example, the data source information, the rebuild index statement, and the rebuild start time are formed into an executable script to be added to the job schedule. The server's scheduled task system performs the scheduled job formulated by the index rebuild determination module 120 based on the python code interfacing to the server's scheduled task system.
Specifically, the server reorganizes or regenerates the index during the database index reconstruction process.
Specifically, reorganizing the index includes reordering data in an outermost data page of the database and compressing the index page, thereby reducing the amount of data in the index.
Regenerating the index includes deleting the original index of the database and generating a new index. In the process of regenerating the index, all index fragments in the original index are deleted, and the logical ordering of the newly generated index is consistent with the physical ordering of the data storage in the database, so that the index retrieval speed is greatly improved.
When the server rebuilds the database index, one of reorganizing the index and regenerating the index is judged to be specifically selected according to the state of the current database to rebuild the database index. For example, compared with the last database index reconstruction, when the updated data amount in the database exceeds 10% of the total data amount, the database index reconstruction is performed by adopting a regeneration index mode; compared with the last database index reconstruction, when the updated data volume in the database is less than or equal to 10% of the total data volume, the database index reconstruction is performed by adopting a reorganization index mode.
Specifically, in steps S231 to S260, a period table (session_info) is created. The period table (session_info) includes the columns: hostinfo, dbname, week, weekday, hour, minute, sessions, esti_load. week, weekday, hour, minute column data is divided by 52 weeks and converted into weeks, days, hours, minutes.
And acquiring a front-end service interface connection number predicted value of a time period of one month in the future output by the traffic trough time period prediction model, and counting the sessions of a time period table (session_info).
Judging whether the value of the session exceeds the connection number threshold, and if the value of the session exceeds the connection number threshold, recording high in the esti_load of the period table (session_info). If the value of the session does not exceed the connection number threshold, a low is recorded in the esti_load of the session_info.
Each value of esti_duration in the database table (index_info) is matched to the period of time when esti_load is low. Specifically, the esti_duration in the database table (index_info) is converted into minutes, and the continuous time with the esti_load being low constitutes a time interval, and the time interval is recorded to the esti_starttime and the esti_endtime of the database table (index_info). When the esti_duration is less than the esti_endtime-esti_starttime, the esti_starttime is taken as the starting time of the index rebuild.
An index rebuild script is created to add the hostinfo, rebuild _sql (fixed syntax+index name: rebuild index concurrently indexname, for example) and esti_starttime times of the index to the job schedule (crontab).
Further, it is contemplated that the index rebuild determination module 120 is based on a predicted operable period (a predicted low-valley period of traffic) for performing database index rebuild and further selects an operable period for performing database index rebuild.
Therefore, in order to avoid that the available working time period for rebuilding the database index is mispredicted, the rebuilding of the database index affects the normal operation of other database services, after S260, before the server rebuilds the database index in the available working time period, it is necessary to confirm whether the load state of the server implementing the database service meets the requirement of rebuilding the database index.
Specifically, after step S260, the server performs the following steps:
at the beginning of the reconstructable job period, the current load condition of the server is obtained.
And judging whether the current load condition of the server meets the execution requirement of reconstructing the database index.
And when the current load condition of the server meets the execution requirement of the rebuilding database index, rebuilding the database index in the rebuildable operation time period. .
And when the current load condition of the server does not meet the execution requirement of the reconstruction database index, canceling the database index reconstruction operation.
Specifically, in one embodiment, it is determined whether the current server load value (e.g., the system resource occupancy percentage of the current load) exceeds a preset X value (X value is defined according to the traffic busyness, e.g., 60%). When the current server load value exceeds a preset X value, judging that the current load condition of the server does not meet the execution requirement of reconstructing the database index.
Specifically, the server invokes a job scheduling (crontab) to execute an index rebuilding script, the index rebuilding script queries the current server load (load average), when load average > X, the job of the index rebuilding script is canceled, otherwise, the index rebuilding script is executed (index rebuild_sql is executed). And writing the actual reconstruction start time and the end time into the starttime and the endtime of the index_info table.
Further, in another embodiment, the current load condition of the database server is detected, and if the load in the server 5m is lower than 1 and the number of server connections is lower than 100, it is determined that the current load condition of the server meets the execution requirement of rebuilding the database index.
According to the method, whether the database index needs to be rebuilt or not can be automatically judged by the rebuilding judgment model, and the operation time period for rebuilding the database index is optimized based on the rebuilding duration prediction model and the flow low-valley time period prediction model, so that timeliness of rebuilding the database index is improved, and the fact that the rebuilding of the database index affects execution of other business of the database is effectively avoided.
In the description of the embodiments of the present application, for convenience of description, the apparatus is described as being divided into various modules by functions, where the division of each module is merely a division of a logic function, and the functions of each module may be implemented in one or more pieces of software and/or hardware when the embodiments of the present application are implemented.
In particular, the apparatus according to the embodiments of the present application may be fully or partially integrated into one physical entity or may be physically separated when actually implemented. And these modules may all be implemented in software in the form of calls by the processing element; or can be realized in hardware; it is also possible that part of the modules are implemented in the form of software called by the processing element and part of the modules are implemented in the form of hardware. For example, the detection module may be a separately established processing element or may be implemented integrated in a certain chip of the electronic device. The implementation of the other modules is similar. In addition, all or part of the modules can be integrated together or can be independently implemented. In implementation, each step of the above method or each module above may be implemented by an integrated logic circuit of hardware in a processor element or an instruction in a software form.
Those of ordinary skill in the art will appreciate that the various elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as a combination of electronic hardware, computer software, and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the apparatus, the apparatus and the units described above may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
It will be apparent to those skilled in the art that embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media having computer-usable program code embodied therein.
In several embodiments provided herein, any of the functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application.
Specifically, in an embodiment of the present application, there is further provided a computer readable storage medium, where a computer program is stored, when the computer program is executed on a computer, to cause the computer to perform the method provided in the embodiment of the present application.
An embodiment of the present application also provides a computer program product comprising a computer program which, when run on a computer, causes the computer to perform the method provided by the embodiments of the present application.
The description of embodiments herein is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices), and computer program products according to embodiments herein. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In the embodiments of the present application, the term "at least one" refers to one or more, and the term "a plurality" refers to two or more. "and/or", describes an association relation of association objects, and indicates that there may be three kinds of relations, for example, a and/or B, and may indicate that a alone exists, a and B together, and B alone exists. Wherein A, B may be singular or plural. The character "/" generally indicates that the context-dependent object is an "or" relationship. "at least one of the following" and the like means any combination of these items, including any combination of single or plural items. For example, at least one of a, b and c may represent: a, b, c, a and b, a and c, b and c or a and b and c, wherein a, b and c can be single or multiple.
In the present embodiments, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
All embodiments in the application are described in a progressive manner, and identical and similar parts of all embodiments are mutually referred, so that each embodiment mainly describes differences from other embodiments. In particular, for the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments in part.
The foregoing is merely a specific embodiment of the present application, and any person skilled in the art may easily think of changes or substitutions within the technical scope of the present application, and should be covered in the scope of the present application. The protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method of reconstructing a database index, the method being applied to an electronic device, the method comprising:
collecting relevant information of a database, wherein the relevant information of the database comprises the following steps: the method comprises the steps of monitoring information of the index size of a database, the index scanning times of the database, the index fragmentation rate of the database, the index density of the database, the processor configuration of a server for realizing the database, the number of read-write operations which can be performed by the server per second, the memory size of the server, the memory parameter setting of the server and the front-end service interface connection number of the database;
Invoking a reconstruction judgment model, wherein the reconstruction judgment model is used for judging whether database index reconstruction is needed according to the index fragment rate and the index scanning times;
inputting the index scanning times and the index fragment rate into the reconstruction judgment model to obtain a judgment result output by the reconstruction judgment model, wherein the judgment result comprises a database index reconstruction required or a database index reconstruction not required;
ending the flow of the method for reconstructing the database index when the judgment result is that the database index reconstruction is not needed;
when the judging result is that the database index reconstruction is needed, a reconstruction time length prediction model is called, wherein the reconstruction time length prediction model is used for predicting the prediction time consumption of the server for carrying out the database index reconstruction according to the index size, the index density, the processor configuration of the server, the number of read-write operations which can be carried out per second by the server, the memory size of the server and the memory parameter setting of the server;
inputting the index size, the index density, the processor configuration, the number of read-write operations which can be performed by the server per second, the memory size of the server and the memory parameter setting of the server in the related information of the database into the reconstruction duration prediction model, and obtaining the predicted time consumption output from the reconstruction duration prediction model;
When the judging result is that the database index reconstruction is needed, a flow low-valley time period prediction model is called, wherein the flow low-valley time period prediction model is used for generating a prediction result of the follow-up change of the front-end service interface connection number of the database according to the monitoring information of the front-end service interface connection number of the database;
inputting monitoring information of the front-end service interface connection number of the database into the flow off-peak time period prediction model, and obtaining the prediction result output from the flow off-peak time period prediction model;
predicting a plurality of operable time periods of the server according to the prediction result, wherein the time period of which the front-end service interface connection number of the database is smaller than a preset connection number threshold value is the operable time period of the server;
selecting a reconstructable operation time period from the predicted plurality of operation time periods, wherein the reconstructable operation time period is a time-consuming operation time period with a first duration greater than or equal to the predicted time-consuming operation time period after the current time;
and indicating the server to reconstruct the database index in the reconstructable operation time period.
2. The method of claim 1, wherein the database index rebuild comprises:
Reorganizing an index, wherein the reorganizing the index comprises reordering data in an outermost data page of the database and compressing an index page;
or,
regenerating an index, wherein regenerating the index comprises deleting an original index of the database and generating a new index.
3. The method of claim 1, wherein the reconstruction judgment model is a decision tree algorithm model;
the method further comprises the steps of:
acquiring historical data of the database, wherein the historical data of the database comprises index scanning frequency historical records and index fragment rate historical records of the database and/or other databases, the index scanning frequency historical records comprise index scanning frequency record values and corresponding time periods, and the index scanning frequency historical records comprise index fragment rate record values and corresponding time periods;
generating a plurality of characteristic values according to the historical data of the database, wherein each characteristic value comprises an index scanning frequency record value and an index fragment rate record value which correspond to the same time period aiming at the same database;
adding a mark for executing the database index reconstruction to the characteristic value, or adding a mark for not executing the database index reconstruction to the characteristic value to generate a plurality of training samples;
And performing model training by using a decision tree algorithm and using a machine learning platform by using the plurality of training samples to obtain the reconstruction judgment model.
4. A method according to claim 3, characterized in that the method further comprises:
acquiring the database index reconstruction history record, wherein the database index reconstruction history record comprises occurrence time corresponding to the execution of the database index reconstruction;
the adding a flag for the eigenvalue that needs to perform the database index rebuilding, or adding a flag for the eigenvalue that does not need to perform the database index rebuilding, to generate a plurality of training samples includes:
based on the database index rebuilding history record, confirming whether the database index rebuilding is executed or not in a time period corresponding to the characteristic value;
when the database index reconstruction is executed in the time period corresponding to the characteristic value, adding a mark for executing the database index reconstruction for the characteristic value;
and when the database index reconstruction is not executed within the time period corresponding to the characteristic value, adding a mark which does not need to execute the database index reconstruction to the characteristic value.
5. A method according to claim 3, wherein the adding a marker for the eigenvalue that is needed to perform the database index rebuild or a marker for the eigenvalue that is not needed to perform the database index rebuild to generate a plurality of training samples comprises:
when the index fragment rate record value is larger than a preset fragment rate threshold value and the index scanning frequency record value is larger than a preset scanning frequency threshold value, adding a mark for executing the database index reconstruction for the characteristic value;
and when the index fragment rate record value is smaller than or equal to a preset fragment rate threshold value and/or the index scanning frequency record value is smaller than or equal to a preset scanning frequency threshold value, adding a mark which does not need to execute the database index reconstruction to the characteristic value.
6. The method of claim 1, wherein the reconstruction period prediction model is a multiple linear regression algorithm model;
the method further comprises the steps of:
acquiring database index reconstruction histories of the database and/or other databases, wherein the database index reconstruction histories comprise database index reconstruction time length record values and occurrence time of executing database index reconstruction;
Acquiring database historical data, wherein the database historical data comprises index size historical records, index density historical records, processor configuration records of a server, read-write operation times records which can be performed by the server every second, server memory size records and server memory parameter setting records, the index size historical records comprise index size record values and corresponding time periods, the index density historical records comprise index density record values and corresponding time periods, the processor configuration records of the server comprise processor configuration record values and corresponding time periods of the server, the read-write operation times records which can be performed by the server every second comprise read-write operation times records which can be performed by the server every second and corresponding time periods, the server memory size records comprise server memory size record values and corresponding time periods, and the server memory parameter setting records comprise server memory parameter setting record values and corresponding time periods;
generating a plurality of training samples according to the database index reconstruction history record, wherein each training sample comprises a database index reconstruction duration record value;
Adding feature values to the plurality of training samples according to the database history data, wherein: the characteristic values comprise the index density record value, the index size record value, a processor configuration record value of the server, a read-write operation number record value which can be carried out by the server every second, a memory size record value of the server and a memory parameter setting record value of the server; in the same training sample, the database index rebuilding time length record value, the index density record value, the index size record value, the server processor configuration record value, the server reading and writing operation times record value which can be carried out every second, the server memory size record value and the server memory parameter setting record value correspond to the same database and the same time period;
and performing model training by using a multiple linear regression algorithm and a machine learning platform by using the plurality of training samples to obtain the reconstruction duration prediction model.
7. The method of claim 1, wherein the traffic trough period prediction model is a K nearest neighbor regression algorithm model;
The method further comprises the steps of:
acquiring a monitoring history of the front-end service interface connection number of the database and/or other databases, wherein the monitoring history of the front-end service interface connection number comprises a front-end service interface connection number record value and a corresponding time period;
acquiring host access history data of the database and/or other databases, wherein the host access history data comprises host addresses for accessing the database and/or other databases and occurrence time of the access;
generating a training sample set comprising a plurality of training samples according to the monitoring history of the front-end service interface connection number, wherein each training sample comprises a front-end service interface connection number record value;
adding a characteristic value to the training sample according to the host access history data, wherein: the characteristic value comprises the host address and the occurrence time of the access; the host address and the front-end service interface connection number record value in the same training sample correspond to the same database, and the access time is within a time period corresponding to the front-end service interface connection number record value;
and performing model training by using a K nearest neighbor regression algorithm and a machine learning platform by using the training sample set to obtain the traffic trough time period prediction model.
8. The method of claim 7, wherein the model training using the training sample set to obtain the traffic trough period prediction model comprises:
splitting the training sample set into a plurality of sub-training sample sets based on different services of the database, wherein each sub-training sample set corresponds to one service of the database;
model training is performed by using the sub-training sample set to obtain the traffic trough period prediction model corresponding to the traffic of the database.
9. The method of any of claims 1-8, wherein the instructing the server to perform the database index rebuild after the rebuildable job time period, the method further comprises:
acquiring the current load condition of the server at the starting time of the reconstructable operation time period;
and when the current load condition of the server meets the execution requirement of the rebuilding database index, rebuilding the database index in the rebuildable operation time period.
10. An electronic device comprising a memory for storing computer program instructions and a processor for executing the computer program instructions, wherein the computer program instructions, when executed by the processor, trigger the electronic device to perform the method steps of any of claims 1-9.
CN202211614518.5A 2022-12-13 2022-12-13 Method for reconstructing database index and electronic equipment Pending CN116303417A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211614518.5A CN116303417A (en) 2022-12-13 2022-12-13 Method for reconstructing database index and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211614518.5A CN116303417A (en) 2022-12-13 2022-12-13 Method for reconstructing database index and electronic equipment

Publications (1)

Publication Number Publication Date
CN116303417A true CN116303417A (en) 2023-06-23

Family

ID=86822819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211614518.5A Pending CN116303417A (en) 2022-12-13 2022-12-13 Method for reconstructing database index and electronic equipment

Country Status (1)

Country Link
CN (1) CN116303417A (en)

Similar Documents

Publication Publication Date Title
CN110287052B (en) Root cause task determination method and device for abnormal task
CN109271321B (en) Method and device for counting contribution code number
CN109271435B (en) Data extraction method and system supporting breakpoint continuous transmission
CN111381970B (en) Cluster task resource allocation method and device, computer device and storage medium
CN114661994B (en) User interest data processing method and system based on artificial intelligence and cloud platform
CN118152124A (en) Data processing method and system based on cloud computing
CN111125018B (en) File exception tracing method, device, equipment and storage medium
CN112559641B (en) Pull chain table processing method and device, readable storage medium and electronic equipment
CN112395307A (en) Statement execution method, statement execution device, server and storage medium
CN113901037A (en) Data management method, device and storage medium
CN115329011A (en) Data model construction method, data query method, data model construction device and data query device, and storage medium
CN116303417A (en) Method for reconstructing database index and electronic equipment
US20150112947A1 (en) System and method for database flow management
CN111435327B (en) Log record processing method, device and system
CN108664499B (en) Data storage method, device and equipment
CN113268477B (en) Data table cleaning method and device and server
CN112765170B (en) Embedded time sequence data management method and device
CN104317820A (en) Statistical method and device of report
CN115759742A (en) Enterprise risk assessment method and device, computer equipment and storage medium
CN113419957A (en) Rule-based big data offline batch processing performance capacity scanning method and device
CN117785272B (en) Code statistics method and system
CN113362007B (en) Zero inventory record processing method
CN116610729B (en) Database intelligent statistical information management method, system, equipment and medium
CN111324434B (en) Configuration method, device and execution system of computing task
US20180046966A1 (en) System and method for analyzing and prioritizing issues for automation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination