WO2023000766A1 - 数据库模型动态调整的方法、设备及存储介质 - Google Patents

数据库模型动态调整的方法、设备及存储介质 Download PDF

Info

Publication number
WO2023000766A1
WO2023000766A1 PCT/CN2022/091510 CN2022091510W WO2023000766A1 WO 2023000766 A1 WO2023000766 A1 WO 2023000766A1 CN 2022091510 W CN2022091510 W CN 2022091510W WO 2023000766 A1 WO2023000766 A1 WO 2023000766A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
database model
database
model
current
Prior art date
Application number
PCT/CN2022/091510
Other languages
English (en)
French (fr)
Inventor
王爱军
郑星权
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 filed Critical 中兴通讯股份有限公司
Publication of WO2023000766A1 publication Critical patent/WO2023000766A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases

Definitions

  • the embodiments of the present application relate to but are not limited to the field of database technology, and in particular, relate to a method, device and storage medium for dynamically adjusting a database model.
  • the database model constructed by this method is usually not applicable to the management of various types of data. For example, in managing the performance data of network elements of various standards, there will be deviations in the runtime data and models, which will cause the database to fail to perform well.
  • Embodiments of the present application provide a method, device and storage medium for dynamically adjusting a database model.
  • the embodiment of the present application provides a method for dynamically adjusting the database model, including: optimizing the data processing process corresponding to the managed data stored in the database model, and obtaining the data processing process corresponding to the optimized data processing process.
  • the current measurement index data perform data analysis on the current measurement index data to obtain an analysis result; drive the database model to perform data adjustment according to the analysis result; perform the optimization process on the adjusted database model to obtain Updating the current metric data until the current metric data is the same as the preset target metric data.
  • the embodiment of the present application also provides a device, including: a memory, a processor, and a computer program stored on the memory and operable on the processor, when the processor executes the computer program, the above-mentioned
  • the database model dynamic adjustment method is provided.
  • the embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions, and the computer-executable instructions are configured to execute the method for dynamically adjusting the database model as described above.
  • FIG. 1 is a flowchart of a method for dynamically adjusting a database model provided by an embodiment of the present application
  • Fig. 2 is a flow chart of optimizing the data processing process in the method for dynamically adjusting the database model provided by one embodiment of the present application;
  • Fig. 3 is a flow chart of optimizing the data processing process in the method for dynamically adjusting the database model provided by another embodiment of the present application;
  • Fig. 4 is a flow chart of obtaining time series distribution results in the method for dynamically adjusting the database model provided by one embodiment of the present application;
  • FIG. 5 is a flow chart of partitioning managed data in a method for dynamically adjusting a database model provided by an embodiment of the present application
  • FIG. 6 is a flow chart of data adjustment of the database model in the method for dynamically adjusting the database model provided by an embodiment of the present application;
  • FIG. 7 is a flow chart of obtaining resource distribution results in a method for dynamically adjusting a database model provided by another embodiment of the present application.
  • Fig. 8 is a schematic diagram of an apparatus for dynamically adjusting a database model provided by an embodiment of the present application.
  • the management and analysis of massive data is usually based on a database model, and the modeling of the database is driven by a rule-based model.
  • the database model constructed by this method is usually not applicable to the management of various types of data. It is understandable that due to the difference in the data volume and data type of the managed data, the database model will also have differences in the data processing process of the managed data. Therefore, the database model built based on a rule-driven model is usually It cannot be applied to various scene environments.
  • a database model is built based on a rule-based model (that is, it can be understood as a database model built based on expert experience)
  • the database model When directly used as a benchmark model, it is easy to have deviations in the runtime data and the model, which will cause the database to fail to perform well, and it will easily lead to low efficiency of the application system in the managed data, unreasonable resource usage and configuration, etc.
  • the present application provides a method, device and storage medium for dynamically adjusting the database model.
  • the performance of the optimized and adjusted database model is better, and it can be applied to multiple.
  • the management and analysis of various types of data solves the problem that the database cannot perform well due to deviations in the runtime data and models in related technologies.
  • this application mainly optimizes the managed data stored in the database model from two stages.
  • the first stage is to optimize the data processing process corresponding to the managed data stored in the database model;
  • the second stage is to adjust the data of the database model for optimization in terms of the data processing efficiency of the managed data.
  • a database model is constructed in a model-driven manner, and the database model can be used as a benchmark model. It can be understood that the database model (that is, the benchmark model) can be obtained through the following steps.
  • the network element performance model includes model elements (that is, network element performance data). It can be understood that the relationship between model elements can be established from a service perspective.
  • the model elements may specifically include network element standards, managed objects, measurement types, indicators, counters, table definitions corresponding to databases, etc., which are not specifically limited in this embodiment; it can be understood that the above indicators can be network element performance index. Since the model elements in the network element performance model are not related to the database model to be constructed, after the network element performance model is established, the database model needs to be constructed in a model-driven manner. That is, by analyzing the network element performance model, the model elements in the network element performance model are mapped to database objects. It can be understood that the network element performance model can be generated based on expert experience, or a model trained by importing a network of the same type can be used as the network element performance model.
  • database objects include but are not limited to: databases, logical sub-libraries, database tables, data tables, partitions, views, indexes, custom functions, etc., which are not specifically limited in this embodiment.
  • these model elements can be defined through the corresponding table definition of the database, so as to establish a mapping relationship between the model elements and the database objects in the database model.
  • the mapping relationship is established by defining the network element standard in the model element in a certain database or a certain data table in the database model.
  • mapping it to the database model is the Data Definition Language (Data Definition Language, DDL) operation on database objects, such as the construction of logical sub-libraries, data tables, indexes, custom functions, views, and partitions and maintenance; or, mapping to the database model is a data manipulation language (Data Manipulation Language, DML) operation on database objects, such as adding, querying, deleting, and updating data tables.
  • DDL Data Definition Language
  • DML Data Manipulation Language
  • a data table includes one thousand fields, and the user only pays attention to one hundred fields at a certain time. Through the view, the corresponding data of the one hundred fields can be established. display data.
  • the managed data in this embodiment of the present application includes database objects. Build the database model from the database objects by using the database objects as the starting point of the database model.
  • this application uses the network element performance model to further construct the database model.
  • the database model dynamic adjustment method of the application is used to optimize the data processing process of the database model and perform data processing of the database model. Adjustment, so as to obtain the optimized and adjusted database model, which can exert better performance.
  • FIG. 1 is a flowchart of a method for dynamically adjusting a database model provided by an embodiment of the present application.
  • the method for dynamically adjusting a database model includes but is not limited to the following steps S100 to S400 .
  • Step S100 optimize the data processing process corresponding to the managed data stored in the database model, and obtain the current metric data corresponding to the optimized data processing process.
  • step S200 data analysis is performed on the current measurement index data to obtain an analysis result.
  • Step S300 drive the database model to adjust data according to the analysis result.
  • Step S400 optimize the database model after data adjustment to update the current metric data until the current metric data is the same as the preset target metric data.
  • the data processing process of the managed data can be constrained, so that it can be kept in a continuously optimized state.
  • the corresponding current metric data will also change; therefore, after the data processing process is optimized, the data corresponding to the optimized data processing process can be obtained through calculation.
  • Current metric data Then, data analysis is performed on the current measurement index data, and the analysis results can be obtained. It can be understood that in this embodiment, based on the constructed database model as the benchmark model, the current metric data corresponding to the optimized data processing process is collected, and continuous machine learning and analysis are performed on the current metric data. Analysis, to establish a mechanism to drive the database model to dynamically adjust according to the analysis results of the current metric data corresponding to the data processing process during the runtime of the database model.
  • the database model is driven to perform data adjustment, so that the managed data stored in the database model can be dynamically adjusted.
  • the above-mentioned optimization process is performed on the database model after data adjustment, that is, return to step S100 to update the current metric data until the current metric data is the same as the preset target metric data, so that the optimized and adjusted database can be obtained Model.
  • the data adjustment of the database model can be set to be automatically adjusted by the program or manually adjusted.
  • the purpose is to drive the database model to adjust the data through the analysis results, and then verify by obtaining the current metric data corresponding to the data processing process.
  • the quality of the database model makes the current metric data corresponding to the managed data in the finally optimized and adjusted database model optimal.
  • the finally optimized and adjusted database model can be applied to the same type of scenario environment, and can also be applied to various personalized scenarios.
  • the data type of the managed data is the same, and/or the data amount of the managed data is the same.
  • the database model adopted also corresponds to The management level of 2000 data volumes; it is also applicable to apply the optimized and adjusted database model copy to other areas that also have 2000 data volumes and the same data type.
  • the management level of the database model can be distinguished by the size of the data volume.
  • management level of the database model can be: 2000 data volumes are one management level, and 20,000 data volumes are another management level. A management level; or, in a narrow sense, 2,000 data volumes are one management level, and 2,001 data volumes are another management level, etc.
  • the database can be The model is optimized and adjusted to apply to the above-mentioned personalized scenarios.
  • the data types of managed data for example, for different types of mobile communication systems such as 2G/3G/4G/5G, there will also be differences between the corresponding database models.
  • the finally optimized and adjusted database model can be applied to the above mobile communication system.
  • the data types of the managed data are the same, but the data volumes corresponding to the database objects in the managed data are different.
  • the corresponding database models will also be different.
  • the performance of the database model is better, and it can be applied to the management and analysis of various types of data, which solves the problem of the deviation in the runtime data and the model in the related technology, which leads to the inability of the database to perform A question of better performance.
  • managed data stored in the database model is not limited to network element performance data, and may also be other performance data, including but not limited to: metrics data corresponding to all managed systems.
  • FIG. 2 is a flow chart of optimizing a data processing process provided by an embodiment of the present application. It can be understood that the optimization of the data processing process corresponding to the managed data stored in the database model in step S100 includes but is not limited to the following step S110.
  • Step S110 acquiring the algorithmic complexity corresponding to the data processing process, and optimizing the algorithmic complexity that does not meet the first rule; wherein, the first rule includes that the algorithmic complexity is less than or equal to a preset threshold.
  • the algorithm complexity corresponding to the data processing process is specifically optimized based on the first rule.
  • the data processing procedures corresponding to the managed data stored in the database model usually involve many types, so in some embodiments, some data processing procedures with high usage frequency can be extracted for data analysis.
  • the data processing process includes, but is not limited to: an extraction process, a conversion process, a loading process, a query process, a summarization process, and a supplementary summarization process. Therefore, the algorithm complexity corresponding to the data extraction process, conversion process, loading process, query process, summary process or supplementary summary process can be optimized based on the first rule, that is, the algorithm complexity that does not meet the first rule For optimization processing, it can be understood that the algorithm complexity after optimization processing should be less than or equal to the preset threshold.
  • an O(n) function is usually used to characterize the algorithm complexity, that is, the algorithm complexity can be made less than or equal to the preset O(n).
  • the data processing process corresponding to the algorithmic complexity that does not meet the first rule is continuously optimized to reduce its algorithmic complexity until it is less than or equal to O(n), and the data processing process in this embodiment can be kept in a continuously optimized state.
  • FIG. 3 is a flow chart of optimizing a data processing process provided by another embodiment of the present application. It can be understood that the optimization of the data processing process corresponding to the managed data stored in the database model in step S100 also includes but is not limited to the following step S120.
  • Step S120 acquiring the structured query language corresponding to the data processing process, and optimizing the structured query language that does not meet the second rule.
  • the second rule includes at least one of the following: data redistribution or broadcasting is less than or equal to the set threshold; the index level is less than or equal to the index level threshold; the number of nested loops is 0; the number of overflow files is 0; the query involves The partition is smaller than the partition threshold; the filtering rate of the partition involved in the query is smaller than the filtering rate threshold; the number of data skew is 0; the number of data filtering by function is 0.
  • this embodiment needs to optimize the structured query language corresponding to the data processing process based on the second rule.
  • the data processing process includes but is not limited to: extraction process, conversion process, loading process, query process, summary process and supplementary summary process, etc. Therefore, the data extraction process, conversion process, etc.
  • the structured query language corresponding to the loading process, the query process, the summary process or the supplementary summary process is optimized based on the second rule, that is, the structured query language that does not meet the second rule is optimized.
  • some structured query languages that are frequently used may be extracted for data analysis.
  • algorithms such as machine learning clustering and classification are used to extract the above-mentioned structured query language.
  • the execution information corresponding to the structured query language includes, but is not limited to: data redistribution or broadcast, index hierarchy, nested loops, overflow files, involved partitions, data skew information, data filter conditions, and the like. It can be understood that the above structured query language needs to follow the best practice rules of the database, so it is necessary to optimize the structured query language that does not comply with the second rule.
  • the second rule includes at least one of the following: data redistribution or broadcasting is less than or equal to the set threshold; the index level is less than or equal to the index level threshold, for example, the index level threshold is 2 levels; the number of nested loops is the loop number threshold, for example, the loop number The threshold is 0, which means that there should be no nested loops; the number of overflow files is the threshold of overflow times, for example, the threshold of overflow times is 0, which means that overflow files are avoided; the partition involved in the query is smaller than the partition threshold, for example, for a data table , which includes multiple partitions (for example, 100 partitions).
  • the partitions involved in the query In order to avoid too many partitions involved in the query that do not comply with the second rule, it is usually necessary to set the partitions involved in the query to be smaller than the partition threshold. If when executing the query process, all partitions (such as 100 partitions) are directly scanned for the whole table, or 80% of the partitions (such as 80 partitions) are scanned, it means that the query involves too many partitions, and If it does not meet the second rule, it needs to be optimized until it meets the second rule; the filtering rate of the partition involved in the query is less than the filtering rate threshold. For example, the filtering rate of the partition involved in the query should be low.
  • the number of data skews is the threshold of the number of tilts, for example, the threshold of the number of tilts is 0, which means that there is no skew in the data.
  • the data skew information in this embodiment includes data The number of tilts; and the number of times the data is filtered by the function is the threshold of the number of filters. For example, the threshold of the number of filters is 0, which means that the data is not filtered by the function.
  • the data filtering conditions of this embodiment include the number of times the data is filtered by the function, and so on.
  • data analysis is performed on the structured query language corresponding to the data processing process, so as to optimize the structured query language that does not conform to the second rule, and can ensure that the corresponding current measurement index in the same data processing process Data can be optimized.
  • the algorithm complexity and structured query language corresponding to the data processing process can also be obtained separately, so as to optimize the algorithm complexity and structured query language respectively.
  • the structured query language generally includes Data Query Language (Data Query Language, DQL), Data Manipulation Language (Data Manipulation Language, DML), Transaction Processing Language (Transaction Process Language, TPL) , Data Control Language (Data Control Language, DCL), Data Definition Language (Data Definition Language, DDL) and pointer control language (Cursor Control Language, CCL).
  • Data Query Language Data Query Language, DQL
  • Data Manipulation Language Data Manipulation Language
  • TPL Transaction Processing Language
  • Data Control Language Data Control Language
  • DCL Data Definition Language
  • DDL Data Definition Language
  • pointer control language CCL
  • This embodiment specifically performs data analysis on the data definition language and data operation language related to the database model. For example, the data definition language and the data operation language corresponding to the managed data stored in the database model are obtained respectively, so as to optimize the data definition language and the data operation language respectively.
  • this application mainly optimizes the managed data stored in the database model from two stages.
  • the above embodiments specifically describe the first stage, that is, to optimize the algorithm complexity and structured query language corresponding to the data processing process corresponding to the managed data.
  • the second stage of this embodiment will be described in detail below, which is to adjust the data of the database model for optimization in terms of the data processing efficiency of the managed data. Specifically, data adjustment is performed on the database model from the aspect of the metric data corresponding to the managed data for optimization.
  • the current measurement index data may include current efficiency index data and current resource usage index data. That is, the measurement index data includes efficiency index data and resource usage index data.
  • the current measurement index data includes the current efficiency index data
  • the specific description of the embodiment of the present application is as follows.
  • FIG. 4 is a flow chart for obtaining time series distribution results provided by an embodiment of the present application.
  • the step S200 of performing data analysis on the current metric data to obtain the analysis results specifically includes but is not limited to the following step S210.
  • Step S210 performing data analysis on the current efficiency index data in the time dimension to obtain time series distribution results.
  • the current efficiency index data corresponding to the data processing process can be represented in a time series manner. Therefore, in this embodiment, time series distribution results can be obtained by performing data analysis on the current efficiency index data in the time dimension.
  • the purpose of this setting is to facilitate further analysis of the time series distribution results to obtain the regular changes corresponding to the time series distribution results.
  • the regular change corresponding to the time series distribution result is associated with the preset partition range corresponding to the managed data stored in the database model. It can be understood that, in the data analysis of the current efficiency index data in the time dimension, different machine learning algorithms can be used to obtain the time series distribution results.
  • the method for dynamically adjusting the database model includes but does not It is limited to the following step S101.
  • Step S101 partition the managed data stored in the database model according to the corresponding preset partition range.
  • step S100 a database model needs to be constructed in a rule-model-driven manner, and this database model can be used as a benchmark model. the corresponding partition.
  • Such setting enables the benchmark model to partition the managed data according to the preset partition range, so as to improve efficiency.
  • the present application may perform data adjustment on the database model for optimization through the following embodiments.
  • An example is to collect the current efficiency index data corresponding to the data processing process of the optimized database model operation period; after that, analyze the current efficiency index data in the time dimension to obtain the time series distribution results, and then according to the time series
  • the distribution result drives the database model to adjust the data; optimize the adjusted database model to update the current efficiency index data until the current efficiency index data is the same as the preset target efficiency index data. Adjust the data of the database model through the time series distribution results corresponding to the current efficiency index data, so as to realize the dynamic adjustment of the managed data in the database model, and facilitate training to obtain an optimal database model.
  • the step S300 of driving the database model to perform data adjustment according to the analysis result includes, but is not limited to, the following steps S310 to S320.
  • Step S310 according to the time series distribution results, perform partition adjustment on the preset partition range to obtain the adjustment range.
  • step S320 data adjustment is performed on the managed data stored in the database model driven by the adjustment range.
  • the time series distribution results of this embodiment are mainly affected by factors such as long-term trend factor T, seasonal variation factor S, periodic variation factor (also called cyclic variation factor) C, and irregular variation factor I.
  • the long-term trend factor T represents the general trend of change formed by some fundamental factors in a long period of time
  • the seasonal change factor S represents a periodic fluctuation with a fixed length and amplitude caused by seasonal changes
  • the periodic change factor C means fluctuating up and down due to the influence of various factors
  • Irregular change factor I means an irregular change.
  • this embodiment needs to exclude the irregular variable factor I to extract the long-term trend factor T, seasonal variable factor S, and periodic variable factor C; According to the time series distribution results reflected by the variable factor C, the preset partition range is adjusted to obtain the adjustment range; then the managed data stored in the database model is driven to adjust the data according to the adjustment range, so as to realize the dynamic adjustment of the managed data in the database model.
  • Data management such as dynamically adjusting the number of logical sub-databases in the database model, the range of data table partitions, etc. according to the adjustment range.
  • the current metric index data includes the current resource usage index data
  • the specific description of the embodiment of the present application is as follows.
  • performing data analysis on the current metric data in step S200 to obtain an analysis result specifically includes but is not limited to the following step S220 .
  • Step S220 performing data analysis on the current resource usage index data to obtain resource distribution results.
  • resource usage indicator data includes at least one of the following: application resource usage, database resource usage, and storage device input/output resource usage. It can be understood that resource usage data may include: memory usage data, central processing unit (Central Processing Unit, CPU) usage data, input/output (Input/Output, I/O) response time data, input/output read and write traffic data, network traffic data, etc.
  • resource usage data may include: memory usage data, central processing unit (Central Processing Unit, CPU) usage data, input/output (Input/Output, I/O) response time data, input/output read and write traffic data, network traffic data, etc.
  • the data processing process corresponding to the managed data stored in the database model is optimized, and the current resource usage index data corresponding to the optimized data processing process is collected; data analysis is performed on the current resource usage index data , get the resource distribution result, drive the database model to adjust the data according to the resource distribution result, and finally optimize the database model after data adjustment to update the current resource usage index data until the current resource usage index data is consistent with the preset target resource usage
  • the indicator data is the same.
  • the quality of the database model is verified by collecting the current resource usage index data, so that the current resource usage index data corresponding to the managed data in the finally optimized and adjusted database model is optimal, such as obtaining the least resource usage , which means that the database model has been optimized and adjusted.
  • this embodiment uses complex algorithms in the data processing process corresponding to the managed data stored in the database model. optimization processing using degree and/or structured query language, and respectively collect the current efficiency index data and current resource usage index data corresponding to the optimized data processing process; at this time, this embodiment is in the second stage of optimization, and
  • the current efficiency index data is analyzed in the time dimension to obtain the time series distribution results;
  • the current resource usage index data is analyzed to obtain the resource distribution results; then according to the time series distribution results, the database model Adjust the preset partition range corresponding to the managed data stored in the database to obtain the adjustment range; then, according to the adjustment range and resource distribution results, drive the managed data stored in the database model to perform data adjustment; the database model after data adjustment Perform the above optimization process again to update the current efficiency index data and the current resource usage index data respectively until the current efficiency index data is the same as the preset target efficiency index data, and the current
  • a database model is further constructed by establishing a network element performance model, and the database model constructed above can be used as a benchmark model.
  • this application adjusts and optimizes the data of the database model (that is, the benchmark model) from the aspect of the measurement index data corresponding to the network element performance data.
  • the network element performance data includes, but is not limited to: managed network element standards, indicators and counters of managed objects under different standards, and the like.
  • the efficiency index data may also include network element performance indexes, usage frequency of counters, and the like.
  • the current efficiency index data can be characterized by the current time-consuming time (or current execution time) corresponding to the data processing.
  • this embodiment drives the database model for data adjustment through time series distribution results and resource distribution results, such as driving the number of logical sub-libraries in the database model, the definition of data tables, the range of data table partitions or indexes The definition and so on for data adjustment.
  • the optimal database model corresponding to managed data such as network element performance data can be finally obtained through training.
  • the method for dynamically adjusting a database model in the embodiment of the present application specifically corresponds to an apparatus for dynamically adjusting a database model.
  • the device includes but is not limited to: network element performance data extraction conversion loading module 100, network element performance data analysis module 200, data processing process optimization module 300, database model optimization module 400, database management and maintenance module 500 and analytical database module 600 .
  • the network element performance data extraction, conversion and loading module 100 is configured to obtain network element performance data, perform data cleaning and data conversion on the network element performance data, and send the network element performance data after data cleaning and data conversion to the analytical database module 600 .
  • the network element performance data analysis module 200 is configured to perform data statistics and data analysis on the network element performance data, and send the network element performance data after data statistics and data analysis to the analytical database module 600 .
  • the network element performance data extraction conversion loading module 100 and the network element performance data analysis module 200 are also configured to collect algorithm complexity and structured query language in the data processing process, and send the algorithm complexity and structured query language to
  • the data processing process optimization module 300 is also configured to collect the current metric data corresponding to the optimized data processing process, and send the current metric data to the database model optimization module 400; wherein, the current metric data includes the current Efficiency indicator data, current resource usage indicator data.
  • the data processing process optimization module 300 is configured to obtain the algorithmic complexity and structured query language sent by the network element performance data extraction, conversion and loading module 100 and the network element performance data analysis module 200, and perform an algorithmic complexity that does not meet the first rule. Optimize processing, and optimize the structured query language that does not meet the second rule; it is also set to maintain the database objects in the database model in a model-driven manner, where the database objects include databases, data tables, partitions, views, and indexes , custom function.
  • the database model optimization module 400 is configured to obtain the current metric index data corresponding to the data processing process sent by the network element performance data extraction conversion loading module 100 and the network element performance data analysis module 200, and perform data analysis on the current metric index data to obtain an analysis result, and send the analysis result to the database management and maintenance module 500 .
  • the database management and maintenance module 500 is configured to obtain the analysis result sent by the database model optimization module 400, and drive the database model to perform data adjustment according to the analysis result.
  • the analytical database module 600 is configured to obtain the network element performance data sent by the network element performance data extraction, transformation and loading module 100 after data cleaning and data conversion, and obtain the data statistics and data analysis sent by the network element performance data analysis module 200 network element performance data, and perform data storage and data management on the network element performance data; wherein, the analytical database module 600 includes a GBase database module and a Greenplum database module.
  • the network element performance data extraction conversion loading module 100 can collect network element performance data corresponding to the network element, such as obtaining network element performance data from a file transfer protocol (File Transfer Protocol, FTP) server; or obtaining network element performance data from a simple The network element performance data of the network management protocol (Simple Network Management Protocol, SNMP) server; or obtain the network element performance data from the socket (Socket) server, etc.
  • FTP File Transfer Protocol
  • SNMP Simple Network Management Protocol
  • Socket socket
  • the device in this embodiment may also include an application running module configured to support corresponding application running when the database model performs data adjustment; wherein the application running module includes a blade server module and a rack server module.
  • the network element performance data extraction conversion loading module 100 and the network element performance data analysis module 200 can collect current efficiency index data and current resource usage index data corresponding to the data processing process from the data processing efficiency of the network element performance data .
  • the data processing process includes: extraction process, conversion process, loading process, query process, summary process and supplementary summary process
  • the current efficiency index data corresponding to the data processing process can be expressed as: extraction process, conversion process, loading process
  • the current resource usage index data corresponding to the data processing process can be expressed as: current application resources such as current CPU usage or current memory usage, etc.
  • the network element performance data extraction conversion loading module 100 and the network element performance data analysis module 200 can also collect the frequency of queries corresponding to database tables and columns, or the distribution of performance data, etc. from the aspect of data analysis.
  • the database model optimization module 400 performs data analysis on the current measurement index data.
  • the machine learning algorithm is used to perform data analysis on the current efficiency index data to obtain the time series distribution results.
  • the time series distribution results are mainly It is affected by factors such as long-term trend factor T, seasonal variation factor S, cyclical variation factor C, and irregular variation factor I.
  • the current execution time corresponding to the summary process, loading process or query process of network element performance data may be analyzed.
  • the current efficiency index data (such as the current execution time) corresponding to the data processing process (such as the summary process) corresponding to the managed data (such as network element performance data) is established.
  • the relationship between the reflected time series distribution results (such as long-term trend factor T/seasonal variation factor S/periodic variation factor C) and the managed objects in the database model (such as logical sub-database, data table partition).
  • the number of indicators (such as network element performance indicators) and counters usually reaches tens of thousands, and different users can obtain and use their own information through query and analysis functions. Metrics and counters of interest.
  • the mapping to the database model is the query and analysis of database objects (such as database tables, etc.).
  • the usage frequency corresponding to the database objects (such as database tables) in the database model is also represented by time series.
  • the correlation between the data storage and data query of the network element performance data can be obtained, so as to provide a basis for data adjustment of the database model , such as providing a basis for structural adjustment of data tables, adjustment of operation frequency of data summary, creation and adjustment of intermediate tables, etc.
  • the database model optimization module 400 can also optimize the database model after data adjustment to update the current efficiency index data until the current efficiency index data is the same as the preset target efficiency index data, that is, through continuous Data analysis and data adjustment
  • the current efficiency index data corresponding to the optimized data processing process can ensure that the data processing function meets business requirements and the current resource usage index data is the target resource usage index data (such as resource usage does not increase) In this case, it is continuously judged whether the current efficiency index data becomes better, that is, whether the current efficiency index data is the same as the preset target efficiency index data. After iterative adjustment, an optimal database model is obtained through training.
  • the database model optimization module 400 verifies the quality of the database model through the current efficiency index data and the current resource usage index data, for example, the result reflected by the final current resource usage index data corresponding to the optimal database model is the least resource Use, the result reflected by the current efficiency index data is the optimal execution time or time-consuming time, etc.
  • the database management and maintenance module 500 performs data adjustment by driving the database model according to the analysis results; specifically, based on the rules corresponding to machine learning, the database model is continuously adjusted for data; such as adjusting the number of logical sub-databases, the range of data table partitions, and The structure of the table, the operation frequency of data summary, the creation of intermediate tables, etc.
  • Embodiment 1 This embodiment describes the process of data adjustment for the preset partition range corresponding to the database table, which includes but not limited to the following (1) to (5).
  • the current measurement index data in this embodiment includes current efficiency index data and current resource usage index data, and both the current efficiency index data and the current resource usage index data have time characteristics.
  • the efficiency index data includes summary time consumption, loading time consumption, query time consumption, etc.
  • resource usage index data includes application resource usage (such as CPU usage/memory usage), database resource usage (such as central processing unit occupancy/memory occupancy), storage device input/output resource usage (such as input/output per second (Input/Output Per Second, IOPS)/input and output wait (IOWait)), etc.
  • application resource usage such as CPU usage/memory usage
  • database resource usage such as central processing unit occupancy/memory occupancy
  • storage device input/output resource usage such as input/output per second (Input/Output Per Second, IOPS)/input and output wait (IOWait)
  • the database model optimization module 400 performs data analysis on the current efficiency index data in the time dimension to obtain the time series distribution results.
  • the time series distribution results are mainly affected by the long-term trend factor T, seasonal variation factor S, cycle variation factor C and irregularity Influenced by factors such as variable factor I, after excluding irregular variable factor I, the time series distribution results corresponding to long-term trend factor T/seasonal variable factor S/cyclical variable factor C are obtained.
  • the long-term trend factor T specifically presents an average trend
  • the seasonal variation factor S is consistent with the first preset partition range corresponding to the original data table
  • the periodic variation factor C is consistent with the second preset partition range corresponding to the summary table. unanimous.
  • the time-consuming time of summarization is specifically defined as the variable w; it can be understood that the maximum value of w Max(w) is less than the time-consuming threshold ⁇ defined by the business; the preset partition range is defined as the variable P, and the value range of P is [1,30]; wherein, the first preset partition range corresponding to the original data table is defined as variable P 1 , the first adjustment range corresponding to the original data table is defined as variable p 1 , and the second preset partition range corresponding to the summary table is defined as variable P 2 , and the second adjustment range corresponding to the summary table is defined as variable p 2 .
  • the units of P, P 1 , P 2 , p 1 , and p 2 are days.
  • the database management and maintenance module 500 is driven to adjust the data of the database model by the database model optimization module 400 based on the time series distribution result obtained in step (2), so as to optimize the database model Adjustment.
  • the first preset partition range P 1 corresponding to the original data table is usually set based on experience, and is usually set too large. Therefore, if the first preset partition range P 1 corresponding to the original data table>1, the first partition adjustment can be performed on the first preset partition range P 1 corresponding to the original data table to obtain the appropriate first partition range P 1 corresponding to the original data table.
  • the first adjustment range p 1 corresponding to the original data table can be stored if the input/output resources are used within the scope of the hardware capability, or the time-consuming time corresponding to other data processing processes does not deteriorate. It can be understood that, the above partition adjustment may be performed for the summary table, so as to obtain the appropriate second adjustment range p 2 corresponding to the summary table.
  • the original data table is partitioned by day and stores managed data at a granularity of 15 minutes
  • the summary table is partitioned by week and stored at a granularity of hours.
  • the managed data in the summary table is obtained by aggregating the managed data in the original data table, and then loading the managed data in the original data table into the summary table.
  • Data analysis is performed on the current efficiency index data corresponding to the summary table in the time dimension, and the time series distribution results are obtained.
  • the seasonal variation factor S corresponding to the summary table is 1 day
  • the periodic variation factor C is 1 week.
  • Rule 1 Since the original data table is partitioned by day, the amount of data corresponding to the managed data of the day gradually increases in the time dimension. Therefore, when summarizing by hour, the read The data range corresponding to the managed data gradually increases, which leads to an increase in the reading time, so the aggregation time increases.
  • the reason for rule 2 is that when the managed data is aggregated and loaded into the hour table, there will be duplicate data deletion and insertion operations, and the data range involved in the deletion operation depends on the corresponding data volume when the managed data is aggregated. Since the insertion operation will involve the operation of the index page, the index range will increase with the corresponding data volume when the managed data is summarized, resulting in Insert time-consuming increase. Therefore, the aggregation time for the same period of time on different days is gradually increasing.
  • Embodiment 2 This embodiment describes the process of data adjustment to the data table structure and data summary, which includes but not limited to the following (1) to (4).
  • the measurement index data includes the usage frequency of the index (such as the network element performance index) and the counter, the summary time range corresponding to the index (such as the network element performance index) and the counter, and the database table to which the index (such as the network element performance index) and the counter belong.
  • the database model optimization module 400 performs data analysis on the current measurement index data to obtain analysis results, and the analysis results include hot data distribution results, such as the distribution results of frequently used indicators, the distribution results of infrequently used indicators; and/or the high frequency
  • hot data distribution results such as the distribution results of frequently used indicators, the distribution results of infrequently used indicators; and/or the high frequency
  • the distribution results of used counters, the distribution results of low-frequency used counters, etc. so that the corresponding high-frequency used database tables and columns and low-frequency used databases can be obtained further according to the indicators (such as network element performance indicators) and the database tables to which the counters belong. tables and columns etc.
  • the database management and maintenance module 500 performs data adjustment on the database model, so as to optimize and adjust the database model. For example, the hourly summarization of performance data is applied to user's analysis of performance hourly data. For performance data that needs to be analyzed for the latest hourly data every hour, it needs to be summarized in time at 1-hour intervals.
  • performance data is summarized hourly and saved at a granularity of 15 minutes, that is, the performance data per hour corresponds to the current metric data; For the current time (such as 16:00), the user needs to analyze the data of the previous hour (such as 15:00 to 16:00) closest to the current time (such as 16:00), and it needs to be summarized in time at 1-hour intervals ; and for applications with little or no performance data involved in hourly data analysis, you can reduce the frequency of summarization and expand the summarization time range (for example, summarizing at 8-hour intervals) to improve the overall hourly summarization efficiency.
  • the performance data that the application does not care about can not be stored, and the database table structure can be adjusted.
  • this embodiment reduces the use of computing and storage resources of the application system by reducing the storage of performance data and the frequency of summary use, thereby improving the overall efficiency of the database model.
  • Embodiment 3 This embodiment describes the process of data adjustment to the database model in the database application field, which includes but not limited to the following (1) to (3).
  • the current measurement index data includes current efficiency index data and current resource usage index data.
  • the efficiency index data corresponding to data storage and data query (such as time-consuming), database resource usage (such as CPU usage/memory usage), storage device input/output resource usage (such as IOPS/IOWait), etc.
  • the current Metric data has temporal characteristics.
  • the analytical database is a cluster database constructed by multiple logical sub-libraries, the number of logical sub-libraries is directly proportional to the IOPS.
  • the IOPS generated by the data processing process does not reach the hardware capability range , you can increase the number of logical sub-libraries to increase IOPS; at the same time, IOWait should be within the corresponding target range, that is, within a reasonable range.
  • the IOPS and IOWait of this embodiment are within the corresponding target range, which will have a positive effect on the efficiency of the data processing process.
  • the method for dynamically adjusting the database model in this embodiment is a proactive system maintenance rather than problem-driven or manual maintenance, which can effectively improve the efficiency of system operation and maintenance.
  • Constructing and maintaining the database model through the model-driven method can improve the maintainability of the database model, improve the efficiency of version upgrade, and reduce the waiting time.
  • the trained database model can be exported and copied to the same type of system, reducing the training cost of the database model.
  • an embodiment of the present application also provides a device, which includes: a memory, a processor, and a computer program stored in the memory and operable on the processor.
  • the processor and memory can be connected by a bus or other means.
  • memory can be configured to store non-transitory software programs and non-transitory computer-executable programs.
  • the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage devices.
  • the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the processor via a network. Examples of the aforementioned networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • the non-transitory software programs and instructions required to realize the method for dynamically adjusting the database model of the above-mentioned embodiment are stored in the memory, and when executed by the processor, the method for dynamically adjusting the database model in the above-mentioned embodiment is executed, for example, executing the above Described method steps S100 to S400 in Fig. 1, method step S110 in Fig. 2, method step S120 in Fig. 3, method step S210 in Fig. 4, method step S101 in Fig. 5, method step in Fig. 6 S310 to S320, the method step S220 in FIG. 7 .
  • the device embodiments described above are only illustrative, and the units described as separate components may or may not be physically separated, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • an embodiment of the present application also provides a computer-readable storage medium, the computer-readable storage medium stores computer-executable instructions, and the computer-executable instructions are executed by a processor or a controller, for example, by the above-mentioned Executed by a processor in the device embodiment, the above-mentioned processor can execute the method for dynamically adjusting the database model in the above-mentioned embodiment, for example, execute the method steps S100 to S400 in FIG. 1 and the method steps in FIG. 2 described above S110, method step S120 in FIG. 3 , method step S210 in FIG. 4 , method step S101 in FIG. 5 , method steps S310 to S320 in FIG. 6 , method step S220 in FIG. 7 .
  • the embodiment of the present application includes: optimizing the data processing process corresponding to the managed data stored in the database model, so that the data processing process of the managed data can be constrained, so that it can be kept in a continuously optimized state; After the data processing process of the pipe data is optimized, the corresponding current metric data will also change; therefore, after optimizing the data processing process, the current metric corresponding to the optimized data processing process can be obtained through calculation Index data; and then analyze the current measurement index data to obtain the analysis results; according to the analysis results, the database model is driven to adjust the data, and the managed data stored in the database model can be dynamically adjusted; after that, the adjusted database The model then performs the above optimization processing to update the current metric data until the current metric data is the same as the preset target metric data, so that an optimized and adjusted database model can be obtained.
  • the performance of the database model is good, it can be applied to the management and analysis of various types of data, and it solves the problem that the database cannot perform well due to the deviation in the runtime data and the model
  • Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cartridges, tape, magnetic disk storage or other magnetic storage devices, or can Any other medium used to store desired information and which can be accessed by a computer.
  • communication media typically embodies computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media .

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

一种数据库模型动态调整的方法、设备及存储介质。方法包括对数据库模型中存储的被管数据对应的数据处理过程进行优化处理,得到经过优化处理后的数据处理过程所对应的当前度量指标数据(S100);对所述当前度量指标数据进行数据分析,得到分析结果(S200);根据所述分析结果,驱动所述数据库模型进行数据调整(S300);对经过数据调整后的数据库模型进行所述优化处理以更新所述当前度量指标数据,直到所述当前度量指标数据与预设的目标度量指标数据相同(S400)。

Description

数据库模型动态调整的方法、设备及存储介质
相关申请的交叉引用
本申请基于申请号为202110814308.X、申请日为2021年7月19日的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此引入本申请作为参考。
技术领域
本申请实施例涉及但不限于数据库技术领域,尤其涉及一种数据库模型动态调整的方法、设备及存储介质。
背景技术
针对海量数据管理和分析通常是基于数据库模型。相关技术中,数据库的建模是基于规则的模型驱动,然而,通过这种方法构建的数据库模型,通常无法适用对多种类型数据的管理。例如,在管理多种制式网元的性能数据中,在运行期数据和模型中会存在偏差,进而导致数据库无法发挥较好性能。
发明内容
以下是对本文详细描述的主题的概述。本概述并非是为了限制权利要求的保护范围。
本申请实施例提供了一种数据库模型动态调整的方法、设备及存储介质。
第一方面,本申请实施例提供了一种数据库模型动态调整的方法,包括:对数据库模型中存储的被管数据对应的数据处理过程进行优化处理,得到经过优化处理后的数据处理过程所对应的当前度量指标数据;对所述当前度量指标数据进行数据分析,得到分析结果;根据所述分析结果,驱动所述数据库模型进行数据调整;对经过数据调整后的数据库模型进行所述优化处理以更新所述当前度量指标数据,直到所述当前度量指标数据与预设的目标度量指标数据相同。
第二方面,本申请实施例还提供了一种设备,包括:存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,所述处理器执行所述计算机程序时实现如上第一方面所述的数据库模型动态调整的方法。
第三方面,本申请实施例还提供了一种计算机可读存储介质,存储有计算机可执行指令,所述计算机可执行指令被设置为执行如上所述的数据库模型动态调整的方法。
本申请的其它特征和优点将在随后的说明书中阐述,并且,部分地从说明书中变得显而易见,或者通过实施本申请而了解。本申请的目的和其他优点可通过在说明书、权利要求书以及附图中所特别指出的结构来实现和获得。
附图说明
附图用来提供对本申请技术方案的进一步理解,并且构成说明书的一部分,与本申请的实施例一起用于解释本申请的技术方案,并不构成对本申请技术方案的限制。
图1是本申请一个实施例提供的数据库模型动态调整的方法的流程图;
图2是本申请一个实施例提供的数据库模型动态调整的方法中对数据处理过程进行优化处理的流程图;
图3是本申请另一实施例提供的数据库模型动态调整的方法中对数据处理过程进行优化处理的流程图;
图4是本申请一个实施例提供的数据库模型动态调整的方法中得到时间序列分布结果的流程图;
图5是本申请一个实施例提供的数据库模型动态调整的方法中对被管数据进行分区的流程图;
图6是本申请一个实施例提供的数据库模型动态调整的方法中对数据库模型进行数据调整的流程图;
图7是本申请另一实施例提供的数据库模型动态调整的方法中得到资源分布结果的流程图;
图8是本申请一个实施例提供的数据库模型动态调整的装置示意图。
具体实施方式
为了使本申请的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处所描述的具体实施例仅用以解释本申请,并不用于限定本申请。
需要说明的是,虽然在装置示意图中进行了功能模块划分,在流程图中示出了逻辑顺序,但是在某些情况下,可以以不同于装置中的模块划分,或流程图中的顺序执行所示出或描述的步骤。说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。
相关技术中,针对海量数据管理和分析通常是基于数据库模型,而数据库的建模是基于规则的模型驱动,然而,通过这种方法构建的数据库模型,通常无法适用对多种类型数据的管理。可以理解的是,由于被管数据的数据量、数据类型等的不同,会导致数据库模型对被管数据中的数据处理过程也会存在差异,因此基于一种规则的模型驱动构建的数据库模型通常无法适用于多种场景环境中。例如,针对无线网元的性能数据,在管理多种制式网元性能数据中,如果是基于规则的模型驱动构建了数据库模型(即可以理解为基于专家经验构建的数据库模型),将该数据库模型作为基准模型直接使用时,容易在运行期数据和模型中存在偏差,进而导致数据库无法发挥较好性能,且容易导致被管数据中的应用系统效率低、资源使用和配置不合理等。
基于此,本申请提供了一种数据库模型动态调整的方法、设备及存储介质,通过动态调整数据库模型中存储的被管数据,使得被优化调整后的数据库模型性能较好,能够适用于对多种类型数据的管理和分析,解决了相关技术中在运行期数据和模型中存在偏差,导致数据库无法发挥较好性能的问题。
可以理解的是,本申请对数据库模型中存储的被管数据主要是从两个阶段上进行优化。第一阶段,是对数据库模型中存储的被管数据对应的数据处理过程进行优化处理;第二阶段,是从被管数据的数据处理效率方面对数据库模型进行数据调整以进行优化。
具体地,本申请实施例通过模型驱动方式构建数据库模型,该数据库模型可以作为基准模型。可以理解的是,该数据库模型(即基准模型)可以由以下步骤得到。
建立网元性能模型,该网元性能模型包括模型元素(即网元性能数据),可以理解的是,可以从业务角度建立模型元素的关系。该模型元素具体可包括网元制式、被管对象、测量类型、指标、计数器、数据库对应的表定义等,本实施例对此并不作具体限定;可以理解的是,上述指标可以为网元性能指标。由于网元性能模型中的模型元素并未与所要构建的数据库模型建立起联系,因此在建立网元性能模型之后,需要通过模型驱动方式来构建数据库模型。即通过解析网元性能模型,以将网元性能模型中的模型元素映射为数据库对象。可以理解的是,网元性能模型可以基于专家经验进行生成,或通过导入同类型网络训练好的模型作为网元性能模型。
可以理解的是,数据库对象包括但不限于:数据库、逻辑子库、数据库表、数据表、分区、视图、索引、自定义函数等,本实施例对此并不作具体限定。例如,可通过数据库对应的表定义来定义这些模型元素,以便于将模型元素与数据库模型中的数据库对象建立映射关系。如一些实施例中,通过将模型元素中的网元制式定义在数据库模型中的某个数据库或者某个数据表内等,从而建立起了映射关系。还可以理解的是,由于在对数据库模型的构建和维护过程中,将会涉及被管数据(如网元性能数据)的抽取-转换-加载(Extract-Transform-Load,ETL)、查询分析和历史数据清理等,将其映射到数据库模型中则是对数据库对象的数据定义语言(Data Definition Language,DDL)操作,例如对逻辑子库、数据表、索引、自定义函数、视图、分区的构建和维护;又或者,映射到数据库模型中是对数据库对象的数据操作语言(Data Manipulation Language,DML)操作,如对数据表的增加、查询、删除、更新。可以理解的是,针对数据库对象中的自定义函数,其能够有效帮助数据库模型处理数据,例如帮助被管数据进行转换操作,以便于存储在数据表内。针对视图,可理解为一种数据显示的方式,例如某个数据表内包括一千个字段,而用户只关注某个时刻内的一百个字段,通过视图可建立该一百个字段所对应的显示数据。
可以理解的是,本申请实施例的被管数据包括数据库对象。通过以数据库对象作为数据库模型的起点,从而根据数据库对象构建该数据库模型。
因此,本申请通过网元性能模型以进一步构建数据库模型,该数据库模型作为基准模型后,采用本申请的数据库模型动态调整的方法对该数据库模型进行数据处理过程的优化处理以及进行数据库模型的数据调整,从而得到被优化调整后的数据库模型,能够发挥较好的性能。
需要说明的是,本申请实施例描述的通过建立网元性能模型,来进一步构建数据库模型以及所涉及的具体应用场景,是为了更加清楚的说明本申请实施例的技术方案,并不构成对于本申请实施例提供的技术方案的限定。本领域技术人员可知,无线通信技术中的数据类型存在多样性,针对不同类型的数据管理和分析通常都基于数据库模型。而本实施例是具体针对多种制式网元性能数据管理和分析这个应用场景,而提出基于网元性能模型来构建数据库模型,但并不局限于此。本申请实施例提供的技术方案对于类似的技术问题,同样适用。
基于上述构建的数据库模型作为基准模型,提出本申请的数据库模型动态调整的方法的各个实施例。
下面结合附图,对本申请实施例作进一步阐述。
本申请实施例具体提供一种数据库模型动态调整的方法。如图1所示,图1是本申请一个实施例提供的数据库模型动态调整的方法的流程图,该数据库模型动态调整的方法包括但不限于有以下步骤S100至步骤S400。
步骤S100,对数据库模型中存储的被管数据对应的数据处理过程进行优化处理,得到经过优化处理后的数据处理过程所对应的当前度量指标数据。
步骤S200,对当前度量指标数据进行数据分析,得到分析结果。
步骤S300,根据分析结果,驱动数据库模型进行数据调整。
步骤S400,对经过数据调整后的数据库模型进行优化处理以更新当前度量指标数据,直到当前度量指标数据与预设的目标度量指标数据相同。
通过对数据库模型中存储的被管数据对应的数据处理过程进行优化处理,以使被管数据的数据处理过程能够被约束,从而能够保持在持续被优化的状态。
由于被管数据的数据处理过程被优化后,其所对应的当前度量指标数据也将发生改变;故对数据处理过程进行优化处理后,通过计算可得到经过优化处理后的数据处理过程所对应的当前度量指标数据。再对当前度量指标数据进行数据分析,能够得到分析结果。可以理解的是,本实施例从构建的数据库模型作为基准模型的基础上,通过采集经过优化处理后的数据处理过程所对应的当前度量指标数据,并对当前度量指标数据进行持续的机器学习和分析,以建立了依据数据库模型运行期的数据处理过程所对应的当前度量指标数据的分析结果,来驱动数据库模型进行动态调整的机制。即可根据分析结果,驱动数据库模型进行数据调整,从而可动态调整数据库模型中存储的被管数据。之后对经过数据调整后的数据库模型再进行上述优化处理,即返回步骤S100以更新当前度量指标数据,直到当前度量指标数据与预设的目标度量指标数据相同,从而能够得到被优化调整后的数据库模型。
可以理解的是,对数据库模型的数据调整,可以设置为程序自动调整或手工调整,目的是能够通过分析结果驱动数据库模型进行数据调整,再通过获取数据处理过程所对应的当前度量指标数据来验证数据库模型的质量,使得最终被优化调整后的数据库模型中的被管数据所对应的当前度量指标数据是最优的。
需要说明的是,最终被优化调整后的数据库模型能够应用到同类型的场景环境中,也能够适用于多种个性化场景。例如,针对同类型的场景环境,可以理解为,被管数据的数据类型相同,和/或,被管数据的数据量相同。对于被管数据中的数据库对象而言,假设被管数据的数据类型相同,如都为5G,而被管数据中的数据库对象对应的数据量为2000个,则所采用的数据库模型也是对应于2000个数据量的管理级别的;而将该被优化调整后的数据库模型拷贝应用到其他区域也是2000个数据量且数据类型相同的场景环境中,也是同样适用的。一些实施例中,数据库模型的管理级别可以以数据量的大小来区分,例如对数据库模型的管理级别在广义理解上,可以为:2000个数据量为一个管理级别,2万个数据量为另一个管理级别;又或者,在狭义理解上,2000个数据量为一个管理级别,2001个数据量则为另一个管理级别等。
又例如,针对多种个性化场景,由于被管数据的数据类型、数据量存在多样性,即可根据被管数据的数据类型不同,和/或,被管数据的数据量不同,对该数据库模型作优化调整,以适用于上述的个性化场景。对于被管数据的数据类型而言,例如针对2G/3G/4G/5G等不同类型的移动通信系统,其所对应的数据库模型之间也将存在差异,通过对数据库模型作对应 的个性化的优化调整后,使得最终被优化调整后的数据库模型能够应用到上述移动通信系统中。又或者,被管数据的数据类型相同,但被管数据中的数据库对象对应的数据量不同,如针对4G移动通信系统,由于被管数据中的数据库对象在不同局域点中所对应的数据量会存在差异,例如数据表的个数上的差异等,因此,所对应的数据库模型也会存在差异。
通过本实施例的数据库模型动态调整的方法,数据库模型性能较好,能够适用于对多种类型数据的管理和分析,解决了相关技术中在运行期数据和模型中存在偏差,导致数据库无法发挥较好性能的问题。
需要说明的是,数据库模型中存储的被管数据并不局限于网元性能数据,也可以是其他性能数据,其包括但不限于:所有被管系统对应的度量指标数据。
如图2所示,图2是本申请一个实施例提供的对数据处理过程进行优化处理的流程图。可以理解的是,步骤S100中的对数据库模型中存储的被管数据对应的数据处理过程进行优化处理,包括但不限于有以下步骤S110。
步骤S110,获取数据处理过程对应的算法复杂度,对不符合第一规则的算法复杂度进行优化处理;其中,第一规则包括算法复杂度小于或等于预设阈值。
本实施例具体对数据处理过程对应的算法复杂度基于第一规则进行优化处理。
可以理解的是,数据库模型中存储的被管数据对应的数据处理过程涉及的类型通常较多,故在一些实施例中,可以提取使用频率较高的一些数据处理过程来作数据分析。具体地,该数据处理过程包括但不限于:抽取过程、转换过程、加载过程、查询过程、汇总过程和补汇总过程等。因此,可对数据的抽取过程、转换过程、加载过程、查询过程、汇总过程或补汇总过程所对应的算法复杂度分别基于第一规则进行优化处理,即对不符合第一规则的算法复杂度进行优化处理,可以理解的是,被优化处理后的算法复杂度应小于或等于预设阈值。而在一些实施例中,通常采用O(n)函数用以表征算法复杂度,即可以使得算法复杂度小于或等于预设的O(n)。对于不符合第一规则的算法复杂度对应的数据处理过程进行持续优化,以降低其算法复杂度直至小于或等于O(n),本实施例的数据处理过程能够保持在持续被优化的状态。
如图3所示,图3是本申请另一实施例提供的对数据处理过程进行优化处理的流程图。可以理解的是,步骤S100中的对数据库模型中存储的被管数据对应的数据处理过程进行优化处理,还包括但不限于有以下步骤S120。
步骤S120,获取数据处理过程对应的结构化查询语言,对不符合第二规则的结构化查询语言进行优化处理。
需要说明的是,第二规则包括以下至少之一:数据重分布或广播小于或等于设定阈值;索引层次小于或等于索引层次阈值;嵌套循环次数为0;溢出文件次数为0;查询涉及的分区小于分区阈值;查询涉及的分区的筛选率小于筛选率阈值;数据倾斜次数为0;采用函数过滤数据次数为0。
若数据处理过程涉及到结构化查询语言(Structured Query Language,SQL),则本实施例需对数据处理过程对应的结构化查询语言基于第二规则进行优化处理。
可以理解的是,基于上述实施例可知,数据处理过程包括但不限于:抽取过程、转换过程、加载过程、查询过程、汇总过程和补汇总过程等,因此,可对数据的抽取过程、转换过程、加载过程、查询过程、汇总过程或补汇总过程所对应的结构化查询语言分别基于第二规 则进行优化处理,即对不符合第二规则的结构化查询语言进行优化处理。
需要说明的是,由于结构化查询语言涉及的类型通常也较多,故在一些实施例中,可以提取使用频率较高的一些结构化查询语言来作数据分析。例如通过机器学习聚类、分类等算法,来提取上述的结构化查询语言。
该结构化查询语言所对应的执行信息包括但不限于:数据重分布或广播、索引层次、嵌套循环、溢出文件、涉及的分区、数据倾斜信息、数据筛选条件等。可以理解的是,上述的结构化查询语言需要遵循数据库最佳实践规则,因此需要对不符合第二规则的结构化查询语言进行优化处理。
第二规则包括以下至少之一:数据重分布或广播小于或等于设定阈值;索引层次小于或等于索引层次阈值,例如索引层次阈值为2层;嵌套循环次数为循环次数阈值,例如循环次数阈值为0,其表示不应该存在嵌套循环;溢出文件次数为溢出次数阈值,例如溢出次数阈值为0,其表示避免出现溢出文件;查询涉及的分区小于分区阈值,例如对于一个数据表而言,其包括有多个分区(如100个分区),为了避免查询涉及的分区过多而导致不符合第二规则,通常需要设置查询涉及的分区小于分区阈值。若在执行查询过程时,直接将所有分区(如100个分区)进行了全表扫描,或者对80%分区(如80个分区)都进行了扫描,则表示该查询涉及的分区过多,而不符合第二规则,需要对其进行优化处理,直至符合第二规则;查询涉及的分区的筛选率小于筛选率阈值,例如,查询涉及的分区筛选率应当较低,当筛选率过高时,容易导致被管数据对应的当前度量指标数据较差,影响数据处理效率;数据倾斜次数为倾斜次数阈值,例如倾斜次数阈值为0,其表示数据不存在倾斜,本实施例的数据倾斜信息包括数据倾斜次数;以及采用函数过滤数据次数为过滤次数阈值,例如过滤次数阈值为0,其表示不采用函数过滤数据,本实施例的数据筛选条件包括采用函数过滤数据次数,等等。
本实施例通过对数据处理过程对应的结构化查询语言进行数据分析,以便于对不符合第二规则的结构化查询语言进行优化处理,能够保证在同等数据处理过程中,所对应的当前度量指标数据能够被优化。
可以理解的是,在一些实施例中,还可以分别获取数据处理过程对应的算法复杂度、结构化查询语言,以便于对算法复杂度、结构化查询语言分别进行优化。
还可以理解的是,在一些实施例中,结构化查询语言通常包括数据查询语言(Data Query Language,DQL)、数据操作语言(Data Manipulation Language,DML)、事务处理语言(Transaction Process Language,TPL)、数据控制语言(Data Control Language,DCL)、数据定义语言(Data Definition Language,DDL)以及指针控制语言(Cursor Control Language,CCL)。本实施例具体对与数据库模型相关的数据定义语言以及数据操作语言进行数据分析。例如,分别获取数据库模型中存储的被管数据对应的数据定义语言、数据操作语言,以便于对数据定义语言、数据操作语言分别进行优化。
需要说明的是,本申请对数据库模型中存储的被管数据主要是从两个阶段上进行优化。上述实施例具体描述了第一阶段,即对被管数据对应的数据处理过程所对应的算法复杂度和结构化查询语言进行优化处理。
下面具体描述本实施例的第二阶段,是从被管数据的数据处理效率方面对数据库模型进行数据调整以进行优化。具体地,是从被管数据对应的度量指标数据方面对数据库模型进行 数据调整以进行优化。
可以理解的是,当前度量指标数据可以包括当前效率指标数据、当前资源使用指标数据。即度量指标数据包括效率指标数据、资源使用指标数据。
当当前度量指标数据包括当前效率指标数据时,具体描述本申请实施例如下。
如图4所示,图4是本申请一个实施例提供的得到时间序列分布结果的流程图。步骤S200中的对当前度量指标数据进行数据分析,得到分析结果,具体包括但不限于有以下步骤S210。
步骤S210,在时间维度上对当前效率指标数据进行数据分析,得到时间序列分布结果。
可以理解的是,一些实施例中,能够通过时间序列方式来表征数据处理过程所对应的当前效率指标数据。因此,本实施例通过在时间维度上对当前效率指标数据进行数据分析,能够得到时间序列分布结果。这样设置的目的是,便于进一步对时间序列分布结果进行分析,以获取时间序列分布结果所对应的规律变化。可以理解的是,在一些实施例中,时间序列分布结果所对应的规律变化与数据库模型中存储的被管数据所对应的预设分区范围存在关联。可以理解的是,在时间维度上对当前效率指标数据进行数据分析,可以通过采用不同的机器学习算法,以便于获取得到时间序列分布结果。
如图5所示,可以理解的是,在一实施例中,步骤S100中的对数据库模型中存储的被管数据对应的数据处理过程进行优化处理之前,该数据库模型动态调整的方法包括但不限于有以下步骤S101。
步骤S101,根据对应的预设分区范围对数据库模型中存储的被管数据进行分区。
可以理解的是,在步骤S100之前,需要通过规则模型驱动方式构建数据库模型,该数据库模型可作为基准模型,因此,其具体可通过根据预设分区范围,对数据库模型中存储的被管数据进行对应的分区。如此设置,使得基准模型能够根据预设分区范围对被管数据进行分区,以便于提高效率。之后,本申请具体可通过以下实施例对该数据库模型进行数据调整以进行优化。
对数据库模型中存储的被管数据对应的数据处理过程中的算法复杂度和/或结构化查询语言进行优化处理;之后采集经过优化处理后的数据处理过程所对应的当前度量指标数据,本实施例则是采集经过优化处理后的数据库模型运行期的数据处理过程所对应的当前效率指标数据;之后,在时间维度上对当前效率指标数据进行数据分析,得到时间序列分布结果,再根据时间序列分布结果驱动数据库模型进行数据调整;对经过数据调整后的数据库模型进行优化处理以更新当前效率指标数据,直到当前效率指标数据与预设的目标效率指标数据相同。通过当前效率指标数据所对应的时间序列分布结果对数据库模型进行数据调整,以实现对数据库模型中的被管数据的动态调整,便于训练得到一个最优数据库模型。
具体地,在一实施例中,如图6所示,步骤S300中的根据分析结果,驱动数据库模型进行数据调整,包括但不限于有以下步骤S310至步骤S320。
步骤S310,根据时间序列分布结果,对预设分区范围进行分区调整,得到调整范围。
步骤S320,根据调整范围驱动数据库模型中存储的被管数据进行数据调整。
可以理解的是,本实施例的时间序列分布结果主要受长期趋势因素T、季节变动因素S、周期变动因素(也称循环变动因素)C和不规则变动因素I等因素的影响。长期趋势因素T表示在较长时期内受某种根本性因素作用而形成的总的变动趋势;季节变动因素S表示受季节变动影响而形成的一种长度和幅度固定的周期波动;周期变动因素C表示受各种因素影响 而形成的上下起伏不定的波动;不规则变动因素I表示一种无规律可循的变动。
需要说明的是,本实施例需要对不规则变动因素I进行排除,以提取出长期趋势因素T、季节变动因素S、周期变动因素C;通过根据上述长期趋势因素T/季节变动因素S/周期变动因素C反映的时间序列分布结果,对预设分区范围进行分区调整,得到调整范围;之后根据调整范围驱动数据库模型中存储的被管数据进行数据调整,以便于实现动态调整数据库模型中的被管数据,如根据调整范围,动态调整数据库模型中的逻辑子库的个数、数据表分区的范围等。最后,对经过数据调整后的数据库模型,再进行对数据库模型中存储的被管数据对应的数据处理过程中的算法复杂度和/或结构化查询语言的优化处理,以更新当前效率指标数据,直到当前效率指标数据与预设的目标效率指标数据相同。
当当前度量指标数据包括当前资源使用指标数据时,具体描述本申请实施例如下。
如图7所示,步骤S200中的对当前度量指标数据进行数据分析,得到分析结果,具体包括但不限于有以下步骤S220。
步骤S220,对当前资源使用指标数据进行数据分析,得到资源分布结果。
可以理解的是,资源使用指标数据包括以下至少之一:应用资源使用、数据库资源使用、存储设备输入/输出资源使用。可以理解的是,资源使用数据可包括:内存占用数据、中央处理器(Central Processing Unit,CPU)使用率数据、输入/输出(Input/Output,I/O)响应时间数据、输入/输出读写流量数据、网络流量数据等。
本实施例通过对数据库模型中存储的被管数据对应的数据处理过程进行优化处理,并采集经过优化处理后的数据处理过程所对应的当前资源使用指标数据;对当前资源使用指标数据进行数据分析,得到资源分布结果,根据资源分布结果驱动数据库模型进行数据调整,最后对经过数据调整后的数据库模型进行优化处理以更新当前资源使用指标数据,直到当前资源使用指标数据与预设的目标资源使用指标数据相同。本实施例通过采集当前资源使用指标数据来验证数据库模型的质量,使得最终被优化调整后的数据库模型中的被管数据所对应的当前资源使用指标数据是最优的,如得到最少的资源使用,此时表示数据库模型完成优化调整。
需要说明的是,一实施例中,当当前度量指标数据包括当前效率指标数据、当前资源使用指标数据时,本实施例通过对数据库模型中存储的被管数据对应的数据处理过程中的算法复杂度和/或结构化查询语言进行优化处理,并分别采集经过优化处理后的数据处理过程所对应的当前效率指标数据、当前资源使用指标数据;此时本实施例处于第二阶段的优化,一方面,在时间维度上对当前效率指标数据进行数据分析,得到时间序列分布结果;另一方面,对当前资源使用指标数据进行数据分析,得到资源分布结果;之后根据时间序列分布结果,对数据库模型中存储的被管数据对应的预设分区范围进行分区调整,得到调整范围;再根据调整范围、资源分布结果,驱动数据库模型中存储的被管数据进行数据调整;对经过数据调整后的数据库模型重新进行上述的优化处理,以分别更新当前效率指标数据和当前资源使用指标数据,直到当前效率指标数据与预设的目标效率指标数据相同,以及当前资源使用指标数据与预设的目标资源使用指标数据相同。可以理解的是,本实施例通过采集数据处理过程所对应的当前效率指标数据、当前资源使用指标数据等来验证数据库模型质量,以便于训练得到一个最优数据库模型。
下面以被管数据包括网元性能数据为例,具体描述本申请的数据库模型动态调整的方法。
本实施例通过建立网元性能模型来进一步构建数据库模型,上述所构建的数据库模型可作为基准模型。具体针对网元性能数据,本申请从网元性能数据对应的度量指标数据方面对数据库模型(即基准模型)进行数据调整和优化。可以理解的是,网元性能数据包括但不限于:管理的网元制式、以及不同制式下被管对象的指标和计数器等。
具体地,对数据库模型中存储的网元性能数据对应的数据处理过程中的算法复杂度、结构化查询语言进行优化处理,并采集经过优化处理后的数据处理过程所对应的当前度量指标数据,即采集数据库模型运行期的数据处理过程所对应的当前效率指标数据以及当前资源使用指标数据;其中,效率指标数据还可以包括网元性能指标、计数器的使用频次等。通过如此设置,能够保证同等数据处理过程中所对应的当前效率指标数据、当前资源使用指标数据能够被优化。例如在一些实施例中,可通过数据处理对应的当前耗时时长(或当前执行时间)来表征当前效率指标数据,当当前效率指标数据被优化后,则可有效提升数据处理效率;以及若当前资源使用指标数据被优化后(例如被优化后的当前资源使用率下降),则可保证资源使用在对应的目标范围内。
之后,在时间维度上对当前效率指标数据进行数据分析,得到时间序列分布结果;以及对当前资源使用指标数据进行数据分析,得到资源分布结果。即可通过对数据处理过程对应的当前效率指标数据、当前资源使用指标数据分别进行机器学习,以便于得到对应的分析结果。可以理解的是,本实施例通过时间序列分布结果、资源分布结果来驱动数据库模型进行数据调整,例如驱动数据库模型中的逻辑子库的个数、数据表的定义、数据表分区的范围或者索引的定义等等进行数据调整。对经过数据调整后的数据库模型返回到上述的优化处理,以更新当前效率指标数据、当前资源使用指标数据,直到当前效率指标数据与预设的目标效率指标数据相同,以及当前资源使用指标数据与预设的目标资源使用指标数据相同。
由此可知,经过对数据库模型的持续优化和调整,最终通过训练可得到被管数据如网元性能数据所对应的最优数据库模型。
参照图8,可以理解的是,本申请实施例的数据库模型动态调整的方法具体对应有数据库模型动态调整的装置。该装置包括但不限于有:网元性能数据抽取转换加载模块100、网元性能数据分析模块200、数据处理过程优化模块300、数据库模型优化模块400、数据库管理维护模块500和分析型数据库模块600。
网元性能数据抽取转换加载模块100被设置为获取网元性能数据,并对网元性能数据进行数据清洗和数据转换,将数据清洗和数据转换后的网元性能数据发送给分析型数据库模块600。
网元性能数据分析模块200被设置为对网元性能数据进行数据统计和数据分析,并将数据统计和数据分析后的网元性能数据发送给分析型数据库模块600。
其中,网元性能数据抽取转换加载模块100、网元性能数据分析模块200还被设置为采集数据处理过程中的算法复杂度和结构化查询语言,并将算法复杂度和结构化查询语言发送给数据处理过程优化模块300;还被设置为采集经过优化处理后的数据处理过程所对应的当前度量指标数据,并将当前度量指标数据发送给数据库模型优化模块400;其中,当前度量指标数据包括当前效率指标数据、当前资源使用指标数据。
数据处理过程优化模块300被设置为获取网元性能数据抽取转换加载模块100、网元性能数据分析模块200发送的算法复杂度和结构化查询语言,并对不符合第一规则的算法复杂 度进行优化处理,以及对不符合第二规则的结构化查询语言进行优化处理;还被设置为通过模型驱动方式维护数据库模型中的数据库对象,其中,数据库对象包括数据库、数据表、分区、视图、索引、自定义函数。
数据库模型优化模块400被设置为获取网元性能数据抽取转换加载模块100、网元性能数据分析模块200发送的数据处理过程所对应的当前度量指标数据,并对当前度量指标数据进行数据分析得到分析结果,并将分析结果发送给数据库管理维护模块500。
数据库管理维护模块500被设置为获取数据库模型优化模块400发送的分析结果,并根据分析结果驱动数据库模型进行数据调整。
分析型数据库模块600,被设置为获取网元性能数据抽取转换加载模块100发送的数据清洗和数据转换后的网元性能数据,以及获取网元性能数据分析模块200发送的数据统计和数据分析后的网元性能数据,并对网元性能数据进行数据存储和数据管理;其中,分析型数据库模块600包括GBase数据库模块、Greenplum数据库模块。
可以理解的是,网元性能数据抽取转换加载模块100能够采集网元所对应的网元性能数据,例如获取来自文件传输协议(File Transfer Protocol,FTP)服务器的网元性能数据;或者获取来自简单网络管理协议(Simple Network Management Protocol,SNMP)服务器的网元性能数据;又或者获取来自套接字(Socket)服务器的网元性能数据等。
可以理解的是,本实施例的装置还可包括应用运行模块,应用运行模块被设置为支持数据库模型进行数据调整时对应的应用运行;其中,应用运行模块包括刀片服务器模块、架式服务器模块。
可以理解的是,网元性能数据抽取转换加载模块100和网元性能数据分析模块200能够从网元性能数据的数据处理效率方面采集数据处理过程所对应的当前效率指标数据和当前资源使用指标数据。例如,数据处理过程包括:抽取过程、转换过程、加载过程、查询过程、汇总过程和补汇总过程,则该数据处理过程所对应的当前效率指标数据,可表示为:抽取过程、转换过程、加载过程、查询过程、汇总过程和补汇总过程所对应的当前执行时间;而该数据处理过程所对应的当前资源使用指标数据,可表示为:当前应用资源例如当前中央处理器占用或者当前内存占用等。在一些实施例中,网元性能数据抽取转换加载模块100和网元性能数据分析模块200还能够从数据分析方面采集数据库表及列对应的查询的频率,或者性能数据分布等。
可以理解的是,数据库模型优化模块400对当前度量指标数据进行数据分析,具体地,在时间维度上通过机器学习算法对当前效率指标数据进行数据分析,得到时间序列分布结果,时间序列分布结果主要受长期趋势因素T、季节变动因素S、周期变动因素C和不规则变动因素I等因素影响。例如,可以对网元性能数据的汇总过程、加载过程或查询过程对应的当前执行时间进行分析。一些实施例中,需要对不规则变动因素I进行排除后,建立被管数据(如网元性能数据)对应的数据处理过程(如汇总过程)对应的当前效率指标数据(如当前执行时间)所反映的时间序列分布结果(如长期趋势因素T/季节变动因素S/周期变动因素C),和数据库模型中的被管对象(如逻辑子库、数据表分区)间的关联。
例如,对于2G/3G/4G/5G共管的移动通信系统,其所包括的指标(如网元性能指标)和计数器个数通常达到数万个,不同用户能够通过查询分析功能来获取并使用自己关注的指标和计数器。例如,对于网元性能指标和计数器的查询,映射到数据库模型中则是对数据库对 象(例如数据库表等)的查询分析。而数据库模型中的数据库对象(例如数据库表)对应的使用频次也是通过时间序列方式来表征的。因此,通过对网元性能数据对应的当前度量指标数据进行数据分析所得到分析结果,可以获取到网元性能数据的数据存储和数据查询之间的关联,以便于对数据库模型的数据调整提供依据,例如为数据表的结构调整、数据汇总的操作频次调整、中间表的创建调整等提供依据。
还可以理解的是,数据库模型优化模块400还能够通过对经过数据调整后的数据库模型进行优化处理以更新当前效率指标数据,直到当前效率指标数据与预设的目标效率指标数据相同,即通过持续的数据分析和数据调整经过优化处理后的数据处理过程所对应的当前效率指标数据,能够保证数据处理功能满足业务要求以及当前资源使用指标数据为目标资源使用指标数据(如资源使用不增加)的情况下,持续判断当前效率指标数据是否变的更优,即当前效率指标数据是否与预设的目标效率指标数据相同。经过迭代调整,以训练得到一个最优数据库模型。可以理解的是,数据库模型优化模块400通过当前效率指标数据和当前资源使用指标数据来验证数据库模型的质量,例如最优数据库模型对应的最终的当前资源使用指标数据所反映的结果是最少的资源使用,当前效率指标数据所反映的结果是最优的执行时间或耗时时长等。
数据库管理维护模块500通过根据分析结果驱动数据库模型进行数据调整;具体地,基于机器学习对应的规则对数据库模型进行持续的数据调整;如调整逻辑子库的个数、数据表分区的范围、数据表的结构、数据汇总的操作频次、中间表的创建等等。
以下以具体实施例描述本申请的数据库模型动态调整的方法。
实施例1:本实施例描述的是对数据库表对应的预设分区范围进行数据调整的过程,其包括但不限于以下(1)至(5)。
(1)通过网元性能数据抽取转换加载模块100和网元性能数据分析模块200,来采集经过优化处理后的数据处理过程所对应的当前度量指标数据,并将当前度量指标数据上报给数据库模型优化模块400。本实施例的当前度量指标数据包括当前效率指标数据、当前资源使用指标数据,且当前效率指标数据、当前资源使用指标数据均具有时间特征。可以理解的是,效率指标数据包括有汇总耗时时长、加载耗时时长、查询耗时时长等;而资源使用指标数据包括有应用资源使用(如中央处理器占用/内存占用)、数据库资源使用(如中央处理器占用/内存占用)、存储设备输入/输出资源使用(如每秒的输入输出量(Input/Output Per Second,IOPS)/输入输出量等待(IOWait))等。
(2)数据库模型优化模块400在时间维度上对当前效率指标数据进行数据分析,得到时间序列分布结果,时间序列分布结果主要受长期趋势因素T、季节变动因素S、周期变动因素C和不规则变动因素I等因素影响,在排除不规则变动因素I后,得到长期趋势因素T/季节变动因素S/周期变动因素C对应的时间序列分布结果。可以理解的是,长期趋势因素T具体呈现平均趋势,而季节变动因素S和原始数据表对应的第一预设分区范围保持一致,周期变动因素C和汇总表对应的第二预设分区范围保持一致。
(3)汇总耗时时长具体定义为变量w;可以理解的是,w的最大值Max(w)小于业务定义的耗时阈值λ;预设分区范围定义为变量P,P的取值范围为[1,30];其中,原始数据表对应的第一预设分区范围定义为变量P 1,原始数据表对应的第一调整范围定义为变量p 1,汇总表对应的第二预设分区范围定义为变量P 2,汇总表对应的第二调整范围定义为变量p 2。P、P 1、 P 2、p 1、p 2的单位均为天。为了获取数据库模型对应的最优的w值,通过数据库模型优化模块400基于步骤(2)得到的时间序列分布结果,来驱动数据库管理维护模块500对数据库模型进行数据调整,以对数据库模型进行优化调整。可以理解的是,原始数据表对应的第一预设分区范围P 1通常是基于经验进行设置的,通常会设置的偏大。因此若原始数据表对应的第一预设分区范围P 1>1,则可以对原始数据表对应的第一预设分区范围P 1进行第一分区调整,以得到原始数据表对应的合适的第一调整范围p 1。具体地,第一分区调整可以设置为:p 1=P 1-1;其中,p 1大于或等于1。
(4)通过数据库模型优化模块400对经过数据调整后的数据库模型进行优化处理以更新当前效率指标数据,同时辅助参考当前资源使用指标数据,若应用资源使用或数据库资源使用总体不增加、存储设备输入/输出资源使用在硬件能力范围内,或者其他数据处理过程对应的耗时时长没有恶化,便可以存储原始数据表对应的第一调整范围p 1。可以理解的是,汇总表对应可执行如上述的分区调整,以得到汇总表对应的合适的第二调整范围p 2
(5)实施例1说明:例如,原始数据表按天分区并保存15分钟粒度的被管数据,汇总表则按周分区并保存小时粒度的被管数据。具体地,汇总表中的被管数据是通过聚合原始数据表中的被管数据,之后将原始数据表中的被管数据加载到汇总表内。在时间维度上对汇总表对应的当前效率指标数据进行数据分析,得到时间序列分布结果,其中,汇总表所对应的季节变动因素S为1天,周期变动因素C为1周。
汇总耗时时长所对应的时间序列分布结果,主要呈现两个规律。
规律一:以天为观测粒度时,随着时间推移,汇总耗时时长呈逐步上升趋势。
规律一出现的原因是:由于原始数据表是按天分区的,故针对当天的被管数据而言,其对应的数据量在时间维度上是逐渐增加的,因此通过小时汇总时,所读取的被管数据对应的数据范围逐渐增加,从而导致读取的耗时增加,因此汇总耗时时长增加。
规律二:不同天的同一时段的汇总耗时时长呈逐步上升趋势。
规律二出现的原因是:由于被管数据被汇总加载到小时表中时,会存在重复数据的删除和插入操作,删除操作所涉及的数据范围随着被管数据被汇总时所对应的数据量不断增加而变大,从而导致删除耗时时长的增加;而由于插入操作会涉及到索引页面的操作,由于索引范围随着被管数据被汇总时所对应的数据量不断增加而增加,从而导致插入耗时时长的增加。因此不同天的同一时段的汇总耗时时长呈逐步上升趋势。
可以理解的是,本实施例的原始数据表中,若第一预设分区范围P 1=1时,则不需要对原始数据表进行第一分区调整。而对于汇总表对应的第二预设分区范围P 2=7时,则可以对第二预设分区范围P 2进行第二分区调整,第二分区调整可以设置为:p 2=P 2-1,以得到汇总表对应的合适的第二调整范围p 2,以便于确保资源使用指标数据处在对应的目标范围内。
实施例2:本实施例描述的是对数据表结构、数据汇总进行数据调整的过程,其包括但不限于以下(1)至(4)。
(1)通过网元性能数据分析模块200来获取数据处理过程所对应的当前度量指标数据,并将当前度量指标数据上报给数据库模型优化模块400。度量指标数据包括指标(如网元性能指标)和计数器的使用频次、指标(如网元性能指标)和计数器对应的汇总时间范围、指标(如网元性能指标)和计数器所归属的数据库表。
(2)数据库模型优化模块400对当前度量指标数据进行数据分析,得到分析结果,分析 结果包括热点数据分布结果,如高频使用的指标分布结果、低频使用的指标分布结果;和/或高频使用的计数器分布结果、低频使用的计数器分布结果等,从而能够进一步根据指标(如网元性能指标)和计数器所归属的数据库表,得到对应的高频使用的数据库表和列、低频使用的数据库表和列等。
(3)数据库管理维护模块500对数据库模型进行数据调整,以对数据库模型进行优化调整。例如对性能数据进行小时汇总是为了应用于用户对性能小时数据分析。对于每小时都要进行最近小时数据分析的性能数据,就需要按1小时间隔及时进行汇总,例如对于性能数据进行小时汇总并保存15分钟粒度,即每小时的性能数据对应有当前度量指标数据;用户针对当前时刻(例如16:00),需要对距离当前时刻(例如16:00)最近的前1小时(例如15:00至16:00)进行数据分析,则需要按照1小时间隔及时进行汇总;而对于应用很少或没有小时数据分析所涉及的性能数据,则可以降低汇总的使用频次,并同时扩大汇总时间范围(例如按8小时间隔汇总),以提升总体小时汇总效率。对应用不关注的性能数据,则可以不存储,并调整数据库表结构。
(4)验证数据库模型效果,本实施例通过减少对性能数据的存储和降低汇总的使用频次,从而降低对应用系统计算和存储资源的使用,从而提升数据库模型整体效率。
实施例3:本实施例描述的是数据库应用领域中的对数据库模型进行数据调整的过程,其包括但不限于以下(1)至(3)。
(1)获取数据处理过程所对应的当前度量指标数据,当前度量指标数据包括当前效率指标数据和当前资源使用指标数据。例如数据入库和数据查询对应的效率指标数据(如耗时时长)、数据库资源使用(如中央处理器占用/内存占用)、存储设备输入/输出资源使用(如IOPS/IOWait)等,该当前度量指标数据具有时间特征。
(2)由于分析型数据库是由多个逻辑子库构建的集群数据库,逻辑子库的个数和IOPS成类正比关系。在时间维度上对存储设备输入/输出资源使用对应的当前资源使用指标数据进行数据分析,以判断IOPS和IOWait是否在硬件能力范围内。若数据处理过程所产生的IOPS,没有达到或已经超过硬件能力范围,则可以通过增加或减少逻辑子库的个数来提升或降低IOPS,例如,数据处理过程所产生的IOPS没有达到硬件能力范围时,则可以增加逻辑子库的个数来提升IOPS;同时IOWait应处于对应的目标范围内,即处于合理范围内。本实施例的IOPS和IOWait处在对应的目标范围内,会对数据处理过程的效率起正向作用。
(3)获取数据的查询过程对应的当前效率指标数据,驱动数据库模型中的数据库表分区/分表进行数据调整,从而使数据库模型发挥最大性能,其实现原理可参照实施例1。
可以理解的是,本实施例的数据库模型动态调整的方法,是主动式系统维护,而不是问题驱动或人工维护,能够有效提升系统运维效率。
通过对网元性能指标和计数器的使用频次进行数据分析,能够获取热点数据分布结果,进而可以调整数据库表的结构以及数据的存储,提升数据汇总和分析的效率,能够保证资源使用指标数据处在对应的目标范围内。
通过模型驱动方式构建以及维护数据库模型,能够提高数据库模型的易维护性,提高了版本升级效率,降低了等待时间。
通过模型共享性,可以将训练好的数据库模型导出并复制到同类型的系统中,降低数据库模型的训练成本。
另外,本申请的一个实施例还提供了一种设备,该设备包括:存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序。
处理器和存储器可以通过总线或者其他方式连接。
存储器作为一种非暂态计算机可读存储介质,可被设置为存储非暂态软件程序以及非暂态性计算机可执行程序。此外,存储器可以包括高速随机存取存储器,还可以包括非暂态存储器,例如至少一个磁盘存储器件、闪存器件、或其他非暂态固态存储器件。在一些实施方式中,存储器可选包括相对于处理器远程设置的存储器,这些远程存储器可以通过网络连接至该处理器。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。
实现上述实施例的数据库模型动态调整的方法所需的非暂态软件程序以及指令存储在存储器中,当被处理器执行时,执行上述实施例中的数据库模型动态调整的方法,例如,执行以上描述的图1中的方法步骤S100至S400、图2中的方法步骤S110、图3中的方法步骤S120、图4中的方法步骤S210、图5中的方法步骤S101、图6中的方法步骤S310至S320、图7中的方法步骤S220。
以上所描述的设备实施例仅仅是示意性的,其中作为分离部件说明的单元可以是或者也可以不是物理上分开的,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。
此外,本申请的一个实施例还提供了一种计算机可读存储介质,该计算机可读存储介质存储有计算机可执行指令,该计算机可执行指令被一个处理器或控制器执行,例如,被上述设备实施例中的一个处理器执行,可使得上述处理器执行上述实施例中的数据库模型动态调整的方法,例如,执行以上描述的图1中的方法步骤S100至S400、图2中的方法步骤S110、图3中的方法步骤S120、图4中的方法步骤S210、图5中的方法步骤S101、图6中的方法步骤S310至S320、图7中的方法步骤S220。
本申请实施例包括:通过对数据库模型中存储的被管数据对应的数据处理过程进行优化处理,以使被管数据的数据处理过程能够被约束,从而能够保持在持续被优化的状态;由于被管数据的数据处理过程被优化后,其所对应的当前度量指标数据也将发生改变;故对数据处理过程进行优化处理后,通过计算可得到经过优化处理后的数据处理过程所对应的当前度量指标数据;再对当前度量指标数据进行数据分析,能够得到分析结果;根据分析结果,驱动数据库模型进行数据调整,即可动态调整数据库模型中存储的被管数据;之后对经过数据调整后的数据库模型再进行上述优化处理以更新当前度量指标数据,直到当前度量指标数据与预设的目标度量指标数据相同,从而能够得到被优化调整后的数据库模型。该数据库模型性能较好,能够适用于对多种类型数据的管理和分析,解决了相关技术中在运行期数据和模型中存在偏差,导致数据库无法发挥较好性能的问题。
本领域普通技术人员可以理解,上文中所公开方法中的全部或某些步骤、系统可以被实施为软件、固件、硬件及其适当的组合。某些物理组件或所有物理组件可以被实施为由处理器,如中央处理器、数字信号处理器或微处理器执行的软件,或者被实施为硬件,或者被实施为集成电路,如专用集成电路。这样的软件可以分布在计算机可读介质上,计算机可读介质可以包括计算机存储介质(或非暂时性介质)和通信介质(或暂时性介质)。如本领域普通技术人员公知的,术语计算机存储介质包括在用于存储信息(诸如计算机可读指令、数据 结构、程序模块或其他数据)的任何方法或技术中实施的易失性和非易失性、可移除和不可移除介质。计算机存储介质包括但不限于RAM、ROM、EEPROM、闪存或其他存储器技术、CD-ROM、数字多功能盘(DVD)或其他光盘存储、磁盒、磁带、磁盘存储或其他磁存储装置、或者可以用于存储期望的信息并且可以被计算机访问的任何其他的介质。此外,本领域普通技术人员公知的是,通信介质通常包含计算机可读指令、数据结构、程序模块或者诸如载波或其他传输机制之类的调制数据信号中的其他数据,并且可包括任何信息递送介质。
以上是对本申请的一些实施进行了具体说明,但本申请并不局限于上述实施方式,熟悉本领域的技术人员在不违背本申请精神的前提下还可作出种种的等同变形或替换,这些等同的变形或替换均包含在本申请权利要求所限定的范围内。

Claims (11)

  1. 一种数据库模型动态调整的方法,包括:
    对数据库模型中存储的被管数据对应的数据处理过程进行优化处理,得到经过优化处理后的数据处理过程所对应的当前度量指标数据;
    对所述当前度量指标数据进行数据分析,得到分析结果;
    根据所述分析结果,驱动所述数据库模型进行数据调整;
    对经过数据调整后的数据库模型进行所述优化处理以更新所述当前度量指标数据,直到所述当前度量指标数据与预设的目标度量指标数据相同。
  2. 根据权利要求1所述的方法,其中,所述数据库模型由以下步骤得到:
    建立网元性能模型,所述网元性能模型包括模型元素;
    解析所述网元性能模型以将所述模型元素映射为数据库对象,其中,所述被管数据包括所述数据库对象;
    根据所述数据库对象构建所述数据库模型。
  3. 根据权利要求1所述的方法,其中,所述对数据库模型中存储的被管数据对应的数据处理过程进行优化处理,包括:
    获取数据处理过程对应的算法复杂度,对不符合第一规则的所述算法复杂度进行优化处理;其中,所述第一规则包括所述算法复杂度小于或等于预设阈值。
  4. 根据权利要求1所述的方法,其中,所述对数据库模型中存储的被管数据对应的数据处理过程进行优化处理,包括:
    获取数据处理过程对应的结构化查询语言,对不符合第二规则的所述结构化查询语言进行优化处理;
    其中,所述第二规则包括以下至少之一:
    数据重分布或广播小于或等于设定阈值;
    索引层次小于或等于索引层次阈值;
    嵌套循环次数为0;
    溢出文件次数为0;
    查询涉及的分区小于分区阈值;
    查询涉及的分区的筛选率小于筛选率阈值;
    数据倾斜次数为0;
    采用函数过滤数据次数为0。
  5. 根据权利要求1至4任一项所述的方法,其中,所述当前度量指标数据包括当前效率指标数据;
    所述对所述当前度量指标数据进行数据分析,得到分析结果,包括:
    在时间维度上对所述当前效率指标数据进行数据分析,得到时间序列分布结果。
  6. 根据权利要求5所述的方法,其中,所述对数据库模型中存储的被管数据对应的数据处理过程进行优化处理之前,所述方法还包括:
    根据对应的预设分区范围对所述数据库模型中存储的被管数据进行分区。
  7. 根据权利要求6所述的方法,其中,所述根据所述分析结果,驱动所述数据库模型进 行数据调整,包括:
    根据所述时间序列分布结果,对所述预设分区范围进行分区调整,得到调整范围;
    根据所述调整范围驱动所述数据库模型中存储的被管数据进行数据调整。
  8. 根据权利要求7所述的方法,其中,所述对经过数据调整后的数据库模型进行所述优化处理以更新所述当前度量指标数据,直到所述当前度量指标数据与预设的目标度量指标数据相同,包括:
    对经过数据调整后的所述数据库模型进行所述优化处理以更新所述当前效率指标数据,直到所述当前效率指标数据与预设的目标效率指标数据相同。
  9. 根据权利要求1至4任一项所述的方法,其中,所述当前度量指标数据包括当前资源使用指标数据;
    所述对所述当前度量指标数据进行数据分析,得到分析结果,包括:
    对所述当前资源使用指标数据进行数据分析,得到资源分布结果。
  10. 一种设备,包括:存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,所述处理器执行所述计算机程序时实现如权利要求1至9中任意一项所述的数据库模型动态调整的方法。
  11. 一种计算机可读存储介质,存储有计算机可执行指令,其中,所述计算机可执行指令被设置为执行权利要求1至9中任意一项所述的数据库模型动态调整的方法。
PCT/CN2022/091510 2021-07-19 2022-05-07 数据库模型动态调整的方法、设备及存储介质 WO2023000766A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110814308.XA CN115640274A (zh) 2021-07-19 2021-07-19 数据库模型动态调整的方法、设备及存储介质
CN202110814308.X 2021-07-19

Publications (1)

Publication Number Publication Date
WO2023000766A1 true WO2023000766A1 (zh) 2023-01-26

Family

ID=84939426

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/091510 WO2023000766A1 (zh) 2021-07-19 2022-05-07 数据库模型动态调整的方法、设备及存储介质

Country Status (2)

Country Link
CN (1) CN115640274A (zh)
WO (1) WO2023000766A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116842223A (zh) * 2023-08-29 2023-10-03 天津鑫宝龙电梯集团有限公司 一种工况数据管理方法、装置、设备和介质

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115858699B (zh) * 2023-02-28 2023-05-09 北京仁科互动网络技术有限公司 数据仓库的构建方法、装置、电子设备和可读存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060085484A1 (en) * 2004-10-15 2006-04-20 Microsoft Corporation Database tuning advisor
CN1983274A (zh) * 2005-12-13 2007-06-20 国际商业机器公司 数据库调整方法和系统
CN101059810A (zh) * 2007-03-16 2007-10-24 华为技术有限公司 一种实现数据库系统自动优化的系统和方法
US20190034463A1 (en) * 2016-04-19 2019-01-31 Sysbank Co., Ltd. Apparatus and method for tuning relational database
CN111339066A (zh) * 2020-05-20 2020-06-26 腾讯科技(深圳)有限公司 数据库优化方法、装置、电子设备及计算机可读存储介质
US20200401598A1 (en) * 2019-06-20 2020-12-24 International Business Machines Corporation Dynamically self-indexing database-management system
CN112231236A (zh) * 2020-12-14 2021-01-15 广东睿江云计算股份有限公司 一种数据库性能的测试方法

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060085484A1 (en) * 2004-10-15 2006-04-20 Microsoft Corporation Database tuning advisor
CN1983274A (zh) * 2005-12-13 2007-06-20 国际商业机器公司 数据库调整方法和系统
CN101059810A (zh) * 2007-03-16 2007-10-24 华为技术有限公司 一种实现数据库系统自动优化的系统和方法
US20190034463A1 (en) * 2016-04-19 2019-01-31 Sysbank Co., Ltd. Apparatus and method for tuning relational database
US20200401598A1 (en) * 2019-06-20 2020-12-24 International Business Machines Corporation Dynamically self-indexing database-management system
CN111339066A (zh) * 2020-05-20 2020-06-26 腾讯科技(深圳)有限公司 数据库优化方法、装置、电子设备及计算机可读存储介质
CN112231236A (zh) * 2020-12-14 2021-01-15 广东睿江云计算股份有限公司 一种数据库性能的测试方法

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116842223A (zh) * 2023-08-29 2023-10-03 天津鑫宝龙电梯集团有限公司 一种工况数据管理方法、装置、设备和介质
CN116842223B (zh) * 2023-08-29 2023-11-10 天津鑫宝龙电梯集团有限公司 一种工况数据管理方法、装置、设备和介质

Also Published As

Publication number Publication date
CN115640274A (zh) 2023-01-24

Similar Documents

Publication Publication Date Title
US11216461B2 (en) Query transformations in a hybrid multi-cloud database environment per target query performance
US10509785B2 (en) Policy-driven data manipulation in time-series database systems
US11030189B2 (en) Maintaining up-to-date materialized views for time-series database analytics
US11138175B2 (en) Type-specific compression in database systems
WO2023000766A1 (zh) 数据库模型动态调整的方法、设备及存储介质
US11429609B2 (en) Geo-scale analytics with bandwidth and regulatory constraints
US20170083573A1 (en) Multi-query optimization
US20170293642A1 (en) Automatically revising synopsis table structure
CN107016115B (zh) 数据导出方法、装置、计算机可读存储介质及电子设备
US11734308B2 (en) Autonomous caching for views
US20170031982A1 (en) Maintaining Performance in the Presence of Insertions, Deletions, and Streaming Queries
WO2023241246A1 (zh) 索引建立方法、电子设备和计算机可读存储介质
US11995084B1 (en) Database system for querying time-series data stored in a tiered storage using a cloud platform

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22844933

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE