CN108268489B - Method and device for evaluating data platform - Google Patents

Info

Publication number
CN108268489B
Authority
CN
China
Prior art keywords
data
analyzing
evaluation
platform
evaluating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611259558.7A
Other languages
Chinese (zh)
Other versions
CN108268489A (en
Inventor
樊炼
林洁
薛超
曾磊
王卉
郭慈
徐庆
张欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Group Hubei Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Hubei Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Group Hubei Co Ltd
Priority to CN201611259558.7A
Publication of CN108268489A
Application granted
Publication of CN108268489B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for evaluating a data platform, wherein the method comprises the following steps: analyzing a Structured Query Language (SQL) statement related to a data entity in a data platform to obtain redundant data; analyzing an evaluation item including redundant data according to an Epanechnikow kernel function; evaluating the data platform according to the analyzed evaluation item. The embodiment of the invention also discloses a device for evaluating the data platform, which can evaluate the data platform in real time according to the evaluation items comprising the redundant data, is convenient for adjusting the related settings of the data platform in time, and ensures the working efficiency of the data platform.

Description

Method and device for evaluating data platform
Technical Field
The invention relates to the field of computers, in particular to a method and a device for evaluating a data platform.
Background
With the rapid development of applications such as the mobile internet and the internet of things, the global data volume has grown explosively. This rapid growth signals that the era of big data has arrived. Not only is the scale of data becoming larger and larger, but the large number of data types and the high real-time requirements of data processing also greatly increase the complexity of processing big data.
Signaling data in the communication field has a very large volume, and the real-time requirements of analysis services keep rising, so evaluating the health of the big data platform of a signaling analysis system is particularly important.
In the prior art, action is taken only after system resources or processing jobs raise alarms or faults; the data platform cannot be analyzed on a routine basis.
Disclosure of Invention
The embodiment of the invention provides a method for evaluating a data platform, which can evaluate the data platform in real time according to an evaluation item comprising redundant data, is convenient for adjusting the related settings of the data platform in time, and ensures the working efficiency of the data platform.
The embodiment of the invention also provides a device for evaluating the data platform, which can evaluate the data platform in real time according to the evaluation items of the redundant data, is convenient for adjusting the relevant settings of the data platform in time, and ensures the working efficiency of the data platform.
A method of evaluating a data platform, the method comprising:
analyzing a Structured Query Language (SQL) statement related to a data entity in a data platform to obtain redundant data;
analyzing an evaluation item including redundant data according to an Epanechnikow kernel function;
evaluating the data platform according to the analyzed evaluation item.
Optionally, the analyzing SQL statements related to data entities in the data platform to obtain redundant data includes:
and analyzing SQL sentences related to the data entities in the data platform by using an edit distance algorithm to obtain redundant data.
Optionally, the analyzing, by using an edit distance algorithm, the SQL statements related to the data entities in the data platform to obtain the redundant data includes:
analyzing the SQL statement to obtain a data processing path and a data source of each model table;
combining and splicing the data structure corresponding to the data source and the data processing path in a character mode to form a processing characteristic character string of the model table;
and comparing the processing characteristic character strings of different model tables pairwise by using an edit distance algorithm to obtain redundant data.
Optionally, the analyzing the evaluation term including the redundant data according to the Epanechnikow kernel function includes:
obtaining bandwidth parameters according to the historical redundant data minimum mean square error;
the evaluation terms are analyzed in terms of bandwidth parameters, redundant data and Epanechnikow kernel functions.
Optionally, the evaluation item further includes:
one or more categories of space usage data, system load data, storage specification data, degree of standardization data, data usage data, or heat assessment data;
analyzing an evaluation term comprising redundant data according to an Epanechnikow kernel function, comprising:
for different categories, obtaining bandwidth parameters corresponding to the categories according to the historical category data minimum mean square error;
analyzing the evaluation item according to the bandwidth parameter corresponding to the category, the category data and the Epanechnikow kernel function;
the evaluating the data platform according to the post-analysis evaluation item includes:
and evaluating the data platform according to the evaluation items corresponding to the categories after the analysis and the weights corresponding to the categories.
An apparatus to evaluate a data platform, the apparatus comprising:
the parsing module is used for acquiring redundant data from a Structured Query Language (SQL) statement related to a data entity in the data platform;
an analysis module for analyzing an evaluation term including redundant data by using an Epanechnikow kernel function;
and the evaluation module is used for evaluating the data platform according to the evaluation items after analysis.
Optionally, the parsing module is further configured to parse, by using an edit distance algorithm, SQL statements related to data entities in the data platform to obtain redundant data.
Optionally, the parsing module is further configured to parse the SQL statement, and obtain a data processing path and a data source of each model table; combining and splicing the data structure corresponding to the data source and the processing path in a character mode to form a processing characteristic character string of the model; and comparing the processing characteristic character strings of different models pairwise by using an edit distance algorithm to obtain redundant data.
Optionally, the analysis module is further configured to minimize a mean square error according to the historical redundant data to obtain a bandwidth parameter; the evaluation terms are analyzed in terms of bandwidth parameters, redundant data and Epanechnikow kernel functions.
Optionally, the evaluation item further includes:
one or more categories of space usage data, system load data, storage specification data, degree of standardization data, data usage data, or heat assessment data;
the analysis module is also used for obtaining bandwidth parameters corresponding to different categories according to the historical category data minimum mean square error; analyzing the evaluation item according to the bandwidth parameter corresponding to the category, the category data and the Epanechnikow kernel function;
and the evaluation module is also used for evaluating the data platform according to the analyzed evaluation items corresponding to the categories and the weights corresponding to the categories.
According to the technical scheme, in the embodiment of the invention, firstly, SQL sentences related to data entities in a data platform are analyzed to obtain redundant data; then analyzing an evaluation item comprising redundant data according to an Epanechnikow kernel function; and finally, evaluating the data platform according to the analyzed evaluation item. The data platform can be evaluated in real time according to the evaluation items of the redundant data, so that the related settings of the data platform can be conveniently adjusted in time subsequently, and the working efficiency of the data platform is ensured.
Drawings
The present invention will be better understood from the following description of specific embodiments thereof taken in conjunction with the accompanying drawings, in which like or similar reference characters designate like or similar features.
FIG. 1 is a schematic flow chart illustrating a method for evaluating a data platform according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating a process of analyzing SQL statements related to data entities in a data platform to obtain redundant data according to an embodiment of the present invention;
FIG. 3 is a flow chart illustrating an embodiment of analyzing an evaluation item including redundant data;
FIG. 4 is a schematic diagram of an apparatus for evaluating a data platform according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings and specific embodiments.
In the embodiment of the invention, because not every contingency is considered when a data platform is built, redundant data exists in the data platform, and this unnecessary redundant data lowers the working efficiency of the data platform. SQL statements related to data entities in the data platform are analyzed to obtain redundant data; an evaluation item including the redundant data is analyzed according to an Epanechnikow kernel function; and finally the data platform is evaluated. Because the data platform can be evaluated in real time according to the evaluation item including the redundant data, the relevant settings of the data platform can subsequently be adjusted in time to reduce the generation of redundant data, which in turn ensures the working efficiency of the data platform.
Referring to fig. 1, a schematic flow chart of a method for evaluating a data platform specifically includes the following steps:
101. Analyze SQL statements related to data entities in the data platform to obtain redundant data.
SQL is a database query and programming language used to access data and to query, update, and manage relational database systems. The SQL statements related to each data entity in the data platform are obtained by analyzing the task log, and the redundant data is then obtained from them.
Referring to fig. 2, obtaining redundant data by analyzing SQL statements related to data entities in a data platform specifically includes:
1011. Parse the SQL statements to obtain the data processing path and data source of each model table.
A model table is an abstraction and generalization of entity tables in the database; for example, entity tables that have the same table structure but correspond to different points in time can be abstracted into one model table. The analysis takes the model table as its object, which avoids repeated and redundant analysis results. The SQL statements related to the data entities are parsed to obtain the data processing path and data source of each model table. A data processing path is the logical path followed during data processing.
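The abstraction of entity tables into model tables can be illustrated with a minimal sketch. It assumes a hypothetical naming convention in which time-partitioned entity tables carry a trailing `_YYYYMM` or `_YYYYMMDD` suffix; the patent itself groups tables by identical table structure and does not prescribe any naming rule, so the function names and the suffix pattern here are illustrative only.

```python
import re
from collections import defaultdict

# Assumption: entity tables for different time points differ only by a
# trailing date suffix such as _201612 (month) or _20161231 (day).
_DATE_SUFFIX = re.compile(r"_(\d{8}|\d{6})$")


def model_table_name(entity_table: str) -> str:
    """Strip the trailing date suffix so time-partitioned tables share one name."""
    return _DATE_SUFFIX.sub("", entity_table)


def group_by_model(entity_tables):
    """Map each model table name to the entity tables it abstracts."""
    groups = defaultdict(list)
    for name in entity_tables:
        groups[model_table_name(name)].append(name)
    return dict(groups)
```

Analyzing one representative per group, rather than every entity table, is what keeps the later pairwise comparison from reporting the same redundancy once per time partition.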
1012. Combine and splice, in character form, the data structure corresponding to the data source and the data processing path to form the processing feature string of the model table.
The data source model is analyzed to obtain its data structure, and the data structure of the data source model and the data processing path of the model table are combined and spliced in character form to form the processing feature string of each model table.
For example, the feature string of model table TABLE1 is [table structure information] + [processing information], i.e. (COL1|COL2|COL3)(TIME_ID=201612), where the data processing path is the part corresponding to the TIME_ID characters.
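The splicing step can be sketched as follows. The function name `feature_string` and the representation of the processing path as a list of condition strings are assumptions made for illustration; only the overall layout, structure information followed by processing information, comes from the TABLE1 example.

```python
def feature_string(columns, processing_steps):
    """Splice table-structure and processing information into one string.

    columns          -- column names of the data source, in table order
    processing_steps -- hypothetical textual form of each step on the
                        data processing path, e.g. a filter condition
    """
    structure = "(" + "|".join(columns) + ")"
    process = "".join("(" + step + ")" for step in processing_steps)
    return structure + process


# Reproduces the layout of the TABLE1 example.
table1 = feature_string(["COL1", "COL2", "COL3"], ["TIME_ID=201612"])
```

Because the later comparison is order-sensitive, a real implementation would presumably normalize the inputs first (consistent case, whitespace and ordering) so that equivalent processing yields identical strings.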
1013. Compare the processing feature strings of different model tables pairwise by using an edit distance algorithm to obtain redundant data.
A string similarity algorithm determines whether two character strings are similar. Examples include the edit distance algorithm (Jaro-Winkler distance), the longest common substring algorithm (LCS) and the GST algorithm.
Any of the above string similarity algorithms can be used in the present invention, but the choice of algorithm must take into account both the data characteristics of the telecommunication service and the performance of the character comparison. In terms of data characteristics, the string describing how a data table is processed is built from SQL syntax and is therefore an ordered string, so its matching process should be order-aware. Both the edit distance algorithm (Jaro-Winkler distance) and the GST algorithm satisfy this, and the GST algorithm additionally handles the case where the order of substrings differs between the two strings. However, the GST algorithm has a high time complexity of O (), which in actual operation essentially cannot meet the system's performance requirement of processing ten thousand tables within 3 hours, and the ordering problem can be handled well by uniformly ordering the strings during character preprocessing rather than inside the algorithm. The invention therefore adopts the edit distance algorithm, which is explained by example below:
Given two character strings $S_1$ and $S_2$, their Jaro distance is:
$$d_j = \frac{1}{3}\left(\frac{m}{|S_1|} + \frac{m}{|S_2|} + \frac{m - t}{m}\right) \qquad (1)$$
where m is the number of matched characters and t is the number of transpositions.
Two characters, one from $S_1$ and one from $S_2$, are considered matched if they are identical and the distance between their positions does not exceed
$$\left\lfloor \frac{\max(|S_1|, |S_2|)}{2} \right\rfloor - 1$$
The matched characters determine the number of transpositions t: in short, t is half the number of matched characters that appear in a different order in the two strings.
For example, all characters of MARTHA and MARHTA match, but T and H must be transposed to turn MARTHA into MARHTA, so T and H are matched characters in different order and t = 2/2 = 1.
The Jaro distance between the two strings is then:
$$d_j = \frac{1}{3}\left(\frac{6}{6} + \frac{6}{6} + \frac{6 - 1}{6}\right) = 0.944$$
Jaro-Winkler gives a higher score to strings that share the same initial part. It defines a prefix scaling constant p; for two given strings whose common prefix has length l, the Jaro-Winkler distance is:
$$d_w = d_j + l\,p\,(1 - d_j) \qquad (2)$$
where $d_j$ is the Jaro distance of the two strings; l is the length of the common prefix, capped at 4; and p is a constant that adjusts the score. p must not exceed 0.25, since otherwise $d_w$ could exceed 1; this constant is set to 0.1.
Thus, the Jaro-Winkler distance of the above-mentioned MARTHA and MARHTA is:
$$d_w = 0.944 + 3 \times 0.1 \times (1 - 0.944) = 0.961$$
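The calculation above can be checked with a short sketch of the Jaro and Jaro-Winkler measures. This is a plain textbook-style implementation written for illustration, not code from the patent; the function names `jaro` and `jaro_winkler` and the parameter names are chosen here.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity d_j of two strings (1.0 means identical)."""
    if not s1 or not s2:
        return 0.0
    # Characters match only if equal and at most `window` positions apart.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1 = [False] * len(s1)
    used2 = [False] * len(s2)
    m = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # t is half the number of matched characters that are out of order.
    matched1 = [c for c, u in zip(s1, used1) if u]
    matched2 = [c for c, u in zip(s2, used2) if u]
    t = sum(a != b for a, b in zip(matched1, matched2)) / 2
    return (m / len(s1) + m / len(s2) + (m - t) / m) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro-Winkler similarity d_w = d_j + l * p * (1 - d_j)."""
    dj = jaro(s1, s2)
    l = 0
    for a, b in zip(s1, s2):
        if a != b or l == max_prefix:
            break
        l += 1
    return dj + l * p * (1 - dj)
```

For MARTHA and MARHTA this gives m = 6, t = 1 and a common prefix of length 3, which reproduces the values 0.944 and 0.961 from the worked example.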
According to practical experience, when the Jaro-Winkler distance of the feature strings of two different model tables is greater than 0.9, the processing and the features of the two model tables are considered similar and the two model tables are redundant. That is, this pair counts as 1 redundancy.
Taking the day as the unit, the number of redundancies per day is counted and divided by the number of all model tables for that day to obtain the daily redundancy.
Taking the month as the unit, the number of redundancies per month is counted and divided by the number of all model tables for that month to obtain the monthly redundancy.
The data redundancy, i.e. the redundant data, equals 0.7 × daily redundancy + 0.3 × monthly redundancy. The redundancies from the daily and monthly perspectives are thus combined to obtain the redundant data. That is, the redundant data is counted in units of both days and months, which ensures its coverage and time span.
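The counting and weighting described above can be sketched as follows. To keep the block self-contained, the default similarity function is `difflib.SequenceMatcher.ratio` from the Python standard library, used purely as a stand-in; the patent uses the Jaro-Winkler distance, which can be passed in through the `similarity` argument. The function names and the dictionary layout are assumptions for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def _stand_in_similarity(a: str, b: str) -> float:
    # Stand-in only: the patent compares strings with Jaro-Winkler.
    return SequenceMatcher(None, a, b).ratio()


def redundancy_ratio(feature_strings, similarity=_stand_in_similarity,
                     threshold=0.9):
    """Redundant pair count divided by the number of model tables.

    feature_strings -- dict mapping model table name to its feature string
    """
    tables = list(feature_strings)
    if not tables:
        return 0.0
    hits = sum(
        1
        for a, b in combinations(tables, 2)
        if similarity(feature_strings[a], feature_strings[b]) > threshold
    )
    return hits / len(tables)


def data_redundancy(daily, monthly, w_day=0.7, w_month=0.3):
    """Combine daily and monthly redundancy with the 0.7 / 0.3 weights."""
    return w_day * daily + w_month * monthly
```

Comparing every pair is quadratic in the number of model tables, which is presumably why the description stresses the per-comparison cost when choosing the similarity algorithm.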
102. Analyze the evaluation item including the redundant data according to the Epanechnikow kernel function.
The future development trend of the relevant data can be analyzed with a kernel density estimation algorithm. That is, by analyzing the evaluation item including the redundant data according to the Epanechnikow kernel function, it can be determined whether the evaluated result of the evaluation item will develop in a good or a bad direction in the future. The data platform is then evaluated according to this trend.
The kernel density estimation algorithm proposed by Rosenblatt and Parzen is currently the most efficient and most widely used non-parametric density estimation algorithm. It derives the characteristics of the data distribution from the training samples alone and can be used to estimate a density function of any shape. The univariate kernel density estimate is described below.
Let $x_1, x_2, x_3, \ldots, x_n$ be samples of a random variable whose density function is $f(x)$, $x \in \mathbb{R}$.
$$\hat{f}_h(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right) \qquad (3)$$
Formula (3) is the kernel density estimate of the density function f(x), where K(·) is the kernel function and h is the bandwidth parameter.
For convenience, let $K_h(u) = K(u/h)/h$; formula (3) can then be expressed as:
$$\hat{f}_h(x) = \frac{1}{n}\sum_{i=1}^{n} K_h(x - x_i) \qquad (4)$$
As can be seen from formula (3), the kernel density estimate of f depends on the given sample set, and also on the choice of the kernel function K and of the bandwidth parameter h.
The present invention selects the Epanechnikow kernel function as the kernel function for analyzing f(x).
Epanechnikow kernel function:
$$K(u) = \frac{3}{4}\left(1 - u^2\right), \quad |u| \le 1$$
$$K(u) = 0, \quad |u| > 1$$
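The kernel and the density estimate can be written out directly. The sketch below is a straightforward transcription into Python of the Epanechnikow kernel (more commonly spelled Epanechnikov) and of the kernel density estimate with bandwidth h; the function names `epanechnikov` and `kde` are chosen here for illustration.

```python
def epanechnikov(u: float) -> float:
    """Epanechnikov kernel: 3/4 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kde(x: float, samples, h: float) -> float:
    """Kernel density estimate at x: (1 / (n * h)) * sum K((x - x_i) / h)."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(epanechnikov((x - xi) / h) for xi in samples) / (len(samples) * h)
```

Since the kernel vanishes outside [-1, 1], only samples within distance h of x contribute to the estimate, so a small h follows individual samples closely while a large h averages over many of them.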
Referring to fig. 3, a schematic flow chart of analyzing an evaluation item including redundant data specifically includes:
1021. Obtain the bandwidth parameter by minimizing the mean square error over historical redundant data.
The bandwidth parameter can be obtained by minimizing the mean square error over the historical redundant data.
The bandwidth parameter h is selected as follows: the mean integrated squared error MISE(h) is used as the criterion for judging the quality of the density estimate.
$$\mathrm{MISE}(h) = E\int \left(\hat{f}_h(x) - f(x)\right)^2 dx = \mathrm{AMISE}(h) + o\left(\frac{1}{nh} + h^4\right)$$
where:
$$\mathrm{AMISE}(h) = \frac{R(K)}{nh} + \frac{1}{4}\,\sigma_K^4\,h^4\,R(f'')$$
AMISE(h) is called the asymptotic mean integrated squared error. σ denotes a standard deviation, i.e. the square root of the mean squared deviation from the mean, which reflects the degree of dispersion; here $\sigma_K$ is the standard deviation of the kernel K, with
$$R(g) = \int g(x)^2\,dx, \qquad \sigma_K^2 = \int x^2 K(x)\,dx$$
To minimize AMISE(h), h must take an intermediate value, which avoids $\hat{f}_h(x)$ having too large a bias (oversmoothed, h too large) or too large a variance (undersmoothed, h too small). Minimizing AMISE(h) with respect to h shows that the best choice exactly balances the orders of the variance term and the bias term in AMISE(h); the optimal bandwidth is:
$$h_{\mathrm{opt}} = \left(\frac{R(K)}{\sigma_K^4\,R(f'')\,n}\right)^{1/5}$$
where K is the kernel function and f is the density of the historical redundant data. That is, the bandwidth parameter is obtained by minimizing the mean square error over the historical redundant data.
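In practice the optimal bandwidth cannot be evaluated directly, because it depends on the unknown density of the historical data. One common workaround, which the patent does not state and which is used here only as an assumption, is the normal-reference rule: the unknown density is treated as Gaussian with the sample standard deviation, which for the Epanechnikow (Epanechnikov) kernel yields h = (40√π)^(1/5) · σ · n^(-1/5), roughly 2.345 · σ · n^(-1/5). The function name and the sample history below are hypothetical.

```python
import math
import statistics


def epanechnikov_reference_bandwidth(history) -> float:
    """Normal-reference bandwidth for the Epanechnikov kernel.

    Assumption (not from the patent): the unknown density is replaced by
    a Gaussian with the sample standard deviation of `history`.
    """
    n = len(history)
    if n < 2:
        raise ValueError("at least two historical values are required")
    sigma = statistics.stdev(history)
    if sigma == 0:
        raise ValueError("historical values must not all be identical")
    constant = (40.0 * math.sqrt(math.pi)) ** 0.2  # about 2.345
    return constant * sigma * n ** -0.2


# Hypothetical history of daily data-redundancy values.
h = epanechnikov_reference_bandwidth([0.10, 0.12, 0.15, 0.11, 0.14])
```

A data-driven alternative such as cross-validation avoids the Gaussian assumption at a higher computational cost; which of these the patent's "minimum mean square error" step corresponds to is not specified in the text.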
1022. Analyze the evaluation item according to the bandwidth parameter, the redundant data and the Epanechnikow kernel function.
The bandwidth parameter calculated in 1021, the redundant data obtained in 101 and the Epanechnikow kernel function are substituted into formula (4) to analyze the evaluation item including the redundant data.
103. Evaluate the data platform according to the analyzed evaluation item.
The data platform is predicted and evaluated according to the analyzed evaluation item, for example whether, given the current state of the redundant data, the data platform is trending in a good or a bad direction.
Analyzing SQL sentences related to data entities in a data platform to obtain redundant data; analyzing an evaluation item including redundant data according to an Epanechnikow kernel function; and finally evaluating the data platform. The data platform can be evaluated in real time according to the evaluation items of the redundant data, namely, the development trend of the data platform can be evaluated by utilizing the technical scheme of the invention. Therefore, the related settings of the data platform can be conveniently adjusted in time subsequently, namely, the generation of redundant data is reduced, and further, the working efficiency of the data platform is ensured.
Further, on the basis of the above-described embodiments, the evaluation item may further include one or more of space usage data, system load data, storage specification data, degree of standardization data, data usage data, or heat evaluation data. That is, the evaluation item may further include one or more of the above categories on the basis of including the redundant data.
For the different categories, the bandwidth parameter corresponding to each category is first obtained by minimizing the mean square error over that category's historical data. That is, different categories correspond to different bandwidth parameters. For example, the space usage data corresponds to a first bandwidth parameter, and the storage specification data corresponds to a second bandwidth parameter.
And analyzing the evaluation items according to the bandwidth parameters and the category data corresponding to the categories and the Epanechnikow kernel function to obtain the analyzed evaluation items corresponding to the categories. And evaluating the data platform according to the analyzed evaluation items corresponding to the categories and the weights corresponding to the categories, wherein the weights occupied by the different categories are different.
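The weighted combination of categories can be sketched as a normalized weighted sum. The patent states only that each category has its own weight; the scoring scale, the category keys and the weight values below are hypothetical and serve purely as an illustration.

```python
def platform_score(category_scores, weights) -> float:
    """Weighted average of per-category evaluation results.

    category_scores -- dict: category name -> analyzed evaluation result
    weights         -- dict: category name -> weight (hypothetical values)
    """
    missing = set(category_scores) - set(weights)
    if missing:
        raise KeyError(f"no weight defined for: {sorted(missing)}")
    total_weight = sum(weights[c] for c in category_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(category_scores[c] * weights[c] for c in category_scores)
    return weighted / total_weight


# Hypothetical per-category results on a 0-to-1 scale and their weights.
scores = {"redundancy": 0.8, "space_usage": 0.6, "system_load": 0.7}
weights = {"redundancy": 0.5, "space_usage": 0.3, "system_load": 0.2}
overall = platform_score(scores, weights)
```

Dividing by the total weight keeps the result on the same scale as the inputs even when only a subset of the categories is evaluated.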
Fig. 4 is a schematic structural diagram of an apparatus for evaluating a data platform, which corresponds to the method in the first embodiment. The apparatus specifically comprises: a parsing module 401, an analysis module 402 and an evaluation module 403.
The parsing module 401 is configured to obtain redundant data from a structured query language SQL statement related to a data entity in the data platform.
SQL is a database query and programming language for accessing data, querying, updating, and managing relational database systems. And through the analysis of the task log, SQL sentences related to each data entity in the data platform are obtained, and then redundant data is obtained.
An analysis module 402, configured to analyze an evaluation item including redundant data according to an Epanechnikow kernel function;
and an evaluation module 403, configured to evaluate the data platform according to the analyzed evaluation item.
Specifically, the parsing module 401 is further configured to parse, by using an edit distance algorithm, SQL statements related to data entities in the data platform to obtain redundant data.
Specifically, the parsing module 401 is further configured to parse the SQL statement to obtain a data processing path and a data source of each model table; combining and splicing the data structure corresponding to the data source and the processing path in a character mode to form a processing characteristic character string of the model; and comparing the processing characteristic character strings of different models pairwise by using an edit distance algorithm to obtain redundant data. The detailed process can be seen in step 101.
Specifically, the analysis module 402 is further configured to minimize a mean square error according to the historical redundant data to obtain a bandwidth parameter; the evaluation terms are analyzed in terms of bandwidth parameters, redundant data and Epanechnikow kernel functions.
The future development trend of the relevant data can be analyzed with a kernel density estimation algorithm. That is, by analyzing the evaluation item including the redundant data according to the Epanechnikow kernel function, it can be determined whether the evaluated result of the evaluation item will develop in a good or a bad direction in the future. The data platform is then evaluated according to this trend.
In addition, the evaluation item further includes, on the basis of including the redundant data: one or more categories of space usage data, system load data, storage specification data, degree of standardization data, data usage data, or heat assessment data.
Specifically, the analysis module 402 is further configured to, for different categories, minimize a mean square error according to historical category data to obtain bandwidth parameters corresponding to the categories; and analyzing the evaluation item according to the bandwidth parameter corresponding to the category, the category data and the Epanechnikow kernel function.
Specifically, the evaluation module 403 is further configured to evaluate the data platform according to the analyzed evaluation items corresponding to the categories and the weights corresponding to the categories.
The technical effect of the device for evaluating the data platform in the second embodiment is the same as that of the method in the first embodiment, and is not described herein again.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (6)

1. A method of evaluating a data platform, the method comprising:
analyzing a Structured Query Language (SQL) statement related to a data entity in a data platform to obtain redundant data;
analyzing an evaluation item including redundant data according to an Epanechnikow kernel function;
evaluating the data platform according to the analyzed evaluation items;
the data platform is based on signaling data; the analyzing SQL statements related to the data entities in the data platform to obtain redundant data comprises the following steps:
analyzing SQL sentences related to data entities in the data platform by using an edit distance algorithm to obtain redundant data; the method for analyzing SQL sentences related to data entities in the data platform by using the edit distance algorithm to obtain redundant data comprises the following steps:
analyzing the SQL statement to obtain a data processing path and a data source of each model table;
combining and splicing the data structure corresponding to the data source and the data processing path in a character mode to form a processing characteristic character string of the model table;
and comparing the processing characteristic character strings of different model tables pairwise by using an edit distance algorithm to obtain redundant data.
2. The method of evaluating a data platform according to claim 1, wherein analyzing the evaluation terms including redundant data according to an Epanechnikow kernel function comprises:
obtaining bandwidth parameters according to the historical redundant data minimum mean square error;
the evaluation terms are analyzed in terms of bandwidth parameters, redundant data and Epanechnikow kernel functions.
3. The method of evaluating a data platform of claim 1, wherein the evaluation term further comprises:
one or more categories of space usage data, system load data, storage specification data, degree of standardization data, data usage data, or heat assessment data;
analyzing an evaluation term comprising redundant data according to an Epanechnikow kernel function, comprising:
for different categories, obtaining bandwidth parameters corresponding to the categories according to the historical category data minimum mean square error;
analyzing the evaluation item according to the bandwidth parameter corresponding to the category, the category data and the Epanechnikow kernel function;
the evaluating the data platform according to the post-analysis evaluation item includes:
and evaluating the data platform according to the evaluation items corresponding to the categories after the analysis and the weights corresponding to the categories.
4. An apparatus for evaluating a data platform, the apparatus comprising:
the analysis module is used for acquiring redundant data from Structured Query Language (SQL) statements related to data entities in the data platform;
an analysis module for analyzing an evaluation item including the redundant data by using an Epanechnikov kernel function;
the evaluation module is used for evaluating the data platform according to the analyzed evaluation item;
wherein the data platform is based on signaling data; the analysis module is further used for analyzing the SQL statements related to the data entities in the data platform by using an edit distance algorithm to obtain the redundant data; and the analysis module is further used for: parsing the SQL statements to obtain a data processing path and a data source of each model table; concatenating, as characters, the data structure corresponding to the data source and the data processing path to form a processing characteristic character string of the model table; and comparing the processing characteristic character strings of different model tables pairwise by using the edit distance algorithm to obtain the redundant data.
5. The apparatus for evaluating a data platform according to claim 4, wherein the analysis module is further configured to obtain a bandwidth parameter by minimizing the mean square error over historical redundant data, and to analyze the evaluation item according to the bandwidth parameter, the redundant data, and the Epanechnikov kernel function.
6. The apparatus for evaluating a data platform according to claim 4, wherein the evaluation item further comprises:
one or more categories of space usage data, system load data, storage specification data, degree-of-standardization data, data usage data, or heat assessment data;
the analysis module is further configured to obtain, for each category, a bandwidth parameter corresponding to the category by minimizing the mean square error over historical data of that category, and to analyze the evaluation item according to the bandwidth parameter corresponding to the category, the data of the category, and the Epanechnikov kernel function; and
the evaluation module is further configured to evaluate the data platform according to the analyzed evaluation items corresponding to the categories and the weights corresponding to the categories.
CN201611259558.7A 2016-12-30 2016-12-30 Method and device for evaluating data platform Active CN108268489B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611259558.7A CN108268489B (en) 2016-12-30 2016-12-30 Method and device for evaluating data platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611259558.7A CN108268489B (en) 2016-12-30 2016-12-30 Method and device for evaluating data platform

Publications (2)

Publication Number Publication Date
CN108268489A (en) 2018-07-10
CN108268489B (en) 2020-12-01

Family

ID=62753649

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611259558.7A Active CN108268489B (en) 2016-12-30 2016-12-30 Method and device for evaluating data platform

Country Status (1)

Country Link
CN (1) CN108268489B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006191512A (en) * 2005-01-05 2006-07-20 Fujio Morita Method of reproducing and searching data from communication information record and system using the same
CN102735966A (en) * 2012-06-12 2012-10-17 燕山大学 Power transmission line evaluation and diagnosis system and power transmission line evaluation and diagnosis method
CN103679296A (en) * 2013-12-24 2014-03-26 云南电力调度控制中心 Grid security risk assessment method and model based on situation awareness
CN104951763A (en) * 2015-06-16 2015-09-30 北京四方继保自动化股份有限公司 Power generator set subsynchronous risk evaluating method based on wave recording big data abnormal detection

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
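Transformer condition assessment method considering parameter redundancy; Chang An et al.; 《电气应用》 (Electrotechnical Application); 2015-12-31; pp. 840-845 *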

Also Published As

Publication number Publication date
CN108268489A (en) 2018-07-10

Similar Documents

Publication Publication Date Title
US10671812B2 (en) Text classification using automatically generated seed data
US9317591B2 (en) Ranking search results based on word weight
US20080255760A1 (en) Forecasting system
JP2021533450A (en) Identification and application of hyperparameters for machine learning
US10956504B2 (en) Graph database query classification based on previous queries stored in repository
US8473486B2 (en) Training parsers to approximately optimize NDCG
US8954910B1 (en) Device mismatch contribution computation with nonlinear effects
CN110287332B (en) Method and device for selecting simulation model in cloud environment
CN108710662B (en) Language conversion method and device, storage medium, data query system and method
US9400826B2 (en) Method and system for aggregate content modeling
US20220207032A1 (en) Automated linear clustering recommendation for database zone maps
EP2713288B1 (en) Foreign key identification in database management systems
US20030208285A1 (en) Method for comparing solid models
US20070219646A1 (en) Device performance approximation
US11442947B2 (en) Issues recommendations using machine learning
US20040210335A1 (en) Generating a sampling plan for testing generated content
CN108268489B (en) Method and device for evaluating data platform
US20230153286A1 (en) Method and system for hybrid query based on cloud analysis scene, and storage medium
CN114861800A (en) Model training method, probability determination method, device, equipment, medium and product
CN114371950A (en) Root cause positioning method and device for application service abnormity
Chakrapani et al. Predicting performance analysis of system configurations to contrast feature selection methods
CN111240652A (en) Data processing method and device, computer storage medium and electronic equipment
CN116860183B (en) Data storage method, electronic equipment and storage medium
CN117574877B (en) Session text matching method and device, storage medium and equipment
US20240195864A1 (en) Correlations between workload characteristics and elapsed times

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant