CN116955119B - System performance test method based on data analysis - Google Patents


Info

Publication number: CN116955119B
Authority: CN (China)
Application number: CN202311211922.2A
Legal status: Active
Other languages: Chinese (zh)
Other versions: CN116955119A
Inventor: 庞秋宏
Current assignee: Tianjin Tongde And Light Polytron Technologies Inc
Original assignee: Tianjin Tongde And Light Polytron Technologies Inc

Application filed by Tianjin Tongde And Light Polytron Technologies Inc
Priority to CN202311211922.2A
Publication of CN116955119A
Application granted
Publication of CN116955119B

Classifications

    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to the technical field of digital data processing and provides a system performance testing method based on data analysis, comprising: collecting performance-test-related data; obtaining an extremum measure according to the distribution range of each dimension's performance data; obtaining a performance positive correlation index according to the statistical histogram and the distribution curve of each dimension's performance data; obtaining a forward processing result according to the performance positive correlation index; obtaining an information content ratio according to the proportion of the main information of the performance forward data carried in the direction of each feature vector; and obtaining a principal component analysis result according to the information content ratio and the cumulative contribution rate of the feature vectors, which completes the dimensionality reduction of the performance data. The invention uses the ambiguity of each dimension's performance data features to enhance the interpretability of each principal component during dimensionality reduction, reduces the error that non-Gaussian forward performance data introduce into the cumulative contribution rate, and improves both the dimensionality-reduction accuracy of the performance data and the speed of performance testing.

Description

System performance test method based on data analysis
Technical Field
The invention relates to the technical field of digital data processing, in particular to a system performance test method based on data analysis.
Background
In the digital age, applications keep growing in scale and complexity, and their performance problems are becoming increasingly difficult to handle. Application performance therefore has to be tested and analyzed with performance monitoring technology, which collects and analyzes application performance data to identify the application's real performance quickly and resolve existing performance problems, so that the application runs stably and efficiently.
Front-end and back-end performance tests focus on different things: front-end test indicators mainly include response time, loading speed and network traffic, while back-end test indicators mainly include memory overhead, number of concurrent users, error rate, CPU usage, memory usage and throughput in transactions per second (TPS). Analyzing performance data is therefore a key part of system performance evaluation. At present such data are usually processed with machine learning algorithms for data detection, clustering, dimensionality reduction and the like. Common clustering algorithms such as K-means, DBSCAN and CURE require parameters to be set empirically, and different empirical values strongly affect the clustering result. Among dimensionality reduction algorithms such as PCA, ICA and t-SNE, PCA suffers from poor interpretability and cannot fully retain the information in the data, while ICA has difficulty determining the number of independent sources, among other problems. It is therefore desirable to select a machine learning algorithm better suited to the actual distribution of application performance data and to improve it, so as to raise the accuracy of performance testing.
Disclosure of Invention
The invention provides a system performance test method based on data analysis, aiming to solve the problem that, in the forward-processing step of conventional PCA dimensionality reduction, the mutual influence between performance data of different dimensions makes the principal components poorly interpretable. The specific technical solution is as follows:
One embodiment of the invention provides a system performance test method based on data analysis, comprising the following steps:
collecting performance-test-related data, comprising CPU utilization, number of concurrent users, memory utilization, network traffic, response time and error rate;
obtaining an extremum measure of each dimension's performance data at each moment according to the distribution range of that dimension's performance data; obtaining a statistical histogram of each dimension's extremum measures according to their values over all moments; obtaining a performance positive correlation index according to the statistical histogram and the distribution curve of each dimension's performance data; and obtaining, according to the performance positive correlation index, the performance forward data corresponding to each dimension's performance data at each moment;
obtaining an information content ratio of each feature vector according to the proportion of the main information of the performance forward data carried in the direction of that feature vector; obtaining an information contribution ratio according to the information content ratio and the variance contribution ratio of each feature vector; and obtaining dimension-reduced data of the performance forward data by comparing the information contribution ratio with a threshold;
and obtaining a classification result of the dimension-reduced data with a data clustering algorithm, and obtaining the system performance test result from a visualization of each class of dimension-reduced data.
Preferably, obtaining the extremum measure of each dimension's performance data at each moment according to the distribution range of that dimension's performance data comprises:
for any dimension, taking the ratio of the performance data acquired in that dimension at each moment to the sum of the performance data acquired in that dimension over all moments as the performance data parameter of the dimension at that moment, and taking the maximum and minimum of the dimension's performance data parameters as its dimension peak value and dimension valley value, respectively;
taking the difference between the performance data parameter at each moment and the dimension peak value as a first difference, and the difference between the dimension peak value and the dimension valley value as a second difference;
the extremum measure at each moment is formed from the first difference and the second difference, being directly proportional to the first difference and inversely proportional to the second difference.
Preferably, obtaining the performance positive correlation index according to the statistical histogram and the distribution curve of each dimension's performance data comprises:
for any dimension, obtaining the data feature ambiguity of the dimension according to the distribution of its extremum measures over all moments;
obtaining an interaction index of each dimension according to the distance between the distribution curve of that dimension's performance data and the distribution curves of the remaining dimensions' performance data;
the performance positive correlation index of each dimension is formed from the dimension's data feature ambiguity and interaction index, being directly proportional to the data feature ambiguity and inversely proportional to the interaction index.
Preferably, obtaining the data feature ambiguity of each dimension according to the distribution of its extremum measures over all moments comprises:
for any dimension, obtaining the variance of the extremum measures over all moments and the number of distinct extremum measure values in the dimension's statistical histogram, and taking the product of the variance and the number of distinct values as a first composition factor;
for each extremum measure value in the dimension's statistical histogram, obtaining the variance of the time intervals between any two of the moments at which that value occurs, and taking the sum of these variances over all distinct values as a second composition factor;
the data feature ambiguity of each dimension is formed from the first and second composition factors, being inversely proportional to the first composition factor and directly proportional to the second composition factor.
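The ambiguity computation above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it assumes the ambiguity is the plain ratio of the second composition factor to the first, that histogram bins are the distinct extremum values after rounding to `decimals` places, and that "time interval between any two moments" means every pair of moments in a bin. The function name `feature_ambiguity` and the choice to raise on a constant series are hypothetical.

```python
import numpy as np
from itertools import combinations


def feature_ambiguity(E_row, decimals: int = 6) -> float:
    """Data feature ambiguity F_i of one dimension from its extremum measures over time."""
    # Rounding turns real-valued measures into histogram bins (hypothetical choice)
    E_row = np.round(np.asarray(E_row, dtype=float), decimals)
    values = np.unique(E_row)
    n = len(values)                          # number of distinct values = number of bins
    first_factor = n * float(E_row.var())    # variance of the measures times bin count
    if first_factor == 0.0:
        raise ValueError("constant extremum measures: ambiguity undefined")
    second_factor = 0.0
    for v in values:
        moments = np.flatnonzero(E_row == v)
        # intervals between every pair of moments at which this value occurs
        gaps = [int(b - a) for a, b in combinations(moments, 2)]
        if len(gaps) > 1:                    # fewer than two gaps contribute zero variance
            second_factor += float(np.var(gaps))
    return second_factor / first_factor
```

For the series `[0, 1, 0, 1, 0]`, the value 0 occurs at moments 0, 2 and 4 (pairwise gaps 2, 4, 2), so only that bin contributes to the second factor.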
Preferably, obtaining the interaction index of each dimension according to the distance between the distribution curve of that dimension's performance data and the distribution curves of the remaining dimensions' performance data comprises:
for any dimension, obtaining a trend curve and a detrended curve of the dimension's performance data with a data detrending algorithm;
taking the distance between the detrended curve of the dimension and the detrended curve of each remaining dimension as a first distance value, taking the distance between the trend curve of the dimension and the trend curve of each remaining dimension as a second distance value, and taking the sum, over all remaining dimensions, of the ratio of the first distance value to the second distance value as the first accumulated value of the dimension;
the interaction index of each dimension is formed from its first accumulated value and the number of dimensions, being directly proportional to the first accumulated value and inversely proportional to the number of dimensions.
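A sketch of the interaction index follows, under stated assumptions. The source obtains trend and detrended curves with DFA; here a centred moving average stands in as the trend (a swapped-in technique, not DFA), and the detrended curve is the residual. The distance is classic dynamic time warping with absolute-difference cost, and `eps` guards against identical trend curves. The names `trend_and_detrended`, `interaction_indices`, `window` and `eps` are hypothetical.

```python
import numpy as np


def dtw_distance(x, y) -> float:
    """Classic O(len(x) * len(y)) dynamic time warping distance, absolute-difference cost."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    D = np.full((len(x) + 1, len(y) + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[-1, -1])


def trend_and_detrended(series, window: int = 5):
    """Hypothetical stand-in for DFA: moving-average trend and its residual."""
    s = np.asarray(series, dtype=float)
    kernel = np.ones(window) / window
    # edge padding keeps the trend the same length as the series
    padded = np.pad(s, (window // 2, window - 1 - window // 2), mode="edge")
    trend = np.convolve(padded, kernel, mode="valid")
    return trend, s - trend


def interaction_indices(G, window: int = 5, eps: float = 1e-12) -> np.ndarray:
    """I_i = (1/m) * sum_{j != i} DTW(R_i, R_j) / DTW(T_i, T_j) for each row of G."""
    G = np.asarray(G, dtype=float)
    m = G.shape[0]
    curves = [trend_and_detrended(row, window) for row in G]  # (trend, detrended) pairs
    I = np.zeros(m)
    for i in range(m):
        acc = 0.0  # first accumulated value
        for j in range(m):
            if j == i:
                continue
            first_distance = dtw_distance(curves[i][1], curves[j][1])   # detrended curves
            second_distance = dtw_distance(curves[i][0], curves[j][0])  # trend curves
            acc += first_distance / (second_distance + eps)
        I[i] = acc / m
    return I
```

The O(n²) DTW is fine for a few hundred samples per dimension; a windowed DTW would be the usual choice for longer series.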
Preferably, obtaining the performance forward data of each dimension at each moment according to the performance positive correlation index comprises:
for any dimension, obtaining the difference between the performance positive correlation index of the dimension's performance data at each moment and the minimum performance positive correlation index in the dimension, taking the negative of this difference as the exponent of the natural constant e, and taking the normalized result as the forward coefficient of the dimension at that moment;
taking the difference between the dimension's performance data parameter at each moment and the mean of all its performance data parameters as the numerator, the variance of all its performance data parameters as the denominator, and their ratio as the second product factor at that moment;
the performance forward data of the dimension at each moment are formed from the forward coefficient and the second product factor at that moment, being directly proportional to both.
Preferably, obtaining the information content ratio of each feature vector according to the proportion of the main information of the performance forward data carried in the direction of that feature vector comprises:
for any feature vector, obtaining its local density variation from the local densities of the data points on it in the feature space, and taking the difference between its local density variation and the mean local density variation of all feature vectors as the third composition factor of that feature vector;
taking the coefficient of variation of the sequence formed by the local densities of all data points on each feature vector as the fourth composition factor of that feature vector;
the information content ratio of each feature vector is formed from its third and fourth composition factors, being directly proportional to each of them.
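The combination of the third and fourth composition factors can be sketched as below. The source states only that the ratio is proportional to both factors; this sketch assumes their plain product. It takes precomputed local density variations (one per feature vector) and, for each vector, a sequence of per-point local densities; the function name is hypothetical.

```python
import numpy as np


def information_content_ratios(density_variation, density_sequences) -> np.ndarray:
    """Q_k = (D_k - mean(D)) * CV_k, one value per feature vector (product form assumed)."""
    D = np.asarray(density_variation, dtype=float)
    third = D - D.mean()  # third composition factor
    # fourth composition factor: coefficient of variation of each vector's local densities
    fourth = np.array([np.std(s) / np.mean(s) for s in density_sequences], dtype=float)
    return third * fourth
```

Note that the third factor is negative for vectors whose density variation is below average, so the resulting ratios are not guaranteed to be positive.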
Preferably, obtaining the local density variation of each feature vector from the local densities of the data points on it in the feature space comprises:
for any feature vector, obtaining the positions of all data points on it and, for each data point, taking the number of data points within a sphere centred on that point, with a preset parameter as its radius, as the numerator;
taking the intersection point of all feature vectors forming the feature space as the origin, taking the total number of data points within a sphere centred on the origin with the same preset radius as the denominator, and taking the ratio of the numerator to the denominator as the local density of each data point;
summing the local densities of the data points on each feature vector over all radius values as the second accumulated value of that feature vector;
the local density variation of each feature vector is formed from its second accumulated value and the maximum radius, being directly proportional to the second accumulated value and inversely proportional to the maximum radius.
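The local density variation above can be sketched as follows. The reading here is an assumption: the "data points on a feature vector" are taken to be the 1-D projections (scores) of the data onto that vector, so both spheres reduce to intervals on the axis, and the variation is the second accumulated value divided by the largest radius. The function name and the choice of radii are hypothetical.

```python
import numpy as np


def local_density_variation(scores, radii):
    """Return (variation, densities[r, point]) for one feature vector's 1-D scores."""
    s = np.asarray(scores, dtype=float)
    radii = np.asarray(radii, dtype=float)
    total = 0.0  # second accumulated value
    per_radius = []
    for r in radii:
        # numerator: points within distance r of each point (the point itself included)
        near_point = (np.abs(s[:, None] - s[None, :]) <= r).sum(axis=1)
        # denominator: points within distance r of the origin
        near_origin = int((np.abs(s) <= r).sum())
        dens = near_point / near_origin if near_origin > 0 else np.zeros(len(s))
        per_radius.append(dens)
        total += float(dens.sum())
    return total / radii.max(), np.array(per_radius)
```

Each row of the returned density matrix is one radius, so a row can serve as the per-point density sequence used for the coefficient of variation.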
Preferably, obtaining the information contribution ratio according to the information content ratio and the variance contribution ratio of each feature vector comprises:
for any feature vector, taking the product of its eigenvalue and its information content ratio as the numerator, the sum of these products over all feature vectors as the denominator, and their ratio as the first scale factor of that feature vector;
and taking the sum of the first scale factors of a preset number of feature vectors as the information contribution ratio of those feature vectors.
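A sketch of the first scale factors and of choosing how many feature vectors to keep is given below. It assumes feature vectors are taken in descending eigenvalue order (the usual PCA convention) and that dimensions are kept until the information contribution ratio first reaches the threshold; the default `threshold=0.85` and the function names are hypothetical.

```python
import numpy as np


def first_scale_factors(eigvals, content_ratios) -> np.ndarray:
    """s_k = lambda_k * Q_k / sum_j(lambda_j * Q_j)."""
    weighted = np.asarray(eigvals, dtype=float) * np.asarray(content_ratios, dtype=float)
    return weighted / weighted.sum()


def select_components(eigvals, content_ratios, threshold: float = 0.85):
    """Keep leading feature vectors (by eigenvalue) until the ratio reaches the threshold."""
    scale = first_scale_factors(eigvals, content_ratios)
    order = np.argsort(-np.asarray(eigvals, dtype=float), kind="stable")
    cumulative = np.cumsum(scale[order])  # information contribution ratio of the first K vectors
    K = min(int(np.searchsorted(cumulative, threshold)) + 1, len(order))
    return order[:K], float(cumulative[K - 1])
```

With all information content ratios equal to 1 this reduces to the ordinary cumulative variance contribution rate, which makes it easy to compare against conventional PCA.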
Preferably, obtaining the system performance test result from the visualization of each class of dimension-reduced data comprises:
obtaining a classification result of the dimension-reduced data with a data clustering algorithm, obtaining a visualization of each class of dimension-reduced data with a data visualization technique, obtaining the result of each performance test item from the visualization, and taking the results of all test items together as the performance test result of the whole system.
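The source does not name the clustering algorithm; since the patent is classified under K-means clustering, the sketch below uses plain Lloyd's K-means on the dimension-reduced data as an assumed choice. The function name, random initialisation and iteration cap are hypothetical.

```python
import numpy as np


def kmeans(X, k: int, n_iter: int = 100, seed: int = 0):
    """Plain Lloyd's K-means; returns (labels, centers)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # initialise centres with k distinct data points
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assign every point to its nearest centre
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each centre to the mean of its points (keep it if the cluster is empty)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

Each resulting class can then be plotted separately (for example, the first two reduced dimensions as a scatter plot) to judge the corresponding test item.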
The beneficial effects of the invention are as follows: the invention constructs the extremum measure and the performance positive correlation index by analyzing the distribution characteristics of each dimension's performance data, and the performance positive correlation index accounts for the ambiguity of each dimension's data features. Combined with the variance contribution rate of each feature vector, the amount of performance data information contained in each feature vector can be estimated comprehensively, which avoids errors in the cumulative contribution rate caused by non-Gaussian forward performance data, improves the dimensionality-reduction accuracy of the performance data, and speeds up subsequent performance testing.
Drawings
To illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a flow chart of a system performance testing method based on data analysis according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art from these embodiments without inventive effort fall within the scope of protection of the invention.
Referring to Fig. 1, which shows a flow chart of the system performance testing method based on data analysis according to an embodiment of the invention, the method comprises the following steps:
step S001, acquiring initial relevant data of a system performance test, and acquiring a performance data matrix based on the initial relevant data.
This embodiment takes a video playback system as the system under test. The initial data can be collected through an agent or an API: the agent is usually installed on the host running the application, and the API is embedded in the application's code. Installing the agent and embedding the API are known techniques and are not described further. The initial data comprise CPU usage, number of concurrent users, memory usage, network traffic, response time and error rate.
In this embodiment, data are acquired every 1 s and M acquisitions are taken, with M set to an empirical value of 600. To compensate for data lost in transmission because of network fluctuation and similar causes, the acquired data are processed with a k-nearest-neighbour (k-NN) imputation algorithm, which is a known technique and is not described further; the processed data are recorded as the performance data. A multidimensional performance data matrix G is then constructed from the performance data of all dimensions, with each dimension's performance data forming one row of G in ascending time order.
This yields the multidimensional performance data matrix for the system performance test.
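The preprocessing in step S001 can be sketched as below. It assumes missing samples arrive as NaN and reads "k-nearest neighbour" as: for a moment with gaps, find the k moments closest to it over the dimensions both observed, and average their values. The helper names `knn_impute_columns` and `build_performance_matrix` and the default k=3 are hypothetical, not taken from the source.

```python
import numpy as np


def knn_impute_columns(G: np.ndarray, k: int = 3) -> np.ndarray:
    """Fill NaN gaps in a (dimensions x moments) matrix by k-NN over moments."""
    G = np.asarray(G, dtype=float)
    filled = G.copy()
    M = G.shape[1]
    for a in range(M):
        missing = np.isnan(G[:, a])
        if not missing.any():
            continue
        # RMS distance from moment a to every other moment, over dimensions both observed
        dists = np.full(M, np.inf)
        for t in range(M):
            shared = ~np.isnan(G[:, a]) & ~np.isnan(G[:, t])
            if t != a and shared.any():
                dists[t] = np.sqrt(np.mean((G[shared, a] - G[shared, t]) ** 2))
        order = np.argsort(dists, kind="stable")
        for i in np.flatnonzero(missing):
            # the k closest moments that actually observed dimension i
            donors = [t for t in order if np.isfinite(dists[t]) and not np.isnan(G[i, t])][:k]
            if donors:
                filled[i, a] = G[i, donors].mean()
    return filled


def build_performance_matrix(series: dict, k: int = 3) -> np.ndarray:
    """Stack each dimension's time-ordered samples as one row of G, then impute gaps."""
    G = np.vstack([np.asarray(v, dtype=float) for v in series.values()])
    return knn_impute_columns(G, k)
```

For example, `build_performance_matrix({"cpu": [1, 2, float("nan"), 4], "users": [10, 20, 30, 40]}, k=2)` fills the gap with the mean of the CPU values at the two moments whose user counts are closest.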
Step S002, a performance positive correlation index is constructed based on the distribution characteristics of the performance data of each dimension, and the performance forward data of each dimension at each moment is obtained based on the performance positive correlation index.
When system performance is tested with multidimensional data, each dimension's data index bears some relation to system performance, mainly a positive correlation, a negative correlation or a stable relation; the corresponding indices are called positive, negative and neutral indices, respectively. When a larger index value, or a value closer to some threshold, means better system performance, the index is positively correlated with system performance and is regarded as a forward index. For example, if the system responds faster as the number of concurrent users grows over a continuous period, the video playback system has a stronger traffic-bearing capacity, and response time is regarded as a forward index.
When performance data are reduced in dimension with principal component analysis (PCA), forward processing of the performance data is an essential early step. The invention therefore performs forward processing on each dimension's performance data according to its correlation with the video playback system. In addition, to reflect accurately the positive or negative correlation between single-dimension performance data and the video playback system, the mutual influence between data of different dimensions must be avoided; for example, the higher the CPU occupancy, the slower the system responds, so the two are negatively correlated.
For the performance data of any dimension, the ratio of each element in that dimension's row of matrix G to the sum of all elements in the row is computed; the maximum of these ratios is taken as the dimension peak value and the minimum as the dimension valley value, and this is repeated for every dimension. The extremum measure of each dimension's performance data at each moment is then obtained from the performance data at that moment and the corresponding dimension peak and valley values. The extremum measure $E_{i,a}$ of the i-th dimension's performance data at moment a is
$$E_{i,a}=\frac{p_{i,a}-P_i^{\max}}{P_i^{\max}-P_i^{\min}},\qquad p_{i,a}=\frac{g_{i,a}}{\sum_{t=1}^{M}g_{i,t}}$$
where $g_{i,a}$ is the element of G in row i and column a, $p_{i,a}$ is the performance data parameter of the i-th dimension at moment a, and $P_i^{\max}$ and $P_i^{\min}$ are the dimension peak value and dimension valley value of the i-th dimension, respectively. The larger $E_{i,a}$, the smaller the gap between $p_{i,a}$ and the dimension peak value of the i-th dimension, and the better $p_{i,a}$ expresses the characteristics of that dimension.
By relating the performance data parameter of the i-th dimension at moment a to the dimension's peak and valley values, the measure captures how well that parameter expresses the dimension's characteristics: the closer $p_{i,a}$ is to the dimension peak value, the larger the first difference $p_{i,a}-P_i^{\max}$; the more stable the i-th dimension, the smaller the second difference $P_i^{\max}-P_i^{\min}$. The extremum measure $E_{i,a}$ is directly proportional to the first difference and inversely proportional to the second.
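The extremum measure defined above can be computed for the whole matrix at once, as in the following sketch. It assumes G holds non-negative samples with a non-zero row sum; mapping a constant row (zero second difference) to 0 is an assumption the source does not address, and the function name is hypothetical.

```python
import numpy as np


def extremum_measures(G: np.ndarray) -> tuple:
    """Return (P, E): performance data parameters and extremum measures, dimensions x moments."""
    G = np.asarray(G, dtype=float)
    # p_{i,a}: each sample divided by the sum of its dimension's row
    P = G / G.sum(axis=1, keepdims=True)
    peak = P.max(axis=1, keepdims=True)    # dimension peak value
    valley = P.min(axis=1, keepdims=True)  # dimension valley value
    first = P - peak                       # first difference, always <= 0
    second = np.broadcast_to(peak - valley, first.shape)  # second difference, >= 0
    # E = first / second; a constant row (second == 0) is mapped to 0 by assumption
    E = np.divide(first, second, out=np.zeros_like(first), where=second > 0)
    return P, E
```

For a single dimension sampled as 1, 2, 3, 4, the parameters are 0.1 to 0.4 and the extremum measures run from -1 at the valley to 0 at the peak.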
Next, the extremum measure of every dimension at every moment is obtained. If the same extremum measure value of a dimension occurs at several moments, the dimension is poorly distinguishable at those moments; one value then has to stand for several moments, so it expresses the dimension's data characteristics less well. For each dimension, a statistical histogram is built with the extremum measure values over all moments on the horizontal axis and the number of moments at which each value occurs on the vertical axis; the histogram of the i-th dimension's extremum measures is denoted $H_i$. In addition, the detrended fluctuation analysis (DFA) algorithm is used to obtain a trend curve and a detrended curve for each dimension's performance data; DFA is a known technique and is not described further. The trend curve and detrended curve of the i-th dimension are denoted $T_i$ and $R_i$, respectively.
Based on the above analysis, a performance positive correlation index V is constructed to characterize how suitable each dimension's data is for constructing a forward performance index of the system. The performance positive correlation index $V_i$ of the i-th dimension is
$$V_i=\frac{F_i}{I_i},\qquad F_i=\frac{\sum_{b=1}^{n}\sigma_{i,b}^{2}}{n\,\sigma_{E,i}^{2}},\qquad I_i=\frac{1}{m}\sum_{j=1,\,j\neq i}^{m}\frac{\mathrm{DTW}\left(R_i,R_j\right)}{\mathrm{DTW}\left(T_i,T_j\right)}$$
where $F_i$ is the data feature ambiguity of the i-th dimension, $\sigma_{E,i}^{2}$ is the variance of all extremum measures of the i-th dimension, n is the number of bins in the statistical histogram $H_i$ of the i-th dimension's extremum measures, and $\sigma_{i,b}^{2}$ is the variance of the time intervals between any two of the moments that fall in the b-th bin. The smaller $F_i$, the more blurred the data features of the i-th dimension.
$I_i$ is the interaction index of the i-th dimension, m is the number of dimensions of the performance data, $R_i$ and $R_j$ are the detrended curves of the i-th and j-th dimensions, $T_i$ and $T_j$ are their trend curves, and $\mathrm{DTW}(R_i,R_j)$ and $\mathrm{DTW}(T_i,T_j)$ are the dynamic time warping (DTW) distances between curves $R_i$ and $R_j$ and between curves $T_i$ and $T_j$, respectively. DTW distance is a known technique and is not described further. The larger $I_i$, the more the distribution of the i-th dimension's performance data is affected by the remaining dimensions.
The performance positive correlation index reflects how suitable each dimension's data is for constructing a forward index of system performance. The fewer distinct extremum measure values the i-th dimension has, the more similar its performance data are across moments, so the variance of its extremum measures $\sigma_{E,i}^{2}$ and the first composition factor $n\,\sigma_{E,i}^{2}$ are smaller. The more randomly a given extremum measure of the i-th dimension is scattered over several moments, the larger the time-interval variances $\sigma_{i,b}^{2}$ and hence the second composition factor $\sum_{b=1}^{n}\sigma_{i,b}^{2}$. When the distribution of the i-th dimension's performance data is strongly influenced by the j-th dimension, the detrended curves $R_i$ and $R_j$ differ more, so the first distance value $\mathrm{DTW}(R_i,R_j)$ is larger; the more similar the timing trends of the two dimensions, the smaller the second distance value $\mathrm{DTW}(T_i,T_j)$; both make the first accumulated value, and hence the interaction index $I_i$, larger. That is, the larger $V_i$, the greater the feature ambiguity of the i-th dimension's performance data and the greater the influence of the remaining dimensions, and the less suitable the dimension is for constructing a forward index. Because the index accounts for the ambiguity of each dimension's data features, it helps remove the mutual influence between dimensions in the subsequent PCA forward conversion, enhances the interpretability of the subsequent principal components, and strengthens the direct link between single-dimension data and system performance.
Next, the performance positive correlation index of each dimension is combined with that dimension's performance data to obtain the forward conversion result of each dimension's performance data at each moment. The forward data of the i-th dimension performance data at moment a is calculated as:

$$w_{i,a} = \mathrm{norm}\left(e^{-\left(Q_{i,a} - Q_{\min}\right)}\right), \qquad F_{i,a} = w_{i,a} \cdot \frac{x_{i,a} - \mu_i}{\sigma_i^2}$$

where $w_{i,a}$ is the forward coefficient of the i-th dimension performance data at moment a, norm() is the normalization function, $Q_{i,a}$ is the performance positive correlation index of the i-th dimension performance data at moment a, and $Q_{\min}$ is the minimum of this index over all dimensions. The larger $w_{i,a}$, the less the i-th dimension is affected by the performance data of the remaining dimensions, and the more useful its forward data is for building dimensional data features.

$F_{i,a}$ is the forward data of the i-th dimension performance data at moment a, $x_{i,a}$ is the performance data parameter of the i-th dimension at moment a, $\mu_i$ is the mean of all performance data parameters of the i-th dimension, and $\sigma_i^2$ is the variance of all performance data parameters of the i-th dimension.

In this way the forward data of the i-th dimension performance data at moment a is obtained. The less the i-th dimension is influenced by the other dimensions and the more stably its performance data are distributed, the larger the forward coefficient $w_{i,a}$. The larger the performance data parameter of the i-th dimension at moment a, the larger the second product factor $(x_{i,a} - \mu_i)/\sigma_i^2$. Both make the forward data $F_{i,a}$ larger.
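The forward conversion of claim 2 can be sketched as follows. The patent leaves norm() unspecified, so min-max scaling is assumed here. `q_min` (a hypothetical parameter) defaults to the minimum of the supplied indices; passing a global minimum instead follows the "minimum over all dimensions" reading. Under min-max scaling, the moment with the largest index receives a coefficient of 0.

```python
import numpy as np


def forward_data(x, q, q_min=None):
    """Forward data of one dimension over time (sketch of claim 2).

    x: performance data parameters at each moment; q: the matching
    performance positive correlation indices.
    """
    x = np.asarray(x, dtype=float)
    q = np.asarray(q, dtype=float)
    if q_min is None:
        q_min = q.min()
    raw = np.exp(-(q - q_min))       # e^{-(Q - Q_min)}
    # ASSUMPTION: norm() is min-max scaling; a constant input maps to 1.
    spread = raw.max() - raw.min()
    coeff = (raw - raw.min()) / spread if spread > 0 else np.ones_like(raw)
    var = x.var()
    if var == 0:
        return np.zeros_like(x)      # constant dimension carries no forward signal
    factor = (x - x.mean()) / var    # second product factor
    return coeff * factor
```

With equal indices at every moment the coefficient is 1, so the result reduces to the second product factor alone. For example, `forward_data([1, 2, 3], [0.5, 0.5, 0.5])` gives `[-1.5, 0.0, 1.5]`.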
Thus, the forward processing result of the performance data is obtained.
Step S003: construct an information content ratio from the variation of the data information along each feature vector in the feature space, and obtain a dimension reduction result based on the information content ratio.
After the forward data of each dimension's performance data at each moment are obtained, the traditional PCA algorithm builds a correlation matrix from all the forward data and then selects principal components according to the contribution rates of that matrix's eigenvalues. The selected principal components retain most of the information in the original variables.
For a video playing system, however, user click rates are high and video searches are frequent during peak viewing periods. The number of concurrent users and the CPU utilization in the performance data then climb from small values with large fluctuations, after which the system stays in a stable state for some time. The response time, by contrast, usually shows only a small, short-lived dip before returning to its previous level, and then stays within a narrow range as the number of concurrent users rises or falls. In other words, the forward data obtained in the preceding steps do not fully follow a Gaussian distribution. The invention therefore takes the distribution of the forward data in each dimension into account and decides which components should be principal components according to the data information carried by the forward data at each moment.
Based on this analysis, an information content ratio P is constructed to represent the proportion of main information within the information carried by each feature vector. The information content ratio $P_c$ of the c-th feature vector increases with a third composition factor, $V_c - \bar{V}$, and with a fourth composition factor, $CV_c$. The local density variation is

$$V_c = \frac{1}{L}\sum_{l}\sum_{r=1}^{M}\rho_r(l)$$

with l running over the radius values up to L.

Here $V_c$ is the local density variation of feature vector c, and $\rho_r(l)$ is the local density of the r-th point on feature vector c at radius l. The local density is obtained as follows: count the data points inside the sphere of radius l centred on the r-th point, then divide this count by the total number of data points inside the sphere of radius l centred on the origin. The origin is the intersection point of all feature vectors spanning the feature space. M is the number of points on feature vector c, and L is the maximum radius used in the invention; the radius increases by 2 at each step, and L takes the empirical value 21. The larger $V_c$, the wider the data distribution range along feature vector c.

$P_c$ is the information content ratio of the c-th feature vector, $CV_c$ is the coefficient of variation of the local densities of the M points on feature vector c at radius L, and $\bar{V}$ is the mean local density variation of all feature vectors.
The information content ratio reflects what proportion of the information carried along each feature vector is main information. The wider the data distribution along feature vector c, the more likely the performance information varies strongly in the direction of c. In that case the second accumulated value, the local density variation, and the third composition factor all become larger, meaning that c better reflects how the performance data of the video playing platform change across time periods. The more the information of the forward performance data varies along c, the more the local densities of the points on c differ from one another, and the larger the fourth composition factor. Consequently, the larger the information content ratio of feature vector c, the more data information c is likely to carry in its direction, and the more it should be treated as a principal component of the forward performance data. Because the information content ratio accounts for data density over different ranges of the feature space, it can be combined with the variance contribution rates of the feature vectors to assess how much performance information each feature vector holds. This avoids the error that non-Gaussian forward performance data would otherwise introduce into the cumulative contribution rate.
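The local density variation and the third and fourth composition factors can be sketched as follows. This relies on a strong simplifying assumption: each sample's projection (score) onto feature vector c is treated as a 1-D point, neighbours are counted among the scores on the same axis, and the "origin sphere" count is the number of scores within radius l of zero. Radii 1, 3, ..., 21 are also assumed (step 2 up to L = 21; the starting radius is not stated). The final formula that combines the two factors into $P_c$ is not reproduced in the text and is therefore left out. The function names are illustrative.

```python
import numpy as np


def local_density_variation(scores, radii):
    """Local density variation V_c of one feature vector (sketch of claim 3).

    ASSUMPTION: `scores` are the 1-D projections of all samples onto the
    feature vector; neighbours are counted among these scores only.
    """
    s = np.asarray(scores, dtype=float)
    radii = list(radii)
    densities = {}
    total = 0.0                      # second accumulated value
    for l in radii:
        near_point = (np.abs(s[:, None] - s[None, :]) <= l).sum(axis=1)
        near_origin = max(int((np.abs(s) <= l).sum()), 1)  # guard against 0
        rho = near_point / near_origin
        densities[l] = rho
        total += float(rho.sum())
    return total / max(radii), densities


def composition_factors(scores, radii=range(1, 22, 2)):
    """Third and fourth composition factors for each column (feature vector)."""
    radii = list(radii)
    results = [local_density_variation(scores[:, c], radii) for c in range(scores.shape[1])]
    v = np.array([r[0] for r in results])
    third = v - v.mean()             # V_c minus the mean over all feature vectors
    big_l = max(radii)
    fourth = np.array([r[1][big_l].std() / r[1][big_l].mean() for r in results])
    return third, fourth
```

Each point always counts itself, so the mean density at radius L is positive and the coefficient of variation is well defined.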
Next, with the information content ratio of each feature vector obtained, the principal component analysis result is derived from the information content ratio together with the contribution rate of each feature vector. The information contribution ratio of the first k principal components is calculated as:

$$G_k = \sum_{c=1}^{k}\frac{\lambda_c P_c}{\sum_{s=1}^{n}\lambda_s P_s}$$

where $\lambda_c$ and $P_c$ are the eigenvalue and the information content ratio of feature vector c, respectively, and n is the number of eigenvectors, which are obtained by constructing a correlation matrix from the forward performance data.

The information contribution ratio of the principal components is thus determined jointly by the eigenvalue and the information content ratio of each feature vector c. The more information feature vector c carries, the larger its first scale factor $\lambda_c P_c / \sum_{s=1}^{n}\lambda_s P_s$, and the larger the information contribution ratio $G_k$.
Next, k is increased from 1 up to the number of eigenvectors, and the information contribution ratio is computed for each value. Once the information contribution ratio exceeds a threshold, the first k eigenvectors are taken as the principal components of the forward performance data; the threshold takes the empirical value 0.8. These k principal components serve as the principal component analysis result of the PCA algorithm, which is then used to obtain the dimension reduction result of the forward performance data. PCA is a well-known technique, so the specific procedure is not repeated here.
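The information contribution ratio of claim 4 and the threshold rule can be sketched with NumPy. The helper names are hypothetical. `sorted_eigen` eigendecomposes the correlation matrix of the forward data and sorts the eigenpairs by descending eigenvalue. The information content ratios passed to `information_contribution` must be in that same order. Projecting standardised data onto the selected eigenvectors is the usual correlation-matrix PCA step and is shown only as an illustration.

```python
import numpy as np


def sorted_eigen(forward):
    """Eigenpairs of the correlation matrix of forward data, largest eigenvalue first."""
    corr = np.corrcoef(np.asarray(forward, dtype=float), rowvar=False)
    vals, vecs = np.linalg.eigh(corr)  # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]


def information_contribution(eigvals, info_ratio):
    """G_k for k = 1..n: cumulative sum of the first scale factors (claim 4)."""
    w = np.asarray(eigvals, dtype=float) * np.asarray(info_ratio, dtype=float)
    return np.cumsum(w / w.sum())


def choose_k(g, threshold=0.8):
    """Smallest k whose information contribution ratio exceeds the threshold."""
    above = np.flatnonzero(np.asarray(g) > threshold)
    return int(above[0]) + 1 if above.size else len(g)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    forward = rng.normal(size=(100, 6))
    vals, vecs = sorted_eigen(forward)
    p = np.ones_like(vals)  # placeholder ratios; real values come from the previous step
    k = choose_k(information_contribution(vals, p))
    standardised = (forward - forward.mean(axis=0)) / forward.std(axis=0)
    reduced = standardised @ vecs[:, :k]
    print(k, reduced.shape)
```

With all information content ratios equal to 1, `information_contribution` reduces to the ordinary cumulative variance contribution rate. Unequal ratios shift weight toward the feature vectors that carry more information.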
Thus, the dimension reduction result of the performance forward data is obtained.
Step S004: obtain a visualization result for each class of dimension-reduced data from the dimension reduction result, and obtain the system performance test result from the visualization results.
The above steps yield the dimension-reduced data corresponding to the multi-dimensional performance data. With these data as input and the Euclidean distance between data points as the distance metric, the K-means clustering algorithm classifies the dimension-reduced data, with K set to the empirical value 10. K-means clustering is a well-known technique, so the specific procedure is not repeated here.
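The clustering step uses standard K-means. A plain NumPy version of Lloyd's algorithm with Euclidean distance is sketched below as a generic illustration, not the patent's implementation. Random initialisation from the data, the iteration cap, and the seed are assumptions. K = 10 matches the text, but it requires at least 10 samples.

```python
import numpy as np


def kmeans(x, k=10, iters=100, seed=0):
    """Lloyd's K-means with Euclidean distance; needs at least k samples."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]  # ASSUMPTION: random init
    for _ in range(iters):
        # Euclidean distance from every sample to every centre.
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Move each centre to the mean of its members; keep empty clusters in place.
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

Lloyd's algorithm converges only to a local optimum. In practice several seeds would be tried and the run with the lowest within-cluster sum of squares kept.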
Next, data visualization is used to obtain the visualization result of each class of dimension-reduced data. Each performance test item is carried out in turn, and its test result is obtained from the visualization results. Data visualization is a well-known technique, so the specific procedure is not repeated here. The test results of all test items are taken as the performance test result of the whole system, and developers maintain and upgrade the system's performance according to these results.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (4)

1. A system performance test method based on data analysis, the method comprising the steps of:
collecting performance-test-related data, the related data including CPU (central processing unit) utilization, number of concurrent users, memory utilization, network traffic, response time, and error rate; and constructing a multi-dimensional performance data matrix from the related data;
obtaining extremum measurement of each dimensional performance data at different moments according to the distribution range of each dimensional performance data; acquiring a statistical histogram of each dimension extremum measure according to the value range of the extremum measure at all times of each dimension; acquiring a performance positive correlation index according to the statistical histogram and the distribution curve of each dimensional performance data; acquiring performance forward data corresponding to each dimensional performance data at each moment according to the performance positive correlation index;
acquiring the information content ratio of each feature vector according to the proportion of the performance forward data carrying main information in the direction of each feature vector; acquiring an information contribution ratio according to the information content ratio and the variance contribution ratio corresponding to the feature vector; acquiring dimension reduction data of the performance forward data according to a comparison result of the information contribution ratio and the threshold value;
obtaining classification results of the dimension-reduced performance forward data using a data clustering algorithm, obtaining a visualization result of each class of dimension-reduced data using a data visualization technique, obtaining the test result of each performance test item according to the visualization results, and taking the test results of all test items as the test result of the performance of the whole system;
the extremum measurement of each dimension performance data at different moments is obtained according to the distribution range of each dimension performance data:
for any dimension, taking the ratio of the performance data acquired in the dimension at each moment to the accumulated sum of the performance data acquired in the dimension over all moments as the performance data parameter of the dimension at that moment, and taking the maximum and minimum of the performance data parameters in the dimension as the dimension peak value and the dimension valley value, respectively;
taking the difference value between the performance data parameter and the dimension peak value at each moment as a first difference value, and taking the difference value between the dimension peak value and the dimension valley value as a second difference value;
the extremum measurement at each moment consists of a first difference value and a second difference value, wherein the extremum measurement is in a direct proportion relation with the first difference value, and the extremum measurement is in an inverse proportion relation with the second difference value;
the performance positive correlation index obtained according to the statistical histogram and the distribution curve of each dimensional performance data is as follows:
for any dimension, acquiring the data feature ambiguity of each dimension according to the distribution features of all moment extremum metrics in each dimension;
acquiring a mutual influence index of each dimension according to the distribution distance between each dimension performance data and the distribution curves of the rest dimension performance data;
the performance positive correlation index of each dimension consists of two parts of data characteristic ambiguity and a mutual influence index of each dimension, wherein the performance positive correlation index is in a direct proportion relation with the data characteristic ambiguity, and the performance positive correlation index is in an inverse proportion relation with the mutual influence index;
the data feature ambiguity of each dimension is obtained according to the distribution features of all time extremum metrics on each dimension, and the data feature ambiguity is:
for any dimension, obtaining the distribution variance of the extremum measurements at all moments of the dimension and the number of distinct extremum measurement values in the dimension's statistical histogram, and taking the product of the distribution variance and the number of distinct values as a first composition factor;
for each extremum measurement value in the dimension's statistical histogram, obtaining the distribution variance of the time intervals between any two moments at which that value occurs, and taking the sum of these variances over all extremum measurement values as a second composition factor;
the data characteristic ambiguity of each dimension consists of a first composition factor and a second composition factor, wherein the data characteristic ambiguity is in inverse proportion to the first composition factor, and the data characteristic ambiguity is in direct proportion to the second composition factor;
the obtaining the interaction index of each dimension according to the distribution distance between the performance data of each dimension and the distribution curves of the performance data of the other dimensions is as follows:
for any dimension, a trend curve and a trend removal curve of each dimension performance data are obtained by using a data trend removal algorithm;
taking the measurement distance between the detrending curve of each dimensional performance data and the detrending curve of the other dimensional performance data as a first distance value, taking the measurement distance between the trending curve of each dimensional performance data and the trending curve of the other dimensional performance data as a second distance value, and taking the accumulation of the ratio of the first distance value to the second distance value of each dimension in all the other dimensions as a first accumulated value of each dimension;
the interaction index of each dimension consists of a first accumulated value and the number of dimensions of each dimension, wherein the interaction index is in direct proportion to the first accumulated value, and the interaction index is in inverse proportion to the number of dimensions;
the information content ratio of each feature vector is obtained according to the proportion of the performance forward data carrying main information in the direction of each feature vector, and is as follows:
for any one of the feature vectors, obtaining the local density variation of each feature vector according to the local density of the data point on each feature vector in the feature space, and taking the difference value between the local density variation of each feature vector and the local density variation mean value of all the feature vectors as a third composition factor of each feature vector;
taking the variation coefficient of the local density composition sequence of all data points on each feature vector as a fourth composition factor of each feature vector;
the information content ratio of each feature vector consists of a third component factor and a fourth component factor of each feature vector, wherein the information content ratio is in a proportional relation with the third component factor, and the information content ratio is in a proportional relation with the fourth component factor.
2. The method for testing system performance based on data analysis according to claim 1, wherein the obtaining performance forward data corresponding to each dimensional performance data at each moment according to the performance forward correlation index is:
for any dimension, obtaining the difference between the performance positive correlation index of the performance data at each moment of the dimension and the minimum value of the performance positive correlation index in the dimension, raising the natural constant e to the negative of this difference, and taking the normalized result as the forward coefficient of that moment in the dimension;
taking the difference value between the performance data parameter of each moment of the dimension and the average value of all the performance data parameters in the dimension as a numerator, taking the distribution variance of all the performance data parameters in the dimension as a denominator, and taking the ratio of the numerator to the denominator as a second product factor of each moment;
the performance forward data of each moment in the dimension consists of a forward coefficient and a second product factor of each moment, wherein the performance forward data and the forward coefficient are in a direct proportion relation, and the performance forward data and the second product factor are in a direct proportion relation.
3. The method for testing system performance based on data analysis according to claim 1, wherein the obtaining the local density variation of each feature vector according to the local density of the data point on each feature vector in the feature space is:
for any one feature vector, obtaining the distribution positions of all data points on the feature vector; taking each data point as the centre and a preset parameter as the radius, taking the number of data points in the resulting spherical region as a numerator;
taking the intersection point when all the feature vectors form a feature space as an origin, taking the origin as a circle center, taking the total number of data points in a spherical region obtained by taking a preset parameter as a radius as a denominator, and taking the ratio of a numerator to the denominator as the local density of each data point;
accumulating the local density of each data point on each feature vector at each radius to be used as a second accumulated value of the feature vector;
the local density variation of each feature vector consists of a second accumulated value and a radius maximum value of the feature vector, wherein the local density variation is in direct proportion to the second accumulated value, and the local density variation is in inverse proportion to the radius maximum value.
4. The method for testing system performance based on data analysis according to claim 1, wherein the obtaining the information contribution ratio according to the information content ratio and the variance contribution ratio corresponding to the feature vector is:
for any one feature vector, taking the product of the feature value and the information content ratio of each feature vector as a numerator, taking the accumulated sum of the product of the feature value and the information content ratio of all feature vectors as a denominator, and taking the ratio of the numerator and the denominator as a first scale factor of each feature vector;
and taking the accumulated sum of the first scale factors on the preset number of feature vectors as the information contribution ratio of the preset number of feature vectors.
CN202311211922.2A 2023-09-20 2023-09-20 System performance test method based on data analysis Active CN116955119B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311211922.2A CN116955119B (en) 2023-09-20 2023-09-20 System performance test method based on data analysis

Publications (2)

Publication Number Publication Date
CN116955119A CN116955119A (en) 2023-10-27
CN116955119B true CN116955119B (en) 2023-12-05

Family

ID=88462428

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311211922.2A Active CN116955119B (en) 2023-09-20 2023-09-20 System performance test method based on data analysis

Country Status (1)

Country Link
CN (1) CN116955119B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109934456A (en) * 2019-01-29 2019-06-25 中国电力科学研究院有限公司 A kind of method and system for acquisition operational system progress intelligent trouble detection
CN114003636A (en) * 2021-10-20 2022-02-01 河海大学 Multivariate time sequence similarity searching method based on variable correlation
CN116502112A (en) * 2023-06-29 2023-07-28 深圳市联明电源有限公司 New energy power supply test data management method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120209575A1 (en) * 2011-02-11 2012-08-16 Ford Global Technologies, Llc Method and System for Model Validation for Dynamic Systems Using Bayesian Principal Component Analysis

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
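A PCA-based similarity measure for multivariate time series; Kiyoung Yang et al.; Proceedings of the 2nd ACM International Workshop on Multimedia Databases; full text *
Principal component analysis based on mutual information for acoustic scene classification; 范雪莉 (Fan Xueli); 冯海泓 (Feng Haihong); 原猛 (Yuan Meng); 声学技术 (Technical Acoustics), No. 03; full text *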

Also Published As

Publication number Publication date
CN116955119A (en) 2023-10-27

Similar Documents

Publication Publication Date Title
CN102842042B (en) Biometric authentication technology
CN110633725B (en) Method and device for training classification model and classification method and device
CN110968272B (en) Time sequence prediction-based method and system for optimizing storage performance of mass small files
CN110826618A (en) Personal credit risk assessment method based on random forest
CN110728313B (en) Classification model training method and device for intention classification recognition
CN111626821A (en) Product recommendation method and system for realizing customer classification based on integrated feature selection
CN112819299A (en) Differential K-means load clustering method based on center optimization
CN116596095B (en) Training method and device of carbon emission prediction model based on machine learning
CN113222062A (en) Method, device and computer readable medium for tobacco leaf classification
CN113674862A (en) Acute renal function injury onset prediction method based on machine learning
CN113568368A (en) Self-adaptive determination method for industrial control data characteristic reordering algorithm
CN111461923A (en) Electricity stealing monitoring system and method based on deep convolutional neural network
Nair et al. A life cycle on processing large dataset-LCPL
CN110472659A (en) Data processing method, device, computer readable storage medium and computer equipment
CN114463587A (en) Abnormal data detection method, device, equipment and storage medium
CN116955119B (en) System performance test method based on data analysis
CN111815209A (en) Data dimension reduction method and device applied to wind control model
CN116362251A (en) Named entity recognition model training method, device, equipment and medium
CN113157814B (en) Query-driven intelligent workload analysis method under relational database
CN115271442A (en) Modeling method and system for evaluating enterprise growth based on natural language
US20220114460A1 (en) Apparatus of Identifying Heterogeneous Time-Series Data Expression with High Efficiency
CN115186138A (en) Comparison method and terminal for power distribution network data
CN113792141A (en) Feature selection method based on covariance measurement factor
CN110265151B (en) Learning method based on heterogeneous temporal data in EHR
CN111723223B (en) Multi-label image retrieval method based on subject inference

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant