CN115981970A - Operation and maintenance data analysis method, device, equipment and medium - Google Patents

Info

Publication number
CN115981970A
Authority
CN
China
Prior art keywords
feature
data
spaces
maintenance data
local
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310265533.1A
Other languages
Chinese (zh)
Other versions
CN115981970B (en)
Inventor
孟江波
李占兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CCB Finetech Co Ltd
Original Assignee
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CCB Finetech Co Ltd
Priority to CN202310265533.1A
Publication of CN115981970A
Application granted
Publication of CN115981970B
Legal status: Active
Anticipated expiration

Abstract

The invention provides an operation and maintenance data analysis method, device, equipment and medium, applicable to the technical fields of computers and data processing. The method comprises: allocating multi-dimensional operation and maintenance data samples related to the operation state of a database to a plurality of first data spaces according to a first preset rule; for each of the plurality of first data spaces, clustering the multi-dimensional operation and maintenance data in that first data space to obtain a first local feature; clustering the first local features corresponding to the respective first data spaces to obtain a plurality of first feature spaces; clustering the plurality of first feature spaces to obtain a first global feature; and determining the operation state of the database based on the first global feature and the plurality of first local features.

Description

Operation and maintenance data analysis method, device, equipment and medium
Technical Field
The invention relates to the technical field of computers and data processing, in particular to an operation and maintenance data analysis method, device, equipment and medium.
Background
In system operation and maintenance, different systems, applications, databases and middleware generate, over time, a large amount of complex multi-dimensional operation and maintenance data. Operation and maintenance personnel can apply various algorithms to model and analyze this multi-dimensional data in order to realize state monitoring, fault alarms, service analysis, root cause analysis and the like.
In implementing the concept of the present invention, the inventors found at least the following problems in the related art: multi-dimensional operation and maintenance data is processed with low efficiency, and the resulting operation and maintenance effect is poor.
Disclosure of Invention
In view of the above problems, the present invention provides an operation and maintenance data analysis method, apparatus, device and medium.
According to a first aspect of the present invention, there is provided an operation and maintenance data analysis method, including:
distributing multi-dimensional operation and maintenance data samples related to the operation state of the database to a plurality of first data spaces according to a first preset rule;
clustering the multidimensional operation and maintenance data in the first data space to obtain a first local feature for each first data space in the plurality of first data spaces;
clustering the first local features corresponding to the first data spaces to obtain a plurality of first feature spaces;
clustering the plurality of first feature spaces to obtain first global features;
and determining the operation state of the database according to the first global feature and the plurality of first local features.
According to an embodiment of the present invention, the method further includes:
distributing the multi-dimensional operation and maintenance data samples to a plurality of second data spaces according to expert experience;
clustering the multidimensional operation and maintenance data in the second data spaces to obtain a second global feature;
wherein the determining the operation state of the database according to the first global feature and the plurality of first local features comprises:
and determining the operation state of the database according to the first global feature, the second global feature and the plurality of first local features.
According to an embodiment of the present invention, the determining the operation state of the database according to the first global feature, the second global feature and the plurality of first local features includes:
calculating a first similarity between the second global feature and the first global feature;
and inputting the first global feature and the plurality of first local features into a machine learning algorithm and outputting the operation state of the database when the first similarity is greater than a first threshold.
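The verification gate described above can be sketched as follows. This is a minimal illustration only: it assumes that global features are fixed-length numeric vectors and uses cosine similarity as the "first similarity", neither of which is specified in the source. The function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def passes_verification(first_global: Sequence[float],
                        second_global: Sequence[float],
                        first_threshold: float) -> bool:
    """True when the first similarity is strictly greater than the first threshold."""
    return cosine_similarity(first_global, second_global) > first_threshold


# Hypothetical feature vectors: identical direction passes, orthogonal does not.
ok = passes_verification([1.0, 2.0], [2.0, 4.0], first_threshold=0.99)
rejected = passes_verification([1.0, 0.0], [0.0, 1.0], first_threshold=0.5)
```

Only when the gate returns true would the first global feature and the first local features be passed on to the machine learning algorithm.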
According to an embodiment of the present invention, the method further includes:
when the first similarity is less than or equal to the first threshold, allocating the multidimensional operation and maintenance data samples to a plurality of third data spaces according to a second preset rule, wherein the number of third data spaces is greater than the number of first data spaces;
for each of the plurality of third data spaces, clustering the multidimensional operation and maintenance data in that third data space to obtain a second local feature;
clustering the second local features corresponding to the plurality of third data spaces to obtain a plurality of second feature spaces;
clustering the plurality of second feature spaces to obtain a third global feature;
and determining the operation state of the database according to the third global feature and the plurality of second local features.
According to an embodiment of the present invention, the clustering the second local features corresponding to the plurality of third data spaces to obtain a plurality of second feature spaces includes:
clustering those second local features, among the ones corresponding to the plurality of third data spaces, that share the same feature, to obtain the plurality of second feature spaces.
According to an embodiment of the present invention, the clustering the multidimensional operation and maintenance data in the first data space for each of the plurality of first data spaces to obtain a first local feature includes:
calculating a second similarity between the multi-dimensional operation and maintenance data in the first data space;
and when the second similarity is greater than a second threshold, clustering the multidimensional operation and maintenance data corresponding to that second similarity into one group to obtain the first local feature.
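One way to read this threshold-based grouping is sketched below. The similarity measure (`1 / (1 + Euclidean distance)`) and the transitive merging of similar pairs via union-find are assumptions made for illustration; the source only states that data whose second similarity exceeds the second threshold is clustered into one group.

```python
import math
from typing import Dict, List, Sequence


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity: 1 / (1 + Euclidean distance), which lies in (0, 1]."""
    return 1.0 / (1.0 + math.dist(a, b))


def group_by_similarity(points: List[Sequence[float]], threshold: float) -> List[List[int]]:
    """Merge points into one group whenever a pair's similarity exceeds the threshold.

    Returns groups as sorted lists of point indices.
    """
    parent = list(range(len(points)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if similarity(points[i], points[j]) > threshold:
                parent[find(j)] = find(i)

    groups: Dict[int, List[int]] = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical normalized records inside one first data space.
data_space = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.0, 0.9]]
groups = group_by_similarity(data_space, threshold=0.8)
```

Each resulting group could then be summarized (for example by its mean) to serve as a local feature; that summarization step is likewise an assumption.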
According to an embodiment of the present invention, the clustering the first local features corresponding to the first data spaces to obtain the first feature spaces includes:
clustering those first local features, among the ones corresponding to the plurality of first data spaces, that share the same feature, to obtain the plurality of first feature spaces.
According to an embodiment of the present invention, before the distributing the multidimensional operation and maintenance data samples related to the operation state of the database to the plurality of first data spaces according to the first preset rule, the method further includes:
acquiring original operation and maintenance data related to the operation state of the database;
and preprocessing the original operation and maintenance data to obtain a multi-dimensional operation and maintenance data sample.
According to an embodiment of the present invention, the original operation and maintenance data related to the database operation state includes: port access amount of the database, memory calling condition of the database, whether the port of the database is attacked or not, and memory capacity condition of the database.
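As an illustration of the preprocessing step, the sketch below applies min-max normalization to raw records. The field names (`port_access`, `mem_calls`, `attacked`, `mem_used`) and the choice of min-max scaling are assumptions for illustration; the source states only that the original data is preprocessed into a multi-dimensional sample.

```python
from typing import Dict, List


def min_max_normalize(records: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Scale every field to [0, 1] across all records (assumed normalization scheme)."""
    if not records:
        return []
    fields = list(records[0].keys())
    lo = {f: min(r[f] for r in records) for f in fields}
    hi = {f: max(r[f] for r in records) for f in fields}
    normalized = []
    for r in records:
        row = {}
        for f in fields:
            span = hi[f] - lo[f]
            # A constant column carries no information; map it to 0.0.
            row[f] = (r[f] - lo[f]) / span if span else 0.0
        normalized.append(row)
    return normalized


# Hypothetical raw records: port access count, memory calls, attack flag, memory used.
raw = [
    {"port_access": 120.0, "mem_calls": 30.0, "attacked": 0.0, "mem_used": 4.0},
    {"port_access": 480.0, "mem_calls": 90.0, "attacked": 1.0, "mem_used": 12.0},
    {"port_access": 300.0, "mem_calls": 60.0, "attacked": 0.0, "mem_used": 8.0},
]
samples = min_max_normalize(raw)
```

Putting every dimension on the same scale keeps a large-valued field such as port access count from dominating distance-based clustering in later steps.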
A second aspect of the present invention provides an operation and maintenance data analysis apparatus, including:
a first space allocation module, configured to allocate the multi-dimensional operation and maintenance data samples related to the operation state of the database to a plurality of first data spaces according to a first preset rule;
a first local feature obtaining module, configured to cluster the multidimensional operation and maintenance data in the first data space to obtain a first local feature for each of the plurality of first data spaces;
a first feature space obtaining module, configured to cluster the first local features corresponding to the multiple first data spaces, so as to obtain multiple first feature spaces;
a first global feature obtaining module, configured to cluster the plurality of first feature spaces to obtain a first global feature;
and a first operation state determination module, configured to determine the operation state of the database according to the first global feature and the plurality of first local features.
A third aspect of the present invention provides an electronic device comprising:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method as described above.
A fourth aspect of the invention provides a computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method as described above.
A fifth aspect of the invention provides a computer program product comprising computer executable instructions which, when executed, implement the method as described above.
According to the embodiment of the invention, the multidimensional operation and maintenance data samples related to the database operation state are allocated to a plurality of first data spaces according to a first preset rule. Allocating the otherwise disordered samples in this way gives the multidimensional operation and maintenance data samples in each first data space the characteristics corresponding to the first preset rule. Then, for each of the plurality of first data spaces, the multidimensional operation and maintenance data in that first data space is clustered to obtain a first local feature; the first local features corresponding to the respective first data spaces are clustered to obtain a plurality of first feature spaces; and the plurality of first feature spaces are clustered to obtain a first global feature. The first local features are thus obtained by clustering the multidimensional operation and maintenance data once, and the first global feature is obtained by clustering the first local features multiple times. This at least partially solves the technical problems of low processing efficiency and poor operation and maintenance effect, and allows the features to be extracted accurately, so that the first local features are locally representative and the first global feature is globally representative.
Drawings
The foregoing and other objects, features and advantages of the invention will be apparent from the following description of embodiments of the invention, which proceeds with reference to the accompanying drawings, in which:
FIG. 1 is an application scenario diagram of an operation and maintenance data analysis method according to an embodiment of the invention;
FIG. 2 is a flow chart of an operation and maintenance data analysis method according to an embodiment of the invention;
FIG. 3 is another flow chart of an operation and maintenance data analysis method according to an embodiment of the invention;
FIG. 4 is a block diagram of the configuration of an operation and maintenance data analysis apparatus according to an embodiment of the invention; and
FIG. 5 is a block diagram of an electronic device suitable for implementing the operation and maintenance data analysis method according to an embodiment of the invention.
Detailed Description
Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. It is to be understood that this description is made only by way of example and not as a limitation on the scope of the invention. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the invention. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B, and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B, and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.).
In the technical scheme of the invention, the collection, storage, use, processing, transmission, provision, disclosure, application and other handling of the data involved (including but not limited to users' personal information) comply with the relevant laws and regulations, necessary security measures are taken, and public order and good morals are not violated.
In the process of processing multidimensional operation and maintenance data in the related technology, the technical problems of low efficiency of processing the multidimensional operation and maintenance data and poor operation and maintenance effect exist.
In order to at least partially solve the technical problems in the related art, the invention provides an operation and maintenance data analysis method, device, equipment and medium, which can be applied to the technical field of computers and the technical field of data processing. The operation and maintenance data analysis method comprises the following steps: distributing multi-dimensional operation and maintenance data samples related to the operation state of the database to a plurality of first data spaces according to a first preset rule; clustering multi-dimensional operation and maintenance data in a first data space aiming at each first data space in a plurality of first data spaces to obtain a first local feature; clustering first local features corresponding to the first data spaces respectively to obtain a plurality of first feature spaces; clustering the plurality of first feature spaces to obtain first global features; and determining the operation state of the database according to the first global characteristic and the plurality of first local characteristics.
Fig. 1 is an application scenario diagram illustrating an operation and maintenance data analysis method according to an embodiment of the present invention.
As shown in fig. 1, the application scenario 100 according to this embodiment may include a first terminal device 101, a second terminal device 102, a third terminal device 103, a network 104, and a server 105. The network 104 is used to provide a medium of communication links between the first terminal device 101, the second terminal device 102, the third terminal device 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using at least one of the first terminal device 101, the second terminal device 102, the third terminal device 103, to receive or send messages or the like. Various communication client applications, such as a shopping application, a web browser application, a search application, an instant messaging tool, a mailbox client, social platform software, etc. (for example only), may be installed on the first terminal device 101, the second terminal device 102, and the third terminal device 103.
The first terminal device 101, the second terminal device 102, and the third terminal device 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by the user using the first terminal device 101, the second terminal device 102, and the third terminal device 103. The background management server may analyze and perform other processing on the received data such as the user request, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that the operation and maintenance data analysis method provided by the embodiment of the present invention may be generally executed by the server 105. Accordingly, the operation and maintenance data analysis apparatus provided by the embodiment of the present invention may be generally disposed in the server 105. The operation and maintenance data analysis method provided by the embodiment of the present invention may also be executed by a server or a server cluster that is different from the server 105 and can communicate with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or the server 105. Correspondingly, the operation and maintenance data analysis apparatus provided in the embodiment of the present invention may also be disposed in a server or a server cluster that is different from the server 105 and is capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for an implementation.
The operation and maintenance data analysis method of the disclosed embodiment will be described in detail below with reference to fig. 2 to 3 based on the scenario described in fig. 1.
Fig. 2 is a flowchart illustrating an operation and maintenance data analysis method according to an embodiment of the present invention.
As shown in FIG. 2, the operation and maintenance data analysis method 200 of the embodiment includes operations S210-S250.
In operation S210, multidimensional operation and maintenance data samples related to the operation state of the database are allocated to a plurality of first data spaces according to a first preset rule.
According to an embodiment of the invention, the multi-dimensional operation and maintenance data sample comprises a plurality of multi-dimensional operation and maintenance data.
According to an embodiment of the invention, the multidimensional operation and maintenance data is operation and maintenance data of multiple dimensions obtained by normalizing the original operation and maintenance data.
According to the embodiment of the invention, the original operation and maintenance data is primarily time-series data. The multidimensional operation and maintenance data sample obtained by normalizing it is therefore also primarily time-series data, so the sample can be allocated to a plurality of first data spaces according to the generation time of each item of multidimensional operation and maintenance data in the sample.
According to an embodiment of the present invention, the first preset rule may be based on the generation time of the multidimensional operation and maintenance data. For example, when analyzing one day of multidimensional operation and maintenance data samples related to the database operation state, the samples may be allocated by hour, so that the items of multidimensional operation and maintenance data in the sample are allocated to 24 first data spaces.
According to the embodiment of the invention, for example, when analyzing one year of multidimensional operation and maintenance data samples related to the database operation state, the samples may be allocated by month, so that the items of multidimensional operation and maintenance data in the sample are allocated to 12 first data spaces.
According to the embodiment of the invention, in the case that the first preset rule is the generation time of the multidimensional operation and maintenance data, the multidimensional operation and maintenance data sample can be subdivided according to the actual situation, and the subdivision rule is not limited to one hour, one month and the like.
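The hourly allocation in the one-day example can be sketched as follows. The representation of a sample as a `(timestamp, vector)` pair and the function name `allocate_by_hour` are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple

Sample = Tuple[datetime, List[float]]  # (generation time, normalized feature vector)


def allocate_by_hour(samples: List[Sample]) -> Dict[int, List[List[float]]]:
    """Place each sample in the data space for the hour (0-23) in which it was generated."""
    spaces: Dict[int, List[List[float]]] = defaultdict(list)
    for generated_at, vector in samples:
        spaces[generated_at.hour].append(vector)
    return dict(spaces)


# Hypothetical one-day sample with three records.
day = [
    (datetime(2023, 3, 1, 9, 5), [0.2, 0.1]),
    (datetime(2023, 3, 1, 9, 40), [0.3, 0.1]),
    (datetime(2023, 3, 1, 14, 0), [0.9, 0.7]),
]
spaces = allocate_by_hour(day)
```

Only hours that actually contain data appear as keys, so a full day yields at most 24 first data spaces. Grouping by `generated_at.month` instead would give the monthly variant.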
According to an embodiment of the present invention, the database operation state may include a port access status, a memory call status, a data storage status, and the like of the database.
According to an embodiment of the invention, the first data space represents a data set composed of a plurality of multi-dimensional operation and maintenance data having characteristics corresponding to a first preset rule.
According to the embodiment of the invention, the multidimensional operation and maintenance data samples can also be allocated to the plurality of first data spaces by subdividing a particular aspect of the database operation state. For example, when the database operation state includes the memory utilization rate of the database, the memory utilization rate may be used as one dimension of the multidimensional operation and maintenance data, and its value range may be subdivided to allocate the samples to the plurality of first data spaces.
According to the embodiment of the present invention, when the samples are allocated by memory utilization rate, which ranges from 0 to 100%, the multidimensional operation and maintenance data with a memory utilization rate of 0 to 20% may, for example, be placed in a first one of the first data spaces, data with a rate of 20% to 40% in a second, 40% to 60% in a third, 60% to 80% in a fourth, and 80% to 100% in a fifth, yielding 5 first data spaces.
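The five utilization bands can be sketched as below. The source does not say which band a boundary value such as exactly 20% belongs to; this sketch assumes half-open bands `[lo, hi)` with the last band also including 100%. The field name `mem_util` is hypothetical.

```python
from typing import Dict, List


def memory_bin(utilization: float, n_bins: int = 5) -> int:
    """Return the band index (0..n_bins-1) for a utilization expressed in [0, 1]."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be within [0, 1]")
    # Assumption: bands are half-open [lo, hi); the last band also includes 1.0.
    return min(int(utilization * n_bins), n_bins - 1)


def allocate_by_memory(samples: List[Dict[str, float]]) -> Dict[int, List[Dict[str, float]]]:
    """Group samples into first data spaces keyed by memory-utilization band."""
    spaces: Dict[int, List[Dict[str, float]]] = {}
    for sample in samples:
        spaces.setdefault(memory_bin(sample["mem_util"]), []).append(sample)
    return spaces


# Hypothetical samples; other dimensions are omitted for brevity.
samples = [{"mem_util": 0.10}, {"mem_util": 0.35}, {"mem_util": 0.95}, {"mem_util": 1.0}]
spaces = allocate_by_memory(samples)
```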
According to the embodiment of the invention, the multidimensional operation and maintenance data samples related to the database operation state are allocated to the plurality of first data spaces according to the first preset rule, and the disordered multidimensional operation and maintenance data samples can be allocated to the plurality of first data spaces according to the first preset rule, so that the multidimensional operation and maintenance data samples in the first data spaces have the characteristics corresponding to the first preset rule.
In operation S220, for each first data space in the plurality of first data spaces, the multidimensional operation and maintenance data in the first data space is clustered, and a first local feature is obtained.
According to the embodiment of the present invention, the multidimensional operation and maintenance data in the first data space may be clustered to obtain the first local feature by using the Mahalanobis distance, the Euclidean distance, the Manhattan distance, the Hamming distance, the k-means clustering algorithm (Kmeans), or any of these distances combined with Kmeans, or the like.
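As one concrete instance of "Euclidean distance combined with Kmeans", the sketch below runs a plain k-means inside a single first data space. Treating the resulting centroids as the first local feature is an assumption; the source does not define the numeric form of a local feature. Initialization from the first `k` points is chosen only to keep the example deterministic.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points: List[Vector]) -> List[float]:
    return [sum(column) / len(points) for column in zip(*points)]


def kmeans(points: List[Vector], k: int, iterations: int = 20) -> List[List[float]]:
    """Plain k-means returning k centroids; initial centroids are the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the previous centroid if a cluster ends up empty.
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    return centroids


# Hypothetical normalized records in one first data space.
data_space = [[0.1, 0.1], [0.9, 0.9], [0.2, 0.1], [0.8, 1.0]]
local_feature = kmeans(data_space, k=2)
```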
In operation S230, the first local features corresponding to the first data spaces are clustered to obtain a plurality of first feature spaces.
According to the embodiment of the present invention, under the condition that there exists a feature that is the same as or has a similarity greater than a preset similarity threshold between first local features corresponding to each of a plurality of first data spaces, the first local features corresponding to each of the plurality of first data spaces may be clustered to obtain a plurality of first feature spaces.
According to an embodiment of the invention, the first feature space characterizes a dataset composed of a plurality of multi-dimensional data having periodic features.
According to the embodiment of the present invention, the first local features corresponding to the plurality of first data spaces may be clustered to obtain the plurality of first feature spaces by using the Mahalanobis distance, the Euclidean distance, the Manhattan distance, the Hamming distance, Kmeans, or any of these distances combined with Kmeans.
According to the embodiment of the present invention, for example, when the first local feature of a first one of the first data spaces and the first local feature of a second one of the first data spaces contain the same multidimensional operation and maintenance data as a common feature, the multidimensional operation and maintenance data in those two first local features may be clustered to obtain a first one of the first feature spaces.
Similarly, when the first local features of the first, second, and third of the first data spaces all contain the same multidimensional operation and maintenance data as a common feature, the multidimensional operation and maintenance data in those three first local features may be clustered to obtain a second one of the first feature spaces.
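The grouping by shared features in these examples can be read as an inverted index from each feature to the data spaces whose local features contain it. The sketch below takes that reading; the string labels standing in for local features and the function name `build_feature_spaces` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set


def build_feature_spaces(local_features: Dict[str, Set[Hashable]]) -> Dict[Hashable, List[str]]:
    """Map each feature shared by two or more data spaces to those data spaces."""
    index: Dict[Hashable, List[str]] = defaultdict(list)
    for space_id in sorted(local_features):
        for feature in local_features[space_id]:
            index[feature].append(space_id)
    # A feature seen in only one data space does not form a feature space here.
    return {feature: ids for feature, ids in index.items() if len(ids) >= 2}


# Hypothetical local features for four hourly first data spaces.
local_features = {
    "hour_09": {"high_port_access", "high_memory"},
    "hour_10": {"high_port_access"},
    "hour_11": {"high_memory", "high_port_access"},
    "hour_12": {"idle"},
}
feature_spaces = build_feature_spaces(local_features)
```

In this toy input, `high_memory` is shared by two data spaces and `high_port_access` by three, which mirrors the two-space and three-space cases in the text.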
In operation S240, a plurality of first feature spaces are clustered to obtain a first global feature.
According to the embodiment of the invention, the plurality of first feature spaces may be clustered to obtain the first global feature by using the Mahalanobis distance, the Euclidean distance, the Manhattan distance, the Hamming distance, Kmeans, or any of these distances combined with Kmeans, or the like.
According to the embodiment of the present invention, the clustering algorithm for obtaining the first local feature, the clustering algorithm for obtaining the plurality of first feature spaces, and the clustering algorithm for obtaining the first global feature may be the same or different, and the specific clustering algorithm for obtaining the first local feature, the clustering algorithm for obtaining the plurality of first feature spaces, and the clustering algorithm for obtaining the first global feature may be selected according to actual situations.
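Because each stage may use a different algorithm, one possible design is to pass the distance function as a parameter. The sketch below shows three of the named metrics behind a common signature; the per-stage assignment in `STAGE_METRICS` is an arbitrary example, not something the source prescribes.

```python
import math
from typing import Callable, Dict, List, Sequence

Metric = Callable[[Sequence[float], Sequence[float]], float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming(a: Sequence[float], b: Sequence[float]) -> float:
    # Counts positions that differ; suited to categorical or binary dimensions.
    return float(sum(1 for x, y in zip(a, b) if x != y))


def nearest(point: Sequence[float], centers: List[Sequence[float]], metric: Metric) -> int:
    """Index of the center closest to the point under the given metric."""
    return min(range(len(centers)), key=lambda i: metric(point, centers[i]))


# Example configuration only: a different metric for each clustering stage.
STAGE_METRICS: Dict[str, Metric] = {
    "local": euclidean,
    "feature_space": manhattan,
    "global": hamming,
}

closest = nearest([0.9, 0.9], [[0.0, 0.0], [1.0, 1.0]], STAGE_METRICS["feature_space"])
```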
In operation S250, an operation state of the database is determined according to the first global feature and the plurality of first local features.
According to the embodiment of the invention, the operation state of the database determined according to the first global feature and the plurality of first local features may include, for example: the time periods in which the port access amount of the database is too high; the time periods in which the memory occupancy rate of the database is too high; whether the storage amount of the database has reached a preset maximum value; the average daily or monthly growth of the storage amount of the database; and the time periods in which the database is attacked most heavily.
According to the embodiment of the invention, operation and maintenance personnel can maintain the database appropriately by reviewing the operation state of the database determined according to the first global feature and the plurality of first local features.
According to the embodiment of the invention, the first global feature and the plurality of first local features can be input into a trained machine learning algorithm, and the operation state of the database can be determined according to the output result of the machine learning algorithm.
According to the embodiment of the invention, the first global feature may also be verified by using a global feature obtained from manual experience. When the first global feature meets the verification requirement, for example when the similarity between the global feature obtained from manual experience and the first global feature is greater than a first similarity threshold, the first global feature and the plurality of first local features are input into a trained machine learning algorithm, and the operation state of the database is determined according to the result output by the machine learning algorithm.
According to the embodiment of the invention, when the first global feature does not meet the verification requirement, for example when the similarity between the global feature obtained from manual experience and the first global feature is less than or equal to the first similarity threshold, the multidimensional operation and maintenance data samples related to the operation state of the database are reallocated using a rule of finer granularity than the first preset rule, and the nth local features, the nth feature spaces and the nth global feature are recalculated. This is repeated until the nth global feature meets the verification requirement, at which point the optimal global feature has been found and the nth global feature is more globally representative. The nth global feature and the plurality of nth local features corresponding to it are then input into a trained machine learning algorithm, and the operation state of the database is determined according to the result output by the machine learning algorithm. Here n denotes the number of loop iterations, and the first similarity threshold may be selected according to the actual situation so that, after multiple iterations, the global feature obtained from manual experience and the nth global feature meet the verification requirement.
According to an embodiment of the invention, the nth local feature corresponds to the first local feature, the nth feature space corresponds to the first feature space and the nth global feature corresponds to the first global feature.
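As a non-limiting illustration, the loop described above, in which progressively finer allocation rules are tried until the nth global feature passes verification, might be sketched in Python as follows. The function name and the callables `extract_global` and `similarity` are hypothetical placeholders for the clustering pipeline and the similarity measure; the default threshold of 0.8 is likewise only an assumed example value.

```python
def refine_until_verified(samples, rules, extract_global, similarity,
                          reference, threshold=0.8):
    """Try allocation rules from coarse to fine until verification passes.

    rules          -- allocation rules ordered from coarsest to finest (assumed)
    extract_global -- hypothetical callable: (samples, rule) ->
                      (global_feature, local_features)
    similarity     -- hypothetical callable returning a value in [0, 1]
    reference      -- global feature obtained from manual experience
    """
    for n, rule in enumerate(rules, start=1):
        global_feature, local_features = extract_global(samples, rule)
        # Verification requirement: similarity strictly above the threshold.
        if similarity(reference, global_feature) > threshold:
            return n, global_feature, local_features
    raise ValueError("no rule produced a global feature that passes verification")
```

The returned n is the number of loop iterations that were needed; the returned features would then be passed to the trained machine learning algorithm.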
According to the embodiment of the invention, when the first global feature does not meet the verification requirement, the global feature obtained from manual experience may instead be adjusted multiple times until the adjusted global feature and the first global feature meet the verification requirement. The optimal global feature is thereby found, and the first global feature is considered to be more globally representative. The first global feature and the plurality of first local features are then input into a trained machine learning algorithm, and the operation state of the database is determined according to the result output by the machine learning algorithm.
According to the embodiment of the invention, when the first global feature does not meet the verification requirement, the global feature obtained from manual experience may be adjusted multiple times; if, after these adjustments, the global feature obtained from manual experience and the first global feature still cannot meet the verification requirement, the multidimensional operation and maintenance data samples related to the operation state of the database are reallocated using a rule of finer granularity than the first preset rule, and the nth local features, the nth feature spaces and the nth global feature are recalculated until the nth global feature meets the verification requirement. The optimal global feature is thereby found and the nth global feature is globally representative. The nth global feature and the plurality of nth local features corresponding to it are then input into a trained machine learning algorithm, and the operation state of the database is determined according to the result output by the machine learning algorithm, where n denotes the number of loop iterations.
According to the embodiment of the invention, when the first global feature does not meet the verification requirement and features that are identical, or whose similarity is greater than a preset similarity threshold, exist among the plurality of first feature spaces, the plurality of first feature spaces may be clustered to obtain a plurality of second feature spaces. Likewise, when such features exist among the plurality of second feature spaces, the plurality of second feature spaces are clustered to obtain a plurality of third feature spaces, and so on, until a plurality of ith feature spaces that are representative of the local features is obtained. The plurality of ith feature spaces is clustered to obtain an ith global feature that meets the verification requirement; the optimal global feature is thereby found and the ith global feature is globally representative. The ith global feature and the plurality of ith local features corresponding to it are then input into a trained machine learning algorithm, and the operation state of the database is determined according to the result output by the machine learning algorithm, where the value of i is selected according to the actual situation.
According to the embodiment of the invention, the operation and maintenance data analysis method provided by the embodiment of the invention obtains the first local feature related to the database operation state by performing primary clustering on the multidimensional operation and maintenance data related to the database operation state, and obtains the first global feature by performing multiple clustering on the first local feature, so that the technical problems of low efficiency and poor operation and maintenance effect of processing the multidimensional operation and maintenance data in the related technology are at least partially overcome, the speed of extracting the first local feature and the first global feature is increased, and the internal relation between the first local feature and the first global feature is tight, so that the first global feature has global representativeness, and the operation state of the database can be quickly and accurately determined according to the first global feature and the plurality of first local features.
According to the embodiment of the invention, the operation and maintenance data analysis method provided by the embodiment of the invention obtains the first local feature related to the database operation state by clustering the multidimensional operation and maintenance data related to the database operation state once, and obtains the first global feature by clustering the first local feature multiple times. In this way the periodic features in the time-series operation and maintenance data can be captured effectively, and a global feature of higher value can be obtained and provided to a downstream algorithm, so that a more accurate operation state of the database can be obtained.
According to an embodiment of the present invention, the operation and maintenance data analysis method 200 further includes the following operations:
distributing the multi-dimensional operation and maintenance data samples to a plurality of second data spaces according to expert experience;
clustering the multidimensional operation and maintenance data in the second data spaces to obtain a second global feature;
wherein determining the operational state of the database based on the first global characteristic and the plurality of first local characteristics comprises:
the operating state of the database is determined based on the first global feature, the second global feature, and the plurality of first local features.
According to the embodiment of the invention, the same or similar data characteristics of the multi-dimensional operation and maintenance data in the multi-dimensional operation and maintenance data sample can be determined according to expert experience, and then the multi-dimensional operation and maintenance data with characteristic representativeness in the multi-dimensional operation and maintenance data sample is distributed to the second data spaces according to the same or similar data characteristics.
According to the embodiment of the invention, the multidimensional operation and maintenance data in the plurality of second data spaces can be clustered by using the Mahalanobis distance, the Euclidean distance, the Manhattan distance, the Hamming distance, Kmeans, the Mahalanobis distance combined with Kmeans, the Euclidean distance combined with Kmeans, the Manhattan distance combined with Kmeans, the Hamming distance combined with Kmeans, or the like, so as to obtain the second global feature.
According to the embodiment of the invention, the multi-dimensional operation and maintenance data samples are distributed to the plurality of second data spaces according to expert experience, the multi-dimensional operation and maintenance data in the plurality of second data spaces are clustered to obtain the second global feature, and preparation is made for carrying out accuracy verification on the first global feature by utilizing the second global feature subsequently.
According to an embodiment of the present invention, determining the operational state of the database according to the first global feature, the second global feature and the plurality of first local features comprises:
calculating a first similarity of the second global feature and the first global feature;
and inputting the first global feature and the plurality of first local features into a machine learning algorithm and outputting the running state of the database under the condition that the first similarity is larger than a first threshold value.
According to the embodiment of the invention, the first similarity between the second global feature and the first global feature may be calculated using the Mahalanobis distance, or alternatively using the Jaccard similarity coefficient.
According to the embodiment of the present invention, for example, the first similarity between the second global feature and the first global feature may be calculated using the Mahalanobis distance as follows. First, the mean and the covariance of the plurality of multidimensional operation and maintenance data included in the second global feature are calculated. Then, using the mean and the covariance, the distance between each of the plurality of multidimensional operation and maintenance data included in the first global feature and the second global feature is calculated, yielding a plurality of second global feature distances. The average of these distances is computed and mapped to the range from 0 to 1, such that a smaller average gives a larger mapped value, and the mapped value is used as the first similarity between the second global feature and the first global feature. In the case that the first similarity is greater than the first threshold, the first global feature and the plurality of first local features are input into a machine learning algorithm, and the operation state of the database is output.
According to the embodiment of the present invention, the first threshold is set according to the scale of the first similarity, and when the first similarity varies from 0 to 1, the first threshold may take a value of 0.7, 0.8, or 0.9, for example.
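As a non-limiting illustration, the Mahalanobis-based calculation of the first similarity might be sketched in Python as follows. The sketch assumes that each global feature is a list of equal-length numeric vectors and that the covariance matrix of the second global feature is invertible. The mapping 1 / (1 + average distance) is only one assumed way of mapping the average distance into the range from 0 to 1; the embodiment does not prescribe a specific mapping, and all function names are hypothetical.

```python
import math

def mean_and_cov(points):
    """Mean vector and sample covariance matrix of equal-length vectors."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(dim)]
    cov = [[sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in points) / (n - 1)
            for b in range(dim)] for a in range(dim)]
    return mean, cov

def solve(matrix, rhs):
    """Solve matrix * y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    y = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][c] * y[c] for c in range(i + 1, n))
        y[i] = (aug[i][n] - tail) / aug[i][i]
    return y

def mahalanobis(x, mean, cov):
    """sqrt((x - mean)^T * cov^-1 * (x - mean)), computed via a linear solve."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    y = solve(cov, diff)
    return math.sqrt(max(0.0, sum(d * v for d, v in zip(diff, y))))

def first_similarity(first_global, second_global):
    """Average distance of first_global points to second_global, mapped to (0, 1]."""
    mean, cov = mean_and_cov(second_global)
    distances = [mahalanobis(x, mean, cov) for x in first_global]
    average = sum(distances) / len(distances)
    return 1.0 / (1.0 + average)  # assumed mapping: smaller average, larger value
```

The resulting value can then be compared with the first threshold, for example 0.8, to decide whether the first global feature passes verification.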
According to the embodiment of the invention, the first similarity of the second global feature and the first global feature is calculated, under the condition that the first similarity is larger than the first threshold, the first global feature and the plurality of first local features are input into the machine learning algorithm, the running state of the database is output, the accuracy verification of the first global feature by using the second global feature is realized, the optimal first global feature is found, the first global feature has global representativeness, and therefore, the running state of the database obtained according to the first global feature and the plurality of first local features is more accurate.
According to an embodiment of the present invention, the operation and maintenance data analysis method 200 further includes:
under the condition that the first similarity is smaller than or equal to a first threshold value, distributing the multidimensional operation and maintenance data samples to a plurality of third data spaces according to a second preset rule, wherein the number of the third data spaces is larger than that of the first data spaces;
clustering the multidimensional operation and maintenance data in the third data space aiming at each third data space in the plurality of third data spaces to obtain a second local characteristic;
clustering second local features corresponding to the plurality of third data spaces respectively to obtain a plurality of second feature spaces;
clustering the plurality of second feature spaces to obtain a third global feature;
and determining the operation state of the database according to the third global characteristic and the plurality of second local characteristics.
According to an embodiment of the invention, the second preset rule characterizes a rule of finer granularity than the first preset rule.
According to an embodiment of the present invention, for example, in the case of analyzing the multidimensional operation and maintenance data samples related to the operation state of the database in a day, the first preset rule may allocate the multidimensional operation and maintenance data samples according to an hour, and allocate a plurality of multidimensional data in the multidimensional operation and maintenance data samples into 24 first data spaces, and the second preset rule may allocate the multidimensional operation and maintenance data samples according to a half hour, and allocate a plurality of multidimensional data in the multidimensional operation and maintenance data samples into 48 third data spaces.
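As a non-limiting illustration, the time-based allocation of one day of samples might be sketched in Python as follows. The sample layout, a (timestamp, values) tuple, and the function name are assumptions made for the sketch only.

```python
from datetime import datetime

def allocate_by_time(samples, minutes_per_space):
    """Allocate (timestamp, values) samples of one day to equal-length time slots."""
    n_spaces = 24 * 60 // minutes_per_space
    spaces = [[] for _ in range(n_spaces)]
    for timestamp, values in samples:
        minute_of_day = timestamp.hour * 60 + timestamp.minute
        spaces[minute_of_day // minutes_per_space].append((timestamp, values))
    return spaces
```

Calling `allocate_by_time(samples, 60)` yields 24 data spaces, corresponding to the first preset rule, and `allocate_by_time(samples, 30)` yields 48 data spaces, corresponding to the finer-grained second preset rule.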
According to an embodiment of the present invention, for example, in the case of analyzing the multidimensional operation and maintenance data samples related to the operation state of the database in one year, the first preset rule may allocate the multidimensional operation and maintenance data samples by month, allocate a plurality of multidimensional data in the multidimensional operation and maintenance data samples into 12 first data spaces, and the second preset rule may allocate the multidimensional operation and maintenance data samples by half month, and allocate the plurality of multidimensional data in the multidimensional operation and maintenance data samples into 24 third data spaces.
According to an embodiment of the present invention, for example, in a case that the multidimensional operation and maintenance data samples are allocated to the plurality of first data spaces according to the memory utilization of the database, and the memory utilization of the database ranges from 0 to 100%, the first preset rule may divide the multidimensional operation and maintenance data samples into five first data spaces according to the memory utilization intervals 0 to 20%, 20% to 40%, 40% to 60%, 60% to 80% and 80% to 100%, and the second preset rule may divide the multidimensional operation and maintenance data samples into ten third data spaces according to the memory utilization intervals 0 to 10%, 10% to 20%, 20% to 30%, 30% to 40%, 40% to 50%, 50% to 60%, 60% to 70%, 70% to 80%, 80% to 90% and 90% to 100%.
According to the embodiment of the invention, in the case that the first similarity is less than or equal to the first threshold, the multidimensional operation and maintenance data samples are allocated to a plurality of third data spaces according to the second preset rule, the number of third data spaces being greater than the number of first data spaces. Because the second preset rule allocates the samples at a finer granularity, the multidimensional operation and maintenance data in each third data space have the characteristics corresponding to the second preset rule.
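As a non-limiting illustration, the interval-based allocation by memory utilization might be sketched in Python as follows. The key name `mem_util` (memory utilization in percent) and the function name are hypothetical, and placing a value of exactly 100% in the last interval is an assumption of the sketch.

```python
def allocate_by_utilization(samples, bin_width):
    """Allocate samples to data spaces by memory-utilization interval.

    samples   -- list of dicts with a hypothetical 'mem_util' key (0 to 100)
    bin_width -- interval width in percent, e.g. 20 or 10
    """
    n_bins = round(100 / bin_width)
    spaces = [[] for _ in range(n_bins)]
    for sample in samples:
        # A utilization of exactly 100% is placed in the last interval.
        index = min(int(sample["mem_util"] // bin_width), n_bins - 1)
        spaces[index].append(sample)
    return spaces
```

With `bin_width=20` the samples fall into five data spaces, and with `bin_width=10` into ten data spaces.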
According to the embodiment of the invention, the multidimensional operation and maintenance data in the third data space can be clustered by using the Mahalanobis distance, the Euclidean distance, the Manhattan distance, the Hamming distance, the Kmeans, the Mahalanobis distance combined with the Kmeans, the Manhattan distance combined with the Kmeans, or the Hamming distance combined with the Kmeans, and the like, so as to obtain the second local feature.
According to the embodiment of the present invention, under the condition that there exists a feature that is the same as or has a similarity greater than a preset similarity threshold between second local features corresponding to respective multiple third data spaces, clustering the second local features corresponding to respective multiple third data spaces to obtain multiple second feature spaces.
According to the embodiment of the invention, the second local features corresponding to the plurality of third data spaces can be clustered by using mahalanobis distance, euclidean distance, manhattan distance, hamming distance, kmeans, mahalanobis distance combined with Kmeans, euclidean distance combined with Kmeans, manhattan distance combined with Kmeans, hamming distance combined with Kmeans, or the like, so as to obtain a plurality of second feature spaces.
According to the embodiment of the present invention, for example, when the same multidimensional operation and maintenance data exists between the second local feature corresponding to the first third data space and the second local feature corresponding to the second third data space as the same feature, the multidimensional operation and maintenance data in the second local feature corresponding to the first third data space and the second local feature corresponding to the second third data space may be clustered to obtain the first second feature space.
According to the embodiment of the invention, the plurality of second feature spaces can be clustered by using the Mahalanobis distance, the Euclidean distance, the Manhattan distance, the Hamming distance, Kmeans, the Mahalanobis distance combined with Kmeans, the Euclidean distance combined with Kmeans, the Manhattan distance combined with Kmeans, the Hamming distance combined with Kmeans, or the like, so as to obtain the third global feature.
According to the embodiment of the invention, the clustering algorithm for obtaining the second local feature, the clustering algorithm for obtaining the plurality of second feature spaces and the clustering algorithm for obtaining the third global feature may be the same or different, and the specific clustering algorithm for obtaining the second local feature, the clustering algorithm for obtaining the plurality of second feature spaces and the clustering algorithm for obtaining the third global feature may be selected according to actual conditions.
According to the embodiment of the present invention, the operation state of the database determined according to the third global feature and the plurality of second local features may include, for example, the time periods in which the port access volume of the database is too high, the time periods in which the memory occupancy rate of the database is too high, whether the storage volume of the database has reached a preset maximum value, the average daily or monthly growth of the storage volume of the database, and the time periods in which the database is attacked most heavily.
According to the embodiment of the invention, the operation and maintenance personnel can maintain the database appropriately by checking the operation state of the database determined according to the third global feature and the plurality of second local features.
According to the embodiment of the invention, the third global feature and the plurality of second local features can be input into a trained machine learning algorithm, and the operation state of the database can be determined according to the output result of the machine learning algorithm.
According to the embodiment of the invention, the global feature obtained from manual experience can be used to verify the third global feature. When the third global feature meets the verification requirement, for example when the similarity between the global feature obtained from manual experience and the third global feature is greater than the first similarity threshold, the third global feature and the plurality of second local features are input into the trained machine learning algorithm, and the operation state of the database is determined according to the result output by the machine learning algorithm.
According to the embodiment of the invention, in the case that the first similarity is less than or equal to the first threshold, the multidimensional operation and maintenance data samples are allocated to a plurality of third data spaces according to the second preset rule, the number of third data spaces being greater than the number of first data spaces. Because the second preset rule allocates the samples at a finer granularity, the multidimensional operation and maintenance data in each third data space have the characteristics corresponding to the second preset rule. Then, for each of the plurality of third data spaces, the multidimensional operation and maintenance data in that third data space are clustered to obtain a second local feature, and the second local features corresponding to the respective third data spaces are clustered to obtain a plurality of second feature spaces, so that the periodic features of the multidimensional operation and maintenance data included in the second feature spaces are more pronounced. The plurality of second feature spaces is then clustered to obtain the third global feature, which is more globally representative, so that the accuracy of determining the operation state of the database is improved.
According to the embodiment of the present invention, clustering the second local features corresponding to the respective third data spaces to obtain the plurality of second feature spaces includes:
and clustering second local features corresponding to the plurality of third data spaces with the same features to obtain a plurality of second feature spaces.
According to the embodiment of the invention, having the same feature means that the second local features corresponding to the respective third data spaces include the same multidimensional operation and maintenance data, and this shared data serves as the same feature.
According to the embodiment of the present invention, for example, the Mahalanobis distance may be used to cluster the second local features corresponding to the plurality of third data spaces having the same feature, so as to obtain a plurality of second feature spaces.
According to the embodiment of the present invention, for example, a mahalanobis distance algorithm may be used to calculate a mean value and a covariance value for a plurality of multidimensional operation and maintenance data included in a second local feature corresponding to each of a plurality of third data spaces having the same feature, and then according to a mahalanobis distance calculation formula, a distance may be calculated for a plurality of multidimensional operation and maintenance data included in a second local feature corresponding to each of a plurality of third data spaces having the same feature using the mean value and the covariance value, and the multidimensional operation and maintenance data whose distance is smaller than a threshold of the second feature space may be grouped into one group to obtain the second feature space.
According to the embodiment of the present invention, for example, the number of the plurality of third data spaces may be 3, the first third data space corresponds to the second local feature 1, the second third data space corresponds to the second local feature 2, and the third data space corresponds to the second local feature 3.
According to the embodiment of the present invention, for example, when the same multidimensional operation and maintenance data exists between the second local feature 1 and the second local feature 2 as the same feature, and the same multidimensional operation and maintenance data exists between the second local feature 2 and the second local feature 3 as the same feature, the second local feature 1 and the second local feature 2 may be clustered by using the mahalanobis distance to obtain the second feature space 1, and the second local feature 2 and the second local feature 3 may be clustered to obtain the second feature space 2.
According to the embodiment of the invention, the second local features corresponding to the plurality of third data spaces with the same features are clustered to obtain the plurality of second feature spaces, so that the periodic features of the multidimensional operation and maintenance data in the second feature spaces are more obvious.
According to an embodiment of the present invention, for operation S220, clustering, for each of the plurality of first data spaces, the multidimensional operation and maintenance data in the first data space to obtain the first local feature, may include the following operations:
calculating a second similarity between the multi-dimensional operation and maintenance data in the first data space;
and under the condition that the second similarity is larger than a second threshold value, clustering the multidimensional data corresponding to the second similarity into a class to obtain the first local feature.
According to the embodiment of the present invention, for example, the second similarity between the multidimensional operation and maintenance data in the first data space may be calculated using the Mahalanobis distance as follows. First, the mean and the covariance of the multidimensional operation and maintenance data included in the first data space are calculated. Then, using the mean and the covariance, the distance between each item of multidimensional operation and maintenance data in the first data space and the first data space as a whole is calculated. The distance is mapped to the range from 0 to 1, such that a smaller distance gives a larger mapped value, and the mapped value is used as the second similarity. In the case that the second similarity is greater than the second threshold, the multidimensional data corresponding to the second similarity are grouped into one class to obtain the first local feature.
According to the embodiment of the present invention, the second threshold is set according to the scale of the second similarity, and the second threshold may be, for example, 0.7, 0.8, or 0.9 when the second similarity varies from 0 to 1.
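As a non-limiting illustration, the threshold-based grouping that yields a first local feature might be sketched in Python as follows. For brevity the sketch substitutes the Euclidean distance to the centre of the data space for the Mahalanobis distance; this is a deliberate simplification, not the example given above. The mapping 1 / (1 + distance) and the function name are likewise assumptions.

```python
import math

def local_feature(data_space, second_threshold=0.8):
    """Keep the points whose similarity to the data-space centre exceeds the threshold."""
    dim = len(data_space[0])
    centre = [sum(p[j] for p in data_space) / len(data_space) for j in range(dim)]
    feature = []
    for point in data_space:
        distance = math.dist(point, centre)  # Euclidean stand-in for Mahalanobis
        similarity = 1.0 / (1.0 + distance)  # assumed mapping into (0, 1]
        if similarity > second_threshold:
            feature.append(point)
    return feature
```

Points far from the centre of the data space receive a low similarity and are left out of the first local feature.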
According to the embodiment of the invention, the second similarity between the multidimensional operation and maintenance data in the first data space is calculated, and under the condition that the second similarity is greater than the second threshold, the multidimensional data corresponding to the second similarity are gathered into a class to obtain the first local feature, so that the multidimensional operation and maintenance data with a relatively close association relation in the first data space are gathered into the first local feature, and the data in the first local feature has local representativeness.
According to the embodiment of the present invention, for operation S230, clustering the first local features corresponding to the respective first data spaces to obtain a plurality of first feature spaces, may include the following operations:
and clustering the first local features corresponding to the plurality of first data spaces with the same features to obtain a plurality of first feature spaces.
According to the embodiment of the invention, having the same feature means that the first local features corresponding to the respective first data spaces include the same multidimensional operation and maintenance data, and this shared data serves as the same feature.
According to the embodiment of the present invention, for example, the Mahalanobis distance may be used to cluster the first local features corresponding to the plurality of first data spaces having the same feature, so as to obtain a plurality of first feature spaces.
According to the embodiment of the present invention, for example, a mahalanobis distance algorithm may be used to calculate a mean value and a covariance value for a plurality of multidimensional operation and maintenance data included in a first local feature corresponding to each of a plurality of first data spaces having the same feature, and then according to a mahalanobis distance calculation formula, a distance may be calculated for a plurality of multidimensional operation and maintenance data included in a first local feature corresponding to each of a plurality of first data spaces having the same feature using the mean value and the covariance value, and the multidimensional operation and maintenance data whose distance is smaller than a first feature space threshold value may be grouped into a group to obtain a first feature space.
According to the embodiment of the present invention, for example, the number of the plurality of first data spaces is 5, the first local feature 1 corresponds to the first first data space, the first local feature 2 corresponds to the second first data space, the first local feature 3 corresponds to the third first data space, the first local feature 4 corresponds to the fourth first data space, and the first local feature 5 corresponds to the fifth first data space.
According to the embodiment of the present invention, for example, when the same multidimensional operation and maintenance data exists between the first local feature 1 and the first local feature 2 as the same feature, the same multidimensional operation and maintenance data exists between the first local feature 2, the first local feature 3, and the first local feature 4 as the same feature, and the same multidimensional operation and maintenance data exists between the first local feature 4 and the first local feature 5 as the same feature, the first local feature 1 and the first local feature 2 may be clustered by using the mahalanobis distance to obtain the first feature space 1, the first local feature 2, the first local feature 3, and the first local feature 4 may be clustered to obtain the first feature space 2, and the first local feature 4 and the first local feature 5 may be clustered to obtain the first feature space 3.
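As a non-limiting illustration, the way local features that share the same multidimensional operation and maintenance data are grouped into feature spaces might be sketched in Python as follows. The sketch assumes that each local feature is a set of hashable data records, and it simply takes the union of the grouped local features in place of the Mahalanobis-distance clustering step; the function name is hypothetical.

```python
def build_feature_spaces(local_features):
    """Group local features that share a record; return (member indices, merged records)."""
    owners = {}
    for index, feature in enumerate(local_features):
        for record in feature:
            owners.setdefault(record, set()).add(index)
    # Every record held by more than one local feature defines a candidate group.
    groups = {frozenset(ids) for ids in owners.values() if len(ids) > 1}
    # Drop groups that are strictly contained in a larger group.
    maximal = [g for g in groups if not any(g < other for other in groups)]
    maximal.sort(key=lambda g: sorted(g))
    return [(sorted(g), set().union(*(local_features[i] for i in g)))
            for g in maximal]
```

For five local features in which features 1 and 2 share one record, features 2, 3 and 4 share another, and features 4 and 5 share a third, the sketch returns three feature spaces, matching the example above (indices in the code are zero-based).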
According to the embodiment of the invention, the first local features corresponding to the multiple first data spaces with the same feature are clustered to obtain multiple first feature spaces, so that the periodic features of the multidimensional operation and maintenance data in the first feature spaces are more obvious.
According to an embodiment of the present invention, before the multidimensional data samples related to the database operation state are allocated to the plurality of first data spaces according to the first preset rule, the operation and maintenance data analysis method 200 further includes the following operations:
acquiring original operation and maintenance data related to the operation state of the database;
and preprocessing the original operation and maintenance data to obtain a multi-dimensional operation and maintenance data sample.
According to the embodiment of the invention, the original operation and maintenance data can be log-type operation and maintenance data output by the database, and can also be operation and maintenance data obtained by detecting the database in real time by detection software.
According to the embodiment of the invention, the original operation and maintenance data are large-scale operation and maintenance data whose main characteristic is that they form a time series.
According to the embodiment of the invention, the preprocessing of the original operation and maintenance data comprises the steps of carrying out dimension division and data normalization processing on the original operation and maintenance data.
According to the embodiment of the invention, the normalization processing of the original operation and maintenance data can be realized by bringing the data of different dimensions in the original operation and maintenance data onto a uniform scale.
According to the embodiment of the invention, for example, when there are 4 types of operation and maintenance data in the original operation and maintenance data, the 4 types are denoted A, B, C, and D. The data corresponding to type A may be put into the first dimension of the multidimensional operation and maintenance data, the data corresponding to type B into the second dimension, the data corresponding to type C into the third dimension, and the data corresponding to type D into the fourth dimension. The multidimensional operation and maintenance data may then be normalized to obtain the multidimensional operation and maintenance data sample.
According to the embodiment of the invention, preprocessing the original operation and maintenance data yields a multidimensional operation and maintenance data sample in which the multidimensional operation and maintenance data are normalized, so that the data in the sample can subsequently be processed uniformly by a clustering algorithm.
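As a rough illustration of the dimension division and normalization described above, the following sketch places four types of raw values into four dimensions and rescales each dimension to the range [0, 1]. Min-max scaling is an assumption (the description only requires a uniform scale), and the record layout, the sample values, and the function names are hypothetical.

```python
def to_multidimensional(records, kinds):
    """Place each type of raw value in its own dimension, in the order of kinds."""
    return [[float(record[kind]) for kind in kinds] for record in records]

def min_max_normalize(rows):
    """Rescale every dimension to [0, 1]; a constant dimension maps to 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    normalized = []
    for row in rows:
        normalized.append([
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        ])
    return normalized

# Hypothetical raw records with four types of data A, B, C and D.
raw = [
    {"A": 120, "B": 0.40, "C": 0, "D": 64},
    {"A": 480, "B": 0.90, "C": 1, "D": 64},
    {"A": 300, "B": 0.65, "C": 0, "D": 64},
]
samples = min_max_normalize(to_multidimensional(raw, ["A", "B", "C", "D"]))
```

After normalization all four dimensions share one scale, so a distance-based clustering step is not dominated by the dimension with the largest raw values.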
According to an embodiment of the present invention, the original operation and maintenance data related to the database operation state comprises: port access amount of the database, memory calling condition of the database, whether the port of the database is attacked or not and memory capacity of the database.
Fig. 3 shows another flowchart of an operation and maintenance data analysis method according to an embodiment of the present invention.
As shown in fig. 3, the operation and maintenance data analysis method includes obtaining original operation and maintenance data 311 related to an operation state of a database, and preprocessing the original operation and maintenance data 311 to obtain multidimensional data samples 321. The multidimensional data samples 321 are then distributed to a plurality of first data spaces according to a first preset rule to obtain a first data space 1 (331), a first data space 2 (332), and a first data space n (333). A clustering algorithm (e.g., Mahalanobis distance) is then used to cluster the first data space 1 (331) to obtain a first local feature 1 (341), to cluster the first data space 2 (332) to obtain a first local feature 2 (342), and to cluster the first data space n (333) to obtain a first local feature n (343).
As can be seen from fig. 3, when a plurality of first local features have the same feature, or features whose similarity is greater than a preset similarity threshold, those first local features are clustered to obtain a plurality of first feature spaces, where the plurality of first feature spaces include first feature space 1 (351) and first feature space 2 (352) through first feature space n (353).
As can be seen from fig. 3, the same multidimensional operation and maintenance data exists between the first local feature 1 (341) and the first local feature 2 (342) as the same feature, between the first local feature 1 (341) and the first local feature n (343) as the same feature, and between the first local feature 2 (342) and the first local feature n (343) as the same feature. Using a clustering algorithm (for example, Mahalanobis distance), the first local feature 1 (341) and the first local feature 2 (342) are clustered to obtain a first feature space 1 (351), the first local feature 1 (341) and the first local feature n (343) are clustered to obtain a first feature space k (353), and the first local feature 2 (342) and the first local feature n (343) are clustered to obtain a first feature space 2 (352).
As can be seen from fig. 3, a clustering algorithm (e.g., Mahalanobis distance plus K-means) is used to cluster the plurality of first feature spaces, so as to obtain the first global feature 361.
As shown in fig. 3, the multidimensional operation and maintenance data samples 321 are allocated to a plurality of second data spaces according to expert experience, and then the multidimensional operation and maintenance data in the plurality of second data spaces are clustered to obtain the second global feature 371.
As can be seen from fig. 3, the similarity between the first global feature 361 and the second global feature 371 is determined. When the similarity is greater than the preset threshold, the first global feature 361 is used as the optimal global feature 382 and the first local features corresponding to the first global feature 361 are used as the optimal local features 381. The optimal global feature 382 and the optimal local features 381 are then input into the machine learning algorithm, which outputs the operating state 391 of the database.
According to the embodiment of the present invention, as can be seen from fig. 3, the operation and maintenance data analysis method provided in the embodiment of the present invention obtains the first local features related to the database operation state by clustering the multidimensional operation and maintenance data related to the database operation state once, and obtains the first global feature by clustering the first local features multiple times. This at least partially overcomes the technical problems of low efficiency and poor operation and maintenance effect in processing multidimensional operation and maintenance data in the related art, increases the speed of extracting the first local features and the first global feature, and establishes a close internal relationship between the first local features and the first global feature, so that the first global feature is globally representative. The operation state of the database can therefore be determined quickly and accurately according to the first global feature and the plurality of first local features.
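The comparison between the first global feature and the second global feature can be sketched as follows. The description does not state which similarity measure is used, so cosine similarity over feature vectors is an assumption, as are the vector representation of the features and the names `cosine_similarity` and `select_features`.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def select_features(first_global, second_global, first_locals, threshold):
    """Accept the first global and local features as the optimal features
    only when the two global features are similar enough."""
    similarity = cosine_similarity(first_global, second_global)
    if similarity > threshold:
        return {"accepted": True, "global": first_global, "locals": first_locals}
    return {"accepted": False, "global": None, "locals": None}
```

When `accepted` is true, the returned global and local features are what would be passed to the machine learning algorithm that outputs the operating state.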
Fig. 4 is a block diagram illustrating a configuration of an operation and maintenance data analysis apparatus according to an embodiment of the present invention.
As shown in fig. 4, the operation and maintenance data analysis apparatus 400 of this embodiment includes a first space allocation module 410, a first local feature obtaining module 420, a first feature space obtaining module 430, a first global feature obtaining module 440, and an operation state determining module 450.
The first space allocation module 410 is configured to allocate the multidimensional operation and maintenance data samples related to the database operation state to a plurality of first data spaces according to a first preset rule.
The first local feature obtaining module 420 is configured to cluster, for each first data space in the multiple first data spaces, the multidimensional operation and maintenance data in the first data space to obtain a first local feature.
The first feature space obtaining module 430 is configured to cluster the first local features corresponding to the multiple first data spaces, so as to obtain multiple first feature spaces.
The first global feature obtaining module 440 is configured to cluster the plurality of first feature spaces to obtain a first global feature.
The operation state determining module 450 is configured to determine the operation state of the database according to the first global feature and the plurality of first local features.
According to an embodiment of the present invention, the operation and maintenance data analysis apparatus 400 further includes:
a second space allocation module, configured to allocate the multidimensional operation and maintenance data samples to a plurality of second data spaces according to expert experience; and
a second global feature obtaining module, configured to cluster the multidimensional operation and maintenance data in the plurality of second data spaces to obtain a second global feature.
The operation state determining module 450 includes an operation state obtaining sub-module.
The operation state obtaining sub-module is configured to determine the operation state of the database according to the first global feature, the second global feature, and the plurality of first local features.
According to an embodiment of the present invention, the operation state obtaining sub-module includes:
a first similarity calculation unit, configured to calculate a first similarity between the second global feature and the first global feature; and
an operation state output unit, configured to, in the case that the first similarity is greater than a first threshold, input the first global feature and the plurality of first local features into the machine learning algorithm and output the operation state of the database.
According to an embodiment of the present invention, the operation and maintenance data analysis apparatus 400 further includes:
a third space allocation module, configured to, in the case that the first similarity is less than or equal to the first threshold, allocate the multidimensional operation and maintenance data samples to a plurality of third data spaces according to a second preset rule, wherein the number of the third data spaces is greater than the number of the first data spaces;
a second local feature obtaining module, configured to cluster, for each third data space in the plurality of third data spaces, the multidimensional operation and maintenance data in the third data space to obtain a second local feature;
a second feature space obtaining module, configured to cluster the second local features corresponding to the plurality of third data spaces to obtain a plurality of second feature spaces;
a third global feature obtaining module, configured to cluster the plurality of second feature spaces to obtain a third global feature; and
a second operation state determining module, configured to determine the operation state of the database according to the third global feature and the plurality of second local features.
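The fallback path handled by these modules can be outlined in a short sketch. The feature extraction itself is passed in as a callable, because the preset rules and the clustering details are not fixed here; `extract`, `similarity`, and all parameter names are hypothetical placeholders rather than part of the described apparatus.

```python
def analyze_with_fallback(samples, extract, reference_global, similarity,
                          threshold, first_count, third_count):
    """Extract features with first_count data spaces and, if the resulting
    global feature is not similar enough to the expert-based reference,
    repeat the extraction with the larger third_count.

    extract(samples, count) must return (global_feature, local_features).
    Returns (global_feature, local_features, count_used).
    """
    if third_count <= first_count:
        raise ValueError("third_count must be greater than first_count")
    first_global, first_locals = extract(samples, first_count)
    if similarity(first_global, reference_global) > threshold:
        return first_global, first_locals, first_count
    # Similarity is less than or equal to the threshold: use more data spaces.
    third_global, second_locals = extract(samples, third_count)
    return third_global, second_locals, third_count
```

The sketch performs a single retry, matching the description; whether further rounds with still more data spaces are intended is not stated.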
According to an embodiment of the present invention, the second feature space obtaining module includes a second feature space obtaining sub-module.
The second feature space obtaining sub-module is configured to cluster the second local features corresponding to a plurality of third data spaces having the same feature to obtain a plurality of second feature spaces.
According to an embodiment of the present invention, the first local feature obtaining module 420 includes a second similarity calculation sub-module and a first local feature obtaining sub-module.
The second similarity calculation sub-module is configured to calculate second similarities among the multidimensional operation and maintenance data in the first data space.
The first local feature obtaining sub-module is configured to, in the case that a second similarity is greater than a second threshold, group the multidimensional operation and maintenance data corresponding to that second similarity into one class to obtain the first local feature.
According to an embodiment of the present invention, the first feature space obtaining module 430 includes a first feature space obtaining sub-module.
The first feature space obtaining sub-module is configured to cluster the first local features corresponding to a plurality of first data spaces having the same feature to obtain a plurality of first feature spaces.
According to the embodiment of the present invention, the operation and maintenance data analysis apparatus 400 further includes an original operation and maintenance data acquisition module and a multidimensional operation and maintenance data sample acquisition module, which operate before the multidimensional operation and maintenance data samples related to the operation state of the database are allocated to the plurality of first data spaces according to the first preset rule.
The original operation and maintenance data acquisition module is configured to acquire original operation and maintenance data related to the operation state of the database.
The multidimensional operation and maintenance data sample acquisition module is configured to preprocess the original operation and maintenance data to obtain a multidimensional operation and maintenance data sample.
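The two sub-modules above can be sketched as a pairwise similarity computation followed by grouping. The similarity measure (derived from Euclidean distance) and the grouping rule (connected components of the "similarity greater than threshold" relation, found with a union-find structure) are assumptions chosen for illustration; the description does not specify either.

```python
import math

def similarity(u, v):
    """Similarity in (0, 1]: 1.0 for identical points, smaller when farther apart."""
    return 1.0 / (1.0 + math.dist(u, v))

def cluster_by_similarity(points, threshold):
    """Group point indices so that any two points whose similarity exceeds
    threshold end up in the same class (directly or through a chain)."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if similarity(points[i], points[j]) > threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

In this sketch `threshold` stands for the second threshold, and each returned class is a list of indices into `points` that could serve as one first local feature.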
According to an embodiment of the present invention, the original operation and maintenance data related to the database operation state comprises: port access amount of the database, memory calling condition of the database, whether the port of the database is attacked or not and memory capacity of the database.
Any number of modules, sub-modules, units, sub-units, or at least part of the functionality of any number thereof according to embodiments of the invention may be implemented in one module. Any one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present invention may be implemented by being divided into a plurality of modules. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present invention may be implemented at least partly as a hardware circuit, e.g. a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or by any other reasonable way of integrating or packaging a circuit in hardware or firmware, or in any one of the three implementation manners of software, hardware, and firmware, or in any suitable combination of them. Alternatively, one or more of the modules, sub-modules, units, sub-units according to embodiments of the present invention may be at least partially implemented as computer program modules, which, when executed, may perform the corresponding functions.
For example, any plurality of the first space allocation module 410, the first local feature obtaining module 420, the first feature space obtaining module 430, the first global feature obtaining module 440, and the operation state determining module 450 may be combined into one module/unit/sub-unit to be implemented, or any one of the modules/units/sub-units may be split into a plurality of modules/units/sub-units. Alternatively, at least part of the functionality of one or more of these modules/units/sub-units may be combined with at least part of the functionality of other modules/units/sub-units and implemented in one module/unit/sub-unit. According to an embodiment of the present invention, at least one of the first space allocation module 410, the first local feature obtaining module 420, the first feature space obtaining module 430, the first global feature obtaining module 440, and the operation state determining module 450 may be at least partially implemented as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented in any one of the three implementation manners of software, hardware, and firmware, or in any suitable combination of them. Alternatively, at least one of the first space allocation module 410, the first local feature obtaining module 420, the first feature space obtaining module 430, the first global feature obtaining module 440, and the operation state determining module 450 may be implemented at least in part as a computer program module that, when executed, may perform corresponding functions.
It should be noted that the operation and maintenance data analysis device part in the embodiment of the present invention corresponds to the operation and maintenance data analysis method part in the embodiment of the present invention, and the description of the operation and maintenance data analysis device part specifically refers to the operation and maintenance data analysis method part, and is not described herein again.
Fig. 5 shows a block diagram of an electronic device suitable for implementing the operation and maintenance data analysis method according to an embodiment of the present invention.
As shown in fig. 5, an electronic device 500 according to an embodiment of the present invention includes a processor 501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. The processor 501 may comprise, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 501 may also include onboard memory for caching purposes. Processor 501 may include a single processing unit or multiple processing units for performing the different actions of the method flows according to embodiments of the present invention.
In the RAM 503, various programs and data necessary for the operation of the electronic device 500 are stored. The processor 501, the ROM 502, and the RAM 503 are connected to each other by a bus 504. The processor 501 performs various operations of the method flow according to the embodiments of the present invention by executing programs in the ROM 502 and/or the RAM 503. Note that the programs may also be stored in one or more memories other than the ROM 502 and the RAM 503. The processor 501 may also perform various operations of method flows according to embodiments of the present invention by executing programs stored in the one or more memories.
According to an embodiment of the present invention, the electronic device 500 may also include an input/output (I/O) interface 505, which is also connected to the bus 504. The electronic device 500 may also include one or more of the following components connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet. A drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as necessary, so that a computer program read out therefrom is installed into the storage section 508 as necessary.
The present invention also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the present invention.
According to embodiments of the present invention, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the invention, a computer-readable storage medium may include the ROM 502 and/or the RAM 503 and/or one or more memories other than the ROM 502 and the RAM 503 as described above.
Embodiments of the invention also include a computer program product comprising a computer program comprising program code for performing the method illustrated in the flow chart. When the computer program product runs in a computer system, the program code is used for causing the computer system to implement the operation and maintenance data analysis method provided by the embodiment of the invention.
The computer program, when executed by the processor 501, performs the above-described functions defined in the system/apparatus of an embodiment of the invention. The above-described systems, devices, modules, units, etc. may be implemented by computer program modules according to embodiments of the invention.
In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted, distributed in the form of a signal on a network medium, downloaded and installed through the communication section 509, and/or installed from the removable medium 511. The computer program containing program code may be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. The computer program, when executed by the processor 501, performs the above-described functions defined in the system of the embodiment of the present invention. The above described systems, devices, apparatuses, modules, units, etc. may be implemented by computer program modules according to embodiments of the present invention.
According to embodiments of the present invention, program code for executing a computer program provided by embodiments of the present invention may be written in any combination of one or more programming languages, and in particular, the computer program may be implemented using a high-level procedural and/or object-oriented programming language, and/or an assembly/machine language. The programming languages include, but are not limited to, Java, C++, Python, the "C" language, or the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It will be appreciated by a person skilled in the art that the features described in the various embodiments and/or in the claims of the invention may be combined and/or integrated in various ways, even if such combinations or integrations are not explicitly described in the invention. In particular, the features recited in the various embodiments and/or claims of the present invention may be combined and/or integrated in various ways without departing from the spirit or teaching of the invention. All such combinations and/or integrations fall within the scope of the present invention.
The embodiments of the present invention have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the invention is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the invention, and these alternatives and modifications are intended to fall within the scope of the invention.

Claims (12)

1. An operation and maintenance data analysis method, characterized in that the method comprises:
distributing multi-dimensional operation and maintenance data samples related to the operation state of the database to a plurality of first data spaces according to a first preset rule;
clustering the multidimensional operation and maintenance data in the first data space aiming at each first data space in the plurality of first data spaces to obtain a first local feature;
clustering the first local features corresponding to the first data spaces to obtain a plurality of first feature spaces;
clustering the plurality of first feature spaces to obtain first global features;
determining an operational state of the database based on the first global characteristic and the plurality of first local characteristics.
2. The method of claim 1, further comprising:
distributing the multi-dimensional operation and maintenance data samples to a plurality of second data spaces according to expert experience;
clustering the multidimensional operation and maintenance data in the plurality of second data spaces to obtain a second global feature;
wherein the determining the operational state of the database from the first global feature and the plurality of first local features comprises:
determining an operational state of the database based on the first global feature, the second global feature, and the plurality of first local features.
3. The method of claim 2, wherein determining the operational state of the database based on the first global feature, the second global feature, and the plurality of first local features comprises:
calculating a first similarity of the second global feature to the first global feature;
and inputting the first global feature and the plurality of first local features into a machine learning algorithm and outputting the running state of the database when the first similarity is larger than a first threshold value.
4. The method of claim 3, further comprising:
under the condition that the first similarity is smaller than or equal to the first threshold, distributing the multidimensional operation and maintenance data samples to a plurality of third data spaces according to a second preset rule, wherein the number of the third data spaces is larger than that of the first data spaces;
clustering the multidimensional operation and maintenance data in the third data space to obtain a second local feature for each third data space in the plurality of third data spaces;
clustering the second local features corresponding to the plurality of third data spaces to obtain a plurality of second feature spaces;
clustering the plurality of second feature spaces to obtain a third global feature;
and determining the running state of the database according to the third global characteristic and the plurality of second local characteristics.
5. The method according to claim 4, wherein the clustering the second local features corresponding to the third data spaces to obtain second feature spaces comprises:
and clustering the second local features corresponding to the plurality of third data spaces with the same features to obtain a plurality of second feature spaces.
6. The method of claim 1, wherein the clustering the multidimensional operation and maintenance data in the first data space for each of the plurality of first data spaces to obtain a first local feature comprises:
calculating a second similarity between the multi-dimensional operation and maintenance data in the first data space;
and under the condition that the second similarity is larger than a second threshold value, the multidimensional operation and maintenance data corresponding to the second similarity are gathered into one class to obtain the first local feature.
7. The method of claim 1, wherein clustering the first local features corresponding to the first data spaces to obtain a plurality of first feature spaces comprises:
clustering the first local features corresponding to the multiple first data spaces with the same features to obtain multiple first feature spaces.
8. The method of claim 1, wherein, before allocating the multi-dimensional operation and maintenance data samples related to the operation state of the database to the plurality of first data spaces according to the first preset rule, the method further comprises:
acquiring raw operation and maintenance data related to the operation state of the database;
and preprocessing the raw operation and maintenance data to obtain the multi-dimensional operation and maintenance data samples.
9. The method of claim 8, wherein the raw operation and maintenance data related to the operation state of the database comprises: the port access volume of the database, the memory call status of the database, whether a port of the database is under attack, and the memory capacity status of the database.
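The preprocessing step of claims 8 and 9 can be pictured with a short sketch that turns raw records into numeric multi-dimensional samples. The claims do not say how preprocessing is done; the record fields, their units, and the min-max scaling below are assumptions chosen only to make the four data items of claim 9 concrete.

```python
from dataclasses import dataclass


@dataclass
class RawRecord:
    """Hypothetical raw record; field names and units are assumptions."""
    port_access_count: int    # port access volume of the database
    memory_call_count: int    # memory call status, assumed to be a count
    port_attacked: bool       # whether a port of the database is under attack
    memory_free_mb: float     # memory capacity status, assumed to be free MB


def to_vector(record):
    """Encode one raw record as a four-dimensional numeric vector."""
    return [
        float(record.port_access_count),
        float(record.memory_call_count),
        1.0 if record.port_attacked else 0.0,
        float(record.memory_free_mb),
    ]


def min_max_scale(vectors):
    """Scale every dimension to [0, 1]; a constant dimension maps to 0.0."""
    dims = len(vectors[0])
    lows = [min(v[d] for v in vectors) for d in range(dims)]
    highs = [max(v[d] for v in vectors) for d in range(dims)]
    return [
        [
            (v[d] - lows[d]) / (highs[d] - lows[d]) if highs[d] > lows[d] else 0.0
            for d in range(dims)
        ]
        for v in vectors
    ]


# Hypothetical raw operation and maintenance data.
raw = [
    RawRecord(120, 30, False, 2048.0),
    RawRecord(480, 90, True, 512.0),
    RawRecord(300, 60, False, 1280.0),
]
samples = min_max_scale([to_vector(record) for record in raw])
```

Scaling puts the dimensions on a common range so that no single item, such as the access count, dominates a later similarity calculation. Other normalizations would serve the same purpose.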
10. An operation and maintenance data analysis device, characterized in that the device comprises:
a first space allocation module, configured to allocate multi-dimensional operation and maintenance data samples related to the operation state of a database to a plurality of first data spaces according to a first preset rule;
a first local feature obtaining module, configured to cluster, for each first data space of the plurality of first data spaces, the multi-dimensional operation and maintenance data in the first data space to obtain a first local feature;
a first feature space obtaining module, configured to cluster the first local features corresponding to the plurality of first data spaces to obtain a plurality of first feature spaces;
a first global feature obtaining module, configured to cluster the plurality of first feature spaces to obtain a first global feature;
and a first operation state determination module, configured to determine the operation state of the database according to the first global feature and the plurality of first local features.
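The five modules of claim 10 form a pipeline, which the following sketch lays out as five functions. None of the concrete choices come from the claims: round-robin allocation stands in for the "first preset rule", a centroid stands in for each clustering result, "same features" is read as "same dominant dimension", and the deviation test in `determine_state` is a hypothetical decision rule.

```python
from collections import defaultdict


def centroid(vectors):
    """Per-dimension mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def allocate(samples, n_spaces):
    """Space allocation: round-robin is an assumed stand-in for the preset rule."""
    spaces = [[] for _ in range(n_spaces)]
    for index, sample in enumerate(samples):
        spaces[index % n_spaces].append(sample)
    return spaces


def local_features(spaces):
    """Local feature per data space, simplified here to one centroid each."""
    return [centroid(space) for space in spaces if space]


def feature_spaces(features):
    """Group local features sharing the same dominant dimension (assumed reading)."""
    groups = defaultdict(list)
    for feature in features:
        groups[feature.index(max(feature))].append(feature)
    return dict(groups)


def global_feature(spaces_by_key):
    """Global feature, simplified to the mean of the feature-space centroids."""
    return centroid([centroid(members) for members in spaces_by_key.values()])


def determine_state(global_f, features, tolerance):
    """Hypothetical rule: abnormal if any local feature strays too far."""
    worst = max(
        max(abs(a - b) for a, b in zip(feature, global_f)) for feature in features
    )
    return "abnormal" if worst > tolerance else "normal"


# Hypothetical scaled samples (port access, memory calls, attacked, capacity).
samples = [
    [0.2, 0.1, 0.0, 0.3],
    [0.3, 0.1, 0.0, 0.2],
    [0.2, 0.2, 0.0, 0.3],
    [0.9, 0.8, 1.0, 0.1],
]
spaces = allocate(samples, n_spaces=2)
locals_ = local_features(spaces)
grouped = feature_spaces(locals_)
global_f = global_feature(grouped)
state = determine_state(global_f, locals_, tolerance=0.2)
```

The point of the sketch is the data flow: samples are split into data spaces, each space is summarized locally, the local summaries are regrouped into feature spaces, and only then is a single global summary formed and compared with the local ones.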
11. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any one of claims 1 to 9.
12. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1 to 9.
CN202310265533.1A 2023-03-20 2023-03-20 Operation and maintenance data analysis method, device, equipment and medium Active CN115981970B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310265533.1A CN115981970B (en) 2023-03-20 2023-03-20 Operation and maintenance data analysis method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310265533.1A CN115981970B (en) 2023-03-20 2023-03-20 Operation and maintenance data analysis method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN115981970A (en) 2023-04-18
CN115981970B (en) 2023-05-16

Family

ID=85972540

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310265533.1A Active CN115981970B (en) 2023-03-20 2023-03-20 Operation and maintenance data analysis method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN115981970B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070022120A1 (en) * 2005-07-25 2007-01-25 Microsoft Corporation Caching and modifying portions of a multi-dimensional database on a user device
US20100106723A1 (en) * 2008-10-24 2010-04-29 Industry-Academic Cooperation Foundation, Yonsei University Method and system of clustering for multi-dimensional data streams
US20120137367A1 (en) * 2009-11-06 2012-05-31 Cataphora, Inc. Continuous anomaly detection based on behavior modeling and heterogeneous information analysis
US20180024875A1 (en) * 2016-07-20 2018-01-25 International Business Machines Corporation Anomaly detection in performance management
CN108446200A (en) * 2018-02-07 2018-08-24 福建星瑞格软件有限公司 Server intelligence O&M method based on big data machine learning and computer equipment
CN110443264A (en) * 2018-05-03 2019-11-12 北京京东尚科信息技术有限公司 A kind of method and apparatus of cluster
US10572778B1 (en) * 2019-03-15 2020-02-25 Prime Research Solutions LLC Machine-learning-based systems and methods for quality detection of digital input
CN113051452A (en) * 2021-04-12 2021-06-29 清华大学 Operation and maintenance data feature selection method and device
CN113535673A (en) * 2020-04-17 2021-10-22 北京京东振世信息技术有限公司 Method and device for generating configuration file and processing data
CN114612514A (en) * 2022-03-14 2022-06-10 西安邮电大学 Multi-feature multi-resolution track anomaly detection method
CN114897074A (en) * 2022-05-13 2022-08-12 北京纪新泰富机电技术股份有限公司 Method and device for determining running state of equipment, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
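Wang Yong et al.: "Application of AI deep learning in detection and classification of abnormal cells in mobile networks", Designing Techniques of Posts and Telecommunications (《邮电设计技术》) *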

Also Published As

Publication number Publication date
CN115981970B (en) 2023-05-16

Similar Documents

Publication Publication Date Title
US11580136B2 (en) Method and apparatus of user clustering, computer device and medium
CN109471783B (en) Method and device for predicting task operation parameters
CN111814910B (en) Abnormality detection method, abnormality detection device, electronic device, and storage medium
CN113537337A (en) Training method, abnormality detection method, apparatus, device, and storage medium
CN114462532A (en) Model training method, device, equipment and medium for predicting transaction risk
CN116155628B (en) Network security detection method, training device, electronic equipment and medium
CN113420935A (en) Fault location method, apparatus, device and medium
CN115981970B (en) Operation and maintenance data analysis method, device, equipment and medium
CN113722177B (en) Timing index anomaly detection method, apparatus, system, device and storage medium
CN114218283A (en) Abnormality detection method, apparatus, device, and medium
CN113869904B (en) Suspicious data identification method, device, electronic equipment, medium and computer program
CN115795345A (en) Information processing method, device, equipment and storage medium
CN114816955A (en) Database performance prediction method and device
CN114443663A (en) Data table processing method, device, equipment and medium
CN113052509A (en) Model evaluation method, model evaluation apparatus, electronic device, and storage medium
CN113961441A (en) Alarm event processing method, auditing method, device, equipment, medium and product
CN113986671A (en) Operation and maintenance data anomaly detection method, device, equipment and medium
CN113112352A (en) Risk service detection model training method, risk service detection method and device
CN116467613A (en) Application classification method and device, electronic equipment and computer readable storage medium
CN113515713B (en) Webpage caching strategy generation method and device and webpage caching method and device
CN114742648A (en) Product pushing method, device, equipment and medium
CN113674011A (en) Data processing method, device, computing equipment and medium for user behaviors
CN114693421A (en) Risk assessment method, apparatus, electronic device and medium
CN116010952A (en) Dynamic baseline determination method, transaction data detection method, device and electronic equipment
CN113987032A (en) Method, device, equipment and storage medium for determining cloud service implementation strategy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant