CN115981970B - Operation and maintenance data analysis method, device, equipment and medium - Google Patents

Publication number
CN115981970B
Authority
CN
China
Prior art keywords
data
feature
spaces
database
local
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310265533.1A
Other languages
Chinese (zh)
Other versions
CN115981970A (en
Inventor
孟江波
李占兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CCB Finetech Co Ltd
Original Assignee
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CCB Finetech Co Ltd filed Critical CCB Finetech Co Ltd
Priority to CN202310265533.1A priority Critical patent/CN115981970B/en
Publication of CN115981970A publication Critical patent/CN115981970A/en
Application granted granted Critical
Publication of CN115981970B publication Critical patent/CN115981970B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a method, a device, equipment and a medium for analyzing operation and maintenance data, which can be applied to the technical field of computers and the technical field of data processing. The method comprises the following steps: distributing multidimensional operation and maintenance data samples related to the operation state of a database to a plurality of first data spaces according to a first preset rule; for each of the plurality of first data spaces, clustering the multidimensional operation and maintenance data in the first data space to obtain a first local feature; clustering the first local features corresponding to the plurality of first data spaces to obtain a plurality of first feature spaces; clustering the plurality of first feature spaces to obtain a first global feature; and determining the operation state of the database according to the first global feature and the plurality of first local features.

Description

Operation and maintenance data analysis method, device, equipment and medium
Technical Field
The present invention relates to the field of computer technology and the field of data processing technology, and in particular, to a method, an apparatus, a device, and a medium for analyzing operation and maintenance data.
Background
In system operation and maintenance, different systems, applications, databases and middleware generate, over time, a large amount of complex operation and maintenance data in many dimensions. Operation and maintenance personnel can apply various algorithms to model and analyze this multidimensional data, so as to implement state monitoring, fault alarming, service analysis, root cause analysis and the like.
In the process of implementing the inventive concept, the inventor finds that at least the following problems exist in the related art: the efficiency of processing multidimensional operation and maintenance data is low and the operation and maintenance effect is poor.
Disclosure of Invention
In view of the above, the present invention provides a method, apparatus, device, and medium for analyzing operation and maintenance data.
According to a first aspect of the present invention, there is provided an operation and maintenance data analysis method, comprising:
distributing multidimensional operation and maintenance data samples related to the operation state of a database to a plurality of first data spaces according to a first preset rule;
for each of the plurality of first data spaces, clustering the multidimensional operation and maintenance data in the first data space to obtain a first local feature;
clustering the first local features corresponding to the plurality of first data spaces to obtain a plurality of first feature spaces;
clustering the plurality of first feature spaces to obtain a first global feature;
and determining the operation state of the database according to the first global feature and the plurality of first local features.
According to an embodiment of the present invention, the method further includes:
distributing the multidimensional operation and maintenance data samples to a plurality of second data spaces according to expert experience;
clustering the multidimensional operation and maintenance data in the plurality of second data spaces to obtain a second global feature;
wherein determining the operation state of the database according to the first global feature and the plurality of first local features includes:
determining the operation state of the database according to the first global feature, the second global feature and the plurality of first local features.
According to an embodiment of the present invention, the determining the operation state of the database according to the first global feature, the second global feature, and the plurality of first local features includes:
calculating a first similarity between the second global feature and the first global feature;
and in the case that the first similarity is greater than a first threshold, inputting the first global feature and the plurality of first local features into a machine learning algorithm, and outputting the operation state of the database.
According to an embodiment of the present invention, the method further includes:
in the case that the first similarity is less than or equal to the first threshold, distributing the multidimensional operation and maintenance data samples to a plurality of third data spaces according to a second preset rule, wherein the number of the third data spaces is greater than the number of the first data spaces;
for each of the plurality of third data spaces, clustering the multidimensional operation and maintenance data in the third data space to obtain a second local feature;
clustering the second local features corresponding to the plurality of third data spaces to obtain a plurality of second feature spaces;
clustering the plurality of second feature spaces to obtain a third global feature;
and determining the operation state of the database according to the third global feature and the plurality of second local features.
According to an embodiment of the present invention, the clustering the second local features corresponding to the plurality of third data spaces to obtain a plurality of second feature spaces includes:
clustering the second local features that correspond to the plurality of third data spaces and have the same features, to obtain the plurality of second feature spaces.
According to an embodiment of the present invention, the clustering, for each of the plurality of first data spaces, the multidimensional operation and maintenance data in the first data space to obtain a first local feature includes:
calculating a second similarity between the multidimensional operation and maintenance data in the first data space;
and in the case that the second similarity is greater than a second threshold, grouping the multidimensional operation and maintenance data corresponding to the second similarity into one class to obtain the first local feature.
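The threshold-based grouping described above can be illustrated with a minimal sketch. The patent does not name the similarity measure or the grouping order, so the cosine similarity, the greedy comparison against the first member of each group, and the names `cosine_similarity` and `group_by_similarity` are all assumptions made for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_by_similarity(vectors, threshold):
    """Greedily place each vector into the first group whose first member
    is more similar to it than `threshold`; otherwise start a new group."""
    groups = []
    for v in vectors:
        for group in groups:
            if cosine_similarity(v, group[0]) > threshold:
                group.append(v)
                break
        else:
            groups.append([v])
    return groups


# Hypothetical two-dimensional records inside one first data space.
records = [[1.0, 0.0], [2.0, 0.1], [0.0, 1.0]]
classes = group_by_similarity(records, threshold=0.9)
```

Each resulting class could then serve as (or be summarized into) a first local feature; how a class is condensed into a feature is not specified by the source.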
According to an embodiment of the present invention, the clustering the first local features corresponding to each of the plurality of first data spaces to obtain a plurality of first feature spaces includes:
clustering the first local features that correspond to the plurality of first data spaces and have the same features, to obtain the plurality of first feature spaces.
According to an embodiment of the present invention, before the distributing the multidimensional operation and maintenance data samples related to the operation state of the database to the plurality of first data spaces according to the first preset rule, the method further includes:
acquiring original operation and maintenance data related to the operation state of the database;
and preprocessing the original operation and maintenance data to obtain the multidimensional operation and maintenance data samples.
According to an embodiment of the present invention, the original operation and maintenance data related to the operation state of the database includes: the port access status of the database, the memory call status of the database, whether the ports of the database are attacked, and the memory storage status of the database.
A second aspect of the present invention provides an operation and maintenance data analysis apparatus, comprising:
a first space allocation module, configured to distribute multidimensional operation and maintenance data samples related to the operation state of a database to a plurality of first data spaces according to a first preset rule;
a first local feature obtaining module, configured to cluster, for each of the plurality of first data spaces, the multidimensional operation and maintenance data in the first data space to obtain a first local feature;
a first feature space obtaining module, configured to cluster the first local features corresponding to the plurality of first data spaces to obtain a plurality of first feature spaces;
a first global feature obtaining module, configured to cluster the plurality of first feature spaces to obtain a first global feature;
and a first operation state determining module, configured to determine the operation state of the database according to the first global feature and the plurality of first local features.
A third aspect of the present invention provides an electronic device comprising:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method as described above.
A fourth aspect of the invention also provides a computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform a method as described above.
The fifth aspect of the invention also provides a computer program product comprising computer executable instructions which when executed implement a method as described above.
According to the embodiment of the invention, the multidimensional operation and maintenance data samples related to the operation state of the database are distributed to a plurality of first data spaces according to the first preset rule, so that the originally disordered samples in each first data space have the characteristics corresponding to the first preset rule. Then, for each of the plurality of first data spaces, the multidimensional operation and maintenance data in the first data space are clustered to obtain a first local feature; the first local features corresponding to the plurality of first data spaces are clustered to obtain a plurality of first feature spaces; and the plurality of first feature spaces are clustered to obtain a first global feature. Because the first local features related to the operation state of the database are obtained through one round of clustering and the first global feature is obtained through multiple rounds of clustering, the first global feature is more closely related to the first local features and is more accurate. This at least partially solves the problems in the related art that multidimensional operation and maintenance data are processed inefficiently and that the operation and maintenance effect is poor.
Drawings
The foregoing and other objects, features and advantages of the invention will be apparent from the following description of embodiments of the invention with reference to the accompanying drawings, in which:
FIG. 1 shows an application scenario diagram of an operation and maintenance data analysis method according to an embodiment of the present invention;
FIG. 2 shows a flow chart of an operation and maintenance data analysis method according to an embodiment of the invention;
FIG. 3 shows another flow chart of an operation and maintenance data analysis method according to an embodiment of the invention;
FIG. 4 shows a block diagram of an operation and maintenance data analysis apparatus according to an embodiment of the present invention; and
FIG. 5 shows a block diagram of an electronic device adapted to implement the operation and maintenance data analysis method according to an embodiment of the invention.
Detailed Description
Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. It should be understood that the description is only illustrative and is not intended to limit the scope of the invention. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the invention. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the present invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where expressions like "at least one of A, B and C, etc." are used, the expressions should generally be interpreted in accordance with the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B and C" shall include, but not be limited to, a system having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).
In the technical scheme of the invention, the collection, storage, use, processing, transmission, provision, disclosure and application of the related data (including but not limited to personal information of users) all comply with relevant laws and regulations, necessary security measures are adopted, and public order and good customs are not violated.
In processing multidimensional operation and maintenance data, the related art suffers from the technical problems of low efficiency and poor operation and maintenance effect.
In order to at least partially solve the technical problems in the related art, the invention provides a method, a device, equipment and a medium for analyzing operation and maintenance data, which can be applied to the technical field of computers and the technical field of data processing. The operation and maintenance data analysis method comprises the following steps: distributing multidimensional operation and maintenance data samples related to the operation state of a database to a plurality of first data spaces according to a first preset rule; for each of the plurality of first data spaces, clustering the multidimensional operation and maintenance data in the first data space to obtain a first local feature; clustering the first local features corresponding to the plurality of first data spaces to obtain a plurality of first feature spaces; clustering the plurality of first feature spaces to obtain a first global feature; and determining the operation state of the database according to the first global feature and the plurality of first local features.
Fig. 1 shows an application scenario diagram of an operation and maintenance data analysis method according to an embodiment of the present invention.
As shown in fig. 1, an application scenario 100 according to this embodiment may include a first terminal device 101, a second terminal device 102, a third terminal device 103, a network 104, and a server 105. The network 104 is a medium used to provide a communication link between the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 through the network 104 using at least one of the first terminal device 101, the second terminal device 102, the third terminal device 103, to receive or send messages, etc. Various communication client applications, such as a shopping class application, a web browser application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc. (by way of example only) may be installed on the first terminal device 101, the second terminal device 102, and the third terminal device 103.
The first terminal device 101, the second terminal device 102, the third terminal device 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for websites browsed by the user using the first terminal device 101, the second terminal device 102, and the third terminal device 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that the operation and maintenance data analysis method provided by the embodiment of the present invention may generally be performed by the server 105. Accordingly, the operation and maintenance data analysis apparatus provided by the embodiment of the present invention may generally be disposed in the server 105. The operation and maintenance data analysis method provided by the embodiment of the present invention may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or the server 105. Accordingly, the operation and maintenance data analysis apparatus provided by the embodiment of the present invention may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or the server 105.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
The operation and maintenance data analysis method of the disclosed embodiment will be described in detail below with reference to fig. 2 to 3, based on the scenario described in fig. 1.
Fig. 2 shows a flow chart of an operation and maintenance data analysis method according to an embodiment of the invention.
As shown in fig. 2, the operation and maintenance data analysis method 200 of this embodiment includes operations S210 to S250.
In operation S210, multidimensional operation and maintenance data samples related to the operation state of the database are distributed to a plurality of first data spaces according to a first preset rule.
According to an embodiment of the present invention, the multidimensional operation and maintenance data sample includes a plurality of pieces of multidimensional operation and maintenance data.
According to an embodiment of the invention, the multidimensional operation and maintenance data characterizes operation and maintenance data of a plurality of dimensions obtained by normalizing the original operation and maintenance data.
According to the embodiment of the invention, since the original operation and maintenance data is operation and maintenance data with a time sequence as a main characteristic, after the original operation and maintenance data is normalized, the obtained multi-dimensional operation and maintenance data sample related to the operation state of the database is also operation and maintenance data with a time sequence as a main characteristic, and therefore, the multi-dimensional operation and maintenance data sample can be distributed to a plurality of first data spaces according to the generation time of the multi-dimensional operation and maintenance data in the multi-dimensional operation and maintenance data sample.
According to an embodiment of the present invention, the first preset rule may be based on the generation time of the multidimensional operation and maintenance data. For example, when analyzing the multidimensional operation and maintenance data samples related to the operation state of the database within one day, the samples may be distributed by hour, so that the multidimensional operation and maintenance data in the samples are distributed into 24 first data spaces.
According to an embodiment of the present invention, for example, when analyzing the multidimensional operation and maintenance data samples related to the operation state of the database within one year, the samples may be distributed by month, so that the multidimensional operation and maintenance data in the samples are distributed into 12 first data spaces.
According to the embodiment of the invention, in the case that the first preset rule is based on the generation time of the multidimensional operation and maintenance data, the multidimensional operation and maintenance data samples can be subdivided according to actual conditions, and the subdivision granularity is not limited to one hour, one month and the like.
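As a minimal sketch of the time-based rule above, the following groups records by the hour in which they were generated, yielding at most 24 first data spaces for one day of data. The record layout (a `datetime` paired with a feature vector) and the name `allocate_by_hour` are assumptions for illustration and are not taken from the patent.

```python
from collections import defaultdict
from datetime import datetime


def allocate_by_hour(samples):
    """Distribute (timestamp, vector) records into per-hour data spaces.

    Returns a dict mapping hour of day (0-23) to the list of vectors
    generated in that hour.
    """
    spaces = defaultdict(list)
    for timestamp, vector in samples:
        spaces[timestamp.hour].append(vector)
    return dict(spaces)


# Hypothetical normalized records for a single day.
samples = [
    (datetime(2023, 3, 1, 0, 15), [0.10, 0.20]),
    (datetime(2023, 3, 1, 0, 45), [0.12, 0.22]),
    (datetime(2023, 3, 1, 13, 5), [0.80, 0.90]),
]
hourly_spaces = allocate_by_hour(samples)
```

Switching the key from `timestamp.hour` to `timestamp.month` would give the 12 monthly data spaces of the one-year example.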
According to embodiments of the present invention, the operation state of the database may include the port access status, memory call status, data storage status, etc. of the database.
According to an embodiment of the invention, the first data space characterizes a data set consisting of a plurality of pieces of multidimensional operation and maintenance data having the features corresponding to the first preset rule.
According to an embodiment of the present invention, the multidimensional operation and maintenance data samples may also be subdivided according to a particular state among the operation states of the database and distributed to the plurality of first data spaces accordingly. For example, in the case where the operation state of the database includes the memory utilization of the database, the memory utilization may be taken as one dimension of the multidimensional operation and maintenance data, and the samples may be distributed to the plurality of first data spaces by subdividing the memory utilization.
According to the embodiment of the invention, in the case of distributing the multidimensional operation and maintenance data samples to the plurality of first data spaces according to the memory utilization of the database, the memory utilization may range from 0% to 100%. For example, the multidimensional operation and maintenance data with a memory utilization of 0% to 20% may be distributed to the first of the first data spaces, the data with a memory utilization of 20% to 40% to the second, the data with a memory utilization of 40% to 60% to the third, the data with a memory utilization of 60% to 80% to the fourth, and the data with a memory utilization of 80% to 100% to the fifth, thereby obtaining 5 first data spaces.
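The memory-utilization rule above can be sketched as a simple binning function. The text does not say which data space a boundary value such as exactly 20% belongs to; the sketch assumes half-open intervals with 100% falling into the last space, and the name `utilization_space` is hypothetical.

```python
def utilization_space(mem_util_percent, width=20.0, n_spaces=5):
    """Return the 0-based index of the first data space for a memory
    utilization given in percent.

    Assumption: intervals are [0, 20), [20, 40), ..., [80, 100], so a
    boundary value belongs to the upper interval, except 100%, which is
    clamped into the last space.
    """
    if not 0.0 <= mem_util_percent <= 100.0:
        raise ValueError("memory utilization must be between 0 and 100")
    return min(int(mem_util_percent // width), n_spaces - 1)


# Hypothetical utilization readings mapped to their data-space indices.
indices = [utilization_space(u) for u in (5.0, 20.0, 59.9, 100.0)]
```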
According to the embodiment of the invention, the originally disordered multidimensional operation and maintenance data samples related to the operation state of the database are distributed to the plurality of first data spaces according to the first preset rule, so that the multidimensional operation and maintenance data samples in each first data space have the characteristics corresponding to the first preset rule.
In operation S220, for each of the plurality of first data spaces, the multidimensional operation and maintenance data in the first data space is clustered to obtain a first local feature.
According to the embodiment of the invention, the multidimensional operation and maintenance data in the first data space can be clustered using the Mahalanobis distance, the Euclidean distance, the Manhattan distance, the Hamming distance, the k-means clustering algorithm (k-means), or k-means combined with the Mahalanobis distance, the Euclidean distance, the Manhattan distance or the Hamming distance, and the like, to obtain the first local feature. The embodiment of the invention does not limit the specific clustering algorithm, which can be selected according to actual conditions.
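As one possible instance of the options listed above, the following is a minimal k-means sketch using the Euclidean distance, written with the standard library only. Treating the resulting centroids as the first local feature, initializing from the first `k` points, and the fixed iteration count are assumptions for illustration; the patent does not fix any of these choices.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means with Euclidean distance.

    Centroids are initialized from the first k points so that the result
    is deterministic (an assumption; other initializations are common).
    """
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


# Hypothetical records inside one first data space.
space = [[0, 0], [10, 10], [0, 1], [10, 11]]
local_feature = kmeans(space, k=2)
```

Replacing `math.dist` with another distance function would give the Manhattan or Hamming variants mentioned in the text.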
In operation S230, the first local features corresponding to each of the plurality of first data spaces are clustered to obtain a plurality of first feature spaces.
According to the embodiment of the invention, in the case that the first local features corresponding to the plurality of first data spaces share identical features, or similar features whose similarity is greater than a preset similarity threshold, the first local features corresponding to the plurality of first data spaces are clustered to obtain a plurality of first feature spaces.
According to an embodiment of the invention, the first feature space characterizes a data set consisting of a plurality of pieces of multidimensional operation and maintenance data having periodic features.
According to the embodiment of the invention, the first local features corresponding to the plurality of first data spaces can be clustered using the Mahalanobis distance, the Euclidean distance, the Manhattan distance, the Hamming distance, k-means, or k-means combined with the Mahalanobis distance, the Euclidean distance, the Manhattan distance or the Hamming distance, and the like, to obtain the plurality of first feature spaces.
According to the embodiment of the present invention, for example, in the case where the first local feature corresponding to the first of the first data spaces and the first local feature corresponding to the second of the first data spaces contain the same multidimensional operation and maintenance data as a shared feature, the multidimensional operation and maintenance data in these two first local features may be clustered to obtain the first of the first feature spaces.
According to the embodiment of the present invention, for example, in the case where the first local features corresponding to the first, the second and the third of the first data spaces contain the same multidimensional operation and maintenance data as a shared feature, the multidimensional operation and maintenance data in these three first local features may be clustered to obtain the second of the first feature spaces.
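The grouping of data spaces with shared or similar local features can be sketched as follows. The sketch assumes each first local feature is summarized as a single vector and that two local features count as similar when their Euclidean distance does not exceed a threshold; the name `build_feature_spaces`, the distance measure and the threshold value are illustrative assumptions, not the patent's definition.

```python
import math


def build_feature_spaces(local_features, max_distance):
    """Group data-space keys whose local-feature vectors lie within
    `max_distance` of the first member of an existing group.

    local_features: dict mapping a data-space key (e.g. hour of day)
    to its local-feature vector.
    """
    feature_spaces = []
    for key, feature in local_features.items():
        for members in feature_spaces:
            representative = local_features[members[0]]
            if math.dist(feature, representative) <= max_distance:
                members.append(key)
                break
        else:
            feature_spaces.append([key])
    return feature_spaces


# Hypothetical local features for three hourly data spaces.
local_features = {0: [0.10, 0.20], 1: [0.12, 0.21], 13: [0.80, 0.90]}
feature_spaces = build_feature_spaces(local_features, max_distance=0.05)
```

Here hours 0 and 1 fall into one feature space, which is how a recurring (periodic) pattern across data spaces would surface under these assumptions.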
In operation S240, a plurality of first feature spaces are clustered to obtain first global features.
According to the embodiment of the invention, the plurality of first feature spaces can be clustered using the Mahalanobis distance, the Euclidean distance, the Manhattan distance, the Hamming distance, k-means, or k-means combined with the Mahalanobis distance, the Euclidean distance, the Manhattan distance or the Hamming distance, and the like, to obtain the first global feature. The embodiment of the invention does not limit the specific clustering algorithm, which can be selected according to actual conditions.
According to the embodiment of the invention, the clustering algorithm for obtaining the first local features, the clustering algorithm for obtaining a plurality of first feature spaces and the clustering algorithm for obtaining the first global features can be the same or different, and the specific clustering algorithm for obtaining the first local features, the clustering algorithm for obtaining a plurality of first feature spaces and the clustering algorithm for obtaining the first global features can be selected according to actual conditions.
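As a deliberately simple illustration of the last clustering level, the sketch below condenses several feature spaces into one global feature by taking the size-weighted mean of their centroids, which is what a single-cluster (k = 1) centroid computation over all member vectors yields. Representing the first global feature as one vector is an assumption; the patent leaves the clustering algorithm and the form of the global feature open, and the names `centroid` and `global_feature` are hypothetical.

```python
def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def global_feature(feature_spaces):
    """Size-weighted mean of the feature-space centroids.

    feature_spaces: list of feature spaces, each a non-empty list of
    local-feature vectors.
    """
    centroids = [centroid(space) for space in feature_spaces]
    sizes = [len(space) for space in feature_spaces]
    total = sum(sizes)
    dims = len(centroids[0])
    return [
        sum(c[d] * n for c, n in zip(centroids, sizes)) / total
        for d in range(dims)
    ]


# Hypothetical feature spaces holding local-feature vectors.
spaces = [[[0.0, 0.0], [2.0, 2.0]], [[4.0, 4.0]]]
first_global = global_feature(spaces)
```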
In operation S250, an operational state of the database is determined based on the first global feature and the plurality of first local features.
According to the embodiment of the invention, the operation state of the database determined according to the first global feature and the plurality of first local features may indicate, for example, in which time periods the port access volume of the database is too high, in which time periods the memory occupancy of the database is too high, whether the memory storage volume of the database has reached a preset maximum, the average daily or monthly increase in the memory storage volume of the database, and in which time periods the database is subjected to a larger number of attacks.
According to the embodiment of the invention, operation and maintenance personnel can reasonably maintain the database by checking the operation state of the database determined according to the first global feature and the plurality of first local features.
According to the embodiment of the invention, the first global feature and the first local features can be input into a trained machine learning algorithm, and the running state of the database is determined according to the output result of the machine learning algorithm.
According to the embodiment of the invention, the first global feature can be verified using a global feature obtained from manual experience. When the first global feature meets the verification requirement, for example when the similarity between the global feature obtained from manual experience and the first global feature is greater than a first similarity threshold, the operation state of the database is determined according to the result output by the machine learning algorithm.
According to the embodiment of the invention, when the first global feature does not meet the verification requirement, for example when the similarity between the global feature obtained from manual experience and the first global feature is less than or equal to the first similarity threshold, the multidimensional operation and maintenance data samples related to the operation state of the database are redistributed using a rule with finer granularity than the first preset rule. The nth local features, the nth feature spaces and the nth global feature are then recalculated until the nth global feature meets the verification requirement, so that the optimal global feature is found and the nth global feature is more globally representative. The nth global feature and the plurality of nth local features corresponding to it are then input into a trained machine learning algorithm, and the operation state of the database is determined according to the result output by the machine learning algorithm. Here, n denotes the number of rounds of cyclic calculation, and the first similarity threshold can be selected according to actual conditions, so that the verification against the global feature obtained from manual experience is satisfied after multiple rounds of calculation.
According to an embodiment of the present invention, the nth local feature corresponds to the first local feature, the nth feature space corresponds to the first feature space, and the nth global feature corresponds to the first global feature.
According to the embodiment of the invention, under the condition that the first global feature does not meet the verification requirement, the global feature obtained by the manual experience can be adjusted for multiple times, so that the global feature obtained by the manual experience after the multiple times of adjustment and the first global feature meet the verification requirement, the optimal global feature is found, the first global feature is considered to be more global representative at the moment, then the first global feature and the plurality of first local features are input into a trained machine learning algorithm, and the running state of the database is determined according to the output result of the machine learning algorithm.
According to the embodiment of the invention, in the case that the first global feature does not meet the verification requirement, the global feature obtained from manual experience can be adjusted multiple times. In the case that the verification requirement still cannot be met after the global feature obtained from manual experience has been adjusted multiple times, a rule with finer granularity than the first preset rule is used to allocate the multidimensional operation and maintenance data sample related to the running state of the database more finely. The nth local features, the nth feature spaces and the nth global feature are then recalculated until the nth global feature meets the verification requirement, so that the optimal global feature is found and the nth global feature is more globally representative. The nth global feature and the plurality of nth local features corresponding to it are then input into a trained machine learning algorithm, and the running state of the database is determined according to the result output by the machine learning algorithm, where n denotes the number of loop iterations.
According to the embodiment of the invention, in the case that the first global feature does not meet the verification requirement, and in the case that identical features, or features whose similarity is greater than a preset similarity threshold, exist among the plurality of first feature spaces, the plurality of first feature spaces can be clustered to obtain a plurality of second feature spaces. Then, in the case that identical features, or features whose similarity is greater than the preset similarity threshold, exist among the plurality of second feature spaces, the plurality of second feature spaces are clustered to obtain a plurality of third feature spaces, and so on, until a plurality of ith feature spaces are obtained, so that the ith feature spaces are more representative of local features. The plurality of ith feature spaces are clustered to obtain an ith global feature that meets the verification requirement, so that the optimal global feature is found and the ith global feature is more globally representative. The ith global feature and the plurality of ith local features corresponding to it are then input into a trained machine learning algorithm, and the running state of the database is determined according to the result output by the machine learning algorithm, where i denotes the number of times the feature spaces are clustered and can be selected according to the actual situation.
According to the embodiment of the invention, the operation and maintenance data analysis method provided by the embodiment of the invention realizes that the first local feature related to the operation state of the database is obtained by carrying out one-time clustering on the multidimensional operation and maintenance data related to the operation state of the database, and the first global feature is obtained by carrying out multiple clustering on the first local feature, so that the technical problems of low efficiency and poor operation and maintenance effect in processing multidimensional operation and maintenance data in the related technology are at least partially overcome, the speed of extracting the first local feature and the first global feature is improved, and the first local feature and the first global feature are closely connected with each other, so that the first global feature is more global representative, and the operation state of the database is rapidly and accurately determined according to the first global feature and a plurality of first local features.
According to the embodiment of the invention, the operation and maintenance data analysis method provided by the embodiment of the invention realizes that the first local feature related to the operation state of the database is obtained by carrying out one-time clustering on the multidimensional operation and maintenance data related to the operation state of the database, the first global feature is obtained by carrying out multiple clustering on the first local feature, the periodic feature in the time sequence operation and maintenance data can be strongly captured, the global feature with higher value is obtained, and the global feature with higher value is provided for a downstream algorithm to obtain the operation state of the database with more precision.
According to an embodiment of the present invention, the operation and maintenance data analysis method 200 further includes the following operations:
distributing the multidimensional operation and maintenance data samples to a plurality of second data spaces according to expert experience;
clustering the multidimensional operation and maintenance data in the plurality of second data spaces to obtain a second global feature;
wherein determining the running state of the database according to the first global feature and the plurality of first local features comprises:
determining the running state of the database according to the first global feature, the second global feature and the plurality of first local features.
According to the embodiment of the invention, the same or similar data characteristics of the plurality of pieces of multidimensional operation and maintenance data in the multidimensional operation and maintenance data sample can be determined according to expert experience, and then the multidimensional operation and maintenance data with characteristic representativeness in the multidimensional operation and maintenance data sample is distributed to a plurality of second data spaces according to the same or similar data characteristics.
According to the embodiment of the invention, the multidimensional operation and maintenance data in the plurality of second data spaces can be clustered using the Mahalanobis distance, the Euclidean distance, the Manhattan distance, the Hamming distance, K-means, or the like to obtain the second global feature.
According to the embodiment of the invention, the multidimensional operation and maintenance data samples are distributed to a plurality of second data spaces according to expert experience, and the multidimensional operation and maintenance data in the plurality of second data spaces are clustered to obtain the second global feature, which prepares for the subsequent accuracy verification of the first global feature using the second global feature.
According to an embodiment of the present invention, determining an operational state of a database from a first global feature, a second global feature, and a plurality of first local features includes:
calculating first similarity of the second global feature and the first global feature;
and under the condition that the first similarity is larger than a first threshold value, inputting the first global feature and the plurality of first local features into a machine learning algorithm, and outputting the running state of the database.
According to the embodiment of the invention, the first similarity between the second global feature and the first global feature can be calculated using the Mahalanobis distance, or using the Jaccard similarity coefficient. The specific method for calculating the first similarity between the second global feature and the first global feature is not limited and can be selected according to the actual situation.
According to the embodiment of the invention, for example, the first similarity between the second global feature and the first global feature can be calculated using the Mahalanobis distance. The mean and covariance of the plurality of multidimensional operation and maintenance data included in the second global feature are calculated first; the distance between each piece of multidimensional operation and maintenance data included in the first global feature and the second global feature is then calculated from that mean and covariance, yielding a plurality of second global feature distances. These distances are averaged, and the average is mapped to a value between 0 and 1, where a smaller average gives a larger mapped value; the mapped value is taken as the first similarity between the second global feature and the first global feature. In the case that the first similarity is greater than the first threshold, the first global feature and the plurality of first local features are input into the machine learning algorithm, and the running state of the database is output.
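The Mahalanobis-based calculation of the first similarity can be written out as the following sketch. The symbols are assumptions introduced here for illustration: y_1, ..., y_m are the multidimensional operation and maintenance data in the second global feature, and x_1, ..., x_n are those in the first global feature. The mapping 1/(1 + d) in the last line is likewise only one possible choice; the description only requires that a smaller average distance give a larger value between 0 and 1.

```latex
\mu = \frac{1}{m}\sum_{j=1}^{m} y_j,
\qquad
\Sigma = \frac{1}{m-1}\sum_{j=1}^{m} (y_j - \mu)(y_j - \mu)^{\top}

d_i = \sqrt{(x_i - \mu)^{\top}\,\Sigma^{-1}\,(x_i - \mu)},
\qquad i = 1, \dots, n

\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i,
\qquad
s_1 = \frac{1}{1 + \bar{d}} \in (0, 1]
```

Under this assumed mapping, the first global feature passes verification when s_1 exceeds the first threshold, for example 0.8.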
According to the embodiment of the invention, the first threshold value is set correspondingly according to the scale of the first similarity, and in the case that the first similarity varies between 0 and 1, the first threshold value may take on a value of 0.7, 0.8, 0.9, or the like, for example.
According to the embodiment of the invention, the first similarity between the second global feature and the first global feature is calculated, the first global feature and the plurality of first local features are input into the machine learning algorithm under the condition that the first similarity is larger than the first threshold value, the running state of the database is output, the accuracy verification of the first global feature by using the second global feature is realized, the optimal first global feature is found, and the first global feature is enabled to be more global representative, so that the running state of the database obtained according to the first global feature and the plurality of first local features is more accurate.
According to an embodiment of the present invention, the operation and maintenance data analysis method 200 further includes:
if the first similarity is less than or equal to the first threshold, distributing the multidimensional operation and maintenance data samples to a plurality of third data spaces according to a second preset rule, wherein the number of the third data spaces is greater than the number of the first data spaces;
clustering the multidimensional operation and maintenance data in the third data space aiming at each of the plurality of third data spaces to obtain a second local feature;
clustering the second local features corresponding to the third data spaces to obtain a plurality of second feature spaces;
Clustering the plurality of second feature spaces to obtain a third global feature;
an operational state of the database is determined based on the third global feature and the plurality of second local features.
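The fallback described in the operations above can be sketched as a loop over allocation rules ordered from coarse to fine. This is only an illustration: `analyze_with_fallback`, `extract`, `similarity`, and the toy stand-ins below are hypothetical names that do not appear in the source, and the real feature extraction (allocation plus three rounds of clustering) is hidden behind the `extract` callable.

```python
def analyze_with_fallback(samples, rules, extract, similarity,
                          expert_global, threshold=0.8):
    """Try each allocation rule, coarse to fine, until verification passes.

    extract(samples, rule) is assumed to return (global_feature, local_features).
    similarity(a, b) is assumed to return a value between 0 and 1.
    Returns (rule, global_feature, local_features), or None if no rule passes.
    """
    for rule in rules:
        global_feature, local_features = extract(samples, rule)
        # Verification: compare against the expert-derived global feature.
        if similarity(expert_global, global_feature) > threshold:
            return rule, global_feature, local_features
    return None


# Toy stand-ins (assumptions for demonstration only): a "rule" is just the
# number of data spaces, and the "global feature" is that same number.
def toy_extract(samples, n_spaces):
    return n_spaces, [n_spaces]


def toy_similarity(expert, candidate):
    return 1.0 - abs(expert - candidate) / expert


result = analyze_with_fallback(list(range(10)), [24, 48],
                               toy_extract, toy_similarity, expert_global=48)
```

With these toy stand-ins, the 24-space rule fails verification and the 48-space rule passes, mirroring the move from the first preset rule to the finer second preset rule.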
According to an embodiment of the invention, the second preset rule characterizes a rule having a finer granularity than the first preset rule.
According to an embodiment of the present invention, for example, in the case of analyzing a multidimensional operation and maintenance data sample related to the running state of a database over one day, the first preset rule may allocate the sample by hour, so that the plurality of multidimensional operation and maintenance data in the sample are allocated to 24 first data spaces, and the second preset rule may allocate the sample by half hour, so that the plurality of multidimensional operation and maintenance data in the sample are allocated to 48 third data spaces.
According to an embodiment of the present invention, for example, in the case of analyzing a multidimensional operation and maintenance data sample related to the running state of a database over one year, the first preset rule may allocate the sample by month, so that the plurality of multidimensional operation and maintenance data in the sample are allocated to 12 first data spaces, and the second preset rule may allocate the sample by half month, so that the plurality of multidimensional operation and maintenance data in the sample are allocated to 24 third data spaces.
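The one-day example can be sketched as follows. The function name `allocate_by_time`, the `(timestamp, values)` record layout, and the synthetic samples are assumptions made for illustration; the source does not prescribe a data format.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def allocate_by_time(samples, minutes_per_space):
    """Group (timestamp, values) records into data spaces by time of day."""
    spaces = defaultdict(list)
    for ts, values in samples:
        minute_of_day = ts.hour * 60 + ts.minute
        spaces[minute_of_day // minutes_per_space].append((ts, values))
    return dict(spaces)


# Synthetic one-day sample: one record every 10 minutes (144 records).
start = datetime(2023, 1, 1)
samples = [(start + timedelta(minutes=10 * i), (float(i),)) for i in range(144)]

first_data_spaces = allocate_by_time(samples, 60)   # hourly rule
third_data_spaces = allocate_by_time(samples, 30)   # half-hourly rule
```

The hourly rule produces 24 data spaces and the half-hourly rule produces 48, so the second rule is the finer-grained one.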
According to an embodiment of the present invention, for example, in the case where the multidimensional operation and maintenance data samples are allocated to the plurality of first data spaces according to the memory utilization of the database, the memory utilization of the database may range from 0% to 100%. The first preset rule may divide the multidimensional operation and maintenance data samples into five first data spaces according to the memory utilization intervals 0% to 20%, 20% to 40%, 40% to 60%, 60% to 80%, and 80% to 100%, and the second preset rule may divide the multidimensional operation and maintenance data samples into ten third data spaces according to the memory utilization intervals 0% to 10%, 10% to 20%, 20% to 30%, 30% to 40%, 40% to 50%, 50% to 60%, 60% to 70%, 70% to 80%, 80% to 90%, and 90% to 100%.
According to the embodiment of the invention, in the case that the first similarity is less than or equal to the first threshold, the multidimensional operation and maintenance data samples are distributed to a plurality of third data spaces according to the second preset rule, the number of the third data spaces being greater than the number of the first data spaces. This allocates the multidimensional operation and maintenance data samples at a finer granularity, so that the multidimensional operation and maintenance data samples in the third data spaces have characteristics corresponding to the second preset rule.
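A minimal sketch of the memory-utilization example follows. The field name `mem_util` and the function name are hypothetical, and the handling of boundary values is an assumption: the intervals are treated as half-open, with 100% placed in the last data space, because the source lists intervals with shared endpoints and does not say where a boundary value belongs.

```python
def allocate_by_utilization(samples, n_spaces):
    """Split samples into n_spaces equal-width memory-utilization intervals."""
    width = 100.0 / n_spaces
    spaces = [[] for _ in range(n_spaces)]
    for sample in samples:
        # Half-open intervals [k*width, (k+1)*width); 100% goes to the last one.
        index = min(int(sample["mem_util"] // width), n_spaces - 1)
        spaces[index].append(sample)
    return spaces


samples = [{"mem_util": u} for u in (5, 25, 25, 95, 100)]
first_data_spaces = allocate_by_utilization(samples, 5)    # 20% intervals
third_data_spaces = allocate_by_utilization(samples, 10)   # 10% intervals
```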
According to the embodiment of the invention, the multidimensional operation and maintenance data in the third data space can be clustered using the Mahalanobis distance, the Euclidean distance, the Manhattan distance, the Hamming distance, K-means, or the like to obtain the second local feature.
According to the embodiment of the invention, under the condition that the same or similar characteristics larger than the preset similarity threshold exist among the second local characteristics corresponding to the third data spaces, the second local characteristics corresponding to the third data spaces are clustered to obtain a plurality of second characteristic spaces.
According to the embodiment of the invention, the second local features corresponding to the third data spaces can be clustered using the Mahalanobis distance, the Euclidean distance, the Manhattan distance, the Hamming distance, K-means, or the like to obtain a plurality of second feature spaces.
According to the embodiment of the present invention, for example, in the case where the same multidimensional operation data exists as the same feature between the second local feature corresponding to the first third data space and the second local feature corresponding to the second third data space, the multidimensional operation data in the second local feature corresponding to the first third data space and the second local feature corresponding to the second third data space may be clustered to obtain the first second feature space.
According to the embodiment of the invention, the plurality of second feature spaces can be clustered using the Mahalanobis distance, the Euclidean distance, the Manhattan distance, the Hamming distance, K-means, or the like to obtain the third global feature.
According to the embodiment of the invention, the clustering algorithm for obtaining the second local features, the clustering algorithm for obtaining a plurality of second feature spaces and the clustering algorithm for obtaining the third global features can be the same or different, and the specific clustering algorithm for obtaining the second local features, the clustering algorithm for obtaining a plurality of second feature spaces and the clustering algorithm for obtaining the third global features can be selected according to actual conditions.
According to the embodiment of the invention, the running state of the database determined according to the third global feature and the plurality of second local features can include, for example: in which time periods the port access volume of the database is too high, in which time periods the memory occupancy of the database is too high, whether the memory storage volume of the database has reached a preset maximum, what the average daily or monthly increase in the memory storage volume of the database is, and in which time periods the database suffers a larger number of attacks.
According to the embodiment of the invention, an operation and maintenance person reasonably maintains the database by checking the operation state of the database determined according to the third global feature and the plurality of second local features.
According to the embodiment of the invention, the third global feature and the plurality of second local features can be input into a trained machine learning algorithm, and the running state of the database is determined according to the output result of the machine learning algorithm.
According to the embodiment of the invention, the third global feature can be verified by using the global feature obtained according to the manual experience, and the running state of the database is determined according to the result output by the machine learning algorithm when the third global feature meets the verification requirement, for example, when the similarity between the global feature obtained according to the manual experience and the third global feature is greater than the first similarity threshold value.
According to the embodiment of the invention, under the condition that the first similarity is smaller than or equal to the first threshold value, the multidimensional operation and maintenance data samples are distributed to a plurality of third data spaces according to the second preset rule, the number of the third data spaces is larger than that of the first data spaces, the multidimensional operation and maintenance data samples are distributed in a finer granularity according to the second preset rule, the multidimensional operation and maintenance data samples in the third data spaces have the characteristics corresponding to the second preset rule, then the multidimensional operation and maintenance data in the third data spaces are clustered according to each third data space in the third data spaces to obtain second local characteristics, the second local characteristics corresponding to each third data space in the third data spaces are clustered to obtain a plurality of second characteristic spaces, the periodic characteristics of the multidimensional operation and maintenance data included in the second characteristic spaces are more obvious, then the second characteristic spaces are clustered to obtain third global characteristics, and the accuracy of the data base determined according to the third global characteristics and the local operation states of the second characteristics is improved.
According to an embodiment of the present invention, clustering the second local features corresponding to each of the plurality of third data spaces to obtain a plurality of second feature spaces includes:
And clustering the second local features corresponding to the third data spaces with the same features to obtain a plurality of second feature spaces.
According to an embodiment of the invention, having the same feature means that identical multidimensional operation and maintenance data exists, as the same feature, in the second local features corresponding to each of the plurality of third data spaces.
According to the embodiment of the invention, for example, the Mahalanobis distance can be used to cluster the second local features corresponding to each of the plurality of third data spaces having the same feature, so as to obtain a plurality of second feature spaces.
According to the embodiment of the invention, for example, a mahalanobis distance algorithm is utilized to calculate a mean value and a covariance of a plurality of multidimensional operation and maintenance data included in a second local feature corresponding to each of a plurality of third data spaces with the same feature, then according to a mahalanobis distance calculation formula, a distance is calculated for a plurality of multidimensional operation and maintenance data included in a second local feature corresponding to each of a plurality of third data spaces with the same feature by utilizing the mean value and the covariance, and multidimensional operation and maintenance data with the distance smaller than a threshold value of the second feature space are gathered into one type to obtain the second feature space.
According to an embodiment of the present invention, for example, the number of the plurality of third data spaces may be 3, with the second local feature 1 corresponding to the first third data space, the second local feature 2 corresponding to the second third data space, and the second local feature 3 corresponding to the third third data space.
According to the embodiment of the present invention, for example, in the case where identical multidimensional operation and maintenance data exists between the second local feature 1 and the second local feature 2 as the same feature, and identical multidimensional operation and maintenance data exists between the second local feature 2 and the second local feature 3 as the same feature, the second local feature 1 and the second local feature 2 may be clustered using the Mahalanobis distance to obtain the second feature space 1, and the second local feature 2 and the second local feature 3 may be clustered to obtain the second feature space 2.
According to the embodiment of the invention, the second local features corresponding to the third data spaces with the same features are clustered to obtain the second feature spaces, so that the periodic features of the multidimensional operation and maintenance data included in the second feature spaces are more obvious.
According to an embodiment of the present invention, for operation S220, clustering the multidimensional operation data in the first data space to obtain a first local feature for each of the plurality of first data spaces may include the following operations:
calculating a second similarity between the multidimensional operation and maintenance data in the first data space;
and under the condition that the second similarity is larger than a second threshold value, clustering multidimensional data corresponding to the second similarity into one class to obtain a first local feature.
According to the embodiment of the invention, for example, the second similarity between the multidimensional operation and maintenance data in the first data space can be calculated using the Mahalanobis distance. The mean and covariance of the multidimensional operation and maintenance data included in the first data space are calculated; the distance between each piece of multidimensional operation and maintenance data in the first data space and the first data space is then calculated from that mean and covariance, and the distance is mapped to a value between 0 and 1, where a smaller distance gives a larger mapped value. The mapped value is taken as the second similarity, and in the case that the second similarity is greater than the second threshold, the multidimensional data corresponding to the second similarity are clustered into one class to obtain the first local feature.
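The procedure above can be sketched in plain Python as follows. All function names are hypothetical, and the mapping 1 / (1 + distance) is an assumption: the description only requires that a smaller distance give a larger value between 0 and 1. The sketch also assumes the covariance matrix of the data space is invertible.

```python
from statistics import fmean


def mean_and_cov(points):
    """Return the mean vector and sample covariance matrix of the points."""
    n, dim = len(points), len(points[0])
    mu = [fmean(p[j] for p in points) for j in range(dim)]
    cov = [[sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in points) / (n - 1)
            for j in range(dim)] for i in range(dim)]
    return mu, cov


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[piv][c]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mahalanobis(x, mu, cov):
    """Mahalanobis distance of x from the distribution (mu, cov)."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    squared = sum(d * s for d, s in zip(diff, solve(cov, diff)))
    return max(squared, 0.0) ** 0.5


def second_similarity(points):
    """Map each point's distance to (0, 1]; smaller distance, larger value."""
    mu, cov = mean_and_cov(points)
    return [1.0 / (1.0 + mahalanobis(p, mu, cov)) for p in points]


def first_local_feature(points, threshold=0.7):
    """Keep the points whose second similarity exceeds the second threshold."""
    sims = second_similarity(points)
    return [p for p, s in zip(points, sims) if s > threshold]
```

In this sketch the first local feature is simply the subset of data in the data space that lies close to the bulk of that space; other clustering choices are equally compatible with the description.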
According to an embodiment of the present invention, the second threshold is set according to a scale of the second similarity, and in the case where the second similarity varies between 0 and 1, the second threshold may take a value of 0.7, 0.8, or 0.9, for example.
According to the embodiment of the invention, the first local feature is obtained by calculating the second similarity between the multidimensional operation and maintenance data in the first data space, and the multidimensional data corresponding to the second similarity are gathered into one type under the condition that the second similarity is larger than the second threshold value, so that the aggregation of a plurality of multidimensional operation and maintenance data with closer association relations in the first data space to the first local feature is realized, and the data in the first local feature is more local representative.
According to an embodiment of the present invention, for operation S230, clustering the first local features corresponding to each of the plurality of first data spaces to obtain a plurality of first feature spaces may include the following operations:
and clustering the first local features corresponding to the first data spaces with the same features to obtain a plurality of first feature spaces.
According to an embodiment of the present invention, having the same feature means that identical multidimensional operation and maintenance data exists, as the same feature, in the first local features corresponding to each of the plurality of first data spaces.
According to the embodiment of the invention, for example, the mahalanobis distance can be utilized to cluster the first local features corresponding to each of the plurality of first data spaces with the same features, so as to obtain a plurality of first feature spaces.
According to the embodiment of the invention, for example, a mahalanobis distance algorithm is utilized to calculate a mean value and a covariance of a plurality of multidimensional operation and maintenance data included in a first local feature corresponding to each of a plurality of first data spaces with the same feature, then according to a mahalanobis distance calculation formula, a distance is calculated for a plurality of multidimensional operation and maintenance data included in a first local feature corresponding to each of a plurality of first data spaces with the same feature by utilizing the mean value and the covariance, and multidimensional operation and maintenance data with a distance smaller than a first feature space threshold value are gathered into one type to obtain a first feature space.
According to an embodiment of the present invention, for example, the number of the plurality of first data spaces is 5, with the first local feature 1 corresponding to the first first data space, the first local feature 2 corresponding to the second first data space, the first local feature 3 corresponding to the third first data space, the first local feature 4 corresponding to the fourth first data space, and the first local feature 5 corresponding to the fifth first data space.
According to the embodiment of the present invention, for example, when identical multidimensional operation and maintenance data exists between the first local feature 1 and the first local feature 2 as the same feature, identical multidimensional operation and maintenance data exists among the first local feature 2, the first local feature 3 and the first local feature 4 as the same feature, and identical multidimensional operation and maintenance data exists between the first local feature 4 and the first local feature 5 as the same feature, the first local feature 1 and the first local feature 2 may be clustered using the Mahalanobis distance to obtain the first feature space 1, the first local feature 2, the first local feature 3 and the first local feature 4 may be clustered to obtain the first feature space 2, and the first local feature 4 and the first local feature 5 may be clustered to obtain the first feature space 3.
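The grouping step in this example, deciding which local features belong to the same feature space because they share identical data, can be sketched with an inverted index. This covers only the grouping; the subsequent Mahalanobis-based clustering of the grouped data is not shown. The function name and the use of hashable records in Python sets are assumptions for illustration.

```python
def group_local_features(local_features):
    """Group local features that contain the same data record.

    local_features maps a feature id to a set of hashable records.
    Returns a sorted list of groups, each a sorted list of feature ids.
    """
    holders = {}
    for feature_id, records in local_features.items():
        for record in records:
            holders.setdefault(record, set()).add(feature_id)
    # A record held by two or more local features is a shared ("same") feature.
    groups = {frozenset(ids) for ids in holders.values() if len(ids) > 1}
    # Drop groups that are strict subsets of a larger group.
    maximal = [g for g in groups if not any(g < other for other in groups)]
    return sorted(sorted(g) for g in maximal)


# Five first local features: record "a" is shared by 1 and 2, record "b" by
# 2, 3 and 4, and record "c" by 4 and 5; "p" and "q" are not shared.
five = {1: {"a", "p"}, 2: {"a", "b"}, 3: {"b", "q"}, 4: {"b", "c"}, 5: {"c"}}
first_feature_spaces = group_local_features(five)
```

For the five local features above, the sketch yields three groups, matching first feature spaces 1, 2 and 3 in the example.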
According to the embodiment of the invention, the first local features corresponding to the first data spaces with the same features are clustered to obtain the first feature spaces, so that the periodic features of the multidimensional operation and maintenance data included in the first feature spaces are more obvious.
In accordance with an embodiment of the present invention, before the multidimensional operation and maintenance data samples related to the running state of the database are allocated to the plurality of first data spaces according to the first preset rule, the operation and maintenance data analysis method 200 further comprises the following operations:
acquiring original operation and maintenance data related to the running state of the database;
and preprocessing the original operation and maintenance data to obtain the multidimensional operation and maintenance data samples.
According to the embodiment of the invention, the original operation and maintenance data can be log type operation and maintenance data output by the database, and can also be operation and maintenance data obtained by detecting the database in real time by detection software.
According to an embodiment of the present invention, the original operation and maintenance data is operation and maintenance big data with time series as main characteristics.
According to the embodiment of the invention, preprocessing the original operation and maintenance data comprises dimension division and data normalization processing on the original operation and maintenance data.
According to the embodiment of the invention, the normalization processing of the original operation and maintenance data can be unified operation on the scales of the data with different dimensions in the original operation and maintenance data.
According to the embodiment of the invention, for example, in the case that the original operation and maintenance data has 4 types of operation and maintenance data, the 4 types of operation and maintenance data are divided into A, B, C, D, at this time, the data corresponding to the type a can be put into the first dimension of the multi-dimensional operation and maintenance data, the data corresponding to the type B can be put into the second dimension of the multi-dimensional operation and maintenance data, the data corresponding to the type C can be put into the third dimension of the multi-dimensional operation and maintenance data, the data corresponding to the type D can be put into the fourth dimension of the multi-dimensional operation and maintenance data, and then the multi-dimensional operation and maintenance data can be normalized to obtain a multi-dimensional operation and maintenance data sample.
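The dimension division and normalization in this example can be sketched as follows. Min-max scaling is an assumption: the source only states that normalization unifies the scales of the different dimensions. The function name and the dictionary layout of the raw data are likewise hypothetical.

```python
def preprocess(raw):
    """Turn per-type readings into normalized multidimensional samples.

    raw maps a type name (e.g. "A".."D") to a list of readings; all lists
    are assumed to have the same length and to be aligned in time.
    """
    columns = []
    for name in sorted(raw):            # one dimension per type, in name order
        column = raw[name]
        low, high = min(column), max(column)
        span = high - low
        # Min-max scale to [0, 1]; a constant column maps to 0.0.
        columns.append([(v - low) / span if span else 0.0 for v in column])
    return list(zip(*columns))          # one tuple per multidimensional sample


raw = {"A": [0, 50, 100], "B": [1, 2, 3], "C": [10, 10, 10], "D": [0, 1, 0]}
multidimensional_samples = preprocess(raw)
```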
According to the embodiment of the invention, the original operation and data are preprocessed to obtain the multi-dimensional operation and data sample, and the normalized multi-dimensional operation and data are obtained, so that the multi-dimensional operation and data in the multi-dimensional operation and data sample can be uniformly processed by using a clustering algorithm.
According to an embodiment of the invention, the original operation and maintenance data related to the operation state of the database includes: access to the ports of the database, calls to the memory of the database, whether the ports of the database are under attack, and the memory storage of the database.
Fig. 3 shows another flow chart of the operation and maintenance data analysis method according to an embodiment of the invention.
As can be seen from fig. 3, the operation and maintenance data analysis method includes obtaining original operation and maintenance data 311 related to the operation state of the database and preprocessing the original operation and maintenance data 311 to obtain multidimensional operation and maintenance data samples 321. The multidimensional operation and maintenance data samples 321 are then distributed to a plurality of first data spaces according to a first preset rule, yielding first data space 1 (331), first data space 2 (332), through first data space n (333). Next, a clustering algorithm (for example, one using the Mahalanobis distance) clusters the data in first data space 1 (331) to obtain first local feature 1 (341), the data in first data space 2 (332) to obtain first local feature 2 (342), and the data in first data space n (333) to obtain first local feature n (343).
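The allocation step can be pictured with a short Python sketch. The first preset rule is not spelled out in this passage, so the round-robin rule below, like the function name `allocate`, is purely a hypothetical stand-in chosen for illustration.

```python
def allocate(samples, n_spaces):
    """Distribute samples over n_spaces data spaces.

    Round-robin assignment is an assumed placeholder for the
    'first preset rule', which the text does not define.
    """
    if n_spaces < 1:
        raise ValueError("n_spaces must be at least 1")
    spaces = [[] for _ in range(n_spaces)]
    for index, sample in enumerate(samples):
        spaces[index % n_spaces].append(sample)
    return spaces


if __name__ == "__main__":
    samples = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.1), (0.8, 0.9), (0.5, 0.5)]
    for i, space in enumerate(allocate(samples, 2), start=1):
        print(f"first data space {i}: {space}")
```

Each resulting data space can then be clustered independently, which is what allows the local features to be extracted in parallel.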
As can be seen from fig. 3, when some of the first local features share features that are identical, or whose similarity exceeds a preset similarity threshold, those first local features are clustered to obtain a plurality of first feature spaces, namely first feature space 1 (351), first feature space 2 (352), through first feature space n (353).
As can be seen from fig. 3, identical multidimensional operation and maintenance data exists, as a shared feature, between first local feature 1 (341) and first local feature 2 (342), between first local feature 1 (341) and first local feature n (343), and between first local feature 2 (342) and first local feature n (343). Using a clustering algorithm (for example, one using the Mahalanobis distance), first local feature 1 (341) and first local feature 2 (342) are clustered to obtain first feature space 1 (351), first local feature 1 (341) and first local feature n (343) are clustered to obtain first feature space k (353), and first local feature 2 (342) and first local feature n (343) are clustered to obtain first feature space 2 (352).
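The pairing of local features that share data can be sketched as follows. The representation is an assumption made for illustration: each first local feature is modeled as a Python set of data items, two local features are taken to share a feature when their sets intersect, and the function name `build_feature_spaces` is hypothetical.

```python
from itertools import combinations


def build_feature_spaces(local_features):
    """Form one feature space per pair of local features that share data.

    local_features: list of sets of hashable data items (assumed layout).
    """
    spaces = []
    for (i, a), (j, b) in combinations(enumerate(local_features), 2):
        shared = a & b
        if shared:
            spaces.append({
                "members": (i, j),   # indices of the paired local features
                "shared": shared,    # the data items both have in common
                "data": a | b,       # everything covered by the feature space
            })
    return spaces


if __name__ == "__main__":
    lf1, lf2, lfn = {"x", "y"}, {"y", "z"}, {"z", "x"}
    for space in build_feature_spaces([lf1, lf2, lfn]):
        print(space["members"], sorted(space["shared"]))
```

With three mutually overlapping local features, as in fig. 3, this yields three feature spaces, one for each pair.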
As can be seen from fig. 3, the plurality of first feature spaces are clustered by a clustering algorithm (for example, K-means combined with the Mahalanobis distance) to obtain a first global feature 361.
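For reference, the Mahalanobis distance mentioned as an example metric can be computed with the Python standard library alone. The sketch below is illustrative and makes its own assumptions: the sample covariance (divisor n - 1) is estimated from the points themselves, and the helper names are hypothetical. It covers the distance only, not the clustering built on top of it.

```python
import math


def mean_vector(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def covariance(points):
    """Sample covariance matrix; requires at least two points."""
    n, dim = len(points), len(points[0])
    mu = mean_vector(points)
    cov = [[0.0] * dim for _ in range(dim)]
    for p in points:
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += (p[i] - mu[i]) * (p[j] - mu[j])
    return [[c / (n - 1) for c in row] for row in cov]


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def mahalanobis(x, y, cov):
    """sqrt((x - y)^T cov^-1 (x - y)), without forming the inverse."""
    diff = [a - b for a, b in zip(x, y)]
    return math.sqrt(sum(d * s for d, s in zip(diff, solve(cov, diff))))


if __name__ == "__main__":
    pts = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
    print(mahalanobis(pts[0], pts[3], covariance(pts)))
```

Unlike the Euclidean distance, this metric discounts directions in which the data varies strongly, which matters when dimensions are correlated or retain different spreads after normalization.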
As can be seen from fig. 3, the multidimensional operation and maintenance data samples 321 are also distributed to a plurality of second data spaces according to expert experience, and the multidimensional operation and maintenance data in the plurality of second data spaces is then clustered to obtain a second global feature 371.
As can be seen from fig. 3, the second global feature 371 serves as the reference against which the first global feature 361 is compared. When the similarity between the first global feature 361 and the second global feature 371 is greater than a preset threshold, the first global feature 361 is taken as the optimal global feature 382 and the first local features corresponding to the first global feature 361 are taken as the optimal local features 381. The optimal global feature 382 and the optimal local features 381 are input into a machine learning algorithm, which outputs the operation state 391 of the database.
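The acceptance test on the first global feature can be sketched as below. The similarity measure is not specified in this passage, so cosine similarity over feature vectors is an assumption, as are the function names and the default threshold value of 0.9.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_optimal_features(first_global, second_global, first_locals,
                            threshold=0.9):
    """Accept the first global feature only if it is close enough to the
    expert-derived second global feature."""
    sim = cosine_similarity(first_global, second_global)
    if sim > threshold:
        return {"accepted": True, "similarity": sim,
                "optimal_global": first_global,
                "optimal_locals": first_locals}
    return {"accepted": False, "similarity": sim}


if __name__ == "__main__":
    result = select_optimal_features([0.9, 0.1], [1.0, 0.0], [[0.8, 0.2]])
    print(result["accepted"], round(result["similarity"], 3))
```

Only an accepted pair of global and local features would be passed on to the downstream machine learning algorithm; a rejected result signals that the allocation needs to be revisited.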
According to an embodiment of the invention, as can be seen from fig. 3, the operation and maintenance data analysis method clusters the multidimensional operation and maintenance data related to the operation state of the database once to obtain the first local features, and then clusters the first local features multiple times to obtain the first global feature. This at least partially overcomes the technical problems of low efficiency and poor operation and maintenance results when processing multidimensional operation and maintenance data in the related art, and speeds up the extraction of the first local features and the first global feature. Because the first global feature is derived from the first local features, the two are closely linked and the first global feature is more representative of the data as a whole, so the operation state of the database can be determined quickly and accurately from the first global feature and the plurality of first local features.
Fig. 4 shows a block diagram of an operation and maintenance data analysis apparatus according to an embodiment of the invention.
As shown in fig. 4, the operation and maintenance data analysis apparatus 400 of this embodiment includes a first space allocation module 410, a first local feature obtaining module 420, a first feature space obtaining module 430, a first global feature obtaining module 440, and an operation state determining module 450.
The first space allocation module 410 is configured to allocate multidimensional operation and maintenance data samples related to the operation state of the database to a plurality of first data spaces according to a first preset rule.
The first local feature obtaining module 420 is configured to cluster, for each of the plurality of first data spaces, the multidimensional operation and maintenance data in that first data space to obtain a first local feature.
The first feature space obtaining module 430 is configured to cluster the first local features corresponding to the plurality of first data spaces to obtain a plurality of first feature spaces.
The first global feature obtaining module 440 is configured to cluster the plurality of first feature spaces to obtain a first global feature.
The first operation state determining module 450 is configured to determine the operation state of the database according to the first global feature and the plurality of first local features.
According to an embodiment of the invention, the operation and maintenance data analysis apparatus 400 further includes:
a second space allocation module, configured to allocate the multidimensional operation and maintenance data samples to a plurality of second data spaces according to expert experience; and
a second global feature obtaining module, configured to cluster the multidimensional operation and maintenance data in the plurality of second data spaces to obtain a second global feature.
The operation state determining module 450 includes an operation state obtaining sub-module.
The operation state obtaining sub-module is configured to determine the operation state of the database according to the first global feature, the second global feature and the plurality of first local features.
According to an embodiment of the invention, the operation state obtaining sub-module includes:
a first similarity calculation unit, configured to calculate a first similarity between the second global feature and the first global feature; and
an operation state output unit, configured to input the first global feature and the plurality of first local features into a machine learning algorithm and output the operation state of the database when the first similarity is greater than a first threshold.
According to an embodiment of the invention, the operation and maintenance data analysis apparatus 400 further includes:
a third space allocation module, configured to allocate the multidimensional operation and maintenance data samples to a plurality of third data spaces according to a second preset rule when the first similarity is less than or equal to the first threshold, the number of third data spaces being greater than the number of first data spaces;
a second local feature obtaining module, configured to cluster, for each of the plurality of third data spaces, the multidimensional operation and maintenance data in that third data space to obtain a second local feature;
a second feature space obtaining module, configured to cluster the second local features corresponding to the plurality of third data spaces to obtain a plurality of second feature spaces;
a third global feature obtaining module, configured to cluster the plurality of second feature spaces to obtain a third global feature; and
a second operation state determining module, configured to determine the operation state of the database according to the third global feature and the plurality of second local features.
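The fallback to a larger number of data spaces can be sketched as a small control-flow wrapper. Everything here is hypothetical scaffolding: `run_pipeline` stands for the allocate, cluster and aggregate chain (returning a global feature and its local features), `similarity` for the unspecified similarity measure, and the parameter names are chosen only for illustration.

```python
def analyze_with_fallback(samples, second_global, run_pipeline, similarity,
                          first_threshold, n_first, n_third):
    """Use the first global feature if it matches the expert-derived one;
    otherwise redo the analysis with more data spaces.

    run_pipeline(samples, n_spaces) -> (global_feature, local_features)
    similarity(a, b) -> float
    """
    if n_third <= n_first:
        raise ValueError("the third data spaces must outnumber the first")
    first_global, first_locals = run_pipeline(samples, n_first)
    if similarity(first_global, second_global) > first_threshold:
        return first_global, first_locals
    # First similarity <= first threshold: repartition more finely and
    # return the third global feature with the second local features.
    return run_pipeline(samples, n_third)


if __name__ == "__main__":
    # Toy stand-ins so the control flow can be exercised end to end.
    def toy_pipeline(samples, n_spaces):
        return [float(n_spaces)], [[float(n_spaces)]]

    def toy_similarity(a, b):
        return 1.0 if a == b else 0.0

    print(analyze_with_fallback([], [9.0], toy_pipeline, toy_similarity,
                                0.5, n_first=2, n_third=4))
```

The sketch performs a single retry, mirroring the modules listed above; whether further refinement rounds follow is not stated in this passage.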
According to an embodiment of the invention, the second feature space obtaining module includes a second feature space obtaining sub-module.
The second feature space obtaining sub-module is configured to cluster the second local features corresponding to those third data spaces that share the same features, to obtain the plurality of second feature spaces.
According to an embodiment of the invention, the first local feature obtaining module 420 includes a second similarity calculation sub-module and a first local feature obtaining sub-module.
The second similarity calculation sub-module is configured to calculate a second similarity between the multidimensional operation and maintenance data in the first data space.
The first local feature obtaining sub-module is configured to group the multidimensional operation and maintenance data corresponding to the second similarity into one class, to obtain the first local feature, when the second similarity is greater than a second threshold.
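The threshold rule of the two sub-modules above can be illustrated with a greedy single-pass grouping. This is a sketch under stated assumptions: the second similarity is taken to be 1 / (1 + Euclidean distance), each group is represented by its first member, and the function names are hypothetical; the text itself does not fix any of these choices.

```python
import math


def second_similarity(x, y):
    """Assumed similarity in (0, 1]: 1 for identical points, smaller when
    the points are farther apart."""
    return 1.0 / (1.0 + math.dist(x, y))


def group_by_similarity(points, second_threshold):
    """Put a point into the first group whose representative it resembles
    closely enough; otherwise start a new group."""
    groups = []
    for point in points:
        for group in groups:
            if second_similarity(point, group[0]) > second_threshold:
                group.append(point)
                break
        else:
            # No existing group was similar enough.
            groups.append([point])
    return groups


if __name__ == "__main__":
    data_space = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
    for group in group_by_similarity(data_space, second_threshold=0.8):
        print(group)
```

Each resulting group plays the role of one class of similar operation and maintenance data from which a first local feature can be derived. The result of a greedy pass depends on the order of the points, which a production clustering algorithm would normally avoid.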
According to an embodiment of the invention, the first feature space obtaining module includes a first feature space obtaining sub-module.
The first feature space obtaining sub-module is configured to cluster the first local features corresponding to those first data spaces that share the same features, to obtain the plurality of first feature spaces.
According to an embodiment of the invention, for use before the multidimensional operation and maintenance data samples related to the operation state of the database are allocated to the plurality of first data spaces according to the first preset rule, the operation and maintenance data analysis apparatus 400 further includes an original operation and maintenance data acquisition module and a multidimensional operation and maintenance data sample acquisition module.
The original operation and maintenance data acquisition module is configured to acquire the original operation and maintenance data related to the operation state of the database.
The multidimensional operation and maintenance data sample acquisition module is configured to preprocess the original operation and maintenance data to obtain the multidimensional operation and maintenance data samples.
According to an embodiment of the invention, the original operation and maintenance data related to the operation state of the database includes: access to the ports of the database, calls to the memory of the database, whether the ports of the database are under attack, and the memory storage of the database.
According to embodiments of the invention, any number of the modules, sub-modules, units and sub-units, or at least part of the functionality of any number of them, may be implemented in one module. Any one or more of the modules, sub-modules, units and sub-units may be split into multiple modules for implementation. Any one or more of the modules, sub-modules, units and sub-units may be implemented at least in part as hardware circuitry, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system-on-chip, a system-on-substrate, a system-on-package or an Application Specific Integrated Circuit (ASIC), or in hardware or firmware in any other reasonable manner of integrating or packaging circuitry, or in any one of, or any suitable combination of, software, hardware and firmware. Alternatively, one or more of the modules, sub-modules, units and sub-units may be implemented at least in part as computer program modules which, when run, perform the corresponding functions.
For example, any number of the first space allocation module 410, the first local feature obtaining module 420, the first feature space obtaining module 430, the first global feature obtaining module 440 and the operation state determining module 450 may be combined and implemented in one module/unit/sub-unit, or any one of them may be split into multiple modules/units/sub-units. Alternatively, at least some of the functionality of one or more of these modules/units/sub-units may be combined with at least some of the functionality of other modules/units/sub-units and implemented in one module/unit/sub-unit. According to an embodiment of the invention, at least one of the first space allocation module 410, the first local feature obtaining module 420, the first feature space obtaining module 430, the first global feature obtaining module 440 and the operation state determining module 450 may be implemented at least in part as hardware circuitry, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system-on-chip, a system-on-substrate, a system-on-package or an Application Specific Integrated Circuit (ASIC), or in hardware or firmware in any other reasonable manner of integrating or packaging circuitry, or in any one of, or any suitable combination of, software, hardware and firmware. Alternatively, at least one of the first space allocation module 410, the first local feature obtaining module 420, the first feature space obtaining module 430, the first global feature obtaining module 440 and the operation state determining module 450 may be implemented at least in part as a computer program module which, when run, performs the corresponding functions.
It should be noted that the operation and maintenance data analysis apparatus in the embodiments of the invention corresponds to the operation and maintenance data analysis method in the embodiments of the invention. For details of the apparatus, refer to the description of the method, which is not repeated here.
Fig. 5 shows a block diagram of an electronic device adapted to implement the operation and maintenance data analysis method according to an embodiment of the invention.
As shown in fig. 5, an electronic device 500 according to an embodiment of the present invention includes a processor 501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. The processor 501 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 501 may also include on-board memory for caching purposes. The processor 501 may comprise a single processing unit or a plurality of processing units for performing different actions of the method flow according to an embodiment of the invention.
In the RAM 503, various programs and data required for the operation of the electronic apparatus 500 are stored. The processor 501, ROM 502, and RAM 503 are connected to each other by a bus 504. The processor 501 performs various operations of the method flow according to an embodiment of the present invention by executing programs in the ROM 502 and/or the RAM 503. Note that the program may be stored in one or more memories other than the ROM 502 and the RAM 503. The processor 501 may also perform various operations of the method flow according to embodiments of the present invention by executing programs stored in the one or more memories.
According to an embodiment of the invention, the electronic device 500 may further comprise an input/output (I/O) interface 505, the input/output (I/O) interface 505 also being connected to the bus 504. The electronic device 500 may also include one or more of the following components connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output portion 507 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker, and the like; a storage portion 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet. The drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as needed so that a computer program read therefrom is mounted into the storage section 508 as needed.
The present invention also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present invention.
According to embodiments of the present invention, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read Only Memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the invention, the computer-readable storage medium may include the ROM 502 and/or the RAM 503 and/or one or more memories other than the ROM 502 and the RAM 503 described above.
Embodiments of the present invention also include a computer program product comprising a computer program containing program code for performing the method shown in the flowcharts. When the computer program product runs on a computer system, the program code causes the computer system to carry out the operation and maintenance data analysis method provided by the embodiments of the present invention.
The above-described functions defined in the system/apparatus of the embodiment of the present invention are performed when the computer program is executed by the processor 501. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the invention.
In one embodiment, the computer program may be stored on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed in the form of a signal over a network medium, downloaded and installed via the communication portion 509, and/or installed from the removable medium 511. The computer program may include program code that may be transmitted using any appropriate network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 509, and/or installed from the removable media 511. The above-described functions defined in the system of the embodiment of the present invention are performed when the computer program is executed by the processor 501. The systems, devices, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the invention.
According to embodiments of the present invention, program code for carrying out the computer programs provided by embodiments of the present invention may be written in any combination of one or more programming languages. In particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or in assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, C and similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device and partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., via the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments of the invention and/or in the claims may be combined and/or incorporated in various ways, even if such combinations or incorporations are not explicitly recited in the invention. In particular, the features recited in the various embodiments of the invention and/or in the claims can be combined and/or incorporated in various ways without departing from the spirit and teachings of the invention. All such combinations and/or incorporations fall within the scope of the invention.
The embodiments of the present invention are described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the invention is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the invention, and such alternatives and modifications are intended to fall within the scope of the invention.

Claims (12)

1. An operation and maintenance data analysis method, the method comprising:
distributing multidimensional operation and maintenance data samples related to the operation state of a database to a plurality of first data spaces according to a first preset rule;
clustering, for each of the plurality of first data spaces, the multidimensional operation and maintenance data in the first data space to obtain a first local feature;
clustering the first local features corresponding to the plurality of first data spaces to obtain a plurality of first feature spaces;
clustering the plurality of first feature spaces to obtain a first global feature;
and determining the operation state of the database according to the first global feature and the plurality of first local features.
2. The method according to claim 1, wherein the method further comprises:
distributing the multidimensional operation and maintenance data samples to a plurality of second data spaces according to expert experience;
clustering the multidimensional operation and maintenance data in the plurality of second data spaces to obtain a second global feature;
wherein said determining the operation state of the database according to the first global feature and the plurality of first local features comprises:
determining the operation state of the database according to the first global feature, the second global feature and the plurality of first local features.
3. The method of claim 2, wherein said determining the operation state of the database according to the first global feature, the second global feature and the plurality of first local features comprises:
calculating a first similarity between the second global feature and the first global feature;
and, when the first similarity is greater than a first threshold, inputting the first global feature and the plurality of first local features into a machine learning algorithm and outputting the operation state of the database.
4. The method according to claim 3, wherein the method further comprises:
when the first similarity is less than or equal to the first threshold, distributing the multidimensional operation and maintenance data samples to a plurality of third data spaces according to a second preset rule, wherein the number of the third data spaces is greater than the number of the first data spaces;
clustering, for each of the plurality of third data spaces, the multidimensional operation and maintenance data in the third data space to obtain a second local feature;
clustering the second local features corresponding to the plurality of third data spaces to obtain a plurality of second feature spaces;
clustering the plurality of second feature spaces to obtain a third global feature;
and determining the operation state of the database according to the third global feature and the plurality of second local features.
5. The method of claim 4, wherein clustering the second local features corresponding to the plurality of third data spaces to obtain a plurality of second feature spaces comprises:
clustering the second local features corresponding to those third data spaces that share the same features to obtain the plurality of second feature spaces.
6. The method of claim 1, wherein said clustering, for each of the plurality of first data spaces, the multidimensional operation and maintenance data in the first data space to obtain a first local feature comprises:
calculating a second similarity between the multidimensional operation and maintenance data in the first data space;
and, when the second similarity is greater than a second threshold, grouping the multidimensional operation and maintenance data corresponding to the second similarity into one class to obtain the first local feature.
7. The method of claim 1, wherein clustering the first local features corresponding to the plurality of first data spaces to obtain a plurality of first feature spaces comprises:
clustering the first local features corresponding to those first data spaces that share the same features to obtain the plurality of first feature spaces.
8. The method of claim 1, wherein before said distributing the multidimensional operation and maintenance data samples related to the operation state of the database to the plurality of first data spaces according to the first preset rule, the method further comprises:
acquiring original operation and maintenance data related to the operation state of the database;
and preprocessing the original operation and maintenance data to obtain the multidimensional operation and maintenance data samples.
9. The method of claim 8, wherein the original operation and maintenance data related to the operation state of the database comprises: access to the ports of the database, calls to the memory of the database, whether the ports of the database are under attack, and the memory storage of the database.
10. An operation and maintenance data analysis device, the device comprising:
a first space allocation module, configured to allocate multidimensional operation and maintenance data samples related to the operation state of a database to a plurality of first data spaces according to a first preset rule;
a first local feature obtaining module, configured to cluster, for each of the plurality of first data spaces, the multidimensional operation and maintenance data in the first data space to obtain a first local feature;
a first feature space obtaining module, configured to cluster the first local features corresponding to the plurality of first data spaces to obtain a plurality of first feature spaces;
a first global feature obtaining module, configured to cluster the plurality of first feature spaces to obtain a first global feature;
and a first operation state determining module, configured to determine the operation state of the database according to the first global feature and the plurality of first local features.
11. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-9.
12. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to any of claims 1 to 9.
CN202310265533.1A 2023-03-20 2023-03-20 Fortune dimension analysis method, device, equipment and medium Active CN115981970B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310265533.1A CN115981970B (en) 2023-03-20 2023-03-20 Fortune dimension analysis method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310265533.1A CN115981970B (en) 2023-03-20 2023-03-20 Fortune dimension analysis method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN115981970A CN115981970A (en) 2023-04-18
CN115981970B true CN115981970B (en) 2023-05-16

Family

ID=85972540

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310265533.1A Active CN115981970B (en) 2023-03-20 2023-03-20 Fortune dimension analysis method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN115981970B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117093850A (en) * 2023-08-25 2023-11-21 鱼快创领智能科技(南京)有限公司 Feature extraction method of driving data based on topology analysis

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108446200A (en) * 2018-02-07 2018-08-24 福建星瑞格软件有限公司 Server intelligence O&M method based on big data machine learning and computer equipment
CN110443264A (en) * 2018-05-03 2019-11-12 北京京东尚科信息技术有限公司 A kind of method and apparatus of cluster
US10572778B1 (en) * 2019-03-15 2020-02-25 Prime Research Solutions LLC Machine-learning-based systems and methods for quality detection of digital input
CN113051452A (en) * 2021-04-12 2021-06-29 清华大学 Operation and maintenance data feature selection method and device
CN113535673A (en) * 2020-04-17 2021-10-22 北京京东振世信息技术有限公司 Method and device for generating configuration file and processing data
CN114612514A (en) * 2022-03-14 2022-06-10 西安邮电大学 Multi-feature multi-resolution track anomaly detection method
CN114897074A (en) * 2022-05-13 2022-08-12 北京纪新泰富机电技术股份有限公司 Method and device for determining running state of equipment, equipment and storage medium

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US20070022120A1 (en) * 2005-07-25 2007-01-25 Microsoft Corporation Caching and modifying portions of a multi-dimensional database on a user device
KR101003842B1 (en) * 2008-10-24 2010-12-23 연세대학교 산학협력단 Method and system of clustering for multi-dimensional data streams
US20120137367A1 (en) * 2009-11-06 2012-05-31 Cataphora, Inc. Continuous anomaly detection based on behavior modeling and heterogeneous information analysis
US10223191B2 (en) * 2016-07-20 2019-03-05 International Business Machines Corporation Anomaly detection in performance management

Non-Patent Citations (1)

Title
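Application of AI Deep Learning in the Detection and Classification of Abnormal Cells in Mobile Networks; Wang Yong et al.; Designing Techniques of Posts and Telecommunications (《邮电设计技术》); pp. 1-5 *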

Also Published As

Publication number Publication date
CN115981970A (en) 2023-04-18

Similar Documents

Publication Publication Date Title
WO2021174944A1 (en) Message push method based on target activity, and related device
CN111814910B (en) Abnormality detection method, abnormality detection device, electronic device, and storage medium
CN112236761B (en) Dynamic delta updating of data cubes
WO2015148159A1 (en) Determining a temporary transaction limit
CN107392259B (en) Method and device for constructing unbalanced sample classification model
CN115981970B (en) Fortune dimension analysis method, device, equipment and medium
CN112016793B (en) Resource allocation method and device based on target user group and electronic equipment
CN111210109A (en) Method and device for predicting user risk based on associated user and electronic equipment
CN116155628B (en) Network security detection method, training device, electronic equipment and medium
CN116701935A (en) Sensitivity prediction model training method, sensitivity information processing method and sensitivity information processing device
CN114218283A (en) Abnormality detection method, apparatus, device, and medium
CN114358024A (en) Log analysis method, apparatus, device, medium, and program product
CN114443663A (en) Data table processing method, device, equipment and medium
CN113052509A (en) Model evaluation method, model evaluation apparatus, electronic device, and storage medium
CN113986671A (en) Operation and maintenance data anomaly detection method, device, equipment and medium
CN113657947B (en) Data processing method and device executed by electronic equipment and electronic equipment
CN114219053B (en) User position information processing method and device and electronic equipment
CN116467613A (en) Application classification method and device, electronic equipment and computer readable storage medium
CN113674011A (en) Data processing method, device, computing equipment and medium for user behaviors
CN115203502A (en) Business data processing method and device, electronic equipment and storage medium
CN118796613A (en) Database alarm method and device
CN116680308A (en) Database query method and device, electronic equipment and computer readable storage medium
CN113570113A (en) Equipment loss prediction method and device and electronic equipment
CN116010952A (en) Dynamic baseline determination method, transaction data detection method, device and electronic equipment
CN117540140A (en) Method, device and equipment for determining probability distribution information of renewable energy sources

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant