CN111708682B - Data prediction method, device, equipment and storage medium - Google Patents


Info

Publication number
CN111708682B
Authority
CN
China
Prior art keywords
data
sample
monitoring
features
service data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010552482.7A
Other languages
Chinese (zh)
Other versions
CN111708682A (en
Inventor
余国良
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202010552482.7A
Publication of CN111708682A
Application granted
Publication of CN111708682B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/327 Alarm or error message display
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored

Abstract

Embodiments of the present application provide a data prediction method, apparatus, device, and storage medium. The method comprises: acquiring service data; performing feature extraction on the service data based on index features of a monitoring index to obtain data features of the service data, wherein the index features include independent features and joint features, and each joint feature is obtained by a nonlinear combination of at least two independent features; predicting the data features with a machine learning model based on a sample space of the monitoring index to obtain a prediction result for the service data, wherein the data features of each sample in the sample space include a first feature value, which is a feature value of an independent feature, and a second feature value, which is a feature value of a joint feature; and generating and outputting alarm information when the service data is determined to be abnormal according to the prediction result. The method removes the strong dependence on monitoring and alarm rules, avoids the complexity and limitations of rule configuration, and can improve the accuracy of monitoring alarms.

Description

Data prediction method, device, equipment and storage medium
Technical Field
The embodiments of the present application relate to, but are not limited to, the field of computer technology, and in particular to a data prediction method, apparatus, device, and storage medium.
Background
In the internet era, to keep services running normally, the data of a service system or its service indexes must be tracked and monitored comprehensively, so that the system state can be observed in real time and abnormalities of the service system can be discovered and handled promptly, thereby ensuring the correctness and stability of the system.
In the related art, monitoring rules are configured to screen the service index data to be monitored, so that abnormalities of the service system can be found. This requires the person responsible for each service to pre-configure, in the monitoring system, monitoring rules for specific abnormal conditions according to the characteristics of that service. Defining rules manually is tedious and error-prone, and the relevant thresholds are hard to determine: a threshold that is too tight increases the false alarm rate, while a threshold that is too loose affects system stability. Furthermore, this monitoring approach cannot discover abnormalities that fall outside the rules.
Disclosure of Invention
Embodiments of the present application provide a data prediction method, apparatus, device, and computer-readable storage medium. By using model prediction, the service data of a big data acquisition and processing system is predicted based on a sample space of a monitoring index, so that the service data, and in turn the running state of the big data acquisition and processing system, can be monitored. This avoids the cost of manually configuring rules and the alarm limitations that rules impose, while improving the accuracy of model prediction and therefore the accuracy of monitoring alarms.
The technical scheme of the embodiment of the application is realized as follows:
the embodiment of the application provides a data prediction method, which is applied to a monitoring system to monitor the running state of a big data acquisition and processing system, and comprises the following steps:
acquiring service data of the big data acquisition and processing system;
performing feature extraction on the service data based on the index features of the monitoring index to obtain the data features of the service data; the index features are used to represent the data volume or the data quality of the service data; the index features comprise independent features and joint features; each joint feature is obtained by a nonlinear combination of at least two independent features;
predicting the data features by using a machine learning model based on the sample space of the monitoring index to obtain a prediction result of the service data; the data features of each sample in the sample space comprise a first feature value and a second feature value; the first feature value is a feature value of an independent feature; the second feature value is a feature value of a joint feature;
when the data volume or the data quality of the service data is determined to be abnormal according to the prediction result, determining that the operation state of the big data acquisition and processing system is abnormal, and generating alarm information;
and outputting the alarm information.
The embodiment of the present application provides a data prediction apparatus, which is applied to a monitoring system to monitor an operation state of a big data acquisition and processing system, and includes:
the acquisition module is used for acquiring the service data of the big data acquisition and processing system;
the extraction module is used for performing feature extraction on the service data based on the index features of the monitoring index to obtain the data features of the service data; the index features are used to represent the data volume or the data quality of the service data; the index features comprise independent features and joint features; each joint feature is obtained by a nonlinear combination of at least two independent features;
the prediction module is used for predicting the data features by using a machine learning model based on the sample space of the monitoring index to obtain a prediction result of the service data; the data features of each sample in the sample space comprise a first feature value and a second feature value; the first feature value is a feature value of an independent feature; the second feature value is a feature value of a joint feature;
an alarm module for: when the data volume or the data quality of the service data is determined to be abnormal according to the prediction result, determining that the operation state of the big data acquisition and processing system is abnormal, and generating alarm information; and outputting the alarm information.
The embodiment of the application provides a data prediction device, including:
a memory for storing executable instructions; and the processor is used for realizing the method when executing the executable instructions stored in the memory.
Embodiments of the present application provide a computer-readable storage medium storing executable instructions for causing a processor to implement the above-mentioned method when executed.
The embodiment of the application has the following beneficial effects:
By using model prediction, the service data of the big data acquisition and processing system is predicted based on the sample space of the monitoring index. This makes it possible to monitor the service data, and in turn the running state of the big data acquisition and processing system, and to raise an alarm when an abnormal running state is detected. Because this process no longer requires cumbersome monitoring rules to be configured and can learn automatically from the sample space, it avoids the cost of manually configuring rules and the alarm limitations that rules impose, while improving the accuracy of model prediction and therefore the accuracy of monitoring alarms. In addition, because joint features are introduced, the nonlinear relationships among multiple features can be taken into account during prediction, which can further improve the accuracy of model prediction.
Drawings
FIG. 1 is a schematic flow chart of an application of a monitoring system in the related art;
FIG. 2A is a schematic diagram of an alternative architecture of a data prediction system according to an embodiment of the present application;
FIG. 2B is a schematic diagram of an alternative structure of the data prediction system applied to the blockchain system according to the embodiment of the present disclosure;
FIG. 2C is an optional schematic diagram of a block structure according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a server provided in an embodiment of the present application;
FIG. 4 is a schematic flow chart diagram illustrating an alternative data prediction method according to an embodiment of the present disclosure;
FIG. 5 is a schematic flow chart diagram illustrating an alternative data prediction method according to an embodiment of the present disclosure;
FIG. 6 is a schematic flow chart diagram illustrating an alternative data prediction method according to an embodiment of the present disclosure;
FIG. 7 is a schematic flow chart diagram illustrating an alternative data prediction method according to an embodiment of the present disclosure;
FIG. 8 is a schematic flow chart illustrating an implementation of a model training method according to an embodiment of the present disclosure;
FIG. 9 is a schematic flow chart illustrating an implementation of a model prediction method according to an embodiment of the present disclosure;
FIG. 10 is a schematic diagram illustrating an application flow of a monitoring system according to an embodiment of the present application;
FIG. 11 is a schematic application scenario diagram of a data prediction method according to an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be regarded as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the embodiments of the present application belong. The terminology used in the embodiments of the present application is for the purpose of describing the embodiments of the present application only and is not intended to be limiting of the present application.
Before the embodiments of the present application are described in further detail, the terms and expressions used in the embodiments are explained as follows.
1) Support Vector Machine (SVM): a supervised learning model, with associated learning algorithms, that analyzes data for classification and regression analysis.
2) Data features: features obtained by processing the raw data, generally in the form of a vector.
3) Data acquisition: reading data from a data source and inputting it into the system.
4) Data cleaning: screening, completing, and formatting the raw data according to specific requirements.
5) Monitoring index: a target variable observed in the monitoring system.
6) Model training: obtaining model parameters from labeled data features through a specific algorithm.
7) Model prediction: using the trained model to compute the value corresponding to a feature.
In order to better understand the data prediction method provided in the embodiment of the present application, a description is first given of a monitoring method for a service index in the related art:
In the related art, monitoring is achieved by screening data with predefined rules. The person responsible for a service usually has to configure, in the monitoring system, data rules for specific abnormal conditions according to the characteristics of each service. For example, data meeting the following conditions is treated as abnormal data:
Rule 1: today's data volume / yesterday's data volume < 50%;
Rule 2: data integrity score < 0.9;
Rule 3: per unit time, the proportion of data whose field A value exceeds 100 is greater than 5%.
This monitoring method has the following disadvantages: 1) defining rules is tedious and error-prone; 2) the relevant thresholds are hard to determine: a threshold that is too tight increases the false alarm rate, while a threshold that is too loose affects system stability; 3) abnormalities outside the rules cannot be discovered.
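To make the rule-based approach concrete, the three example rules can be sketched as fixed checks in Python. This is only an illustration: the `DailyStats` type, its field names, and the rule names are hypothetical, and treating each rule as an independent trigger is an assumption, since the text does not say how the rules are combined.

```python
from dataclasses import dataclass


@dataclass
class DailyStats:
    """Hypothetical summary of the monitored data; field names are illustrative."""
    today_volume: int
    yesterday_volume: int
    integrity_score: float
    field_a_over_100_ratio: float  # share of data whose field A value exceeds 100


def triggered_rules(stats: DailyStats) -> list[str]:
    """Return the names of the hand-written rules that flag the data as abnormal."""
    hits = []
    # Rule 1: today's volume is less than 50% of yesterday's volume.
    if stats.yesterday_volume > 0 and stats.today_volume / stats.yesterday_volume < 0.5:
        hits.append("rule1_volume_drop")
    # Rule 2: the data integrity score is below 0.9.
    if stats.integrity_score < 0.9:
        hits.append("rule2_low_integrity")
    # Rule 3: more than 5% of the data has a field A value above 100.
    if stats.field_a_over_100_ratio > 0.05:
        hits.append("rule3_field_a_outliers")
    return hits
```

Every threshold here (0.5, 0.9, 0.05) is hard-coded, which is exactly the maintenance burden and blind spot described in the disadvantages above.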
Fig. 1 is a schematic diagram of the application flow of a monitoring system in the related art. As shown in fig. 1, the application flow of the monitoring system includes:
step S101, a monitoring system accesses data to be monitored from a data source;
step S102, configuring a monitoring index calculation rule through a configuration interface of a monitoring system by a user;
step S103, a user configures/modifies the alarm rule through a configuration interface of the monitoring system;
step S104, the user receives the alarm through the terminal;
step S105, the user judges whether the alarm is accurate, if so, the step S106 is entered, otherwise, the step S103 is returned to;
step S106, the user processes the alarm through the terminal.
Therefore, in the application process of the monitoring system, after the data in the data source is accessed into the monitoring system, the user needs to configure the index calculation rule and the alarm rule, and the monitoring system can be put into use after the configuration is completed. In the using process, if the user finds that the alarm is inaccurate, the alarm rule needs to be modified. When missing exceptions are found, the user also needs to add rules for the missing exceptions.
To address at least one of the above problems in the related art, an embodiment of the present application provides a data prediction method. First, service data of a big data acquisition and processing system is acquired. Then, feature extraction is performed on the service data based on the index features of a monitoring index to obtain the data features of the service data; the index features represent the data volume or the data quality of the service data and include independent features and joint features, where each joint feature is obtained by a nonlinear combination of at least two independent features. Next, the data features are predicted by a machine learning model based on a sample space of the monitoring index to obtain a prediction result for the service data; the data features of each sample in the sample space include a first feature value, which is a feature value of an independent feature, and a second feature value, which is a feature value of a joint feature. Finally, when the data volume or the data quality of the service data is determined to be abnormal according to the prediction result, the running state of the big data acquisition and processing system is determined to be abnormal, and alarm information is generated and output.
In this way, by using model prediction, the service data of the big data acquisition and processing system is predicted based on the sample space of the monitoring index, which makes it possible to monitor the service data and, in turn, the running state of the big data acquisition and processing system, and to raise an alarm when that running state is abnormal. The process requires no complex monitoring rules and can learn automatically from the sample space, so it avoids the cost of manually configuring rules and the alarm limitations that rules impose, and it can improve the accuracy of monitoring alarms. In addition, because joint features are introduced, the nonlinear relationships among multiple features can be taken into account during prediction, which can further improve the accuracy of model prediction.
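The prediction and alarm steps summarized above can be sketched as follows. This is a minimal illustration, not the method of the embodiment: a k-nearest-neighbor vote is swapped in as a stand-in for the machine learning model, and the labels, sample values, and function names are all made up. In each sample vector, the first two entries play the role of independent feature values and the third (their product) plays the role of a joint feature value.

```python
import math
from collections import Counter
from typing import Optional

# Each sample is (feature vector, label); values and labels are invented.
Sample = tuple[list[float], str]


def predict(sample_space: list[Sample], features: list[float], k: int = 3) -> str:
    """Stand-in model: majority label among the k nearest labeled samples."""
    nearest = sorted(sample_space, key=lambda s: math.dist(s[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def monitor(sample_space: list[Sample], features: list[float]) -> Optional[str]:
    """Return alarm text when the prediction is 'abnormal', otherwise None."""
    if predict(sample_space, features) == "abnormal":
        return f"ALARM: service data looks abnormal, features={features}"
    return None


SAMPLE_SPACE: list[Sample] = [
    ([1.00, 0.98, 0.98], "normal"),
    ([0.95, 0.97, 0.92], "normal"),
    ([1.05, 0.99, 1.04], "normal"),
    ([0.30, 0.95, 0.29], "abnormal"),
    ([0.90, 0.50, 0.45], "abnormal"),
    ([0.20, 0.40, 0.08], "abnormal"),
]
```

Calling `monitor(SAMPLE_SPACE, [0.25, 0.45, 0.11])` yields an alarm string, while a vector close to the normal samples yields `None`. No threshold is written by hand; the decision comes from the labeled samples.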
An exemplary application of the data prediction apparatus provided in the embodiment of the present application is described below, and the data prediction apparatus provided in the embodiment of the present application may be implemented as any terminal having an on-screen display function, such as a notebook computer, a tablet computer, a desktop computer, a mobile device (e.g., a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, a portable game device), an intelligent robot, or may be implemented as a server. Next, an exemplary application when the data prediction apparatus is implemented as a server will be explained.
Referring to fig. 2A, fig. 2A is a schematic diagram of an alternative architecture of the data prediction system 20 according to the embodiment of the present application. To support data prediction for any kind of service index data, the data prediction system 20 includes a terminal 100, a network 200, and a server 300. An application program runs on the terminal 100. When the data prediction method of the embodiment of the present application is implemented, the server 300 obtains real-time data of the monitoring index from a data source and performs feature extraction on the real-time data to obtain its data features, where the data features include independent features and joint features, and each joint feature is obtained by a nonlinear combination of at least two independent features. The server 300 then predicts the data features based on the sample space of the monitoring index to obtain a prediction result for the real-time data and feeds the prediction result back to the terminal 100 through the network 200. The prediction result may be presented as a view in the application program, and the terminal 100 may display the view on the current interface 100-1 to show the user the prediction result for the real-time data of the monitoring index. In some embodiments, the server 300 may further generate alarm information when it determines from the prediction result that the real-time data is abnormal, and feed the alarm information back to the terminal 100 through the network 200; after acquiring the alarm information, the terminal 100 displays it on the current interface 100-1 for the user to view and process.
In some embodiments, the terminal 100 may also receive a processing result for the alarm information input by the user and send the processing result to the server 300 through the network 200. The server 300 determines the processing state of the alarm information according to the processing result, where the processing state indicates whether the alarm information was rejected; labels the real-time data corresponding to the alarm information based on the processing state; generates a labeled sample from the labeled real-time data; and adds the labeled sample to the sample space of the monitoring index.
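The feedback loop described above, in which the user's handling of an alarm becomes a new labeled sample, might look like the sketch below. The `Alarm` and `SampleSpace` types are hypothetical, and mapping a rejected alarm to the label "normal" (and an accepted one to "abnormal") is an assumption; the text only says that the data is labeled based on the processing state.

```python
from dataclasses import dataclass, field


@dataclass
class Alarm:
    """Hypothetical alarm record kept by the server."""
    alarm_id: str
    features: list[float]  # data features of the real-time data that was flagged


@dataclass
class SampleSpace:
    """Labeled samples for one monitoring index."""
    samples: list[tuple[list[float], str]] = field(default_factory=list)

    def add_feedback(self, alarm: Alarm, rejected: bool) -> str:
        """Label the alarmed data from the user's processing result and store it."""
        # Assumption: a rejected alarm means the user judged the data to be normal.
        label = "normal" if rejected else "abnormal"
        self.samples.append((alarm.features, label))
        return label
```

A sample space that grows this way lets later predictions reflect the user's corrections without anyone editing a rule.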
The data prediction system 20 according to the embodiment of the present application may also be a distributed system 201 of a blockchain system. Referring to fig. 2B, fig. 2B is an optional structural schematic diagram of the data prediction system 20 applied to a blockchain system according to the embodiment of the present application. The distributed system 201 may be formed by a plurality of nodes 202 (computing devices of any form in an access network, such as servers and user terminals) and a client 203. A peer-to-peer (P2P) network is formed between the nodes, and the P2P protocol is an application layer protocol running on top of the Transmission Control Protocol (TCP). In the distributed system, any machine, such as a server or a terminal, can join and become a node; a node includes a hardware layer, a middle layer, an operating system layer, and an application layer.
Referring to the functions of each node in the blockchain system shown in fig. 2B, the functions involved include:
1) routing, a basic function that a node has, is used to support communication between nodes.
Besides the routing function, the node may also have the following functions:
2) Application, deployed in the blockchain to implement specific services according to actual service requirements; it records data related to the implemented functions to form record data, carries a digital signature in the record data to indicate the source of the task data, and sends the record data to other nodes in the blockchain system, so that the other nodes add the record data to a temporary block once the source and integrity of the record data have been verified.
For example, the services implemented by the application include:
2.1) Wallet, which provides electronic money transaction functions, including initiating a transaction (that is, sending the transaction record of the current transaction to other nodes in the blockchain system; after the other nodes verify it successfully, the record data of the transaction is stored in a temporary block of the blockchain as a response confirming that the transaction is valid). The wallet also supports querying the electronic money remaining at an electronic money address.
2.2) Shared ledger, which provides operations such as storing, querying, and modifying account data. The record data of an operation on the account data is sent to other nodes in the blockchain system; after the other nodes verify that the operation is valid, the record data is stored in a temporary block as a response acknowledging that the account data is valid, and a confirmation may be sent to the node that initiated the operation.
2.3) Smart contract, a computerized agreement that can enforce the terms of a contract. It is implemented by code deployed on the shared ledger and executed when certain conditions are met, and it is used to complete automated transactions according to actual service requirements, such as querying the logistics status of goods purchased by a buyer and transferring the buyer's electronic money to the merchant's address after the buyer signs for the goods. Smart contracts are not limited to contracts for trading; they may also execute contracts that process received information.
3) Blockchain, which comprises a series of blocks connected to one another in the chronological order in which they were generated. Once a new block is added to the blockchain, it cannot be removed, and the blocks record the record data submitted by nodes in the blockchain system.
4) Consensus, a process in a blockchain network used to reach agreement on the transactions in a block among the nodes involved; the agreed block is appended to the end of the blockchain. Mechanisms for achieving consensus include Proof of Work (PoW), Proof of Stake (PoS), Delegated Proof of Stake (DPoS), Proof of Elapsed Time (PoET), and so on.
Referring to fig. 2C, fig. 2C is an optional schematic diagram of a block structure provided in this embodiment. Each block includes the hash value of the transaction records stored in the block (the hash value of the block) and the hash value of the previous block, and the blocks are connected by these hash values to form a blockchain. A block may also include information such as a timestamp of when the block was generated. A blockchain is essentially a decentralized database: a chain of data blocks associated with one another by cryptographic methods, where each data block contains information used to verify the validity of its contents (anti-counterfeiting) and to generate the next block.
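The hash linking of blocks described above can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the dictionary layout, the use of SHA-256 over a JSON encoding, and the all-zero hash for the first block are illustrative choices, not details taken from fig. 2C, and consensus and digital signatures are left out entirely.

```python
import hashlib
import json
import time


def block_hash(block: dict) -> str:
    """SHA-256 over the block's records, timestamp, and previous-block hash."""
    payload = json.dumps(
        {k: block[k] for k in ("records", "timestamp", "prev_hash")},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: list[dict], records: list[str]) -> dict:
    """Create a block that points at the current tail and append it."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    block = {"records": records, "timestamp": time.time(), "prev_hash": prev_hash}
    block["hash"] = block_hash(block)
    chain.append(block)
    return block


def is_valid(chain: list[dict]) -> bool:
    """Check every stored hash and every link to the previous block."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False
        expected_prev = chain[i - 1]["hash"] if i > 0 else "0" * 64
        if block["prev_hash"] != expected_prev:
            return False
    return True
```

Changing the records of an earlier block changes its recomputed hash, so `is_valid` fails; this is the sense in which each block carries information for verifying validity.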
Referring to fig. 3, fig. 3 is a schematic structural diagram of a server 300 according to an embodiment of the present application, where the server 300 shown in fig. 3 includes: at least one processor 310, memory 350, at least one network interface 320, and a user interface 330. The various components in server 300 are coupled together by a bus system 340. It will be appreciated that the bus system 340 is used to enable communications among the components connected. The bus system 340 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 340 in fig. 3.
The Processor 310 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.
The user interface 330 includes one or more output devices 331, including one or more speakers and/or one or more visual display screens, that enable presentation of media content. The user interface 330 also includes one or more input devices 332, including user interface components to facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
The memory 350 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 350 optionally includes one or more storage devices physically located remote from processor 310. The memory 350 may include either volatile memory or nonvolatile memory, and may also include both volatile and nonvolatile memory. The nonvolatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 350 described in embodiments herein is intended to comprise any suitable type of memory. In some embodiments, memory 350 is capable of storing data, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below, to support various operations.
An operating system 351 including system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;
a network communication module 352 for communicating with other computing devices via one or more (wired or wireless) network interfaces 320, exemplary network interfaces 320 including: Bluetooth, Wireless Fidelity (WiFi), Universal Serial Bus (USB), etc.;
an input processing module 353 for detecting one or more user inputs or interactions from one of the one or more input devices 332 and translating the detected inputs or interactions.
In some embodiments, the apparatus provided by the embodiments of the present application may be implemented in software, and fig. 3 illustrates a data prediction apparatus 354 stored in the memory 350, where the data prediction apparatus 354 may be a data prediction apparatus in the server 300, which may be software in the form of programs and plug-ins, and the like, and includes the following software modules: the obtaining module 3541, the extracting module 3542, and the predicting module 3543, which are logical and thus may be arbitrarily combined or further separated depending on the functionality implemented. The functions of the respective modules will be explained below.
In other embodiments, the apparatus provided in the embodiments of the present Application may be implemented in hardware, and for example, the apparatus provided in the embodiments of the present Application may be a processor in the form of a hardware decoding processor, which is programmed to execute the data prediction method provided in the embodiments of the present Application, for example, the processor in the form of the hardware decoding processor may be one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
The data prediction method provided by the embodiment of the present application will be described below in conjunction with an exemplary application and implementation of the server 300 provided by the embodiment of the present application. Referring to fig. 4, fig. 4 is an alternative flow chart of a data prediction method provided in an embodiment of the present application, which will be described with reference to the steps shown in fig. 4.
Step S401, acquiring service data;
here, the service data is data to be predicted, and may be online data acquired in real time or historical offline data. The business data may be obtained from a particular data source, which may include, but is not limited to, a database, a message queue, a file system, and the like. The server can actively send a data acquisition request to the data source, and the data source responds to the data acquisition request and returns the service data to the server. The data source can also send the collected service data to the server at regular time, and the server monitors and receives the service data sent by the data source. In implementation, a person skilled in the art may select an appropriate manner to obtain service data according to an actual application scenario, which is not limited in the embodiment of the present application.
Step S402, extracting features from the service data based on the index features of the monitoring index to obtain the data features of the service data; wherein the index features include independent features and joint features; the joint features are obtained by nonlinear combination of at least two independent features;
here, the monitoring index is a target variable to be monitored in the service data, and may include, but is not limited to, any suitable index such as a data index, a system performance index, a network performance index, and the like. For example, the monitoring metrics may be the amount of data entering the system to be monitored, the correctness and completeness of the fields in the database, and various characteristics of the data sets output by the computing tasks.
The data feature of the service data is data extracted from the service data according to the index features of the monitoring index, and is generally a vector. The index features may include independent features and joint features. An independent feature may be a variable that independently reflects a certain characteristic of the monitoring index; for example, the data volume of the monitoring index per unit time in the database, the data integrity score, the data range of a key field, the length of a single piece of data, and the like may be selected as independent features. A joint feature may be a nonlinear combination of at least two independent features, which can represent a nonlinear relationship between multiple features of the monitoring index. In practice, the nonlinear combination of the at least two independent features may include, but is not limited to, multiplication, division, logarithm, etc.; for example, (data volume per unit time × length of a single piece of data), (data integrity score × length of a single piece of data), and (data volume per unit time × data integrity score) may be chosen as joint features.
In some embodiments, multiple independent features with associated relationships may be selected for non-linear combination to obtain a joint feature. Here, the association relationship may include, but is not limited to, a business logic association, a data attribute association, and the like.
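The feature extraction of step S402 can be illustrated with a minimal Python sketch. The record layout, the function name `extract_features`, and the particular definitions of the integrity score and record length are assumptions made for illustration only; the three joint features are the pairwise products named in the text.

```python
def extract_features(records, window_seconds, required_fields):
    """Build a feature vector: independent features followed by joint features.

    `records` is a list of dicts collected in one time window. The field
    names and the exact feature definitions are illustrative assumptions.
    """
    count = len(records)
    # Independent feature 1: data volume per unit time (records per second).
    volume = count / window_seconds
    # Independent feature 2: integrity score, taken here as the share of
    # required fields that are actually filled in.
    filled = sum(
        1 for r in records for f in required_fields if r.get(f) not in (None, "")
    )
    integrity = filled / (count * len(required_fields)) if count else 0.0
    # Independent feature 3: average length of a single record, measured
    # here as the total text length of its non-empty values.
    lengths = [sum(len(str(v)) for v in r.values() if v is not None) for r in records]
    avg_len = sum(lengths) / count if count else 0.0

    independent = [volume, integrity, avg_len]
    # Joint features: nonlinear (multiplicative) combinations of related features.
    joint = [volume * avg_len, integrity * avg_len, volume * integrity]
    return independent + joint
```

The resulting vector can be fed to the classifier as a single data feature x; division or logarithm combinations could be appended in the same way.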
Step S403, predicting the data features by using a machine learning model based on the sample space of the monitoring index to obtain a prediction result of the service data; wherein the data feature of each sample in the sample space comprises a first characteristic value and a second characteristic value; the first characteristic value is a characteristic value of the independent feature; the second characteristic value is a characteristic value of the joint feature.
Here, each sample in the sample space includes a data feature and a label corresponding to the data feature, and the label can be used to characterize whether the corresponding data feature is abnormal. A label can be any suitable identifier, including, but not limited to, a particular numerical value, text, symbol, etc. For example, 1 may be used to indicate that the data feature is normal, and -1 may be used to indicate that the data feature is abnormal; Y may also be used to indicate that the data feature is normal, and N to indicate that the data feature is abnormal.
The samples in the sample space may include samples obtained by labeling the historical data, or may include samples automatically generated according to a specific rule.
According to the prediction result, whether the service data is abnormal can be determined. The prediction result may include a predicted value of the service data obtained by prediction based on the samples in the sample space, and whether the service data is abnormal can be determined according to the predicted value. The prediction result may also directly indicate whether the service data is abnormal, by including an identifier that characterizes whether the service data is abnormal. In some embodiments, the prediction result comprises a predicted value, and when the predicted value is smaller than a specific anomaly threshold, the service data is determined to be abnormal. For example, when the predicted value in the prediction result is less than 0, the service data is abnormal, and when the predicted value is greater than or equal to 0, the service data is normal.
The machine learning model used in predicting the data features may be any suitable classification model including, but not limited to, a support vector machine model, a logistic regression model, and the like.
The data prediction method provided by the embodiment of the application monitors the service data by means of model prediction, predicting the service data based on the sample space. Because this process no longer requires configuring cumbersome monitoring rules, and the model can learn automatically from the sample space of the monitoring index, the cost of manually configuring rules and the limitations of rule-based alarms can be avoided, while the accuracy of model prediction, and in turn the accuracy of monitoring alarms, can be improved. In addition, because joint features are introduced, the nonlinear relations among multiple features can be taken into account in prediction, which can further improve the accuracy of model prediction.
In some embodiments, the sample space may include initial samples of the monitoring metrics, the initial samples including positive and negative samples. Correspondingly, fig. 5 is an optional flowchart of the data prediction method provided in the embodiment of the present application, and as shown in fig. 5, before step S403, the method may further include:
step S501, randomly generating a first number of positive examples and a second number of negative examples according to a preset starting rule; wherein the start-up rule includes an abnormality judgment condition; the positive example sample does not satisfy the abnormality determination condition, and the negative example sample satisfies the abnormality determination condition.
Here, in the initial stage of operation of the data prediction system, due to the lack of samples, the model may not be effectively trained, and a simple start rule may be preset for the monitoring index to generate an initial sample.
The positive example is a sample with normal data features, and the negative example is a sample with abnormal data features. The preset start rule can be used to judge whether data features are abnormal, and may include, but is not limited to, simple abnormality judgment conditions. Based on the preset start rule, positive examples that do not satisfy the abnormality judgment condition and negative examples that satisfy it can be randomly generated. For example, the abnormality judgment condition in the start rule may include: the data integrity score is less than 0.9; the proportion of data per unit time in which the value of field A exceeds 100 is greater than 5%. The data features of a positive example generated by the start rule then do not satisfy the abnormality judgment condition, and the data features of a negative example generated by the start rule satisfy it.
In practice, the first amount and the second amount can be selected by those skilled in the art according to the actual situation.
In the embodiment of the application, a set of initial samples, including positive examples and negative examples, can be generated by presetting a simple start rule for the monitoring index. This solves the cold-start problem of the model: at the initial running stage of the data prediction system, the model can be effectively trained with enough samples, which improves the accuracy of model prediction, improves the accuracy of monitoring alarms, and reduces false alarms.
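Step S501 can be sketched as rejection sampling against the start rule. The following Python sketch is only an illustration: the feature names, the sampling ranges, and the reading of the example rule as "abnormal if either condition holds" are assumptions, not details given in the text.

```python
import random

def start_rule_is_abnormal(features):
    """Hypothetical start rule: abnormal when either example condition holds."""
    return features["integrity_score"] < 0.9 or features["share_a_over_100"] > 0.05

def random_features(rng):
    # Draw each feature uniformly from an assumed plausible range.
    return {
        "integrity_score": rng.uniform(0.5, 1.0),
        "share_a_over_100": rng.uniform(0.0, 0.2),
    }

def generate_initial_samples(n_positive, n_negative, seed=0):
    """Sample until both quotas are met; label 1 = normal, -1 = abnormal."""
    rng = random.Random(seed)
    positives, negatives = [], []
    while len(positives) < n_positive or len(negatives) < n_negative:
        feats = random_features(rng)
        if start_rule_is_abnormal(feats):
            if len(negatives) < n_negative:
                negatives.append((feats, -1))
        elif len(positives) < n_positive:
            positives.append((feats, 1))
    return positives + negatives
```

Calling `generate_initial_samples(50, 50)` would give a balanced initial sample space; the first and second numbers remain a choice for the practitioner, as the text notes.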
Based on fig. 4, fig. 6 is an optional flowchart of the data prediction method provided in the embodiment of the present application, and as shown in fig. 6, after step S403, the method may further include: step S601, generating alarm information when determining that the service data is abnormal according to the prediction result; and step S602, outputting the alarm information.
Here, when it is determined that the traffic data is abnormal, it is necessary to generate alarm information and notify an alarm-related responsible person. The warning information is used to prompt that the service data is abnormal, and may be simple text information or rich text information including characters, pictures, and the like. The alarm information can be sent to the terminal in a short message mode, and can also be pushed to an interactive interface of the terminal as a system message so as to be viewed and processed by the alarm-related responsible person. In implementation, a person skilled in the art can select an appropriate manner to output the alarm information according to actual conditions.
In some embodiments, the data prediction method is applied to a monitoring system to monitor the operation state of a big data acquisition and processing system; the service data comprises service data of the big data acquisition and processing system; the index features are used for representing the data volume or the data quality of the service data; correspondingly, step S602 may include: when the data volume or the data quality of the service data is determined to be abnormal according to the prediction result, determining that the operation state of the big data acquisition and processing system is abnormal, and generating alarm information. Here, the data volume of the service data may include, but is not limited to, the data volume entering the big data acquisition and processing system, the data volume output by the computing tasks in the big data acquisition and processing system, and the like; the data quality of the service data may include, but is not limited to, the correctness, integrity, etc. of the fields in the service data. In implementation, a person skilled in the art may select suitable index features according to actual conditions to characterize the data volume or data quality of the service data, which is not limited in the embodiment of the present application. When, in the prediction result of the service data, the feature value corresponding to an index feature representing the data volume is abnormal, the data volume of the service data is abnormal; when the feature value corresponding to an index feature representing the data quality is abnormal, the data quality of the service data is abnormal. For example, when the data volume entering the big data acquisition and processing system changes irregularly, or the size of the result set output by a computing task changes abruptly, it can be determined that the data volume of the service data is abnormal.
For another example, when the integrity of field A in the service data is lower than a specific integrity threshold, or the number of errors in field A exceeds a specific error threshold, it may be determined that the data quality of the service data is abnormal.
In the embodiment of the application, when abnormal service data is detected, corresponding alarm information is generated and output, so that the alarm-related responsible person can be notified to check in time whether the relevant system is abnormal, which helps to recover from the system abnormality as soon as possible and reduces the loss caused by the abnormality. In addition, the data prediction method can be applied to a monitoring system to monitor the running state of the big data acquisition and processing system and to give an alarm when the running state of the big data acquisition and processing system is found to be abnormal.
Referring still to fig. 6, in some embodiments, the method may further include:
step S603, determining the processing state of the output alarm information; the processing state is used for representing whether the alarm information is rejected;
here, the alarm-related responsible person may process the alarm information according to the output alarm information, in combination with the specific data characteristics, other monitoring information, and other information of the system. If the service system related to the alarm has no problem, judging that the current alarm is a false alarm, and rejecting the alarm information; and if the alarm related service system has a problem, performing related processing. The server can determine the processing state of the alarm information according to the received alarm processing information sent by the user terminal. For example, if the alarm reject information is received, the processing status of the alarm information is determined to be rejected, and if the alarm processed information is received, the processing status of the alarm information is determined to be processed.
Step S604, marking the service data corresponding to the alarm information based on the processing state;
here, if the alarm information is determined to be rejected according to the processing state, the alarm is represented as false alarm, and the service data corresponding to the alarm information can be marked as normal; and if the alarm information is determined not to be rejected according to the processing state, the alarm is not false alarm, and the service data corresponding to the alarm information can be marked as abnormal.
In some embodiments, step S604 may be implemented by: step S6041, when the processing state of the alarm information is rejected, marking the service data as normal; step S6042, when the processing state of the alarm information is processed, mark the service data as abnormal.
Step S605, generating a labeled sample based on the labeled service data;
here, the annotation sample includes a data characteristic of the service data and an annotation of the service data.
Step S606, adding the marked sample into the sample space of the monitoring index.
In the embodiment of the application, according to the processing state of the alarm information, the service data corresponding to the alarm information is automatically labeled, a labeled sample is generated, and the labeled sample is added into the sample space of the monitoring index. Therefore, the marked historical data is added into the sample space, the model effect can be continuously iterated and optimized, the accuracy of model prediction is improved, the monitoring alarm accuracy is improved, and false alarms are reduced. In addition, because additional labeling operation is not needed, the alarm processing flow is optimized, and the labor cost caused by manual labeling can be avoided.
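Steps S603 to S606 amount to a small mapping from the alarm's processing state to a label. A minimal Python sketch follows; the `Alarm` class, the state strings, and the function names are hypothetical, while the label values 1 and -1 follow the example given earlier in the text.

```python
from dataclasses import dataclass

NORMAL, ABNORMAL = 1, -1

@dataclass
class Alarm:
    features: list          # data features of the service data behind the alarm
    state: str = "pending"  # assumed states: "pending", "rejected", "processed"

def label_from_alarm(alarm):
    """Return a labeled sample for a handled alarm, or None if still pending."""
    if alarm.state == "rejected":    # false alarm: the service data was normal
        return (alarm.features, NORMAL)
    if alarm.state == "processed":   # confirmed problem: the data was abnormal
        return (alarm.features, ABNORMAL)
    return None

def add_labeled_samples(sample_space, alarms):
    """Append one labeled sample per handled alarm to the sample space."""
    for alarm in alarms:
        sample = label_from_alarm(alarm)
        if sample is not None:
            sample_space.append(sample)
    return sample_space
```

Because the label is derived from an action the receiver performs anyway, no separate annotation step appears in the sketch.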
In some embodiments, step S606 may be implemented by:
step S6061, according to a preset update period, updating the labeled sample generated in the update period to the sample space of the monitoring index.
Here, the preset update period may be determined according to a traffic demand. For example, the update period may be 1 day, and the labeled samples generated from the last update time to the current update time may be added to the sample space at regular time each day through a timing task.
In the embodiment of the application, the sample space is updated according to the preset updating period in a timing mode, so that the resource consumption caused by frequently updating the sample space can be reduced, and the performance of the data prediction system is improved.
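One way to picture the periodic update of step S6061 is a buffer that is merged into the sample space only once per update period. The class below is a hedged Python sketch; the class name, the use of plain numeric timestamps, and the default period of one day are illustrative assumptions.

```python
class SampleSpaceUpdater:
    """Buffers labeled samples and merges them into the sample space once per period."""

    def __init__(self, sample_space, period_seconds=86400, start_time=0.0):
        self.sample_space = sample_space
        self.period_seconds = period_seconds
        self.pending = []              # samples generated since the last update
        self.last_update = start_time

    def add(self, sample):
        self.pending.append(sample)

    def maybe_update(self, now):
        """Merge pending samples if a full period has elapsed; return the number merged."""
        if now - self.last_update < self.period_seconds:
            return 0
        merged = len(self.pending)
        self.sample_space.extend(self.pending)
        self.pending = []
        self.last_update = now
        return merged
```

In a deployment, `maybe_update` would typically be driven by a scheduled task rather than called by hand.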
Based on fig. 4, fig. 7 is an optional flowchart of the data prediction method provided in the embodiment of the present application, and as shown in fig. 7, step S403 is implemented by the following steps:
step S701, obtaining a trained support vector machine model;
step S702, inputting the data characteristics into a classification decision function of the support vector machine model to obtain a prediction result of the service data.
In the embodiment of the application, the data features of the service data are predicted by a support vector machine model. Abnormal conditions in service data are generally rare, and become rarer as the service system is continuously optimized, so the number of samples available for model training is small. The support vector machine uses a kernel function to map the data features into a higher-dimensional feature space, so that nonlinear relations can be discovered. Therefore, predicting the data features of the service data with a support vector machine model can yield more accurate results with fewer samples, and is easy to implement.
In some embodiments, the support vector machine model in step S701 may be obtained by training:
step S7011, optimizing a target function of the support vector machine model based on a sample space of the monitoring index to obtain a Lagrange multiplier of each sample in the sample space; the kernel function of the support vector machine model adopts a Gaussian kernel function;
step S7012, determining the offset of a classification decision function based on the sample space and the Lagrange multiplier of each sample;
step S7013, determining the Lagrangian multiplier and the offset of each sample in the sample space as model parameters of the support vector machine model.
Next, an exemplary application of the embodiment of the present application in a practical application scenario will be described.
The embodiment of the application provides a data prediction method which is based on a machine learning algorithm and can automatically learn a data anomaly detection model from a small amount of sample data when the monitoring rules are imperfect or even absent. As the sample data increases, the model precision continuously improves, the accuracy of anomaly detection improves, and false alarms are reduced.
The data prediction method provided by the embodiment of the application is applied to a monitoring system, and the monitoring system is used for monitoring a big data acquisition and processing system. In a big data acquisition and processing system, all links from data acquisition to calculation (including data acquisition, data cleaning, calculation tasks and the like) need to be tracked and monitored in a full flow, so that the system state is mastered in real time, and the correctness and stability of the system are ensured. The monitoring system is used for controlling the quantity and quality of data in real time, carrying out corresponding processing if necessary, or generating an alarm and pushing the alarm to a related person in charge so as to ensure the healthy operation of the system. For example, the monitoring system needs to monitor the amount of data entering the big data collecting and processing system, the correctness and integrity of each field, and various characteristics of the data set output by each computing task, and when the amount of data entering the system changes irregularly, the integrity of the field is lower than expected, or the size of the result set output by the computing task changes suddenly, and the like, the monitoring system needs to identify an abnormal condition, and perform processing or alarming.
In the embodiment of the application, a machine learning algorithm replaces the rule judgment of the related technology to predict whether the data and the system state are normal. First, a sample space is generated based on labeled historical feature data and a start rule, and a machine learning model is trained using the data features in the sample space; then, new data is predicted based on the model, and the model is periodically retrained with newly labeled data to replace the old model, so that the accuracy of prediction is continuously improved. The specific machine learning algorithm can be a support vector machine algorithm, a logistic regression algorithm and the like, and the kernel function can be a Gaussian kernel, a linear kernel, a polynomial kernel, a Sigmoid kernel and the like. In practical application, the support vector machine algorithm can be selected as the machine learning algorithm, with the Gaussian kernel, which performs better, as the kernel function.
At the initial stage of operation of the monitoring system, due to the lack of samples, the model cannot be effectively trained. In the embodiment of the application, the initial samples are randomly generated by using the starting rule, so that the problem of cold starting of the model can be solved.
In the aspect of selecting data features, in addition to independent features, a moderate number of "joint features" are introduced to reflect the nonlinear relations between indexes. For example, the data volume per unit time, the data integrity score, the data range of a key field, the length of a single piece of data, and the like may be selected as independent features, and (data volume per unit time × length of a single piece of data), (data integrity score × length of a single piece of data), and (data volume per unit time × data integrity score) may be introduced as joint features.
When the data prediction method provided by the embodiment of the application is applied to a monitoring system, in the alarm processing flow, after an alarm receiver receives an alarm and considers it a false alarm, the receiver can select "reject alarm" on the processing interface. This provides feedback to the model, so that the model becomes more accurate over iterations and the false alarm rate is effectively reduced. In addition, labeled samples are generated automatically from the alarm receiver's handling of the alarm, so no manual data labeling is needed.
Fig. 8 is a schematic flowchart of an implementation flow of a model training method provided in an embodiment of the present application, where the method may be executed by a processor of a server, and as shown in fig. 8, the method may include the following steps:
step S801, extracting historical data features to form a sample space (x_i, y_i), i = 1, 2, …, m; wherein x_i is a feature vector of the data and y_i is the health value of x_i: y_i = 1 indicates that x_i is healthy, and y_i = -1 indicates that x_i is unhealthy.
Step S802, using the sample space (x_i, y_i), i = 1, 2, …, m, performing model training with a support vector machine algorithm according to the following formula (1-1), and calculating the model parameters α_i, i = 1, 2, …, m:
[Formula (1-1): reproduced only as image BDA0002543018740000181 in the original publication]
wherein α_i, i = 1, 2, …, m, are the model parameters; x_i = (x_i1, x_i2, …, x_in) and y_i, i = 1, 2, …, m, form the sample space; n is the dimension of the data features; and k(x_i, x_j) denotes the kernel function. The kernel function used here is a Gaussian kernel function, as shown in the following formula (1-2):
[Formula (1-2): reproduced only as image BDA0002543018740000182 in the original publication]
step S803, calculating the offset b according to the following formula (1-3):
[Formula (1-3): reproduced only as image BDA0002543018740000183 in the original publication]
wherein b is the offset and is also a model parameter, (x_s, y_s) is an arbitrary support vector, and S is the set of support vectors.
Step S804, publishing α_i, i = 1, 2, …, m, and b as model parameters to the model prediction service.
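Formulas (1-1) to (1-3) are available here only as image references. For orientation, the textbook forms of the kernel support vector machine dual problem, the Gaussian kernel, and the offset that fit the symbols defined above can be written as below. This is a reconstruction under stated assumptions, not a transcription of the published figures: the hard-margin constraint α_i ≥ 0, the bandwidth symbol σ, and the averaging over the support vector set S in the offset are all assumed.

```latex
% Assumed form of (1-1): dual objective of a kernel SVM (hard-margin constraint assumed)
\max_{\alpha}\; \sum_{i=1}^{m} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \, k(x_i, x_j)
\quad \text{s.t.} \quad \sum_{i=1}^{m} \alpha_i y_i = 0, \qquad \alpha_i \ge 0,\; i = 1, 2, \dots, m

% Assumed form of (1-2): Gaussian kernel with bandwidth \sigma (symbol assumed)
k(x_i, x_j) = \exp\!\left( -\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}} \right)

% Assumed form of (1-3): offset averaged over the support vector set S
b = \frac{1}{|S|} \sum_{s \in S} \Bigl( y_s - \sum_{i \in S} \alpha_i y_i \, k(x_i, x_s) \Bigr)
```

As a quick check of these forms, take one healthy and one unhealthy sample: the equality constraint forces α_1 = α_2 = α, the objective reduces to 2α − α²(1 − k(x_1, x_2)), its maximum is at α = 1 / (1 − k(x_1, x_2)), and the offset evaluates to b = 0.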
Fig. 9 is a schematic flowchart of an implementation flow of a model prediction method provided in an embodiment of the present application, where the method may be executed by a processor of a server, and as shown in fig. 9, the method may include the following steps:
step S901, extracting the data feature x of the real-time data;
step S902, based on the data feature x, using the trained model parameters α_i, i = 1, 2, …, m, and b, calculating the predicted value f(x) according to the following formula (1-4):
[Formula (1-4): reproduced only as image BDA0002543018740000191 in the original publication]
wherein k(x_i, x_j) denotes the kernel function, and the kernel function used here is the Gaussian kernel function shown in formula (1-2).
Step S903, judging whether the predicted value f (x) is less than 0; if the predicted value is less than 0, the step S904 is entered, and if the predicted value is greater than or equal to 0, the flow is ended;
step S904, an abnormal alarm is generated and output to the terminal.
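The prediction flow of steps S901 to S904 can be sketched in Python as follows. The decision function is assumed to take the standard kernel SVM form f(x) = Σ α_i y_i k(x_i, x) + b, since formula (1-4) is available here only as an image reference; the bandwidth parameter `sigma`, the function names, and the toy two-sample model are illustrative assumptions.

```python
import math

def gaussian_kernel(u, v, sigma=1.0):
    """k(u, v) = exp(-||u - v||^2 / (2 * sigma^2)); sigma is an assumed parameter."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def decision_value(x, samples, alphas, b, sigma=1.0):
    """Assumed form of f(x): sum over samples of alpha_i * y_i * k(x_i, x), plus b."""
    return b + sum(
        alpha * y_i * gaussian_kernel(x_i, x, sigma)
        for (x_i, y_i), alpha in zip(samples, alphas)
    )

def is_abnormal(x, samples, alphas, b, sigma=1.0):
    # Step S903: an alarm is raised when the predicted value is below 0.
    return decision_value(x, samples, alphas, b, sigma) < 0

# Toy model with one healthy and one unhealthy sample. For two samples with
# opposite labels, the hard-margin dual has the closed-form solution
# alpha_1 = alpha_2 = 1 / (1 - k(x_1, x_2)) and b = 0.
samples = [([1.0, 1.0], 1), ([3.0, 3.0], -1)]
k12 = gaussian_kernel(samples[0][0], samples[1][0])
alphas = [1.0 / (1.0 - k12)] * 2
b = 0.0
```

With this toy model, a feature vector close to the healthy sample gets a positive predicted value and one close to the unhealthy sample gets a negative value, which is what triggers step S904.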
Fig. 10 is a schematic view of an application flow of a monitoring system provided in an embodiment of the present application, and as shown in fig. 10, the application flow includes the following steps:
step S1001, a monitoring system accesses data of monitoring indexes;
step S1002, a user defines data characteristics of monitoring indexes through a monitoring system;
step S1003, a user defines a starting rule of a monitoring index through a monitoring system;
step S1004, the user receives the alarm;
step S1005, the monitoring system judges whether the alarm is rejected; if the alarm is rejected, the process is ended, and if the alarm is not rejected, the process goes to step S1006;
in step S1006, the user processes the alert.
In the application process of the monitoring system product provided by the embodiment of the application, after the data of the monitoring index is accessed into the monitoring system, a user defines the data features of the monitoring index and configures a rough start rule in the system, so that initial samples are randomly generated and the model cold-start problem is solved. After that, when the user receives an alarm and finds it to be a false alarm, the user simply rejects the alarm, without manually adjusting any alarm rule.
In practical application, it is assumed that two data tables, sku and order, exist in the data collected into the monitoring system, wherein the sku table contains the fields: merchant_id, sku_id, sku_name, sku_desc, print_info, sales_info, images, urls, update_time, etc.; the order table contains the fields: merchant_id, order_id, skus, account, print_info, store_id, create_time, etc. The process of applying the data prediction method provided by the embodiment of the present application to the two data tables may include the following steps:
in step S1011, the user defines the index features for the two tables, for example: the null rate of each field; for the sku table, the average array length of the images field, the average string length of sku_desc, the average array length of urls, and the average string length of urls; for the order table, the average array length of skus, the maximum, minimum, and average of amount, and the time difference between create_time and the current time. The extraction period of the index features may be set to 1 hour.
Step S1012, the user inputs a certain number of labeled normal features and abnormal features as samples according to the data features of the historical data; the user may also define an initial start rule, such as: a data feature in which any field has a null rate > 0.01 is an abnormal feature, and a data feature whose minimum amount is < 1 is an abnormal feature. The monitoring system randomly generates a certain number of positive and negative examples according to the initial start rule. The monitoring system then trains a support vector machine model on the generated samples to produce an initial model M0 (that is, it determines the model parameters α_i, i = 1, 2, …, m, and b);
in step S1013, the monitoring system starts to extract data features from the real-time data according to the feature definitions, and inputs the data features into M0 to obtain the predicted value output by M0. If the predicted value is > 0, for example 1.2, the current feature is normal and no alarm is raised; if the predicted value is < 0, for example -2, the current feature is an abnormal feature, that is, the real-time data is abnormal, and alarm information is generated and pushed to the relevant responsible person;
step S1014, after receiving the alarm, the relevant responsible person enters the monitoring system platform to check the alarm information, the specific data features, and other monitoring information, and determines the current problem in combination with other information of the system to be monitored. If the relevant responsible person confirms that the system to be monitored corresponding to the monitoring index has no problem, the current alarm is a false alarm; the relevant responsible person can mark it as a false alarm by clicking a button on the terminal interface, and the alarm processing is finished. If the relevant responsible person confirms that the system to be monitored corresponding to the monitoring index really has a problem, the relevant processing is carried out and the alarm is marked as processed. The monitoring service takes the data features corresponding to rejected alarms or processed alarms as labeled data to generate labeled data features.
In step S1015, the system periodically (e.g., every day) inputs the newly generated labeled data features into the model training service to obtain a new model M_i, i = 1, 2, 3, …, which replaces M_(i-1) in model prediction.
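The index features listed in step S1011 are simple aggregates over the rows collected in one extraction period. The Python sketch below computes a few of them for the sku table; the row layout (one dict per row, arrays as lists) and the helper names are assumptions for illustration.

```python
def null_rate(rows, field_name):
    """Share of rows in which the field is missing, None, or empty."""
    if not rows:
        return 0.0
    empty = sum(1 for r in rows if r.get(field_name) in (None, "", []))
    return empty / len(rows)

def average_length(rows, field_name):
    """Average len() of the field (string or array) over rows where it is present."""
    values = [r[field_name] for r in rows if r.get(field_name) is not None]
    return sum(len(v) for v in values) / len(values) if values else 0.0

def sku_index_features(rows):
    """A few of the index features named in step S1011, for one extraction period."""
    return {
        "null_rate_sku_name": null_rate(rows, "sku_name"),
        "avg_images_len": average_length(rows, "images"),
        "avg_sku_desc_len": average_length(rows, "sku_desc"),
    }
```

A start rule such as "null rate > 0.01 is abnormal" can then be evaluated directly on the returned dictionary.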
Fig. 11 is a schematic view of an application scenario of the monitoring method provided in the embodiment of the present application, and as shown in fig. 11, a monitoring system 1100 performs full-flow tracking monitoring on each link from acquisition to calculation of data in a big data acquisition and processing system 1200, where the monitored data includes data stored in a data source 1210, data acquired by a data acquisition service 1220, data cleaned by a data cleaning service 1230, and data generated by each calculation task (including a task 1241, a task 1242, a task 1243, and the like) in a data calculation service 1240. The monitoring system 1100 includes a data feature real-time extraction service 1110, a monitoring service 1120, a model prediction service 1130, a model training service 1140, a rules engine 1150, and an alert push service 1160. In the monitoring system 1100, a rule engine 1150 randomly generates partial data features according to well-defined data features (including independent features and joint features) and initial start rules, and inputs the generated data features into a model training service 1140 for training to obtain an initial model; model training service 1140 publishes the trained model parameters to model prediction service 1130. 
When real-time data of an index to be monitored enters the monitoring system 1100 from the big data acquisition and processing system 1200, the data feature real-time extraction service 1110 performs feature extraction on the incoming real-time data and sends the extracted data features to the monitoring service 1120; the monitoring service 1120 submits the received data features to the model prediction service 1130; the model prediction service 1130 predicts on the input data features based on the model parameters published by the model training service 1140 to obtain a prediction result, and returns the prediction result to the monitoring service 1120, where the prediction result includes information on whether the current data feature is abnormal; if the monitoring service 1120 determines from the prediction result that the current data feature is abnormal, it invokes the alarm push service 1160 to push an alarm. After receiving the alarm, the alarm receiver rejects the alarm if it is judged to be a false alarm; otherwise, the alarm receiver performs the corresponding alarm processing and marks the alarm as processed. The alarm push service 1160 sends the alarm receiver's processing information to the monitoring service 1120; according to this processing information, the monitoring service 1120 feeds back the data features corresponding to rejected or non-rejected alarms to the model training service 1140 as labeled data; the model training service 1140 retrains the model on the fed-back labeled data to obtain new model parameters and updates them to the model prediction service 1130, thereby iterating the model.
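The request path described above (extraction, prediction, alarm push) can be sketched as a single function. The function name and the three callables are hypothetical stand-ins for services 1110, 1130, and 1160; the sketch only illustrates the order of the calls, not how the services are actually deployed.

```python
from typing import Callable, Sequence

Features = Sequence[float]


def handle_real_time_data(
    raw_record: dict,
    extract: Callable[[dict], Features],     # stands in for extraction service 1110
    predict: Callable[[Features], bool],     # stands in for prediction service 1130; True = abnormal
    push_alarm: Callable[[Features], None],  # stands in for alarm push service 1160
) -> bool:
    """Role of monitoring service 1120 for one incoming record."""
    features = extract(raw_record)
    abnormal = predict(features)
    if abnormal:
        # Only abnormal predictions trigger an alarm push.
        push_alarm(features)
    return abnormal
```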
The data prediction method provided by the embodiment of the application removes the strong dependence on monitoring alarm rules and avoids the complexity and limitations of rule configuration. Moreover, because data prediction is performed by a model with learning capability, anomalies not covered by the rules can be found by combining historical data features with current data features. In addition, whether an alarm is a false alarm can be determined from whether the alarm receiver rejects it, so no additional manual labeling is needed.
Continuing with the exemplary structure of the data prediction apparatus 354 implemented as software modules provided in the embodiments of the present application, in some embodiments, as shown in fig. 3, the software modules of the data prediction apparatus 354 stored in the memory 350 may be a data prediction apparatus in the server 300, including:
an obtaining module 3541, configured to obtain service data;
an extraction module 3542, configured to perform feature extraction on the service data based on the index features of the monitoring indexes, to obtain data features of the service data; wherein the index features include independent features and joint features; the joint features are obtained by nonlinear combination of at least two independent features;
a prediction module 3543, configured to predict the data features by using a machine learning model based on a sample space of the monitoring index, so as to obtain a prediction result of the service data; wherein the data feature of each sample in the sample space comprises a first feature value and a second feature value; the first feature value is a feature value of the independent feature; the second feature value is a feature value of the joint feature.
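As an illustration of independent and joint features, the sketch below builds a feature vector from two independent features and one joint feature. The field names `row_count` and `null_count` and the choice of a ratio as the nonlinear combination are assumptions made for this example; the application does not prescribe them.

```python
from typing import Dict, List


def extract_features(batch: Dict[str, float]) -> List[float]:
    """Return [independent features..., joint features...] for one batch of service data."""
    rows = batch["row_count"]    # hypothetical independent feature: data volume
    nulls = batch["null_count"]  # hypothetical independent feature: data quality

    independent = [rows, nulls]

    # Joint feature: a nonlinear combination (here a ratio) of two associated
    # independent features. Guard against an empty batch.
    null_ratio = nulls / rows if rows else 0.0
    joint = [null_ratio]

    return independent + joint
```

The first feature values of a sample correspond to `independent`, and the second feature values correspond to `joint`.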
In some embodiments, the sample space includes initial samples of the monitoring metrics, the initial samples including positive and negative samples. Correspondingly, the monitoring device further comprises: the first generation module is used for randomly generating a first number of positive examples and a second number of negative examples according to a preset starting rule; wherein the start-up rule includes an abnormality judgment condition; the positive example sample does not satisfy the abnormality determination condition, and the negative example sample satisfies the abnormality determination condition.
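One way to generate the initial samples is rejection sampling against the abnormality judgment condition of the start rule. The sketch below assumes that condition is available as a predicate `is_abnormal` and that candidate feature vectors come from a caller-supplied `draw` function; both names, the label convention (+1 positive, -1 negative), and the `max_tries` cap are illustrative assumptions.

```python
import random
from typing import Callable, List, Tuple

Sample = Tuple[List[float], int]  # (feature vector, label)


def generate_initial_samples(
    is_abnormal: Callable[[List[float]], bool],   # abnormality judgment condition
    draw: Callable[[random.Random], List[float]],  # draws one random candidate
    n_pos: int,
    n_neg: int,
    seed: int = 0,
    max_tries: int = 100_000,
) -> List[Sample]:
    rng = random.Random(seed)
    pos: List[Sample] = []
    neg: List[Sample] = []
    tries = 0
    while (len(pos) < n_pos or len(neg) < n_neg) and tries < max_tries:
        tries += 1
        x = draw(rng)
        if is_abnormal(x):
            if len(neg) < n_neg:
                neg.append((x, -1))  # negative example: satisfies the condition
        elif len(pos) < n_pos:
            pos.append((x, +1))      # positive example: does not satisfy it
    if len(pos) < n_pos or len(neg) < n_neg:
        raise RuntimeError("could not draw enough samples within max_tries")
    return pos + neg
```

For example, with a rule such as "abnormal if the first feature is below 0.2" and `draw` returning `[rng.random()]`, the function returns the requested numbers of positive and negative examples.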
In some embodiments, the monitoring device further comprises:
an alarm module for: when the data volume or the data quality of the service data is determined to be abnormal according to the prediction result, determining that the operation state of the big data acquisition and processing system is abnormal, and generating alarm information; and outputting alarm information.
In some embodiments, the monitoring device further comprises:
the determining module is used for determining the processing state of the output alarm information; the processing state is used for representing whether the alarm information is rejected;
the marking module is used for marking the service data corresponding to the alarm information based on the processing state;
and the third generation module is used for generating a labeled sample based on the labeled service data and adding the labeled sample into the sample space of the monitoring index.
In some embodiments, the annotation module is further to: when the processing state of the alarm information is rejected, marking the service data as normal; and when the processing state of the alarm information is processed, marking the service data as abnormal.
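The labeling rule above amounts to a small mapping from processing state to label. In the sketch below, the enum `AlarmState`, the extra `PENDING` state, and the numeric labels (+1 normal, -1 abnormal) are assumptions introduced for illustration.

```python
from enum import Enum
from typing import Optional


class AlarmState(Enum):
    REJECTED = "rejected"    # receiver judged the alarm to be a false alarm
    PROCESSED = "processed"  # receiver handled a real problem
    PENDING = "pending"      # assumption: not yet handled, so no label available


def label_from_state(state: AlarmState) -> Optional[int]:
    """Map an alarm's processing state to a label for its service data."""
    if state is AlarmState.REJECTED:
        return +1   # service data marked as normal
    if state is AlarmState.PROCESSED:
        return -1   # service data marked as abnormal
    return None     # nothing to add to the sample space yet
```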
In some embodiments, the prediction module is further to: obtaining a trained support vector machine model; and inputting the data characteristics into a classification decision function of a support vector machine model to obtain a prediction result of the service data.
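For a kernel support vector machine, the classification decision function has the standard form f(x) = sign(sum_i alpha_i * y_i * K(x_i, x) + b). The sketch below evaluates that textbook form with a Gaussian kernel; the function names, the bandwidth parameter `sigma`, and the label convention (+1 normal, -1 abnormal) are assumptions, and the model parameters are taken as given rather than trained here.

```python
import math
from typing import Sequence


def gaussian_kernel(x: Sequence[float], z: Sequence[float], sigma: float) -> float:
    # K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def decide(
    x: Sequence[float],
    sample_x: Sequence[Sequence[float]],  # feature vectors of the samples
    sample_y: Sequence[int],              # labels of the samples (+1 / -1)
    alphas: Sequence[float],              # Lagrange multiplier of each sample
    b: float,                             # offset of the decision function
    sigma: float = 1.0,
) -> int:
    score = b
    for xi, yi, ai in zip(sample_x, sample_y, alphas):
        if ai != 0.0:  # samples with a zero multiplier do not contribute
            score += ai * yi * gaussian_kernel(xi, x, sigma)
    return 1 if score >= 0 else -1
```

A point close to a positively labeled sample yields +1, and a point close to a negatively labeled sample yields -1.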
In some embodiments, the monitoring device further comprises:
a training module to: optimizing a target function of the support vector machine model based on the sample space of the monitoring index to obtain a Lagrange multiplier of each sample in the sample space; the kernel function of the support vector machine model adopts a Gaussian kernel function; determining the offset of a classification decision function based on the sample space and the Lagrange multiplier of each sample; the lagrange multiplier and offset for each sample in the sample space are determined as model parameters for the support vector machine model.
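For reference, the training steps above match the standard soft-margin dual formulation of a support vector machine with a Gaussian kernel, written out below. The penalty parameter C, the bandwidth σ, and the sample count N are symbols assumed here; this passage of the application does not state them, so the block is a textbook sketch rather than the exact formulas of the application.

```latex
\begin{aligned}
\max_{\alpha}\quad & \sum_{i=1}^{N}\alpha_i
  - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j y_i y_j K(x_i, x_j) \\
\text{s.t.}\quad & \sum_{i=1}^{N}\alpha_i y_i = 0,\qquad 0 \le \alpha_i \le C,\quad i = 1,\dots,N \\
K(x, z) &= \exp\!\left(-\frac{\lVert x - z\rVert^{2}}{2\sigma^{2}}\right) \\
b &= y_s - \sum_{i=1}^{N}\alpha_i y_i K(x_i, x_s)
  \quad\text{for any } s \text{ with } 0 < \alpha_s < C \\
f(x) &= \operatorname{sign}\!\left(\sum_{i=1}^{N}\alpha_i y_i K(x_i, x) + b\right)
\end{aligned}
```

Here the α_i are the Lagrange multipliers obtained by optimizing the objective, b is the offset, and the pair (α, b) together with the sample space determines the classification decision function f.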
It should be noted that the description of the apparatus in the embodiment of the present application is similar to the description of the method embodiment, and has similar beneficial effects to the method embodiment, and therefore, the description is not repeated. For technical details not disclosed in the embodiments of the apparatus, reference is made to the description of the embodiments of the method of the present application for understanding.
Embodiments of the present application provide a storage medium having stored therein executable instructions, which when executed by a processor, will cause the processor to perform a method provided by embodiments of the present application, for example, the method as illustrated in fig. 4.
In some embodiments, the storage medium may be a computer-readable storage medium, such as a Ferroelectric Random Access Memory (FRAM), a Read Only Memory (ROM), a Programmable Read Only Memory (PROM), an Erasable Programmable Read Only Memory (EPROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a flash memory, a magnetic surface memory, an optical disc, or a Compact Disc Read Only Memory (CD-ROM), among other memories; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a hypertext Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (9)

1. A data prediction method is applied to a monitoring system to realize monitoring of the running state of a big data acquisition and processing system, and is characterized by comprising the following steps:
acquiring service data of the big data acquisition and processing system;
performing feature extraction on the service data based on the index features of the monitoring indexes to obtain the data features of the service data; the index characteristic is used for representing the data volume or the data quality of the service data; the index features comprise independent features and joint features, and the independent features are used for independently reflecting variables of a certain characteristic of the monitoring index; the joint features are obtained by nonlinear combination of at least two independent features with incidence relation;
generating an initial sample of the monitoring index according to a preset starting rule; wherein the initial samples comprise a first number of positive examples and a second number of negative examples; the starting rule comprises an abnormal judgment condition; the positive sample does not meet the abnormity judgment condition, and the negative sample meets the abnormity judgment condition;
generating a sample space of the monitoring index based on the initial sample of the monitoring index;
predicting the data characteristics by adopting a machine learning model based on the sample space of the monitoring index to obtain a prediction result of the service data; wherein the data feature of each sample in the sample space comprises a first feature value and a second feature value; the first characteristic value is a characteristic value of an independent characteristic; the second characteristic value is a characteristic value of the joint characteristic;
when the data volume or the data quality of the service data is determined to be abnormal according to the prediction result, determining that the operation state of the big data acquisition and processing system is abnormal, and generating alarm information;
and outputting the alarm information.
2. The method of claim 1, further comprising:
determining the processing state of the output alarm information; the processing state is used for representing whether the alarm information is rejected;
marking the service data corresponding to the alarm information based on the processing state;
generating a labeled sample based on the labeled service data;
and adding the marked sample into the sample space of the monitoring index.
3. The method of claim 2, wherein the adding the labeled sample to the sample space of the monitoring index comprises:
and updating the labeled samples generated in the updating period to the sample space of the monitoring index according to a preset updating period.
4. The method according to claim 3, wherein the labeling the real-time data corresponding to the alarm information based on the processing status comprises:
when the processing state of the alarm information is rejected, marking the service data as normal;
and when the processing state of the alarm information is processed, marking the service data as abnormal.
5. The method of any of claims 1-4, wherein the machine learning model comprises a support vector machine model; the predicting of the data characteristics by adopting a machine learning model based on the sample space of the monitoring index to obtain the prediction result of the service data comprises the following steps:
obtaining a trained support vector machine model;
and inputting the data characteristics into a classification decision function of the support vector machine model to obtain a prediction result of the service data.
6. The method of claim 5, wherein the support vector machine model is trained by:
optimizing a target function of the support vector machine model based on the sample space of the monitoring index to obtain a Lagrange multiplier of each sample in the sample space; wherein, the kernel function of the support vector machine model adopts a Gaussian kernel function;
determining an offset for the classification decision function based on the sample space and the Lagrangian multiplier for each sample;
determining a Lagrangian multiplier and the offset for each sample in the sample space as model parameters of the support vector machine model.
7. A data prediction device is applied to a monitoring system to realize the monitoring of the running state of a big data acquisition and processing system, and is characterized by comprising:
the acquisition module is used for acquiring the service data of the big data acquisition and processing system;
the extraction module is used for extracting the characteristics of the service data based on the index characteristics of the monitoring indexes to obtain the data characteristics of the service data; the index characteristic is used for representing the data volume or the data quality of the service data; the index features comprise independent features and joint features, and the independent features are used for independently reflecting variables of a certain characteristic of the monitoring index; the joint features are obtained by nonlinear combination of at least two independent features with incidence relation;
the prediction module is used for generating an initial sample of the monitoring index according to a preset starting rule; wherein the initial samples comprise a first number of positive examples and a second number of negative examples; the starting rule comprises an abnormal judgment condition; the positive sample does not meet the abnormity judgment condition, and the negative sample meets the abnormity judgment condition; generating a sample space of the monitoring index based on the initial sample of the monitoring index; predicting the data characteristics by adopting a machine learning model based on the sample space of the monitoring index to obtain a prediction result of the service data; the data feature of each sample in the sample space comprises a first feature value and a second feature value; the first characteristic value is a characteristic value of an independent characteristic; the second characteristic value is a characteristic value of the joint characteristic;
an alarm module for: when the data volume or the data quality of the service data is determined to be abnormal according to the prediction result, determining that the operation state of the big data acquisition and processing system is abnormal, and generating alarm information; and outputting the alarm information.
8. A data prediction apparatus, comprising:
a memory for storing executable instructions; a processor for implementing the method of any one of claims 1 to 6 when executing executable instructions stored in the memory.
9. A computer-readable storage medium having stored thereon executable instructions for causing a processor, when executed, to implement the method of any one of claims 1 to 6.
CN202010552482.7A 2020-06-17 2020-06-17 Data prediction method, device, equipment and storage medium Active CN111708682B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010552482.7A CN111708682B (en) 2020-06-17 2020-06-17 Data prediction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010552482.7A CN111708682B (en) 2020-06-17 2020-06-17 Data prediction method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111708682A CN111708682A (en) 2020-09-25
CN111708682B true CN111708682B (en) 2021-10-26

Family

ID=72540590

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010552482.7A Active CN111708682B (en) 2020-06-17 2020-06-17 Data prediction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111708682B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112163413B (en) * 2020-10-26 2024-02-02 青岛明略软件技术开发有限公司 Analysis method and device of alarm event rule, electronic equipment and storage medium
CN112632127B (en) * 2020-12-29 2022-07-15 国华卫星数据科技有限公司 Data processing method for real-time data acquisition and time sequence of equipment operation
CN113762688A (en) * 2021-01-06 2021-12-07 北京沃东天骏信息技术有限公司 Business analysis system, method and storage medium
CN115426287B (en) * 2022-09-06 2024-03-26 中国农业银行股份有限公司 System monitoring and optimizing method and device, electronic equipment and medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109814527A (en) * 2019-01-11 2019-05-28 清华大学 Based on LSTM Recognition with Recurrent Neural Network industrial equipment failure prediction method and device

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106383766B (en) * 2016-09-09 2018-09-11 北京百度网讯科技有限公司 System monitoring method and apparatus
CN106992994B (en) * 2017-05-24 2020-07-03 腾讯科技(深圳)有限公司 Automatic monitoring method and system for cloud service
CN107358300A (en) * 2017-06-19 2017-11-17 北京至信普林科技有限公司 A kind of intelligent O&M alarm filtering method and system based on multi-platform Autonomic prediction
CN110278102A (en) * 2018-03-15 2019-09-24 勤智数码科技股份有限公司 A kind of IT automation operational system and method
US11579951B2 (en) * 2018-09-27 2023-02-14 Oracle International Corporation Disk drive failure prediction with neural networks
CN109669837A (en) * 2018-10-31 2019-04-23 平安科技(深圳)有限公司 Equipment state method for early warning, system, computer installation and readable storage medium storing program for executing
CN110008079A (en) * 2018-12-25 2019-07-12 阿里巴巴集团控股有限公司 Monitor control index method for detecting abnormality, model training method, device and equipment
CN109992473B (en) * 2019-02-27 2022-07-15 平安科技(深圳)有限公司 Application system monitoring method, device, equipment and storage medium
CN110275814A (en) * 2019-06-28 2019-09-24 深圳前海微众银行股份有限公司 A kind of monitoring method and device of operation system
CN110620905A (en) * 2019-09-06 2019-12-27 平安医疗健康管理股份有限公司 Video monitoring method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant