CN111724048A - Characteristic extraction method for finished product library scheduling system performance data based on characteristic engineering - Google Patents

Info

Publication number
CN111724048A
Authority
CN
China
Prior art keywords
data
finished product
feature
scheduling system
product library
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010494916.2A
Other languages
Chinese (zh)
Inventor
潘佰林
许小双
乐欢
郭妙贞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Tobacco Zhejiang Industrial Co Ltd
Original Assignee
China Tobacco Zhejiang Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Tobacco Zhejiang Industrial Co Ltd
Priority to CN202010494916.2A
Publication of CN111724048A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/20 Administration of product repair or maintenance

Abstract

The invention discloses a feature extraction method, based on feature engineering, for performance data of a finished product library scheduling system. The method comprises the following steps: 1) anticipating, from experience, the fault scenarios of the scheduling subsystem of the finished product warehouse logistics system, analyzing how the data behaves in each fault scenario, and selecting the corresponding indexes accordingly; 2) collecting the selected index data at equal time intervals, then cleaning and preprocessing the data to obtain a data set for feature extraction; 3) extracting features from the data set, and amplifying and displaying the features through an excitation function. The method extracts and amplifies relatively fine features, finds a suitable feature detector for each KPI, and identifies the key features of complex data so that operation and maintenance personnel can inspect them easily. Little information is lost, the regularities contained in the original data are preserved, and the uncertain factors in the original data are effectively reduced.

Description

Characteristic extraction method for finished product library scheduling system performance data based on characteristic engineering
Technical Field
The invention relates to the field of logistics equipment monitoring management, in particular to a characteristic extraction method for finished product library scheduling system performance data based on characteristic engineering.
Background
The code-scanning, sorting and backflow fault of finished cigarette pieces is a common fault on the logistics scheduling production line of a cigarette factory. It originates in the PLC transmission mechanisms used in production. At present most PLC industrial control equipment for cigarettes is insufficiently monitored, so adequate analysis data cannot be acquired, and because the production environment is complex, the specific cause of the fault varies, for example a firewall performance bottleneck, a database cluster heartbeat timeout, or storage disk IO delay. When the fault occurs, finished cigarette pieces flow back during code scanning and sorting, and a large number of them drop off the production line, causing economic loss. Taking this fault as the entry point, research on methods for acquiring environment data related to the PLC equipment and extracting features from it can lay a data foundation for correlating environment application data with the fault and giving early warning of it. The scheduling subsystem of the finished product warehouse logistics system of a cigarette factory generates a large amount of application performance data during operation, such as CPU utilization, memory utilization, swap area utilization, disk IO rate, IO read-write frequency, average disk latency and network port rate. Before feature extraction, this massive and disordered information is usually hard for algorithms to use directly. Whether it relies on machine learning, deep learning or statistical methods, any intelligent system needs valid data to support it. How to process the original data into qualified input data has therefore been a difficult problem troubling equipment operation and maintenance personnel for many years.
Disclosure of Invention
In order to solve the above technical problems, an object of the present invention is to provide a feature extraction method, based on feature engineering, for performance data of a finished product library scheduling system. The method extracts and amplifies relatively fine features, finds a suitable feature detector for each KPI, and identifies the key features of complex data so that operation and maintenance personnel can inspect them easily. Little information is lost, the regularities contained in the original data are preserved, and the uncertain factors in the original data are effectively reduced.
Based on the above purpose, the invention provides a characteristic extraction method for finished product library scheduling system performance data based on characteristic engineering, which comprises the following steps:
1) pre-judging the fault scene of the finished product warehouse logistics system scheduling subsystem according to experience, analyzing the data performance in the fault scene, and pertinently selecting corresponding indexes;
2) collecting selected index data at equal time intervals, cleaning and preprocessing the data to obtain a data set for feature extraction;
3) extracting features from the data set, and amplifying and displaying the features through an excitation function.
Preferably, the feature extraction on the data set comprises extracting the performance data of the finished product library scheduling system and checking the continuity and integrity of that data, so as to remove interference from the CPU utilization, memory utilization and network port rate;
after the check, 1/2 of the data set is taken for training a feature selection model, 1/3 of the data set is taken from the remaining part for auxiliary parameter tuning during training, and the final 1/6 of the data set is used for verifying the effect of the model.
Preferably, the extraction of the feature points is performed by using a chi-square test feature point extraction algorithm.
Preferably, the integrity detection comprises detecting the extraction content, the extraction speed, the description of the coincidence condition and the description of the coincidence matching speed of the feature points.
Preferably, a regression analysis method is adopted to check the continuity of the performance data of the finished product library scheduling system.
Preferably, the specific method for checking the integrity of the performance data of the finished product library scheduling system is as follows: several points around each time point are selected along the time dimension to form a set, and whether the performance data of the finished product library scheduling system is complete is judged from the kernel density of the set.
Preferably, the cleaning of the index data includes removing abnormal data in the operation data of the logistics sorting machine.
Preferably, the preprocessing of the index data includes: checking the consistency of the remaining data and, after the cleaned data enters the message bus, performing ETL (extract, transform, load) processing, filtering, splitting and expanding on the data.
Compared with the prior art, the invention has the beneficial effects that:
the method extracts and amplifies relatively fine features, finds a suitable feature detector for each KPI, and identifies the key features of complex data so that operation and maintenance personnel can inspect them easily; little information is lost, the regularities contained in the original data are preserved, and the uncertain factors in the original data are effectively reduced.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.
FIG. 1 is a flowchart of a feature extraction method for performance data of a finished product library scheduling system based on feature engineering according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating a feature extraction method for performance data of a finished product library scheduling system based on feature engineering according to an embodiment of the present invention;
FIG. 3 is a graph comparing the performance of three feature extraction algorithms: the chi-square test, stability selection, and recursive feature elimination;
FIG. 4 is a schematic illustration of feature extraction on data at a granularity of two hours in an embodiment of the invention;
FIG. 5 is a schematic diagram of the present invention employing Chi-Square testing for the number of different performance feature extractions;
FIG. 6 is a schematic diagram of flattening and extracting the CPU utilization index feature of the finished product library scheduling system to generate a linear feature curve in an embodiment of the present invention.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, steps, operations, devices, elements, and/or combinations thereof.
The embodiment provides a feature extraction method for performance data of a finished product library scheduling system based on feature engineering, as shown in fig. 1 and 2, the method includes the following steps:
1) pre-judging the fault scene of the finished product warehouse logistics system scheduling subsystem according to experience, analyzing the data performance in the fault scene, and pertinently selecting corresponding indexes;
2) collecting selected index data at equal time intervals, cleaning and preprocessing the data to obtain a data set for feature extraction;
3) extracting features from the data set, and amplifying and displaying the features through an excitation function.
As a preferred embodiment, cleaning the index data includes removing abnormal data from the operation data of the logistics sorting machine. The abnormal data comprises outliers and missing values contained in the production data; in addition, data affected by known external factors, such as abnormal working conditions, is screened out according to actual production experience. Missing values are handled by discarding every record that contains one. Outliers are detected with a statistics-based measure, the range (the difference between the maximum and minimum values); this approach is suited to mining univariate numerical data.
As a preferred embodiment, preprocessing the index data includes: checking the consistency of the remaining data and, after the cleaned data enters the message bus, performing ETL (extract, transform, load) processing, filtering, splitting and expanding on the data. Specifically, data preprocessing also includes operations such as feature selection, data normalization and data standardization. Standardization uses the z-score: after processing, the data has a mean of 0 and a standard deviation of 1. The formula is $x' = (x - \mu) / \sigma$ (formula one), where $x'$ is the standardized feature, $x$ is the raw feature value, $\mu$ is the sample mean and $\sigma$ is the sample standard deviation. Both can be estimated from existing samples, and the estimates are fairly stable when enough samples are available. In addition, the processed data set, covering about 6 months, is divided into a training set, a validation set and a test set. The training set is used to train the model, the validation set assists parameter tuning during training, and the test set is used for the final verification of the model's effect.
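The z-score standardization described above can be illustrated with a minimal sketch. It assumes NumPy and a small made-up sample matrix; the function names `zscore_fit` and `zscore_apply` are hypothetical and do not come from the patent.

```python
import numpy as np

def zscore_fit(samples):
    """Estimate the per-column mean (mu) and standard deviation (sigma) from existing samples."""
    samples = np.asarray(samples, dtype=float)
    mu = samples.mean(axis=0)
    sigma = samples.std(axis=0)
    # Assumption: a constant column keeps sigma = 1 so the division stays defined.
    sigma = np.where(sigma == 0, 1.0, sigma)
    return mu, sigma

def zscore_apply(x, mu, sigma):
    """x' = (x - mu) / sigma"""
    return (np.asarray(x, dtype=float) - mu) / sigma

# Made-up readings: column 0 = CPU utilization (%), column 1 = disk IO rate.
samples = np.array([[10.0, 200.0], [20.0, 400.0], [30.0, 600.0]])
mu, sigma = zscore_fit(samples)
standardized = zscore_apply(samples, mu, sigma)
```

Estimating `mu` and `sigma` once and reusing them for later data keeps new points on the same scale as the samples the statistics were estimated from.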
After data preprocessing is finished, suitable model combinations are selected for training according to the characteristics of the time series, the allocation of computing resources and the time span of the data, and the corresponding models are generated. On this basis, feature extraction on the data set comprises extracting the performance data of the finished product library scheduling system and checking the continuity and integrity of that data, so as to remove interference from the CPU utilization, memory utilization and network port rate;
after the check, 1/2 of the data set is taken for training a feature selection model, 1/3 of the data set is taken from the remaining part for auxiliary parameter tuning during training, and the final 1/6 of the data set is used for verifying the effect of the model.
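As an illustration of the 1/2, 1/3 and 1/6 partition, the following sketch splits a time-ordered sequence without shuffling. The function name and the month labels are hypothetical, and the choice of a purely chronological split is an assumption rather than something the text states.

```python
def split_chronologically(rows):
    """Split time-ordered rows into training (1/2), validation (1/3) and test (the final 1/6)."""
    n = len(rows)
    train_end = n // 2
    valid_end = train_end + n // 3
    return rows[:train_end], rows[train_end:valid_end], rows[valid_end:]

# Six months of data, represented here only by month labels.
months = ["month1", "month2", "month3", "month4", "month5", "month6"]
train, valid, test = split_chronologically(months)
```

With six months of data this gives three months for training, two for parameter tuning and one for verification.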
As a preferred embodiment, feature points are extracted with a chi-square test feature point extraction algorithm. The classical chi-square test checks the correlation between a qualitative independent variable and a qualitative dependent variable. Assuming the independent variable takes N values and the dependent variable takes M values, the test considers, for each pair (i, j), the difference between the observed and the expected frequency of samples whose independent variable equals i and whose dependent variable equals j. Put simply, the statistic measures the correlation between the independent variable and the dependent variable. The K features with the highest chi-square values are selected as the final features. The chi-square statistic is calculated as follows:
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
where $f_o$ is the observed frequency (the count observed in a cell) and $f_e$ is the frequency expected if there were no relationship between the variables. As the formula shows, the chi-square statistic is based on the difference between the values actually observed in the data and the values expected when the variables are unrelated.
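A minimal sketch of the chi-square computation and the top-K selection follows. It assumes NumPy and already discretized (categorical) features; the column names `cpu_high` and `noise` and the toy data are made up for illustration.

```python
import numpy as np

def chi_square(feature, target):
    """Chi-square statistic of one categorical feature against a categorical target."""
    _, f_idx = np.unique(np.ravel(feature), return_inverse=True)
    _, t_idx = np.unique(np.ravel(target), return_inverse=True)
    observed = np.zeros((f_idx.max() + 1, t_idx.max() + 1))
    np.add.at(observed, (f_idx, t_idx), 1)  # contingency table of observed counts f_o
    # Expected counts f_e under independence: row total * column total / grand total.
    expected = (observed.sum(axis=1, keepdims=True)
                * observed.sum(axis=0, keepdims=True) / observed.sum())
    return float(((observed - expected) ** 2 / expected).sum())

def select_top_k(columns, target, k):
    """Return the names of the k features with the highest chi-square value, plus all scores."""
    scores = {name: chi_square(values, target) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores

# Toy data: fault label per time slot, one dependent feature and one unrelated feature.
fault = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "cpu_high": [0, 0, 0, 0, 1, 1, 1, 1],
    "noise":    [0, 1, 0, 1, 0, 1, 0, 1],
}
top, scores = select_top_k(columns, fault, k=1)
```

In this toy data `cpu_high` coincides with the fault label and receives the larger statistic, while `noise` is evenly spread over both labels and scores zero.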
Preferably, the extraction of the feature points may also be performed by a method of stability selection or recursive feature elimination, specifically:
Stability selection: stability selection is a relatively new method that combines subsampling with a selection algorithm, which may be regression, SVM or a similar method. The main idea is to run a feature selection algorithm on different subsets of the data and of the features, repeat this many times, and finally aggregate the feature selection results. For example, one can count how often a feature is judged important (the number of times it is selected as important divided by the number of times a subset containing it is tested). Ideally, the score of an important feature is close to 100%, a somewhat weaker feature gets a nonzero score, and the least useful features score close to 0.
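The stability selection idea above can be sketched as follows. This is only an illustration under stated assumptions: the base selector is a simple absolute-correlation threshold (the text names regression or SVM as possible selectors), and the function name, parameters and synthetic data are all hypothetical.

```python
import numpy as np

def stability_scores(X, y, n_rounds=200, row_frac=0.5, col_frac=0.5,
                     threshold=0.5, seed=0):
    """Fraction of tests in which each feature was judged important."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    chosen = np.zeros(d)   # times a feature was selected as important
    tested = np.zeros(d)   # times a feature was part of the tested subset
    for _ in range(n_rounds):
        rows = rng.choice(n, size=max(2, int(n * row_frac)), replace=False)
        cols = rng.choice(d, size=max(1, int(d * col_frac)), replace=False)
        Xs, ys = X[rows][:, cols], y[rows]
        # Assumed base selector: |Pearson correlation| with the target above a threshold.
        corr = np.array([abs(np.corrcoef(Xs[:, j], ys)[0, 1]) for j in range(len(cols))])
        corr = np.nan_to_num(corr)
        tested[cols] += 1
        chosen[cols[corr > threshold]] += 1
    return chosen / np.maximum(tested, 1)

# Synthetic data: only feature 0 drives the target.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=200)
scores = stability_scores(X, y)
```

The score of each feature is the number of times it was selected divided by the number of times a subset containing it was tested, matching the frequency described in the text.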
Recursive feature elimination: the main idea of recursive feature elimination is to build a model repeatedly (for example an SVM or a regression model), select the best (or worst) feature each time (for example according to the coefficients), set the selected feature aside, and repeat the process on the remaining features until all features have been traversed. The order in which features are eliminated is the ranking of the features, so this is a greedy algorithm for finding the optimal feature subset. FIG. 3 compares the performance of the three feature extraction algorithms: the chi-square test, stability selection and recursive feature elimination. Across the various performance tests, the chi-square test feature point extraction algorithm takes 370 ms; although it has no advantage in extraction speed, it has clear advantages in descriptor extraction, matching speed and number of matching points, as shown in FIG. 5.
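The recursive feature elimination procedure described above can be sketched with a plain least-squares model. The choice of a linear model, of eliminating the feature with the smallest absolute coefficient, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def rfe_ranking(X, y):
    """Rank features by repeatedly dropping the one with the smallest |coefficient|."""
    X = np.asarray(X, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # assumes no constant columns
    y = np.asarray(y, dtype=float) - np.mean(y)
    remaining = list(range(X.shape[1]))
    eliminated = []
    while remaining:
        coef, *_ = np.linalg.lstsq(X[:, remaining], y, rcond=None)
        worst = remaining[int(np.argmin(np.abs(coef)))]
        eliminated.append(worst)
        remaining.remove(worst)
    return eliminated[::-1]  # most important feature first

# Synthetic data: feature 0 matters most, feature 1 a little, feature 2 not at all.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 5.0 * X[:, 0] + 1.0 * X[:, 1] + 0.1 * rng.normal(size=300)
ranking = rfe_ranking(X, y)
```

Reversing the elimination order gives the ranking, since the feature removed last is the one the model relied on most.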
After feature selection is completed, the model could be trained directly, but an overly large feature matrix leads to heavy computation and long training times, so reducing the dimension of the feature matrix is also indispensable. The dimensionality reduction method adopted is principal component analysis (PCA), which essentially maps the original samples into a lower-dimensional sample space such that the mapped samples have maximum variance. PCA is an unsupervised dimensionality reduction method.
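A minimal PCA sketch, assuming NumPy and computing the principal directions with a singular value decomposition of the centered data. The function name and the toy matrix are hypothetical.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project centered samples onto the directions of maximum variance."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]          # principal directions (rows)
    variance = singular_values ** 2
    explained = variance[:n_components].sum() / variance.sum()
    return centered @ components.T, explained

# Toy feature matrix whose second column is exactly twice the first.
t = np.arange(10, dtype=float)
features = np.column_stack([t, 2.0 * t])
reduced, explained = pca_reduce(features, n_components=1)
```

Because the two columns are perfectly correlated, a single component retains essentially all of the variance, which is the situation in which dimensionality reduction costs the least information.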
As a preferred embodiment, the integrity detection includes detecting the extraction content of the feature points, extracting speed, describing the matching condition and describing the matching speed.
As a preferred embodiment, a regression analysis method is adopted to check the continuity of the performance data of the finished product library scheduling system.
As a preferred embodiment, the specific method for checking the integrity of the performance data of the finished product library scheduling system is as follows: several points around each time point are selected along the time dimension to form a set, and whether the performance data of the finished product library scheduling system is complete is judged from the kernel density of the set.
In addition, regarding step 3): a KPI is normal most of the time, showing no large fluctuations and only random noise, and it fluctuates only when the service is affected. Fluctuating data is therefore far scarcer than normal data. To attenuate the effect of noise, a modified excitation function is used: the greater the fluctuation of a KPI, the more strongly the fluctuation feature is amplified, which makes the fluctuation feature more distinctive and more useful for the final correlation judgment.
Extracting fluctuation characteristics of KPIs
For a time series $S = [s_1, s_2, \ldots, s_m]$, $s_i$ is the value of KPI $S$ at time $i$, and $m$ is the length of the KPI. For a single KPI, the time interval between adjacent data points must be the same throughout data acquisition and preprocessing. For two KPIs with different time intervals, the least common multiple of the two intervals can be taken as the common interval. The predicted sequence of KPI $S$ is $P = [p_1, p_2, \ldots, p_m]$, where $p_i$ is the predicted value of $s_i$. The prediction error sequence of the KPI is therefore $F = [f_1, f_2, \ldots, f_m]$ with $f_i = s_i - p_i$. The normal part of a KPI is easy to predict accurately, whereas abnormal fluctuations are usually caused by unpredictable bursts and are hard to predict. The prediction error therefore represents the fluctuation characteristics of a KPI well, and the prediction error sequence is used to represent them.
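The prediction error sequence and the amplification step can be sketched as follows. Neither the predictor nor the exact excitation function is specified in the text, so both are assumptions here: the predictor is a moving average of the previous values, and `amplify` is a hypothetical exponential excitation that grows faster than linearly with the size of the error.

```python
import math
import numpy as np

def common_interval(step_a, step_b):
    """Least common multiple of two sampling intervals (in seconds)."""
    return step_a * step_b // math.gcd(step_a, step_b)

def prediction_errors(s, window=3):
    """f_i = s_i - p_i, with p_i assumed to be the mean of the previous `window` values."""
    s = np.asarray(s, dtype=float)
    p = s.copy()  # the first `window` points have no history, so their error is 0
    for i in range(window, len(s)):
        p[i] = s[i - window:i].mean()
    return s - p

def amplify(f, scale):
    """Hypothetical excitation: sign(f) * (exp(|f| / scale) - 1)."""
    f = np.asarray(f, dtype=float)
    return np.sign(f) * np.expm1(np.abs(f) / scale)

kpi = [10, 10, 10, 10, 10, 30, 10, 10]   # made-up KPI with one burst at index 5
errors = prediction_errors(kpi)
amplified = amplify(errors, scale=5.0)
shared_step = common_interval(60, 90)
```

With this choice a large error is amplified far more than a small one, so the burst stands out more clearly after amplification than in the raw error sequence.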
After the model is built and enough data has accumulated, online detection in the actual environment can begin. Online detection uses the key feature generation algorithm that corresponds to the trained model to generate the features of each new time point, and uses the trained model to score how abnormal that time point is. During online detection, the following practical problems must be handled:
Missing points: no data arrives at a scheduled collection time.
Out-of-order arrival: a later point reaches the anomaly detection algorithm while an earlier point is still in the queue.
Characteristic change: the characteristics of the time series differ from before, for example because of a new deployment.
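The first two problems in the list above can be handled, in the simplest case, by placing arriving points back on the expected collection grid. The sketch below assumes integer timestamps in seconds and a known collection interval; the function name and the sample points are hypothetical, and it does not address characteristic change.

```python
def regularize(points, start, step, count):
    """Place (timestamp, value) pairs on the expected grid; None marks a missing point."""
    by_time = {}
    for timestamp, value in points:
        by_time[timestamp] = value  # arrival order does not matter here
    return [by_time.get(start + i * step) for i in range(count)]

# The 120 s point arrives before the 0 s point, and the 60 s point never arrives.
arrived = [(120, 0.7), (0, 0.5), (180, 0.9)]
series = regularize(arrived, start=0, step=60, count=4)
```

How a `None` slot is then treated (skipped, interpolated, or flagged) is a separate design decision that the text leaves open.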
The algorithm gives an anomaly score for the value at each time point, and the default anomaly detection threshold decides whether a point is abnormal. Time series generated in a production environment differ in meaning, however, and series with different meanings may call for different detection behavior. The algorithm is therefore adjusted automatically toward the expected behavior using labeled feedback on missed anomalies (false negatives) and false alarms (false positives).
In the above process, a first version of the model can be trained on the curves that have already been labeled and used to predict the unlabeled curves. The new curves and their predicted probability values are then used, together with the original clusters, to readjust the optimization direction of the model. This iteration is repeated until the predictions for the unlabeled curves no longer change or the specified number of iterations is reached.
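One simple way to use labeled feedback is to re-pick the threshold so that missed anomalies and false alarms are balanced. The sketch below is an assumption, not the adjustment mechanism of the method: it chooses, from a list of candidate thresholds, the one with the best F1 score on the labeled points. All names and numbers are made up.

```python
def tune_threshold(scores, labels, candidates):
    """Pick the candidate threshold with the highest F1 on labeled feedback."""
    best, best_f1 = None, -1.0
    for t in candidates:
        tp = sum(1 for s, is_anomaly in zip(scores, labels) if s >= t and is_anomaly)
        fp = sum(1 for s, is_anomaly in zip(scores, labels) if s >= t and not is_anomaly)
        fn = sum(1 for s, is_anomaly in zip(scores, labels) if s < t and is_anomaly)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best, best_f1 = t, f1
    return best, best_f1

# Anomaly scores with operator feedback (True = confirmed anomaly).
scores = [0.1, 0.2, 0.35, 0.6, 0.8, 0.9]
labels = [False, False, False, True, False, True]
threshold, f1 = tune_threshold(scores, labels, candidates=[0.3, 0.5, 0.7, 0.85])
```

A lower threshold reduces missed anomalies at the cost of more false alarms, and the F1 score is one common way to trade the two off.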
To illustrate the method of the present invention, a feature extraction process is described below as an example:
The experimental environment includes an Intel dual-core processor (clock frequency 2.6 GHz) with 4 GB of memory; the software environment comprises the Windows Server 2008 operating system and the finished product library scheduling client software. The test data is collected by Zabbix and APM, sent to a database server, gathered through a message bus and stored in the time series database of the cloud platform.
When selecting features, the operation and maintenance personnel of the finished product library scheduling system anticipate, from experience, the fault scenarios that may occur in the scheduling subsystem of the finished product warehouse logistics system, analyze how faults manifest in those scenarios, and select the corresponding indexes accordingly. To reflect how operation behaviors (code scanning, bar code transmission, execution of sorting actions and the like) are distributed over a production day, the KPIs corresponding to a specific abnormal scenario are counted not only per day but also at a granularity of two hours, as shown in FIG. 4. This is equivalent to decomposing each daily feature into 12 two-hour features.
The selected index data is collected; abnormal working condition data is removed from all index data by manual inspection, and every record containing a missing value is discarded. Data consistency is then checked, the cleaned data enters the message bus, and ETL (extract, transform, load) processing, filtering, splitting, expanding and similar operations are performed on it.
The six months of cleaned and preprocessed historical data are used for feature extraction: 3 months of data train the feature selection model, 2 months of data assist parameter tuning during training, and 1 month of data is used for the final verification of the model's effect.
Feature extraction includes checking the continuity and integrity of the performance data of the finished product library scheduling system. These checks mainly address various kinds of interference, such as CPU utilization, memory utilization and network port rate, and mainly cover the extracted content of the feature points, the extraction speed, the description of the coincidence condition and the description of the coincidence matching speed.
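Decomposing a daily count into 12 two-hour features can be sketched as below, using only the Python standard library. The function name and the event timestamps are made up for illustration.

```python
from collections import Counter
from datetime import datetime

def two_hour_counts(timestamps):
    """Count events of one day in 12 bins: 00:00-02:00, 02:00-04:00, ..., 22:00-24:00."""
    counts = Counter(ts.hour // 2 for ts in timestamps)
    return [counts.get(bin_index, 0) for bin_index in range(12)]

# Made-up code-scanning events within one production day.
events = [
    datetime(2020, 6, 1, 0, 15),
    datetime(2020, 6, 1, 1, 59),
    datetime(2020, 6, 1, 2, 0),
    datetime(2020, 6, 1, 23, 30),
]
features = two_hour_counts(events)
```

The 12 values sum to the daily count, so the daily feature can always be recovered from the two-hour features.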
Continuity check for feature extraction:
Operation and maintenance anomalies are analyzed from the monitoring data, so the continuity of the monitoring data directly affects the final anomaly result. Regression analysis is a statistical method for determining the quantitative relationship of interdependence between two or more variables. It studies how a dependent variable depends on independent variables, with the aim of estimating or predicting the mean of the dependent variable from given values of the independent variables. It can be used for prediction, for time series modeling and for discovering causal relationships between variables. Unlike classification, which handles discrete targets, regression handles continuous target data, so its objective is to predict the value of a numerical target variable.
Integrity check for feature extraction:
In cigarette production, the IT systems run 24 hours a day, 7 days a week. The volume of machine and application data is not constant, however, because it varies with production volume. In addition, shutdown maintenance is carried out during holidays, and while the systems are shut down, the service indexes of the cigarette production systems are all zero. The shutdown phase and the peak phase are sharply distinct, and an ordinary algorithm will almost certainly raise false alarms at the transitions between them.
Therefore, taking the day as the dimension, several points around each time point of each day are selected to form a set, kernel density analysis is performed on the set, and the results for all points in a day are then combined into the final model of the normal data distribution. To improve the effect, some noise can also be deliberately added to the training data. During actual detection, the distribution of the last short segment of the simulated curve, obtained by encoding and decoding the test data, is compared with the actual data to judge whether a serious deviation has occurred.
This model is somewhat like a mountain range on a 3D map, with numerous normal distributions piled up together. A value arriving at detection time for the corresponding time point is clearly abnormal if it falls in the plains. Similarly, since an index is a very simple curve, the curve is cut into short segments with a sliding window, the segments are combined into a feature matrix, and the matrix is fed into multi-layer encoding and decoding, iterating repeatedly to obtain the best model.
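Cutting a curve into short segments with a sliding window and stacking them into a feature matrix can be sketched as follows. The sketch assumes NumPy 1.20 or later for `sliding_window_view`; the function name, window width and toy curve are hypothetical, and the encoder-decoder model itself is not shown.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def window_matrix(curve, width, stride=1):
    """Each row is one window of `width` consecutive points of the curve."""
    curve = np.asarray(curve, dtype=float)
    return sliding_window_view(curve, width)[::stride]

curve = np.arange(6)                      # stand-in for an index curve
matrix = window_matrix(curve, width=3)
```

Each row can then be treated as one input sample for the model that consumes the feature matrix.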
In addition, to strengthen the handling of temporal characteristics, several points around each time point of each day are selected, by day, to form a set.
Feature engineering "flattens" feature logs or multi-system data into features that a model can use, and applies various transformations to those features to generate curves. FIG. 6 shows how the CPU utilization index feature of the finished product library scheduling system is "flattened" and extracted to generate a linear feature curve.
The method extracts and amplifies relatively fine features, finds a suitable feature detector for each KPI, and identifies the key features of complex data so that operation and maintenance personnel can inspect them easily. Little information is lost, the regularities contained in the original data are preserved, and the uncertain factors in the original data are effectively reduced.
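The per-time-point sets and their kernel density can be sketched as follows. The sketch assumes NumPy, a Gaussian kernel with a fixed bandwidth, and a history array of shape (days, slots per day); all of these, along with the function names and the toy values, are assumptions made for illustration.

```python
import numpy as np

def neighbourhood(history, slot, radius):
    """Values at the slots within `radius` of `slot`, pooled over all days (wraps at midnight)."""
    slots_per_day = history.shape[1]
    idx = [(slot + k) % slots_per_day for k in range(-radius, radius + 1)]
    return history[:, idx].ravel()

def gaussian_kde_density(samples, x, bandwidth):
    """Gaussian kernel density estimate of the samples, evaluated at x."""
    samples = np.asarray(samples, dtype=float)
    z = (x - samples) / bandwidth
    return float(np.exp(-0.5 * z ** 2).sum()
                 / (len(samples) * bandwidth * np.sqrt(2.0 * np.pi)))

# Three days with six time slots each; slots 1-3 normally read about 50.
history = np.array([
    [10.0, 48.0, 50.0, 52.0, 10.0, 10.0],
    [10.0, 49.0, 51.0, 50.0, 10.0, 10.0],
    [10.0, 50.0, 49.0, 51.0, 10.0, 10.0],
])
pooled = neighbourhood(history, slot=2, radius=1)
density_normal = gaussian_kde_density(pooled, x=50.0, bandwidth=2.0)
density_odd = gaussian_kde_density(pooled, x=90.0, bandwidth=2.0)
```

A value whose density is near zero lies far from anything seen around that time of day, which corresponds to a point falling in the "plains" of the distribution model.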
Although the embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and not to be construed as limiting the present invention, and those skilled in the art can make changes, modifications, substitutions and alterations to the above embodiments without departing from the principle and spirit of the present invention, and any simple modification, equivalent change and modification made to the above embodiments according to the technical spirit of the present invention still fall within the technical scope of the present invention.

Claims (8)

1. A characteristic extraction method for finished product library scheduling system performance data based on characteristic engineering is characterized by comprising the following steps:
1) pre-judging the fault scene of the finished product warehouse logistics system scheduling subsystem according to experience, analyzing the data performance in the fault scene, and pertinently selecting corresponding indexes;
2) collecting the selected index data at equal time intervals, and cleaning and preprocessing the data to obtain a data set for feature extraction; and
3) extracting the features of the data set, and amplifying and displaying the features through an excitation function.
2. The feature extraction method for the performance data of the finished product library scheduling system based on the feature engineering as claimed in claim 1, wherein the feature extraction of the data set comprises extracting the performance data of the finished product library scheduling system and checking the continuity and integrity of the performance data of the finished product library scheduling system;
after the check, 1/2 of the data set is taken for training a feature selection model, 1/3 of the data set is taken from the remaining part for auxiliary parameter tuning during training, and the final 1/6 of the data set is used for verifying the effect of the model.
3. The feature extraction method for the performance data of the finished product library scheduling system based on the feature engineering as claimed in claim 2, wherein the extraction of the feature points is performed by using a chi-square test feature point extraction algorithm.
4. The feature extraction method for the performance data of the finished product library scheduling system based on the feature engineering as claimed in claim 2, wherein the integrity detection includes detecting the extraction content, the extraction speed, the description coincidence condition and the description coincidence matching speed of the feature points.
5. The feature extraction method for the performance data of the finished product library scheduling system based on the feature engineering as claimed in claim 2, wherein a regression analysis method is adopted to check the continuity of the performance data of the finished product library scheduling system.
6. The feature extraction method for the performance data of the finished product library scheduling system based on the feature engineering as claimed in claim 2, wherein the specific method for checking the integrity of the performance data of the finished product library scheduling system is as follows: selecting a plurality of points around each time point according to the time dimension to form a set, and judging whether the performance data of the finished product library scheduling system is complete according to the kernel density of the set.
7. The feature extraction method for the performance data of the finished product library scheduling system based on the feature engineering as claimed in claim 1, wherein the cleaning of the index data comprises removing abnormal data from the operation data of the logistics sorting machine.
8. The feature extraction method for the performance data of the finished product library scheduling system based on the feature engineering as claimed in claim 1, wherein the preprocessing of the index data comprises: checking the consistency of the remaining data, and, after the cleaned data enters the message bus, carrying out ETL (extract, transform, load) processing, filtering, splitting and expanding on the data.
CN202010494916.2A 2020-06-03 2020-06-03 Characteristic extraction method for finished product library scheduling system performance data based on characteristic engineering Pending CN111724048A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010494916.2A CN111724048A (en) 2020-06-03 2020-06-03 Characteristic extraction method for finished product library scheduling system performance data based on characteristic engineering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010494916.2A CN111724048A (en) 2020-06-03 2020-06-03 Characteristic extraction method for finished product library scheduling system performance data based on characteristic engineering

Publications (1)

Publication Number Publication Date
CN111724048A true CN111724048A (en) 2020-09-29

Family

ID=72565626

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010494916.2A Pending CN111724048A (en) 2020-06-03 2020-06-03 Characteristic extraction method for finished product library scheduling system performance data based on characteristic engineering

Country Status (1)

Country Link
CN (1) CN111724048A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112365217A (en) * 2020-12-07 2021-02-12 吉林大学 Method for extracting spatial aggregation characteristics of logistics clusters entering factory

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304941A (en) * 2017-12-18 2018-07-20 中国软件与技术服务股份有限公司 A kind of failure prediction method based on machine learning
CN108573021A (en) * 2018-02-24 2018-09-25 浙江金华伽利略数据科技有限公司 A kind of comprehensive value appraisal procedure of dynamic data
CN108665119A (en) * 2018-08-03 2018-10-16 清华大学 A kind of water supply network unusual service condition method for early warning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈新星: "车联网海量数据分析方法的研究" *

Similar Documents

Publication Publication Date Title
KR101984730B1 (en) Automatic predicting system for server failure and automatic predicting method for server failure
Manco et al. Fault detection and explanation through big data analysis on sensor streams
CN106951984B (en) Dynamic analysis and prediction method and device for system health degree
US10031829B2 (en) Method and system for it resources performance analysis
CN111506478A (en) Method for realizing alarm management control based on artificial intelligence
US9208209B1 (en) Techniques for monitoring transformation techniques using control charts
CN106708738B (en) Software test defect prediction method and system
CN111259947A (en) Power system fault early warning method and system based on multi-mode learning
CN112148561B (en) Method and device for predicting running state of business system and server
CN112084229A (en) Method and device for identifying abnormal gas consumption behaviors of town gas users
CN116559598B (en) Smart distribution network fault positioning method and system
CN115454778A (en) Intelligent monitoring system for abnormal time sequence indexes in large-scale cloud network environment
CN111597550A (en) Log information analysis method and related device
CN111724048A (en) Characteristic extraction method for finished product library scheduling system performance data based on characteristic engineering
CN112905671A (en) Time series exception handling method and device, electronic equipment and storage medium
CN112733897A (en) Method and equipment for determining abnormal reason of multi-dimensional sample data
CN114518988B (en) Resource capacity system, control method thereof, and computer-readable storage medium
US20230061829A1 (en) Outlier detection apparatus and method
CN112445687A (en) Blocking detection method of computing equipment and related device
CN115619539A (en) Pre-loan risk evaluation method and device
CN111680572B (en) Dynamic judgment method and system for power grid operation scene
CN109978038B (en) Cluster abnormity judgment method and device
CN114757495A (en) Membership value quantitative evaluation method based on logistic regression
CN113255096A (en) High-loss line abnormal distribution area positioning method and system based on forward stepwise regression
CN113591266A (en) Method and system for analyzing fault probability of electric energy meter

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination