CN109492767A - An autoencoder-based anomaly detection method for unsupervised settings - Google Patents

Info

Publication number
CN109492767A
Authority
CN
China
Prior art keywords
data
autoencoder
threshold
training
Legal status
Pending
Application number
CN201811330477.0A
Other languages
Chinese (zh)
Inventor
李锐
于治楼
尹青山
安程治
段强
Current Assignee
Jinan Inspur Hi Tech Investment and Development Co Ltd
Original Assignee
Jinan Inspur Hi Tech Investment and Development Co Ltd
Priority date
2018-11-09
Filing date
2018-11-09
Publication date
2019-03-19
Application filed by Jinan Inspur Hi Tech Investment and Development Co Ltd
Priority to CN201811330477.0A
Publication of CN109492767A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0709 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Biophysics (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The present invention provides an autoencoder-based anomaly detection method for unsupervised settings and belongs to the technical field of anomaly detection. Using the neural network inside an autoencoder, the invention trains the autoencoder on existing data without supervision. The resulting model can be used to compress newly input data, and the compressed data are compared with the previously compressed training data (normal data). If the error after compression exceeds a threshold, the input is judged to be abnormal data. Because the encoded data better reflect the essential characteristics of the data and capture its feature patterns, the detection is more accurate.

Description

An autoencoder-based anomaly detection method for unsupervised settings
Technical field
The present invention relates to anomaly detection technology, and in particular to an autoencoder-based anomaly detection method for unsupervised settings.
Background art
Processing large volumes of high-dimensional data raises two problems. First, because the data volume is large and there are many variables, the time cost is very high. Second, because there are so many variables, the features of certain key variables may be masked by those of the many other variables, so that the key variables needed for anomaly handling cannot play their proper role.
Anomaly detection is a widely used class of algorithms whose main purpose is to determine whether a given data point is abnormal. Many anomaly detection algorithms exist.
Anomaly detection is a research direction with very broad application prospects, including fault detection in engineering, fraud detection in finance and intrusion detection in security. Anomaly detection identifies data and behavior that do not conform to expectations. In today's Internet era, however, information is complex and diverse, and a single data record may have hundreds of variables, which makes anomaly detection many times harder. The resulting high time cost is a serious obstacle to handling anomalies promptly and may cause heavy losses.
An autoencoder is an unsupervised deep learning method that is also often used to compress data. Unlike classical principal component analysis (PCA), an autoencoder is a nonlinear compression method and can extract nonlinear information from data. In most applications of autoencoders, the compression and decompression functions are implemented by neural networks.
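To make neural-network compression and decompression concrete, here is a minimal NumPy sketch of a one-hidden-layer autoencoder (tanh encoder, linear decoder) trained without labels by full-batch gradient descent on the squared reconstruction error. The architecture, the hyperparameters, the toy data and the name `train_autoencoder` are illustrative assumptions; the patent does not specify a network structure.

```python
import numpy as np


def train_autoencoder(X, n_hidden, lr=0.05, epochs=2000, seed=0):
    """Train x -> tanh(x W1 + b1) -> h W2 + b2 on the unlabeled rows of X.

    Hypothetical helper: full-batch gradient descent on the mean (over samples)
    squared reconstruction error. Returns (encode, decode) callables.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.1, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, d))
    b2 = np.zeros(d)

    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)           # encoder: nonlinear compression to n_hidden dims
        X_hat = H @ W2 + b2                # decoder: linear decompression back to d dims
        G = 2.0 * (X_hat - X) / n          # gradient of the loss w.r.t. X_hat
        dW2 = H.T @ G
        db2 = G.sum(axis=0)
        dZ = (G @ W2.T) * (1.0 - H ** 2)   # backpropagate through tanh (uses W2 before update)
        dW1 = X.T @ dZ
        db1 = dZ.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2

    def encode(Z):
        return np.tanh(np.asarray(Z, dtype=float) @ W1 + b1)

    def decode(C):
        return np.asarray(C, dtype=float) @ W2 + b2

    return encode, decode


# Hypothetical toy data: unlabeled 3-D points lying near a curve.
rng = np.random.default_rng(1)
t = rng.uniform(-1.0, 1.0, size=200)
X = np.column_stack([t, t ** 2, -t]) + 0.01 * rng.normal(size=(200, 3))
encode, decode = train_autoencoder(X, n_hidden=2)
codes = encode(X)  # shape (200, 2): the compressed representation
```

A deep learning framework would normally replace the hand-written gradients; they are spelled out here only to keep the sketch self-contained.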
Setting the error threshold is the key to anomaly handling. If the threshold is set too low, many normal data may be mistaken for abnormal data; conversely, if it is set too high, some abnormal data may be mistaken for normal data.
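The trade-off can be seen on a labeled toy set. The hypothetical helper below counts false alarms (normal data flagged as abnormal) and misses (abnormal data accepted as normal) for the rule "error > threshold means abnormal". The scores are made up purely for illustration; in the unsupervised setting of the patent such labels are normally unavailable, so this kind of count is only possible when some labeled data exist.

```python
def count_errors(normal_scores, anomaly_scores, threshold):
    """Count both error types for the rule "score > threshold means abnormal".

    Hypothetical helper for illustration. Returns (false_alarms, misses).
    """
    false_alarms = sum(1 for s in normal_scores if s > threshold)  # normal data flagged as abnormal
    misses = sum(1 for s in anomaly_scores if s <= threshold)      # abnormal data accepted as normal
    return false_alarms, misses


# Made-up error scores, only to show the direction of the trade-off.
normal = [0.1, 0.2, 0.3, 0.4, 0.9]
abnormal = [0.5, 0.8, 1.5]
for threshold in (0.35, 0.6, 1.0):
    print(threshold, count_errors(normal, abnormal, threshold))
# prints:
# 0.35 (2, 0)
# 0.6 (1, 1)
# 1.0 (0, 2)
```

Raising the threshold trades false alarms for misses, which is exactly the tension described above.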
Summary of the invention
Against this background, the invention proposes an autoencoder-based anomaly detection method for unsupervised settings that is better suited to anomaly detection on data with many variables in an unsupervised, label-free environment.
In the present invention, the autoencoder's algorithm parameters can be left at their default values or tuned empirically. The autoencoder also has many derivative algorithms, and these can likewise be used in the method described here.
The autoencoder encodes the data and then decodes it, and the result is compared with the original data. When the error reaches the threshold, the data differ substantially from the majority of the data used to build the autoencoder and can be judged abnormal.
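The patent does not fix the error metric for this encode-then-decode comparison; the sketch below assumes a per-sample Euclidean norm. `encode` and `decode` stand for any trained model; here a hand-built linear projection onto the line x2 = x1 plays that role, which is an illustrative assumption, as are the helper names.

```python
import numpy as np


def reconstruction_errors(X, encode, decode):
    """Per-sample Euclidean distance between each row x and decode(encode(x))."""
    X = np.asarray(X, dtype=float)
    return np.linalg.norm(X - decode(encode(X)), axis=1)


def flag_by_reconstruction(X, encode, decode, threshold):
    """True where the reconstruction error exceeds the threshold (judged abnormal)."""
    return reconstruction_errors(X, encode, decode) > threshold


# Hypothetical stand-in for a trained autoencoder: normal data lie on x2 = x1.
u = np.array([1.0, 1.0]) / np.sqrt(2.0)


def encode(Z):
    return Z @ u[:, None]   # (n, 2) -> (n, 1) code


def decode(C):
    return C @ u[None, :]   # (n, 1) -> (n, 2) reconstruction


X_new = np.array([[1.0, 1.0], [1.0, -1.0], [2.0, 2.2]])
print(reconstruction_errors(X_new, encode, decode))
print(flag_by_reconstruction(X_new, encode, decode, threshold=0.5))
```

The point far from the learned pattern, [1, -1], reconstructs poorly and is flagged, while the two points near the line are accepted.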
Further,
First, using the neural network inside the autoencoder, the autoencoder is trained without supervision on the original normal data.
The resulting model can be used to compress newly input data, and the compressed data are compared with the compressed training data (normal data).
If the error after compression exceeds the threshold, the data are judged abnormal. Because the encoded data better reflect the essential characteristics of the data and capture its feature patterns, the method is more accurate.
Further,
The operating procedure is as follows:
1) First, train the autoencoder model on part of the historical normal data;
2) Use the trained model to perform anomaly detection on the data to be tested, and output the result.
Further,
The operating procedure consists mainly of two aspects: 1) setting the error threshold, and 2) the detection criterion.
Setting the error threshold means that, once the model has been trained, an encoding operation is performed on each training sample to obtain its encoded data. The average encoded data are computed from these encodings. The Euclidean distance between each training sample's encoding and this average is then computed, giving a set of distance values, one per training sample. The mean and standard deviation of these distances are calculated, and the threshold is set to the mean plus 3 or 6 times the standard deviation.
The detection criterion: once the threshold is obtained, the next step is to judge whether new data are abnormal. The model performs the encoding operation on each new sample, and the Euclidean distance between the resulting encoding and the previously obtained average encoding is computed. This distance is compared with the threshold to obtain the result.
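Written as formulas, the threshold setting and the detection criterion read as follows. The notation is introduced here for illustration: f is the trained encoder, x_1, ..., x_N are the training samples and k is the multiplier (3 or 6). The patent does not say whether the population or the sample standard deviation is meant; the population form is assumed.

```latex
\begin{aligned}
z_i &= f(x_i), \qquad i = 1, \dots, N \\
\bar{z} &= \frac{1}{N} \sum_{i=1}^{N} z_i \\
d_i &= \lVert z_i - \bar{z} \rVert_2 \\
\mu &= \frac{1}{N} \sum_{i=1}^{N} d_i, \qquad
\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (d_i - \mu)^2} \\
\tau &= \mu + k\,\sigma, \qquad k \in \{3, 6\} \\
x \text{ is abnormal} &\iff \lVert f(x) - \bar{z} \rVert_2 > \tau
\end{aligned}
```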
The beneficial effects of the invention are as follows:
It improves the anomaly detection models used in current industrial applications. By using deep learning, it can be applied better in big data scenarios, enabling anomaly detection in big data settings.
The algorithm can be implemented in mainstream programming languages. Anomaly detection is one of the most important applications in Industry 4.0 and the industrial Internet, and it provides important technical support for the company's industrial Internet applications.
Description of the drawings
Fig. 1 is workflow schematic diagram of the invention.
Detailed description of the embodiments
The invention is described in more detail below:
The invention applies to unsupervised settings, so the threshold parameter needs to be adjusted gradually according to actual conditions.
An autoencoder is a data compression algorithm in which the compression and decompression functions are data-specific, lossy, and learned automatically from samples. In most contexts where autoencoders are mentioned, the compression and decompression functions are implemented by neural networks.
1) Autoencoders are data-specific (data-dependent), meaning that an autoencoder can only compress data similar to its training data. For example, an autoencoder trained on faces performs poorly when compressing other images, such as pictures of trees, because the features it has learned are specific to faces.
2) Autoencoders are lossy, meaning that the decompressed output is degraded relative to the original input, as with compression algorithms such as MP3 and JPEG. This distinguishes them from lossless compression algorithms.
3) Autoencoders are learned automatically from data samples, which means it is easy to train a specialized encoder for a particular class of input without any new engineering work.
Anomaly detection is performed with the autoencoder, an unsupervised deep learning method, and the method can be implemented at the software level or solidified in hardware. Applying the method at the edge computing end or in embedded systems is an innovative application.
The operating steps are as follows:
1: First, train the autoencoder model on part of the historical normal data. These data do not need labels.
2: Use the trained model to perform anomaly detection on the data to be tested, and output the result.
The specific judgment criterion is as follows:
Once the model has been trained, an encoding (dimensionality reduction) operation is performed on each training sample (the historical normal data used for training) to obtain its encoded data. The average encoded data (a column or row vector) are computed from these encodings. The Euclidean distance between each training sample's encoding and this average is then computed, giving a set of distance values (as many as there are training samples). The mean and standard deviation of these distances are calculated, and the threshold is finally set to the mean plus 3 (or 6) times the standard deviation. This threshold is used to judge whether future data are abnormal: data whose distance exceeds the threshold are abnormal.
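A minimal NumPy sketch of the threshold setting, assuming the training samples have already been passed through the trained encoder (one code per row). The function name `fit_threshold` and the toy codes are hypothetical; `np.std` uses the population formula (ddof=0), which is an assumption since the patent does not specify it.

```python
import numpy as np


def fit_threshold(train_codes, k=3.0):
    """Threshold setting from encoded training samples (one code per row).

    Returns (mean_code, threshold) with threshold = mean + k * std of the
    Euclidean distances from each code to the mean code (k = 3 or 6 in the text).
    """
    codes = np.asarray(train_codes, dtype=float)
    mean_code = codes.mean(axis=0)                     # average encoded data
    dists = np.linalg.norm(codes - mean_code, axis=1)  # one distance per training sample
    threshold = dists.mean() + k * dists.std()         # population std (ddof=0)
    return mean_code, threshold


# Hypothetical 1-D codes for five training samples.
mean_code, tau = fit_threshold([[0.0], [1.0], [2.0], [3.0], [4.0]], k=3.0)
print(mean_code, tau)
# distances are [2, 1, 0, 1, 2]: mean 1.2, std ≈ 0.748, so tau ≈ 3.445
```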
Once the threshold is obtained, the next step is to judge whether new data are abnormal. The model performs the encoding (dimensionality reduction) operation on each new sample, and the Euclidean distance between the resulting encoding and the previously obtained average encoding is computed. This distance is compared with the threshold to obtain the result.
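The detection step can be sketched in the same way, assuming the new samples have already been encoded by the trained model and that the average code and threshold come from the threshold-setting step. The function name `detect` and the numeric values are hypothetical.

```python
import numpy as np


def detect(new_codes, mean_code, threshold):
    """Detection criterion: Euclidean distance from each new code to the mean code.

    Returns (distances, is_abnormal); a sample is abnormal when its distance
    exceeds the threshold.
    """
    codes = np.atleast_2d(np.asarray(new_codes, dtype=float))
    dists = np.linalg.norm(codes - np.asarray(mean_code, dtype=float), axis=1)
    return dists, dists > threshold


# Hypothetical mean code and threshold from the threshold-setting step.
dists, is_abnormal = detect([[1.5, 1.0], [4.0, 5.0]], mean_code=[1.0, 1.0], threshold=2.0)
print(dists, is_abnormal)
```

The first code lies 0.5 from the mean and is accepted; the second lies 5.0 away and is flagged as abnormal.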
The foregoing describes only preferred embodiments of the invention and is not intended to limit it. Any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (8)

1. An autoencoder-based anomaly detection method for unsupervised settings, characterized in that:
an autoencoder encodes data and then decodes it, and the result is compared with the original data; when the error reaches a threshold, the data differ substantially from the majority of the data used to build the autoencoder and are judged abnormal.
2. The method according to claim 1, characterized in that:
first, using the neural network inside the autoencoder, the autoencoder is trained without supervision on the original normal data.
3. The method according to claim 2, characterized in that:
the model obtained after training is used to compress newly input data to obtain new compressed data, and the compressed data are compared with the compressed training data; if the error after compression exceeds the threshold, the data are judged abnormal.
4. The method according to claim 3, characterized in that:
the operating procedure is as follows:
1) first, train the autoencoder model on part of the historical normal data;
2) use the trained model to perform anomaly detection on the data to be tested, and output the result.
5. The method according to claim 4, characterized in that:
the operating procedure consists mainly of two aspects: 1) setting the error threshold, and 2) the detection criterion.
6. The method according to claim 5, characterized in that:
setting the error threshold means that, once the model has been trained, an encoding operation is performed on each training sample to obtain its encoded data; the average encoded data are computed from these encodings; the Euclidean distance between each training sample's encoding and this average is computed, giving a set of distance values, one per training sample; and the mean and standard deviation of these distances are calculated to obtain the threshold.
7. The method according to claim 6, characterized in that:
the threshold is the mean plus 3 or 6 times the standard deviation.
8. The method according to claim 7, characterized in that:
the detection criterion is that, once the threshold is obtained, the next step is to judge whether new data are abnormal: the model performs the encoding operation on each new sample, the Euclidean distance between the resulting encoding and the previously obtained average encoding is computed, and this distance is compared with the threshold to obtain the result.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811330477.0A CN109492767A (en) 2018-11-09 2018-11-09 An autoencoder-based anomaly detection method for unsupervised settings

Publications (1)

Publication Number Publication Date
CN109492767A (en) 2019-03-19

Family

ID=65694191

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811330477.0A Pending CN109492767A (en) 2018-11-09 2018-11-09 An autoencoder-based anomaly detection method for unsupervised settings

Country Status (1)

Country Link
CN (1) CN109492767A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113632140A (en) * 2019-06-17 2021-11-09 乐人株式会社 Automatic learning method and system for product inspection
CN110502895A (en) * 2019-08-27 2019-11-26 中国工商银行股份有限公司 Interface exception call determines method and device
CN110796497A (en) * 2019-10-31 2020-02-14 支付宝(杭州)信息技术有限公司 Method and device for detecting abnormal operation behaviors
CN111104241A (en) * 2019-11-29 2020-05-05 苏州浪潮智能科技有限公司 Server memory anomaly detection method, system and equipment based on self-encoder
CN111241688A (en) * 2020-01-15 2020-06-05 北京百度网讯科技有限公司 Method and device for monitoring composite production process
CN111241688B (en) * 2020-01-15 2023-08-25 北京百度网讯科技有限公司 Method and device for monitoring composite production process
CN111538614A (en) * 2020-04-29 2020-08-14 济南浪潮高新科技投资发展有限公司 Method for detecting time sequence abnormal operation behavior of operating system
CN111538614B (en) * 2020-04-29 2024-04-05 山东浪潮科学研究院有限公司 Time sequence abnormal operation behavior detection method of operating system
CN112395382A (en) * 2020-11-23 2021-02-23 武汉理工大学 Ship abnormal track data detection method and device based on variational self-encoder
CN115293663A (en) * 2022-10-10 2022-11-04 国网山东省电力公司滨州供电公司 Bus unbalance rate abnormity detection method, system and device

Similar Documents

Publication Publication Date Title
CN109492767A (en) An autoencoder-based anomaly detection method for unsupervised settings
CN109413028B (en) SQL injection detection method based on convolutional neural network algorithm
CN109408389B (en) Code defect detection method and device based on deep learning
CN109034140B (en) Industrial control network signal abnormity detection method based on deep learning structure
CN113242207B (en) Iterative clustering network flow abnormity detection method
CN115033895A (en) Binary program supply chain safety detection method and device
Soukup et al. Reliably decoding autoencoders’ latent spaces for one-class learning image inspection scenarios
US11727052B2 (en) Inspection systems and methods including image retrieval module
CN114626426A (en) Industrial equipment behavior detection method based on K-means optimization algorithm
CN117574383A (en) Feature fusion and code visualization technology-based software vulnerability detection model method
CN116597635B (en) Wireless communication intelligent gas meter controller and control method thereof
CN115017015B (en) Method and system for detecting abnormal behavior of program in edge computing environment
CN114936615B (en) Small sample log information anomaly detection method based on characterization consistency correction
CN116680639A (en) Deep-learning-based anomaly detection method for sensor data of deep-sea submersible
Akavalappil et al. A convolutional neural network (CNN)‐based direct method to detect stiction in control valves
CN117333726B (en) Quartz crystal cutting abnormality monitoring method, system and device based on deep learning
CN117521042B (en) High-risk authorized user identification method based on ensemble learning
CN118094549B (en) Malicious behavior identification method based on bimodal fusion of source program and executable code
CN117574782B (en) Method, device, system and medium for judging winding materials based on transformer parameters
CN117237165B (en) Method for detecting fake data
CN116384797A (en) Digital infrastructure health assessment method oriented to data fusion
CN117892777A (en) Decision risk assessment method and system for target detection model
Abdurrazaq et al. Improving performance of network scanning detection through PCA-based feature selection
Wu et al. Automated Anomaly Detection Assisted by Discrimination Model for Time Series
CN118070273A (en) Webshell attack detection method based on graph semantic analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20190319