CN111767273B - Data intelligent detection method and device based on improved SOM algorithm - Google Patents

Data intelligent detection method and device based on improved SOM algorithm

Info

Publication number
CN111767273B
CN111767273B CN202010575124.8A CN202010575124A CN111767273B CN 111767273 B CN111767273 B CN 111767273B CN 202010575124 A CN202010575124 A CN 202010575124A CN 111767273 B CN111767273 B CN 111767273B
Authority
CN
China
Prior art keywords
self
sample set
feature mapping
mapping algorithm
organizing feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010575124.8A
Other languages
Chinese (zh)
Other versions
CN111767273A (en)
Inventor
胡伟
郭秋婷
黄建平
陈浩
盛银波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
State Grid Corp of China SGCC
Jiaxing Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Original Assignee
Tsinghua University
State Grid Corp of China SGCC
Jiaxing Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University, State Grid Corp of China SGCC, and Jiaxing Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority to CN202010575124.8A
Publication of CN111767273A
Application granted
Publication of CN111767273B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an intelligent data detection method and device based on an improved SOM algorithm. The method comprises the following steps: acquiring a sample set, decomposing it by dimension, performing density-based one-dimensional isolated point detection dimension by dimension to preliminarily screen the multi-dimensional sample set for outliers, and removing the outliers; clustering the sample set with a self-organizing feature mapping algorithm and removing abnormal data points; improving the self-organizing feature mapping algorithm by a kernel function method, clustering the sample set with the improved algorithm, and removing abnormal data points; and removing abnormal data points from the sample set according to expert experience to complete the intelligent detection of the data. With this method, density-based one-dimensional isolated point detection removes abnormal data and improves data quality, and introducing a kernel function into the weight update of the self-organizing map algorithm reduces the influence of sample data nonlinearity and improves the clustering effect of the SOM algorithm.

Description

Data intelligent detection method and device based on improved SOM algorithm
Technical Field
The invention relates to the technical field of big data intelligent detection, in particular to a data intelligent detection method and device based on an improved SOM algorithm.
Background
Data acquisition devices such as sensors are installed on equipment so that equipment operation data can be collected and the operating state of the equipment can be monitored. Because the system is complex and the environment is harsh, the collected data are strongly nonlinear, noisy, and highly unstable. Abnormal data detection is therefore one of the important steps in data preprocessing.
Abnormal data are also called outliers. Abnormal data may be erroneous data caused by equipment failure, measurement error, or the like, or may correspond to a meaningful real-world event. Erroneous data adversely affect system operation. If erroneous data are not found and removed in time, the equipment can be damaged and hidden dangers are introduced into system operation.
In existing automated service systems, interruptions, missing values, and acquisition deviations in data collection are common. The workload of basic data detection is heavy, yet current systems have insufficient self-detection capability and rely mainly on manual detection. Manual error detection consumes a great deal of manpower and time, and its accuracy cannot be guaranteed.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems in the related art to some extent.
Therefore, an object of the present invention is to provide an intelligent data detection method based on an improved SOM algorithm. The method uses density-based one-dimensional isolated point detection to remove abnormal data and improve data quality, and introduces a kernel function into the weight update of the self-organizing map algorithm to reduce the influence of sample data nonlinearity and improve the clustering effect of the SOM algorithm.
Another object of the present invention is to provide a data intelligent detection device based on an improved SOM algorithm.
In order to achieve the above objective, an embodiment of an aspect of the present invention provides a data intelligent detection method based on an improved SOM algorithm, including:
acquiring a sample set, decomposing the sample set by dimension, performing density-based one-dimensional isolated point detection dimension by dimension so as to preliminarily screen the multi-dimensional sample set for outliers, and removing the outliers;
clustering the sample set with a self-organizing feature mapping algorithm, and removing abnormal data points;
improving the self-organizing feature mapping algorithm by a kernel function method, clustering the sample set with the improved self-organizing feature mapping algorithm, and removing abnormal data points;
and removing abnormal data points from the sample set according to expert experience to complete the intelligent detection of the data.
In order to achieve the above object, another embodiment of the present invention provides an intelligent data detection device based on an improved SOM algorithm, including:
the first rejecting module is used for acquiring a sample set, decomposing the sample set by dimension, performing density-based one-dimensional isolated point detection dimension by dimension so as to preliminarily screen the multi-dimensional sample set for outliers, and rejecting the outliers;
the second rejecting module is used for clustering the sample set with a self-organizing feature mapping algorithm and rejecting abnormal data points;
the third rejecting module is used for improving the self-organizing feature mapping algorithm by a kernel function method, clustering the sample set with the improved self-organizing feature mapping algorithm, and rejecting abnormal data points;
and the intelligent detection module is used for rejecting abnormal data points from the sample set according to expert experience to complete the intelligent detection of the data.
The technical scheme of the invention has the following technical effects:
(1) The intelligent data detection method based on the density and improved self-organizing feature mapping algorithm is established, abnormal data can be removed, and the data quality is improved.
(2) The kernel function is introduced into the weight updating of the self-organizing map algorithm, so that the influence of sample data nonlinearity can be reduced, and the clustering effect of the SOM algorithm is improved.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flow chart of a method for intelligent detection of data based on an improved SOM algorithm according to one embodiment of the present invention;
FIG. 2 is a flow chart of a method of improved SOM algorithm data detection in accordance with one embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a data intelligent detection device based on an improved SOM algorithm according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present invention and should not be construed as limiting the invention.
The following describes a data intelligent detection method and device based on an improved SOM algorithm according to an embodiment of the present invention with reference to the accompanying drawings.
First, a data intelligent detection method based on an improved SOM algorithm according to an embodiment of the present invention will be described with reference to the accompanying drawings.
Fig. 1 is a flowchart of a method for intelligently detecting data based on an improved SOM algorithm according to an embodiment of the present invention.
As shown in fig. 1, the data intelligent detection method based on the improved SOM algorithm comprises the following steps:
Step S1: a sample set is obtained and decomposed by dimension, density-based one-dimensional isolated point detection is carried out dimension by dimension so as to preliminarily screen the multi-dimensional sample set for outliers, and the outliers are removed.
Further, in this preliminary per-dimension screening, the outliers are removed by removing the sample points whose Euclidean distance from other points exceeds the preset neighborhood.
In particular, the DBSCAN algorithm is one of the most widely used density-based clustering algorithms. The basic idea of the algorithm is: for each object in the cluster, the number of objects contained in a given epsilon neighborhood must not be less than a given value (MinPts), that is, the density of its neighborhood must not be less than a certain threshold. The algorithm utilizes the high-density connectivity of the classes to divide the areas with high enough density into the classes, and can find clusters with arbitrary shapes in the noisy spatial database.
Referring to the DBSCAN algorithm, the embodiment of the invention primarily screens outliers through a density-based one-dimensional isolated point detection algorithm. The algorithm comprises the following steps:
(1) Input the sample set x, which has dimension M and sample size n. Set the two parameters of the algorithm: the ε-neighborhood radius ε and the threshold MinPts;
(2) Let the variable I = 1 denote the current dimension;
(3) Take the I-th dimension of x, denoted $x_I = [x_{I1}, x_{I2}, \ldots, x_{In}]$;
(4) Sort $x_{I1}, x_{I2}, \ldots, x_{In}$ in ascending order to obtain the new sequence $y_I = [y_{I1}, y_{I2}, \ldots, y_{In}]$;
(5) Let k = 1 and mark all data as "undetected";
(6) Calculate the distance $D_i = \|y_{Ik} - y_{Ii}\|$ between $y_{Ik}$ and each $y_{Ii}$, i = 1, 2, ..., n, and count the number N of samples satisfying $D_i \le \varepsilon$, i.e. the number of samples falling in the ε-neighborhood of $y_{Ik}$;
i) If N = 1, i.e. the ε-neighborhood of $y_{Ik}$ contains no sample point other than $y_{Ik}$ itself, mark $y_{Ik}$ as "detected" and mark the corresponding value in the sequence as "outlier";
ii) If 1 < N < MinPts + 1, the number of objects in the ε-neighborhood of $y_{Ik}$ is below the threshold and the clustering requirement is not met; mark $y_{Ik}$ as "detected" and mark the corresponding value in the sequence as "outlier". Note that a boundary point may be misjudged in this case, but the misjudgment can be corrected by the data points that follow it;
iii) If N ≥ MinPts + 1, the objects in the ε-neighborhood of $y_{Ik}$ satisfy the threshold condition, so $y_{Ik}$ and the samples in its ε-neighborhood belong to the same cluster and none of them is an outlier; mark $y_{Ik}$ and the sample points in its ε-neighborhood as "detected" and mark the corresponding values in the sequence as "normal points";
(7) Let k be the index of the smallest value still marked "undetected" and repeat step (6) until all values are marked "detected";
(8) Let I = I + 1 and repeat steps (3) to (7) until I > M.
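Steps (1) to (8) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `one_dim_outliers` and `screen_outliers`, the NumPy layout (rows are samples, columns are dimensions), the use of one ε for every dimension, and the choice to drop a sample when it is flagged in any dimension are all assumptions made for the example.

```python
import numpy as np


def one_dim_outliers(values, eps, min_pts):
    """Return a boolean mask (True = outlier) for one dimension of the data."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")   # step (4): ascending sort
    y = values[order]
    n = len(y)
    detected = np.zeros(n, dtype=bool)          # step (5): all "undetected"
    outlier = np.zeros(n, dtype=bool)
    for k in range(n):                          # step (7): next undetected value
        if detected[k]:
            continue
        # Step (6): y is sorted, so the eps-neighborhood is a contiguous slice.
        lo = np.searchsorted(y, y[k] - eps, side="left")
        hi = np.searchsorted(y, y[k] + eps, side="right")
        count = hi - lo                         # N, including y[k] itself
        if count >= min_pts + 1:
            # Case iii): the point and its neighbors are normal. This also
            # overrides an earlier "outlier" label on a boundary point.
            detected[lo:hi] = True
            outlier[lo:hi] = False
        else:
            # Cases i) and ii): too few neighbors, mark as outlier.
            detected[k] = True
            outlier[k] = True
    mask = np.empty(n, dtype=bool)
    mask[order] = outlier                       # back to the original order
    return mask


def screen_outliers(x, eps, min_pts):
    """Apply the 1-D test to every column; drop rows flagged in any column."""
    x = np.asarray(x, dtype=float)
    bad = np.zeros(x.shape[0], dtype=bool)
    for dim in range(x.shape[1]):               # step (8): loop over dimensions
        bad |= one_dim_outliers(x[:, dim], eps, min_pts)
    return x[~bad], bad
```

Because the values are sorted first, each neighborhood count costs two binary searches instead of a scan over all n samples.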
In summary, step S1 uses a density-based one-dimensional isolated point detection method to preliminarily screen the multi-dimensional data for outliers dimension by dimension, eliminating obvious outliers and reducing the effect of data noise on the clustering performed by the second-stage self-organizing feature mapping (SOM) algorithm.
Step S2: the sample set is clustered based on a self-organizing feature mapping algorithm, and abnormal data points are removed.
The Self-Organizing Map (SOM) learning algorithm is an unsupervised competitive learning algorithm. The SOM network consists of an input layer and an output layer. The input layer is made up of N neurons that receive an external N-dimensional input vector. The output layer (competitive layer) consists of M neurons, typically arranged in a one-dimensional array or a two-dimensional planar grid, onto which the nodes of the input layer are mapped. All nodes of the input layer are connected to all nodes of the competitive layer by weights $w_{ij}$ (i = 1, 2, …, N; j = 1, 2, …, M), and the connection weights are dynamically updated during network training.
For each input vector, competition is generated between neurons by comparison between the input vector value and the weight value, and the neuron with the weight vector closest to the input pattern is considered to be most responsive to the input pattern, and is designated as the winning neuron. The winning neuron not only strengthens itself, but also brings surrounding adjacent neurons to be strengthened, while suppressing surrounding farther neurons.
For L N-dimensional input vectors $x_k = (x_{1k}, x_{2k}, \ldots, x_{Nk})^T$, k = 1, 2, …, L, the specific steps of the algorithm are as follows:
(1) Determine the SOM network topology: the input layer has N neurons and the output layer has M neurons.
(2) Set t = 0 and initialize the weight vectors $w_j(0)$ (j = 1, 2, …, M) with random values. The only restriction is that the $w_j(0)$ (j = 1, 2, …, M) differ from one another. It is generally desirable to keep the weights small. An alternative initialization is to select the weight vectors at random from the available set of input vectors.
(3) Present an input vector $x_k(t) = (x_{1k}, x_{2k}, x_{3k}, \ldots, x_{Nk})^T$, 1 ≤ k ≤ L, to the network. To eliminate the influence of differing units of measure, the input data should be normalized first.
(4) Calculate the distance between the current input vector and each competitive-layer neuron, and select the neuron with the smallest distance as the winning neuron:
$q = \arg\min_{j} \|x_k(t) - w_j(t)\|, \quad j = 1, 2, \ldots, M$
(5) The weight vector of the winning neuron and the neurons in the neighborhood range is adjusted as follows:
$w_j(t+1) = \begin{cases} w_j(t) + \eta(t)\,[x_k(t) - w_j(t)], & j \in N_q(t) \\ w_j(t), & j \notin N_q(t) \end{cases}$ (1)
η(t) is the learning rate parameter, with range 0 < η(t) < 1, and decreases with time. $N_q(t)$ is the neighborhood radius of the winning neuron q and also decreases with time. The direct result of update equation (1) is that the weight vector of the winning neuron q moves toward the input vector, and the neighboring neurons j within the neighborhood are moved as well.
(6) Judge whether all input vectors have been presented to the network; if so, go to the next step, otherwise return to step (3).
(7) Update the learning rate and the neighborhood radius.
[Equations (2) and (3): formula images BDA0002551062770000051 and BDA0002551062770000052, the update rules for η(t) and $N_q(t)$]
where η(0) is the initial learning rate and $N_q(0)$ is the initial neighborhood radius.
(8) Let t=t+1, judge whether the iteration number reaches the predetermined total iteration number T, if yes, the algorithm ends, otherwise go back to step (3).
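The training loop of steps (1) to (8) can be sketched as below. This is only an illustrative sketch: the function names `train_som` and `winners`, the min-max normalization, the rectangular output grid, and in particular the linear decay of η(t) and of the neighborhood radius are assumptions, since the decay formulas (2) and (3) are not reproduced in this text.

```python
import numpy as np


def train_som(x, rows, cols, total_iters, eta0=0.5, radius0=None, seed=0):
    """Train a rows x cols SOM; return the weights and the normalized inputs."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Step (3): normalize each dimension to [0, 1] (min-max, an assumption).
    span = x.max(axis=0) - x.min(axis=0)
    span[span == 0] = 1.0
    x = (x - x.min(axis=0)) / span
    # Step (1): M = rows * cols output neurons laid out on a 2-D grid.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Step (2): small random initial weights.
    w = rng.uniform(0.0, 0.1, size=(rows * cols, x.shape[1]))
    if radius0 is None:
        radius0 = max(rows, cols) / 2.0
    for t in range(total_iters):                  # step (8): T iterations
        decay = 1.0 - t / total_iters             # step (7): ASSUMED linear decay
        eta = eta0 * decay
        radius = radius0 * decay
        for xk in x:                              # steps (3) to (6)
            # Step (4): winner = neuron with the smallest Euclidean distance.
            q = np.argmin(np.linalg.norm(w - xk, axis=1))
            # Step (5): move the winner and its grid neighbors toward xk.
            in_hood = np.linalg.norm(grid - grid[q], axis=1) <= radius
            w[in_hood] += eta * (xk - w[in_hood])
    return w, x


def winners(w, x):
    """Index of the winning neuron for every (already normalized) sample."""
    d = np.linalg.norm(x[:, None, :] - w[None, :, :], axis=2)
    return d.argmin(axis=1)
```

After training, `winners` assigns each sample to a neuron, and the samples sharing a neuron form one cluster.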
In summary, step S2 adopts a self-organizing feature mapping algorithm to cluster the data, and further screens out abnormal data points.
Step S3: the self-organizing feature mapping algorithm is improved by a kernel function method, the sample set is clustered by the improved self-organizing feature mapping algorithm, and abnormal data points are removed.
Further, improving the self-organizing feature mapping algorithm by a kernel function includes changing the neuron winning rule and the weight adjustment formula of the self-organizing feature mapping algorithm.
Specifically, the winning rule of the neurons is changed by changing how the distance between the current input vector and the output-layer neurons is calculated in the self-organizing feature mapping algorithm, and a kernel function is then used to obtain the weight adjustment formula.
The self-organizing feature mapping (SOM) algorithm converges quickly and can converge to a small error. However, as can be seen from equation (1), the adjustment of the winning neuron q and its neighborhood in the SOM algorithm depends on the Euclidean distance $\|X - w_j\|$ between the input X and the weight $w_j$ of each neuron. Thus, when the boundaries of the input samples are not linearly separable and the class distributions are non-Gaussian or non-elliptical, the SOM classifier performs poorly. The kernel method offers a way to solve this problem.
The kernel method can effectively solve the problem of nonlinearity of the input sample. The essence of kernel-based learning is to transform the non-linear problem in the low-dimensional input space into a more easily solved linear problem in the high-dimensional (even infinite-dimensional) feature space by means of kernel-induced implicit mapping, and to characterize it in the form of an inner product.
The kernel method is introduced into the distance metric used to determine the winning neuron and into the weight update formula. Because kernels are flexible and diverse, SOM algorithms based on different distance metrics and weight update formulas can be derived.
Define a nonlinear mapping $\Phi: X \to \Phi(X) \in F$, where $X \in R$, R is the sample set, and F is the feature space. The Euclidean distance can be replaced by minimizing the objective function:
$J(w_j) = \|\Phi(X) - \Phi(w_j)\|^2$ (4)
where the norm in equation (4) can be written as:
$\|\Phi(X) - \Phi(w_j)\|^2 = \Phi(X)^T \Phi(X) + \Phi(w_j)^T \Phi(w_j) - 2\,\Phi(X)^T \Phi(w_j)$ (5)
Each term can be seen as an inner product in the feature space. By the definition of a kernel that satisfies the Mercer condition:
$K(x_i, x_j) = \Phi(x_i)^T \Phi(x_j)$ (6)
substituting equation (6) into (5) gives:
$J(w_j) = \|\Phi(X) - \Phi(w_j)\|^2 = K(X, X) + K(w_j, w_j) - 2K(X, w_j)$ (7)
To find the minimum of the function $J(w_j)$, the gradient descent method can be used, which yields a new adjustment formula for $w_j$:
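$w_j(t+1) = w_j(t) - \eta(t)\,\frac{\partial J(w_j)}{\partial w_j} = w_j(t) - \eta(t)\left[\frac{\partial K(w_j, w_j)}{\partial w_j} - 2\,\frac{\partial K(X, w_j)}{\partial w_j}\right]$ (8)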
according to the flexibility of kernel mapping, different kernel functions can induce different distance metrics, and different kernel functions determine different neuron winning rules and weight adjustment formulas. The following are 4 classical kernel functions that meet the Mercer condition:
Polynomial: $K(x, y) = (x^T y)^d,\ d \ge 2$ (9)
Radial basis: [formula image BDA0002551062770000062] (10)
Cauchy: [formula image BDA0002551062770000063] (11)
Logarithmic: $K(x, y) = \log\left(1 + \|x - y\|^2 / \sigma^2\right)$ (12)
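The kernel-induced distance of equation (7) can be checked numerically against the explicit feature-space distance of equation (4). The sketch below does this for the polynomial kernel (9) with d = 2, where an explicit mapping Φ is easy to write down; the function names `poly_kernel`, `kernel_distance_sq`, and `phi_degree2` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def poly_kernel(x, y, d=2):
    """Polynomial kernel of equation (9): K(x, y) = (x^T y)^d."""
    return float(np.dot(x, y)) ** d


def kernel_distance_sq(x, w, kernel):
    """Equation (7): J(w) = K(x, x) + K(w, w) - 2 K(x, w)."""
    return kernel(x, x) + kernel(w, w) - 2.0 * kernel(x, w)


def phi_degree2(v):
    """Explicit feature map for d = 2: all pairwise products v_i * v_j.

    Phi(x)^T Phi(y) = sum_ij x_i x_j y_i y_j = (x^T y)^2, so this mapping
    reproduces the degree-2 polynomial kernel.
    """
    return np.outer(v, v).ravel()


x = np.array([1.0, 2.0])
w = np.array([3.0, 1.0])
via_kernel = kernel_distance_sq(x, w, poly_kernel)
via_mapping = float(np.sum((phi_degree2(x) - phi_degree2(w)) ** 2))
print(via_kernel, via_mapping)  # both evaluate to 75.0
```

The two numbers agree, which is the point of the kernel method: the distance in the feature space F is obtained from kernel evaluations alone, without ever forming Φ.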
By substituting equations (9) to (12) into equation (8), a KSOM weight adjustment equation based on the above four kernel functions can be obtained:
$w_j(t+1) = w_j(t) - 2d\,\eta(t)\left[\left(w_j(t)^T w_j(t)\right)^{d-1} w_j(t) - \left(x^T w_j(t)\right)^{d-1} x\right]$ (13)
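For the polynomial kernel, the intermediate steps between equations (7), (9), and (13) can be written out as follows. This is a worked derivation under the assumption that equation (8) is the plain gradient descent step on $J(w_j)$; it is a sketch of the reasoning, not text from the patent.

```latex
\begin{aligned}
J(w_j) &= K(x, x) + K(w_j, w_j) - 2K(x, w_j)
        = (x^T x)^d + (w_j^T w_j)^d - 2\,(x^T w_j)^d \\
\frac{\partial J(w_j)}{\partial w_j}
       &= 2d\,(w_j^T w_j)^{d-1} w_j - 2d\,(x^T w_j)^{d-1} x \\
w_j(t+1) &= w_j(t) - \eta(t)\,\frac{\partial J(w_j)}{\partial w_j}
        = w_j(t) - 2d\,\eta(t)\left[(w_j(t)^T w_j(t))^{d-1} w_j(t) - (x^T w_j(t))^{d-1} x\right]
\end{aligned}
```

The term $(x^T x)^d$ does not depend on $w_j$ and therefore drops out of the gradient.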
[Equations (14) to (16): formula images BDA0002551062770000064 to BDA0002551062770000066, the weight adjustment formulas for the radial basis, Cauchy, and logarithmic kernels]
Under the new distance metric, the winning neuron q is redefined:
$q = \arg\min_{j} J(w_j) = \arg\min_{j}\left[K(X, X) + K(w_j, w_j) - 2K(X, w_j)\right]$ (17)
The improved SOM algorithm differs from the original only in the winning rule and the weight adjustment formula; the rest of the algorithm is unchanged. The specific flow is shown in FIG. 2.
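One update of the kernel-based variant can be sketched as below for the polynomial kernel (9). The sketch assumes the winner is the neuron that minimizes the kernel-induced distance of equation (7) and that the weights follow a gradient descent step on that distance; the names `poly_distance_sq` and `ksom_step` and the optional `hood` callback (which returns the indices of the neurons to update for a given winner) are hypothetical.

```python
import numpy as np


def poly_distance_sq(w, x, d):
    """Kernel-induced distance J(w_j) of equation (7) for every row w_j of w."""
    ww = np.sum(w * w, axis=1)          # w_j^T w_j
    xw = w @ x                          # x^T w_j
    return float(x @ x) ** d + ww ** d - 2.0 * xw ** d


def ksom_step(w, x, eta, d=2, hood=None):
    """Pick the winner under the kernel distance and take one gradient step."""
    q = int(np.argmin(poly_distance_sq(w, x, d)))
    idx = [q] if hood is None else hood(q)   # default: update the winner only
    wj = w[idx]
    ww = np.sum(wj * wj, axis=1, keepdims=True)
    xw = (wj @ x)[:, None]
    # Gradient of J for the polynomial kernel:
    # 2d (w^T w)^(d-1) w - 2d (x^T w)^(d-1) x
    grad = 2.0 * d * (ww ** (d - 1) * wj - xw ** (d - 1) * x)
    w = w.copy()
    w[idx] = wj - eta * grad
    return w, q


w0 = np.array([[1.0, 0.0], [0.0, 2.0]])
x0 = np.array([1.0, 0.5])
w1, q0 = ksom_step(w0, x0, eta=0.1)
# The winner's kernel distance to x0 shrinks after the step.
print(q0, poly_distance_sq(w0, x0, 2)[q0], poly_distance_sq(w1, x0, 2)[q0])
```

Everything else in the training loop (presentation of inputs, decay of the learning rate and neighborhood) stays as in the original SOM procedure.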
In conclusion, the method of introducing the kernel function improves the self-organizing feature mapping algorithm, improves the weight calculation part in the original algorithm, and can improve the data clustering effect. Furthermore, the SOM algorithm is improved by adopting a kernel function, so that rapid convergence can be realized, and abrupt change values in data can be detected and identified.
Step S4: abnormal data points in the sample set are removed according to expert experience, completing the intelligent detection of the data.
The abnormal classes are judged according to expert experience; the members of an abnormal class are abnormal data, and removing these abnormal data from the data completes the intelligent detection of the data.
According to the intelligent data detection method based on the improved SOM algorithm, the sample set is decomposed by dimension, density-based isolated point detection is carried out dimension by dimension, and sample points whose Euclidean distance from other points exceeds the set neighborhood are removed; abnormal data are detected by cluster analysis with an SOM algorithm improved by the kernel function method, in which the kernel method is introduced into the neural network weight adjustment formula to reduce the influence of sample data nonlinearity; and the abnormal classes are judged according to expert experience, the members of an abnormal class being abnormal data. Thus, the density-based detection and the improved self-organizing feature mapping algorithm remove abnormal data and improve data quality, and introducing a kernel function into the weight update of the self-organizing feature mapping algorithm reduces the influence of sample data nonlinearity and improves the clustering effect of the SOM algorithm.
Next, a data intelligent detection device based on an improved SOM algorithm according to an embodiment of the present invention will be described with reference to the accompanying drawings.
Fig. 3 is a schematic structural diagram of a data intelligent detection device based on an improved SOM algorithm according to an embodiment of the present invention.
As shown in fig. 3, the data intelligent detection device based on the improved SOM algorithm includes: the system comprises a first rejection module 100, a second rejection module 200, a third rejection module 300 and an intelligent detection module 400.
The first rejecting module 100 is configured to obtain a sample set, decompose the sample set by dimension, perform density-based one-dimensional isolated point detection dimension by dimension so as to preliminarily screen the multi-dimensional sample set for outliers, and reject the outliers.
The second culling module 200 is configured to cull abnormal data points by clustering the sample set based on a self-organizing feature mapping algorithm.
And the third eliminating module 300 is configured to improve the self-organizing feature mapping algorithm by a kernel function method, cluster the sample set by the improved self-organizing feature mapping algorithm, and eliminate abnormal data points.
The intelligent detection module 400 is configured to reject abnormal data points in the sample set according to expert experience, and complete intelligent detection of data.
Further, in one embodiment of the present invention, the outliers are culled, including: and eliminating sample points with Euclidean distance between two points larger than the preset neighborhood.
Further, in one embodiment of the present invention, the improvement of the self-organizing feature-based mapping algorithm by a kernel function method comprises: the winning rules and weight adjustment formulas based on neurons in the self-organizing feature mapping algorithm are changed.
Further, in one embodiment of the invention, different kernel functions determine different neuron winning rules and weight adjustment formulas.
It should be noted that the foregoing explanation of the embodiment of the data intelligent detection method based on the improved SOM algorithm is also applicable to the apparatus of this embodiment, and will not be repeated here.
According to the intelligent data detection device based on the improved SOM algorithm, the sample set is decomposed by dimension, density-based isolated point detection is carried out dimension by dimension, and sample points whose Euclidean distance from other points exceeds the set neighborhood are removed; abnormal data are detected by cluster analysis with an SOM algorithm improved by the kernel function method, in which the kernel method is introduced into the neural network weight adjustment formula to reduce the influence of sample data nonlinearity; and the abnormal classes are judged according to expert experience, the members of an abnormal class being abnormal data. Thus, the density-based detection and the improved self-organizing feature mapping algorithm remove abnormal data and improve data quality, and introducing a kernel function into the weight update of the self-organizing feature mapping algorithm reduces the influence of sample data nonlinearity and improves the clustering effect of the SOM algorithm.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present invention, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the invention, and that variations, modifications, alternatives and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.

Claims (4)

1. The intelligent data detection method based on the improved self-organizing feature mapping algorithm is characterized by comprising the following steps of:
acquiring a nonlinear sample set through a sensor arranged on equipment, decomposing the sample set according to dimensions, detecting one-dimensional isolated points based on density by dimension, primarily screening outliers according to dimensions by the multidimensional sample set, and eliminating the outliers, wherein the sample set comprises equipment operation data, and eliminating the outliers comprises eliminating sample points with Euclidean distance between two points larger than a preset neighborhood;
clustering the sample set after outlier removal by a self-organizing feature mapping algorithm, and removing a first abnormal data point;
changing a winning rule of neurons in the self-organizing feature mapping algorithm by changing a calculation method of distances between a current input vector and neurons of an output layer based on the self-organizing feature mapping algorithm, improving a weight adjustment formula in the self-organizing feature mapping algorithm by a kernel function method, clustering a sample set with first abnormal data points removed by the improved self-organizing feature mapping algorithm, and removing second abnormal data points, wherein the distances between the current input vector and neurons of the output layer based on the self-organizing feature mapping algorithm are calculated in the self-organizing feature mapping algorithm, the neurons with the smallest distances are winning neurons, and the expression of the winning neurons is as follows:
[Formula image FDA0004095195830000011: expression of the winning neuron; not reproduced in this text]
wherein x_k(t) is the input vector, k = 1, 2, …, L, and L is the number of input vectors; w_j(t) is the weight matrix, j = 1, 2, …, M, and M is the number of neurons in the output layer;
the weight adjustment formula is:
[Formula image FDA0004095195830000012: weight adjustment formula; not reproduced in this text]
wherein η(t) is the learning rate parameter, with 0 < η(t) < 1, and N_q(t) is the neighborhood radius of the winning neuron q;
and removing, according to expert experience, the abnormal data points remaining in the sample set from which the second abnormal data points have been removed, so as to complete the intelligent detection of the data.
2. The intelligent data detection method based on the improved self-organizing feature mapping algorithm according to claim 1, wherein different kernel functions determine different neuron winning rules and weight adjustment formulas.
3. An intelligent data detection device based on an improved self-organizing feature mapping algorithm, characterized by comprising:
a first rejecting module, configured to acquire a nonlinear sample set through a sensor arranged on equipment, decompose the sample set by dimension, perform density-based one-dimensional isolated-point detection dimension by dimension so that outliers of the multidimensional sample set are preliminarily screened by dimension, and reject the outliers, wherein the sample set comprises equipment operation data, and rejecting the outliers comprises rejecting sample points for which the Euclidean distance between two points is greater than a preset neighborhood;
a second rejecting module, configured to cluster the sample set after outlier rejection by the self-organizing feature mapping algorithm and reject first abnormal data points;
a third rejecting module, configured to change the winning rule of neurons in the self-organizing feature mapping algorithm by changing the method of calculating the distance between the current input vector and the neurons of the output layer of the self-organizing feature mapping algorithm, improve the weight adjustment formula in the self-organizing feature mapping algorithm by a kernel function method, cluster the sample set from which the first abnormal data points have been rejected by the improved self-organizing feature mapping algorithm, and reject second abnormal data points, wherein, in the self-organizing feature mapping algorithm, the distance between the current input vector and each neuron of the output layer is calculated, the neuron with the smallest distance is the winning neuron, and the expression of the winning neuron is as follows:
[Formula image FDA0004095195830000021: expression of the winning neuron; not reproduced in this text]
wherein x_k(t) is the input vector, k = 1, 2, …, L, and L is the number of input vectors; w_j(t) is the weight matrix, j = 1, 2, …, M, and M is the number of neurons in the output layer;
the weight adjustment formula is:
[Formula image FDA0004095195830000022: weight adjustment formula; not reproduced in this text]
wherein η(t) is the learning rate parameter, with 0 < η(t) < 1, and N_q(t) is the neighborhood radius of the winning neuron q;
and an intelligent detection module, configured to reject, according to expert experience, the abnormal data points remaining in the sample set from which the second abnormal data points have been rejected, so as to complete the intelligent detection of the data.
4. The intelligent data detection device based on the improved self-organizing feature mapping algorithm according to claim 3, wherein different kernel functions determine different neuron winning rules and weight adjustment formulas.
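
For orientation only, the steps recited in the claims can be sketched in code. The formula images are not reproduced in this text, so the sketch assumes the textbook SOM forms: the winner is q = argmin_j ||x_k(t) - w_j(t)||, and each neuron within N_q(t) of q is updated as w_j(t+1) = w_j(t) + η(t)[x_k(t) - w_j(t)]. The Gaussian kernel-induced distance, the one-dimensional output-layer layout, the `min_neighbors` rule, and all function names are illustrative assumptions and are not taken from the patent. The kernel variant shown here only changes winner selection; a kernel-specific weight update is not shown.

```python
import math

def euclidean(x, w):
    """Euclidean distance between input vector x and weight vector w."""
    return math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w)))

def gaussian_kernel_distance(x, w, sigma=1.0):
    """Kernel-induced distance for a Gaussian kernel (assumed kernel choice):
    d^2 = K(x,x) - 2K(x,w) + K(w,w) = 2 - 2*exp(-||x-w||^2 / (2*sigma^2))."""
    sq = sum((xi - wi) ** 2 for xi, wi in zip(x, w))
    return math.sqrt(2.0 - 2.0 * math.exp(-sq / (2.0 * sigma ** 2)))

def winner(x, weights, distance=euclidean):
    """Index q of the output neuron whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda j: distance(x, weights[j]))

def update(x, weights, q, eta, radius):
    """Textbook SOM step on a 1-D output layer (assumed layout): neurons whose
    index lies within `radius` of q move toward x by a fraction eta."""
    for j, w in enumerate(weights):
        if abs(j - q) <= radius:
            weights[j] = [wi + eta * (xi - wi) for xi, wi in zip(x, w)]
    return weights

def isolated_1d(values, eps, min_neighbors=1):
    """Indices of values with fewer than min_neighbors other values within eps
    (a simple density rule; eps stands in for the preset neighborhood)."""
    return [
        i for i, v in enumerate(values)
        if sum(1 for j, u in enumerate(values) if j != i and abs(u - v) <= eps)
        < min_neighbors
    ]

def screen_by_dimension(samples, eps, min_neighbors=1):
    """Drop every sample that is isolated in at least one dimension."""
    isolated = set()
    for d in range(len(samples[0])):
        column = [s[d] for s in samples]
        isolated.update(isolated_1d(column, eps, min_neighbors))
    return [s for i, s in enumerate(samples) if i not in isolated]
```

A typical use would screen the raw samples with `screen_by_dimension`, train once with `winner(x, weights)`, and train again with `winner(x, weights, distance=gaussian_kernel_distance)`, removing samples judged abnormal after each pass. How abnormal points are identified from the resulting clusters is not specified in the claims and is left out of the sketch.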
CN202010575124.8A 2020-06-22 2020-06-22 Data intelligent detection method and device based on improved SOM algorithm Active CN111767273B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010575124.8A CN111767273B (en) 2020-06-22 2020-06-22 Data intelligent detection method and device based on improved SOM algorithm

Publications (2)

Publication Number Publication Date
CN111767273A (en) 2020-10-13
CN111767273B (en) 2023-05-23

Family

ID=72721407

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010575124.8A Active CN111767273B (en) 2020-06-22 2020-06-22 Data intelligent detection method and device based on improved SOM algorithm

Country Status (1)

Country Link
CN (1) CN111767273B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114527249B (en) * 2022-01-17 2024-03-19 南方海洋科学与工程广东省实验室(广州) Quality control method and system for water quality monitoring data

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102645580A (en) * 2012-03-16 2012-08-22 清华大学 Intelligent detection method for forward direction active energy incremental data of ammeter
CN107657266A (en) * 2017-08-03 2018-02-02 华北电力大学(保定) A kind of load curve clustering method based on improvement spectrum multiple manifold cluster
CN110674940A (en) * 2019-09-18 2020-01-10 上海擎创信息技术有限公司 Multi-index anomaly detection method based on neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20120070299A (en) * 2010-12-21 2012-06-29 한국전자통신연구원 Apparatus and method for generating adaptive security model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
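Adaptive Multiple Kernel Self-organizing Maps for Hyperspectral Image Classification; Noha S. Khattab et al.; Proceedings of the 8th International Conference on Computer Modeling and Simulation; 2017-01-23; 119-124 *
Kernel function-based SOM and its application to gear fault cluster recognition (基于核函数的SOM及在齿轮故障聚类识别中的应用); Liao Guanglan (廖广兰) et al.; Journal of Hubei Polytechnic University (《湖北工学院学报》); 2002-12-31; Vol. 17, No. 04; 11-14 *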

Also Published As

Publication number Publication date
CN111767273A (en) 2020-10-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant