CN111767273A - Data intelligent detection method and device based on improved SOM algorithm - Google Patents

Info

Publication number
CN111767273A
Authority
CN
China
Prior art keywords
algorithm
sample set
self
data
improved
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010575124.8A
Other languages
Chinese (zh)
Other versions
CN111767273B (en
Inventor
胡伟
郭秋婷
黄建平
陈浩
盛银波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
State Grid Corp of China SGCC
Jiaxing Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Original Assignee
Tsinghua University
State Grid Corp of China SGCC
Jiaxing Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University, State Grid Corp of China SGCC, Jiaxing Power Supply Co of State Grid Zhejiang Electric Power Co Ltd filed Critical Tsinghua University
Priority to CN202010575124.8A priority Critical patent/CN111767273B/en
Publication of CN111767273A publication Critical patent/CN111767273A/en
Application granted granted Critical
Publication of CN111767273B publication Critical patent/CN111767273B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • G06F16/285Clustering or classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an intelligent data detection method and device based on an improved SOM algorithm. The method comprises the following steps: acquiring a sample set, decomposing the sample set by dimension, performing density-based one-dimensional isolated point detection dimension by dimension, preliminarily screening the outliers of the multi-dimensional sample set by dimension, and removing the outliers; clustering the sample set with a self-organizing feature mapping algorithm and removing abnormal data points; improving the self-organizing feature mapping algorithm by a kernel function method, clustering the sample set with the improved algorithm, and removing abnormal data points; and rejecting abnormal data points in the sample set according to expert experience to complete the intelligent detection of the data. With this method, density-based one-dimensional isolated point detection eliminates abnormal data and improves data quality, and introducing the kernel function into the weight update of the self-organizing mapping algorithm reduces the influence of sample data nonlinearity and improves the clustering effect of the SOM algorithm.

Description

Data intelligent detection method and device based on improved SOM algorithm
Technical Field
The invention relates to the technical field of big data intelligent detection, in particular to a data intelligent detection method and device based on an improved SOM algorithm.
Background
The equipment is fitted with various data acquisition devices, such as sensors, so that its operating data can be acquired and its operating state monitored. Owing to factors such as system complexity and harsh environments, the acquired data are strongly nonlinear, noisy, and highly unstable. Abnormal data detection is therefore one of the important steps in data preprocessing.
Abnormal data are also called outliers. They may be erroneous data caused by equipment failure, manual measurement error, or the like, or they may correspond to a meaningful real-world event. Erroneous data can adversely affect system operation. If erroneous data are not discovered and removed in time, equipment may be damaged and hidden dangers are introduced into system operation.
Interruption, loss, and deviation of data acquisition are common in existing automated service systems. The basic data detection workload is heavy, the self-detection capability of current systems is insufficient, and detection relies mainly on manual work. Manual error detection consumes a great deal of labor and time, and its accuracy cannot be guaranteed.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, one purpose of the invention is to provide a data intelligent detection method based on an improved SOM algorithm, which can eliminate abnormal data by using density-based one-dimensional isolated point detection, improve the data quality, reduce the nonlinear influence of sample data by introducing a kernel function into weight updating of a self-organizing mapping algorithm, and improve the clustering effect of the SOM algorithm.
Another objective of the present invention is to provide an intelligent data detection device based on an improved SOM algorithm.
In order to achieve the above object, an embodiment of the invention provides an intelligent data detection method based on an improved SOM algorithm, including:
acquiring a sample set, decomposing the sample set by dimension, performing density-based one-dimensional isolated point detection dimension by dimension, preliminarily screening the outliers of the multi-dimensional sample set by dimension, and removing the outliers;
clustering the sample set based on a self-organizing feature mapping algorithm, and removing abnormal data points;
improving the self-organizing feature mapping algorithm by a kernel function method, clustering the sample set with the improved self-organizing feature mapping algorithm, and removing abnormal data points;
rejecting abnormal data points in the sample set according to expert experience to complete the intelligent detection of the data.
In order to achieve the above object, another embodiment of the present invention provides an intelligent data detection device based on an improved SOM algorithm, including:
the first removing module is used for acquiring a sample set, decomposing the sample set according to dimensions, carrying out one-dimensional isolated point detection based on density on a dimension-by-dimension basis, preliminarily screening outliers of the multi-dimensional sample set according to the dimensions, and removing the outliers;
the second eliminating module is used for clustering the sample set based on the self-organizing feature mapping algorithm and eliminating abnormal data points;
the third eliminating module is used for improving the self-organization characteristic mapping algorithm through a kernel function method, clustering the sample set through the improved self-organization characteristic mapping algorithm and eliminating abnormal data points;
and the intelligent detection module is used for eliminating abnormal data points in the sample set according to expert experience to finish intelligent detection of the data.
The technical scheme of the invention has the following technical effects:
(1) An intelligent data detection method based on density and the improved self-organizing feature mapping algorithm is established, which can eliminate abnormal data and improve data quality.
(2) The kernel function is introduced into weight updating of the self-organizing mapping algorithm, so that the influence of sample data nonlinearity can be reduced, and the clustering effect of the SOM algorithm is improved.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart of a method for intelligently detecting data based on an improved SOM algorithm according to an embodiment of the invention;
FIG. 2 is a flow chart of an improved SOM algorithm data detection method according to one embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a data intelligent detection device based on an improved SOM algorithm according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The following describes a data intelligent detection method and device based on an improved SOM algorithm according to an embodiment of the present invention with reference to the accompanying drawings.
First, a data intelligent detection method based on an improved SOM algorithm according to an embodiment of the present invention will be described with reference to the accompanying drawings.
Fig. 1 is a flow chart of a data intelligent detection method based on an improved SOM algorithm according to an embodiment of the present invention.
As shown in FIG. 1, the intelligent data detection method based on the improved SOM algorithm comprises the following steps:
Step S1: obtain a sample set, decompose the sample set by dimension, perform density-based one-dimensional isolated point detection dimension by dimension, preliminarily screen the outliers of the multi-dimensional sample set by dimension, and remove the outliers.
Further, the outliers of the multi-dimensional sample set are preliminarily screened by dimension, and the outliers may be removed by removing the sample points for which the Euclidean distance between two points is larger than a preset neighborhood.
In particular, the DBSCAN algorithm is one of the most widely used density-based clustering algorithms. The basic idea of the algorithm is: for each object in a cluster, the number of objects contained in a given neighborhood must not be less than a given value (MinPts), i.e., the density of its neighborhood must not be less than a certain threshold. The algorithm uses the high density connectivity of classes, divides regions with sufficiently high density into classes, and can find clusters of arbitrary shape in noisy spatial databases.
By using the DBSCAN algorithm for reference, the embodiments of the present invention preliminarily screen outliers through a density-based one-dimensional outlier detection algorithm. The algorithm steps are as follows:
(1) Input the sample $x$, with dimension $M$ and sample size $n$. Set the two parameters of the algorithm: the neighborhood radius $\varepsilon$ and the threshold MinPts;
(2) set the dimension index $I = 1$;
(3) take the $I$-th dimension of $x$, denoted $x_I = [x_{I1}, x_{I2}, \ldots, x_{In}]$;
(4) arrange $x_{I1}, x_{I2}, \ldots, x_{In}$ in ascending order to obtain a new sequence $y_I = [y_{I1}, y_{I2}, \ldots, y_{In}]$;
(5) let $k = 1$ and mark all data as "not detected";
(6) calculate the Euclidean distance $D_i = \|y_{Ik} - y_{Ii}\|$ between $y_{Ik}$ and $y_{Ii}$, $i = 1, 2, \ldots, n$, and obtain the number $N$ of samples satisfying $D_i \le \varepsilon$, i.e. the number of samples falling within the $\varepsilon$-neighborhood of $y_{Ik}$;
i) if $N = 1$, i.e. the $\varepsilon$-neighborhood of $y_{Ik}$ contains no sample point other than $y_{Ik}$ itself, mark $y_{Ik}$ as "detected" and mark its corresponding value in the original sequence as an "outlier";
ii) if $1 < N < \mathrm{MinPts} + 1$, the density of the $\varepsilon$-neighborhood of $y_{Ik}$ is below the threshold and does not meet the requirement for merging into a cluster; mark $y_{Ik}$ as "detected" and mark its corresponding value in the original sequence as an "outlier". It should be noted that a boundary point may be misjudged in this case, but the misjudgment can be corrected when subsequent data points are processed;
iii) if $N \ge \mathrm{MinPts} + 1$, the density of the $\varepsilon$-neighborhood of $y_{Ik}$ satisfies the threshold condition, and $y_{Ik}$ and the samples in its $\varepsilon$-neighborhood belong to the same cluster, so neither $y_{Ik}$ nor the sample points in its $\varepsilon$-neighborhood are outliers; mark $y_{Ik}$ and the sample points in its $\varepsilon$-neighborhood as "detected" and mark their corresponding values in the original sequence as "normal points";
(7) let $k$ be the index of the smallest value still marked "not detected" and repeat step (6) until all values are marked "detected";
(8) let $I = I + 1$ and repeat steps (3)-(7) until $I > M$.
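The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `detect_outliers_1d` and `remove_outlier_rows` are hypothetical, the same `eps` and `min_pts` are used for every dimension, and a row is dropped when it is flagged in any single dimension (the text does not spell out how per-dimension flags are combined).

```python
def detect_outliers_1d(values, eps, min_pts):
    """Flag outliers in one dimension; returns one bool per input value (True = outlier)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # step (4): ascending order
    y = [values[i] for i in order]
    detected = [False] * n                             # step (5): all "not detected"
    is_outlier = [False] * n
    k = 0
    while k < n:
        # step (6): samples inside the eps-neighborhood of y[k], y[k] itself included
        neighbours = [i for i in range(n) if abs(y[k] - y[i]) <= eps]
        if len(neighbours) >= min_pts + 1:
            # case iii: dense neighborhood, every point in it is a normal point
            # (this also corrects boundary points flagged earlier under case ii)
            for i in neighbours:
                detected[i] = True
                is_outlier[i] = False
        else:
            # cases i and ii: isolated or too sparse, flag y[k] only
            detected[k] = True
            is_outlier[k] = True
        # step (7): continue with the smallest value still "not detected"
        k = next((i for i in range(n) if not detected[i]), n)
    # map the flags back to the original (unsorted) positions
    flags = [False] * n
    for pos, original_index in enumerate(order):
        flags[original_index] = is_outlier[pos]
    return flags


def remove_outlier_rows(samples, eps, min_pts):
    """Steps (2), (3), (8): loop over dimensions; drop rows flagged in any dimension (assumed rule)."""
    keep = [True] * len(samples)
    for dim in range(len(samples[0])):
        column = [row[dim] for row in samples]
        for idx, flag in enumerate(detect_outliers_1d(column, eps, min_pts)):
            if flag:
                keep[idx] = False
    return [row for row, ok in zip(samples, keep) if ok]
```

For example, `detect_outliers_1d([0.0, 0.3, 0.5, 0.6, 0.7], eps=0.25, min_pts=2)` first flags 0.3 under case ii and then clears it again when 0.5 is processed under case iii, which is the boundary-point correction mentioned in step (6) ii).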
In summary, step S1 adopts a density-based one-dimensional outlier detection method: outliers in the multi-dimensional data are preliminarily screened dimension by dimension and the obvious outliers are removed, which reduces the influence of data noise on the clustering performed by the self-organizing feature mapping (SOM) algorithm in the second stage.
Step S2: cluster the sample set based on the self-organizing feature mapping algorithm, and remove abnormal data points.
The self-organizing map (SOM) learning algorithm is an unsupervised competitive learning algorithm. The SOM network consists of an input layer and an output layer. The input layer is composed of $N$ neurons that receive an external $N$-dimensional input vector. The output layer (competition layer) is composed of $M$ neurons, usually arranged as a one-dimensional array or a two-dimensional planar grid, onto which the nodes of the input layer are mapped. Every node of the input layer is connected to every node of the competition layer by a weight $w_{ij}$ ($i = 1, 2, \ldots, N$; $j = 1, 2, \ldots, M$), and the connection weights are updated dynamically during network training.
For each input vector, competition is generated between neurons by comparison between the input vector value and the weight value, and the neuron whose weight vector is closest to the input pattern is considered to react most strongly to the input pattern, designated as the winning neuron. The winning neuron not only strengthens itself, but also drives the neighboring neurons around to be strengthened, and simultaneously inhibits the distant neurons around.
For $L$ $N$-dimensional input vectors $x_k = (x_{1k}, x_{2k}, \ldots, x_{Nk})^T$, $k = 1, 2, \ldots, L$, the specific steps of the algorithm are as follows:
(1) Determine the topology of the SOM network: the number of neurons in the input layer is $N$ and the number of neurons in the output layer is $M$.
(2) Set $t = 0$ and initialize the weights $w_j(0)$ ($j = 1, 2, \ldots, M$) with random values. The only restriction is that the $w_j(0)$ are different from each other. It is generally desirable to keep the weights small. Another initialization method is to select the weight vectors randomly from the available set of input vectors.
(3) Present an input vector $x_k(t) = (x_{1k}, x_{2k}, x_{3k}, \ldots, x_{Nk})^T$, $1 \le k \le L$, to the network. To eliminate the influence of differing units and scales, the input data should be normalized first.
(4) Calculate the distance between the current input vector and each neuron of the competition layer, and select the neuron with the minimum distance as the winning neuron:
Figure BDA0002551062770000041
(5) The weight vector of the winning neuron and the neuron in the neighborhood range is adjusted as follows:
Figure BDA0002551062770000042
Here $\eta(t)$ is the learning rate parameter, with $0 < \eta(t) < 1$, which decreases with time. $N_q(t)$ is the neighborhood radius of the winning neuron $q$, which also decreases with time. The direct result of the update in equation (1) is that the weight vector of the winning neuron $q$ moves towards the input vector, and the weight vectors of the neighboring neurons $j$ within the neighborhood range move with it.
(6) Judge whether all input vectors have been presented to the network; if so, go to the next step; otherwise, return to step (3).
(7) Update the learning rate and the neighborhood radius:
Figure BDA0002551062770000051
Figure BDA0002551062770000052
where $\eta(0)$ is the initial learning rate and $N_q(0)$ is the initial neighborhood radius.
(8) Set $t = t + 1$ and judge whether the number of iterations has reached the preset total number of iterations $T$; if so, the algorithm ends; otherwise, return to step (3).
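The training loop of steps (1) to (8) can be sketched as follows. Because the published equations for the winning neuron, the weight update (1), and the decay schedules (2) and (3) are images that are not reproduced in the text, this sketch substitutes the textbook Kohonen forms as assumptions: the winner minimizes the Euclidean distance, neurons within the neighborhood radius are updated by $w_j \leftarrow w_j + \eta(t)(x_k - w_j)$, and both $\eta(t)$ and the radius decay linearly to zero. The one-dimensional output grid and the names `train_som` and `winning_neuron` are also illustrative choices.

```python
import random


def squared_distance(x, w):
    return sum((a - b) ** 2 for a, b in zip(x, w))


def winning_neuron(weights, x):
    """Step (4): index of the neuron whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda j: squared_distance(x, weights[j]))


def train_som(data, n_neurons, n_iter, eta0=0.5, radius0=None, seed=0):
    """Train a 1-D SOM on already normalized data (list of equal-length lists)."""
    rng = random.Random(seed)
    dim = len(data[0])
    # step (2): small random initial weights
    weights = [[rng.uniform(0.0, 0.1) for _ in range(dim)] for _ in range(n_neurons)]
    if radius0 is None:
        radius0 = n_neurons / 2
    for t in range(n_iter):                    # step (8): t = 0, 1, ..., T - 1
        # step (7): assumed linear decay of learning rate and neighborhood radius
        decay = 1.0 - t / n_iter
        eta, radius = eta0 * decay, radius0 * decay
        for x in data:                         # steps (3) and (6): present every input
            q = winning_neuron(weights, x)
            for j, w in enumerate(weights):    # step (5): update winner and neighbors
                if abs(j - q) <= radius:
                    for i in range(dim):
                        w[i] += eta * (x[i] - w[i])
    return weights
```

After training, `winning_neuron(weights, x)` gives the cluster label of a sample `x`; sparsely populated or isolated clusters are the natural candidates for abnormal data.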
In summary, step S2 uses a self-organizing feature mapping algorithm to cluster the data, and further screens out abnormal data points.
Step S3: improve the self-organizing feature mapping algorithm by a kernel function method, cluster the sample set with the improved self-organizing feature mapping algorithm, and remove abnormal data points.
Further, improving the self-organizing feature mapping algorithm through a kernel function comprises changing the winning rule of the neurons and the weight adjustment formula in the self-organizing feature mapping algorithm.
Specifically, the winning rule of the neurons is changed by changing how the distance between the current input vector and the neurons of the output layer is calculated in the self-organizing feature mapping algorithm, and the weight adjustment formula is then derived using the kernel function.
The SOM algorithm converges quickly and can converge to a small error. However, as equation (1) shows, in the SOM algorithm the adjustment of the winning neuron $q$ and its neighborhood depends on the Euclidean distance $\|X - w_j\|$ between the input $X$ and the weight $w_j$ of each neuron. The SOM classifier therefore classifies poorly when the class boundaries of the input samples are not linearly separable and the class distributions are non-Gaussian or non-elliptical. The kernel method offers a way to solve this problem.
The kernel method can effectively solve the problem of nonlinearity of input samples. The essence of kernel-based learning is to transform the non-linear problem in the low-dimensional input space to a more easily solved linear problem in the high-dimensional (even infinite-dimensional) feature space by kernel-induced implicit mapping and to characterize it in the form of an inner product.
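As a small illustration of the implicit mapping, the degree-2 polynomial kernel on two-dimensional inputs has a known explicit feature map, so the kernel value can be checked against an ordinary inner product in the three-dimensional feature space. The helper names `poly2_kernel` and `phi` are chosen for this sketch only.

```python
import math


def poly2_kernel(x, y):
    """Degree-2 polynomial kernel K(x, y) = (x^T y)^2 for 2-D inputs."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def phi(x):
    """Explicit feature map for the kernel above: R^2 -> R^3."""
    return [x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2]


x, y = [1.0, 2.0], [3.0, -1.0]
inner = sum(a * b for a, b in zip(phi(x), phi(y)))
# Both values agree up to floating-point rounding: the kernel evaluates
# the feature-space inner product without ever forming phi explicitly.
print(poly2_kernel(x, y), inner)
```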
A kernel method is introduced into the distance metric used to determine the winning neuron and into the weight update formula. Because kernels are flexible and diverse, SOM algorithms based on different distance metrics and weight update formulas can be derived.
Define a nonlinear mapping $\Phi: X \rightarrow \Phi(X) \in F$, where $X \in R$, $R$ is the sample set and $F$ is the feature space. The Euclidean distance may be replaced by the following objective function:
$$J(w_j) = \|\Phi(X) - \Phi(w_j)\|^2 \tag{4}$$
whose minimum is sought. The norm in equation (4) can be written as:
$$\|\Phi(X) - \Phi(w_j)\|^2 = \Phi(X)^T \Phi(X) + \Phi(w_j)^T \Phi(w_j) - 2\,\Phi(X)^T \Phi(w_j) \tag{5}$$
Each term can be regarded as an inner product in the feature space. By the definition of a kernel function satisfying the Mercer condition:
$$K(x_i, x_j) = \Phi(x_i)^T \Phi(x_j) \tag{6}$$
substituting equation (6) into equation (5) gives:
$$J(w_j) = \|\Phi(X) - \Phi(w_j)\|^2 = K(X, X) + K(w_j, w_j) - 2K(X, w_j) \tag{7}$$
The minimum of the function $J(w_j)$ can be computed by gradient descent, which yields a new adjustment formula for $w_j$:
Figure BDA0002551062770000061
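Since equation (8) is published as an image, the following is only a worked sketch of what a plain gradient-descent step on equation (7) looks like; the published formula may differ, for example by restricting the update to neurons in the neighborhood of the winner. The term $K(X, X)$ does not depend on $w_j$, so

$$w_j(t+1) = w_j(t) - \eta(t)\,\frac{\partial J(w_j)}{\partial w_j}, \qquad \frac{\partial J(w_j)}{\partial w_j} = \frac{\partial K(w_j, w_j)}{\partial w_j} - 2\,\frac{\partial K(X, w_j)}{\partial w_j}.$$

As a concrete case, for the polynomial kernel $K(x, y) = (x^T y)^d$ the two derivatives are $\partial (w_j^T w_j)^d / \partial w_j = 2d\,(w_j^T w_j)^{d-1} w_j$ and $\partial (X^T w_j)^d / \partial w_j = d\,(X^T w_j)^{d-1} X$, which gives

$$\frac{\partial J(w_j)}{\partial w_j} = 2d\left[(w_j^T w_j)^{d-1} w_j - (X^T w_j)^{d-1} X\right].$$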
Owing to the flexibility of the kernel mapping, different kernel functions induce different distance metrics, and different kernel functions therefore determine different neuron winning rules and weight adjustment formulas. The following are four classical kernel functions that satisfy the Mercer condition:
Polynomial: $K(x, y) = (x^T y)^d$, $d \ge 2$ (9)
Radial basis, equation (10):
Figure BDA0002551062770000062
Cauchy, equation (11):
Figure BDA0002551062770000063
Logarithmic: $K(x, y) = \log\left(1 + \|x - y\|^2 / \sigma^2\right)$ (12)
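The four kernels and the kernel-induced distance of equation (7) can be sketched as below. The polynomial and logarithmic forms follow equations (9) and (12); the radial basis and Cauchy kernels are published as images, so the commonly used forms $\exp(-\|x-y\|^2 / (2\sigma^2))$ and $1 / (1 + \|x-y\|^2/\sigma^2)$ are assumed here and may differ from the patent in how $\sigma$ enters. All function names are illustrative.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def polynomial_kernel(x, y, d=2):
    return dot(x, y) ** d                                   # equation (9)


def rbf_kernel(x, y, sigma=1.0):
    return math.exp(-sq_dist(x, y) / (2.0 * sigma ** 2))   # assumed standard form


def cauchy_kernel(x, y, sigma=1.0):
    return 1.0 / (1.0 + sq_dist(x, y) / sigma ** 2)         # assumed standard form


def log_kernel(x, y, sigma=1.0):
    return math.log(1.0 + sq_dist(x, y) / sigma ** 2)       # equation (12)


def kernel_distance(kernel, x, w):
    """Equation (7): J(w) = K(x, x) + K(w, w) - 2 K(x, w)."""
    return kernel(x, x) + kernel(w, w) - 2.0 * kernel(x, w)
```

Swapping the `kernel` argument of `kernel_distance` is all it takes to change the distance metric, which is the flexibility the paragraph above refers to.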
Substituting expressions (9) to (12) into expression (8) yields the KSOM weight adjustment formulas based on the above four kernel functions:
$$w_j(t+1) = w_j(t) - 2d\,\eta(t)\left[\left(w_j(t)^T w_j(t)\right)^{d-1} w_j(t) - \left(x^T w_j(t)\right)^{d-1} x\right] \tag{13}$$
Figure BDA0002551062770000064
Figure BDA0002551062770000065
Figure BDA0002551062770000066
Under the new distance metric, the winning neuron $q$ is redefined as:
Figure BDA0002551062770000067
The improved SOM algorithm differs from the original only in the winning rule of the neurons and the weight adjustment formula; the rest of the algorithm is unchanged. The specific flow is shown in FIG. 2.
In conclusion, the self-organizing feature mapping algorithm is improved by introducing the kernel function method into the weight calculation part of the original algorithm, which can improve the data clustering effect. Furthermore, the SOM algorithm improved by the kernel function can converge rapidly and can detect and identify abrupt-change values in the data.
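One training step of the kernel-improved algorithm might look like the sketch below, shown for the polynomial kernel only. It assumes that the winner is the neuron minimizing the kernel-induced distance $J(w_j) = K(x, x) + K(w_j, w_j) - 2K(x, w_j)$ and that the update is a gradient step on $J$ with the factor $2d$ applied to both terms; the one-dimensional grid neighborhood and the name `ksom_step` are illustrative assumptions.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def poly_kernel(x, y, d=2):
    return dot(x, y) ** d


def kernel_distance(x, w, d=2):
    """J(w) = K(x, x) + K(w, w) - 2 K(x, w) for the polynomial kernel."""
    return poly_kernel(x, x, d) + poly_kernel(w, w, d) - 2.0 * poly_kernel(x, w, d)


def ksom_step(weights, x, eta, d=2, radius=0):
    """Pick the winner under the kernel distance, then update it and its grid neighbors in place."""
    q = min(range(len(weights)), key=lambda j: kernel_distance(x, weights[j], d))
    for j in range(len(weights)):
        if abs(j - q) <= radius:
            w = weights[j]
            ww = dot(w, w) ** (d - 1)
            xw = dot(x, w) ** (d - 1)
            # gradient step: w <- w - 2*d*eta * [(w.w)^(d-1) w - (x.w)^(d-1) x]
            weights[j] = [wi - 2.0 * d * eta * (ww * wi - xw * xi) for wi, xi in zip(w, x)]
    return q


weights = [[1.0, 0.0], [0.0, 1.0]]
x = [0.9, 0.1]
before = kernel_distance(x, weights[0])
q = ksom_step(weights, x, eta=0.05)
after = kernel_distance(x, weights[q])
```

In this small example neuron 0 wins, and its kernel distance to `x` is smaller after the step than before, while the other neuron is left untouched because `radius=0`.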
Step S4: remove abnormal data points in the sample set according to expert experience to complete the intelligent detection of the data.
The abnormal classes are identified according to expert experience; the members of an abnormal class are abnormal data, and these abnormal data are removed from the data to complete the intelligent detection of the data.
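In code, this last step reduces to filtering by cluster label once an expert has named the abnormal classes. The sketch below assumes each sample already carries the index of its winning neuron as a label; `cluster_sizes` and `remove_abnormal_classes` are hypothetical helpers, and the cluster-size summary is only one possible aid for the expert's judgment.

```python
from collections import Counter


def cluster_sizes(labels):
    """Number of samples per cluster, as a summary an expert could inspect."""
    return dict(Counter(labels))


def remove_abnormal_classes(samples, labels, abnormal_classes):
    """Drop every sample whose cluster label was judged abnormal by the expert."""
    abnormal = set(abnormal_classes)
    return [s for s, c in zip(samples, labels) if c not in abnormal]


samples = [[0.1], [0.2], [5.0], [0.15]]
labels = [0, 0, 3, 0]                # winning-neuron index of each sample
sizes = cluster_sizes(labels)        # cluster 3 holds a single sample
cleaned = remove_abnormal_classes(samples, labels, abnormal_classes=[3])
```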
According to the data intelligent detection method based on the improved SOM algorithm, provided by the embodiment of the invention, a sample set is decomposed according to dimensionality, isolated point detection based on density is carried out dimension by dimension, and sample points with Euclidean distance between two points larger than a set neighborhood are removed; abnormal data detection is realized by improving clustering analysis of an SOM algorithm based on a kernel function method, a kernel method is introduced to improve a neural network weight value adjustment formula, and the influence of sample data nonlinearity is reduced; and judging the abnormal class according to expert experience, wherein the members of the abnormal class are abnormal data. Therefore, abnormal data can be eliminated and the data quality is improved based on the density and the improved self-organizing feature mapping algorithm, the kernel function is introduced into the weight updating of the self-organizing feature mapping algorithm, the influence of sample data nonlinearity can be reduced, and the clustering effect of the SOM algorithm is improved.
Next, a data intelligent detection device based on an improved SOM algorithm according to an embodiment of the present invention will be described with reference to the accompanying drawings.
Fig. 3 is a schematic structural diagram of a data intelligent detection device based on an improved SOM algorithm according to an embodiment of the present invention.
As shown in FIG. 3, the intelligent data detection device based on the improved SOM algorithm comprises: a first eliminating module 100, a second eliminating module 200, a third eliminating module 300 and an intelligent detection module 400.
The first eliminating module 100 is configured to obtain a sample set, decompose the sample set by dimension, perform density-based one-dimensional outlier detection dimension by dimension, preliminarily screen the outliers of the multi-dimensional sample set by dimension, and eliminate the outliers.
The second eliminating module 200 is configured to cluster the sample set based on the self-organizing feature mapping algorithm and eliminate abnormal data points.
The third eliminating module 300 is configured to improve the self-organizing feature mapping algorithm by a kernel function method, cluster the sample set with the improved self-organizing feature mapping algorithm, and eliminate abnormal data points.
The intelligent detection module 400 is configured to eliminate abnormal data points in the sample set according to expert experience to complete the intelligent detection of the data.
Further, in one embodiment of the present invention, eliminating outliers comprises: eliminating sample points for which the Euclidean distance between two points is greater than a preset neighborhood.
Further, in an embodiment of the present invention, improving the self-organizing feature mapping algorithm by a kernel function method comprises: changing the winning rule of the neurons and the weight adjustment formula in the self-organizing feature mapping algorithm.
Further, in one embodiment of the invention, different kernel functions determine different neuron winning rules and weight adjustment formulas.
It should be noted that the foregoing explanation of the embodiment of the intelligent data detection method based on the improved SOM algorithm is also applicable to the apparatus of the embodiment, and is not repeated herein.
According to the data intelligent detection device based on the improved SOM algorithm, provided by the embodiment of the invention, a sample set is decomposed according to dimensionality, isolated point detection based on density is carried out dimension by dimension, and sample points with Euclidean distance between two points larger than a set neighborhood are removed; abnormal data detection is realized by improving clustering analysis of an SOM algorithm based on a kernel function method, a kernel method is introduced to improve a neural network weight value adjustment formula, and the influence of sample data nonlinearity is reduced; and judging the abnormal class according to expert experience, wherein the members of the abnormal class are abnormal data. Therefore, abnormal data can be eliminated and the data quality is improved based on the density and the improved self-organizing feature mapping algorithm, the kernel function is introduced into the weight updating of the self-organizing feature mapping algorithm, the influence of sample data nonlinearity can be reduced, and the clustering effect of the SOM algorithm is improved.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art can combine the various embodiments or examples described in this specification, and the features of different embodiments or examples, provided they do not contradict one another.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. An intelligent data detection method based on an improved SOM algorithm is characterized by comprising the following steps:
acquiring a sample set, decomposing the sample set by dimension, performing density-based one-dimensional isolated point detection dimension by dimension, preliminarily screening the outliers of the multi-dimensional sample set by dimension, and removing the outliers;
clustering the sample set based on a self-organizing feature mapping algorithm, and removing abnormal data points;
improving the self-organizing feature mapping algorithm by a kernel function method, clustering the sample set with the improved self-organizing feature mapping algorithm, and removing abnormal data points;
rejecting abnormal data points in the sample set according to expert experience to complete the intelligent detection of the data.
2. The intelligent data detection method based on the improved SOM algorithm according to claim 1, wherein removing the outliers comprises: eliminating sample points for which the Euclidean distance between two points is greater than a preset neighborhood.
3. The intelligent data detection method based on the improved SOM algorithm, characterized in that, in the self-organizing feature mapping algorithm, the distance between the current input vector and each neuron of the output layer is calculated, the neuron with the smallest distance is the winning neuron, and the winning neuron is expressed as follows:
Figure FDA0002551062760000011
wherein $x_k(t)$ is the input vector, $k = 1, 2, \ldots, L$, $L$ is the number of input vectors, $w_j(t)$ is the weight matrix, $j = 1, 2, \ldots, M$, and $M$ is the number of neurons in the output layer;
the weight value adjustment formula is as follows:
Figure FDA0002551062760000012
wherein $\eta(t)$ is the learning rate parameter, with $0 < \eta(t) < 1$, and $N_q(t)$ is the neighborhood radius of the winning neuron $q$.
4. The intelligent data detection method based on the improved SOM algorithm according to claim 1, wherein improving the self-organizing feature mapping algorithm by the kernel function method comprises: changing the winning rule and the weight adjustment formula of the neurons in the self-organizing feature mapping algorithm.
5. The intelligent data detection method based on the improved SOM algorithm according to claim 3 or 4, wherein the winning rule of the neurons is changed by changing the method of calculating the distance between the current input vector and the neurons of the output layer in the self-organizing feature mapping algorithm, and the weight adjustment formula is then obtained by using the kernel function.
6. The improved SOM algorithm-based intelligent data detection method of claim 5, wherein different kernel functions determine different neuron winning rules and weight adjustment formulas.
7. An intelligent data detection device based on an improved SOM algorithm is characterized by comprising:
the first eliminating module is used for acquiring a sample set, decomposing the sample set according to dimensions, carrying out density-based one-dimensional isolated point detection dimension by dimension, preliminarily screening out the outliers of the multi-dimensional sample set according to the dimensions, and removing the outliers;
the second eliminating module is used for clustering the sample set based on the self-organizing feature mapping algorithm and eliminating abnormal data points;
the third eliminating module is used for improving the self-organizing feature mapping algorithm through a kernel function method, clustering the sample set through the improved self-organizing feature mapping algorithm and eliminating abnormal data points;
and the intelligent detection module is used for eliminating abnormal data points in the sample set according to expert experience to finish intelligent detection of the data.
8. The improved SOM algorithm-based intelligent data detection device according to claim 7, wherein eliminating the outliers comprises: eliminating sample points for which the Euclidean distance between the two points is greater than a preset neighborhood.
9. The improved SOM algorithm-based intelligent data detection device according to claim 7, wherein improving the self-organizing feature mapping algorithm by the kernel function method comprises: changing the winning rule and the weight adjustment formula of the neurons in the self-organizing feature mapping algorithm.
10. The improved SOM algorithm-based intelligent data detection device as claimed in claim 7, wherein different kernel functions determine different neuron winning rules and weight adjustment formulas.
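For illustration only, and not as part of the claims, the winning rule and weight adjustment recited in claim 3, together with a kernel-based winning rule of the kind referred to in claims 4 to 6, might be sketched in Python as follows. The formula images are not reproduced in this text, so the sketch assumes the standard SOM form: the winner q minimizes the distance between the input vector and the neuron weights, and neurons within the neighborhood radius of q are moved toward the input by the learning rate η(t). The function names, the Gaussian kernel, its width `sigma`, and the one-dimensional chain of output neurons are all assumptions introduced here. The kernel-based weight adjustment formula of the patent is not reproduced.

```python
import math


def euclidean_distance(x, w):
    """Euclidean distance between input vector x and weight vector w."""
    return math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w)))


def gaussian_kernel(x, w, sigma=1.0):
    """Gaussian kernel K(x, w); sigma is an assumed width parameter."""
    squared = sum((xi - wi) ** 2 for xi, wi in zip(x, w))
    return math.exp(-squared / (2.0 * sigma ** 2))


def kernel_distance(x, w, sigma=1.0):
    """Distance in the kernel feature space.

    For a Gaussian kernel K(x, x) = K(w, w) = 1, so
    sqrt(K(x, x) - 2 K(x, w) + K(w, w)) reduces to sqrt(2 - 2 K(x, w)).
    """
    return math.sqrt(max(0.0, 2.0 - 2.0 * gaussian_kernel(x, w, sigma)))


def find_winner(x, weights, distance=euclidean_distance):
    """Return the index q of the output neuron closest to x."""
    return min(range(len(weights)), key=lambda j: distance(x, weights[j]))


def update_weights(x, weights, q, eta, radius):
    """Move the winner q and its neighbors toward x (standard SOM update).

    Neurons are assumed to lie on a 1-D chain, so neuron j is a neighbor
    of q when |j - q| <= radius. Other neurons keep their weights.
    """
    updated = []
    for j, w in enumerate(weights):
        if abs(j - q) <= radius:
            updated.append([wi + eta * (xi - wi) for xi, wi in zip(x, w)])
        else:
            updated.append(list(w))
    return updated


# Hypothetical example: three output neurons, one two-dimensional input.
weights = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
x = [0.9, 1.2]
q = find_winner(x, weights)                        # standard winning rule
q_kernel = find_winner(x, weights, kernel_distance)  # kernel-based rule
new_weights = update_weights(x, weights, q, eta=0.5, radius=0)
```

Passing a different `distance` function to `find_winner` changes the winning rule without touching the rest of the code, which mirrors the statement in claim 6 that different kernel functions determine different winning rules.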
CN202010575124.8A 2020-06-22 2020-06-22 Data intelligent detection method and device based on improved SOM algorithm Active CN111767273B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010575124.8A CN111767273B (en) 2020-06-22 2020-06-22 Data intelligent detection method and device based on improved SOM algorithm

Publications (2)

Publication Number Publication Date
CN111767273A (en) 2020-10-13
CN111767273B (en) 2023-05-23

Family

ID=72721407

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010575124.8A Active CN111767273B (en) 2020-06-22 2020-06-22 Data intelligent detection method and device based on improved SOM algorithm

Country Status (1)

Country Link
CN (1) CN111767273B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120159622A1 (en) * 2010-12-21 2012-06-21 Electronics And Telecommunications Research Institute Method and apparatus for generating adaptive security model
CN102645580A (en) * 2012-03-16 2012-08-22 清华大学 Intelligent detection method for forward direction active energy incremental data of ammeter
CN107657266A (en) * 2017-08-03 2018-02-02 华北电力大学(保定) A kind of load curve clustering method based on improvement spectrum multiple manifold cluster
CN110674940A (en) * 2019-09-18 2020-01-10 上海擎创信息技术有限公司 Multi-index anomaly detection method based on neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
NOHA S. KHATTAB等: "Adaptive Multiple Kernel Self-organizing Maps for Hyperspectral Image Classification", 《PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON COMPUTER MODELING AND SIMULATION》 *
廖广兰等: "基于核函数的SOM及在齿轮故障聚类识别中的应用", 《湖北工学院学报》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113537280A (en) * 2021-05-21 2021-10-22 北京中医药大学 Intelligent manufacturing industry big data analysis method based on feature selection
CN113537280B (en) * 2021-05-21 2024-07-26 北京中医药大学 Intelligent manufacturing industry big data analysis method based on feature selection
CN114527249A (en) * 2022-01-17 2022-05-24 南方海洋科学与工程广东省实验室(广州) Water quality monitoring data quality control method and system
CN114527249B (en) * 2022-01-17 2024-03-19 南方海洋科学与工程广东省实验室(广州) Quality control method and system for water quality monitoring data

Also Published As

Publication number Publication date
CN111767273B (en) 2023-05-23

Similar Documents

Publication Publication Date Title
Zhang et al. Adaptive kernel density-based anomaly detection for nonlinear systems
US10902563B2 (en) Moran's I for impulse noise detection and removal in color images
Sobolewski et al. Concept Drift Detection and Model Selection with Simulated Recurrence and Ensembles of Statistical Detectors.
KR20010073042A (en) Image classification using evolved parameters
CN111767273A (en) Data intelligent detection method and device based on improved SOM algorithm
Škrjanc et al. Inner matrix norms in evolving cauchy possibilistic clustering for classification and regression from data streams
CN111310139A (en) Behavior data identification method and device and storage medium
Yu et al. Dynamic background subtraction using histograms based on fuzzy c-means clustering and fuzzy nearness degree
Cirrincione et al. The on-line curvilinear component analysis (onCCA) for real-time data reduction
CN115485740A (en) Abnormal wafer image classification
Saitoh An ensemble model of self-organizing maps for imputation of missing values
Misra et al. Image segmentation using clustering with fireworks algorithm
Garcia-Magarinos et al. Lasso logistic regression, GSoft and the cyclic coordinate descent algorithm: application to gene expression data
López-Rubio et al. Selecting the color space for self-organizing map based foreground detection in video
CN112949524B (en) Engine fault detection method based on empirical mode decomposition and multi-core learning
Shenoy et al. Anamoly detection in wireless sensor networks
Kuok et al. Generative broad Bayesian (GBB) imputer for missing data imputation with uncertainty quantification
CN112418142A (en) Feature selection method based on graph signal processing
Randazzo et al. A new unsupervised neural approach to stationary and non-stationary data
de Zarzà et al. UMAP for Geospatial Data Visualization
Louhi et al. Incremental nearest neighborhood graph for data stream clustering
Lulio et al. Jseg algorithm and statistical ann image segmentation techniques for natural scenes
Hermann et al. Incremental one-class learning using regularized null-space training for industrial defect detection
CN113610121B (en) Cross-domain task deep learning identification method
Honda et al. A novel approach to noise clustering in multivariate fuzzy c-means

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant