CN115017125A - Data processing method and device for improving KNN method - Google Patents

Data processing method and device for improving KNN method

Info

Publication number
CN115017125A
CN115017125A (application CN202210946851.XA; granted publication CN115017125B)
Authority
CN
China
Prior art keywords: data, data information, information, representing, formula
Legal status: Granted
Application number: CN202210946851.XA
Other languages: Chinese (zh)
Other versions: CN115017125B (en)
Inventor: 李国权 (Li Guoquan)
Current Assignee: Chenda Guangzhou Network Technology Co ltd
Original Assignee: Chenda Guangzhou Network Technology Co ltd
Application filed by Chenda Guangzhou Network Technology Co ltd
Priority to CN202210946851.XA
Publication of CN115017125A
Application granted
Publication of CN115017125B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/10: File systems; File servers
    • G06F 16/17: Details of further file system functions
    • G06F 16/176: Support for shared access to files; File sharing support
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/35: Clustering; Classification
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a data processing method and device for improving the KNN method, relating to the technical field of data processing and addressing the technical problem of data processing. The technical scheme comprises the following steps: step one, acquiring data information from database information, and performing dimensionality reduction on the acquired data information to obtain low-dimensional data information; step two, processing the dimensionality-reduced data information through an improved KNN algorithm model; step three, evaluating the processed data information through an improved error evaluation function; and step four, applying and sharing the data information, and performing remote data information processing and data sharing on the acquired data information. Through data dimension reduction, data preprocessing, data mining, and error analysis and processing, the invention greatly improves the data information processing capability.

Description

Data processing method and device for improving KNN method
Technical Field
The present invention relates to the field of data processing, and more particularly, to a data processing method and apparatus for improving a KNN method.
Background
Data processing is a basic link of systems engineering and automatic control, and it runs through every field of social production and social life. The development of data processing technology, and the breadth and depth of its application, have greatly influenced the progress of human society. Data is a representation of facts, concepts or instructions that can be processed by manual or automated means; after data is interpreted and given a certain meaning, it becomes information. Data processing is the collection, storage, retrieval, processing, transformation and transmission of data. Its basic purpose is to extract and derive data that is valuable and meaningful to certain people from large, possibly chaotic and unintelligible amounts of data.
In the prior art, data information is generally processed by statistical methods, which improve the data processing capability to a certain extent; however, when data information is analyzed and calculated, classification and data information processing are difficult to realize, the overall data information processing capability is poor, and the data information processing method lags behind.
Disclosure of Invention
Aiming at the technical defects, the invention discloses a data processing method and a data processing device for improving a KNN method, which greatly improve the data information processing capacity through data dimension reduction, data preprocessing, data mining, error analysis and processing.
In order to realize the technical effects, the invention adopts the following technical scheme:
a data processing method for improving a KNN method, comprising the steps of:
step one, acquiring data information from database information, and performing dimensionality reduction processing on the acquired data information to acquire low-dimensionality data information;
step two, carrying out data information processing on the data information after the dimensionality reduction through an improved KNN algorithm model, wherein the improved KNN algorithm model comprises a data preprocessing step, a data layering step, a data KNN algorithm calculating step and a convolution fault diagnosis step;
step three, evaluating the processed data information through an improved error evaluation function;
and step four, applying and sharing the data information, and performing remote data information processing and data sharing on the acquired data information.
As a further technical scheme of the invention, the dimension reduction processing method comprises the following steps:
(S11) dimension reduction is realized by reconstructing matrix data information, and the number of reconstructed matrix data, data dimension and time delay are set;
(S12) solving the distribution probability of different element libraries through an average mutual information method, and analyzing data characteristics through a correlation algorithm model;
(S13) the dimension of the data information is calculated through a false neighbor method, different data classifications are selected by comparing the dimensions of different data information, the sequence between two different dimensions realizes the comparison between different elements in the database information through a feature pair measurement method, and the formula is as follows:
[formula (1) is rendered as an image in the original and is not reproduced]
in formula (1), R represents the data dimension and n represents a vector; one pair of terms denotes the matrix data information before reconstruction and the reconstructed matrix data information, and another pair denotes the false-adjacent-point relationship among the reconstructed matrix data; r represents the data information added after reconstruction, and u is the optimal dimensionality of the reconstructed matrix data information; after reconstruction, the difference between the element data dimensionality of the reconstructed matrix data and the data dimensionality after dimensionality reduction is greater than 10;
and (S14) performing dimension reduction judgment, outputting data information when the dimension reduction data information meets the current requirement, and performing dimension reduction calculation again when the dimension reduction data information does not meet the current requirement.
As a further technical scheme of the invention, the data layering is differential layering, and the differential layering method comprises the following steps:
dividing the data attributes into different attributes according to their number and type, and arranging the attribute data quantity in order from least at the top layer to most at the bottom layer;
calculating the distance between different data attributes: assuming a certain item of data information x in the data set and data attribute categories C1, C2, C3 and C4 (the symbols are rendered as images in the original), the distances d1, d2, d3 and d4 from x to C1, C2, C3 and C4 are obtained, and differential calculation is carried out on the calculated data information with different data attributes; when the differential condition is satisfied, in which the comparison threshold is a constant, the data information x is divided into the corresponding class Ci.
As a further technical scheme of the invention, the data KNN algorithm comprises the following steps:
(S21) selecting a big data information test set, and selecting a test big data information vector set according to different data attributes;
(S22) training a big data information test set to construct an n-layer tree form through hierarchical classification; data search of the big data information test set is realized through an optimal search algorithm;
(S23) sequentially calculating the text similarity of each big data information in the big data information test set and the 1 st-nth layer big data information test set training set;
the formula for calculating the Euclidean distance is as follows:
d(x, c1j) = sqrt( Σ_{k=1}^{m} ( xk - c1jk )^2 )    (2)
in formula (2), x denotes the feature vector of the test information in the big data information test set, drawn from the sequence of feature vectors of the test information in the test set, and xk is its k-th dimension; c1j is the centre vector of the layer-1 class-j big data information test set, and j indexes the classes of the big data information; m is the dimension of the feature vector of the big data information test set; and c1jk represents the layer-1 class-j big data information test set vector in the k-th dimension;
(S24) selecting the K texts most similar to the test text from the training text set according to the text similarity;
(S25) among the K nearest neighbours of the test text x, the weight of each class is calculated in turn by the weight formula W(x, Cj) = Σi sim(x, di) · y(di, Cj), where x is the data information to be classified, di denotes the feature vector of the test information of the i-th class in the big data information test set, sim(x, di) is the similarity calculation formula whose value expresses the degree of similarity and is taken as the Jaccard similarity coefficient, and the function y(di, Cj) takes the value 1 or 0: if di belongs to class Cj, then y(di, Cj) is 1, otherwise 0;
(S26) sorting the calculated weights and comparing the sorted weights differentially: when the difference between the leading sorted weights exceeds the set threshold (the condition is rendered as an image in the original), the test text belongs to class 1, and when the similarity comparison is carried out on the second layer only the subclasses of class 1 in the second layer need to be compared; if the condition does not hold, judgment continues: when the differential values of the first q classes remain within the threshold, the test text belongs to one of classes 1 to q, and when comparing on the second layer only the subclasses of those q classes in the second layer need to be compared; if this condition does not hold either, judgment continues further; here the differential values are the differences between adjacent sorted weights, the threshold is the set of big data information test set threshold differential values, and the q-class condition refers to the differential value of the distance values of the q classes of big data information test sets.
As a further technical scheme of the invention, the convolution fault diagnosis method comprises the following steps:
the fault diagnosis architecture is constructed by expanding the causal convolution with a residual block, as shown in equation (3):
O = x + F(x)    (3)
in formula (3), O is the output variable of the output layer of the convolution fault diagnosis model, x represents the input variable of the output layer of the sub-fault-diagnosis model, and F(x) represents the residual mapping of deep learning; a set exit (dropout) layer is added after the weight layer, and the expanded causal convolution function f(t) is defined by:
f(t) = (x *d h)(t) = Σ_{i=0}^{ks-1} h(i) · x(t - d·i)    (4)
in formula (4), h is the filter; ks is the hierarchy of the neural network; x represents the input time-series information; d is the cavity parameter, i.e. the cavity interval size; and *d represents the hole convolution operator;
the evaluation formula of the fault diagnosis architecture is given as formula (5); in formula (5), one quantity denotes the mean value of the big data information fault evaluation indexes, T represents the prediction duration, a further parameter gives the evaluation duration period of the predictive big data information fault architecture, another term collects the hyper-parameters of the deep learning model, and θ represents the evaluation index of the fault diagnosis architecture; the parameters of the evaluation indexes of the big data information fault diagnosis architecture are information-overlapped by establishing an orthogonalized evaluation matrix, and the iterative process of mutual influence among different information is given as formula (6), in which α represents the mutual overlapping function of the big data information fault evaluation indexes and β represents the mutually influencing iterative process between the big data information; according to the iterative formula between the big data information fault evaluation indexes, an algorithm program is established for the matrix of formula (6), yielding formula (7), in which one quantity represents the big data information fault-evaluation orthogonalized safety matrix and μ represents the editing parameter of the orthogonalized matrix; the big data information fault evaluation index data are then applied to the data information intelligent prediction platform through the Schmidt formula, and the best evaluation effect obtained by online testing is output as formula (8), in which one quantity denotes the checked evaluation-index effect of each item of data information, m represents the number of big data information architecture nodes, and another denotes the variable value of the number of big data information architecture nodes; after the effect of the index is judged and evaluated, the weight is calculated by the weight formula (9), in which the weight of the big data information fault evaluation index is represented; formulas (5)-(9) are rendered as images in the original and are not reproduced.
As a further technical solution of the present invention, the improved error evaluation function is given as formula (10) (rendered as an image in the original); formula (10) runs over n groups of data, in which one quantity denotes a big data information test sample and the other the corresponding big data information failure prediction sample.
A data processing apparatus for improving a KNN method, comprising:
the data acquisition module is used for acquiring data information from the database information and performing dimensionality reduction processing on the acquired data information to acquire low-dimensional data information;
the data processing module is used for processing the data information after the dimensionality reduction through the improved KNN algorithm model;
the data evaluation module is used for evaluating the processed data information through an improved error evaluation function;
the data sharing module is used for applying and sharing data information, and performing remote data information processing and data sharing on the acquired data information;
the data processing module is respectively connected with the data acquisition module, the data evaluation module and the data sharing module.
The invention has the following positive beneficial effects:
the invention obtains the data information from the database information, and performs the dimension reduction processing on the obtained data information to obtain the low-dimensional data information; carrying out data information processing on the data information subjected to dimensionality reduction by improving a KNN algorithm model, wherein the improved KNN algorithm model comprises a data preprocessing step, a data layering step, a data KNN algorithm calculating step and a convolution fault diagnosis step; evaluating the processed data information through an improved error evaluation function; and data information application and sharing are carried out, and remote data information processing and data sharing are carried out on the acquired data information.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without inventive exercise, wherein:
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a diagram of a first embodiment of a dimension reduction processing model according to the present invention;
FIG. 3 is a diagram of a second embodiment of the dimension reduction processing model according to the present invention;
FIG. 4 is a schematic structural diagram of a differential layer model according to a first embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a differential layer model according to a second embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a differential layer model according to a third embodiment of the present invention;
FIG. 7 is a schematic diagram of a convolution fault diagnosis model according to the present invention;
FIG. 8 is a comparative illustration of the experimental results of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, and it should be understood that the embodiments described herein are merely for the purpose of illustrating and explaining the present invention and are not intended to limit the present invention.
Example (1) method
As shown in fig. 1, a data processing method for improving the KNN method includes the following steps:
step one, acquiring data information from database information, and performing dimensionality reduction processing on the acquired data information to acquire low-dimensionality data information;
step two, carrying out data information processing on the data information after dimensionality reduction through an improved KNN algorithm model, wherein the improved KNN algorithm model comprises a data preprocessing step, a data layering step, a data KNN algorithm calculating step and a convolution fault diagnosis step;
step three, evaluating the processed data information through an improved error evaluation function;
and step four, applying and sharing the data information, and performing remote data information processing and data sharing on the acquired data information.
The full name of KNN is K Nearest Neighbors, from which it is clear that the value of K is of great importance. The principle of KNN is that, when predicting a new value x, the class of x is determined according to the classes of the K nearest points.
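By way of illustration only (a minimal MATLAB sketch, not the patented implementation; the helper name knnClassify and the use of Euclidean distance are assumptions), the basic KNN decision described above can be written as:
function cls = knnClassify(train, labels, x, k)
% Hedged sketch of plain KNN: classify x by majority vote among its k
% nearest training samples under the Euclidean distance.
d = vecnorm(train - x, 2, 2);       % distance from x to every training row
[~, order] = sort(d);               % sort training samples by distance
cls = mode(labels(order(1:k)));     % majority class among the k nearest
end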
In the above embodiment, the dimension reduction processing method includes the following steps:
(S11) dimension reduction is realized by reconstructing matrix data information, and the number of reconstructed matrix data, data dimension and time delay are set;
(S12) solving the distribution probability of different element libraries through an average mutual information method, and analyzing data characteristics through a correlation algorithm model;
(S13) the dimension of the data information is calculated through a false neighbor method, different data classifications are selected through comparing the dimensions of different data information, the sequence between two different dimensions realizes the comparison between different elements in the database information through a feature pair measurement method, and the formula is as follows:
[formula (1) is rendered as an image in the original and is not reproduced]
in formula (1), R represents the data dimension and n represents a vector; one pair of terms denotes the matrix data information before reconstruction and the reconstructed matrix data information, and another pair denotes the false-adjacent-point relationship among the reconstructed matrix data; r represents the data information added after reconstruction, and u is the optimal dimensionality of the reconstructed matrix data information; after reconstruction, the difference between the element data dimensionality of the reconstructed matrix data and the data dimensionality after dimensionality reduction is greater than 10;
and (S14) performing dimension reduction judgment, outputting data information when the dimension reduction data information meets the current requirement, and performing dimension reduction calculation again when the dimension reduction data information does not meet the current requirement.
In a specific embodiment, the dimension reduction process is an operation of converting high-dimensional data into low-dimensional data, which can improve the computing capability over the data information. In a particular embodiment, one matrix may be reshaped into a new matrix of a different size using the MATLAB function reshape while retaining its original data: given a matrix represented by a two-dimensional array and two positive integers for the desired numbers of rows and columns, the reconstructed matrix fills all elements of the original matrix in the same row-traversal order. If the reshape operation with the given parameters is feasible and reasonable, the new reshaped matrix is output; otherwise, the original matrix is output.
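As a hedged illustration of the reshape check described above (the function name reshapeRowMajor is an assumption; MATLAB's reshape is column-major, so a double transpose reproduces the row-traversal order):
function B = reshapeRowMajor(A, r, c)
% Reshape A to r-by-c in row-traversal order when the element counts match;
% otherwise return the original matrix, as described above.
if numel(A) == r * c
    B = reshape(A.', c, r).';       % double transpose gives row-major filling
else
    B = A;                          % infeasible reshape: output the original
end
end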
In a particular embodiment, the average mutual information as a whole represents the amount of information that one random variable Y gives about another random variable X in the data processing. Let H(X) represent the uncertainty about the input variable X before the output symbol is received, and H(X|Y) represent the average uncertainty about X after the output symbol is received. The difference between the two is the amount of information obtained by the receiving end, i.e. the average mutual information. Channel transmission thus removes some uncertainty and yields a certain amount of information, and the average mutual information is the amount of information about the input X obtained per symbol, averaged over the received output symbols.
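The following MATLAB fragment is a hedged numerical illustration of the average mutual information I(X;Y) = H(X) - H(X|Y) described above; the joint probability table Pxy is an assumed example, not data from the patent:
Pxy = [0.25 0.10; 0.15 0.50];           % assumed joint distribution of X and Y
Px = sum(Pxy, 2); Py = sum(Pxy, 1);     % marginal distributions
Hx = -sum(Px .* log2(Px));              % H(X): uncertainty before reception
Hxy = -sum(Pxy(:) .* log2(Pxy(:)));     % joint entropy H(X,Y)
HxGivenY = Hxy + sum(Py .* log2(Py));   % H(X|Y) = H(X,Y) - H(Y)
I = Hx - HxGivenY;                      % average mutual information in bits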
In a specific embodiment, the support degree represents the occurrence probability in the population; the larger the total number of records, the smaller the minimum support should be set, so as to ensure that frequent item sets can exist, and the fewer the frequent item sets, the more the minimum support should be adjusted. First, the items that do not meet the minimum support are deleted to construct the data set, and the data set is scanned once; then the screened data sets are sorted to construct a tree whose root node is NULL; finally the data set is inserted into the tree.
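A hedged sketch of the minimum-support screening step just described (the item counts, total and threshold are assumed example values, not data from the patent):
counts = [50 8 42 3 17];                % assumed occurrence counts of candidate items
total = 100;                            % total number of records in one scan
minSupport = 0.10;                      % assumed minimum support threshold
support = counts / total;               % support: occurrence probability in the population
keep = find(support >= minSupport);     % items retained before the tree is constructed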
In this embodiment, on the basis of the false-neighbour concept, a method for simultaneously determining a proper embedding dimension and time delay can be provided, so that the input of the radial basis function neural network can be determined; the radial basis function neural network is then used for learning and prediction. A chaotic time series is the projection of the trajectory of high-dimensional phase-space chaotic motion onto a one-dimensional space, and the trajectory of the chaotic motion is distorted in the projection process. When two points that are not adjacent in the high-dimensional phase space are projected onto the one-dimensional axis, they may appear to be adjacent; such points are false adjacent points, and they are the reason the chaotic time series appears irregular. Reconstructing the phase space means recovering the trajectory of the chaotic motion from the chaotic time series: as the embedding dimension m increases, the trajectory is gradually unfolded and the false adjacent points are gradually kicked out, so that the trajectory of the chaotic motion is recovered.
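A hedged MATLAB sketch of the false-nearest-neighbour test sketched above (the function name, the tolerance Rtol and the delay-embedding form follow the standard false-neighbour method and are assumptions, not the patent's exact formula (1)):
function ratio = fnnRatio(x, u, tau, Rtol)
% Fraction of false neighbours when the series x is embedded in u dimensions
% with delay tau: a neighbour is false if adding one more dimension inflates
% its distance beyond the tolerance Rtol.
x = x(:);
N = length(x) - u*tau;
Y = zeros(N, u+1);
for d = 0:u
    Y(:, d+1) = x((1:N) + d*tau);       % delay embedding in u+1 dimensions
end
falseCnt = 0;
for i = 1:N
    dists = vecnorm(Y(:,1:u) - Y(i,1:u), 2, 2);
    dists(i) = inf;                     % exclude the point itself
    [du, j] = min(dists);               % nearest neighbour in u dimensions
    du = max(du, eps);                  % guard against zero distance
    if abs(Y(i,u+1) - Y(j,u+1)) / du > Rtol
        falseCnt = falseCnt + 1;        % neighbour separates in dimension u+1
    end
end
ratio = falseCnt / N;                   % embedding is adequate when this is small
end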
As shown in FIGS. 2-6, in FIG. 2 the data attribute category a covers data attributes a1-a3, with the data information categorized as subordinate to the data attribute category, where a11-a32 represent the individual items of data information in the subordinate classified data information. The data attribute class b in FIG. 3 represents data attributes other than a, of which b1-b3 represent data attributes different from the data information a, and b11-b32 are the items of data information in the subordinate classified data information whose attributes differ from those of a. In other words, a and b are different types of data information.
In the above embodiment, the data hierarchy is a differential hierarchy, and the differential hierarchy method includes:
dividing the data attributes into different attributes according to their number and type, and arranging the attribute data quantity in order from least at the top layer to most at the bottom layer;
calculating the distance between different data attributes: assuming a certain item of data information x in the data set and data attribute categories C1, C2, C3 and C4 (the symbols are rendered as images in the original), the distances d1, d2, d3 and d4 from x to C1, C2, C3 and C4 are obtained, and differential calculation is carried out on the calculated data information with different data attributes; when the differential condition is satisfied, in which the comparison threshold is a constant, the data information x is divided into the corresponding class Ci.
In a specific embodiment, by dividing different data attributes, a user can acquire data information with different attributes from a large amount of data information, and improve data processing capacity of the acquired data information in a distributed computing manner. Through differential calculation, the acquired data information can be correctly classified, so that the division of different module information is realized, and the data processing capacity is improved.
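A hedged MATLAB sketch of the differential assignment just described (the margin name epsilon and the use of class centre vectors are assumptions):
function cls = assignByDistance(x, centers, epsilon)
% Assign x to the attribute class with the nearest centre, provided the
% winning distance beats the runner-up by the constant margin epsilon.
d = vecnorm(centers - x, 2, 2);         % distance from x to each class centre
[ds, order] = sort(d);
if ds(2) - ds(1) > epsilon              % differential comparison against the margin
    cls = order(1);                     % unambiguous: divide x into this class
else
    cls = 0;                            % ambiguous: leave for further judgment
end
end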
In the above embodiment, the data KNN algorithm includes the steps of:
(S21) selecting a big data information test set, and selecting a test big data information vector set according to different data attributes;
(S22) training a big data information test set to construct an n-layer tree form through hierarchical classification; data search of the big data information test set is realized through an optimal search algorithm;
(S23) sequentially calculating the text similarity of each big data information in the big data information test set and the 1 st-nth layer big data information test set training set;
the formula for calculating the Euclidean distance is as follows:
d(x, c1j) = sqrt( Σ_{k=1}^{m} ( xk - c1jk )^2 )    (2)
in formula (2), x denotes the feature vector of the test information in the big data information test set, drawn from the sequence of feature vectors of the test information in the test set, and xk is its k-th dimension; c1j is the centre vector of the layer-1 class-j big data information test set, and j indexes the classes of the big data information; m is the dimension of the feature vector of the big data information test set; and c1jk represents the layer-1 class-j big data information test set vector in the k-th dimension;
(S24) selecting the K texts most similar to the test text from the training text set according to the text similarity;
(S25) among the K nearest neighbours of the test text x, the weight of each class is calculated in turn by the weight formula W(x, Cj) = Σi sim(x, di) · y(di, Cj), where x is the data information to be classified, di denotes the feature vector of the test information of the i-th class in the big data information test set, sim(x, di) is the similarity calculation formula whose value expresses the degree of similarity and is taken as the Jaccard similarity coefficient, and the function y(di, Cj) takes the value 1 or 0: if di belongs to class Cj, then y(di, Cj) is 1, otherwise 0;
(S26) sorting the calculated weights and comparing the sorted weights differentially: when the difference between the leading sorted weights exceeds the set threshold (the condition is rendered as an image in the original), the test text belongs to class 1, and when the similarity comparison is carried out on the second layer only the subclasses of class 1 in the second layer need to be compared; if the condition does not hold, judgment continues: when the differential values of the first q classes remain within the threshold, the test text belongs to one of classes 1 to q, and when comparing on the second layer only the subclasses of those q classes in the second layer need to be compared; if this condition does not hold either, judgment continues further; here the differential values are the differences between adjacent sorted weights, the threshold is the set of big data information test set threshold differential values, and the q-class condition refers to the differential value of the distance values of the q classes of big data information test sets.
KNN (K-Nearest Neighbor) is one of the simplest machine learning algorithms; it can be used for classification and regression and is a supervised learning algorithm. The core idea is that if most of the K most similar samples to a sample in the feature space (i.e. its nearest neighbours in the feature space) belong to a certain class, then the sample also belongs to this class and has the characteristics of that class. That is, in making its classification decision the method determines the category of the sample to be classified only according to the category of the nearest sample or samples. KNN classifies by measuring the distance between different feature values, and K is typically an integer no greater than 20. In the KNN algorithm, the selected neighbours are all objects that have already been correctly classified. In a particular embodiment, the result of the KNN algorithm depends largely on the choice of K. The KNN algorithm can be used not only for classification but also for regression: the attributes of a sample are obtained by finding its k nearest neighbours and assigning the average of the attributes of these neighbours to the sample. A more useful approach is to give different weights to the influence of neighbours at different distances on the sample, for example weights inversely proportional to the distance.
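A hedged MATLAB sketch of the inverse-distance weighting mentioned above (the helper name is assumed; the weights are taken as 1/distance):
function cls = knnWeightedClassify(train, labels, x, k)
% Weighted kNN vote: each of the k nearest neighbours contributes a weight
% inversely proportional to its distance from x.
d = vecnorm(train - x, 2, 2);
[ds, order] = sort(d);
w = 1 ./ max(ds(1:k), eps);             % inverse-distance weights (guard divide-by-zero)
nn = labels(order(1:k));
classes = unique(nn);
score = zeros(size(classes));
for c = 1:numel(classes)
    score(c) = sum(w(nn == classes(c))); % accumulate the weight of each class
end
[~, best] = max(score);
cls = classes(best);                     % class with the largest total weight
end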
In a further embodiment, the distance between the test data and each item of training data is calculated; the distances are sorted in increasing order; the K points with the smallest distance are selected; the frequency of occurrence of the categories of the first K points is determined; and the category with the highest frequency among the first K points is returned as the predicted classification of the test data.
In a further embodiment, a smaller value of k is selected first, and an appropriate final value is then chosen by cross-validation. When k is smaller, prediction is performed using samples in a smaller neighbourhood, so the training error is reduced but the model becomes so complex that it over-fits. When k is larger, prediction is performed using samples in a larger area, so the training error increases, the model becomes simpler, and under-fitting easily results. Therefore, in a specific embodiment, a proper value of k needs to be selected to improve the data processing capability.
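A hedged sketch of that selection procedure (the variables train and labels and the candidate range are assumptions; knnClassify is the helper sketched earlier):
bestK = 1; bestAcc = 0;
for k = 1:2:19                          % odd candidates, staying below 20
    correct = 0;
    for i = 1:size(train, 1)            % leave-one-out cross-validation
        idx = [1:i-1, i+1:size(train, 1)];
        pred = knnClassify(train(idx,:), labels(idx), train(i,:), k);
        correct = correct + (pred == labels(i));
    end
    acc = correct / size(train, 1);
    if acc > bestAcc
        bestAcc = acc; bestK = k;       % keep the best-performing value of k
    end
end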
An exemplary code in a data processing method of an improved KNN method is as follows:
load data.txt;                          % 150 x 4 data set, three classes of 50 rows each
a = data(1:30, 1:4);                    % first thirty groups of the first class
aa = data(31:50, 1:4);                  % last twenty groups of the first class
b = data(51:80, 1:4);                   % first thirty groups of the second class
bb = data(81:100, 1:4);                 % last twenty groups of the second class
c = data(101:130, 1:4);                 % first thirty groups of the third class
cc = data(131:150, 1:4);                % last twenty groups of the third class
train = cat(1, a, b, c);                % training samples (90 x 4)
test = cat(1, aa, bb, cc);              % test samples (60 x 4)
k = 3;                                  % number of cluster centres (written c = 3 in the original, shadowing the data block c)
z1 = train(1,:);
z2 = train(45,:);
z3 = train(90,:);                       % initial cluster centres z1, z2, z3
m = 0; t = 0;                           % convergence flag and iteration counter
while m == 0
    samp1 = []; samp2 = []; samp3 = []; % empty sample buffers for the three classes
    n1 = 1; n2 = 1; n3 = 1;
    t = t + 1;
    for i = 1:90
        % assign each training sample to the nearest of the three centres
        if (pdist([train(i,:); z1]) < pdist([train(i,:); z2])) && (pdist([train(i,:); z1]) < pdist([train(i,:); z3]))
            samp1(n1,:) = train(i,:);   % closer to z1 than to z2 and z3
            n1 = n1 + 1;
        elseif (pdist([train(i,:); z2]) < pdist([train(i,:); z1])) && (pdist([train(i,:); z2]) < pdist([train(i,:); z3]))
            samp2(n2,:) = train(i,:);   % closer to z2 than to z1 and z3
            n2 = n2 + 1;
        else                            % otherwise assign to samp3
            samp3(n3,:) = train(i,:);
            n3 = n3 + 1;
        end
    end
    % the original listing truncates here; a minimal completion is assumed:
    % recompute the centres and stop when they no longer move
    nz1 = mean(samp1, 1); nz2 = mean(samp2, 1); nz3 = mean(samp3, 1);
    if isequal(nz1, z1) && isequal(nz2, z2) && isequal(nz3, z3)
        m = 1;
    end
    z1 = nz1; z2 = nz2; z3 = nz3;
end
As shown in FIG. 7, the diagram comprises the input data information node, the hidden node data information, the function data information nodes used in the calculation process of the big data information test set, the attributes of the hidden-layer nodes, and the training data information of the data output layer nodes (the node symbols are rendered as images in the original);
in the above embodiment, the convolution fault diagnosis method includes the following steps:
constructing a fault diagnosis architecture by expanding a causal convolution with a residual block, in which Dropout is a regularization technique that removes some random outputs of the architecture layer of the convolution sub-fault-diagnosis model; the proportion of neurons to discard is given by a dropout rate between 0 and 1, which is the probability that a layer output is discarded; the field of view of the convolution fault diagnosis model also depends on the number of layers of the residual block, e.g. with kernel size ks = 3, spreading factors d = 1, 2, 4 and a number of residual block stacks n = 1, the receptive field size would be 3 × 4 × 1 = 12. The residual block is shown in equation (3):
O = x + F(x)    (3)
in formula (3), O is the output variable of the output layer of the convolution fault diagnosis model, x represents the input variable of the output layer of the sub-fault-diagnosis model, and F(x) represents the residual mapping of deep learning; a set exit (dropout) layer is added after the weight layer, and the expanded causal convolution function f(t) is defined by:
f(t) = (x *d h)(t) = Σ_{i=0}^{ks-1} h(i) · x(t - d·i)    (4)
in formula (4), h is the filter; ks is the hierarchy of the neural network; x represents the input time-series information; d is the cavity parameter, i.e. the cavity interval size; and *d represents the hole convolution operator;
the evaluation formula of the fault diagnosis architecture is given as formula (5); in formula (5), one quantity denotes the mean value of the big data information fault evaluation indexes, T represents the prediction duration, a further parameter gives the evaluation duration period of the predictive big data information fault architecture, another term collects the hyper-parameters of the deep learning model, and θ represents the evaluation index of the fault diagnosis architecture; the parameters of the evaluation indexes of the big data information fault diagnosis architecture are information-overlapped by establishing an orthogonalized evaluation matrix, and the iterative process of mutual influence among different information is given as formula (6), in which α represents the mutual overlapping function of the big data information fault evaluation indexes and β represents the mutually influencing iterative process between the big data information; according to the iterative formula between the big data information fault evaluation indexes, an algorithm program is established for the matrix of formula (6), yielding formula (7), in which one quantity represents the big data information fault-evaluation orthogonalized safety matrix and μ represents the editing parameter of the orthogonalized matrix; the big data information fault evaluation index data are then applied to the data information intelligent prediction platform through the Schmidt formula, and the best evaluation effect obtained by online testing is output as formula (8), in which one quantity denotes the checked evaluation-index effect of each item of data information, m represents the number of big data information architecture nodes, and another denotes the variable value of the number of big data information architecture nodes; after the effect of the index is judged and evaluated, the weight is calculated by the weight formula (9), in which the weight of the big data information fault evaluation index is represented; formulas (5)-(9) are rendered as images in the original and are not reproduced.
The hyper-parameters of the convolution fault diagnosis model are iterated by establishing an algorithm model; the fault evaluation index of the big data information is calculated from the iteration data and optimized through the orthogonalization matrix, so that the optimal parameter evaluation result is obtained and the algorithmic performance of the convolution fault diagnosis model system is improved.
The invention applies a novel Temporal Convolutional Network (TCN; the convolution fault diagnosis model) deep learning model to intelligent prediction of scheduling big data information faults.
As shown in FIG. 8, in the above embodiment the improved error evaluation function is given as formula (10) (rendered as an image in the original); formula (10) runs over n groups of data, in which one quantity denotes a big data information test sample and the other the corresponding big data information failure prediction sample.
In order to verify the technical effect of the invention, scheme 1 is taken to be a decision-tree classification method and scheme 2 a k-means classification method, and these two methods are used for verification and comparison against the scheme of the invention.
The corresponding experimental results obtained by continuous training are shown in table 1, and the comparative graph obtained by simulation software is shown in fig. 8.
TABLE 1 Error accuracy comparison of the different methods
[the table is rendered as an image in the original and is not reproduced]
As can be seen from the above figure, in the data analysis accuracy test the result of the method of the present invention is significantly higher than the accuracy of scheme 1 and scheme 2: the data analysis accuracy of the method is above 80% and reaches 96% at most, with small fluctuation and relative stability. Scheme 1 and scheme 2 fluctuate over a large range in the accuracy test and their accuracy is extremely unstable, so they have great defects compared with the method disclosed by the invention; the method therefore has high data analysis accuracy.
Example (2) apparatus
A data processing apparatus for improving a KNN method, comprising:
the data acquisition module is used for acquiring data information from the database information and performing dimensionality reduction processing on the acquired data information to acquire low-dimensional data information;
the data processing module is used for processing the data information after the dimensionality reduction through the improved KNN algorithm model;
the data evaluation module is used for evaluating the processed data information through an improved error evaluation function;
the data sharing module is used for applying and sharing data information, and performing remote data information processing and data sharing on the acquired data information;
the data processing module is respectively connected with the data acquisition module, the data evaluation module and the data sharing module.
Although specific embodiments of the present invention have been described above, it will be understood by those skilled in the art that these specific embodiments are merely illustrative and that various omissions, substitutions and changes in the form of the detail of the methods and systems described above may be made by those skilled in the art without departing from the spirit and scope of the invention. For example, it is within the scope of the present invention to combine the steps of the above-described methods to perform substantially the same function in substantially the same way to achieve substantially the same result. Accordingly, the scope of the invention is to be limited only by the following claims.

Claims (7)

1. A data processing method for improving a KNN method is characterized in that: the method comprises the following steps:
step one, acquiring data information from database information, and performing dimensionality reduction processing on the acquired data information to acquire low-dimensionality data information;
step two, carrying out data information processing on the data information after dimensionality reduction through an improved KNN algorithm model, wherein the improved KNN algorithm model comprises a data preprocessing step, a data layering step, a data KNN algorithm calculating step and a convolution fault diagnosis step;
step three, evaluating the processed data information through an improved error evaluation function;
and step four, applying and sharing the data information, and performing remote data information processing and data sharing on the acquired data information.
2. The data processing method for improving a KNN method as claimed in claim 1, wherein: the dimension reduction processing method comprises the following steps:
(S11) dimension reduction is realized by reconstructing matrix data information, and the number of reconstructed matrix data, data dimension and time delay are set;
(S12) solving the distribution probability of different element libraries through an average mutual information method, and analyzing data characteristics through a correlation algorithm model;
(S13) the dimension of the data information is calculated through a false neighbor method, different data classifications are selected by comparing the dimensions of different data information, the sequence between two different dimensions realizes the comparison between different elements in the database information through a feature pair measurement method, and the formula is as follows:
[formula (1) is rendered as an image in the original and is not reproduced]
in formula (1), R represents the data dimension and n represents a vector; one pair of terms denotes the matrix data information before reconstruction and the reconstructed matrix data information, and another pair denotes the false-adjacent-point relationship among the reconstructed matrix data; r represents the data information added after reconstruction, and u is the optimal dimensionality of the reconstructed matrix data information; after reconstruction, the difference between the element data dimensionality of the reconstructed matrix data and the data dimensionality after dimensionality reduction is greater than 10;
and (S14) performing dimension reduction judgment, outputting data information when the dimension reduction data information meets the current requirement, and performing dimension reduction calculation again when the dimension reduction data information does not meet the current requirement.
3. The data processing method for improving a KNN method as claimed in claim 1, wherein: the data layering is differential layering, and the differential layering method comprises the following steps:
dividing the data attributes into different attributes according to their number and type, and arranging the attribute data quantity in order from least at the top layer to most at the bottom layer;
calculating the distance between different data attributes: assuming a certain item of data information x in the data set and data attribute categories C1, C2, C3 and C4 (the symbols are rendered as images in the original), the distances d1, d2, d3 and d4 from x to C1, C2, C3 and C4 are obtained, and differential calculation is carried out on the calculated data information with different data attributes; when the differential condition is satisfied, in which the comparison threshold is a constant, the data information x is divided into the corresponding class Ci.
4. The data processing method for improving a KNN method as claimed in claim 1, wherein: the data KNN algorithm comprises the following steps:
(S21) selecting a big data information test set, and selecting a test big data information vector set according to different data attributes;
(S22) training a big data information test set to construct an n-layer tree form through hierarchical classification; data search of the big data information test set is realized through an optimal search algorithm;
(S23) sequentially calculating the text similarity of each big data information in the big data information test set and the 1 st-nth layer big data information test set training set;
the formula for calculating the Euclidean distance is as follows:
d(x, c1j) = sqrt( Σ_{k=1}^{m} ( xk - c1jk )^2 )    (2)
in formula (2), x denotes the feature vector of the test information in the big data information test set, drawn from the sequence of feature vectors of the test information in the test set, and xk is its k-th dimension; c1j is the centre vector of the layer-1 class-j big data information test set, and j indexes the classes of the big data information; m is the dimension of the feature vector of the big data information test set; and c1jk represents the layer-1 class-j big data information test set vector in the k-th dimension;
(S24) selecting the K texts most similar to the test text from the training text set according to the text similarity;
(S25) among the K nearest neighbours of the test text x, the weight of each class is calculated in turn by the weight formula W(x, Cj) = Σi sim(x, di) · y(di, Cj), where x is the data information to be classified, di denotes the feature vector of the test information of the i-th class in the big data information test set, sim(x, di) is the similarity calculation formula whose value expresses the degree of similarity and is taken as the Jaccard similarity coefficient, and the function y(di, Cj) takes the value 1 or 0: if di belongs to class Cj, then y(di, Cj) is 1, otherwise 0;
(S26) sorting the calculated weights and comparing the sorted weights by differencing: when $\Delta W_1 > \varepsilon_1$, the test text belongs to the 1st class, and when the similarity comparison is carried out on the second layer, only the subclasses of the 1st class in the second layer need to be compared; if $\Delta W_1 \le \varepsilon_1$, the judging continues: when there exists an $n$ with $\Delta W_n > \varepsilon_n$, the test text belongs to one of the classes 1 to $n$, and when comparing the second layer, only the subclasses of those first $n$ classes in the second layer need to be compared; if $\Delta W_n \le \varepsilon_n$, the judging continues further; wherein $\Delta W_n$ represents the difference between adjacent sorted weights, $\{\varepsilon_n\}$ represents the set of threshold difference values set for the big data information test set, and $n$ indicates the position at which a big data information test set distance-value difference exceeds its threshold (a minimal sketch of steps (S21)-(S26) follows this claim).
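As an illustration only, the following Python sketch walks through steps (S21)-(S26) in miniature: Jaccard similarity over token sets, class weights summed over the $K$ nearest neighbors, and threshold-based pruning of candidate classes for the next layer. All function names, the token-set document representation, and the single threshold `eps` are assumptions introduced for this sketch, not part of the claimed method.

```python
# Minimal sketch of the hierarchical weighted-KNN classification of steps (S21)-(S26).
# Names, data layout, and the single threshold `eps` are illustrative assumptions.

def jaccard_sim(a: set, b: set) -> float:
    """Jaccard similarity coefficient between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def class_weights(test: set, train: list, k: int) -> dict:
    """W(x, C_j): sum of sim(x, d_i) over the k nearest neighbors d_i in class C_j."""
    neighbors = sorted(train, key=lambda t: jaccard_sim(test, t[0]), reverse=True)[:k]
    weights = {}
    for doc, label in neighbors:
        weights[label] = weights.get(label, 0.0) + jaccard_sim(test, doc)
    return weights

def candidate_classes(weights: dict, eps: float) -> list:
    """Step (S26): sort weights descending and cut at the first adjacent gap > eps."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    for n in range(len(ranked) - 1):
        if ranked[n][1] - ranked[n + 1][1] > eps:  # Delta W_n > eps
            return [label for label, _ in ranked[:n + 1]]
    return [label for label, _ in ranked]  # no gap exceeded the threshold: keep judging

# Usage (hypothetical data):
train = [({"fault", "log", "disk"}, "class1"),
         ({"fault", "log"}, "class1"),
         ({"sensor", "stream"}, "class2"),
         ({"report", "quarter"}, "class3")]
weights = class_weights({"fault", "log", "error"}, train, k=3)
print(candidate_classes(weights, eps=0.2))
```

The pruning in `candidate_classes` mirrors step (S26): a large gap between adjacent sorted class weights means the lower-ranked classes, and all of their second-layer subclasses, can be skipped at the next layer.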
5. The data processing method for improving a KNN method as claimed in claim 1, wherein: the convolution fault diagnosis method comprises the following steps:
the fault diagnosis architecture is constructed from dilated (expanded) causal convolutions with residual blocks, as shown in formula (3):

$$O = \mathrm{Activation}\big(X + F(X)\big) \qquad (3)$$

in formula (3), $O$ is the output variable of the output layer of the convolution fault diagnosis model, $X$ represents the input variable of the sub-fault-diagnosis-model output layer, and $F(\cdot)$ represents the residual mapping of deep learning; a set dropout (exit) layer is added after the weight layer, and the expanded causal convolution function $F(t)$ is defined by:
$$F(t) = (x *_d f)(t) = \sum_{i=0}^{k-1} f(i)\, x_{t - d \cdot i} \qquad (4)$$

in formula (4), $f$ is the filter; $k$ is the hierarchy of the neural network; $x$ represents the input time-series information; $d$ is the hole parameter, i.e. the hole interval size; and $*_d$ represents the hole convolution operator (a minimal sketch of formulas (3) and (4) follows these definitions);
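To make formulas (3) and (4) concrete, here is a minimal NumPy sketch of a dilated (hole) causal convolution wrapped in a residual block. The ReLU activation, the implicit zero padding on the causal side, and all function names are assumptions of this sketch; the claim itself fixes only the two formulas.

```python
# Illustrative sketch of formulas (3) and (4); activation and padding are assumed.
import numpy as np

def dilated_causal_conv(x: np.ndarray, f: np.ndarray, d: int) -> np.ndarray:
    """Formula (4): F(t) = sum_{i=0}^{k-1} f(i) * x[t - d*i], treating x[t'] = 0
    for t' < 0 so the convolution stays causal."""
    out = np.zeros_like(x, dtype=float)
    for t in range(len(x)):
        for i in range(len(f)):
            if t - d * i >= 0:
                out[t] += f[i] * x[t - d * i]
    return out

def residual_block(x: np.ndarray, f: np.ndarray, d: int) -> np.ndarray:
    """Formula (3): O = Activation(X + F(X)); ReLU is an assumed activation."""
    return np.maximum(0.0, x + dilated_causal_conv(x, f, d))

# Usage: a toy time series, a 3-tap filter, hole interval d = 2.
y = residual_block(np.arange(8.0), np.array([0.5, 0.3, 0.2]), d=2)
```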
the evaluation formula of the fault diagnosis architecture is as follows:

[Formula (5): evaluation index of the fault diagnosis architecture]

in formula (5), $\bar{e}$ represents the mean value of the big data information fault assessment indexes, $T$ represents the prediction duration, $t$ represents the evaluation duration period parameter of the predicted big data information failure architecture, $\eta$ represents the hyper-parameters of each item of the deep learning model, $\theta$ represents the evaluation index of the fault diagnosis architecture, and $\lambda$ represents the parameters of the big data information fault diagnosis architecture evaluation indexes; information overlapping is carried out by establishing an orthogonalized evaluation matrix, and the iterative process of mutual influence among different pieces of information is as follows:
[Formula (6): orthogonalized evaluation matrix iteration]

in formula (6), $\alpha$ represents the mutual overlapping function of the big data information fault evaluation indexes and $\beta$ represents the iterative process of mutual influence between the big data information; an algorithm program is established for the matrix of formula (6) according to the iterative formula between the big data information fault evaluation indexes, namely:

[Formula (7): orthogonalization of the evaluation matrix]

in formula (7), $A$ represents the big data information fault assessment orthogonalization safety matrix and $\mu$ represents the editing parameter of the orthogonalization matrix; then each item of big data information fault evaluation index data is applied to the data information intelligent prediction platform through the Schmidt formula, and the best evaluation effect obtained by the online test is output as:
[Formula (8): best evaluation effect of the online test]

in formula (8), $P$ represents the checked evaluation index effect of each item of data information, $m$ represents the number of big data information architecture nodes, and $i$ is the variable over the big data information architecture nodes; after the effect of the index is judged and evaluated, the weight formula is calculated, the weight formula being:

[Formula (9): weight of the fault evaluation index]

in formula (9), $\omega$ represents the weight of the big data information fault evaluation index (an illustrative orthogonalization sketch follows this claim).
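The "Schmidt formula" referenced in this claim is presumably classical Gram-Schmidt orthogonalization applied to the evaluation matrix. The following is a minimal sketch under that assumption; the function name, tolerance, and row-wise orientation are illustrative choices, not taken from the patent.

```python
# Minimal Gram-Schmidt sketch, assuming the claim's "Schmidt formula" is the
# classical Gram-Schmidt process applied row-wise to the evaluation matrix.
import numpy as np

def gram_schmidt(A: np.ndarray, tol: float = 1e-12) -> np.ndarray:
    """Return an orthonormal basis for the row space of evaluation matrix A."""
    basis = []
    for row in A.astype(float):
        v = row.copy()
        for q in basis:
            v -= (v @ q) * q          # remove the component along each prior direction
        norm = np.linalg.norm(v)
        if norm > tol:                # skip rows that are numerically dependent
            basis.append(v / norm)
    return np.array(basis)

# Usage: orthogonalize a 3x3 evaluation-index matrix (hypothetical values).
Q = gram_schmidt(np.array([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]))
print(np.allclose(Q @ Q.T, np.eye(len(Q))))  # rows are orthonormal
```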
6. The data processing method for improving a KNN method as claimed in claim 1, wherein: the improved error evaluation function is

[Formula (10): improved error evaluation function]

formula (10) covers $n$ groups of data, wherein $y_i$ represents a big data information test sample and $\hat{y}_i$ represents a big data information failure prediction sample (an illustrative error-evaluation sketch follows this claim).
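For illustration only, the sketch below evaluates a plain mean-squared-error function over $n$ (test sample, prediction sample) pairs. The patent's "improved" error evaluation function itself is not recoverable from the published text, so MSE is an assumed stand-in, and the function name is hypothetical.

```python
# Illustrative stand-in for the error evaluation of claim 6: mean squared error
# over n (y_i, y_hat_i) pairs. The claimed "improved" function is not recoverable
# from the published text; MSE is an assumption of this sketch.
import numpy as np

def error_evaluation(y_test: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean squared error over n groups of (y_i, y_hat_i) data."""
    assert y_test.shape == y_pred.shape, "test and prediction samples must pair up"
    return float(np.mean((y_test - y_pred) ** 2))

# Usage: error_evaluation(np.array([1.0, 2.0, 3.0]), np.array([1.1, 1.9, 3.2]))
```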
7. A data processing apparatus for improving a KNN method, comprising:
the data acquisition module, used for acquiring data information from database information and performing dimensionality-reduction processing on the acquired data information to obtain low-dimensional data information;
the data processing module, used for processing the dimensionality-reduced data information through the improved KNN algorithm model;
the data evaluation module, used for evaluating the processed data information through the improved error evaluation function;
the data sharing module, used for the application and sharing of data information, performing remote data information processing and data sharing on the acquired data information;
wherein the data processing module is respectively connected with the data acquisition module, the data evaluation module, and the data sharing module (a minimal structural sketch of these modules follows this claim).
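Purely as an illustration of how the four claimed modules relate, the following Python sketch wires them together. Every class and method name is an assumption, and the processing bodies are placeholders; the claim itself fixes only the modules and that the processing module connects to the other three.

```python
# Illustrative wiring of the four modules of claim 7; all names are assumptions.
import numpy as np

class DataAcquisitionModule:
    def acquire(self, raw: np.ndarray, dims: int) -> np.ndarray:
        """Reduce dimensionality; truncation to the first `dims` columns stands in
        for the unspecified reduction step."""
        return raw[:, :dims]

class DataEvaluationModule:
    def evaluate(self, data: np.ndarray) -> float:
        return float(np.mean(data ** 2))             # placeholder error evaluation

class DataSharingModule:
    def publish(self, data: np.ndarray) -> None:
        print(f"sharing {data.shape} processed records")

class DataProcessingModule:
    """Connects to acquisition, evaluation, and sharing, per claim 7."""
    def __init__(self, acq, eva, share):
        self.acq, self.eva, self.share = acq, eva, share

    def run(self, raw: np.ndarray, dims: int) -> float:
        low_dim = self.acq.acquire(raw, dims)        # acquisition + reduction
        processed = low_dim - low_dim.mean(axis=0)   # placeholder for improved KNN
        score = self.eva.evaluate(processed)
        self.share.publish(processed)
        return score

# Usage:
m = DataProcessingModule(DataAcquisitionModule(), DataEvaluationModule(), DataSharingModule())
m.run(np.random.rand(10, 6), dims=3)
```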
CN202210946851.XA 2022-08-09 2022-08-09 Data processing method and device for improving KNN method Active CN115017125B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210946851.XA CN115017125B (en) 2022-08-09 2022-08-09 Data processing method and device for improving KNN method


Publications (2)

Publication Number Publication Date
CN115017125A true CN115017125A (en) 2022-09-06
CN115017125B CN115017125B (en) 2022-10-21

Family

ID=83066268

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210946851.XA Active CN115017125B (en) 2022-08-09 2022-08-09 Data processing method and device for improving KNN method

Country Status (1)

Country Link
CN (1) CN115017125B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7720822B1 (en) * 2005-03-18 2010-05-18 Beyondcore, Inc. Quality management in a data-processing environment
CN104408095A (en) * 2014-11-15 2015-03-11 北京广利核系统工程有限公司 Improvement-based KNN (K Nearest Neighbor) text classification method
US20200210826A1 (en) * 2018-12-29 2020-07-02 Northeastern University Intelligent analysis system using magnetic flux leakage data in pipeline inner inspection
CN112308251A (en) * 2020-12-31 2021-02-02 北京蒙帕信创科技有限公司 Work order assignment method and system based on machine learning
CN114781555A (en) * 2022-06-21 2022-07-22 深圳市鼎合丰科技有限公司 Electronic component data classification method by improving KNN method


Also Published As

Publication number Publication date
CN115017125B (en) 2022-10-21

Similar Documents

Publication Publication Date Title
CN112382352B (en) Method for quickly evaluating structural characteristics of metal organic framework material based on machine learning
Isa et al. Using the self organizing map for clustering of text documents
CN107292350A (en) The method for detecting abnormality of large-scale data
CN112784881A (en) Network abnormal flow detection method, model and system
CN110135167B (en) Edge computing terminal security level evaluation method for random forest
CN107103332A (en) A kind of Method Using Relevance Vector Machine sorting technique towards large-scale dataset
CN112288191A (en) Ocean buoy service life prediction method based on multi-class machine learning method
CN110020712B (en) Optimized particle swarm BP network prediction method and system based on clustering
Labroche New incremental fuzzy c medoids clustering algorithms
CN106934410A (en) The sorting technique and system of data
CN112926640A (en) Cancer gene classification method and equipment based on two-stage depth feature selection and storage medium
Farooq Genetic algorithm technique in hybrid intelligent systems for pattern recognition
CN113516019A (en) Hyperspectral image unmixing method and device and electronic equipment
CN112817954A (en) Missing value interpolation method based on multi-method ensemble learning
CN112508363A (en) Deep learning-based power information system state analysis method and device
CN115545111B (en) Network intrusion detection method and system based on clustering self-adaptive mixed sampling
CN115017125B (en) Data processing method and device for improving KNN method
CN111584010A (en) Key protein identification method based on capsule neural network and ensemble learning
CN111488903A (en) Decision tree feature selection method based on feature weight
CN112488188A (en) Feature selection method based on deep reinforcement learning
CN111104950A (en) K value prediction method and device in k-NN algorithm based on neural network
CN117437976B (en) Disease risk screening method and system based on gene detection
CN113240113B (en) Method for enhancing network prediction robustness
CN113609480B (en) Multipath learning intrusion detection method based on large-scale network flow
CN113177604B (en) High-dimensional data feature selection method based on improved L1 regularization and clustering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant