CN115017125B - Data processing method and device for improving KNN method - Google Patents

Data processing method and device for improving KNN method Download PDF

Info

Publication number
CN115017125B
CN115017125B CN202210946851.XA CN202210946851A CN115017125B CN 115017125 B CN115017125 B CN 115017125B CN 202210946851 A CN202210946851 A CN 202210946851A CN 115017125 B CN115017125 B CN 115017125B
Authority
CN
China
Prior art keywords
data information
data
information
representing
formula
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210946851.XA
Other languages
Chinese (zh)
Other versions
CN115017125A (en
Inventor
李国权
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chenda Guangzhou Network Technology Co ltd
Original Assignee
Chenda Guangzhou Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chenda Guangzhou Network Technology Co ltd filed Critical Chenda Guangzhou Network Technology Co ltd
Priority to CN202210946851.XA priority Critical patent/CN115017125B/en
Publication of CN115017125A publication Critical patent/CN115017125A/en
Application granted granted Critical
Publication of CN115017125B publication Critical patent/CN115017125B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/176Support for shared access to files; File sharing support
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a data processing method and a data processing device for improving a KNN method, which relate to the technical field of data processing and solve the technical problem of data processing, and the technical scheme is that the data processing method and the data processing device for improving the KNN method comprise the following steps: step one, acquiring data information from database information, and performing dimensionality reduction processing on the acquired data information to acquire low-dimensionality data information; step two, carrying out data information processing on the data information after the dimensionality reduction by improving a KNN algorithm model; evaluating the processed data information through an improved error evaluation function; and fourthly, applying and sharing the data information, and performing remote data information processing and data sharing on the acquired data information. The invention greatly improves the data information processing capability through data dimension reduction, data preprocessing, data mining, error analysis and processing.

Description

Data processing method and device for improving KNN method
Technical Field
The present invention relates to the field of data processing, and more particularly, to a data processing method and apparatus for improving a KNN method.
Background
And data processing is a basic link of system engineering and automatic control. Data processing is throughout various fields of social production and social life. The development of data processing technology and the breadth and depth of its application have greatly influenced the progress of human society development. Data (Data) is a representation of facts, concepts or instructions that can be processed by either manual or automated means. After the data is interpreted and given a certain meaning, it becomes information. Data processing (data processing) is the collection, storage, retrieval, processing, transformation, and transmission of data. The basic purpose of data processing is to extract and derive valuable, meaningful data for certain people from a large, possibly chaotic, unintelligible amount of data.
In the prior art, data information is generally processed by adopting a data statistics method, although the data processing capability is improved to a certain extent by the method, the classification and the data information processing of the data information are difficult to realize when the data information is analyzed and calculated, the whole data information processing capability is poor, and the data information processing method is lagged.
Disclosure of Invention
Aiming at the defects of the technology, the invention discloses a data processing method and a data processing device for improving a KNN method.
In order to achieve the technical effects, the invention adopts the following technical scheme:
a data processing method for improving a KNN method, comprising the steps of:
step one, acquiring data information from database information, and performing dimension reduction processing on the acquired data information to acquire low-dimensional data information;
step two, carrying out data information processing on the data information after the dimensionality reduction through an improved KNN algorithm model, wherein the improved KNN algorithm model comprises a data preprocessing step, a data layering step, a data KNN algorithm calculating step and a convolution fault diagnosis step;
evaluating the processed data information through an improved error evaluation function;
and step four, data information application and sharing are carried out, and remote data information processing and data sharing are carried out on the obtained data information.
As a further technical scheme of the invention, the dimension reduction processing method comprises the following steps:
(S11) realizing dimension reduction processing by reconstructing matrix data information, and setting the number of reconstructed matrix data, data dimension and time delay;
(S12) solving the distribution probability of different element libraries by an average mutual information method, and analyzing data characteristics by a correlation algorithm model;
(S13) calculating the dimension of the data information by a false neighbor method, selecting different data classifications by comparing the dimensions of different data information, and comparing different elements in the database information by a characteristic pair measurement method through a sequence between two different dimensions, wherein the formula is as follows:
Figure 937176DEST_PATH_IMAGE001
(1)
in formula (1), R represents a data dimension, n represents a vector,
Figure 651054DEST_PATH_IMAGE002
representing the matrix data information before reconstruction,
Figure 102895DEST_PATH_IMAGE003
representing the reconstructed matrix data information;
Figure 422405DEST_PATH_IMAGE004
and
Figure 908882DEST_PATH_IMAGE005
representing the relationship of false adjacent points among the reconstruction matrix data, r representing data information added after reconstruction, u being the optimal dimensionality of the reconstruction matrix data information, and the difference between the element data dimensionality of the reconstruction matrix data and the data dimensionality after dimensionality reduction being larger than 10 after reconstruction;
and (S14) performing dimension reduction judgment, outputting data information when the dimension reduction data information meets the current requirement, and performing dimension reduction calculation again when the dimension reduction data information does not meet the current requirement.
As a further technical scheme of the invention, the data hierarchy is a differential hierarchy, and the differential hierarchy method comprises the following steps:
dividing the data attributes into different attributes according to the number and the types, and sequentially arranging and distributing the attribute data quantity from the top layer to the bottom layer from less to most;
calculating the distance between different data attributes, and assuming that certain data information in the data set is
Figure 262503DEST_PATH_IMAGE006
Data attribute class classification
Figure 185328DEST_PATH_IMAGE007
Figure 180966DEST_PATH_IMAGE008
Figure 256369DEST_PATH_IMAGE009
And
Figure 171105DEST_PATH_IMAGE010
then data information
Figure 456592DEST_PATH_IMAGE011
Distance data attribute categories
Figure 865708DEST_PATH_IMAGE012
Figure 310465DEST_PATH_IMAGE013
Figure 271468DEST_PATH_IMAGE014
And
Figure 185197DEST_PATH_IMAGE010
is a distance of
Figure 991479DEST_PATH_IMAGE015
Figure 553392DEST_PATH_IMAGE016
Carrying out differential calculation on the calculated data information with different data attributes; when in use
Figure 560662DEST_PATH_IMAGE017
In which
Figure 210955DEST_PATH_IMAGE018
Is constant, then data information
Figure 820928DEST_PATH_IMAGE019
Is divided into
Figure 991009DEST_PATH_IMAGE020
And (4) class.
As a further technical scheme of the invention, the data KNN algorithm comprises the following steps:
(S21) selecting a big data information test set, and selecting a test big data information vector set according to different data attributes;
(S22) training a big data information test set to construct an n-layer tree form through hierarchical classification; data search of the big data information test set is realized through an optimal search algorithm;
(S23) sequentially calculating the text similarity of each big data information in the big data information test set and the 1 st-nth layer big data information test set training set;
the formula for calculating the Euclidean distance is as follows:
Figure 293815DEST_PATH_IMAGE021
(2)
in the formula (2), the first and second groups,
Figure 165824DEST_PATH_IMAGE022
a feature vector representing test information in a large test set of data information,
Figure 454854DEST_PATH_IMAGE023
representing a sequence of feature vectors of test information in a big data information test set;
Figure 604076DEST_PATH_IMAGE024
the aggregate center vector is tested for large data information of layer 1 class j,
Figure 202417DEST_PATH_IMAGE025
a class representing big data information; m is the dimension of the feature vector of the big data information test set;
Figure 312455DEST_PATH_IMAGE026
testing the kth dimension of the set vector for big data information;
Figure 264230DEST_PATH_IMAGE027
representing a large data information test set vector of a jth class of a first layer in a kth dimension;
(S24) according to the text similarity, selecting the text which is most similar to the test text in the training text set
Figure 661101DEST_PATH_IMAGE028
A text;
(S25) in the test text
Figure 305709DEST_PATH_IMAGE029
In each neighbor, the weight of each class is calculated in turn,
Figure 903044DEST_PATH_IMAGE030
representing a formula of weight value, the formula being
Figure 783144DEST_PATH_IMAGE031
Wherein
Figure 641379DEST_PATH_IMAGE032
In order to be able to obtain the data information,
Figure 332254DEST_PATH_IMAGE033
is shown as
Figure 666152DEST_PATH_IMAGE034
The feature vectors of the test information in the big data information test set of a class,
Figure 225310DEST_PATH_IMAGE035
the coefficient of similarity of the Jacard is expressed,
Figure 78996DEST_PATH_IMAGE036
in order to calculate the formula for the degree of similarity,
Figure 799827DEST_PATH_IMAGE037
representing a degree of similarity value, wherein
Figure 621022DEST_PATH_IMAGE038
Is 1 or 0, if
Figure 593657DEST_PATH_IMAGE039
Belong to
Figure 692063DEST_PATH_IMAGE040
Then function
Figure 705500DEST_PATH_IMAGE041
The value is 1, otherwise 0;
(S26) sorting the calculated weights, and differentially comparing the sorted weights, when
Figure 764723DEST_PATH_IMAGE042
In which
Figure 665683DEST_PATH_IMAGE043
If the data set represents the characteristics, the test text belongs to the 1 st class, and when the similarity comparison is carried out on the second layer, only the subclasses of the 1 st class in the second layer need to be compared; if it is not
Figure 477650DEST_PATH_IMAGE044
Then continue to judge that there is
Figure 946808DEST_PATH_IMAGE045
When is coming into contact with
Figure 86803DEST_PATH_IMAGE046
When it is, the test text belongs to 1-
Figure 181666DEST_PATH_IMAGE047
One of the classes, when comparing the second layer, only the first class in the 1 st class in the second layer needs to be compared
Figure 723506DEST_PATH_IMAGE048
A subclass of the class; if it is not
Figure 363566DEST_PATH_IMAGE049
If so, continuing to judge; wherein
Figure 115490DEST_PATH_IMAGE050
Representing the difference values of the sorted forward-to-adjacent weights,
Figure 623832DEST_PATH_IMAGE051
the set of large data information test set threshold differential values representing settings,
Figure 895544DEST_PATH_IMAGE052
indicating the presence of
Figure 565560DEST_PATH_IMAGE053
The big data information-like test gathers the differential value of the distance value.
As a further technical scheme of the invention, the convolution fault diagnosis method comprises the following steps:
the fault diagnosis architecture is constructed by expanding the causal convolution with a residual block, as shown in equation (3):
Figure 73289DEST_PATH_IMAGE054
(3)
in the formula (3), O is an output variable of the output layer of the convolution fault diagnosis model,
Figure 995109DEST_PATH_IMAGE055
input variables representing the output layers of the sub-fault diagnosis model,
Figure 511541DEST_PATH_IMAGE056
residual mapping representing deep learning, adding a set exit layer after the weight layer, and expanding a causal convolution function F (t) defined as:
Figure 742671DEST_PATH_IMAGE057
(4)
in the formula (4), the first and second groups of the chemical reaction are shown in the specification,
Figure 219920DEST_PATH_IMAGE058
is a filter;
Figure 804485DEST_PATH_IMAGE059
is a hierarchy of neural networks;
Figure 565637DEST_PATH_IMAGE060
representing input time series information;
Figure 577455DEST_PATH_IMAGE061
the cavity parameters are cavity interval sizes;
Figure 542000DEST_PATH_IMAGE062
representing a hole convolution operator;
the evaluation formula of the fault diagnosis system structure is as follows:
Figure 54890DEST_PATH_IMAGE063
(5)
in the formula (5), the first and second groups,
Figure 545914DEST_PATH_IMAGE064
the mean value of the fault assessment indexes of the big data information is shown,Twhich represents the duration of the prediction,
Figure 604000DEST_PATH_IMAGE065
an evaluation duration period parameter representing a predictive big data message failure architecture,
Figure 180475DEST_PATH_IMAGE066
represents the hyper-parameters of various items of the deep learning model,θthe evaluation index of the fault diagnosis architecture is shown,
Figure 494126DEST_PATH_IMAGE067
parameters representing the evaluation indexes of the big data information fault diagnosis system structure are subjected to information overlapping by establishing an orthogonalized evaluation matrix, and the interactive influence iterative process among different information is as follows:
Figure 715023DEST_PATH_IMAGE068
(6)
in the formula (6), α represents a mutual overlapping function of the big data information fault evaluation indexes, β represents a mutual influence iterative process between the big data information, and an algorithm program is established for the matrix of the formula (6) according to an iterative formula between the big data information fault evaluation indexes, that is:
Figure 334223DEST_PATH_IMAGE069
(7)
in the formula (7), the first and second groups,
Figure 522627DEST_PATH_IMAGE070
representing a big data information fault assessment orthogonalization safety matrix, and mu represents an editing parameter of the orthogonalization matrix; then, applying various big data information fault evaluation index data to a data information intelligent prediction platform through a Schmidt formula, and outputting the best evaluation effect obtained by online testing as follows:
Figure 128052DEST_PATH_IMAGE071
(8)
in the formula (8), the first and second groups,
Figure 328089DEST_PATH_IMAGE072
the evaluation index effect of each item of data information of checking calculation is shown,mrepresenting the number of big data information architecture nodes,
Figure 242825DEST_PATH_IMAGE073
the variable value of the number of the nodes of the big data information architecture is judged and evaluated to obtain the effect of the index, and then a weight formula is calculated, wherein the weight formula is as follows:
Figure 669258DEST_PATH_IMAGE074
(9)
in the formula (9), the first and second groups,
Figure 937428DEST_PATH_IMAGE075
and representing the weight of the big data information fault evaluation index.
As a further technical solution of the present invention, the improved error evaluation function is
Figure 116606DEST_PATH_IMAGE076
(10)
The formula (10) includes
Figure 343188DEST_PATH_IMAGE077
Group data wherein
Figure 725759DEST_PATH_IMAGE078
Represented as a big data information test sample,
Figure 190763DEST_PATH_IMAGE079
represented as big data information failure prediction samples.
A data processing apparatus for improving a KNN method, comprising:
the data acquisition module is used for acquiring data information from the database information and performing dimension reduction processing on the acquired data information to acquire low-dimensional data information;
the data processing module is used for processing the data information after the dimensionality reduction through improving the KNN algorithm model;
the data evaluation module is used for evaluating the processed data information through an improved error evaluation function;
the data sharing module is used for applying and sharing data information, and performing remote data information processing and data sharing on the acquired data information;
the data processing module is respectively connected with the data acquisition module, the data evaluation module and the data sharing module.
The invention has the following positive beneficial effects:
the invention obtains the data information from the database information, and carries out dimension reduction processing on the obtained data information to obtain the low-dimensional data information; carrying out data information processing on the data information subjected to dimensionality reduction by improving a KNN algorithm model, wherein the improved KNN algorithm model comprises a data preprocessing step, a data layering step, a data KNN algorithm calculating step and a convolution fault diagnosis step; evaluating the processed data information through an improved error evaluation function; and data information application and sharing are carried out, and remote data information processing and data sharing are carried out on the acquired data information.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without inventive exercise, wherein:
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a diagram of a first embodiment of a dimension reduction processing model according to the present invention;
FIG. 3 is a diagram of a second embodiment of the dimension reduction processing model according to the present invention;
FIG. 4 is a schematic structural diagram of a differential layer model according to a first embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a differential layer model according to a second embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a differential layer model according to a third embodiment of the present invention;
FIG. 7 is a schematic diagram of a convolution fault diagnosis model according to the present invention;
FIG. 8 is a comparative illustration of the experimental results of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, and it should be understood that the embodiments described herein are merely for the purpose of illustrating and explaining the present invention and are not intended to limit the present invention.
Example (1) Process
As shown in fig. 1, a data processing method for improving KNN method includes the following steps:
step one, acquiring data information from database information, and performing dimensionality reduction processing on the acquired data information to acquire low-dimensionality data information;
step two, carrying out data information processing on the data information after dimensionality reduction through an improved KNN algorithm model, wherein the improved KNN algorithm model comprises a data preprocessing step, a data layering step, a data KNN algorithm calculating step and a convolution fault diagnosis step;
evaluating the processed data information through an improved error evaluation function;
and fourthly, applying and sharing the data information, and performing remote data information processing and data sharing on the acquired data information.
The KNN is called K Nearest Neighbors, meaning K Nearest Neighbors, and the value of K is definitely important. The principle of KNN is to determine what class x belongs to based on what class it is from the nearest K points when predicting a new value x.
In the above embodiment, the dimension reduction processing method includes the following steps:
(S11) realizing dimension reduction processing by reconstructing matrix data information, and setting the number of reconstructed matrix data, data dimension and time delay;
(S12) solving the distribution probability of different element libraries by an average mutual information method, and analyzing data characteristics by a correlation algorithm model;
(S13) calculating the dimension of the data information by a false neighbor method, selecting different data classifications by comparing the dimensions of different data information, and comparing different elements in the database information by a characteristic pair measurement method through a sequence between two different dimensions, wherein the formula is as follows:
Figure 365392DEST_PATH_IMAGE080
(1)
in formula (1), R represents a data dimension, n represents a vector,
Figure 372662DEST_PATH_IMAGE081
representing the matrix data information before the reconstruction is performed,
Figure 632742DEST_PATH_IMAGE082
representing the reconstructed matrix data information;
Figure 632928DEST_PATH_IMAGE083
and
Figure 803009DEST_PATH_IMAGE084
representing the relationship of false adjacent points among the reconstruction matrix data, r representing data information added after reconstruction, u being the optimal dimensionality of the reconstruction matrix data information, and the difference between the element data dimensionality of the reconstruction matrix data and the data dimensionality after dimensionality reduction being larger than 10 after reconstruction;
and (S14) performing dimension reduction judgment, outputting data information when the dimension reduction data information meets the current requirement, and performing dimension reduction calculation again when the dimension reduction data information does not meet the current requirement.
In a specific embodiment, the dimension reduction process is an operation of converting high-dimensional data into low-dimensional data, and the computing capacity of data information can be improved. In a particular embodiment, one matrix may be reshaped to another new matrix of a different size by borrowing the function reshape through MATLAB, but retaining its original data. The number of rows and columns of the desired reconstructed matrix is represented by giving a matrix represented by a two-dimensional array and two positive integers. The reconstructed matrix needs to fill all elements of the original matrix in the same row traversal order. If reshape operation with the given parameters is feasible and reasonable, outputting a new remolding matrix; otherwise, the original matrix is output.
In a particular embodiment, the average mutual information represents, as a whole, the amount of information given by one random variable Y in respect of another random variable X in the data processing. Let H (X) represent the uncertainty about the input variable X before the output symbol is received. And H (Y | X) represents the average uncertainty about the input variable X after receiving the output symbol. The difference between the two represents the amount of information obtained by the receiving end, i.e. the average mutual information. It can be seen that some uncertainty is removed by the channel transmission, and certain information is obtained, while the average mutual information represents the amount of information about the input terminal X obtained by averaging each symbol after receiving the output symbols.
In a specific embodiment, the support degree represents the occurrence probability in the population, and the larger the number of the total tickets is, the smaller the minimum support degree is set, so as to ensure that a frequent item set can exist. The less frequent item sets, the less support should be adjusted. Firstly, deleting the items which do not meet the minimum support degree to construct a data set, and scanning one side of the data set; then sorting the screened data sets to construct a tree with a root node of NULL; the data set is inserted into the tree.
In the embodiment, on the basis of the false neighborhood concept, a method for simultaneously determining a proper embedding dimension and time delay can be provided, so that the input of the radial basis function neural network can be determined, and then, the radial basis function neural network is used for learning and predicting. The chaotic time sequence is a projection of Gao Weixiang space chaotic motion trajectory on a one-dimensional space, and in the projection process, the chaotic motion trajectory is distorted. Two points which are not adjacent in Gao Weixiang space may be called two adjacent points, i.e. false adjacent points, when projected on a one-dimensional spatial axis, which is why the chaotic time sequence appears irregular. Reconstructing a phase space, namely recovering a track of chaotic motion from a wonton time sequence, wherein the track of the chaotic motion is gradually opened along with the increase of the embedding dimension m, and False adjacent points are gradually kicked out, so that the track of the chaotic motion is recovered.
As shown in FIGS. 2-6, the data attribute categories of FIG. 2
Figure 840236DEST_PATH_IMAGE085
Representing data attributes of
Figure 977825DEST_PATH_IMAGE086
-
Figure 532434DEST_PATH_IMAGE087
Categorizing the data information as a subordinate of a data attribute category, wherein a 11- a 32 And representing various data information in the subordinate classified data information. The data attribute class b in FIG. 3 represents data attributes other than a, of which
Figure 806289DEST_PATH_IMAGE088
-
Figure 279996DEST_PATH_IMAGE089
Representing a data attribute different from that of the data information a, b 11- b 32 A plurality of data information in subordinate classification data information which is different from the data attribute of the data information a. In other words, a and b are different types of data information.
In the above embodiment, the data hierarchy is a differential hierarchy, and the differential hierarchy method includes:
dividing the data attributes into different attributes according to the number and the types, and sequentially arranging and distributing the attribute data quantity from the top layer to the bottom layer from less to most;
calculating the distance between different data attributes, and assuming that certain data information in the data set is
Figure 390034DEST_PATH_IMAGE090
Data attribute class classification
Figure 341810DEST_PATH_IMAGE091
Figure 756259DEST_PATH_IMAGE092
Figure 276233DEST_PATH_IMAGE093
And
Figure 122835DEST_PATH_IMAGE094
then data information
Figure 878302DEST_PATH_IMAGE095
Distance data attribute categories
Figure 877482DEST_PATH_IMAGE096
Figure 692991DEST_PATH_IMAGE097
Figure 26889DEST_PATH_IMAGE098
And
Figure 195833DEST_PATH_IMAGE099
is a distance of
Figure 174154DEST_PATH_IMAGE100
Figure 550777DEST_PATH_IMAGE101
Carrying out differential calculation on the calculated different data attribute data information; when in use
Figure 857125DEST_PATH_IMAGE102
In which
Figure 219973DEST_PATH_IMAGE103
Is constant, then data information
Figure 914784DEST_PATH_IMAGE104
Is divided into
Figure 213041DEST_PATH_IMAGE105
And (4) class.
In a specific embodiment, by dividing different data attributes, a user can acquire data information with different attributes from a large amount of data information, and improve data processing capacity of the acquired data information in a distributed computing manner. Through differential calculation, the acquired data information can be correctly classified, so that the division of different module information is realized, and the data processing capacity is improved.
In the above embodiment, the data KNN algorithm includes the steps of:
(S21) selecting a big data information test set, and selecting a test big data information vector set according to different data attributes;
(S22) training a big data information test set to construct an n-layer tree form through hierarchical classification; data search of the big data information test set is realized through an optimal search algorithm;
(S23) sequentially calculating the text similarity of each big data information in the big data information test set and the 1 st-nth layer big data information test set training set;
the formula for calculating the Euclidean distance is as follows:
Figure 131319DEST_PATH_IMAGE106
(2)
in the formula (2), the first and second groups of the chemical reaction are represented by the following formula,
Figure 891333DEST_PATH_IMAGE107
a feature vector representing test information in a large test set of data information,
Figure 844246DEST_PATH_IMAGE108
representing a sequence of feature vectors of test information in a big data information test set;
Figure 47825DEST_PATH_IMAGE109
the aggregate center vector is tested for large data information of layer 1 class j,
Figure 578033DEST_PATH_IMAGE110
a class representing big data information; m is the dimension of the feature vector of the big data information test set;
Figure 282683DEST_PATH_IMAGE111
testing the kth dimension of the set vector for big data information;
Figure 965469DEST_PATH_IMAGE112
representing the first layer in the k-th dimensionj-type big data information test set vectors;
(S24) according to the text similarity, selecting the text most similar to the test text in the training text set
Figure 464583DEST_PATH_IMAGE113
A text;
(S25) in the test text
Figure 482086DEST_PATH_IMAGE114
In each neighbor, the weight of each class is calculated in turn,
Figure 334636DEST_PATH_IMAGE115
representing a formula of weight value, the formula being
Figure 262141DEST_PATH_IMAGE116
Wherein
Figure 319440DEST_PATH_IMAGE117
In order to be able to obtain the data information,
Figure 309393DEST_PATH_IMAGE118
is shown as
Figure 90267DEST_PATH_IMAGE119
The feature vectors of the test information in the big data information test set of a class,
Figure 996912DEST_PATH_IMAGE035
the coefficient of similarity of the Jacard is expressed,
Figure 837829DEST_PATH_IMAGE120
in order to calculate the formula for the degree of similarity,
Figure 580657DEST_PATH_IMAGE121
representing a degree of similarity value, wherein
Figure 289856DEST_PATH_IMAGE122
Is 1 or 0, if
Figure 926373DEST_PATH_IMAGE123
Belong to
Figure 813558DEST_PATH_IMAGE124
Then function
Figure 27371DEST_PATH_IMAGE041
The value is 1, otherwise 0;
(S26) sorting the calculated weights, and differentially comparing the sorted weights, when
Figure 415627DEST_PATH_IMAGE125
In which
Figure 782017DEST_PATH_IMAGE126
If the data set represents the characteristics, the test text belongs to the 1 st class, and when the similarity comparison is carried out on the second layer, only the subclasses of the 1 st class in the second layer need to be compared; if it is not
Figure 964737DEST_PATH_IMAGE127
Then continue to judge, exist
Figure 934354DEST_PATH_IMAGE128
When is coming into contact with
Figure 736088DEST_PATH_IMAGE129
When it is, the test text belongs to 1-
Figure 816040DEST_PATH_IMAGE130
One of the classes, when comparing the second layer, only the first class in the 1 st class in the second layer needs to be compared
Figure 825453DEST_PATH_IMAGE131
A subclass of the class; if it is used
Figure 623644DEST_PATH_IMAGE132
If so, continuing to judge; wherein
Figure 229069DEST_PATH_IMAGE133
Representing the difference values of the sorted forward-to-adjacent weights,
Figure 553740DEST_PATH_IMAGE134
the set of large data information test set threshold differential values representing settings,
Figure 609421DEST_PATH_IMAGE135
indicating the presence of
Figure 770275DEST_PATH_IMAGE136
The big data information-like test gathers the differential value of the distance value.
KNN (K-Nearest Neighbor) is one of the simplest machine learning algorithms, can be used for classification and regression, and is a supervised learning algorithm. If a sample belongs to a certain class in the K most similar samples in the feature space (i.e., the nearest neighbors in the feature space), then the sample also belongs to this class. That is, the method only determines the category to which the sample to be classified belongs according to the category of the nearest sample or samples in the classification decision. KNN is classified by measuring the distance between different feature values. If a sample belongs to a certain class in the majority of the k most similar samples in feature space (i.e. the nearest neighbors in feature space), then the sample also belongs to this class. K is typically an integer no greater than 20. In the KNN algorithm, the selected neighbors are all objects that have been correctly classified. The method only determines the category of the sample to be classified according to the category of the nearest sample or a plurality of samples in the classification decision. The core idea of the kNN algorithm is that if most of the k nearest neighbors of a sample in the feature space belong to a certain class, then the sample also belongs to this class and has the characteristics of the samples on this class. The method only determines the category of the sample to be classified according to the category of the nearest sample or samples in determining the classification decision. In a particular embodiment, the result of the KNN algorithm depends largely on the choice of K. The KNN algorithm can be used not only for classification but also for regression. The attributes of a sample are obtained by finding the k nearest neighbors of the sample and assigning the average of the attributes of these neighbors to the sample. A more useful approach is to give different weights (weights) to the impact that neighbors of different distances have on the sample, e.g., the weights are inversely proportional to the distance.
In a further embodiment, the distance between the test data and the respective training data is calculated; sorting according to the increasing relation of the distances; selecting K points with the minimum distance; then determining the occurrence frequency of the category of the first K points; and then returning the category with the highest frequency of occurrence in the former K points as the prediction classification of the test data.
In a further embodiment, a smaller value is selected by the selection of the value of k, and then an appropriate final value is selected by cross-validation. Where k is smaller, even if the prediction is performed using samples in a smaller domain, the training error will be reduced, but the model will be so complex that it is over-fitted. The larger k is, even if prediction is performed using samples in a large area, training errors increase, a model becomes simple, and under-fitting is easily caused. Therefore, in a specific embodiment, a proper k value needs to be selected to improve the data processing capability.
An exemplary code in a data processing method of an improved KNN method is as follows:
load data.txt;
a = data (1; % of the first thirty groups
aa = data (31, 50, 1; % of the last twenty groups of the first class
b = data (51; % of the first thirty groups of the second class
bb = data (81; % of the last twenty groups of the second class
c = data (101, 1; % of the first thirty groups of the third group
cc = data (131, 150, 1; % of the last twenty groups of the third class
train = cat (1, a, b, c); % of the composition training samples (90X 4)
test = cat (1,aa, bb, cc); % composition test specimen (60 ANG 4)
c =3; % c mean c =3
z1=train(1,:);
z2=train(45,:);
z3= train (90,: r); % initial clustering center z1, z2, z3
m =0; t =0; % number of iteration steps
while m==0
samp1= [ ]; samp2= [ ]; samp3= [ ]; % defines empty sample: the first type is samp1, the second type is samp2, and the third type is samp3
n1=1;n2=1;n3=1;
t=t+1;
for i=1:90
if(pdist([train(i,:);z1])〈pdist([train(i,:);z2]))&&(pdist([train(i,:);z1])<pdis
([ train (i); z3 ]))% distance
% assigns samp1 if the distance from the training sample to cluster z1 is less than the distance from z2, z 3.
samp1(n1,:)=train(i,:);
n1=n1+1;
elseif (pdist([train(i,:);z2])〈pdist([train(i,:);z1]))&&(pdist([train(i,:);z2])<pdist([train(i,:);z3]))
% if the distance between the training sample and the cluster z2 is less than the distance between the training sample and z1, z3, then assign a value to samp2
samp2(n2,:)=train(i,:);
n2=n2+1;
else is assigned to samp3
samp3(n3,:)=train(i,:);
n3=n3+1;
end。
As shown in fig. 7,
Figure 428658DEST_PATH_IMAGE137
which represents the information of the input data,
Figure 748781DEST_PATH_IMAGE138
indicating that the data information of the hidden node,
Figure 319571DEST_PATH_IMAGE139
representing function data information nodes in the calculation process of the big data information test set,
Figure 92355DEST_PATH_IMAGE140
the attributes of the hidden layer node are represented,
Figure 551499DEST_PATH_IMAGE141
training data information representing nodes of a data output layer;
in the above embodiment, the convolution fault diagnosis method includes the following steps:
constructing a fault diagnosis architecture by expanding a causal convolution and a residual block, in which Dropout is a regularization technique for removing some random outputs of the convolution sub-fault diagnosis model architecture layer; the number of neurons to discard is given by a DREPOPOUT rate of 0 to 1, which is the probability that the layer output is discarded; the convolution fault diagnosis model receptive field also depends on the number of layers of the residual block, e.g. kernel size k s =3, dilation factor d =1, 2, 4, remaining number of tile stacksnThe receptive field size of =1 would be 3 × 4 × 1=12. The residual block is shown in equation (3):
Figure 335916DEST_PATH_IMAGE142
(3)
in the formula (3), O is an output variable of the output layer of the convolution fault diagnosis model,
Figure 733399DEST_PATH_IMAGE143
input variables representing the output layers of the sub-fault diagnosis model,
Figure 852534DEST_PATH_IMAGE144
residual mapping representing deep learning, adding a set back after the weight layerAnd (3) layering, wherein the expansion causal convolution function F (t) is defined as:
Figure 869031DEST_PATH_IMAGE145
(4)
in the formula (4), the first and second groups of the chemical reaction are shown in the specification,
Figure 898167DEST_PATH_IMAGE058
is a filter;
Figure 325606DEST_PATH_IMAGE059
is a hierarchy of neural networks;
Figure 72982DEST_PATH_IMAGE146
time series information representing the input;
Figure 627592DEST_PATH_IMAGE147
the cavity parameters are cavity interval sizes;
Figure 901447DEST_PATH_IMAGE062
representing a hole convolution operator;
the evaluation formula of the fault diagnosis system structure is as follows:
Figure 109574DEST_PATH_IMAGE148
(5)
in the formula (5), the first and second groups,
Figure 219613DEST_PATH_IMAGE149
the mean value of the fault evaluation indexes of the big data information is represented,Twhich represents the duration of the prediction,
Figure 564531DEST_PATH_IMAGE150
an evaluation duration period parameter representing a predictive big data message failure architecture,
Figure 568259DEST_PATH_IMAGE151
represents the hyper-parameters of various items of the deep learning model,θindicating fault diagnosisThe evaluation index of the system structure is obtained,
Figure 88233DEST_PATH_IMAGE152
parameters representing the evaluation indexes of the big data information fault diagnosis system structure are subjected to information overlapping by establishing an orthogonalized evaluation matrix, and the interactive influence iterative process among different information is as follows:
Figure 544622DEST_PATH_IMAGE153
(6)
in the formula (6), α represents a mutual overlapping function of the big data information fault evaluation indexes, β represents a mutual influence iterative process between the big data information, and an algorithm program is established for the matrix of the formula (6) according to an iterative formula between the big data information fault evaluation indexes, that is:
Figure 690302DEST_PATH_IMAGE154
(7)
in the formula (7), the first and second groups,
Figure 423903DEST_PATH_IMAGE155
representing a big data information fault assessment orthogonalization safety matrix, and mu represents an editing parameter of the orthogonalization matrix; then, applying various big data information fault evaluation index data to a data information intelligent prediction platform through a Schmidt formula, and outputting the best evaluation effect obtained by online testing as follows:
Figure 504991DEST_PATH_IMAGE156
(8)
in the formula (8), the first and second groups,
Figure 307731DEST_PATH_IMAGE157
the evaluation index effect of each item of data information of checking calculation is shown,mrepresenting the number of big data information architecture nodes,
Figure 7834DEST_PATH_IMAGE073
the variable value of the number of nodes of the big data information architecture is judged to evaluate the index effect, and then a weight formula is calculated, wherein the weight formula is as follows:
Figure 986154DEST_PATH_IMAGE158
(9)
in the formula (9), the first and second groups,
Figure 97198DEST_PATH_IMAGE159
and representing the weight of the big data information fault evaluation index.
The super-parameters of the convolution fault diagnosis model are subjected to iteration processing by establishing an algorithm model, the fault evaluation index of big data information is calculated according to iteration data, and optimization is performed through an orthogonalization matrix, so that the optimal optimization parameter evaluation result is obtained, and the algorithm performance of the convolution fault diagnosis model system is improved.
The invention applies a novel Time Convolution Network (Time Convolution Network, convolution component fault diagnosis model) deep learning model for scheduling big data information fault intelligent prediction.
In the above embodiment, the improved error evaluation function is as shown in FIG. 8
Figure 528180DEST_PATH_IMAGE160
(10)
The formula (10) includes
Figure 766394DEST_PATH_IMAGE161
Group data of wherein
Figure 720925DEST_PATH_IMAGE162
Represented as a big data information test sample,
Figure 878237DEST_PATH_IMAGE079
represented as big data information failure prediction samples.
In order to verify the technical effect of the invention, the scheme 1 is assumed to be a decision tree classification method, the scheme 2 is assumed to be a k-means classification method, and the 2 methods are respectively adopted to verify and compare the scheme of the invention.
The corresponding experimental results obtained by continuous training are shown in table 1, and the comparative graph obtained by simulation software is shown in fig. 8.
TABLE 1 error accuracy comparison of different methods
Figure 671881DEST_PATH_IMAGE163
As can be seen from the above figure, in the data analysis accuracy test, the test result of the method of the present invention is significantly higher than the accuracy of the scheme 1 and the scheme 2, and the data analysis accuracy of the method of the present invention is higher than 80%, and can reach 96% at most, the accuracy fluctuation is small, and the method is relatively stable. The fluctuation range of the scheme 1 and the scheme 2 is large in the data analysis accuracy test, and the accuracy is extremely unstable, so that compared with the method disclosed by the invention, the method has the great defects; therefore, the method has high data analysis accuracy.
Example (2) apparatus
A data processing apparatus for improving a KNN method, comprising:
the data acquisition module is used for acquiring data information from the database information and performing dimensionality reduction processing on the acquired data information to acquire low-dimensional data information;
the data processing module is used for processing the data information after the dimensionality reduction through improving the KNN algorithm model;
the data evaluation module is used for evaluating the processed data information through an improved error evaluation function;
the data sharing module is used for applying and sharing data information, and performing remote data information processing and data sharing on the acquired data information;
the data processing module is respectively connected with the data acquisition module, the data evaluation module and the data sharing module.
Although specific embodiments of the present invention have been described above, it will be understood by those skilled in the art that these specific embodiments are merely illustrative and that various omissions, substitutions and changes in the form of the detail of the methods and systems described above may be made by those skilled in the art without departing from the spirit and scope of the invention. For example, it is within the scope of the present invention to combine the steps of the above-described methods to perform substantially the same function in substantially the same way to achieve substantially the same result. Accordingly, the scope of the invention is to be limited only by the following claims.

Claims (6)

1. A data processing method for improving a KNN method is characterized by comprising the following steps: the method comprises the following steps:
step one, acquiring data information from database information, and performing dimensionality reduction processing on the acquired data information to acquire low-dimensionality data information;
step two, carrying out data information processing on the data information after dimensionality reduction through an improved KNN algorithm model, wherein the improved KNN algorithm model comprises a data preprocessing step, a data layering step, a data KNN algorithm calculating step and a convolution fault diagnosis step;
evaluating the processed data information through an improved error evaluation function;
fourth, data information is applied and shared, and remote data information processing and data sharing are carried out on the acquired data information;
the volume integral fault diagnosis method comprises the following steps:
the fault diagnosis architecture is constructed by expanding the causal convolution with a residual block, as shown in equation (1):
Figure DEST_PATH_IMAGE001
(1)
in the formula (1), O is an output variable of the output layer of the convolution fault diagnosis model,
Figure 697078DEST_PATH_IMAGE002
representing inputs to an output layer of a sub-fault diagnostic modelThe variables are the variables of the process,
Figure DEST_PATH_IMAGE003
residual mapping representing deep learning, adding a set exit layer after the weight layer, and expanding a causal convolution function F (t) defined as:
Figure 16063DEST_PATH_IMAGE004
(2)
in the formula (2), the reaction mixture is,
Figure DEST_PATH_IMAGE005
is a filter;
Figure 514565DEST_PATH_IMAGE006
is a hierarchy of neural networks;
Figure DEST_PATH_IMAGE007
time series information representing the input;
Figure 526384DEST_PATH_IMAGE008
the cavity parameters are cavity interval sizes;
Figure 677879DEST_PATH_IMAGE009
representing a hole convolution operator;
the evaluation formula of the fault diagnosis system structure is as follows:
Figure 3819DEST_PATH_IMAGE010
(3)
in the formula (3), the reaction mixture is,
Figure DEST_PATH_IMAGE011
the mean value of the fault assessment indexes of the big data information is shown,Twhich represents the duration of the prediction,
Figure 353897DEST_PATH_IMAGE012
an evaluation duration period parameter representing a predictive big data message failure architecture,
Figure 474300DEST_PATH_IMAGE013
represents the hyper-parameters of various items of the deep learning model,θthe evaluation index of the fault diagnosis architecture is shown,
Figure 316354DEST_PATH_IMAGE014
parameters representing the evaluation indexes of the big data information fault diagnosis system structure are subjected to information overlapping by establishing an orthogonalized evaluation matrix, and the iterative process of mutual influence among different information is as follows:
Figure DEST_PATH_IMAGE015
(4)
in the formula (4), alpha represents a mutual overlapping function of the fault evaluation indexes of the big data information, beta represents an interactive influence iterative process between the big data information, and an algorithm program is established for the matrix of the formula (5) according to an iterative formula between the fault evaluation indexes of the big data information, namely:
Figure 364426DEST_PATH_IMAGE016
(5)
in the formula (5), the reaction mixture is,
Figure DEST_PATH_IMAGE017
representing a big data information fault assessment orthogonalization safety matrix, and mu represents an editing parameter of the orthogonalization matrix;
then, applying various big data information fault evaluation index data to a data information intelligent prediction platform through a Schmidt formula, and outputting the best evaluation effect obtained by online testing as follows:
Figure 241115DEST_PATH_IMAGE018
(6)
in the formula (6), the reaction mixture is,
Figure DEST_PATH_IMAGE019
the evaluation index effect of each item of data information of checking calculation is shown,mrepresenting the number of big data information architecture nodes,
Figure 594736DEST_PATH_IMAGE020
the variable value of the number of the nodes of the big data information architecture is judged and evaluated to obtain the effect of the index, and then a weight formula is calculated, wherein the weight formula is as follows:
Figure DEST_PATH_IMAGE021
(7)
in the formula (7), the reaction mixture is,
Figure 189665DEST_PATH_IMAGE022
and representing the weight of the big data information fault evaluation index.
2. The data processing method for improving a KNN method as claimed in claim 1, wherein: the dimension reduction processing method comprises the following steps:
(S11) realizing dimension reduction processing by reconstructing matrix data information, and setting the number of reconstructed matrix data, data dimension and time delay;
(S12) solving the distribution probability of different element libraries by an average mutual information method, and analyzing data characteristics by a correlation algorithm model;
(S13) calculating the dimension of the data information by a false neighbor method, selecting different data classifications by comparing the dimensions of different data information, and comparing different elements in the database information by a characteristic pair measurement method through a sequence between two different dimensions, wherein the formula is as follows:
Figure DEST_PATH_IMAGE023
(8)
in equation (8), R represents a data dimension, n represents a vector,
Figure 44358DEST_PATH_IMAGE024
representing the matrix data information before reconstruction,
Figure DEST_PATH_IMAGE025
representing the reconstructed matrix data information;
Figure 713237DEST_PATH_IMAGE026
and with
Figure DEST_PATH_IMAGE027
Representing the relationship of false adjacent points among the reconstruction matrix data, r representing the data information added after reconstruction, u being the optimal dimensionality of the reconstruction matrix data information, and the difference between the element data dimensionality of the reconstruction matrix data and the data dimensionality after dimensionality reduction being larger than 10 after reconstruction;
and (S14) performing dimension reduction judgment, outputting data information when the dimension reduction data information meets the current requirement, and performing dimension reduction calculation again when the dimension reduction data information does not meet the current requirement.
3. The data processing method for improving a KNN method as claimed in claim 1, wherein: the data layering is differential layering, and the differential layering method comprises the following steps:
dividing the data attributes into different attributes according to the number and the types, and sequentially arranging and distributing the attribute data quantity from the top layer to the bottom layer from less to most;
calculating the distance between different data attributes, and classifying the data attributes into different categories on the assumption that certain data information in the data set exists
Figure 300076DEST_PATH_IMAGE028
Figure DEST_PATH_IMAGE029
Figure 447548DEST_PATH_IMAGE030
And
Figure 653401DEST_PATH_IMAGE031
then data information distance data attribute category
Figure 35841DEST_PATH_IMAGE032
Figure 262423DEST_PATH_IMAGE033
Figure 707311DEST_PATH_IMAGE034
And
Figure 107068DEST_PATH_IMAGE035
is a distance of
Figure DEST_PATH_IMAGE036
Figure 16118DEST_PATH_IMAGE037
Carrying out differential calculation on the calculated data information with different data attributes; when in use
Figure DEST_PATH_IMAGE038
In which
Figure 679181DEST_PATH_IMAGE039
Being constant, the data information is divided into
Figure DEST_PATH_IMAGE040
And (4) class.
4. The data processing method for improving a KNN method as claimed in claim 1, wherein: the data KNN algorithm comprises the following steps:
(S21) selecting a big data information test set, and selecting a test big data information vector set according to different data attributes;
(S22) training a big data information test set to construct an n-layer tree form through hierarchical classification; data search of the big data information test set is realized through an optimal search algorithm;
(S23) sequentially calculating the text similarity of each big data information in the big data information test set and the 1 st-nth layer big data information test set training set;
the formula for calculating the Euclidean distance is as follows:
Figure 795386DEST_PATH_IMAGE041
(9)
in the case of the formula (9),
Figure DEST_PATH_IMAGE042
a feature vector representing test information in a large test set of data information,
Figure 202096DEST_PATH_IMAGE043
representing a sequence of feature vectors of test information in a big data information test set;
Figure DEST_PATH_IMAGE044
the aggregate center vector is tested for large data information of layer 1 class j,
Figure 965653DEST_PATH_IMAGE045
a class representing big data information; m is the dimension of the feature vector of the big data information test set;
Figure 330775DEST_PATH_IMAGE046
testing the kth dimension of the set vector for big data information;
Figure 15834DEST_PATH_IMAGE047
representing a large data information test set vector of a jth class of a first layer in a kth dimension;
(S24) according to the text similarity, selecting the text which is most similar to the test text in the training text set
Figure 695077DEST_PATH_IMAGE048
A text;
(S25) in the test text
Figure 906616DEST_PATH_IMAGE048
In each neighbor, the weight of each class is calculated in turn,
Figure 318006DEST_PATH_IMAGE049
representing a formula of weight value, the formula being
Figure DEST_PATH_IMAGE050
In which
Figure 146153DEST_PATH_IMAGE051
In order to be able to obtain the data information,
Figure DEST_PATH_IMAGE052
is shown as
Figure 897596DEST_PATH_IMAGE053
The feature vectors of the test information in the big data information test set of a class,
Figure 166903DEST_PATH_IMAGE054
the Jacard's similarity coefficient is expressed,
Figure 483615DEST_PATH_IMAGE055
in order to calculate the formula for the degree of similarity,
Figure 267901DEST_PATH_IMAGE056
representing a degree of similarity value, wherein
Figure 288946DEST_PATH_IMAGE057
Is 1 or 0, if
Figure 84864DEST_PATH_IMAGE058
Belong to
Figure 228269DEST_PATH_IMAGE059
Then function
Figure DEST_PATH_IMAGE060
The value is 1, otherwise 0;
(S26) sorting the calculated weights, and differentially comparing the sorted weights, when
Figure 171954DEST_PATH_IMAGE061
In which
Figure DEST_PATH_IMAGE062
If the data set represents the characteristics, the test text belongs to the 1 st class, and only the subclasses of the 1 st class in the second layer need to be compared when the similarity comparison is carried out on the second layer; if it is not
Figure 324587DEST_PATH_IMAGE063
Then continue to judge that there is
Figure 975011DEST_PATH_IMAGE064
When is coming into contact with
Figure 309826DEST_PATH_IMAGE065
When it is, the test text belongs to 1-
Figure 678490DEST_PATH_IMAGE066
One of the classes, when comparing the second layer, only the first class in the 1 st class in the second layer needs to be compared
Figure 838076DEST_PATH_IMAGE066
A subclass of the class; if it is used
Figure 670903DEST_PATH_IMAGE067
If so, continuing to judge; wherein
Figure 765898DEST_PATH_IMAGE068
Representing the difference values of the sorted forward-to-adjacent weights,
Figure 12071DEST_PATH_IMAGE069
the set of large data information test set threshold differential values representing settings,
Figure DEST_PATH_IMAGE070
indicating the presence of
Figure 647452DEST_PATH_IMAGE066
The big data information-like test gathers the differential value of the distance value.
5. The data processing method for improving a KNN method as claimed in claim 1, wherein: the improved error evaluation function is
Figure 600365DEST_PATH_IMAGE071
(10)
The formula (10) includes
Figure DEST_PATH_IMAGE072
Group data wherein
Figure 725315DEST_PATH_IMAGE073
Represented as a big data information test sample,
Figure DEST_PATH_IMAGE074
represented as big data information failure prediction samples.
6. An apparatus for applying the data processing method of the KNN improving method as claimed in any one of claims 1 to 5, comprising:
the data acquisition module is used for acquiring data information from the database information and performing dimension reduction processing on the acquired data information to acquire low-dimensional data information;
the data processing module is used for processing the data information after the dimensionality reduction through improving the KNN algorithm model;
the data evaluation module is used for evaluating the processed data information through an improved error evaluation function;
the data sharing module is used for applying and sharing data information, and performing remote data information processing and data sharing on the acquired data information;
the data processing module is respectively connected with the data acquisition module, the data evaluation module and the data sharing module.
CN202210946851.XA 2022-08-09 2022-08-09 Data processing method and device for improving KNN method Active CN115017125B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210946851.XA CN115017125B (en) 2022-08-09 2022-08-09 Data processing method and device for improving KNN method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210946851.XA CN115017125B (en) 2022-08-09 2022-08-09 Data processing method and device for improving KNN method

Publications (2)

Publication Number Publication Date
CN115017125A CN115017125A (en) 2022-09-06
CN115017125B true CN115017125B (en) 2022-10-21

Family

ID=83066268

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210946851.XA Active CN115017125B (en) 2022-08-09 2022-08-09 Data processing method and device for improving KNN method

Country Status (1)

Country Link
CN (1) CN115017125B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7720822B1 (en) * 2005-03-18 2010-05-18 Beyondcore, Inc. Quality management in a data-processing environment
CN104408095A (en) * 2014-11-15 2015-03-11 北京广利核系统工程有限公司 Improvement-based KNN (K Nearest Neighbor) text classification method
CN114781555A (en) * 2022-06-21 2022-07-22 深圳市鼎合丰科技有限公司 Electronic component data classification method by improving KNN method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11488010B2 (en) * 2018-12-29 2022-11-01 Northeastern University Intelligent analysis system using magnetic flux leakage data in pipeline inner inspection
CN112308251A (en) * 2020-12-31 2021-02-02 北京蒙帕信创科技有限公司 Work order assignment method and system based on machine learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7720822B1 (en) * 2005-03-18 2010-05-18 Beyondcore, Inc. Quality management in a data-processing environment
CN104408095A (en) * 2014-11-15 2015-03-11 北京广利核系统工程有限公司 Improvement-based KNN (K Nearest Neighbor) text classification method
CN114781555A (en) * 2022-06-21 2022-07-22 深圳市鼎合丰科技有限公司 Electronic component data classification method by improving KNN method

Also Published As

Publication number Publication date
CN115017125A (en) 2022-09-06

Similar Documents

Publication Publication Date Title
CN112784881B (en) Network abnormal flow detection method, model and system
CN107622182B (en) Method and system for predicting local structural features of protein
US11544570B2 (en) Method and apparatus for large scale machine learning
CN107292350A (en) The method for detecting abnormality of large-scale data
CN108805193B (en) Electric power missing data filling method based on hybrid strategy
CN112382352A (en) Method for quickly evaluating structural characteristics of metal organic framework material based on machine learning
CN110020712B (en) Optimized particle swarm BP network prediction method and system based on clustering
Labroche New incremental fuzzy c medoids clustering algorithms
CN112926640A (en) Cancer gene classification method and equipment based on two-stage depth feature selection and storage medium
CN114139639B (en) Fault classification method based on self-step neighborhood preserving embedding
Farooq Genetic algorithm technique in hybrid intelligent systems for pattern recognition
Jha et al. Criminal behaviour analysis and segmentation using k-means clustering
CN113516019A (en) Hyperspectral image unmixing method and device and electronic equipment
CN115017125B (en) Data processing method and device for improving KNN method
Cai et al. Fuzzy criteria in multi-objective feature selection for unsupervised learning
CN111488903A (en) Decision tree feature selection method based on feature weight
CN113704570A (en) Large-scale complex network community detection method based on self-supervision learning type evolution
CN112488188A (en) Feature selection method based on deep reinforcement learning
CN111104950A (en) K value prediction method and device in k-NN algorithm based on neural network
CN117437976B (en) Disease risk screening method and system based on gene detection
CN113240113B (en) Method for enhancing network prediction robustness
Triguero et al. Prototype generation for nearest neighbor classification: Survey of methods
Ahamad et al. Clustering and classification algorithms in data mining
Bektaş et al. Optimisations of four imputation frameworks for performance exploring based on decision tree algorithms in big data analysis problems
Zhan et al. Analyzing community structure in networks

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant