CN112085053A - Data drift discrimination method and device based on nearest neighbor method - Google Patents

Data drift discrimination method and device based on nearest neighbor method

Info

Publication number
CN112085053A
Authority
CN
China
Prior art keywords
data
tested
test
test data
standard reference
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010749770.1A
Other languages
Chinese (zh)
Other versions
CN112085053B (en)
Inventor
李锐
金长新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Inspur Scientific Research Institute Co Ltd
Original Assignee
Jinan Inspur Hi Tech Investment and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan Inspur Hi Tech Investment and Development Co Ltd filed Critical Jinan Inspur Hi Tech Investment and Development Co Ltd
Priority to CN202010749770.1A priority Critical patent/CN112085053B/en
Publication of CN112085053A publication Critical patent/CN112085053A/en
Application granted granted Critical
Publication of CN112085053B publication Critical patent/CN112085053B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147Distances to closest patterns, e.g. nearest neighbour classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474Sequence data queries, e.g. querying versioned data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Fuzzy Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Complex Calculations (AREA)

Abstract

The application discloses a data drift discrimination method and device based on a nearest neighbor method, which address the problems that existing data drift discrimination algorithms consume a large amount of computing power, are complex, and are difficult to put into practice. The method comprises the following steps: the server acquires a standard reference data set; the server acquires a test data set; for each piece of data to be tested in the test data set, the server judges, based on a nearest neighbor algorithm, the similarity between the data to be tested and the standard reference data set and the similarity between the data to be tested and the test data set; and the server judges whether data drift has occurred in the test data set according to the similarity judgment results of the pieces of data to be tested in the test data set.

Description

Data drift discrimination method and device based on nearest neighbor method
Technical Field
The present invention relates to the field of concept drift, and in particular, to a data drift determination method and apparatus based on a nearest neighbor method.
Background
With the popularization and development of network applications, data in various industries are continuously generated as data streams, characterized by large volume and rapid change. For example, in the industrial field, sensors constantly collect new data; in the e-commerce field, merchants continuously acquire behavior data of users.
For the same subject, data acquired at different times are referred to as time series data, which can be used to describe the time-varying condition of the subject. However, in many areas, the data distribution may change unpredictably over time, resulting in data drift that may render existing data models inapplicable to new data. Therefore, in order to select an appropriate data model, a data analyst needs to determine whether there is data drift in the data.
At present, one existing algorithm for judging whether data drift has occurred is the three-branch decision tree concept algorithm. During detection, the training data are classified using a decision tree and then assigned to one of three decision domains, the L domain, the R domain and the M domain, according to the classification error rate of each subtree. The L domain, the R domain and the M domain respectively indicate that the data have not drifted, that the data have drifted, and that the data may have drifted.
However, the existing algorithms for judging data drift, including the three-branch decision tree concept algorithm, often have the problems of large consumption of computing power, complex scheme and difficult operation.
Disclosure of Invention
The embodiment of the application provides a data drift judging method and device based on a nearest neighbor method, and aims to solve the problems of large calculation amount, complexity and impracticality of the existing data drift judging method.
In one aspect, an embodiment of the present application provides a data drift discrimination method based on a nearest neighbor method, where the method includes:
the server acquires a standard reference data set;
the server acquires a test data set;
the server judges the similarity between the data to be tested and the standard reference data set and the similarity between the data to be tested and the test data set based on a nearest neighbor algorithm aiming at each data to be tested in the test data set;
and the server judges whether the test data group has data drift or not according to the similarity judgment result of each to-be-tested data in the test data group.
In one example, the standard reference data set is generated at an earlier time than the test data set.
In one example, before the server obtains the test data set, the method further comprises: the server determines a test data window for storing the test data set.
In one example, the server judging, for each piece of data to be tested in the test data set, the similarity between the data to be tested and the standard reference data set and the similarity between the data to be tested and the test data set based on a nearest neighbor algorithm includes: the server calculates the distance between the data to be tested and each data in the standard reference data set and the distance between the data to be tested and each remaining data in the test data set; selects the K pieces of data closest to the data to be tested based on these distances, wherein K is a preset parameter; and judges the similarity of the data to be tested to the standard reference data set and to the test data set based on the K pieces of data.
In one example, the preset parameter K is an odd number.
In one example, the server judging the similarity of the data to be tested to the standard reference data set and to the test data set based on the K pieces of data includes: determining the number of data belonging to the standard reference data group in the K pieces of data as a first number; determining the number of data belonging to the test data group in the K pieces of data as a second number; if the first number is greater than the second number, the data to be tested is similar to the standard reference data group; and if the first number is smaller than the second number, the data to be tested is similar to the test data group.
In one example, the server judging whether data drift has occurred in the test data group according to the similarity judgment result of each piece of data to be tested in the test data group includes: determining the number of data to be tested in the test data group that are similar to the standard reference data group as a third number; determining the number of data to be tested in the test data group that are similar to the test data group as a fourth number; if the third number is greater than the fourth number, the test data group has no data drift; and if the third number is smaller than the fourth number, the test data group has data drift.
In one example, the server calculating the distance between the data to be tested and each data in the standard reference data set and the distance between the data to be tested and each remaining data in the test data set comprises: calculating these distances based on a Euclidean distance formula; the Euclidean distance formula is as follows:
$$D(x, y) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
wherein $D(x, y)$ represents the distance between the data to be tested and the corresponding data, $(x_1, y_1)$ represents the coordinates of the data to be tested, and $(x_2, y_2)$ represents the coordinates of the corresponding data.
In one example, the method further comprises: and if the test data group drifts, sending a data drifting result to corresponding edge equipment so that the edge equipment performs corresponding data processing on the test data group.
On the other hand, an embodiment of the present application further provides a data drift determination device based on a nearest neighbor method, where the device includes:
the first acquisition module is used for acquiring a standard reference data set;
the second acquisition module is used for acquiring the test data set;
the first judgment module is used for judging the similarity between the data to be tested and the standard reference data set and the similarity between the data to be tested and the test data set on the basis of a nearest neighbor algorithm aiming at each data to be tested in the test data set;
and the second judging module is used for judging whether the test data group has data drift or not according to the similarity judging result of each to-be-tested data in the test data group.
The data drift discrimination method and device based on the nearest neighbor method provided by the embodiment of the application have at least the following beneficial effects: whether the test data set has drifted is judged through the KNN algorithm, which is simple, efficient and easy to understand, requires no parameter estimation, and consumes little computing power. The design of the standard reference data set can increase the stability and robustness of judging whether the test data set has data drift. Meanwhile, the method can be used in edge equipment in combination with a sensor, so that changes in the data can be found promptly and the corresponding data processing can be carried out in time.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a flowchart of a data drift determination method based on a nearest neighbor method according to an embodiment of the present application;
fig. 2 is a schematic diagram of the KNN algorithm provided in the embodiment of the present application;
fig. 3 is a schematic structural diagram of a data drift determination device based on a nearest neighbor method according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The technical solutions proposed in the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a data drift determination method based on a nearest neighbor method according to an embodiment of the present application, where the method includes the following steps:
S101: the server obtains a standard reference data set.
In the embodiment of the application, the server randomly acquires a segment of data, as a standard reference data set, from the time sequence data collected by the acquisition device or from the time sequence data pre-stored in the database. The acquisition device may be a sensor or another device.
The standard reference data set is a collection of several pieces of standard reference data. The standard reference data group may conform to any statistical distribution. It is used to judge whether the statistical distribution of the test data group is the same as that of the standard reference data group, and thereby whether data drift has occurred in the test data group.
The length of the standard reference data set may be set as required, which is not limited in the present application.
S102: the server obtains a test data set.
In the embodiment of the application, the server acquires the test data group from the time sequence data acquired by the acquisition device or the time sequence data stored in the database.
The test data set is a data set which needs to be judged whether data drifting occurs in the application. The test data group comprises a plurality of pieces of data to be tested. The dimensions of the data to be tested in the test data set can be set as required, which is not limited in the present application.
In one embodiment, because time-series data may change over time, the server may obtain a test data set and a standard reference data set that are separated in time from the time-series data collected by the acquisition device. The generation time of the standard reference data group should be earlier than the generation time of the test data group, so that whether the test data group belongs to the same statistical distribution can be judged against a standard reference data group whose statistical distribution is already known.
In one embodiment, the server may determine a test data window prior to obtaining the test data set. The test data window is a storage unit used for storing the test data set. Thus, the length of the test data set (i.e., the number of data to be tested included in the test data set) is the same as the length of the test data window. The length of the test data window may be set according to the length requirement of the test data group, which is not limited in the present application.
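The test data window can be pictured as a fixed-length buffer. The following is a minimal sketch under stated assumptions: the class name `DataWindow`, its methods, and the choice to discard the oldest item once the window is full are all illustrative and are not specified by the application.

```python
from collections import deque


class DataWindow:
    """Fixed-length buffer holding the current test data group (hypothetical helper)."""

    def __init__(self, length):
        if length <= 0:
            raise ValueError("window length must be positive")
        self.length = length
        # Assumption: deque(maxlen=...) drops the oldest item once the window is full.
        self._items = deque(maxlen=length)

    def push(self, sample):
        """Store one newly collected piece of data to be tested."""
        self._items.append(sample)

    def is_full(self):
        """True once the window holds exactly `length` pieces of data."""
        return len(self._items) == self.length

    def test_data_group(self):
        """Return the stored test data group, oldest item first."""
        return list(self._items)


window = DataWindow(length=3)
for sample in [(0.1, 0.2), (0.3, 0.1), (0.2, 0.4), (0.5, 0.5)]:
    window.push(sample)
```

After four pushes into a window of length 3, the window holds only the three most recent samples, so the test data group always has the same length as the window.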
S103: and the server judges the similarity between the data to be tested and the standard reference data set and the similarity between the data to be tested and the test data set based on a nearest neighbor algorithm aiming at each data to be tested in the test data set.
In the embodiment of the application, the server judges the similarity between the selected to-be-tested data in the test data group and the standard reference data group and the test data group based on a nearest neighbor (KNN) method aiming at each to-be-tested data in the test data group in the test data window.
And comparing the piece of data with the rest data in the test data group and the data in the standard reference data group to judge the similarity of the piece of data with the standard reference data group and the test data group.
In one embodiment, the step of determining the similarity of the data to be tested to the test data set and the standard reference data set comprises:
First, the distance between the data to be tested and each remaining data in the test data set and the distance between the data to be tested and each data in the standard reference data set are calculated.
The distance between the data to be tested and another piece of data serves as a measure of their similarity. The closer the distance, the more similar the data to be tested is to the corresponding data; the farther the distance, the less similar it is.
Secondly, the distances obtained in the first step, between the data to be tested and the remaining data in the test data group and between the data to be tested and all data in the standard reference data group, are sorted.
Thirdly, a preset parameter K is determined, and the K pieces of data closest to the data to be tested are selected.
Fourthly, the similarity of the data to be tested to the standard reference data set and to the test data set is judged based on the K pieces of data.
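The first three steps above can be sketched as follows. This is an illustration only: the function name `k_nearest`, the `"reference"` and `"test"` labels, and the sample coordinates are assumptions, and `math.dist` (Python 3.8+) is used as the distance measure. The fourth step, the vote over the selected data, is left out here.

```python
import math


def k_nearest(target_index, test_group, reference_group, k):
    """Return the k nearest neighbours of test_group[target_index] as (distance, origin) pairs."""
    target = test_group[target_index]
    distances = []
    # Step 1a: distances to all data in the standard reference data group.
    for point in reference_group:
        distances.append((math.dist(target, point), "reference"))
    # Step 1b: distances to the remaining data in the test data group (the target itself is skipped).
    for i, point in enumerate(test_group):
        if i != target_index:
            distances.append((math.dist(target, point), "test"))
    # Step 2: sort by distance, nearest first.
    distances.sort(key=lambda pair: pair[0])
    # Step 3: keep the K nearest pieces of data.
    return distances[:k]


# Made-up sample data for illustration.
reference_group = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
test_group = [(0.5, 0.5), (5.0, 5.0), (6.0, 5.0)]
neighbours = k_nearest(0, test_group, reference_group, k=3)
origins = [origin for _, origin in neighbours]
```

Here the three nearest neighbours of `(0.5, 0.5)` all come from the reference group, because the other two test points lie far away.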
In one embodiment, the server calculates the distance of the data to be tested from each data in the standard reference data set and the distance of the data to be tested from each remaining data in the test data set based on the Euclidean distance formula.
Taking two-dimensional data as an example, the Euclidean distance formula is as follows:
$$D(x, y) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
wherein $D(x, y)$ represents the distance between the data to be tested and the corresponding data, $(x_1, y_1)$ represents the coordinates of the data to be tested, and $(x_2, y_2)$ represents the coordinates of the corresponding data.
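As a small illustration of the Euclidean distance, the sketch below computes it for points of any dimension; the two-dimensional case reduces to the formula given for $(x_1, y_1)$ and $(x_2, y_2)$. The function name `euclidean_distance` and the sample points are assumptions made for this example.

```python
import math


def euclidean_distance(p, q):
    """Euclidean distance between two points with the same number of dimensions."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    # Sum the squared coordinate differences, then take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Two-dimensional example: the differences are 3 and 4, so the distance is 5.
d = euclidean_distance((1.0, 2.0), (4.0, 6.0))
```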
In one embodiment, when the server determines similarity between the data to be tested and the standard reference data group and the test data group based on the K pieces of data, the server may determine the number of data belonging to the standard reference data group in the K pieces of data as the first number, and determine the number of data belonging to the test data group in the K pieces of data as the second number.
If the first number is larger than the second number, most of the K pieces of data nearest to the data to be tested come from the standard reference data group, so the data to be tested is considered similar to the standard reference data group.
If the first number is smaller than the second number, most of the K pieces of data come from the test data group, so the data to be tested is considered similar to the test data group.
If the first number is equal to the second number, the K pieces of data come equally from the two groups, and it cannot be judged whether the data to be tested is similar to the standard reference data group or to the test data group.
In one embodiment, the value of K is preferably odd. An even K allows the K nearest pieces of data to contain equal numbers of data from the standard reference data group and from the test data group, in which case the similarity of the data to be tested cannot be judged; an odd K avoids this uncertainty.
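The vote over the K nearest pieces of data, including the tie that an even K can produce, might look like the sketch below. The function name `judge_similarity`, the `"reference"` and `"test"` labels, and the use of `None` for an undecidable tie are illustrative assumptions.

```python
def judge_similarity(neighbour_origins):
    """Majority vote over the origins of the K nearest pieces of data.

    Returns "reference", "test", or None when the vote is tied.
    """
    first_number = sum(1 for origin in neighbour_origins if origin == "reference")
    second_number = sum(1 for origin in neighbour_origins if origin == "test")
    if first_number > second_number:
        return "reference"
    if first_number < second_number:
        return "test"
    return None  # Equal counts: similarity cannot be judged.


odd_k_result = judge_similarity(["reference", "test", "reference"])
even_k_result = judge_similarity(["reference", "test", "test", "reference"])
```

With K = 3 the vote always has a winner, whereas with K = 4 a two-against-two split yields `None`, which is the case an odd K rules out.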
For convenience of explanation, the present application will be described taking two-dimensional data as an example.
Fig. 2 is a schematic diagram of the KNN algorithm principle provided in the embodiment of the present application. As shown in fig. 2, the x-axis and the y-axis represent different dimensions of the data, and the two marked regions respectively represent the standard reference data set and the test data set. The circles represent data in the standard reference data set, the squares represent data in the test data set, and $X_u$ represents the data to be tested.
The step of judging the similarity of the data to be tested, the standard reference data set and the test data set by the server comprises the following steps:
The first step: the server separately calculates the distance between $X_u$ and every point in the standard reference data set and in the test data set, based on the Euclidean distance formula.
The second step: the server sorts the distances between $X_u$ and all points obtained in the first step.
The third step: the server sets the preset parameter K equal to 5 and selects the 5 points nearest to $X_u$, as indicated by the arrows in the figure.
The fourth step: the server judges the similarity of $X_u$ to the standard reference data set and to the test data set. As can be seen from fig. 2, of the 5 points nearest to $X_u$, 4 data points belong to the standard reference data set and 1 data point belongs to the test data set. The data to be tested is therefore more similar to the data in the standard reference data set, and it can be determined that the data to be tested is similar to the standard reference data set.
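The four steps can be traced on concrete numbers. The coordinates below are made up to reproduce a 4-versus-1 outcome with K = 5; they are not read from fig. 2, and all variable names are assumptions of this sketch.

```python
import math
from collections import Counter

# Made-up coordinates, chosen only to mirror the 4-versus-1 outcome described above.
reference_group = [(1.0, 1.0), (1.5, 1.2), (1.2, 1.8), (2.0, 1.5), (0.5, 0.8)]
remaining_test_data = [(2.2, 2.0), (4.0, 4.0), (4.5, 3.8)]
x_u = (1.6, 1.6)  # the data to be tested
K = 5

# Step 1: distance from x_u to every point, tagged with the group it came from.
labelled = [(math.dist(x_u, p), "reference") for p in reference_group]
labelled += [(math.dist(x_u, p), "test") for p in remaining_test_data]

# Steps 2 and 3: sort by distance and keep the K nearest points.
nearest = sorted(labelled, key=lambda pair: pair[0])[:K]

# Step 4: count how many of the K nearest points belong to each group.
counts = Counter(origin for _, origin in nearest)
verdict = "reference" if counts["reference"] > counts["test"] else "test"
```

Because K is odd, the two counts can never be equal, so the final comparison always yields a verdict.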
S104: and the server judges whether the data drifting occurs in the test data group or not according to the similarity judgment result of each to-be-tested data in the test data group.
In the embodiment of the application, the server judges whether the test data group has data drift or not according to the similarity between each data to be tested in the test data group and the standard reference data group and the test data group.
In one embodiment, the server determines, as the third quantity, a quantity of data to be tested in the test data set that is similar to the standard reference data set. The server determines the number of data to be tested in the test data set similar to the test data set as a fourth number.
If the third number is larger than the fourth number, the number of the data to be tested which are similar to the standard reference data group in the test data group is larger than the number of the data to be tested which are similar to the test data group, and the statistical distribution of most of the data in the test data group is consistent with the standard reference data group, the data drift of the test data group does not occur.
If the third number is smaller than the fourth number, the number of the data to be tested which are similar to the standard reference data group in the test data group is smaller than the number of the data to be tested which are similar to the test data group, and the statistical distribution of most of the data in the test data group is inconsistent with the standard reference data group, the data drift of the test data group occurs.
If the third number is equal to the fourth number, it indicates that the number of the data to be tested in the test data group similar to the standard reference data group is equal to the number of the data to be tested in the test data group similar to the test data group, and it cannot be determined whether data drift occurs in the test data group.
In one embodiment, the number of pieces of data to be tested in the test data group collected by the server is preferably odd. This avoids the case in which the third number equals the fourth number because the test data group has an even length, so that whether data drift has occurred cannot be determined.
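Putting the per-item vote and the group-level comparison together, a minimal end-to-end sketch could look like the following. The function names `similar_to` and `has_drift`, the default `k=3`, the use of `None` for ties, and the sample data are all assumptions made for illustration, not the application's implementation.

```python
import math


def similar_to(target_index, test_group, reference_group, k):
    """Return "reference", "test", or None (tie) for one piece of data to be tested."""
    target = test_group[target_index]
    labelled = [(math.dist(target, p), "reference") for p in reference_group]
    labelled += [
        (math.dist(target, p), "test")
        for i, p in enumerate(test_group)
        if i != target_index  # compare only with the remaining test data
    ]
    nearest = sorted(labelled, key=lambda pair: pair[0])[:k]
    first_number = sum(1 for _, origin in nearest if origin == "reference")
    second_number = len(nearest) - first_number
    if first_number == second_number:
        return None
    return "reference" if first_number > second_number else "test"


def has_drift(test_group, reference_group, k=3):
    """True if drift occurred, False if not, None if the group-level vote is tied."""
    results = [
        similar_to(i, test_group, reference_group, k)
        for i in range(len(test_group))
    ]
    third_number = results.count("reference")
    fourth_number = results.count("test")
    if third_number == fourth_number:
        return None
    return third_number < fourth_number


# Made-up sample data: one test group overlaps the reference data, the other is shifted away.
reference_group = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
                   (0.5, 0.5), (2.0, 2.0), (2.0, 0.0), (0.0, 2.0)]
overlapping_group = [(0.1, 0.1), (1.9, 1.9), (0.1, 1.9)]
shifted_group = [(5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]

no_drift = has_drift(overlapping_group, reference_group)
drift = has_drift(shifted_group, reference_group)
```

Both K and the test group length are odd in this example, so neither the per-item vote nor the group-level vote can tie.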
In one embodiment, if the test data group drifts, the server sends the data drift result to the corresponding edge device, so that the edge device can timely monitor the time series data with the data drift and timely perform corresponding data processing on the time series data. For example, the data model adapted to the time-series data is re-determined according to the change of the statistical distribution of the time-series data.
In the embodiment of the application, the server judges whether the test data set drifts through the KNN algorithm, and the implementation method is simple and efficient, easy to implement, easy to understand, free of parameter estimation and training and low in calculation power consumption.
The test data set is effectively supervised by designing the standard reference data set, the accuracy of judging whether the test data set has data drift is improved, and the stability and the robustness of judging whether the test data set has data drift can be improved.
And the method can be used in the edge device and is combined with the sensor, and the change of the data can be found at the first time.
Based on the same inventive concept as the data drift determination method based on the nearest neighbor method described above, the embodiment of the present application further provides a corresponding data drift determination device based on the nearest neighbor method, as shown in fig. 3.
Fig. 3 is a schematic structural diagram of a data drift determination device based on a nearest neighbor method according to an embodiment of the present application, which specifically includes:
a first obtaining module 301, configured to obtain a standard reference data set;
a second obtaining module 302, configured to obtain a test data set;
a first judging module 303, configured to judge, based on a nearest neighbor algorithm, a similarity between the data to be tested and the standard reference data set and a similarity between the data to be tested and the test data set for each data to be tested in the test data set;
the second determining module 304 is configured to determine whether data drifting occurs in the test data set according to a similarity determination result of each to-be-tested data in the test data set.
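One way to picture the four modules is as interchangeable callables wired together by a small container. This is a structural sketch only: the class name `DriftDiscriminationDevice`, the field names, the `run` method, and the stub callables in the usage example are hypothetical and do not describe the device's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class DriftDiscriminationDevice:
    """Hypothetical container mirroring modules 301 to 304."""

    acquire_reference: Callable[[], Sequence[Point]]  # first obtaining module 301
    acquire_test: Callable[[], Sequence[Point]]  # second obtaining module 302
    # First judging module 303: (index, test group, reference group) -> similarity verdict.
    judge_similarity: Callable[[int, Sequence[Point], Sequence[Point]], Optional[str]]
    # Second judging module 304: per-item verdicts -> drift decision.
    judge_drift: Callable[[List[Optional[str]]], Optional[bool]]

    def run(self) -> Optional[bool]:
        """Run the four modules in order and return the drift decision."""
        reference_group = self.acquire_reference()
        test_group = self.acquire_test()
        results = [
            self.judge_similarity(i, test_group, reference_group)
            for i in range(len(test_group))
        ]
        return self.judge_drift(results)


# Usage with stub modules; the stubs stand in for real acquisition and KNN logic.
device = DriftDiscriminationDevice(
    acquire_reference=lambda: [(0.0, 0.0)],
    acquire_test=lambda: [(9.0, 9.0), (9.1, 9.0), (9.0, 9.1)],
    judge_similarity=lambda i, test, ref: "test",  # stub: every item votes "test"
    judge_drift=lambda results: results.count("test") > results.count("reference"),
)
outcome = device.run()
```

Keeping each module as a separate callable lets the acquisition source or the similarity rule be swapped without touching the rest of the pipeline.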
The embodiments in the present application are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. A data drift discrimination method based on a nearest neighbor method is characterized by comprising the following steps:
the server acquires a standard reference data set;
acquiring a test data set;
for each data to be tested in the test data group, judging the similarity between the data to be tested and the standard reference data group and the similarity between the data to be tested and the test data group based on a nearest neighbor algorithm;
and judging whether the test data group has data drift or not according to the similarity judgment result of each to-be-tested data in the test data group.
2. The data drift discrimination method based on the nearest neighbor method according to claim 1, wherein
the standard reference data set is generated at a time earlier than the test data set.
3. The method of claim 1, wherein before the obtaining the test data set, the method further comprises:
the server determines a test data window for storing the test data set.
4. The method for discriminating data drift based on nearest neighbor method according to claim 1, wherein for each data to be tested in the test data set, the similarity between the data to be tested and the standard reference data set and the similarity between the data to be tested and the test data set are determined based on nearest neighbor algorithm, comprising:
calculating the distance between the data to be tested and each data in the standard reference data set and the distance between the data to be tested and each remaining data in the test data set;
selecting front K pieces of data closest to the data to be tested based on the distance between the data to be tested and each piece of data in the standard reference data group and the distance between the data to be tested and each piece of remaining data in the test data group, wherein K is a preset parameter;
and judging the similarity of the data to be tested to the standard reference data set and to the test data set based on the K pieces of data.
5. The nearest neighbor method-based data drift discrimination method as claimed in claim 4, wherein the preset parameter K is an odd number.
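The neighbor selection of claims 4 and 5 can be sketched as follows. This is a minimal sketch under assumptions: samples are taken to be coordinate tuples, the function name `k_nearest_labels` and the labels `"ref"` and `"test"` are hypothetical, and a brute-force sort stands in for whatever search structure an actual implementation would use.

```python
import math


def k_nearest_labels(index, reference, test, k):
    """Return the origin ('ref' or 'test') of the k points nearest to test[index]."""
    target = test[index]
    # Distances to every item in the standard reference data set.
    candidates = [(math.dist(target, p), "ref") for p in reference]
    # Distances to every remaining item in the test data set (the target
    # itself is excluded).
    candidates += [
        (math.dist(target, p), "test")
        for i, p in enumerate(test)
        if i != index
    ]
    candidates.sort(key=lambda c: c[0])
    return [label for _, label in candidates[:k]]


reference = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
test = [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1), (0.05, 0.05)]
far_labels = k_nearest_labels(0, reference, test, k=3)
near_labels = k_nearest_labels(4, reference, test, k=3)
```

With an odd K, as in claim 5, the two label counts can never be equal, which keeps the subsequent vote unambiguous.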
6. The data drift discrimination method based on the nearest neighbor method according to claim 4, wherein judging the similarity of the data item to be tested with the standard reference data set and with the test data set based on the K data items comprises:
determining the number of data items among the K data items that belong to the standard reference data set as a first number;
determining the number of data items among the K data items that belong to the test data set as a second number;
if the first number is greater than the second number, the data item to be tested is similar to the standard reference data set;
and if the first number is smaller than the second number, the data item to be tested is similar to the test data set.
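The vote in claim 6 amounts to comparing two counts. The sketch below assumes the K neighbors are given as a list of origin labels; the function name and the `"ref"`/`"test"` labels are hypothetical. The claim does not say what happens when the counts are equal, so that case returns `None` here rather than guessing.

```python
def classify_by_vote(neighbor_labels):
    """Decide which data set a sample resembles from its K neighbors' origins."""
    first_number = neighbor_labels.count("ref")    # neighbors from the reference set
    second_number = neighbor_labels.count("test")  # neighbors from the test set
    if first_number > second_number:
        return "ref"
    if first_number < second_number:
        return "test"
    # Equal counts are not addressed by the claim; this cannot happen
    # when K is odd, as in claim 5.
    return None


verdict_a = classify_by_vote(["ref", "test", "ref"])
verdict_b = classify_by_vote(["test", "test", "ref"])
```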
7. The data drift discrimination method based on the nearest neighbor method according to claim 1, wherein judging whether data drift has occurred in the test data set according to the similarity judgment result of each data item to be tested in the test data set comprises:
determining the number of data items to be tested in the test data set that are similar to the standard reference data set as a third number;
determining the number of data items to be tested in the test data set that are similar to the test data set as a fourth number;
if the third number is greater than the fourth number, no data drift has occurred in the test data set;
if the third number is less than the fourth number, data drift has occurred in the test data set.
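The set-level decision of claim 7 can be sketched in the same way. The input is assumed to be one per-sample verdict (`"ref"` or `"test"`, hypothetical labels) for each data item in the test data set. Equal counts are not covered by the claim and are reported as `None`.

```python
def has_drift(verdicts):
    """Return True if drift is detected, False if not, None if counts are equal."""
    third_number = sum(1 for v in verdicts if v == "ref")    # similar to reference set
    fourth_number = sum(1 for v in verdicts if v == "test")  # similar to test set
    if third_number > fourth_number:
        return False
    if third_number < fourth_number:
        return True
    return None  # case not specified by the claim


stable = has_drift(["ref", "ref", "test"])
drifted = has_drift(["test", "test", "ref", "test"])
```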
8. The data drift discrimination method based on the nearest neighbor method according to claim 4, wherein calculating the distance between the data item to be tested and each data item in the standard reference data set and the distance between the data item to be tested and each remaining data item in the test data set comprises:
calculating those distances based on a Euclidean distance formula;
the Euclidean distance formula is as follows:
$$D(x, y) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
wherein $D(x, y)$ represents the distance between the data item to be tested and the corresponding data item, $(x_1, y_1)$ represents the coordinates of the data item to be tested, and $(x_2, y_2)$ represents the coordinates of the corresponding data item.
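For two-dimensional coordinates as defined in claim 8, the Euclidean distance can be computed as in the short sketch below. The function name is hypothetical; Python's standard `math.dist` gives the same result for points of any dimension.

```python
import math


def euclidean_distance(p, q):
    """Distance between 2-D points p = (x1, y1) and q = (x2, y2)."""
    (x1, y1), (x2, y2) = p, q
    return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)


d = euclidean_distance((0.0, 0.0), (3.0, 4.0))  # a 3-4-5 right triangle
```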
9. The method according to claim 1, wherein the method further comprises:
if data drift occurs in the test data set, sending a data drift result to a corresponding edge device so that the edge device performs corresponding data processing on the test data set.
10. A data drift discrimination device based on a nearest neighbor method, characterized by comprising:
a first acquisition module, configured to acquire a standard reference data set;
a second acquisition module, configured to acquire a test data set;
a first judgment module, configured to judge, for each data item to be tested in the test data set and based on a nearest neighbor algorithm, the similarity between the data item to be tested and the standard reference data set and the similarity between the data item to be tested and the test data set;
and a second judgment module, configured to judge whether data drift has occurred in the test data set according to the similarity judgment result of each data item to be tested in the test data set.
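As a rough end-to-end illustration, the two judgment modules could be composed into a single function that takes the outputs of the two acquisition modules as arguments. This is a sketch under assumptions, not the patented implementation: the function name `detect_drift`, the default `k=3`, and the toy grid data are hypothetical, and ties are simply left uncounted because the claims do not address them.

```python
import math


def detect_drift(reference, test, k=3):
    """Return True if more test samples resemble the test set than the reference set."""
    similar_to_ref = 0
    similar_to_test = 0
    for i, target in enumerate(test):
        # First judgment module: vote among the k nearest neighbors drawn
        # from the reference set and the remaining test samples.
        neighbors = [(math.dist(target, p), "ref") for p in reference]
        neighbors += [
            (math.dist(target, p), "test")
            for j, p in enumerate(test)
            if j != i
        ]
        neighbors.sort(key=lambda n: n[0])
        top = neighbors[:k]
        ref_votes = sum(1 for _, label in top if label == "ref")
        test_votes = len(top) - ref_votes
        if ref_votes > test_votes:
            similar_to_ref += 1
        elif test_votes > ref_votes:
            similar_to_test += 1
    # Second judgment module: compare the two per-sample counts.
    return similar_to_test > similar_to_ref


# Toy data: a 5x5 reference grid near the origin.
reference = [(x * 0.1, y * 0.1) for x in range(5) for y in range(5)]
# Test samples interleaved with the reference grid (no drift expected).
overlapping = [(x * 0.1 + 0.05, y * 0.1 + 0.05) for x in range(2) for y in range(2)]
# Test samples far away from the reference grid (drift expected).
shifted = [(5 + x * 0.1, 5 + y * 0.1) for x in range(3) for y in range(3)]

no_drift = detect_drift(reference, overlapping)
drift = detect_drift(reference, shifted)
```

The brute-force distance computation costs on the order of (reference size + test size) operations per test sample, which is acceptable for small windows but would likely need an index structure for large ones.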
CN202010749770.1A 2020-07-30 2020-07-30 Data drift discrimination method and device based on nearest neighbor method Active CN112085053B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010749770.1A CN112085053B (en) 2020-07-30 2020-07-30 Data drift discrimination method and device based on nearest neighbor method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010749770.1A CN112085053B (en) 2020-07-30 2020-07-30 Data drift discrimination method and device based on nearest neighbor method

Publications (2)

Publication Number Publication Date
CN112085053A (en) 2020-12-15
CN112085053B (en) 2022-08-26

Family

ID=73735200

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010749770.1A Active CN112085053B (en) 2020-07-30 2020-07-30 Data drift discrimination method and device based on nearest neighbor method

Country Status (1)

Country Link
CN (1) CN112085053B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170330109A1 (en) * 2016-05-16 2017-11-16 Purepredictive, Inc. Predictive drift detection and correction
CN109508733A (en) * 2018-10-23 2019-03-22 北京邮电大学 A kind of method for detecting abnormality based on distribution probability measuring similarity
CN109686400A (en) * 2018-12-14 2019-04-26 济南浪潮高新科技投资发展有限公司 A kind of enrichment degree method of inspection, device and readable medium, storage control
CN110149143A (en) * 2019-05-16 2019-08-20 广东信通通信有限公司 Test optical fiber data processing method, device, computer equipment and storage medium
CN110909813A (en) * 2019-11-29 2020-03-24 四川万益能源科技有限公司 Business abnormal electricity utilization detection method based on edge algorithm
US20200116522A1 (en) * 2018-10-15 2020-04-16 Kabushiki Kaisha Toshiba Anomaly detection apparatus and anomaly detection method
CN111143413A (en) * 2019-12-26 2020-05-12 太原科技大学 Anomaly detection method based on data flow concept drift

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
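Liu Mao et al.: "A new concept drift detection method based on a distance measure over overlapping data windows", Journal of Computer Applications (《计算机应用》) *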

Also Published As

Publication number Publication date
CN112085053B (en) 2022-08-26

Similar Documents

Publication Publication Date Title
CN106951925B (en) Data processing method, device, server and system
CN110210508B (en) Model generation method, abnormal flow detection device, electronic device and computer-readable storage medium
EP3462267A1 (en) Anomaly diagnosis method and anomaly diagnosis apparatus
Labatut et al. Evaluation of performance measures for classifiers comparison
CN103582884A (en) Robust feature matching for visual search
JP2010204966A (en) Sampling device, sampling method, sampling program, class distinction device and class distinction system
CN109949176A (en) It is a kind of based on figure insertion social networks in abnormal user detection method
CN107016416B (en) Data classification prediction method based on neighborhood rough set and PCA fusion
JP5027859B2 (en) Signal identification method and signal identification apparatus
KR101733708B1 (en) Method and system for rating measured values taken from a system
CN110995153A (en) Abnormal data detection method and device for photovoltaic power station and electronic equipment
CN110348215B (en) Abnormal object identification method, abnormal object identification device, electronic equipment and medium
CN108470194A (en) A kind of Feature Selection method and device
CN111161097A (en) Method and device for detecting switch event based on event detection algorithm of hypothesis test
Colby et al. Counterfactual Exploration for Improving Multiagent Learning.
CN111898637A (en) Feature selection algorithm based on Relieff-DDC
CN109766958B (en) A kind of data preprocessing method and device for data fusion
CN106919650A (en) A kind of textural anomaly detection method of increment parallel type Dynamic Graph
CN112085053B (en) Data drift discrimination method and device based on nearest neighbor method
CN105224954A (en) A kind of topic discover method removing the impact of little topic based on Single-pass
CN116579842B (en) Credit data analysis method and system based on user behavior data
CN115713270B (en) Method and device for detecting and correcting peer mutual evaluation abnormal scores
CN117014193A (en) Unknown Web attack detection method based on behavior baseline
CN112597699B (en) Social network rumor source identification method integrated with objective weighting method
CN110990383A (en) Similarity calculation method based on industrial big data set

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220729

Address after: 250101 Building S02, 1036 Langchao Road, High-tech Zone, Jinan City, Shandong Province

Applicant after: Shandong Inspur Scientific Research Institute Co.,Ltd.

Address before: Floor 6, Langchao Road, Shandong Province

Applicant before: JINAN INSPUR HIGH-TECH TECHNOLOGY DEVELOPMENT Co.,Ltd.

GR01 Patent grant