CN112085053A - Data drift discrimination method and device based on nearest neighbor method - Google Patents
Data drift discrimination method and device based on nearest neighbor method
- Publication number
- CN112085053A (application CN202010749770.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- tested
- test
- test data
- standard reference
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24147—Distances to closest patterns, e.g. nearest neighbour classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2474—Sequence data queries, e.g. querying versioned data
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Computational Biology (AREA)
- Fuzzy Systems (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Biology (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Databases & Information Systems (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
- Complex Calculations (AREA)
Abstract
The application discloses a data drift judging method and device based on a nearest neighbor method, which are used for solving the problems that a large amount of computing power is required to be consumed, the scheme is complex and the operation is difficult to realize in the existing data drift judging algorithm. The method comprises the following steps: the server acquires a standard reference data set; the server acquires a test data set; the server judges the similarity between the data to be tested and the standard reference data set and the similarity between the data to be tested and the test data set based on a nearest neighbor algorithm aiming at each data to be tested in the test data set; and the server judges whether the test data group has data drift or not according to the similarity judgment result of each to-be-tested data in the test data group.
Description
Technical Field
The present invention relates to the field of concept drift, and in particular, to a data drift determination method and apparatus based on a nearest neighbor method.
Background
With the popularization and development of network application, data of various industries are continuously generated in a data stream mode, and the data have the characteristics of mass and rapid change. For example, in the industrial field, sensors need to constantly collect new data; in the e-commerce field, merchants need to continuously acquire behavior data of users.
For the same subject, data acquired at different times are referred to as time series data, which can be used to describe the time-varying condition of the subject. However, in many areas, the data distribution may change unpredictably over time, resulting in data drift that may render existing data models inapplicable to new data. Therefore, in order to select an appropriate data model, a data analyst needs to determine whether there is data drift in the data.
At present, one existing algorithm for judging whether data drift has occurred is the three-branch decision tree concept algorithm. In the detection process, the training data is classified using a decision tree and is then assigned to the L domain, the R domain, or the M domain of the three decisions according to the classification error rate of each subtree. The L domain, the R domain, and the M domain indicate, respectively, that the data has not drifted, that the data has drifted, and that the data may have drifted.
However, the existing algorithms for judging data drift, including the three-branch decision tree concept algorithm, often have the problems of large consumption of computing power, complex scheme and difficult operation.
Disclosure of Invention
The embodiment of the application provides a data drift judging method and device based on a nearest neighbor method, and aims to solve the problems of large calculation amount, complexity and impracticality of the existing data drift judging method.
In one aspect, an embodiment of the present application provides a data drift discrimination method based on a nearest neighbor method, where the method includes:
the server acquires a standard reference data set;
the server acquires a test data set;
the server judges the similarity between the data to be tested and the standard reference data set and the similarity between the data to be tested and the test data set based on a nearest neighbor algorithm aiming at each data to be tested in the test data set;
and the server judges whether the test data group has data drift or not according to the similarity judgment result of each to-be-tested data in the test data group.
In one example, the standard reference data set is generated at an earlier time than the test data set.
In one example, before the server obtains the test data set, the method further comprises: the server determines a test data window for storing the test data set.
In one example, the server determining, for each data to be tested in the test data set, the similarity between the data to be tested and the standard reference data set and the similarity between the data to be tested and the test data set based on a nearest neighbor algorithm includes: the server calculates the distance between the data to be tested and each data in the standard reference data set and the distance between the data to be tested and each remaining data in the test data set; based on these distances, the server selects the K pieces of data closest to the data to be tested, wherein K is a preset parameter; and the server judges the similarity of the data to be tested to the standard reference data set and to the test data set based on the K pieces of data.
In one example, the preset parameter K is an odd number.
In one example, the server judging the similarity of the data to be tested to the standard reference data set and to the test data set based on the K pieces of data includes: determining the number of data belonging to the standard reference data set in the K pieces of data as a first number; determining the number of data belonging to the test data set in the K pieces of data as a second number; if the first number is greater than the second number, the data to be tested is similar to the standard reference data set; and if the first number is smaller than the second number, the data to be tested is similar to the test data set.
In one example, the determining, by the server, whether data drifting occurs in the test data group according to a result of determining similarity of each piece of data to be tested in the test data group includes: determining the number of data to be tested in the test data group similar to the standard reference data group as a third number; determining the number of data to be tested in the test data group, which is similar to the test data group, as a fourth number; if the third number is greater than the fourth number, the test data group has no data drift; and if the third quantity is smaller than the fourth quantity, the test data has data drift.
In one example, the server calculating the distance of the data to be tested from each data in the standard reference data set and the distance of the data to be tested from each remaining data in the test data set comprises: calculating the distance between the data to be tested and each data in the standard reference data set and the distance between the data to be tested and each remaining data in the test data set based on a Euclidean distance formula; the Euclidean distance formula is as follows:
$$D(x, y) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
wherein $D(x, y)$ represents the distance between the data to be tested and the corresponding data, $(x_1, y_1)$ represents the coordinates of the data to be tested, and $(x_2, y_2)$ represents the coordinates of the corresponding data.
In one example, the method further comprises: and if the test data group drifts, sending a data drifting result to corresponding edge equipment so that the edge equipment performs corresponding data processing on the test data group.
On the other hand, an embodiment of the present application further provides a data drift determination device based on a nearest neighbor method, where the device includes:
the first acquisition module is used for acquiring a standard reference data set;
the second acquisition module is used for acquiring the test data set;
the first judgment module is used for judging the similarity between the data to be tested and the standard reference data set and the similarity between the data to be tested and the test data set on the basis of a nearest neighbor algorithm aiming at each data to be tested in the test data set;
and the second judging module is used for judging whether the test data group has data drift or not according to the similarity judging result of each to-be-tested data in the test data group.
The data drift discrimination method and device based on the nearest neighbor method provided by the embodiments of the application have at least the following beneficial effects: whether the test data set has drifted is judged with the KNN algorithm, which is simple, efficient, and easy to understand, requires no parameter estimation, and consumes little computing power. The design of the standard reference data set increases the stability and robustness of judging whether the test data set has data drift. The method can also be used on edge devices in combination with sensors, so that changes in the data are found promptly and the data is processed accordingly in time.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a flowchart of a data drift determination method based on a nearest neighbor method according to an embodiment of the present application;
fig. 2 is a schematic diagram of the KNN algorithm provided in the embodiment of the present application;
fig. 3 is a schematic structural diagram of a data drift determination device based on a nearest neighbor method according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The technical solutions proposed in the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a data drift determination method based on a nearest neighbor method according to an embodiment of the present application, where the method includes the following steps:
S101: the server obtains a standard reference data set.
In the embodiment of the application, the server randomly takes a segment of data, as the standard reference data set, from the time-series data collected by the acquisition device or from time-series data pre-stored in the database. The acquisition device may be a sensor or another device.
The standard reference data set is a collection of several standard reference data. It may follow any statistical distribution. It serves as the baseline for judging whether the test data set has the same statistical distribution, and hence whether data drift has occurred in the test data set.
The length of the standard reference data set may be set as required, which is not limited in the present application.
S102: the server obtains a test data set.
In the embodiment of the application, the server acquires the test data group from the time sequence data acquired by the acquisition device or the time sequence data stored in the database.
The test data set is a data set which needs to be judged whether data drifting occurs in the application. The test data group comprises a plurality of pieces of data to be tested. The dimensions of the data to be tested in the test data set can be set as required, which is not limited in the present application.
In one embodiment, because time-series data may change over time, the server may obtain a test data set and a standard reference data set that are separated in time from the time-series data collected by the acquisition device. The generation time of the standard reference data set should be earlier than that of the test data set, so that the standard reference data set, whose statistical distribution is determined in advance, can be used to judge whether the test data set belongs to the same statistical distribution.
In one embodiment, the server may determine a window of test data prior to obtaining the set of test data. The test data window is a storage unit convenient for storing the test data set and is used for storing the test data set. Thus, the length of the test data set (i.e., the number of data to be tested included in the test data set) is the same as the length of the test data window. The length of the test data window may be set according to the length requirement of the test data group, which is not limited in the present application.
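The relationship between the earlier standard reference data set and the fixed-length test data window can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the choice of the earliest samples as the reference set, and the use of a `deque` as the window are all assumptions made for the example.

```python
from collections import deque


def make_test_window(window_len):
    """Fixed-length buffer: appending beyond window_len drops the oldest item."""
    return deque(maxlen=window_len)


def split_stream(series, ref_len, window_len):
    """Split a time-ordered series into a reference set and a test window."""
    # Assumption: the earliest ref_len samples form the standard reference set.
    reference = list(series[:ref_len])
    window = make_test_window(window_len)
    for sample in series[ref_len:]:
        window.append(sample)  # later samples fill (and roll through) the window
    return reference, list(window)


# Ten time-ordered samples: four go to the reference set, and the window
# keeps the three most recent of the remaining six.
reference, test_set = split_stream(list(range(10)), ref_len=4, window_len=3)
```

Because the window has a fixed length, the length of the test data set always equals the window length once enough samples have arrived.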
S103: and the server judges the similarity between the data to be tested and the standard reference data set and the similarity between the data to be tested and the test data set based on a nearest neighbor algorithm aiming at each data to be tested in the test data set.
In the embodiment of the application, for each piece of data to be tested in the test data set held in the test data window, the server judges, based on the K-nearest neighbor (KNN) method, the similarity of that piece of data to the standard reference data set and to the test data set.
Specifically, the piece of data is compared with the remaining data in the test data set and with the data in the standard reference data set to judge its similarity to each of the two sets.
In one embodiment, the step of determining the similarity of the data to be tested to the test data set and the standard reference data set comprises:
First, the distance between the data to be tested and each remaining piece of data in the test data set, and the distance between the data to be tested and every piece of data in the standard reference data set, are calculated.
The distance between the data to be tested and another piece of data expresses their similarity: the smaller the distance, the more similar the two are, and the larger the distance, the less similar they are.
Second, the distances obtained in the first step are sorted.
Third, a preset parameter K is determined, and the K pieces of data closest to the data to be tested are selected.
Fourth, the similarity of the data to be tested to the standard reference data set and to the test data set is judged based on the K pieces of data.
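The first three steps above can be sketched in a few lines. The sketch below is illustrative only: the function name `k_nearest_labels`, the string labels, and the sample coordinates are hypothetical, and `math.dist` (Python 3.8+) stands in for the distance calculation.

```python
import math


def k_nearest_labels(index, test_set, reference_set, k):
    """Return the set labels of the k points nearest to test_set[index]."""
    target = test_set[index]
    labeled = []
    # Step 1: distances to every reference point and every remaining test point.
    for point in reference_set:
        labeled.append((math.dist(target, point), "reference"))
    for i, point in enumerate(test_set):
        if i != index:  # the datum under test is not its own neighbor
            labeled.append((math.dist(target, point), "test"))
    # Step 2: sort by distance, ascending.
    labeled.sort(key=lambda pair: pair[0])
    # Step 3: keep the labels of the first k entries.
    return [label for _, label in labeled[:k]]


# Hypothetical two-dimensional data.
reference_set = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
test_set = [(0.5, 0.5), (5.0, 5.0), (5.0, 6.0)]
labels = k_nearest_labels(0, test_set, reference_set, k=3)
```

The returned labels are the input to the fourth step, which only has to count how many neighbors come from each set.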
In one embodiment, the server calculates the distance of the data to be tested from each data in the standard reference data set and the distance of the data to be tested from each remaining data in the test data set based on the Euclidean distance formula.
Taking two-dimensional data as an example, the Euclidean distance formula is as follows:
$$D(x, y) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
wherein $D(x, y)$ represents the distance between the data to be tested and the corresponding data, $(x_1, y_1)$ represents the coordinates of the data to be tested, and $(x_2, y_2)$ represents the coordinates of the corresponding data.
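As a quick numerical check of the two-dimensional Euclidean distance, the sketch below computes it directly. The function names are hypothetical, and the n-dimensional variant is an assumption added because the dimension of the data to be tested may be set as required; the text itself only gives the two-dimensional case.

```python
import math


def euclidean_2d(p, q):
    """D = sqrt((x1 - x2)^2 + (y1 - y2)^2) for p = (x1, y1), q = (x2, y2)."""
    (x1, y1), (x2, y2) = p, q
    return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)


def euclidean_nd(p, q):
    """Same distance for points with any (equal) number of dimensions."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


d = euclidean_2d((0.0, 0.0), (3.0, 4.0))  # 3-4-5 right triangle, so d == 5.0
```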
In one embodiment, when the server determines similarity between the data to be tested and the standard reference data group and the test data group based on the K pieces of data, the server may determine the number of data belonging to the standard reference data group in the K pieces of data as the first number, and determine the number of data belonging to the test data group in the K pieces of data as the second number.
If the first number is larger than the second number, it indicates that the number of data similar to the data to be tested in the standard reference data group is larger in the K pieces of data, and it can be considered that the similarity degree of the data to be tested and the standard reference data group is higher, and the data to be tested is similar to the standard reference data group.
If the first number is smaller than the second number, it indicates that the number of data similar to the data to be tested in the test data group is greater in the K pieces of data, and it can be considered that the similarity between the data to be tested and the test data group is higher, and the data to be tested is similar to the test data group.
If the first number is equal to the second number, it indicates that the number of data similar to the data to be tested in the test data group is the same as the number of data similar to the data to be tested in the standard reference data group in the K pieces of data, and it can be considered that the similarity between the data to be tested and the standard reference data group is the same as the similarity between the data to be tested and the test data group, and the similarity between the data to be tested and the standard reference data group and the test data group cannot be judged.
In one embodiment, the value of K is preferably odd. With an even K, the K pieces of data nearest to the data to be tested may contain equal numbers of data from the standard reference data set and from the test data set, in which case the similarity of the data to be tested to the two sets cannot be judged. An odd K avoids this uncertainty.
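The comparison of the first number and the second number, including the undecidable tie, can be written as a small function. This is a sketch under the assumption that the K nearest neighbors are given as a list of hypothetical string labels, "reference" or "test"; returning `None` for a tie is a choice made for the example.

```python
def judge_similarity(neighbor_labels):
    """Majority vote over the labels of the K nearest neighbors."""
    first_number = neighbor_labels.count("reference")  # from the reference set
    second_number = neighbor_labels.count("test")      # from the test set
    if first_number > second_number:
        return "reference"
    if first_number < second_number:
        return "test"
    return None  # tie: similarity cannot be judged (only possible for even K)


tie = judge_similarity(["reference", "test"])          # K = 2, one vote each
clear = judge_similarity(["reference"] * 3 + ["test"] * 2)  # K = 5
```

With an odd K the two counts can never be equal, since they sum to K, which is why the tie branch is unreachable in that case.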
For convenience of explanation, the present application will be described taking two-dimensional data as an example.
Fig. 2 is a schematic diagram of the KNN algorithm principle provided in the embodiment of the present application. As shown in fig. 2, the x-axis and the y-axis represent different dimensions of the data. Two regions represent the standard reference data set and the test data set, respectively: the circles inside the first region represent data in the standard reference data set, the squares inside the second region represent data in the test data set, and X_u represents the data to be tested.
The step of judging the similarity of the data to be tested, the standard reference data set and the test data set by the server comprises the following steps:
The first step: the server calculates, based on the Euclidean distance formula, the distance between X_u and every point in the two data sets.
The second step: the server sorts the distances obtained in the first step.
The third step: the server sets the preset parameter K to 5 and selects the 5 points nearest to X_u, as indicated by the arrows in the figure.
The fourth step: the server judges the similarity of X_u to the two data sets. As can be seen from fig. 2, of the 5 points nearest to X_u, 4 data points belong to the standard reference data set and 1 data point belongs to the test data set. The data to be tested is therefore more similar to the data in the standard reference data set, and it can be determined that the data to be tested is similar to the standard reference data set.
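The K = 5 example can be reproduced numerically. The coordinates below are hypothetical and are not read off fig. 2; they are chosen only so that four of the five nearest points come from the standard reference data set and one from the test data set.

```python
import heapq
import math

# Hypothetical coordinates, not taken from fig. 2.
reference_set = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (2.5, 2.5), (0.5, 2.5)]
remaining_test_data = [(3.0, 2.5), (6.0, 6.0), (6.5, 5.5), (7.0, 6.5)]
x_u = (2.0, 2.0)  # the data to be tested

# Distance from x_u to every candidate, tagged with the set it belongs to.
candidates = [(math.dist(x_u, p), "reference") for p in reference_set]
candidates += [(math.dist(x_u, p), "test") for p in remaining_test_data]

K = 5
nearest = heapq.nsmallest(K, candidates)  # the K smallest (distance, label) pairs

first_number = sum(1 for _, label in nearest if label == "reference")
second_number = sum(1 for _, label in nearest if label == "test")
verdict = "reference" if first_number > second_number else "test"
```

Here `heapq.nsmallest` replaces the full sort of the second step, which gives the same K neighbors without ordering the whole list.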
S104: and the server judges whether the data drifting occurs in the test data group or not according to the similarity judgment result of each to-be-tested data in the test data group.
In the embodiment of the application, the server judges whether the test data group has data drift or not according to the similarity between each data to be tested in the test data group and the standard reference data group and the test data group.
In one embodiment, the server determines, as the third quantity, a quantity of data to be tested in the test data set that is similar to the standard reference data set. The server determines the number of data to be tested in the test data set similar to the test data set as a fourth number.
If the third number is larger than the fourth number, the number of the data to be tested which are similar to the standard reference data group in the test data group is larger than the number of the data to be tested which are similar to the test data group, and the statistical distribution of most of the data in the test data group is consistent with the standard reference data group, the data drift of the test data group does not occur.
If the third number is smaller than the fourth number, the number of the data to be tested which are similar to the standard reference data group in the test data group is smaller than the number of the data to be tested which are similar to the test data group, and the statistical distribution of most of the data in the test data group is inconsistent with the standard reference data group, the data drift of the test data group occurs.
If the third number is equal to the fourth number, it indicates that the number of the data to be tested in the test data group similar to the standard reference data group is equal to the number of the data to be tested in the test data group similar to the test data group, and it cannot be determined whether data drift occurs in the test data group.
In one embodiment, the number of data to be tested in the test data set collected by the server is preferably odd. This avoids the case in which, with an even number of data to be tested, the third number equals the fourth number and it cannot be judged whether data drift has occurred in the test data set.
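The set-level decision from the third number and the fourth number can be sketched as below. The function name, the string labels, and the returned strings are hypothetical; the input is assumed to be the per-datum similarity results from S103.

```python
def judge_drift(per_datum_results):
    """Decide drift from the per-datum similarity results of S103."""
    third_number = per_datum_results.count("reference")  # similar to reference set
    fourth_number = per_datum_results.count("test")      # similar to test set
    if third_number > fourth_number:
        return "no drift"
    if third_number < fourth_number:
        return "drift"
    return "undetermined"  # equal counts: drift cannot be judged


stable = judge_drift(["reference", "reference", "test"])
drifted = judge_drift(["test", "test", "test", "reference", "test"])
unknown = judge_drift(["reference", "test"])
```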
In one embodiment, if the test data group drifts, the server sends the data drift result to the corresponding edge device, so that the edge device can timely monitor the time series data with the data drift and timely perform corresponding data processing on the time series data. For example, the data model adapted to the time-series data is re-determined according to the change of the statistical distribution of the time-series data.
In the embodiment of the application, the server judges whether the test data set drifts through the KNN algorithm, and the implementation method is simple and efficient, easy to implement, easy to understand, free of parameter estimation and training and low in calculation power consumption.
The test data set is effectively supervised by designing the standard reference data set, the accuracy of judging whether the test data set has data drift is improved, and the stability and the robustness of judging whether the test data set has data drift can be improved.
In addition, the method can be used on edge devices in combination with sensors, so that changes in the data are found as soon as they occur.
Based on the same inventive concept as the data drift determination method based on the nearest neighbor method described above, the embodiment of the present application further provides a corresponding data drift determination device based on the nearest neighbor method, as shown in fig. 3.
Fig. 3 is a schematic structural diagram of a data drift determination device based on a nearest neighbor method according to an embodiment of the present application, which specifically includes:
a first obtaining module 301, configured to obtain a standard reference data set;
a second obtaining module 302, configured to obtain a test data set;
a first judging module 303, configured to judge, based on a nearest neighbor algorithm, a similarity between the data to be tested and the standard reference data set and a similarity between the data to be tested and the test data set for each data to be tested in the test data set;
the second determining module 304 is configured to determine whether data drifting occurs in the test data set according to a similarity determination result of each to-be-tested data in the test data set.
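One way to picture how the four modules fit together is a single class with one method per module. This is a sketch under assumptions, not the device of fig. 3: the class name, method names, return values, and the sample data are hypothetical, and an odd K is enforced only to mirror the stated preference.

```python
import math


class DriftDiscriminator:
    def __init__(self, k=5):
        if k % 2 == 0:
            raise ValueError("k should be odd to avoid tied votes")
        self.k = k
        self.reference_set = []
        self.test_set = []

    def acquire_reference(self, data):   # role of the first obtaining module (301)
        self.reference_set = list(data)

    def acquire_test(self, data):        # role of the second obtaining module (302)
        self.test_set = list(data)

    def judge_similarity(self, index):   # role of the first judging module (303)
        target = self.test_set[index]
        # Flag 0 marks reference data, flag 1 marks remaining test data.
        scored = [(math.dist(target, p), 0) for p in self.reference_set]
        scored += [(math.dist(target, p), 1)
                   for i, p in enumerate(self.test_set) if i != index]
        nearest = sorted(scored)[: self.k]
        test_votes = sum(flag for _, flag in nearest)
        reference_votes = len(nearest) - test_votes
        if reference_votes == test_votes:
            return None
        return "reference" if reference_votes > test_votes else "test"

    def judge_drift(self):               # role of the second judging module (304)
        results = [self.judge_similarity(i) for i in range(len(self.test_set))]
        third_number = results.count("reference")
        fourth_number = results.count("test")
        if third_number == fourth_number:
            return None                  # cannot be judged
        return fourth_number > third_number  # True means drift


detector = DriftDiscriminator(k=3)
detector.acquire_reference([(x, y) for x in range(3) for y in range(3)])

detector.acquire_test([(0.5, 0.5), (1.5, 1.5), (0.5, 1.5)])
no_drift = detector.judge_drift()

detector.acquire_test([(10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)])
drift = detector.judge_drift()
```

In the first test set every point sits among the reference grid, so all votes go to the reference set; in the second the points are far from the grid and close to each other, so all votes go to the test set.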
The embodiments in the present application are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.
Claims (10)
1. A data drift discrimination method based on a nearest neighbor method is characterized by comprising the following steps:
the server acquires a standard reference data set;
acquiring a test data set;
for each data to be tested in the test data group, judging the similarity between the data to be tested and the standard reference data group and the similarity between the data to be tested and the test data group based on a nearest neighbor algorithm;
and judging whether the test data group has data drift or not according to the similarity judgment result of each to-be-tested data in the test data group.
2. The data drift discrimination method based on the nearest neighbor method according to claim 1, wherein
the standard reference data set is generated at a time earlier than the test data set.
3. The method of claim 1, wherein before the obtaining the test data set, the method further comprises:
the server determines a test data window for storing the test data set.
4. The method for discriminating data drift based on nearest neighbor method according to claim 1, wherein for each data to be tested in the test data set, the similarity between the data to be tested and the standard reference data set and the similarity between the data to be tested and the test data set are determined based on nearest neighbor algorithm, comprising:
calculating the distance between the data to be tested and each data in the standard reference data set and the distance between the data to be tested and each remaining data in the test data set;
selecting front K pieces of data closest to the data to be tested based on the distance between the data to be tested and each piece of data in the standard reference data group and the distance between the data to be tested and each piece of remaining data in the test data group, wherein K is a preset parameter;
and judging the similarity of the data to be tested to the standard reference data set and to the test data set based on the K pieces of data.
5. The nearest neighbor method-based data drift discrimination method as claimed in claim 4, wherein the preset parameter K is an odd number.
6. The method for discriminating data drift based on the nearest neighbor method as claimed in claim 4, wherein the determining the similarity between the data to be tested and the standard reference data set and the data to be tested based on the K pieces of data comprises:
determining the number of data belonging to the standard reference data group in the K pieces of data as a first number;
determining the number of data belonging to the test data group in the K pieces of data as a second number;
if the first number is greater than the second number, the data to be tested is similar to the standard reference data set;
and if the first number is smaller than the second number, the data to be tested is similar to the test data group.
7. The method for judging data drift based on the nearest neighbor method as claimed in claim 1, wherein judging whether the test data set has data drift according to the result of judging the similarity of each data to be tested in the test data set comprises:
determining the number of data to be tested in the test data group similar to the standard reference data group as a third number;
determining the number of data to be tested in the test data group, which is similar to the test data group, as a fourth number;
if the third number is greater than the fourth number, the test data group has no data drift;
if the third number is less than the fourth number, data drift occurs in the test data set.
8. The method for discriminating data drift based on the nearest neighbor method as claimed in claim 4, wherein calculating the distance between the data to be tested and each data in the standard reference data set and the distance between the data to be tested and each remaining data in the test data set comprises:
calculating the distance between the data to be tested and each data in the standard reference data set and the distance between the data to be tested and each remaining data in the test data set based on the Euclidean distance formula:
$$D(x, y) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
wherein $D(x, y)$ represents the distance between the data to be tested and the corresponding data, $(x_1, y_1)$ represents the coordinates of the data to be tested, and $(x_2, y_2)$ represents the coordinates of the corresponding data.
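As a small illustration of the Euclidean distance named in claim 8, the following Python sketch computes the distance between two points given as coordinate pairs. The function name `euclidean_distance` is hypothetical, and the two-dimensional case is assumed because the claim describes each datum by two coordinates.

```python
import math

def euclidean_distance(tested, other):
    """Straight-line distance between two 2-D points.

    `tested` is (x1, y1), the data to be tested; `other` is (x2, y2),
    the datum it is compared against.
    """
    x1, y1 = tested
    x2, y2 = other
    return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)

print(euclidean_distance((0.0, 0.0), (3.0, 4.0)))  # prints 5.0
```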
9. The method according to claim 1, wherein the method further comprises:
and if data drift occurs in the test data set, sending the data drift result to the corresponding edge device so that the edge device performs corresponding data processing on the test data set.
10. A data drift discrimination device based on a nearest neighbor method, characterized by comprising:
a first acquisition module, used for acquiring a standard reference data set;
a second acquisition module, used for acquiring a test data set;
a first judging module, used for judging, for each data to be tested in the test data set and based on a nearest neighbor algorithm, the similarity between the data to be tested and the standard reference data set and the similarity between the data to be tested and the test data set;
and a second judging module, used for judging whether data drift occurs in the test data set according to the similarity judgment result of each data to be tested in the test data set.
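One way to picture the four modules of claim 10 is as four methods on a single object. The Python sketch below is an illustrative assumption only: the class name `DriftDiscriminator`, its method names, the default `k=3`, and the handling of equal vote counts are not taken from the patent.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]

class DriftDiscriminator:
    """Hypothetical mapping of the four claimed modules onto methods."""

    def __init__(self, k: int = 3) -> None:
        self.k = k  # preset parameter K (odd, per claim 5)
        self.reference: List[Point] = []
        self.test: List[Point] = []

    def acquire_reference(self, data: Sequence[Point]) -> None:
        # First acquisition module: standard reference data set.
        self.reference = list(data)

    def acquire_test(self, data: Sequence[Point]) -> None:
        # Second acquisition module: test data set.
        self.test = list(data)

    def judge_similarity(self) -> List[str]:
        # First judging module: k-nearest-neighbour vote per test point.
        results = []
        for i, point in enumerate(self.test):
            others = self.test[:i] + self.test[i + 1:]
            neighbours = [(math.dist(point, p), "reference") for p in self.reference]
            neighbours += [(math.dist(point, p), "test") for p in others]
            neighbours.sort(key=lambda pair: pair[0])
            nearest = [label for _, label in neighbours[: self.k]]
            ref_votes = nearest.count("reference")
            test_votes = nearest.count("test")
            if ref_votes > test_votes:
                results.append("reference")
            elif ref_votes < test_votes:
                results.append("test")
            else:
                results.append("tie")  # assumption: case not covered by the claims
        return results

    def judge_drift(self) -> Optional[bool]:
        # Second judging module: compare how many points resemble each set.
        results = self.judge_similarity()
        third_number = results.count("reference")
        fourth_number = results.count("test")
        if third_number == fourth_number:
            return None  # assumption: undecided when the counts are equal
        return fourth_number > third_number
```

A typical use would load a reference set with `acquire_reference`, load a batch with `acquire_test`, and then call `judge_drift`. A batch lying far from the reference points yields `True`, while a sparse batch scattered among dense reference points yields `False`.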
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010749770.1A CN112085053B (en) | 2020-07-30 | 2020-07-30 | Data drift discrimination method and device based on nearest neighbor method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010749770.1A CN112085053B (en) | 2020-07-30 | 2020-07-30 | Data drift discrimination method and device based on nearest neighbor method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112085053A (en) | 2020-12-15 |
CN112085053B (en) | 2022-08-26 |
Family
ID=73735200
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010749770.1A Active CN112085053B (en) | 2020-07-30 | 2020-07-30 | Data drift discrimination method and device based on nearest neighbor method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112085053B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170330109A1 (en) * | 2016-05-16 | 2017-11-16 | Purepredictive, Inc. | Predictive drift detection and correction |
CN109508733A (en) * | 2018-10-23 | 2019-03-22 | 北京邮电大学 | A kind of method for detecting abnormality based on distribution probability measuring similarity |
CN109686400A (en) * | 2018-12-14 | 2019-04-26 | 济南浪潮高新科技投资发展有限公司 | A kind of enrichment degree method of inspection, device and readable medium, storage control |
CN110149143A (en) * | 2019-05-16 | 2019-08-20 | 广东信通通信有限公司 | Test optical fiber data processing method, device, computer equipment and storage medium |
CN110909813A (en) * | 2019-11-29 | 2020-03-24 | 四川万益能源科技有限公司 | Business abnormal electricity utilization detection method based on edge algorithm |
US20200116522A1 (en) * | 2018-10-15 | 2020-04-16 | Kabushiki Kaisha Toshiba | Anomaly detection apparatus and anomaly detection method |
CN111143413A (en) * | 2019-12-26 | 2020-05-12 | 太原科技大学 | Anomaly detection method based on data flow concept drift |
Non-Patent Citations (1)
Title |
---|
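Liu Mao et al.: "A new concept drift detection method based on a distance measure over overlapping data windows", Journal of Computer Applications (《计算机应用》) * |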
Also Published As
Publication number | Publication date |
---|---|
CN112085053B (en) | 2022-08-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106951925B (en) | Data processing method, device, server and system | |
CN110210508B (en) | Model generation method, abnormal flow detection device, electronic device and computer-readable storage medium | |
EP3462267A1 (en) | Anomaly diagnosis method and anomaly diagnosis apparatus | |
Labatut et al. | Evaluation of performance measures for classifiers comparison | |
CN103582884A (en) | Robust feature matching for visual search | |
JP2010204966A (en) | Sampling device, sampling method, sampling program, class distinction device and class distinction system | |
CN109949176A (en) | It is a kind of based on figure insertion social networks in abnormal user detection method | |
CN107016416B (en) | Data classification prediction method based on neighborhood rough set and PCA fusion | |
JP5027859B2 (en) | Signal identification method and signal identification apparatus | |
KR101733708B1 (en) | Method and system for rating measured values taken from a system | |
CN110995153A (en) | Abnormal data detection method and device for photovoltaic power station and electronic equipment | |
CN110348215B (en) | Abnormal object identification method, abnormal object identification device, electronic equipment and medium | |
CN108470194A (en) | A kind of Feature Selection method and device | |
CN111161097A (en) | Method and device for detecting switch event based on event detection algorithm of hypothesis test | |
Colby et al. | Counterfactual Exploration for Improving Multiagent Learning. | |
CN111898637A (en) | Feature selection algorithm based on Relieff-DDC | |
CN109766958B (en) | A kind of data preprocessing method and device for data fusion | |
CN106919650A (en) | A kind of textural anomaly detection method of increment parallel type Dynamic Graph | |
CN112085053B (en) | Data drift discrimination method and device based on nearest neighbor method | |
CN105224954A (en) | A kind of topic discover method removing the impact of little topic based on Single-pass | |
CN116579842B (en) | Credit data analysis method and system based on user behavior data | |
CN115713270B (en) | Method and device for detecting and correcting peer mutual evaluation abnormal scores | |
CN117014193A (en) | Unknown Web attack detection method based on behavior baseline | |
CN112597699B (en) | Social network rumor source identification method integrated with objective weighting method | |
CN110990383A (en) | Similarity calculation method based on industrial big data set |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | Effective date of registration: 20220729. Address after: 250101 building S02, 1036 Chaochao Road, high tech Zone, Jinan City, Shandong Province. Applicant after: Shandong Inspur Scientific Research Institute Co.,Ltd. Address before: Floor 6, Chaochao Road, Shandong Province. Applicant before: JINAN INSPUR HIGH-TECH TECHNOLOGY DEVELOPMENT Co.,Ltd. |
GR01 | Patent grant | ||