CN112286951A - Data detection method and device - Google Patents
- Publication number
- CN112286951A (application CN202011350055.7A)
- Authority
- CN
- China
- Prior art keywords
- data
- detection
- target
- detection window
- current
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/2358—Change logging, detection, and notification
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/252—Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
Abstract
Embodiments of the invention provide a data detection method and device. A detection window traverses the current data stream with a preset step length that is smaller than the width of the detection window, so that detection results of target data in the current data stream are obtained in a plurality of detection windows. Whether the target data is abnormal is then determined according to those detection results. This effectively avoids misjudgment caused by data lying at the boundary of a detection window, improves the stability of abnormal data detection, and reduces the false alarm rate.
Description
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data detection method and apparatus.
Background
In industry, it is often necessary to periodically collect data from an inspection object by a collection device. However, due to the influence of the collection equipment factor, the environmental factor or the human factor, data with quality problems may exist in the collected data, and such data is referred to as abnormal data herein.
To improve the quality of the collected data, abnormal data needs to be identified in the collected data and then submitted for further manual confirmation. If the data is manually confirmed to be abnormal, it is processed according to a preset abnormal data processing strategy; if it is manually confirmed not to be abnormal, it is put back into the data sequence of the collected data.
Disclosure of Invention
In order to overcome the problems in the related art, the invention provides a data detection method and a data detection device.
According to a first aspect of the embodiments of the present invention, there is provided a data detection method, including:
traversing the current data stream by adopting the detection windows according to a preset step length, and acquiring detection results of target data in the current data stream in a plurality of detection windows; the preset step length is smaller than the width of the detection window;
and determining whether the target data is abnormal or not according to the detection results of the target data in a plurality of detection windows.
According to a second aspect of embodiments of the present invention, there is provided a data detection apparatus, including:
the acquisition module is used for traversing the current data stream by adopting the detection windows in a preset step length to acquire the detection results of the target data in the current data stream in the plurality of detection windows; the preset step length is smaller than the width of the detection window;
and the determining module is used for determining whether the target data is abnormal or not according to the detection results of the target data in the plurality of detection windows.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
In the embodiments of the invention, a detection window traverses the current data stream with a preset step length that is smaller than the width of the detection window, and detection results of the target data in the current data stream are obtained in a plurality of detection windows. Whether the target data is abnormal is determined according to those detection results. This effectively avoids misjudgment caused by data lying at the boundary of a detection window, improves the stability of abnormal data detection, and reduces the false alarm rate.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the specification.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present specification and together with the description, serve to explain the principles of the specification.
Fig. 1 is a flowchart illustrating a data detection method according to an embodiment of the present invention.
Fig. 2 is a diagram illustrating an example of a sliding process of a sliding window.
Fig. 3 is a functional block diagram of a data detection apparatus according to an embodiment of the present invention.
Fig. 4 is a hardware structure diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of embodiments of the invention, as detailed in the following claims.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of embodiments of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used to describe various information in embodiments of the present invention, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, the first information may also be referred to as second information, and similarly, the second information may also be referred to as first information, without departing from the scope of embodiments of the present invention. The word "if" as used herein may be interpreted as "when", "upon", or "in response to a determination", depending on the context.
The data detection method provided by the present invention is described in detail below with reference to examples.
Fig. 1 is a flowchart illustrating a data detection method according to an embodiment of the present invention. As shown in fig. 1, the data detection method may include:
S101, traversing the current data stream with a detection window at a preset step length to obtain detection results of target data in the current data stream in a plurality of detection windows; the preset step length is smaller than the width of the detection window.
S102, determining whether the target data are abnormal or not according to the detection results of the target data in a plurality of detection windows.
In this embodiment, the current data stream is the stream of data currently collected from the detection target.
In application, the acquisition device acquires data of a detection target at fixed time intervals, so that the time intervals between two adjacent data in the current data stream are equal. The width of the detection window may be equal to an integer multiple of the time interval of the data in the current data stream.
In this embodiment, the preset step size may be equal to an integer multiple of the time interval of the data in the current data stream, for example, the minimum step size may be equal to one time interval.
In application, the value of the preset step length can be determined according to application requirements. The smaller the preset step length, the more detections are performed and the more accurate the detection result, but the more computing resources are needed; the larger the preset step length, the fewer detections are performed and the fewer computing resources are needed, but the accuracy of the detection result decreases accordingly.
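The trade-off above can be made concrete by counting window positions. The following is a minimal sketch, not the patent's implementation; the function name `window_starts` and the choice to count only full windows are assumptions made for illustration.

```python
def window_starts(n_points: int, width: int, step: int) -> list[int]:
    """Start indices of every full detection window over n_points samples.

    Width and step are expressed in sampling intervals (an assumption that
    follows from both being integer multiples of the time interval).
    """
    if not 0 < step < width:
        raise ValueError("the preset step must be positive and smaller than the window width")
    return list(range(0, n_points - width + 1, step))


# 30 samples and a window 5 samples wide: a smaller step means more detections.
for step in (1, 2, 4):
    print(step, len(window_starts(30, 5, step)))  # prints 26, then 13, then 7
```

Each window position costs one detection pass, so the number of start indices is a rough proxy for the computing resources required.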
In this embodiment, each time the detection window slides by one step, the data in the detection window is detected once. As the detection window slides, the position of a given piece of data within the detection window changes. Fig. 2 is a diagram illustrating an example of the sliding process of a sliding window. As shown in fig. 2, assume that the sliding step of the detection window equals one time interval. In panel (a) of fig. 2, the data at time T is at the boundary of the detection window. After the detection window slides 2 more steps from the position in panel (a), it reaches the position shown in panel (b) of fig. 2, where the data at time T is in the middle of the detection window.
When the data in a detection window is detected, the data in the whole window usually reflects the pattern of the data in that period, and whether each piece of data in the window is normal can be judged against that pattern. When a piece of data lies at the boundary of the detection window, either the data preceding it or the data following it falls outside the window, so the window cannot fully reflect the pattern of the data segment to which the data belongs. Misjudgment is therefore likely, and normal data may be reported as abnormal. When the data lies at any position in the detection window other than the boundary, both its preceding data and its following data are inside the window, so the window fully reflects the pattern of the data segment and the detection accuracy is high.
If detection is performed with fixed detection windows (meaning that each piece of data belongs to only one detection window), each piece of data is detected only once, and data at the boundary of each detection window may be misjudged, so that data that is actually normal is judged to be abnormal.
In this embodiment, since the sliding step of the detection window is smaller than the width of the detection window, each piece of data can be detected at least twice, so that, with high probability, a detection result is obtained with the data located at a position in the detection window other than the boundary.
For example, in fig. 2, when the step size of the sliding of the detection windows is equal to one time interval, there are 5 detection windows covering the T-time data, and in each of the 5 detection windows, the T-time data is detected once, so that the detection result of the T-time data in the 5 detection windows can be obtained.
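The count of five covering windows can be reproduced with a short sketch. The window width of five intervals, the stream length of 30, and the index 10 standing in for time T are assumptions chosen to be consistent with the example; the function name is hypothetical.

```python
def windows_covering(t: int, n_points: int, width: int, step: int) -> list[tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, of full windows containing index t."""
    return [
        (start, start + width)
        for start in range(0, n_points - width + 1, step)
        if start <= t < start + width
    ]


covering = windows_covering(t=10, n_points=30, width=5, step=1)
for start, end in covering:
    # Position of index 10 inside the window: 0 or width - 1 means it sits on the boundary.
    print((start, end), "position in window:", 10 - start)
```

In the first covering window, (6, 11), the sample is the last element and therefore on the boundary; two steps later, in (8, 13), it is in the middle, which mirrors panels (a) and (b) of fig. 2.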
In this embodiment, whether the target data is abnormal is determined not from a single detection result of the target data but from its multiple detection results in the multiple detection windows, so that misjudgment caused by the data being at the boundary of a detection window can be effectively avoided, the stability of abnormal data detection is improved, and the false alarm rate is reduced.
In one example, obtaining detection results of target data in a current data stream in a plurality of detection windows may include:
determining a target detection window, wherein the target detection window covers the target data;
and determining the detection result of the target data in the target detection window according to the first data in the current data stream in the target detection window.
In this embodiment, the detection result of the target data in the detection window is determined according to the data in the same detection window as the target data, and the target data can be detected according to the rule of the data segment in which the target data is located, so as to obtain an accurate data detection result.
In this embodiment, the policy for detecting the target data based on the first data may be determined according to a specific application scenario, which is not limited in this embodiment.
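Because the detection policy is left open, any concrete rule is an assumption. As one hypothetical illustration, the sketch below flags the target when it deviates from the window median by more than `k` times the median absolute deviation (MAD); the rule, the function name `is_abnormal_in_window`, and the default `k = 3.0` are not taken from the source.

```python
from statistics import median


def is_abnormal_in_window(target: float, window: list[float], k: float = 3.0) -> bool:
    """Hypothetical per-window rule: compare the target with the other data in its window."""
    center = median(window)
    mad = median([abs(x - center) for x in window])
    if mad == 0:
        # All typical values coincide; anything different from them stands out.
        return target != center
    return abs(target - center) > k * mad


first_data = [10.0, 10.2, 9.9, 10.1, 25.0]  # data of the current stream inside one window
print(is_abnormal_in_window(25.0, first_data))
print(is_abnormal_in_window(10.2, first_data))
```

Any other rule that judges the target against the data sharing its window could be substituted without changing the surrounding procedure.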
In some cases, a piece of data may be judged abnormal because it differs greatly from the other data in the same detection window, even though it is not abnormal when viewed against historical data.
For example, the sales data of a business on a certain day of the year may surge compared with the sales data of the preceding and following days, but the business runs a promotion on that day every year. The surge therefore appears abnormal when viewed within the current data stream, but normal when viewed against the historical data of the same day.
Therefore, in this embodiment, it may also be detected whether the target data is abnormal based on the current data stream where the target data is located and the historical data stream that is in the same period as the target data.
Based on this, in one example, obtaining detection results of the target data in the current data stream in a plurality of detection windows may include:
determining a target detection window, wherein the target detection window covers the target data;
and determining the detection result of the target data in the target detection window according to the first data in the current data stream in the target detection window and the second data in the historical data stream in the target detection window.
In this embodiment, after the current data stream and the historical data stream are time-aligned, the current data stream and the historical data stream may be traversed simultaneously by using a detection window with a preset step length. The first data is within the same detection window as the second data, which is a contemporaneous history of the first data.
For example, the current data stream is the daily actual water supply of a water supply enterprise throughout 2019, and the historical data stream is the daily actual water supply of the same enterprise throughout 2018. Assume that the width of the detection window is 30 days and the target data is the data of October 15, 2019. When the detection window covers October 1 to October 30, the first data is the data from October 1, 2019 to October 30, 2019, and the second data is the data from October 1, 2018 to October 30, 2018.
Determining the detection result of the target data based on both the current data stream and the historical data stream takes into account the correlation between the pattern of the current data and the pattern of the historical data of the same period, which can further improve the stability of data detection and reduce the misjudgment rate.
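The time alignment in this example can be sketched as follows. The helper names are hypothetical, and treating "the same period" as the same month and day one year earlier is an assumption; February 29 would need separate handling.

```python
from datetime import date, timedelta


def window_dates(start: date, width_days: int) -> list[date]:
    """Dates covered by a detection window that starts on `start`."""
    return [start + timedelta(days=i) for i in range(width_days)]


def same_day_previous_year(d: date) -> date:
    # Assumption: contemporaneous history means the same calendar day one year earlier.
    # date.replace raises ValueError for February 29 when the earlier year is not a leap year.
    return d.replace(year=d.year - 1)


first_dates = window_dates(date(2019, 10, 1), 30)                # window over the current stream
second_dates = [same_day_previous_year(d) for d in first_dates]  # aligned historical window

print(first_dates[0], first_dates[-1])    # 2019-10-01 2019-10-30
print(second_dates[0], second_dates[-1])  # 2018-10-01 2018-10-30
```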
In one example, determining a detection result of the target data in the target detection window according to first data in a current data stream within the target detection window and second data in a historical data stream within the target detection window may include:
if the detection result of the target data in the target detection window determined according to the first data and the detection result of the target data in the target detection window determined according to the second data both indicate that the target data is abnormal, determining that the detection result of the target data in the target detection window indicates that the target data is abnormal;
and if at least one of the detection result of the target data in the target detection window determined according to the first data and the detection result of the target data in the target detection window determined according to the second data indicates that the target data is normal, determining that the detection result of the target data in the target detection window indicates that the target data is normal.
In this embodiment, when at least one of the detection result based on the current data and the detection result based on the historical data indicates that the target data is normal, it is determined that the target data is normal, so that the stability of data detection can be improved, and the false alarm rate can be reduced.
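The combination rule for one detection window amounts to a logical AND of the two per-window results. A minimal sketch, with a hypothetical function name and `True` standing for "abnormal":

```python
def combine_window_results(abnormal_vs_current: bool, abnormal_vs_history: bool) -> bool:
    """Result for one window: abnormal only if both comparisons say abnormal."""
    return abnormal_vs_current and abnormal_vs_history


# Promotion-day case: a surge relative to the current window, but typical for that day historically.
print(combine_window_results(abnormal_vs_current=True, abnormal_vs_history=False))  # False
```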
To ensure reliable detection, the amount of data in the detection window needs to reach a certain level. When the data is sparse, detection may be inaccurate because the amount of data in the detection window does not reach that level.
In one example, traversing the current data stream with a detection window in a preset step size includes:
in the traversing process, determining whether the data volume in the current detection window is larger than or equal to a preset data volume threshold value;
if not, increasing the width of the current detection window; and if so, keeping the width of the current detection window unchanged.
In the embodiment, when the data volume in the current detection window is less, the data volume in the current detection window can be increased by increasing the width of the current detection window, so that the detection accuracy is improved.
In this embodiment, the data amount threshold may be set according to a specific application scenario.
Wherein, the increasing amplitude of the width of the detection window can be determined according to application requirements.
In one example, increasing the width of the current detection window may include:
increasing the width of the current detection window to a first width so that the data amount in the current detection window is equal to the data amount threshold value; the first width is less than or equal to a preset width threshold.
In this embodiment, the principle of increasing the width of the detection window is as follows: and under the condition that the data volume in the detection window meets the magnitude requirement required by the detection precision, the detection window is made as small as possible. Therefore, the detection calculation amount of each detection window is reduced as much as possible, and the requirement on calculation resources is reduced.
In one example, increasing the width of the current detection window may include:
if, after the width of the current detection window has been increased to a preset width threshold, the data volume in the current detection window is still smaller than the data volume threshold, keeping the width of the current detection window equal to the width threshold.
The width of the detection window needs to be adjusted within a certain range, and the width of the detection window cannot be adjusted infinitely. In this embodiment, the adjustment range of the detection window width is limited by the width threshold, and if the detection window width has been adjusted to the upper limit of the adjustment range (i.e., the width threshold), the data amount in the detection window still does not reach the minimum level requirement, the detection window width is not increased any more.
Furthermore, when the width of a detection window has reached the width threshold but the amount of data in the window is still less than the aforementioned data amount threshold, the data in that window is not detected and may be marked as having an unknown state, so that it can later be detected by other means or skipped, as required.
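The width adjustment described above can be sketched as follows. This is an illustration under stated assumptions: missing samples are represented as `None`, the window grows one sampling interval at a time, the initial width does not exceed the width threshold, and the names `fit_window_width`, `min_count`, and `max_width` are hypothetical.

```python
from typing import Optional, Sequence


def fit_window_width(
    stream: Sequence[Optional[float]],
    start: int,
    width: int,
    min_count: int,
    max_width: int,
) -> tuple[int, bool]:
    """Widen the window just enough to hold min_count samples, never beyond max_width.

    Returns (final width, whether the data amount threshold was reached).
    When the second value is False, the data in the window would be marked as state unknown.
    """
    def count(w: int) -> int:
        return sum(v is not None for v in stream[start:start + w])

    while count(width) < min_count and width < max_width:
        width += 1  # smallest possible growth keeps the window as small as possible
    return width, count(width) >= min_count


sparse = [1.0, None, None, 2.0, None, 3.0, 4.0, None, None, None, None, None]
print(fit_window_width(sparse, start=0, width=3, min_count=3, max_width=8))  # (6, True)
print(fit_window_width(sparse, start=6, width=3, min_count=3, max_width=5))  # (5, False)
```

Growing by one interval at a time stops at the first width whose data amount equals the threshold, which matches the stated principle of keeping the window as small as possible.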
In one example, determining whether the target data is abnormal according to the detection result of the target data in a plurality of detection windows may include:
if the detection results of the target data in the multiple detection windows indicate that the target data are abnormal, determining that the target data are abnormal data;
and if at least one detection result of the target data in the detection results of the plurality of detection windows indicates that the target data is normal, determining that the target data is normal data.
In the embodiment, as long as one detection result in the detection results of the plurality of detection windows indicates that the target data is normal, the target data is determined to be normal data, so that the stability of data detection is effectively improved, and the false alarm rate is reduced.
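The final decision rule can be written in a few lines. In this sketch `True` stands for "abnormal in that window"; the function name and the choice to treat an empty result list as not abnormal are assumptions.

```python
def is_target_abnormal(window_results: list[bool]) -> bool:
    """Abnormal overall only if every covering window reported the target as abnormal."""
    # Assumption: with no detection results at all, the target is not reported as abnormal.
    return bool(window_results) and all(window_results)


print(is_target_abnormal([True, True, True, True, True]))   # True
print(is_target_abnormal([True, True, False, True, True]))  # False: one normal result suffices
```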
In one example, before the detection results of the target data in the current data stream in the plurality of detection windows are obtained, the method further includes:
detecting the unmarked data and the data marked as abnormal in the detection window once every time the detection window slides by one step;
marking corresponding data according to the detection results of the unmarked data and the data marked as abnormal in the current detection window, and keeping the mark of the data marked as normal in the detection window unchanged.
For example, the data in the first detection window corresponds to times T1, T2, T3, T4 and T5. None of these 5 pieces of data has been marked, so all 5 are detected in the first detection window. After detection, the data at times T1 and T5 is marked as abnormal, and the data at times T2, T3 and T4 is marked as normal. The data in the second detection window corresponds to times T2, T3, T4, T5 and T6. In this detection window, the data at times T2, T3 and T4 is not detected again; only the data at time T5 (marked as abnormal) and at time T6 (unmarked) is detected. If the detection result of the data at time T5 in the second detection window is normal, the data at time T5 is marked as normal and does not need to be detected in subsequent detection windows.
In this embodiment, only unmarked data and data marked as abnormal in the detection window are detected, and data marked as normal in the detection window is not detected, so that the calculation amount can be reduced, and the calculation resources can be saved.
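The T1 to T6 example can be replayed with a short sketch. The detector here is a toy stand-in chosen only because it misjudges boundary samples of a steadily rising series, which reproduces the marks in the example; it is not the patent's detector, and all names are hypothetical.

```python
from typing import Callable, Sequence

Detector = Callable[[Sequence[float], int], bool]  # (window values, position) -> abnormal?


def neighbor_detector(window: Sequence[float], pos: int) -> bool:
    """Toy rule: abnormal if the value differs from the mean of its in-window neighbours by more than 0.5."""
    neighbors = [window[i] for i in (pos - 1, pos + 1) if 0 <= i < len(window)]
    return abs(window[pos] - sum(neighbors) / len(neighbors)) > 0.5


def sweep_skipping_normal(
    stream: Sequence[float], width: int, step: int, detect: Detector
) -> tuple[dict[int, str], int]:
    """Slide the window; re-detect only unmarked data and data marked abnormal."""
    marks: dict[int, str] = {}
    checks = 0
    for start in range(0, len(stream) - width + 1, step):
        window = stream[start:start + width]
        for pos in range(width):
            idx = start + pos
            if marks.get(idx) == "normal":
                continue  # the normal mark is kept unchanged and the data is not detected again
            marks[idx] = "abnormal" if detect(window, pos) else "normal"
            checks += 1
    return marks, checks


# Indices 0..5 stand for times T1..T6.
marks, checks = sweep_skipping_normal([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], width=5, step=1, detect=neighbor_detector)
print(marks)   # T5 (index 4) ends up normal after the second window
print(checks)  # 7 detections instead of 10 for two full windows
```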
In one example, before the detection results of the target data in the current data stream in the plurality of detection windows are obtained, the method may further include:
and detecting all data in the detection window once every time the detection window slides by one step, and marking all data in the detection window according to a detection result.
In this embodiment, all data in the detection window are detected, and a sufficient number of detection results can be obtained, so that the accuracy of the final detection result is higher.
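Putting the pieces together, the sketch below detects every sample in every window, collects the per-window results, and reports a sample as abnormal only when all of its results are abnormal. The per-window rule (distance from the window median above a fixed tolerance), the tolerance value, and the sample data are assumptions for illustration; the run with `step` equal to `width` is included only as the fixed-window baseline for comparison.

```python
from collections import defaultdict
from statistics import median
from typing import Sequence


def abnormal_in_window(window: Sequence[float], pos: int, tolerance: float = 4.0) -> bool:
    """Hypothetical rule: abnormal if farther than `tolerance` from the window median."""
    return abs(window[pos] - median(window)) > tolerance


def detect_stream(stream: Sequence[float], width: int, step: int) -> list[int]:
    """Return indices judged abnormal in every detection window that covers them."""
    results: dict[int, list[bool]] = defaultdict(list)
    for start in range(0, len(stream) - width + 1, step):
        window = stream[start:start + width]
        for pos in range(width):
            results[start + pos].append(abnormal_in_window(window, pos))
    return [idx for idx, flags in sorted(results.items()) if all(flags)]


# A level shift: the first sample of the new level sits on a window boundary.
level_shift = [1, 1, 1, 1, 9, 9, 9, 9, 9, 9]
fixed = detect_stream(level_shift, width=5, step=5)    # fixed windows, each sample seen once
sliding = detect_stream(level_shift, width=5, step=1)  # overlapping windows
print(fixed)    # [4]
print(sliding)  # []
```

With fixed windows, index 4 is reported because the data that follows it lies outside its only window; with overlapping windows, at least one window shows it together with the following data, so it is not reported.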
In the data detection method provided by the embodiments of the invention, a detection window traverses the current data stream with a preset step length that is smaller than the width of the detection window, and detection results of the target data in the current data stream are obtained in a plurality of detection windows. Whether the target data is abnormal is determined according to those detection results. This effectively avoids misjudgment caused by data lying at the boundary of a detection window, improves the stability of abnormal data detection, and reduces the false alarm rate.
Based on the above method embodiment, the embodiment of the present invention further provides corresponding apparatus, device, and storage medium embodiments. For detailed implementation of the embodiments of the apparatus, device and storage medium of the embodiments of the present invention, please refer to the corresponding descriptions in the foregoing method embodiments.
Fig. 3 is a functional block diagram of a data detection apparatus according to an embodiment of the present invention. As shown in fig. 3, in this embodiment, the apparatus may include:
an obtaining module 310, configured to traverse a current data stream by using a detection window with a preset step length, and obtain detection results of target data in the current data stream in multiple detection windows; the preset step length is smaller than the width of the detection window;
the determining module 320 is configured to determine whether the target data is abnormal according to detection results of the target data in multiple detection windows.
In one example, the obtaining module 310 may be specifically configured to:
determining a target detection window, wherein the target detection window covers the target data;
and determining the detection result of the target data in the target detection window according to the first data in the current data stream in the target detection window.
In one example, the obtaining module 310 may be specifically configured to:
determining a target detection window, wherein the target detection window covers the target data;
and determining the detection result of the target data in the target detection window according to the first data in the current data stream in the target detection window and the second data in the historical data stream in the target detection window.
In one example, the obtaining module 310 may be specifically configured to:
in the traversing process, determining whether the data volume in the current detection window is larger than or equal to a preset data volume threshold value;
if not, increasing the width of the current detection window; and if so, keeping the width of the current detection window unchanged.
In one example, increasing the width of the current detection window includes:
increasing the width of the current detection window to a first width so that the data amount in the current detection window is equal to the data amount threshold value; the first width is less than or equal to a preset width threshold.
In one example, increasing the width of the current detection window includes:
if, after the width of the current detection window has been increased to a preset width threshold, the data volume in the current detection window is still smaller than the data volume threshold, keeping the width of the current detection window equal to the width threshold.
In one example, the determining module 320 may be specifically configured to:
if the detection results of the target data in the multiple detection windows indicate that the target data are abnormal, determining that the target data are abnormal data;
and if at least one detection result of the target data in the detection results of the plurality of detection windows indicates that the target data is normal, determining that the target data is normal data.
In one example, determining a detection result of the target data in the target detection window according to first data in a current data stream within the target detection window and second data in a historical data stream within the target detection window includes:
if the detection result of the target data in the target detection window determined according to the first data and the detection result of the target data in the target detection window determined according to the second data both indicate that the target data is abnormal, determining that the detection result of the target data in the target detection window indicates that the target data is abnormal;
and if at least one of the detection result of the target data in the target detection window determined according to the first data and the detection result of the target data in the target detection window determined according to the second data indicates that the target data is normal, determining that the detection result of the target data in the target detection window indicates that the target data is normal.
In one example, the apparatus further includes:
the first detection module is used for detecting the unmarked data and the data marked as abnormal in the detection window once when the detection window slides by one step;
and the first marking module is used for marking corresponding data according to the detection results of the unmarked data and the data marked as abnormal in the current detection window and keeping the mark of the data marked as normal in the detection window unchanged.
In one example, the apparatus further comprises:
a second detection module, configured to detect all data in the detection window each time the detection window slides by one step;
and a second marking module, configured to mark each piece of data in the detection window according to the detection result.
An embodiment of the invention also provides an electronic device. Fig. 4 is a hardware structure diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 4, the electronic device includes: an internal bus 401, and a memory 402, a processor 403, and an external interface 404 connected through the internal bus.
The processor 403 is configured to read the machine-readable instructions in the memory 402 and execute the instructions to implement the following operations:
traversing the current data stream with a detection window at a preset step size, and obtaining detection results of target data in the current data stream in a plurality of detection windows, wherein the preset step size is smaller than the width of the detection window;
and determining whether the target data is abnormal according to the detection results of the target data in the plurality of detection windows.
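The traversal described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the text does not fix a per-window detection algorithm, so `window_flags` (a median-deviation check with a tolerance `tol`) is a hypothetical stand-in, and the window is assumed to be measured in number of samples rather than in time. `collect_results` slides the window with a step smaller than its width, so interior points are covered by several windows and receive several detection results.

```python
from statistics import median


def window_flags(values, start, width, tol=3.0):
    """Hypothetical per-window detector (an assumption, not from the patent).

    Flags a point when its distance from the window median exceeds
    `tol` times the median absolute deviation of the window.
    """
    window = values[start:start + width]
    med = median(window)
    mad = median(abs(v - med) for v in window) or 1.0  # avoid division by zero
    return {start + i: abs(v - med) / mad > tol for i, v in enumerate(window)}


def collect_results(values, width, step, detector=window_flags):
    """Slide a window over `values` and gather every result for each point."""
    if not 0 < step < width:
        raise ValueError("step must be positive and smaller than the window width")
    results = {i: [] for i in range(len(values))}
    start = 0
    while start < len(values):
        for idx, abnormal in detector(values, start, width).items():
            results[idx].append(abnormal)
        if start + width >= len(values):
            break  # the last window already reaches the end of the stream
        start += step
    return results


values = [10, 11, 10, 12, 11, 50, 10, 11, 12, 10]
results = collect_results(values, width=4, step=2)
print(results[5])  # results for the spike at index 5, one per covering window
```

With `width=4` and `step=2`, each interior point is examined by two overlapping windows, which is what makes a later multi-window decision possible.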
In one example, obtaining detection results of target data in a current data stream in a plurality of detection windows includes:
determining a target detection window, wherein the target detection window covers the target data;
and determining the detection result of the target data in the target detection window according to the first data in the current data stream in the target detection window.
In one example, obtaining detection results of target data in a current data stream in a plurality of detection windows includes:
determining a target detection window, wherein the target detection window covers the target data;
and determining the detection result of the target data in the target detection window according to the first data in the current data stream in the target detection window and the second data in the historical data stream in the target detection window.
In one example, traversing the current data stream with a detection window in a preset step size includes:
in the traversing process, determining whether the data volume in the current detection window is larger than or equal to a preset data volume threshold value;
if not, increasing the width of the current detection window; and if so, keeping the width of the current detection window unchanged.
In one example, increasing the width of the current detection window includes:
increasing the width of the current detection window to a first width so that the data amount in the current detection window is equal to the data amount threshold value; the first width is less than or equal to a preset width threshold.
In one example, increasing the width of the current detection window includes:
if, after the width of the current detection window has been increased to a preset width threshold, the data volume in the current detection window is still smaller than the data volume threshold, keeping the width of the current detection window equal to the width threshold.
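The width adjustment in the preceding examples can be sketched as follows. This is an illustration under stated assumptions: the window is taken to be a closed time interval `[start, start + width]` over sorted numeric timestamps, and the names `adapt_width`, `min_count` (the data volume threshold) and `max_width` (the width threshold) are hypothetical.

```python
from bisect import bisect_left, bisect_right


def adapt_width(timestamps, start, width, min_count, max_width):
    """Return the width to use for the window that begins at `start`.

    Assumes `timestamps` is sorted ascending and `max_width >= width`.
    """
    lo = bisect_left(timestamps, start)
    count = bisect_right(timestamps, start + width) - lo
    if count >= min_count:
        return width  # enough data: keep the width unchanged
    k = lo + min_count - 1  # index of the point that completes the quota
    if k < len(timestamps):
        needed = timestamps[k] - start  # smallest width that reaches that point
        if needed <= max_width:
            return needed
    return max_width  # still too little data: cap at the width threshold


ts = [0, 1, 2, 10, 11, 30]
print(adapt_width(ts, start=0, width=5, min_count=4, max_width=20))
```

If several points share the timestamp that completes the quota, the widened window holds slightly more than `min_count` points; the sketch accepts that rather than splitting ties.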
In one example, determining whether the target data is abnormal according to the detection results of the target data in a plurality of detection windows includes:
if the detection results of the target data in the plurality of detection windows all indicate that the target data is abnormal, determining that the target data is abnormal data;
and if at least one detection result of the target data in the detection results of the plurality of detection windows indicates that the target data is normal, determining that the target data is normal data.
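The decision rule above amounts to a logical AND over the per-window results. A minimal sketch, in which the function name `is_abnormal` and the treatment of an empty result list as "not abnormal" are assumptions:

```python
def is_abnormal(window_results):
    """Abnormal only if every covering window reported the point as abnormal."""
    results = list(window_results)
    # Assumption: a point with no window results is not reported as abnormal.
    return bool(results) and all(results)


print(is_abnormal([True, True, True]))
print(is_abnormal([True, False, True]))
```

A single "normal" verdict is enough to clear the point, which biases the rule toward fewer false alarms.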
In one example, determining a detection result of the target data in the target detection window according to first data in a current data stream within the target detection window and second data in a historical data stream within the target detection window includes:
if the detection result of the target data in the target detection window determined according to the first data and the detection result of the target data in the target detection window determined according to the second data both indicate that the target data is abnormal, determining that the detection result of the target data in the target detection window indicates that the target data is abnormal;
and if at least one of the detection result of the target data in the target detection window determined according to the first data and the detection result of the target data in the target detection window determined according to the second data indicates that the target data is normal, determining that the detection result of the target data in the target detection window indicates that the target data is normal.
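Combining the current and historical streams inside one window can be sketched in the same way. The check `deviates` (distance from the reference mean measured in standard deviations, with factor `k`) is a hypothetical detector chosen for illustration, and the reference samples are assumed not to include the target itself.

```python
from statistics import mean, pstdev


def deviates(target, reference, k=3.0):
    """Hypothetical check: is `target` more than k std devs from the reference mean?"""
    if len(reference) < 2:
        return False  # assumption: too little data to call anything abnormal
    mu, sigma = mean(reference), pstdev(reference)
    if sigma == 0:
        return target != mu
    return abs(target - mu) > k * sigma


def window_result(target, first_data, second_data):
    """Abnormal in this window only if both streams say so.

    first_data:  samples of the current data stream inside the window
    second_data: samples of the historical data stream inside the same window
    """
    return deviates(target, first_data) and deviates(target, second_data)


current = [10, 11, 10, 12, 11]
quiet_history = [10, 10, 11, 12, 11]
spiky_history = [10, 48, 52, 11, 50, 49]
print(window_result(50, current, quiet_history))
print(window_result(50, current, spiky_history))
```

In the second call the historical stream shows that values near 50 are usual for this window, so the window result is "normal" even though the value stands out in the current stream.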
In one example, before obtaining the detection results of the target data in the current data stream in the plurality of detection windows, the operations further include:
each time the detection window slides by one step, detecting the unmarked data and the data marked as abnormal in the detection window;
marking the corresponding data according to the detection results of the unmarked data and the data marked as abnormal in the current detection window, and keeping the marks of data already marked as normal in the detection window unchanged.
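The incremental marking scheme can be sketched as below. The `marks` dictionary, the string labels, and the `detect` callback are hypothetical names introduced for illustration. Points already marked normal are skipped, so they are never detected again and their mark never changes.

```python
def update_marks(marks, window_indices, detect):
    """Update marks for one window position.

    marks:          dict mapping index -> "normal" or "abnormal" (missing = unmarked)
    window_indices: indices covered by the current window
    detect:         callable(index) -> True if abnormal in the current window
    """
    for idx in window_indices:
        if marks.get(idx) == "normal":
            continue  # already cleared by an earlier window: keep the mark
        marks[idx] = "abnormal" if detect(idx) else "normal"
    return marks


marks = {}
update_marks(marks, [0, 1, 2], lambda i: i in {1, 2})
update_marks(marks, [1, 2, 3], lambda i: i == 2)
print(marks)
```

Because a "normal" mark is final, a point still marked abnormal after the window has passed it was abnormal in every window that covered it, which matches the all-windows rule while avoiding repeated detection of cleared points.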
In one example, before obtaining the detection results of the target data in the current data stream in the plurality of detection windows, the operations further include:
each time the detection window slides by one step, detecting all data in the detection window and marking all data in the detection window according to the detection result.
An embodiment of the present invention further provides a computer-readable storage medium, where a plurality of computer instructions are stored on the computer-readable storage medium, and when executed, the computer instructions perform the following processing:
traversing the current data stream with a detection window at a preset step size, and obtaining detection results of target data in the current data stream in a plurality of detection windows, wherein the preset step size is smaller than the width of the detection window;
and determining whether the target data is abnormal according to the detection results of the target data in the plurality of detection windows.
In one example, obtaining detection results of target data in a current data stream in a plurality of detection windows includes:
determining a target detection window, wherein the target detection window covers the target data;
and determining the detection result of the target data in the target detection window according to the first data in the current data stream in the target detection window.
In one example, obtaining detection results of target data in a current data stream in a plurality of detection windows includes:
determining a target detection window, wherein the target detection window covers the target data;
and determining the detection result of the target data in the target detection window according to the first data in the current data stream in the target detection window and the second data in the historical data stream in the target detection window.
In one example, traversing the current data stream with a detection window in a preset step size includes:
in the traversing process, determining whether the data volume in the current detection window is larger than or equal to a preset data volume threshold value;
if not, increasing the width of the current detection window; and if so, keeping the width of the current detection window unchanged.
In one example, increasing the width of the current detection window includes:
increasing the width of the current detection window to a first width so that the data amount in the current detection window is equal to the data amount threshold value; the first width is less than or equal to a preset width threshold.
In one example, increasing the width of the current detection window includes:
if, after the width of the current detection window has been increased to a preset width threshold, the data volume in the current detection window is still smaller than the data volume threshold, keeping the width of the current detection window equal to the width threshold.
In one example, determining whether the target data is abnormal according to the detection results of the target data in a plurality of detection windows includes:
if the detection results of the target data in the plurality of detection windows all indicate that the target data is abnormal, determining that the target data is abnormal data;
and if at least one detection result of the target data in the detection results of the plurality of detection windows indicates that the target data is normal, determining that the target data is normal data.
In one example, determining a detection result of the target data in the target detection window according to first data in a current data stream within the target detection window and second data in a historical data stream within the target detection window includes:
if the detection result of the target data in the target detection window determined according to the first data and the detection result of the target data in the target detection window determined according to the second data both indicate that the target data is abnormal, determining that the detection result of the target data in the target detection window indicates that the target data is abnormal;
and if at least one of the detection result of the target data in the target detection window determined according to the first data and the detection result of the target data in the target detection window determined according to the second data indicates that the target data is normal, determining that the detection result of the target data in the target detection window indicates that the target data is normal.
In one example, before obtaining the detection results of the target data in the current data stream in the plurality of detection windows, the processing further includes:
each time the detection window slides by one step, detecting the unmarked data and the data marked as abnormal in the detection window;
marking the corresponding data according to the detection results of the unmarked data and the data marked as abnormal in the current detection window, and keeping the marks of data already marked as normal in the detection window unchanged.
In one example, before obtaining the detection results of the target data in the current data stream in the plurality of detection windows, the processing further includes:
each time the detection window slides by one step, detecting all data in the detection window and marking all data in the detection window according to the detection result.
Because the device and apparatus embodiments correspond substantially to the method embodiments, the relevant details can be found in the corresponding parts of the method embodiment description. The apparatus embodiments described above are merely illustrative: the modules described as separate parts may or may not be physically separate, and the parts shown as modules may or may not be physical modules; they may be located in one place or distributed across a plurality of network modules. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in this specification. One of ordinary skill in the art can understand and implement it without inventive effort.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Other embodiments of the present description will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This specification is intended to cover any variations, uses, or adaptations of the specification following, in general, the principles of the specification and including such departures from the present disclosure as come within known or customary practice within the art to which the specification pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the specification being indicated by the following claims.
It will be understood that the present description is not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present description is limited only by the appended claims.
The above description is only a preferred embodiment of the present disclosure, and should not be taken as limiting the present disclosure, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.
Claims (11)
1. A method for data detection, comprising:
traversing the current data stream with a detection window at a preset step size, and obtaining detection results of target data in the current data stream in a plurality of detection windows, wherein the preset step size is smaller than the width of the detection window;
and determining whether the target data is abnormal according to the detection results of the target data in the plurality of detection windows.
2. The method of claim 1, wherein obtaining detection results of target data in a current data stream in a plurality of detection windows comprises:
determining a target detection window, wherein the target detection window covers the target data;
and determining the detection result of the target data in the target detection window according to the first data in the current data stream in the target detection window.
3. The method of claim 1, wherein obtaining detection results of target data in a current data stream in a plurality of detection windows comprises:
determining a target detection window, wherein the target detection window covers the target data;
and determining the detection result of the target data in the target detection window according to the first data in the current data stream in the target detection window and the second data in the historical data stream in the target detection window.
4. The method of claim 1, wherein traversing the current data stream with a detection window in a preset step size comprises:
in the traversing process, determining whether the data volume in the current detection window is larger than or equal to a preset data volume threshold value;
if not, increasing the width of the current detection window; and if so, keeping the width of the current detection window unchanged.
5. The method of claim 4, wherein increasing the width of the current detection window comprises:
increasing the width of the current detection window to a first width so that the data amount in the current detection window is equal to the data amount threshold value; the first width is less than or equal to a preset width threshold.
6. The method of claim 4, wherein increasing the width of the current detection window comprises:
if, after the width of the current detection window has been increased to a preset width threshold, the data volume in the current detection window is still smaller than the data volume threshold, keeping the width of the current detection window equal to the width threshold.
7. The method of claim 1, wherein determining whether the target data is abnormal according to the detection results of the target data in a plurality of detection windows comprises:
if the detection results of the target data in the plurality of detection windows all indicate that the target data is abnormal, determining that the target data is abnormal data;
and if at least one detection result of the target data in the detection results of the plurality of detection windows indicates that the target data is normal, determining that the target data is normal data.
8. The method of claim 3, wherein determining the detection result of the target data in the target detection window according to the first data in the current data stream in the target detection window and the second data in the historical data stream in the target detection window comprises:
if the detection result of the target data in the target detection window determined according to the first data and the detection result of the target data in the target detection window determined according to the second data both indicate that the target data is abnormal, determining that the detection result of the target data in the target detection window indicates that the target data is abnormal;
and if at least one of the detection result of the target data in the target detection window determined according to the first data and the detection result of the target data in the target detection window determined according to the second data indicates that the target data is normal, determining that the detection result of the target data in the target detection window indicates that the target data is normal.
9. The method of claim 1, wherein before obtaining the detection results of the target data in the current data stream in the plurality of detection windows, the method further comprises:
each time the detection window slides by one step, detecting the unmarked data and the data marked as abnormal in the detection window;
marking the corresponding data according to the detection results of the unmarked data and the data marked as abnormal in the current detection window, and keeping the marks of data already marked as normal in the detection window unchanged.
10. The method of claim 1, wherein before obtaining the detection results of the target data in the current data stream in the plurality of detection windows, the method further comprises:
each time the detection window slides by one step, detecting all data in the detection window and marking all data in the detection window according to the detection result.
11. A data detection apparatus, comprising:
an acquisition module, configured to traverse the current data stream with a detection window at a preset step size and obtain detection results of target data in the current data stream in a plurality of detection windows, wherein the preset step size is smaller than the width of the detection window;
and a determining module, configured to determine whether the target data is abnormal according to the detection results of the target data in the plurality of detection windows.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011350055.7A CN112286951A (en) | 2020-11-26 | 2020-11-26 | Data detection method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112286951A (en) | 2021-01-29 |
Family
ID=74426412
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011350055.7A Pending CN112286951A (en) | 2020-11-26 | 2020-11-26 | Data detection method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112286951A (en) |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130085715A1 (en) * | 2011-09-29 | 2013-04-04 | Choudur Lakshminarayan | Anomaly detection in streaming data |
WO2015119607A1 (en) * | 2014-02-06 | 2015-08-13 | Hewlett-Packard Development Company, L.P. | Resource management |
US9298788B1 (en) * | 2013-03-11 | 2016-03-29 | DataTorrent, Inc. | Checkpointing in distributed streaming platform for real-time applications |
CN107682319A (en) * | 2017-09-13 | 2018-02-09 | 桂林电子科技大学 | A kind of method of data flow anomaly detection and multiple-authentication based on enhanced angle Outlier factor |
CN109587001A (en) * | 2018-11-15 | 2019-04-05 | 新华三信息安全技术有限公司 | A kind of performance indicator method for detecting abnormality and device |
EP3623964A1 (en) * | 2018-09-14 | 2020-03-18 | Verint Americas Inc. | Framework for the automated determination of classes and anomaly detection methods for time series |
US20200097852A1 (en) * | 2018-09-20 | 2020-03-26 | Cable Television Laboratories, Inc. | Systems and methods for detecting and grouping anomalies in data |
CN110928255A (en) * | 2019-11-20 | 2020-03-27 | 珠海格力电器股份有限公司 | Data anomaly statistical alarm method and device and electronic equipment |
CN111178456A (en) * | 2020-01-15 | 2020-05-19 | 腾讯科技(深圳)有限公司 | Abnormal index detection method and device, computer equipment and storage medium |
WO2020134032A1 (en) * | 2018-12-28 | 2020-07-02 | 中国银联股份有限公司 | Method for detecting abnormality of service system, and apparatus therefor |
CN111400721A (en) * | 2020-03-24 | 2020-07-10 | 杭州数梦工场科技有限公司 | API interface detection method and device |
CN111538897A (en) * | 2020-03-16 | 2020-08-14 | 北京三快在线科技有限公司 | Recommended abnormality detection method and device, electronic equipment and readable storage medium |
US20200314159A1 (en) * | 2019-03-29 | 2020-10-01 | Paypal, Inc. | Anomaly detection for streaming data |
CN111858680A (en) * | 2020-08-01 | 2020-10-30 | 西安交通大学 | System and method for rapidly detecting satellite telemetry time sequence data abnormity in real time |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11599825B2 (en) | Method and apparatus for training trajectory classification model, and electronic device | |
CN109086734B (en) | Method and device for positioning pupil image in human eye image | |
CN104978578A (en) | Mobile phone photo taking text image quality evaluation method | |
CN109410172B (en) | Paper thickness detection method and device, storage medium and processor | |
CN111383246B (en) | Scroll detection method, device and equipment | |
CN110008247B (en) | Method, device and equipment for determining abnormal source and computer readable storage medium | |
CN102680481A (en) | Detection method for cotton fiber impurities | |
US11925498B2 (en) | Reconstructing image | |
WO2021017000A1 (en) | Method and apparatus for acquiring meter reading, and memory, processor and terminal | |
CN110827245A (en) | Method and equipment for detecting screen display disconnection | |
AU2019200861B2 (en) | Unobtrusive and automated detection of frequencies of spatially located distinct parts of a machine | |
CN109523557B (en) | Image semantic segmentation labeling method, device and storage medium | |
Lauridsen et al. | Reading circular analogue gauges using digital image processing | |
CN113723467A (en) | Sample collection method, device and equipment for defect detection | |
CN113225667B (en) | Method and device for eliminating non-direct path of arrival time measurement value and terminal | |
CN112286951A (en) | Data detection method and device | |
CN111428858A (en) | Method and device for determining number of samples, electronic equipment and storage medium | |
CN116774986A (en) | Automatic evaluation method and device for software development workload, storage medium and processor | |
US10977482B2 (en) | Object attribution analyzing method and related object attribution analyzing device | |
CN110991370B (en) | Multichannel information fusion ATM panel carryover detection method | |
CN110098983B (en) | Abnormal flow detection method and device | |
CN111546793A (en) | Processing method and system for paper detection threshold of printer | |
CN111132052A (en) | Intelligent safety campus positioning method, system, equipment and readable storage medium | |
CN114881908B (en) | Abnormal pixel identification method, device and equipment and computer storage medium | |
CN116452924B (en) | Model threshold adjustment method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||