CN112286951A - Data detection method and device - Google Patents


Info

Publication number
CN112286951A
Authority
CN
China
Prior art keywords
data
detection
target
detection window
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011350055.7A
Other languages
Chinese (zh)
Inventor
薄红涛
闫军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dt Dream Technology Co Ltd
Original Assignee
Hangzhou Dt Dream Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dt Dream Technology Co Ltd
Priority to CN202011350055.7A
Publication of CN112286951A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2358 Change logging, detection, and notification
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

Abstract

Embodiments of the invention provide a data detection method and device. A detection window traverses the current data stream at a preset step length that is smaller than the width of the detection window, so that detection results of target data in the current data stream are obtained in a plurality of detection windows. Whether the target data is abnormal is then determined according to these detection results. This effectively avoids misjudgment caused by data lying at the boundary of a detection window, improves the stability of data anomaly detection, and reduces the false alarm rate.

Description

Data detection method and device
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data detection method and apparatus.
Background
In industry, it is often necessary to periodically collect data from a detection target by means of a collection device. However, owing to factors related to the collection equipment, the environment, or human operation, the collected data may contain data with quality problems; such data is referred to herein as abnormal data.
To improve the quality of the collected data, suspected abnormal data needs to be identified in the collected data and then submitted for manual confirmation. If manual confirmation finds that the data is abnormal, it is processed according to a preset abnormal-data processing strategy; if manual confirmation finds that the data is not abnormal, it is put back into the data sequence of the collected data.
Disclosure of Invention
In order to overcome the problems in the related art, the invention provides a data detection method and a data detection device.
According to a first aspect of the embodiments of the present invention, there is provided a data detection method, including:
traversing a current data stream with a detection window at a preset step length, and obtaining detection results of target data in the current data stream in a plurality of detection windows, the preset step length being smaller than the width of the detection window;
and determining whether the target data is abnormal according to the detection results of the target data in the plurality of detection windows.
According to a second aspect of embodiments of the present invention, there is provided a data detection apparatus, including:
an acquisition module, configured to traverse a current data stream with a detection window at a preset step length and obtain detection results of target data in the current data stream in a plurality of detection windows, the preset step length being smaller than the width of the detection window;
and a determining module, configured to determine whether the target data is abnormal according to the detection results of the target data in the plurality of detection windows.
The technical scheme provided by the embodiments of the invention has the following beneficial effects:
In the embodiments of the invention, a detection window traverses the current data stream at a preset step length that is smaller than the width of the detection window, and detection results of the target data in the current data stream are obtained in a plurality of detection windows. Whether the target data is abnormal is determined according to these detection results, which effectively avoids misjudgment caused by data lying at the boundary of a detection window, improves the stability of data anomaly detection, and reduces the false alarm rate.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the specification.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present specification and together with the description, serve to explain the principles of the specification.
Fig. 1 is a flowchart illustrating a data detection method according to an embodiment of the present invention.
Fig. 2 is a diagram illustrating an example of a sliding process of a sliding window.
Fig. 3 is a functional block diagram of a data detection apparatus according to an embodiment of the present invention.
Fig. 4 is a hardware structure diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of embodiments of the invention, as detailed in the following claims.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of embodiments of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used to describe various information in embodiments of the present invention, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, the first information may also be referred to as second information, and similarly, the second information may also be referred to as first information, without departing from the scope of embodiments of the present invention. The word "if" as used herein may be interpreted as "when ...", "upon ...", or "in response to a determination", depending on the context.
The data detection method provided by the present invention is described in detail below with reference to examples.
Fig. 1 is a flowchart illustrating a data detection method according to an embodiment of the present invention. As shown in fig. 1, the data detection method may include:
S101, traversing the current data stream with a detection window at a preset step length to obtain detection results of target data in the current data stream in a plurality of detection windows; the preset step length is smaller than the width of the detection window.
S102, determining whether the target data are abnormal or not according to the detection results of the target data in a plurality of detection windows.
In this embodiment, the current data stream is the stream of data currently collected from the detection target.
In application, the acquisition device acquires data of a detection target at fixed time intervals, so that the time intervals between two adjacent data in the current data stream are equal. The width of the detection window may be equal to an integer multiple of the time interval of the data in the current data stream.
In this embodiment, the preset step size may be equal to an integer multiple of the time interval of the data in the current data stream, for example, the minimum step size may be equal to one time interval.
In application, the value of the preset step length can be chosen according to application requirements. The smaller the preset step length, the more detections are performed and the more accurate the detection result, but the more computing resources are needed; the larger the preset step length, the fewer detections are performed and the fewer computing resources are needed, but the accuracy of the detection result decreases accordingly.
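The trade-off can be made concrete by counting window positions. The following is a minimal sketch, assuming the window width and step length are expressed in numbers of data points (integer multiples of the sampling interval) and that only full windows are used; the function names are illustrative and do not come from the specification.

```python
def window_count(n_points: int, width: int, step: int) -> int:
    """Number of full detection windows when traversing n_points with the given step."""
    if not 0 < step < width:
        raise ValueError("the preset step length must be positive and smaller than the width")
    if n_points < width:
        return 0
    return (n_points - width) // step + 1


def detections(n_points: int, width: int, step: int) -> int:
    """Total per-point detections if every point in every window is detected."""
    return window_count(n_points, width, step) * width


# Hypothetical stream of 1000 points with a window of width 5.
counts = {step: window_count(1000, 5, step) for step in (1, 2, 4)}
work = {step: detections(1000, 5, step) for step in (1, 2, 4)}
```

Under these assumptions, halving the step length roughly doubles the number of windows and therefore the detection work.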
In this embodiment, each time the detection window slides by one step, the data in the detection window is detected once. As the detection window slides, the position of a given piece of data within the detection window changes. Fig. 2 is a diagram illustrating an example of the sliding process of the detection window. As shown in fig. 2, assume that the step length of the detection window is equal to one time interval. In graph (a) of fig. 2, the data at time T is at the boundary of the detection window; after the detection window slides 2 more steps from the position in graph (a), it reaches the position shown in graph (b) of fig. 2, where the data at time T is in the middle of the detection window.
When the data in a detection window is detected, the data in the whole window usually reflects the pattern of the data over that period, and each piece of data in the window can be checked against this pattern. When a piece of data is located at the boundary of the detection window, the data preceding it or following it is not in the window, so the window cannot fully reflect the pattern of the data segment to which it belongs; misjudgment is therefore likely, and normal data may be judged abnormal. When a piece of data is located anywhere in the window other than the boundary, both the preceding and the following data are in the window, so the window fully reflects the pattern of its data segment and the detection accuracy is high.
If detection uses fixed detection windows (meaning that each piece of data falls in only one detection window) and each piece of data is detected only once, data at the boundary of each detection window may be misjudged, so that data that is actually normal is judged to be abnormal.
In this embodiment, since the step length of the detection window is smaller than the width of the detection window, a piece of data can be covered by more than one detection window and thus detected more than once, so that with high probability a detection result is obtained for it at a position in the detection window other than the boundary.
For example, in fig. 2, when the step size of the sliding of the detection windows is equal to one time interval, there are 5 detection windows covering the T-time data, and in each of the 5 detection windows, the T-time data is detected once, so that the detection result of the T-time data in the 5 detection windows can be obtained.
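The count of covering windows can be checked with a short sketch. It assumes a window width of 5 data points (consistent with the 5 windows mentioned for fig. 2), indexes data points from 0, and uses a hypothetical stream length of 21; none of these values is stated in the specification.

```python
from typing import List


def covering_windows(t: int, n_points: int, width: int, step: int) -> List[int]:
    """Start indices of the full detection windows that contain data point t."""
    return [s for s in range(0, n_points - width + 1, step) if s <= t < s + width]


starts_step1 = covering_windows(t=10, n_points=21, width=5, step=1)
positions = [10 - s for s in starts_step1]   # position of T inside each window
starts_step2 = covering_windows(t=10, n_points=21, width=5, step=2)
starts_fixed = covering_windows(t=10, n_points=21, width=5, step=5)  # fixed windows, for contrast
```

With a step of one interval the point sits at every position of the window once, including the middle; with fixed windows (step equal to the width) it is seen only once, here at a boundary.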
In this embodiment, whether the target data is abnormal is determined not from a single detection result but from the multiple detection results of the target data in a plurality of detection windows, which effectively avoids misjudgment caused by data lying at the boundary of a detection window, improves the stability of data anomaly detection, and reduces the false alarm rate.
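The effect can be illustrated with a small end-to-end sketch of steps S101 and S102. The per-window rule used here (flag values far from the window mean) is purely an assumption chosen because it misjudges boundary data on a rising series; the specification does not prescribe any particular rule, and all names and thresholds below are hypothetical.

```python
from statistics import mean
from typing import Callable, List, Sequence

Detector = Callable[[Sequence[float]], List[bool]]


def edge_sensitive_detector(window: Sequence[float], tol: float = 1.5) -> List[bool]:
    """Hypothetical per-window rule: flag values far from the window mean."""
    m = mean(window)
    return [abs(x - m) > tol for x in window]


def detect(data: Sequence[float], width: int, step: int, detector: Detector) -> List[bool]:
    """S101: collect per-window results; S102: abnormal only if every result says so."""
    per_point: List[List[bool]] = [[] for _ in data]
    for start in range(0, len(data) - width + 1, step):
        flags = detector(data[start:start + width])
        for offset, flag in enumerate(flags):
            per_point[start + offset].append(flag)
    # A point never covered by a window is treated as not abnormal (an assumption).
    return [bool(results) and all(results) for results in per_point]


ramp = [float(i) for i in range(20)]  # steadily rising series with no real anomaly
fixed = detect(ramp, width=5, step=5, detector=edge_sensitive_detector)
sliding = detect(ramp, width=5, step=1, detector=edge_sensitive_detector)
fixed_alarms = [i for i, f in enumerate(fixed) if f]
sliding_alarms = [i for i, f in enumerate(sliding) if f]
```

In this toy setting the fixed windows raise a false alarm at every window boundary, while the overlapping windows leave only the two end points of the stream flagged, since each of those is covered by a single window.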
In one example, obtaining detection results of target data in a current data stream in a plurality of detection windows may include:
determining a target detection window, wherein the target detection window covers the target data;
and determining the detection result of the target data in the target detection window according to the first data in the current data stream in the target detection window.
In this embodiment, the detection result of the target data in the detection window is determined according to the data in the same detection window as the target data, and the target data can be detected according to the rule of the data segment in which the target data is located, so as to obtain an accurate data detection result.
In this embodiment, the policy for detecting the target data based on the first data may be determined according to a specific application scenario, which is not limited in this embodiment.
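Since the detection policy is left open, the following is only one possible sketch: a robust deviation test that compares the target data with the median of the first data in the same detection window. The rule, the threshold `k`, and the function name are assumptions made for illustration.

```python
from statistics import median
from typing import Sequence


def abnormal_in_window(first_data: Sequence[float], target_pos: int, k: float = 3.0) -> bool:
    """True if first_data[target_pos] deviates strongly from the rest of the window."""
    med = median(first_data)
    # Median absolute deviation: a spread estimate that one outlier cannot inflate.
    mad = median(abs(x - med) for x in first_data)
    deviation = abs(first_data[target_pos] - med)
    if mad == 0:
        return deviation > 0
    return deviation / mad > k


window = [10.0, 11.0, 10.0, 50.0, 9.0]
spike_result = abnormal_in_window(window, target_pos=3)
neighbour_result = abnormal_in_window(window, target_pos=1)
```

Any other rule that scores the target data against the data in its own window could be substituted without changing the surrounding method.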
In some cases, a piece of data may be judged abnormal because it differs greatly from the other data in the same detection window, even though it is not abnormal when viewed against historical data.
For example, a business's sales on a particular day of the year may surge compared with the sales on the preceding and following days. If the business runs a promotion on that day every year, the surge appears abnormal when viewed within the current data stream but normal when viewed against the historical data for the same day.
Therefore, in this embodiment, whether the target data is abnormal may also be detected based on both the current data stream containing the target data and a historical data stream covering the same period as the target data.
Based on this, in one example, obtaining detection results of the target data in the current data stream in a plurality of detection windows may include:
determining a target detection window, wherein the target detection window covers the target data;
and determining the detection result of the target data in the target detection window according to the first data in the current data stream in the target detection window and the second data in the historical data stream in the target detection window.
In this embodiment, after the current data stream and the historical data stream are time-aligned, both streams may be traversed simultaneously by the detection window at the preset step length. The first data and the second data are within the same detection window, and the second data is the historical data for the same period as the first data.
For example, the current data stream is the actual daily water supply of a water supply enterprise throughout 2019, and the historical data stream is the actual daily water supply of the same enterprise throughout 2018. Assume that the width of the detection window is 30 days and the target data is the data for October 15, 2019. When the detection window covers October 1 to October 30, the first data is the data from October 1, 2019 to October 30, 2019, and the second data is the data from October 1, 2018 to October 30, 2018.
Determining the detection result of the target data based on both the current data stream and the historical data stream takes into account the correlation between the pattern of the current data and the pattern of the historical data for the same period, which can further improve the stability of data detection and reduce the misjudgment rate.
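The time alignment in the water supply example can be sketched as follows. The daily values are made up, the one-year offset is taken from the 2019/2018 example, and the helper names are hypothetical; 29 February is deliberately not handled.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

Series = Dict[date, float]


def same_day_previous_year(d: date) -> date:
    # Assumption: 29 February is out of scope; date.replace would raise ValueError for it.
    return d.replace(year=d.year - 1)


def aligned_window(current: Series, history: Series,
                   start: date, width_days: int) -> Tuple[List[float], List[float]]:
    """Return (first data, second data) for the window starting at `start`."""
    days = [start + timedelta(days=i) for i in range(width_days)]
    first = [current[d] for d in days if d in current]
    second = [history[p] for p in map(same_day_previous_year, days) if p in history]
    return first, second


# Made-up daily values for 2019 (current stream) and 2018 (historical stream).
current = {date(2019, 1, 1) + timedelta(days=i): 100.0 + i for i in range(365)}
history = {date(2018, 1, 1) + timedelta(days=i): 90.0 + i for i in range(365)}
first, second = aligned_window(current, history, date(2019, 10, 1), 30)
```

Here `first` holds the values for October 1 to October 30, 2019 and `second` the values for the same dates in 2018, so the target data for October 15, 2019 is at index 14 of `first`.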
In one example, determining a detection result of the target data in the target detection window according to first data in a current data stream within the target detection window and second data in a historical data stream within the target detection window may include:
if the detection result of the target data in the target detection window determined according to the first data and the detection result of the target data in the target detection window determined according to the second data both indicate that the target data is abnormal, determining that the detection result of the target data in the target detection window indicates that the target data is abnormal;
and if at least one of the detection result of the target data in the target detection window determined according to the first data and the detection result of the target data in the target detection window determined according to the second data indicates that the target data is normal, determining that the detection result of the target data in the target detection window indicates that the target data is normal.
In this embodiment, when at least one of the detection result based on the current data and the detection result based on the historical data indicates that the target data is normal, it is determined that the target data is normal, so that the stability of data detection can be improved, and the false alarm rate can be reduced.
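The combination rule amounts to a logical AND of the two per-window results. A minimal sketch, with hypothetical names, where `True` means "abnormal":

```python
def window_result(abnormal_by_first_data: bool, abnormal_by_second_data: bool) -> bool:
    """Abnormal in this window only if both the current and the historical view agree."""
    return abnormal_by_first_data and abnormal_by_second_data


# Promotion-day case from the description: a surge relative to the current window,
# but consistent with the same period in the historical stream.
promotion_day = window_result(abnormal_by_first_data=True, abnormal_by_second_data=False)
```

How each of the two input results is computed is left to the chosen detection policy.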
To ensure reliable detection, the amount of data in the detection window needs to reach a certain level. When the data is sparse, detection may be inaccurate because the amount of data in the detection window does not reach this level.
In one example, traversing the current data stream with a detection window in a preset step size includes:
in the traversing process, determining whether the data volume in the current detection window is larger than or equal to a preset data volume threshold value;
if not, increasing the width of the current detection window; and if so, keeping the width of the current detection window unchanged.
In the embodiment, when the data volume in the current detection window is less, the data volume in the current detection window can be increased by increasing the width of the current detection window, so that the detection accuracy is improved.
In this embodiment, the data amount threshold may be set according to a specific application scenario.
Wherein, the increasing amplitude of the width of the detection window can be determined according to application requirements.
In one example, increasing the width of the current detection window may include:
increasing the width of the current detection window to a first width so that the data amount in the current detection window is equal to the data amount threshold value; the first width is less than or equal to a preset width threshold.
In this embodiment, the width of the detection window is increased on the following principle: provided that the amount of data in the detection window meets the level required for detection accuracy, the detection window is kept as small as possible. This minimizes the computation required for each detection window and reduces the demand on computing resources.
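One way to find such a first width is to grow the window in small increments until the data amount threshold is reached. The sketch below assumes sorted numeric timestamps, a window covering the half-open interval from `start` to `start + width`, and a growth increment of one time unit; these details and all names are assumptions.

```python
from bisect import bisect_left
from typing import Sequence


def count_in_window(timestamps: Sequence[int], start: int, width: int) -> int:
    """Number of sorted timestamps t with start <= t < start + width."""
    return bisect_left(timestamps, start + width) - bisect_left(timestamps, start)


def first_width(timestamps: Sequence[int], start: int, width: int,
                min_count: int, max_width: int, grow: int = 1) -> int:
    """Smallest width (in steps of `grow`) reaching min_count, capped at max_width."""
    while count_in_window(timestamps, start, width) < min_count and width < max_width:
        width = min(width + grow, max_width)
    return width


sparse = [0, 1, 4, 9, 10, 11, 30]  # hypothetical timestamps with gaps
grown = first_width(sparse, start=0, width=3, min_count=3, max_width=12)
unchanged = first_width(sparse, start=9, width=3, min_count=3, max_width=12)
```

Growing one unit at a time stops at the first width that satisfies the threshold, which keeps the window as small as possible; if several data points share a timestamp, the resulting data amount may exceed the threshold rather than equal it.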
In one example, increasing the width of the current detection window may include:
if the data amount in the current detection window is still smaller than the data amount threshold after the width of the current detection window has been increased to a preset width threshold, keeping the width of the current detection window equal to the width threshold.
The width of the detection window can only be adjusted within a certain range and cannot be increased without limit. In this embodiment, the adjustment range of the detection window width is limited by the width threshold: if the width has already been increased to the upper limit of the adjustment range (i.e., the width threshold) and the amount of data in the detection window still does not reach the minimum required level, the width is not increased any further.
Furthermore, for a detection window whose width already equals the width threshold but whose data amount is still smaller than the aforementioned data amount threshold, no data detection is performed on the data in the window, and that data can be marked as having an unknown state, so that it can later be detected by other detection methods, or its detection can be abandoned, as required.
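The three possible outcomes for a window during traversal can be summarized in a small decision function. This is a sketch only; the enum and its member names are hypothetical labels for the cases described above.

```python
from enum import Enum


class WindowState(Enum):
    READY = "ready"      # enough data: run detection with the current width
    GROW = "grow"        # too little data, width can still be increased
    UNKNOWN = "unknown"  # too little data at the width threshold: mark data as state unknown


def classify_window(data_amount: int, width: int,
                    amount_threshold: int, width_threshold: int) -> WindowState:
    if data_amount >= amount_threshold:
        return WindowState.READY
    if width < width_threshold:
        return WindowState.GROW
    return WindowState.UNKNOWN


states = [
    classify_window(5, 5, amount_threshold=3, width_threshold=12),
    classify_window(1, 5, amount_threshold=3, width_threshold=12),
    classify_window(1, 12, amount_threshold=3, width_threshold=12),
]
```

Data in a window that ends in the `UNKNOWN` state would then be routed to another detection method or skipped, as the application requires.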
In one example, determining whether the target data is abnormal according to the detection result of the target data in a plurality of detection windows may include:
if the detection results of the target data in the multiple detection windows indicate that the target data are abnormal, determining that the target data are abnormal data;
and if at least one detection result of the target data in the detection results of the plurality of detection windows indicates that the target data is normal, determining that the target data is normal data.
In the embodiment, as long as one detection result in the detection results of the plurality of detection windows indicates that the target data is normal, the target data is determined to be normal data, so that the stability of data detection is effectively improved, and the false alarm rate is reduced.
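Expressed in code, the final decision is a logical AND over the per-window results. In this minimal sketch `True` means "abnormal"; treating data with no detection results as not abnormal is an assumption, not something the specification states.

```python
from typing import Sequence


def is_abnormal(window_results: Sequence[bool]) -> bool:
    """Abnormal only if every detection window reported the target data as abnormal."""
    return len(window_results) > 0 and all(window_results)


# Hypothetical results for one data point across 5 windows: flagged only
# when it sat at the window boundary (first and last window).
boundary_only = is_abnormal([True, False, False, False, True])
consistently_flagged = is_abnormal([True, True, True, True, True])
```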
In one example, before obtaining the detection results of the target data in the current data stream in the plurality of detection windows, the method further includes:
detecting the unmarked data and the data marked as abnormal in the detection window once every time the detection window slides by one step;
marking corresponding data according to the detection results of the unmarked data and the data marked as abnormal in the current detection window, and keeping the mark of the data marked as normal in the detection window unchanged.
For example, the data in the first detection window corresponds to times T1, T2, T3, T4 and T5. If none of these 5 pieces of data is marked, all 5 are detected in the first detection window; after detection, the data at times T1 and T5 is marked as abnormal, and the data at times T2, T3 and T4 is marked as normal. The data in the second detection window corresponds to times T2, T3, T4, T5 and T6; in this detection window, the data at times T2, T3 and T4 is not detected, and only the data at times T5 (marked as abnormal) and T6 (unmarked) is detected. If the detection result for the data at time T5 in the second detection window is normal, the data at time T5 is marked as normal and does not need to be detected in subsequent detection windows.
In this embodiment, only unmarked data and data marked as abnormal in the detection window are detected, and data marked as normal in the detection window is not detected, so that the calculation amount can be reduced, and the calculation resources can be saved.
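The T1 to T6 example can be reproduced with a short sketch. The per-point rule (distance from the window mean) and the sample values are assumptions chosen so that boundary data is flagged, as in the example; the marking logic follows the description above.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple

Mark = Optional[bool]  # None = unmarked, True = marked abnormal, False = marked normal


def abnormal_at(window: Sequence[float], pos: int, tol: float = 1.5) -> bool:
    """Hypothetical per-point rule: far from the window mean means abnormal."""
    return abs(window[pos] - mean(window)) > tol


def detect_with_marks(data: Sequence[float], width: int, step: int) -> Tuple[List[Mark], int]:
    marks: List[Mark] = [None] * len(data)
    checks = 0
    for start in range(0, len(data) - width + 1, step):
        window = data[start:start + width]
        for pos in range(width):
            i = start + pos
            if marks[i] is False:
                continue  # already marked normal: keep the mark, skip detection
            marks[i] = abnormal_at(window, pos)
            checks += 1
    return marks, checks


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # made-up values at times T1..T6
marks, checks = detect_with_marks(series, width=5, step=1)
```

In the first window T1 and T5 are marked abnormal and T2 to T4 normal; in the second window only T5 and T6 are detected, and T5 becomes normal. Because a normal mark is never overwritten, data that is still marked abnormal at the end was abnormal in every window that covered it, and 7 detections are performed instead of 10.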
In one example, before obtaining the detection results of the target data in the current data stream in the plurality of detection windows, the method may further include:
and detecting all data in the detection window once every time the detection window slides by one step, and marking all data in the detection window according to a detection result.
In this embodiment, all data in the detection window are detected, and a sufficient number of detection results can be obtained, so that the accuracy of the final detection result is higher.
In the data detection method provided by the embodiment of the invention, the detection windows are adopted to traverse the current data stream by the preset step length, and the detection results of the target data in the current data stream in a plurality of detection windows are obtained; the preset step length is smaller than the width of the detection window, whether the target data is abnormal or not is determined according to the detection results of the target data in the detection windows, misjudgment caused by the fact that the data are located at the boundary of the detection windows can be effectively avoided, the stability of data abnormal detection is improved, and the false alarm rate is reduced.
Based on the above method embodiment, the embodiment of the present invention further provides corresponding apparatus, device, and storage medium embodiments. For detailed implementation of the embodiments of the apparatus, device and storage medium of the embodiments of the present invention, please refer to the corresponding descriptions in the foregoing method embodiments.
Fig. 3 is a functional block diagram of a data detection apparatus according to an embodiment of the present invention. As shown in fig. 3, in this embodiment, the apparatus may include:
an obtaining module 310, configured to traverse a current data stream by using a detection window with a preset step length, and obtain detection results of target data in the current data stream in multiple detection windows; the preset step length is smaller than the width of the detection window;
the determining module 320 is configured to determine whether the target data is abnormal according to detection results of the target data in multiple detection windows.
In one example, the obtaining module 310 may be specifically configured to:
determining a target detection window, wherein the target detection window covers the target data;
and determining the detection result of the target data in the target detection window according to the first data in the current data stream in the target detection window.
In one example, the obtaining module 310 may be specifically configured to:
determining a target detection window, wherein the target detection window covers the target data;
and determining the detection result of the target data in the target detection window according to the first data in the current data stream in the target detection window and the second data in the historical data stream in the target detection window.
In one example, the obtaining module 310 may be specifically configured to:
in the traversing process, determining whether the data volume in the current detection window is larger than or equal to a preset data volume threshold value;
if not, increasing the width of the current detection window; and if so, keeping the width of the current detection window unchanged.
In one example, increasing the width of the current detection window includes:
increasing the width of the current detection window to a first width so that the data amount in the current detection window is equal to the data amount threshold value; the first width is less than or equal to a preset width threshold.
In one example, increasing the width of the current detection window includes:
if the data amount in the current detection window is still smaller than the data amount threshold after the width of the current detection window has been increased to a preset width threshold, keeping the width of the current detection window equal to the width threshold.
In one example, the determining module 320 may be specifically configured to:
if the detection results of the target data in the multiple detection windows indicate that the target data are abnormal, determining that the target data are abnormal data;
and if at least one detection result of the target data in the detection results of the plurality of detection windows indicates that the target data is normal, determining that the target data is normal data.
In one example, determining a detection result of the target data in the target detection window according to first data in a current data stream within the target detection window and second data in a historical data stream within the target detection window includes:
if the detection result of the target data in the target detection window determined according to the first data and the detection result of the target data in the target detection window determined according to the second data both indicate that the target data is abnormal, determining that the detection result of the target data in the target detection window indicates that the target data is abnormal;
and if at least one of the detection result of the target data in the target detection window determined according to the first data and the detection result of the target data in the target detection window determined according to the second data indicates that the target data is normal, determining that the detection result of the target data in the target detection window indicates that the target data is normal.
In one example, the apparatus further includes:
the first detection module is used for detecting the unmarked data and the data marked as abnormal in the detection window once when the detection window slides by one step;
and the first marking module is used for marking corresponding data according to the detection results of the unmarked data and the data marked as abnormal in the current detection window and keeping the mark of the data marked as normal in the detection window unchanged.
In one example, the apparatus further includes:
a second detection module, configured to detect all data in the detection window each time the detection window slides by one step;
and a second marking module, configured to mark each piece of data in the detection window according to the detection result.
An embodiment of the present invention further provides an electronic device. Fig. 4 is a hardware structure diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 4, the electronic device includes an internal bus 401, and a memory 402, a processor 403, and an external interface 404 that are connected through the internal bus 401.
The processor 403 is configured to read the machine-readable instructions in the memory 402 and execute the instructions to implement the following operations:
traversing the current data stream with a detection window at a preset step size, and obtaining detection results of target data in the current data stream in a plurality of detection windows, wherein the preset step size is smaller than the width of the detection window;
and determining whether the target data is abnormal according to the detection results of the target data in the plurality of detection windows.
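As a minimal illustration of the traversal described above, the following Python sketch enumerates overlapping detection windows and lists the windows that cover a given piece of target data. It assumes, for simplicity, that the data stream is addressed by integer positions and that the window width and the step size are counted in positions; the function names `sliding_windows` and `windows_covering` are hypothetical and are not taken from the embodiment.

```python
from typing import List, Tuple


def sliding_windows(stream_len: int, width: int, step: int) -> List[Tuple[int, int]]:
    """Return half-open position ranges [start, end) of the windows sliding over the stream."""
    if not 0 < step < width:
        # The preset step size must be smaller than the window width,
        # otherwise consecutive windows would not overlap.
        raise ValueError("step must be positive and smaller than width")
    windows = []
    start = 0
    while start < stream_len:
        windows.append((start, min(start + width, stream_len)))
        start += step
    return windows


def windows_covering(index: int, windows: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return every window whose range contains the target position."""
    return [(s, e) for (s, e) in windows if s <= index < e]


windows = sliding_windows(stream_len=10, width=4, step=2)
covering = windows_covering(5, windows)
print(covering)
```

Because the step size is smaller than the width, the target data at position 5 falls into more than one window, which is what makes a multi-window decision possible.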
In one example, obtaining detection results of target data in a current data stream in a plurality of detection windows includes:
determining a target detection window, wherein the target detection window covers the target data;
and determining the detection result of the target data in the target detection window according to the first data in the current data stream in the target detection window.
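The per-window detection based on the first data can be sketched as follows. This passage does not fix a particular detection algorithm, so the sketch assumes a simple deviation test: the target data is treated as abnormal in a window when it lies more than `k` standard deviations from the mean of the other data in that window. The function name `detect_in_window`, the parameter `k`, and the test itself are illustrative assumptions.

```python
from statistics import mean, pstdev
from typing import Sequence


def detect_in_window(values: Sequence[float], target_pos: int, k: float = 3.0) -> bool:
    """Return True if the value at target_pos looks abnormal relative to the rest of the window.

    Assumed rule (not prescribed by the embodiment): deviation from the mean
    of the other values in the window by more than k standard deviations.
    """
    target = values[target_pos]
    reference = [v for i, v in enumerate(values) if i != target_pos]
    if len(reference) < 2:
        return False  # too little data to judge; treat as normal
    mu, sigma = mean(reference), pstdev(reference)
    if sigma == 0:
        return target != mu
    return abs(target - mu) > k * sigma


window = [10, 11, 9, 10, 50]
spike_flag = detect_in_window(window, 4)   # the value 50 against [10, 11, 9, 10]
plain_flag = detect_in_window(window, 0)   # the value 10 against [11, 9, 10, 50]
```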
In one example, obtaining detection results of target data in a current data stream in a plurality of detection windows includes:
determining a target detection window, wherein the target detection window covers the target data;
and determining the detection result of the target data in the target detection window according to the first data in the current data stream in the target detection window and the second data in the historical data stream in the target detection window.
In one example, traversing the current data stream with a detection window in a preset step size includes:
in the traversing process, determining whether the data volume in the current detection window is larger than or equal to a preset data volume threshold value;
if not, increasing the width of the current detection window; and if so, keeping the width of the current detection window unchanged.
In one example, increasing the width of the current detection window includes:
increasing the width of the current detection window to a first width so that the data volume in the current detection window is equal to the data volume threshold, wherein the first width is less than or equal to a preset width threshold.
In one example, increasing the width of the current detection window includes:
if the data volume in the current detection window is still smaller than the data volume threshold after the width of the current detection window has been increased to a preset width threshold, keeping the width of the current detection window equal to the width threshold.
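The width adjustment described in the preceding examples can be sketched as below. The sketch assumes that each piece of data carries a timestamp, that the timestamps are sorted in ascending order and distinct, and that a window is the closed interval from `start` to `start + width`; the name `adapt_window_width` and its parameters are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


def adapt_window_width(timestamps: Sequence[float], start: float, width: float,
                       min_count: int, max_width: float) -> float:
    """Return the width to use for the window beginning at `start`.

    min_count plays the role of the data volume threshold and max_width the
    role of the width threshold (both are assumed configuration values).
    """
    first = bisect_left(timestamps, start)
    count = bisect_right(timestamps, start + width) - first
    if count >= min_count:
        return width  # enough data: keep the width unchanged
    needed = first + min_count - 1  # position of the point that brings the count up to min_count
    if needed < len(timestamps):
        first_width = timestamps[needed] - start
        if first_width <= max_width:
            return first_width  # smallest width that reaches the data volume threshold
    return max_width  # still too little data: stay at the width threshold


ts = [0, 1, 2, 10, 11, 30]
unchanged = adapt_window_width(ts, start=0, width=3, min_count=3, max_width=15)
widened = adapt_window_width(ts, start=0, width=3, min_count=4, max_width=15)
capped = adapt_window_width(ts, start=0, width=3, min_count=6, max_width=15)
```

In this sketch the window is widened only as far as needed, so sparse stretches of the stream get a wider window while dense stretches keep the preset width.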
In one example, determining whether the target data is abnormal according to the detection results of the target data in a plurality of detection windows includes:
if the detection results of the target data in the plurality of detection windows all indicate that the target data is abnormal, determining that the target data is abnormal data;
and if at least one of the detection results of the target data in the plurality of detection windows indicates that the target data is normal, determining that the target data is normal data.
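The decision rule above amounts to a logical AND over the per-window results. A minimal sketch, in which `True` stands for "abnormal" and the function name `is_abnormal_overall` is hypothetical; treating an empty result list as normal is an added assumption, not something stated in the embodiment.

```python
from typing import Iterable


def is_abnormal_overall(window_results: Iterable[bool]) -> bool:
    """Target data is abnormal only if every covering window reported it as abnormal."""
    results = list(window_results)
    # Assumption: with no window results at all, fall back to "normal".
    return bool(results) and all(results)


confirmed = is_abnormal_overall([True, True, True])
overruled = is_abnormal_overall([True, False, True])
```

A single window that reports the target data as normal is enough to overrule the other windows, which reduces false alarms caused by one unrepresentative window.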
In one example, determining a detection result of the target data in the target detection window according to first data in a current data stream within the target detection window and second data in a historical data stream within the target detection window includes:
if the detection result of the target data in the target detection window determined according to the first data and the detection result of the target data in the target detection window determined according to the second data both indicate that the target data is abnormal, determining that the detection result of the target data in the target detection window indicates that the target data is abnormal;
and if at least one of the detection result of the target data in the target detection window determined according to the first data and the detection result of the target data in the target detection window determined according to the second data indicates that the target data is normal, determining that the detection result of the target data in the target detection window indicates that the target data is normal.
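The combination of the two results within one target detection window can be sketched as follows. The deviation test used for each side is an illustrative assumption, as are the names `deviates` and `detect_with_history`; only the final AND of the two results reflects the rule stated above. Here `current_window` holds the other data of the current data stream in the window, and `historical_window` holds the data of the historical data stream in the same window.

```python
from statistics import mean, pstdev
from typing import Sequence


def deviates(target: float, reference: Sequence[float], k: float = 3.0) -> bool:
    """Assumed test: target is more than k standard deviations from the reference mean."""
    if len(reference) < 2:
        return False
    mu, sigma = mean(reference), pstdev(reference)
    if sigma == 0:
        return target != mu
    return abs(target - mu) > k * sigma


def detect_with_history(target: float, current_window: Sequence[float],
                        historical_window: Sequence[float], k: float = 3.0) -> bool:
    """Abnormal in this window only if both the current and the historical data say so."""
    abnormal_vs_current = deviates(target, current_window, k)
    abnormal_vs_history = deviates(target, historical_window, k)
    return abnormal_vs_current and abnormal_vs_history


# A value of 50 stands out from the current window but matches the historical data.
recurring_peak = detect_with_history(50, [10, 11, 9, 10], [48, 52, 49, 51])
# The same value stands out from both the current and the historical data.
true_anomaly = detect_with_history(50, [10, 11, 9, 10], [10, 12, 9, 11])
```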
In one example, before obtaining the detection results of the target data in the current data stream in the plurality of detection windows, the operations further include:
each time the detection window slides by one step, detecting the unmarked data and the data marked as abnormal in the detection window;
and marking the corresponding data according to the detection results of the unmarked data and the data marked as abnormal in the current detection window, while keeping the marks of the data already marked as normal in the detection window unchanged.
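The incremental marking described above can be sketched as follows. The sketch assumes position-based windows and a hypothetical detector `far_from_median` (absolute distance from the window median above a tolerance); neither the detector nor the names used are taken from the embodiment.

```python
from statistics import median
from typing import Callable, List, Optional, Sequence

NORMAL, ABNORMAL = "normal", "abnormal"


def far_from_median(window: Sequence[float], pos: int, tol: float = 5.0) -> bool:
    """Assumed detector: abnormal if the value is more than tol away from the window median."""
    return abs(window[pos] - median(window)) > tol


def mark_incrementally(values: Sequence[float], width: int, step: int,
                       detect: Callable[[Sequence[float], int], bool] = far_from_median
                       ) -> List[Optional[str]]:
    """Slide the window and re-detect only unmarked data and data marked as abnormal."""
    marks: List[Optional[str]] = [None] * len(values)
    start = 0
    while start < len(values):
        end = min(start + width, len(values))
        window = values[start:end]
        for i in range(start, end):
            if marks[i] == NORMAL:
                continue  # a normal mark is kept unchanged
            marks[i] = ABNORMAL if detect(window, i - start) else NORMAL
        start += step
    return marks


spike = mark_incrementally([10, 11, 10, 50, 10, 11, 10, 12], width=4, step=2)
shift = mark_incrementally([10, 10, 10, 40, 40, 40, 40, 40], width=4, step=2)
```

In the `shift` input, the first 40 is marked as abnormal in the first window and re-marked as normal once the next window shows that the level has changed, whereas the isolated 50 in `spike` stays abnormal in every window that covers it. Skipping data already marked as normal also saves detection work.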
In one example, before obtaining the detection results of the target data in the current data stream in the plurality of detection windows, the operations further include:
each time the detection window slides by one step, detecting all data in the detection window, and marking all data in the detection window according to the detection results.
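For comparison, the variant that detects all data at every step can be sketched as below. It records one result per covering window for each piece of data and then applies the all-windows rule. The detector `far_from_median`, the position-based windows, and the function names are illustrative assumptions.

```python
from statistics import median
from typing import Callable, List, Sequence


def far_from_median(window: Sequence[float], pos: int, tol: float = 5.0) -> bool:
    """Assumed detector: abnormal if the value is more than tol away from the window median."""
    return abs(window[pos] - median(window)) > tol


def collect_window_results(values: Sequence[float], width: int, step: int,
                           detect: Callable[[Sequence[float], int], bool] = far_from_median
                           ) -> List[List[bool]]:
    """Detect every piece of data in every window and keep the per-window results."""
    results: List[List[bool]] = [[] for _ in values]
    start = 0
    while start < len(values):
        end = min(start + width, len(values))
        window = values[start:end]
        for i in range(start, end):
            results[i].append(detect(window, i - start))
        start += step
    return results


def final_decisions(results: List[List[bool]]) -> List[bool]:
    """Abnormal only if all covering windows reported abnormal."""
    return [bool(r) and all(r) for r in results]


shift_results = collect_window_results([10, 10, 10, 40, 40, 40, 40, 40], width=4, step=2)
spike_final = final_decisions(
    collect_window_results([10, 11, 10, 50, 10, 11, 10, 12], width=4, step=2))
```

This variant performs more detections than the incremental one, but it keeps the full per-window record for each piece of data.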
An embodiment of the present invention further provides a computer-readable storage medium, where a plurality of computer instructions are stored on the computer-readable storage medium, and when executed, the computer instructions perform the following processing:
traversing the current data stream with a detection window at a preset step size, and obtaining detection results of target data in the current data stream in a plurality of detection windows, wherein the preset step size is smaller than the width of the detection window;
and determining whether the target data is abnormal according to the detection results of the target data in the plurality of detection windows.
In one example, obtaining detection results of target data in a current data stream in a plurality of detection windows includes:
determining a target detection window, wherein the target detection window covers the target data;
and determining the detection result of the target data in the target detection window according to the first data in the current data stream in the target detection window.
In one example, obtaining detection results of target data in a current data stream in a plurality of detection windows includes:
determining a target detection window, wherein the target detection window covers the target data;
and determining the detection result of the target data in the target detection window according to the first data in the current data stream in the target detection window and the second data in the historical data stream in the target detection window.
In one example, traversing the current data stream with a detection window in a preset step size includes:
in the traversing process, determining whether the data volume in the current detection window is larger than or equal to a preset data volume threshold value;
if not, increasing the width of the current detection window; and if so, keeping the width of the current detection window unchanged.
In one example, increasing the width of the current detection window includes:
increasing the width of the current detection window to a first width so that the data volume in the current detection window is equal to the data volume threshold, wherein the first width is less than or equal to a preset width threshold.
In one example, increasing the width of the current detection window includes:
if the data volume in the current detection window is still smaller than the data volume threshold after the width of the current detection window has been increased to a preset width threshold, keeping the width of the current detection window equal to the width threshold.
In one example, determining whether the target data is abnormal according to the detection results of the target data in a plurality of detection windows includes:
if the detection results of the target data in the plurality of detection windows all indicate that the target data is abnormal, determining that the target data is abnormal data;
and if at least one of the detection results of the target data in the plurality of detection windows indicates that the target data is normal, determining that the target data is normal data.
In one example, determining a detection result of the target data in the target detection window according to first data in a current data stream within the target detection window and second data in a historical data stream within the target detection window includes:
if the detection result of the target data in the target detection window determined according to the first data and the detection result of the target data in the target detection window determined according to the second data both indicate that the target data is abnormal, determining that the detection result of the target data in the target detection window indicates that the target data is abnormal;
and if at least one of the detection result of the target data in the target detection window determined according to the first data and the detection result of the target data in the target detection window determined according to the second data indicates that the target data is normal, determining that the detection result of the target data in the target detection window indicates that the target data is normal.
In one example, before obtaining the detection results of the target data in the current data stream in the plurality of detection windows, the processing further includes:
each time the detection window slides by one step, detecting the unmarked data and the data marked as abnormal in the detection window;
and marking the corresponding data according to the detection results of the unmarked data and the data marked as abnormal in the current detection window, while keeping the marks of the data already marked as normal in the detection window unchanged.
In one example, before obtaining the detection results of the target data in the current data stream in the plurality of detection windows, the processing further includes:
each time the detection window slides by one step, detecting all data in the detection window, and marking all data in the detection window according to the detection results.
Because the apparatus and device embodiments substantially correspond to the method embodiments, reference may be made to the relevant parts of the description of the method embodiments. The apparatus embodiments described above are merely illustrative: the modules described as separate parts may or may not be physically separate, and the parts shown as modules may or may not be physical modules; they may be located in one place or distributed across a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this specification. One of ordinary skill in the art can understand and implement it without inventive effort.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Other embodiments of the present description will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This specification is intended to cover any variations, uses, or adaptations of the specification following, in general, the principles of the specification and including such departures from the present disclosure as come within known or customary practice within the art to which the specification pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the specification being indicated by the following claims.
It will be understood that the present description is not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present description is limited only by the appended claims.
The above description is only a preferred embodiment of the present disclosure, and should not be taken as limiting the present disclosure, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (11)

1. A method for data detection, comprising:
traversing the current data stream with a detection window at a preset step size, and obtaining detection results of target data in the current data stream in a plurality of detection windows, wherein the preset step size is smaller than the width of the detection window;
and determining whether the target data is abnormal according to the detection results of the target data in the plurality of detection windows.
2. The method of claim 1, wherein obtaining detection results of target data in a current data stream in a plurality of detection windows comprises:
determining a target detection window, wherein the target detection window covers the target data;
and determining the detection result of the target data in the target detection window according to the first data in the current data stream in the target detection window.
3. The method of claim 1, wherein obtaining detection results of target data in a current data stream in a plurality of detection windows comprises:
determining a target detection window, wherein the target detection window covers the target data;
and determining the detection result of the target data in the target detection window according to the first data in the current data stream in the target detection window and the second data in the historical data stream in the target detection window.
4. The method of claim 1, wherein traversing the current data stream with a detection window in a preset step size comprises:
in the traversing process, determining whether the data volume in the current detection window is larger than or equal to a preset data volume threshold value;
if not, increasing the width of the current detection window; and if so, keeping the width of the current detection window unchanged.
5. The method of claim 4, wherein increasing the width of the current detection window comprises:
increasing the width of the current detection window to a first width so that the data volume in the current detection window is equal to the data volume threshold, wherein the first width is less than or equal to a preset width threshold.
6. The method of claim 4, wherein increasing the width of the current detection window comprises:
if the data volume in the current detection window is still smaller than the data volume threshold after the width of the current detection window has been increased to a preset width threshold, keeping the width of the current detection window equal to the width threshold.
7. The method of claim 1, wherein determining whether the target data is abnormal according to the detection results of the target data in a plurality of detection windows comprises:
if the detection results of the target data in the plurality of detection windows all indicate that the target data is abnormal, determining that the target data is abnormal data;
and if at least one of the detection results of the target data in the plurality of detection windows indicates that the target data is normal, determining that the target data is normal data.
8. The method of claim 3, wherein determining the detection result of the target data in the target detection window according to the first data in the current data stream in the target detection window and the second data in the historical data stream in the target detection window comprises:
if the detection result of the target data in the target detection window determined according to the first data and the detection result of the target data in the target detection window determined according to the second data both indicate that the target data is abnormal, determining that the detection result of the target data in the target detection window indicates that the target data is abnormal;
and if at least one of the detection result of the target data in the target detection window determined according to the first data and the detection result of the target data in the target detection window determined according to the second data indicates that the target data is normal, determining that the detection result of the target data in the target detection window indicates that the target data is normal.
9. The method of claim 1, wherein before obtaining the detection results of the target data in the current data stream in the plurality of detection windows, the method further comprises:
each time the detection window slides by one step, detecting the unmarked data and the data marked as abnormal in the detection window;
and marking the corresponding data according to the detection results of the unmarked data and the data marked as abnormal in the current detection window, while keeping the marks of the data already marked as normal in the detection window unchanged.
10. The method of claim 1, wherein before obtaining the detection results of the target data in the current data stream in the plurality of detection windows, the method further comprises:
each time the detection window slides by one step, detecting all data in the detection window, and marking all data in the detection window according to the detection results.
11. A data detection apparatus, comprising:
an acquisition module, configured to traverse the current data stream with a detection window at a preset step size, and to obtain detection results of target data in the current data stream in a plurality of detection windows, wherein the preset step size is smaller than the width of the detection window;
and a determining module, configured to determine whether the target data is abnormal according to the detection results of the target data in the plurality of detection windows.
CN202011350055.7A 2020-11-26 2020-11-26 Data detection method and device Pending CN112286951A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011350055.7A CN112286951A (en) 2020-11-26 2020-11-26 Data detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011350055.7A CN112286951A (en) 2020-11-26 2020-11-26 Data detection method and device

Publications (1)

Publication Number Publication Date
CN112286951A true CN112286951A (en) 2021-01-29

Family

ID=74426412

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011350055.7A Pending CN112286951A (en) 2020-11-26 2020-11-26 Data detection method and device

Country Status (1)

Country Link
CN (1) CN112286951A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination