WO2022141746A1 - Method for detecting anomaly in water quality and electronic device - Google Patents

Publication number
WO2022141746A1
WO2022141746A1 PCT/CN2021/075420 CN2021075420W WO2022141746A1 WO 2022141746 A1 WO2022141746 A1 WO 2022141746A1 CN 2021075420 W CN2021075420 W CN 2021075420W WO 2022141746 A1 WO2022141746 A1 WO 2022141746A1
Authority
WO
WIPO (PCT)
Prior art keywords
water quality
outlier
data
data set
threshold
Prior art date
Application number
PCT/CN2021/075420
Other languages
French (fr)
Chinese (zh)
Inventor
许红龙
郭沛清
Original Assignee
佛山科学技术学院
Application filed by 佛山科学技术学院
Publication of WO2022141746A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01K MEASURING TEMPERATURE; MEASURING QUANTITY OF HEAT; THERMALLY-SENSITIVE ELEMENTS NOT OTHERWISE PROVIDED FOR
    • G01K13/00 Thermometers specially adapted for specific purposes
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/18 Water
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/18 Water
    • G01N33/1806 Biological oxygen demand [BOD] or chemical oxygen demand [COD]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification

Definitions

  • The invention relates to the technical field of water quality detection, and in particular to a method and an electronic device for detecting water quality anomalies.
  • Water is essential for aquatic life, human life and industrial production.
  • The cleanliness of a water body and its content of various chemical components are an important basis for deciding how a water source is used and for environmental protection work.
  • Because environmental protection resources are limited, sewage treatment must be targeted and focus on key areas rather than cast a wide net.
  • Water quality monitoring and analysis covers chemical oxygen demand (COD), ammonia nitrogen, total phosphorus, dissolved oxygen and various heavy metal content indicators, and different monitoring and analysis instruments do not cover exactly the same indicators. After an instrument measures these indicators, the values are analyzed and ranked, and the key water areas to be treated are determined from the top-ranked water quality samples combined with regional conditions.
  • Among existing water quality anomaly detection methods, the most widely used one sets an abnormality threshold for each water quality indicator covered by the monitoring and analysis instrument; an indicator that exceeds its threshold is considered abnormal. This method is hereinafter referred to as the indicator threshold method.
  • In the invention patent with application number 201910560024.5, "A Method and System for Abnormal Water Quality Detection Based on Prior Knowledge", a distance-based outlier detection algorithm is applied so that water quality anomalies can be detected in data from different monitoring and analysis instruments. To speed up outlier detection, that method uses water quality anomalies known a priori to raise the outlier degree threshold. It is hereinafter referred to as the a priori threshold method.
  • The disadvantage of the indicator threshold method is that it requires domain expert knowledge to set the abnormality threshold of each indicator. Because several water quality indicators are used at the same time, it is also harder to judge which water quality samples are abnormal and how they rank by degree of abnormality.
  • The a priori threshold method is a distance-based outlier detection algorithm. It automatically returns the N most abnormal water quality samples without domain expert knowledge, but its speed-up depends on abnormal water quality samples known in advance (prior samples). If too few prior samples are known, or none at all, the speed-up is greatly reduced or disappears entirely. Furthermore, even when there are enough prior samples, their outlier degrees must still be computed against the global data set in order to obtain the prior threshold.
  • The present invention provides a method and an electronic device for detecting water quality anomalies, so as to solve one or more technical problems in the prior art and at least provide a beneficial alternative.
  • In a first aspect, an embodiment of the present invention provides a method for detecting water quality anomalies, including:
  • S107. Divide the ordered water quality data set into multiple data blocks and take the pre-threshold as the outlier degree threshold. Perform outlier detection on each data block in turn: determine the N largest outlier degrees of the data blocks detected so far according to the outlier degree threshold, update the outlier degree threshold to the Nth largest of these, and use the updated threshold as the criterion for outlier detection on the next data block, until all data blocks have been detected.
  • The water quality arrays corresponding to the N largest outlier degrees over all data blocks are taken as the N abnormal water quality arrays.
  • Further, determining the k nearest neighbors of each object of the ordered one-dimensional data set in step S105 includes:
  • Let any object of the ordered one-dimensional data set be denoted O, with k1 objects before O and k2 objects after O, where k1 ≥ 0 and k2 ≥ 0;
  • When k1 ≥ k, search k objects forward; when k1 < k, search k1 objects forward. Likewise, when k2 ≥ k, search k objects backward; when k2 < k, search k2 objects backward;
  • Further, step S107 is specifically:
  • S203. Determine whether t is 1; if so, go to step S205; if not, go to step S204;
  • S204. Determine whether d0 + the outlier degree of the reference point < the outlier degree threshold, where d0 is the distance between the first water quality array in the t-th data block and the reference point; if so, go to step S215; if not, go to step S205;
  • S207. Determine whether Xm has been removed; if so, go to step S211; if not, go to step S208;
  • S209. Determine whether j < k; if so, go to step S211; if not, update the temporary k nearest neighbors of Xm, set the temporary outlier degree of Xm to the distance between Xm and the k-th nearest of its temporary k nearest neighbors, and go to step S210;
  • The N water quality arrays corresponding to the N largest outlier degrees of all data blocks detected so far are taken as the N abnormal water quality arrays.
  • The distance between each water quality array and the reference point is calculated with the method for calculating the distance between two water quality arrays, in which:
  • $x_{11}, x_{12}, \ldots, x_{1n}$ are the normalized values of the different physical quantities of water quality array x1;
  • $x_{21}, x_{22}, \ldots, x_{2n}$ are the normalized values of the different physical quantities of water quality array x2;
  • dist(x1, x2) is the distance between water quality arrays x1 and x2.
  • Each water quality array includes at least one of chemical oxygen demand data, ammonia nitrogen data, total phosphorus data, and dissolved oxygen data.
  • In a second aspect, an embodiment of the present invention also provides an electronic device, including:
  • a memory for storing a computer-readable program;
  • The computer-readable program, when executed by the processor, causes the processor to implement the method of any one of claims 1-5.
  • An embodiment of the present invention has at least the following beneficial effects. The distance values between all water quality arrays in the water quality data set and the reference point are calculated and formed into a one-dimensional data set; the k nearest neighbors of each object in the one-dimensional data set are found and the pre-threshold is determined. The ordered water quality data set is then divided into multiple data blocks, the pre-threshold is taken as the outlier degree threshold, outlier detection is performed on each data block in turn, the outlier degree threshold is updated to the Nth largest outlier degree of the detected data blocks, and the updated threshold is used as the criterion for outlier detection on the next data block. No abnormal water quality points need to be known in advance and no global outlier degrees need to be calculated beforehand, so outlier detection is faster, while the detection results are guaranteed to be consistent with those of the traditional distance-based outlier detection algorithm.
  • FIG. 1 is a flowchart of a method for detecting abnormal water quality according to an embodiment of the present invention.
  • FIG. 2 is a flowchart of a method for detecting abnormal water quality in a water quality data set provided by an embodiment of the present invention.
  • Data block: a unit of outlier detection consisting of several objects of the data set; for example, 1000 objects are commonly used as one data block.
  • k nearest neighbors: the distances between object A and all other objects in the data set are calculated, and the k objects with the smallest distance values are the k nearest neighbors of A.
  • Temporary k nearest neighbors: the distances between object A and only some objects in the data set are calculated, and the k objects with the smallest distance values are the temporary k nearest neighbors of A.
  • k-th nearest neighbor: the k distance values between object A and its k nearest neighbors are sorted from small to large, and the object corresponding to the k-th distance value is the k-th nearest neighbor of A.
  • Temporary k-th nearest neighbor: the k distance values between object A and its temporary k nearest neighbors are sorted from small to large, and the object corresponding to the k-th distance value is the temporary k-th nearest neighbor of A.
  • Outlier degree of object A: the distance between object A and its k-th nearest neighbor.
  • Temporary outlier degree of object A: the distance between object A and its temporary k-th nearest neighbor.
  • FIG. 1 shows a method for detecting water quality anomalies provided by an embodiment of the present invention, including:
  • Each water quality array has the same dimension and includes at least one item of water quality data.
  • Each water quality array is multi-dimensional data and includes at least one of chemical oxygen demand data, ammonia nitrogen data, total phosphorus data, dissolved oxygen data, temperature data, turbidity data, pH value, and so on.
  • In the method for calculating the distance between two water quality arrays:
  • $x_{11}, x_{12}, \ldots, x_{1n}$ are the normalized values of the different physical quantities of water quality array x1;
  • $x_{21}, x_{22}, \ldots, x_{2n}$ are the normalized values of the different physical quantities of water quality array x2;
  • dist(x1, x2) is the distance between water quality arrays x1 and x2.
  • D is the number of water quality arrays; D is generally large and can exceed tens of thousands.
  • Determining the k nearest neighbors of each object of the ordered one-dimensional data set includes:
  • Let any object of the ordered one-dimensional data set be denoted O, with k1 objects before O and k2 objects after O, where k1 ≥ 0 and k2 ≥ 0;
  • When k1 ≥ k, search k objects forward; when k1 < k, search k1 objects forward. Likewise, when k2 ≥ k, search k objects backward; when k2 < k, search k2 objects backward;
  • Because the ordered one-dimensional data set is sorted, determining the k nearest neighbors of object O only requires the distances to the k objects before and the k objects after O, not the distances between O and all objects, which reduces calculation time.
  • Setting the pre-threshold from the data set itself preserves the accuracy of the detection results.
  • The principle is as follows. By the triangle inequality, once the distance between each object and the reference point has been calculated and the objects are mapped to a one-dimensional space, the distance between any two objects in that space (the one-dimensional distance) is less than or equal to their actual distance (the multi-dimensional distance). If the k nearest neighbors of an object s_a are then searched in the one-dimensional space, the one-dimensional distances between s_a and those neighbors are all less than or equal to the corresponding multi-dimensional distances, so the one-dimensional outlier degree of s_a is less than or equal to its multi-dimensional outlier degree.
  • The same holds for every object. Take the N objects with the largest one-dimensional outlier degrees and use the smallest of these (the Nth largest overall) as the pre-threshold Tb; it can be shown in the same way that Tb is less than or equal to the multi-dimensional outlier degree threshold, that is, the outlier degree of the Nth most abnormal water quality point to be detected. Excluding non-outliers with a pre-threshold Tb that does not exceed this value therefore cannot exclude a true outlier, which guarantees the correctness of the detection result.
  • S107. Divide the ordered water quality data set into multiple data blocks and take the pre-threshold as the outlier degree threshold. Perform outlier detection on each data block in turn: determine the N largest outlier degrees of the data blocks detected so far according to the outlier degree threshold, update the outlier degree threshold to the Nth largest of these, and use the updated threshold as the criterion for outlier detection on the next data block, until all data blocks have been detected.
  • The water quality arrays corresponding to the N largest outlier degrees over all data blocks are taken as the N abnormal water quality arrays.
  • Step S107 is specifically:
  • S203. Determine whether t is 1; if so, go to step S205; if not, go to step S204;
  • S204. Determine whether d0 + the outlier degree of the reference point < the outlier degree threshold, where d0 is the distance between the first water quality array in the t-th data block and the reference point; if so, go to step S215; if not, go to step S205;
  • The distance between the first water quality array in the t-th data block and the reference point, and the outlier degree of the reference point, are calculated and stored in step S103.
  • The water quality arrays in each data block are arranged in order.
  • When t is greater than or equal to 2 and the first water quality array in the data block satisfies the termination rule, that is, d0 + the outlier degree of the reference point < the outlier degree threshold, the first water quality array is not an outlier, and neither are the other water quality arrays in this data block or those in the remaining data blocks, so the rest of the data set does not need to be examined.
  • Only the first water quality array of a data block is tested; once the termination rule is met, detection stops and the result is output, which greatly shortens detection time.
  • Because the ordered water quality data set is sorted, objects that are close to each other also lie close together in the ordering. The object in the middle of the data block is therefore taken as the median object (for a data block of 1000 objects, the 500th or 501st object can be chosen).
  • Starting from the median object, the k nearest neighbors are searched in spiral order, that is, alternately before and after it.
  • The data block is [X1, ..., XM]; even after objects are removed, the position numbers remain those of the original data block.
  • S207. Determine whether Xm has been removed; if so, go to step S211; if not, go to step S208;
  • Non-outliers are deleted during detection. Since m uses the position number of the initial data block, it is necessary to check whether the water quality array at position m has already been removed; only arrays that have not been removed are processed.
  • S209. Determine whether j < k; if so, go to step S211; if not, update the temporary k nearest neighbors of Xm, set the temporary outlier degree of Xm to the distance between Xm and its temporary k-th nearest neighbor, and go to step S210;
  • While the temporary k nearest neighbors of Xm are being updated, the distance between Xm and its temporary k-th nearest neighbor can only decrease or stay the same, because each update keeps the k smallest distances found so far and can never admit a larger value. The temporary outlier degree therefore never increases, so once it falls below the outlier degree threshold, Xm can be excluded as a non-outlier at once, without searching for the rest of its k nearest neighbors.
  • When t = 1, the Nth largest outlier degree in the first data block is used directly as the new outlier degree threshold.
  • When t > 1, the N largest outlier degrees are selected from the N largest outlier degrees of data blocks 1 to (t-1) together with the N largest outlier degrees of the t-th data block,
  • and the Nth largest outlier degree in data blocks 1 to t is taken as the new outlier degree threshold.
  • The N water quality arrays corresponding to the N largest outlier degrees of all data blocks detected so far are taken as the N abnormal water quality arrays.
  • The present invention also provides an electronic device, comprising:
  • a memory for storing a computer-readable program;
  • When the computer-readable program is executed by the processor, the processor is caused to implement the method of the above-described embodiment.
  • Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cartridges, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer.
  • Communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and can include any information delivery media, as is well known to those of ordinary skill in the art.
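
The block-wise detection of step S107 can be sketched as follows. This is a simplified illustration under stated assumptions, not the patent's implementation: the Euclidean metric in `dist` is assumed, the names `dist`, `spiral_order` and `top_n_outliers` are hypothetical, the loops are arranged per object rather than per candidate neighbor, and the early termination test of step S204 is omitted.

```python
import heapq
import math


def dist(a, b):
    """Euclidean distance between two arrays (assumed metric)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def spiral_order(start, size):
    """Indices 0..size-1, visited alternately after and before `start`."""
    order = [start]
    for step in range(1, size):
        if start + step < size:
            order.append(start + step)
        if start - step >= 0:
            order.append(start - step)
    return order


def top_n_outliers(data, k, n, block_size, threshold=0.0):
    """Return the n largest (outlier degree, index) pairs, largest first.

    `threshold` plays the role of the pre-threshold; it must not exceed the
    true n-th largest outlier degree. Assumes len(data) > k.
    """
    best = []  # min-heap of the current top-n (outlier degree, index) pairs
    for start in range(0, len(data), block_size):
        stop = min(start + block_size, len(data))
        median = (start + stop - 1) // 2
        order = spiral_order(median, len(data))
        for m in range(start, stop):
            nearest = []  # negated distances: max-heap of the temporary k nearest
            for j in order:
                if j == m:
                    continue
                d = dist(data[m], data[j])
                if len(nearest) < k:
                    heapq.heappush(nearest, -d)
                elif d < -nearest[0]:
                    heapq.heapreplace(nearest, -d)
                # The temporary outlier degree never grows, so once it drops
                # below the threshold the object is removed as a non-outlier.
                if len(nearest) == k and -nearest[0] < threshold:
                    break
            else:
                heapq.heappush(best, (-nearest[0], m))
                if len(best) > n:
                    heapq.heappop(best)
        # After each block, raise the threshold to the n-th largest degree so far.
        if len(best) == n:
            threshold = max(threshold, best[0][0])
    return sorted(best, reverse=True)
```

Because the threshold only rises from block to block, later blocks discard most objects after a few distance computations, while any object whose final outlier degree is at least the threshold is always scanned in full.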

Landscapes

  • Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Chemical & Material Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Analytical Chemistry (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Pathology (AREA)
  • Immunology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Food Science & Technology (AREA)
  • Medicinal Chemistry (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Emergency Medicine (AREA)
  • Biomedical Technology (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Examining Or Testing Airtightness (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method for detecting an anomaly in water quality and an electronic device. The method comprises: calculating the distance values between all water quality arrays in a water quality data set and a reference point, and forming all the distance values into a one-dimensional data set; determining the k nearest neighbors of each object of the one-dimensional data set; determining a pre-threshold; and dividing the ordered water quality data set into a plurality of data blocks, taking the pre-threshold as an outlier degree threshold, sequentially performing outlier detection on each data block, updating the outlier degree threshold to the N-th largest outlier degree of the detected data blocks, and taking the updated threshold as the criterion for outlier detection on the next data block. Because no abnormal water quality points need to be known in advance and no global outlier degrees need to be calculated beforehand, outlier detection is accelerated, while the detection result is guaranteed to be consistent with that of the traditional distance-based outlier detection algorithm.
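
The consistency claim can be checked with a short derivation from the triangle inequality. The symbols below are introduced here for illustration and are not the patent's notation: $r$ is the reference point, $\delta$ the one-dimensional distance, $o_1$ and $o$ the one-dimensional and multi-dimensional outlier degrees, and $T_b$ and $T$ the pre-threshold and the true N-th largest outlier degree. The argument assumes only that dist is a metric.

```latex
\begin{align*}
\delta(a,b) &= \lvert \mathrm{dist}(a,r) - \mathrm{dist}(b,r) \rvert \le \mathrm{dist}(a,b)
  && \text{(triangle inequality)} \\
o_1(a) &= k\text{-th smallest } \delta(a,\cdot) \le k\text{-th smallest } \mathrm{dist}(a,\cdot) = o(a) \\
T_b &= N\text{-th largest } o_1 \le N\text{-th largest } o = T \\
o(a) < T_b &\implies o(a) < T
\end{align*}
```

An object discarded because its outlier degree falls below $T_b$ therefore cannot belong to the N largest, so pruning with the pre-threshold does not change the result.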

Description

Method for detecting anomaly in water quality and electronic device
Technical field
The invention relates to the technical field of water quality detection, and in particular to a method and an electronic device for detecting water quality anomalies.
Background
Water is essential for aquatic life, human life and industrial production. The cleanliness of a water body and its content of various chemical components are an important basis for deciding how a water source is used and for environmental protection work. In water protection work in particular, limited environmental protection resources mean that sewage treatment must be targeted and focus on key areas rather than cast a wide net. Water quality monitoring and analysis covers chemical oxygen demand (COD), ammonia nitrogen, total phosphorus, dissolved oxygen and various heavy metal content indicators, and different monitoring and analysis instruments do not cover exactly the same indicators. After an instrument measures these indicators, the values are analyzed and ranked, and the key water areas to be treated are determined from the top-ranked water quality samples combined with regional conditions.
Among existing water quality anomaly detection methods, the most widely used one sets an abnormality threshold for each water quality indicator covered by the monitoring and analysis instrument; an indicator that exceeds its threshold is considered abnormal. This method is hereinafter referred to as the indicator threshold method. In the invention patent with application number 201910560024.5, "A Method and System for Abnormal Water Quality Detection Based on Prior Knowledge", a distance-based outlier detection algorithm is applied so that water quality anomalies can be detected in data from different monitoring and analysis instruments. To speed up outlier detection, that method uses water quality anomalies known a priori to raise the outlier degree threshold. It is hereinafter referred to as the a priori threshold method.
The disadvantage of the indicator threshold method is that it requires domain expert knowledge to set the abnormality threshold of each indicator. Because several water quality indicators are used at the same time, it is also harder to judge which water quality samples are abnormal and how they rank by degree of abnormality. The a priori threshold method is a distance-based outlier detection algorithm. It automatically returns the N most abnormal water quality samples without domain expert knowledge, but its speed-up depends on abnormal water quality samples known in advance (prior samples). If too few prior samples are known, or none at all, the speed-up is greatly reduced or disappears entirely. Furthermore, even when there are enough prior samples, their outlier degrees must still be computed against the global data set in order to obtain the prior threshold.
Summary of the invention
The present invention provides a method and an electronic device for detecting water quality anomalies, so as to solve one or more technical problems in the prior art and at least provide a beneficial alternative.
In a first aspect, an embodiment of the present invention provides a method for detecting water quality anomalies, including:
S101. Acquire multiple water quality arrays to form a water quality data set; each water quality array has the same dimension and includes at least one item of water quality data;
S102. Randomly select one water quality array in the water quality data set as the reference point;
S103. Calculate the distance values between all water quality arrays in the water quality data set and the reference point, and form all the distance values into a one-dimensional data set;
S104. Sort all distance values of the one-dimensional data set in descending order to obtain an ordered one-dimensional data set, and sort all water quality arrays of the water quality data set in the same order to obtain an ordered water quality data set;
S105. Determine the k nearest neighbors of each object of the ordered one-dimensional data set, where 1 ≤ k ≤ D × 1% and D is the number of water quality arrays in the water quality data set;
S106. Calculate the distance between each object of the ordered one-dimensional data set and its k-th nearest neighbor to obtain the outlier degree of each object; the outlier degrees of all objects of the one-dimensional data set constitute the one-dimensional outlier degrees. Select the N largest of these in descending order and take the Nth largest as the pre-threshold, where the k-th nearest neighbor is the k-th of the k nearest neighbors;
S107. Divide the ordered water quality data set into multiple data blocks and take the pre-threshold as the outlier degree threshold. Perform outlier detection on each data block in turn: determine the N largest outlier degrees of the data blocks detected so far according to the outlier degree threshold, update the outlier degree threshold to the Nth largest of these, and use the updated threshold as the criterion for outlier detection on the next data block, until all data blocks have been detected. The water quality arrays corresponding to the N largest outlier degrees over all data blocks are taken as the N abnormal water quality arrays.
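
Steps S102 to S104 can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the Euclidean metric in `dist` is an assumption (the distance formula is not reproduced in this text), and the names `dist` and `order_by_reference` are hypothetical.

```python
import math
import random


def dist(x1, x2):
    """Euclidean distance between two normalized arrays (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))


def order_by_reference(dataset, rng=random):
    """Pick a random reference point, map every array to its distance
    from it, and sort both views in descending order of that distance."""
    reference = rng.choice(dataset)                        # S102
    pairs = [(dist(x, reference), x) for x in dataset]     # S103
    pairs.sort(key=lambda pair: pair[0], reverse=True)     # S104
    distances = [d for d, _ in pairs]   # ordered one-dimensional data set
    ordered = [x for _, x in pairs]     # ordered water quality data set
    return reference, distances, ordered
```

The reference point itself maps to distance 0 and therefore ends up last in the ordering. For any two arrays, the gap between their one-dimensional values never exceeds `dist` between the arrays themselves, which is the property the pre-threshold of step S106 relies on.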
Further, determining the k nearest neighbors of each object of the ordered one-dimensional data set in step S105 includes:
Let any object of the ordered one-dimensional data set be denoted O, with k1 objects before O and k2 objects after O, where k1 ≥ 0 and k2 ≥ 0;
When k1 ≥ k, search k objects forward; when k1 < k, search k1 objects forward;
When k2 ≥ k, search k objects backward; when k2 < k, search k2 objects backward;
Calculate the distance between object O and each searched object and sort the searched objects by distance from small to large; the k objects with the smallest distances are the k nearest neighbors of O.
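
The window search above, together with the pre-threshold selection of step S106, can be sketched as follows. This is an illustrative sketch: the function names `one_dim_outlier_degrees` and `pre_threshold` are hypothetical, and the input is assumed to be the already sorted list of distance values with more than k entries.

```python
import heapq


def one_dim_outlier_degrees(distances, k):
    """One-dimensional outlier degree of every value in a sorted list."""
    size = len(distances)
    degrees = []
    for i, value in enumerate(distances):
        lo = max(0, i - k)          # min(k1, k) objects before position i
        hi = min(size, i + k + 1)   # min(k2, k) objects after position i
        gaps = sorted(abs(value - distances[j])
                      for j in range(lo, hi) if j != i)
        degrees.append(gaps[k - 1])  # distance to the k-th nearest neighbor
    return degrees


def pre_threshold(distances, k, n):
    """The n-th largest one-dimensional outlier degree."""
    return heapq.nlargest(n, one_dim_outlier_degrees(distances, k))[-1]
```

For example, with `distances = [9.0, 5.0, 4.0, 3.5, 0.0]` and `k = 1`, the outlier degrees are `[4.0, 1.0, 0.5, 0.5, 3.5]`, and with `n = 2` the pre-threshold is 3.5. Each object inspects at most 2k neighbors, so the pass over the whole list costs far less than comparing every pair.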
Further, step S107 specifically comprises:
S201. Divide the ordered water quality data set into B data blocks, each containing M water quality arrays, and set the outlier degree threshold equal to the pre-threshold;
S202. Set t=1, where t denotes the t-th data block;
S203. Determine whether t is 1; if so, go to step S205; if not, go to step S204;
S204. Determine whether d0 + the outlier degree of the reference point < the outlier degree threshold, where d0 is the distance between the first water quality array in the t-th data block and the reference point; if so, go to step S215; if not, go to step S205;
S205. Starting from the median object of the t-th data block of the ordered water quality data set, arrange the water quality data set in spiral order to obtain a spiral-ordered water quality data set; xj denotes the j-th water quality array of the spiral-ordered water quality data set; set j=1;
S206. Set m=1, where m denotes the position number of a water quality array in the initial t-th data block and Xm denotes the water quality array numbered m;
S207. Determine whether Xm has been removed; if so, go to step S211; if not, go to step S208;
S208. Calculate the distance between Xm and xj;
S209. Determine whether j<k; if so, go to step S211; if not, update the temporary k nearest neighbors of Xm, update the temporary outlier degree of Xm to the distance between Xm and the k-th nearest neighbor among its temporary k nearest neighbors, and go to step S210;
S210. Determine whether the temporary outlier degree of Xm is lower than the outlier degree threshold; if so, remove Xm from the t-th data block and go to step S211; if not, go to step S211;
S211. Determine whether m is less than M; if so, set m=m+1 and go to step S207; if not, go to step S212;
S212. Determine whether j is less than D; if so, set j=j+1 and go to step S206; if not, go to step S213;
S213. When t=1, determine the N largest outlier degrees in the t-th data block, take the N-th largest as the outlier degree threshold, and go to step S214; when t>1, determine the N largest outlier degrees of data blocks 1 to (t-1), select from these and the N largest outlier degrees of the t-th data block the N largest outlier degrees of data blocks 1 to t, set the outlier degree threshold to the N-th largest outlier degree of data blocks 1 to t, and go to step S214;
S214. Determine whether t is less than B; if so, set t=t+1 and go to step S204; if not, go to step S215;
S215. Take the N water quality arrays corresponding to the N largest outlier degrees of all data blocks examined so far as the N abnormal water quality arrays.
Further, the distance between every water quality array and the reference point is calculated with the method for calculating the distance between two water quality arrays, which comprises:
Let the two water quality arrays be x1 and x2, each expressed as an n-dimensional variable, x1 = (x_{11}, x_{12}, …, x_{1n}) and x2 = (x_{21}, x_{22}, …, x_{2n}); the distance between x1 and x2 is then:
$$\mathrm{dist}(x1,x2)=\sqrt{\sum_{i=1}^{n}\left(x_{1i}-x_{2i}\right)^{2}}$$
where x_{11}, x_{12}, …, x_{1n} are the normalized values of the different physical quantities of water quality array x1, x_{21}, x_{22}, …, x_{2n} are the normalized values of the different physical quantities of water quality array x2, and dist(x1,x2) is the distance between water quality arrays x1 and x2.
Further, n is greater than or equal to 1, and each water quality array includes at least one of chemical oxygen demand data, ammonia nitrogen data, total phosphorus data, and dissolved oxygen data.
In a second aspect, an embodiment of the present invention further provides an electronic device, comprising:
a processor;
a memory for storing a computer-readable program;
wherein the computer-readable program, when executed by the processor, causes the processor to implement the method according to any one of claims 1-5.
An embodiment of the present invention has at least the following beneficial effects: the distance between every water quality array in the water quality data set and the reference point is calculated, and these distance values form a one-dimensional data set; the k nearest neighbors of each object in the one-dimensional data set are found and a pre-threshold is determined; the ordered water quality data set is then divided into multiple data blocks, the pre-threshold is used as the initial outlier degree threshold, outlier detection is performed on each data block in turn, the outlier degree threshold is updated to the N-th largest outlier degree of the data blocks examined so far, and the updated threshold serves as the criterion for outlier detection in the next data block. The method requires neither prior knowledge of some abnormal water quality points nor calculation of global outlier degrees, which speeds up outlier detection while keeping the water quality anomaly detection results consistent with those of the traditional distance-based outlier detection algorithm.
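For orientation, the traditional distance-based detection that the results are said to match can be sketched as a brute-force computation: every object is compared with every other object, its outlier degree is the distance to its k-th nearest neighbor, and the N objects with the largest outlier degrees are reported. This is only an illustrative baseline under stated assumptions; the function names, the choice to exclude an object from its own neighbor search, and the sample values are not taken from the disclosure.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally sized numeric tuples."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def brute_force_top_n_outliers(data, k, n):
    """Return the indices of the n objects with the largest k-th-nearest-neighbor distance.

    Assumption: an object is not counted as its own neighbor, and len(data) > k.
    """
    degrees = []
    for i, obj in enumerate(data):
        dists = sorted(
            euclidean(obj, other) for j, other in enumerate(data) if j != i
        )
        # Outlier degree = distance to the k-th nearest neighbor.
        degrees.append((dists[k - 1], i))
    # Largest outlier degree first; ties broken by lower index.
    degrees.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in degrees[:n]]


if __name__ == "__main__":
    sample = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
    print(brute_force_top_n_outliers(sample, k=2, n=1))
```

The cost of this baseline grows with the square of the number of objects, which is the cost the block-wise procedure with a pre-threshold aims to avoid.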
Description of Drawings
The accompanying drawings provide a further understanding of the technical solutions of the present invention and constitute a part of the specification. Together with the embodiments of the present invention, they serve to explain the technical solutions of the present invention and do not limit them.
FIG. 1 is a flowchart of a water quality anomaly detection method provided by an embodiment of the present invention.
FIG. 2 is a flowchart of a method for performing water quality anomaly detection on a water quality data set provided by an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and do not limit it.
It should be noted that although functional modules are divided in the system schematic diagram and a logical order is shown in the flowchart, in some cases the steps shown or described may be performed with a module division different from that of the system or in an order different from that of the flowchart. The terms "first", "second", and the like in the specification, the claims, and the above drawings are used to distinguish similar objects and do not necessarily describe a specific order or sequence.
Terms used in this embodiment:
Data block: a unit of outlier detection consisting of several objects of the data set; for example, 1000 objects are commonly used as one data block.
k nearest neighbors: the distances between object A and all objects in the data set are calculated, and the k objects with the smallest distance values are the k nearest neighbors of A.
Temporary k nearest neighbors: the distances between object A and only some of the objects in the data set are calculated, and the k objects with the smallest distance values are the temporary k nearest neighbors of A.
k-th nearest neighbor: when the k distance values between object A and its k nearest neighbors are sorted in ascending order, the object corresponding to the k-th distance value is the k-th nearest neighbor of A.
Temporary k-th nearest neighbor: when the k distance values between object A and its temporary k nearest neighbors are sorted in ascending order, the object corresponding to the k-th distance value is the temporary k-th nearest neighbor of A.
Outlier degree of object A: the distance between object A and its k-th nearest neighbor.
Temporary outlier degree of object A: the distance between object A and its temporary k-th nearest neighbor.
Spiral order: given the index sequence 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 and a starting point of 5, the spiral order is 5, 4, 6, 3, 7, 2, 8, … or 5, 6, 4, 7, 3, 8, 2, …, that is, alternating one step backward and one step forward, and so on.
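The spiral order defined above can be sketched as a small helper. The function name and the choice to step backward before forward (the first of the two variants in the definition) are assumptions made for illustration.

```python
def spiral_order(items, start_pos):
    """Visit items starting at position start_pos, then alternate one step
    backward and one step forward until every item has been visited."""
    order = [items[start_pos]]
    step = 1
    while len(order) < len(items):
        before = start_pos - step
        after = start_pos + step
        if before >= 0:
            order.append(items[before])
        if after < len(items):
            order.append(items[after])
        step += 1
    return order


if __name__ == "__main__":
    # Index sequence 1..10 with 5 as the starting point (position 4, zero-based).
    print(spiral_order(list(range(1, 11)), 4))
    # prints [5, 4, 6, 3, 7, 2, 8, 1, 9, 10]
```

When one side runs out of items, the sketch simply continues on the other side, so the tail of the sequence is visited in plain order.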
FIG. 1 shows a water quality anomaly detection method provided by an embodiment of the present invention, comprising:
S101. Acquire multiple water quality arrays to form a water quality data set, where every water quality array has the same dimension and includes at least one item of water quality data;
Each water quality array is multi-dimensional data that includes at least one item of water quality data, namely at least one of chemical oxygen demand data, ammonia nitrogen data, total phosphorus data, dissolved oxygen data, temperature data, turbidity data, pH value, and the like. Those skilled in the art can select water quality data of different physical quantities according to actual needs.
S102. Randomly select one water quality array in the water quality data set as the reference point;
S103. Calculate the distance between every water quality array in the water quality data set and the reference point, and form a one-dimensional data set from all the distance values;
Specifically, the distance is the Euclidean distance. The distance between two water quality arrays is calculated as follows:
Let the two water quality arrays be x1 and x2, each expressed as an n-dimensional variable, x1 = (x_{11}, x_{12}, …, x_{1n}) and x2 = (x_{21}, x_{22}, …, x_{2n}); the distance between x1 and x2 is then:
$$\mathrm{dist}(x1,x2)=\sqrt{\sum_{i=1}^{n}\left(x_{1i}-x_{2i}\right)^{2}}$$
where x_{11}, x_{12}, …, x_{1n} are the normalized values of the different physical quantities of water quality array x1, x_{21}, x_{22}, …, x_{2n} are the normalized values of the different physical quantities of water quality array x2, and dist(x1,x2) is the distance between water quality arrays x1 and x2.
n is determined according to the actual situation; for example, with n=4 the water quality array includes chemical oxygen demand data, ammonia nitrogen data, total phosphorus data, and dissolved oxygen data.
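A minimal sketch of the distance calculation on normalized data follows. The text only states that the physical quantities are normalized; min-max scaling per physical quantity is an assumption chosen for illustration, as are the function names and the sample readings.

```python
import math


def min_max_normalize(arrays):
    """Scale each physical quantity (column) to [0, 1].

    Assumption: min-max scaling; the source does not name the normalization used.
    A constant column is mapped to 0.0.
    """
    columns = list(zip(*arrays))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple(
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        )
        for row in arrays
    ]


def dist(x1, x2):
    """Euclidean distance between two n-dimensional water quality arrays."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))


if __name__ == "__main__":
    # Hypothetical readings with n=2 physical quantities per array.
    raw = [(10.0, 1.0), (20.0, 3.0), (30.0, 2.0)]
    normalized = min_max_normalize(raw)
    print(normalized)
    print(dist(normalized[0], normalized[1]))
```

Normalizing first keeps a quantity with a large numeric range from dominating the distance.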
S104. Sort all distance values of the one-dimensional data set in descending order to obtain an ordered one-dimensional data set, and sort all water quality arrays of the water quality data set in that same order to obtain an ordered water quality data set;
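Steps S102 to S104 can be sketched as follows: pick a reference point, project every array onto its distance to that point, and sort both the distances and the arrays in descending order of distance. The function name and the explicit `ref_index` parameter are assumptions; in the method the reference point is chosen at random, which here would correspond to passing `random.randrange(len(data))`.

```python
import math


def dist(a, b):
    """Euclidean distance between two equally sized numeric tuples."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def project_and_sort(data, ref_index):
    """Return (ordered_distances, ordered_arrays), both in descending order
    of distance to the reference point data[ref_index]."""
    ref = data[ref_index]
    pairs = [(dist(row, ref), row) for row in data]
    # Sort on the distance only, so arrays themselves are never compared.
    pairs.sort(key=lambda pair: pair[0], reverse=True)
    ordered_distances = [d for d, _ in pairs]
    ordered_arrays = [row for _, row in pairs]
    return ordered_distances, ordered_arrays


if __name__ == "__main__":
    data = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
    distances, ordered = project_and_sort(data, ref_index=0)
    print(distances)  # prints [10.0, 5.0, 0.0]
    print(ordered)    # prints [(6.0, 8.0), (3.0, 4.0), (0.0, 0.0)]
```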
S105. Determine the k nearest neighbors of each object in the ordered one-dimensional data set, with 1≤k≤D*1%, where D is the number of water quality arrays in the water quality data set;
D is generally large and may be in the tens of thousands or more.
Determining the k nearest neighbors of each object in the data set comprises:
Let any object in the ordered one-dimensional data set be denoted as O, with k1 objects before O and k2 objects after O, where k1≥0 and k2≥0;
when k1≥k, search the k objects before O; when k1<k, search the k1 objects before O;
when k2≥k, search the k objects after O; when k2<k, search the k2 objects after O;
calculate the distance between object O and every searched object, sort the searched objects by distance in ascending order, and take the k objects with the smallest distances as the k nearest neighbors of object O.
Because the ordered one-dimensional data set is sorted, determining the k nearest neighbors of object O only requires the distances to the k objects before O and the k objects after O rather than the distances between O and all objects, which reduces computation time.
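The windowed search described above can be sketched as follows. The function name and sample values are illustrative assumptions; the sketch also assumes the data set holds more than k objects.

```python
def knn_distances_1d(sorted_values, index, k):
    """Distances from sorted_values[index] to its k nearest neighbors.

    Only the (up to) k objects before and the (up to) k objects after the
    object are examined, which is sufficient because the values are sorted.
    """
    lo = max(0, index - k)                       # min(k1, k) objects before
    hi = min(len(sorted_values), index + k + 1)  # min(k2, k) objects after
    center = sorted_values[index]
    candidates = [
        abs(center - sorted_values[j]) for j in range(lo, hi) if j != index
    ]
    candidates.sort()
    return candidates[:k]


if __name__ == "__main__":
    values = [10.0, 9.0, 5.0, 4.5, 1.0]  # already sorted in descending order
    print(knn_distances_1d(values, index=2, k=2))  # prints [0.5, 4.0]
    print(knn_distances_1d(values, index=0, k=2))  # prints [1.0, 5.0]
```

Each query touches at most 2k values, so the total cost over D objects is proportional to D times k rather than D squared.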
S106. Calculate the distance between each object of the ordered one-dimensional data set and its k-th nearest neighbor to obtain the outlier degree of each object; the outlier degrees of all objects of the one-dimensional data set constitute the one-dimensional outlier degrees; select the N largest of these outlier degrees and take the N-th largest as the pre-threshold, where the k-th nearest neighbor is the k-th of the k nearest neighbors;
Setting the pre-threshold from the data set itself ensures the accuracy of the detection result, for the following reason. By the triangle inequality for distances, once every object in the data set is mapped to a one-dimensional space through its distance to the reference point, the distance between any two objects in that space (the one-dimensional distance) is less than or equal to their actual distance (the multi-dimensional distance). When the k nearest neighbors of an object s_a are searched in the one-dimensional space, the one-dimensional distances between these k neighbors and s_a are therefore all less than or equal to the corresponding multi-dimensional distances, from which it follows that the one-dimensional outlier degree of s_a is less than or equal to its multi-dimensional outlier degree. Since s_a is arbitrary, the one-dimensional outlier degree of every object is less than or equal to its multi-dimensional outlier degree. Take the N objects with the largest one-dimensional outlier degrees and use the smallest of these (the N-th largest) as the pre-threshold Tb; by the same argument, Tb is less than or equal to the multi-dimensional outlier degree threshold. The multi-dimensional outlier degree threshold is the outlier degree of the N-th largest water quality anomaly to be detected, so using a pre-threshold Tb that is less than or equal to this value to exclude non-outliers cannot exclude a true outlier, which ensures the correctness of the detection result.
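As a rough sketch of step S106, the pre-threshold can be computed from the sorted one-dimensional distances alone. The function name and sample values are assumptions for illustration, and the sketch assumes the data set holds more than k objects and at least n objects.

```python
def pre_threshold(sorted_values, k, n):
    """Return the n-th largest one-dimensional outlier degree.

    The one-dimensional outlier degree of an object is the distance to its
    k-th nearest neighbor among the sorted distances to the reference point.
    """
    degrees = []
    for i, value in enumerate(sorted_values):
        lo = max(0, i - k)
        hi = min(len(sorted_values), i + k + 1)
        window = sorted(
            abs(value - sorted_values[j]) for j in range(lo, hi) if j != i
        )
        degrees.append(window[k - 1])
    return sorted(degrees, reverse=True)[n - 1]


if __name__ == "__main__":
    values = [10.0, 9.0, 5.0, 4.5, 1.0]  # distances to the reference point
    print(pre_threshold(values, k=1, n=1))  # prints 3.5
    print(pre_threshold(values, k=1, n=2))  # prints 1.0
```

Because each one-dimensional outlier degree is a lower bound on the corresponding multi-dimensional outlier degree, the value returned here is a safe starting threshold in the sense argued above.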
S107. Divide the ordered water quality data set into multiple data blocks, use the pre-threshold as the outlier degree threshold, and perform outlier detection on each data block in turn: determine the N largest outlier degrees of the data blocks examined so far according to the outlier degree threshold, update the outlier degree threshold to the N-th largest outlier degree of the examined data blocks, and use the updated threshold as the criterion for outlier detection in the next data block, until all data blocks have been examined; the water quality arrays corresponding to the N largest outlier degrees of all data blocks are taken as the N abnormal water quality arrays.
As shown in FIG. 2, step S107 specifically comprises:
S201. Divide the ordered water quality data set into B data blocks, each containing M water quality arrays, and set the outlier degree threshold equal to the pre-threshold;
S202. Set t=1, where t denotes the t-th data block;
S203. Determine whether t is 1; if so, go to step S205; if not, go to step S204;
S204. Determine whether d0 + the outlier degree of the reference point < the outlier degree threshold, where d0 is the distance between the first water quality array in the t-th data block and the reference point; if so, go to step S215; if not, go to step S205;
Specifically, the distance between the first water quality array in the t-th data block and the reference point, and the outlier degree of the reference point, are calculated and stored in step S103.
The water quality arrays in the data blocks are arranged in descending order of distance to the reference point, so d0 is the largest such distance among all arrays not yet examined. When t is greater than or equal to 2, if the first water quality array of a data block satisfies the termination rule, that is, d0 + the outlier degree of the reference point < the outlier degree threshold, then the first water quality array is not an outlier, and neither are the other water quality arrays in this data block or in the remaining data blocks, so the rest of the data set need not be examined. The first water quality array is therefore checked first, and once the termination rule is satisfied, detection stops and the result is output, which greatly shortens detection time.
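The termination rule of step S204 reduces to a single comparison, sketched below. The function name and the numeric values are hypothetical and only illustrate how the check is applied to the first array of a block.

```python
def can_stop_early(d0, reference_outlier_degree, threshold):
    """Termination rule of step S204.

    d0 is the distance between the first array of the current block and the
    reference point. If d0 plus the outlier degree of the reference point is
    already below the current outlier degree threshold, the remaining arrays
    cannot be outliers and detection can stop.
    """
    return d0 + reference_outlier_degree < threshold


if __name__ == "__main__":
    # Hypothetical values: the bound 0.2 + 0.1 is below the threshold 0.5.
    print(can_stop_early(0.2, 0.1, 0.5))   # prints True
    # Here the bound 0.45 + 0.1 exceeds 0.5, so the block must be examined.
    print(can_stop_early(0.45, 0.1, 0.5))  # prints False
```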
S205. Starting from the median object of the t-th data block of the ordered water quality data set, arrange the water quality data set in spiral order to obtain a spiral-ordered water quality data set; xj denotes the j-th water quality array of the spiral-ordered water quality data set; set j=1;
The ordered water quality data set is sorted, so objects that are close in distance are also close in the ordering. Starting from the median object of the data block (for a data block of 1000 objects, the 500th or 501st object can be taken as the median object) and searching for the k nearest neighbors in a spiral, alternating between the objects before and after it, therefore finds the k nearest neighbors faster and reduces search time.
S206. Set m=1, where m denotes the position number of a water quality array in the initial t-th data block and Xm denotes the water quality array numbered m;
In this step, once the number m of a water quality array in the data block has been assigned, the number stays unchanged even if water quality arrays are later removed. For example, if the data block is [X1,…,Xm-1,Xm,Xm+1,…,XM] and Xm is removed in step S210, the data block becomes [X1,…,Xm-1,Xm+1,…,XM], but the position numbers remain those of the initial data block.
S207. Determine whether Xm has been removed; if so, go to step S211; if not, go to step S208;
Non-outliers are removed during execution. Because m uses the position numbers of the initial data block, it is necessary to check whether the water quality array at position m has been removed; if it has, processing moves on to the water quality array at the next position.
S208. Calculate the distance between Xm and xj;
The distance between the two water quality arrays is the Euclidean distance.
S209. Determine whether j<k; if so, go to step S211; if not, update the temporary k nearest neighbors of Xm, update the temporary outlier degree of Xm to the distance between Xm and its temporary k-th nearest neighbor, and go to step S210;
When j<k, fewer than k distance values are available, so the temporary outlier degree is not calculated.
S210. Determine whether the temporary outlier degree of Xm is lower than the outlier degree threshold; if so, remove Xm from the t-th data block and go to step S211; if not, go to step S211;
Because the distances between Xm and the objects of the data set are calculated one at a time, the distance between Xm and its temporary k-th nearest neighbor can only decrease or stay the same as the calculation proceeds (each update of the k nearest neighbors again keeps the k smallest distances, so a larger value can never be taken). In other words, the temporary outlier degree never increases, and once it is below the outlier degree threshold, Xm is certain not to be an outlier. Such an object can therefore be excluded immediately as a non-outlier without searching further for its k nearest neighbors. Since there is no need to calculate the distances to all objects before deciding, detection is faster; moreover, as water quality arrays are removed from the data block, fewer and fewer arrays remain, which further reduces the amount of calculation.
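The pruning argument above can be sketched for a single candidate array. This is a simplified illustration rather than the full loop of steps S206 to S212: the function name is an assumption, the `others` sequence is assumed not to contain the candidate itself, and a heap from Python's `heapq` module stands in for the temporary k nearest neighbors.

```python
import heapq
import math


def dist(a, b):
    """Euclidean distance between two equally sized numeric tuples."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def is_pruned(candidate, others, k, threshold):
    """Scan others one at a time and return True as soon as the temporary
    outlier degree of candidate drops below threshold."""
    # Max-heap (negated values) holding the k smallest distances seen so far.
    nearest = []
    for other in others:
        d = dist(candidate, other)
        if len(nearest) < k:
            heapq.heappush(nearest, -d)
        elif d < -nearest[0]:
            heapq.heapreplace(nearest, -d)
        # The temporary outlier degree exists only once k distances are known.
        if len(nearest) == k and -nearest[0] < threshold:
            return True  # certain non-outlier, stop searching
    return False


if __name__ == "__main__":
    others = [(0.0, 1.0), (1.0, 0.0), (5.0, 5.0)]
    print(is_pruned((0.0, 0.0), others, k=2, threshold=2.0))  # prints True
    print(is_pruned((0.0, 0.0), others, k=2, threshold=0.5))  # prints False
```

In the first call the scan stops after two distances, because the temporary outlier degree of 1.0 is already below the threshold and can only shrink further.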
S211. Determine whether m is less than M; if so, set m=m+1 and go to step S207; if not, go to step S212;
If not all water quality arrays in the data block have been examined, processing continues with the next water quality array.
S212. Determine whether j is less than D; if so, set j=j+1 and go to step S206; if not, go to step S213;
S213. When t=1, determine the N largest outlier degrees in the t-th data block, take the N-th largest as the outlier degree threshold, and go to step S214; when t>1, determine the N largest outlier degrees of data blocks 1 to (t-1), select from these and the N largest outlier degrees of the t-th data block the N largest outlier degrees of data blocks 1 to t, set the outlier degree threshold to the N-th largest outlier degree of data blocks 1 to t, and go to step S214;
Specifically, when t=1, the N-th largest outlier degree in the first data block is used directly as the new outlier degree threshold. When t>1, the N largest values among the N largest outlier degrees of data blocks 1 to (t-1) and the N largest outlier degrees of the t-th data block are taken as the N largest outlier degrees of data blocks 1 to t, and the N-th largest outlier degree of data blocks 1 to t is used as the new outlier degree threshold.
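The threshold update of step S213 amounts to merging two top-N lists, which can be sketched as follows. The function name and the sample outlier degrees are illustrative assumptions, and the sketch assumes that at least n outlier degrees are available after merging.

```python
import heapq


def update_top_n(previous_top, block_degrees, n):
    """Merge the running top-n outlier degrees with those of the current block.

    Returns (new_top, new_threshold), where new_threshold is the n-th largest
    outlier degree over all blocks examined so far.
    """
    merged = heapq.nlargest(n, list(previous_top) + list(block_degrees))
    return merged, merged[-1]


if __name__ == "__main__":
    # First block (t=1): no previous top-n list exists yet.
    top, threshold = update_top_n([], [0.9, 0.2, 0.7, 0.4], n=2)
    print(top, threshold)  # prints [0.9, 0.7] 0.7
    # Second block (t=2): merge with the surviving degrees of that block.
    top, threshold = update_top_n(top, [0.8, 0.1], n=2)
    print(top, threshold)  # prints [0.9, 0.8] 0.8
```

The threshold never decreases from one block to the next, so later blocks are pruned at least as aggressively as earlier ones.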
S214. Determine whether t is less than B; if so, set t=t+1 and go to step S204; if not, go to step S215;
If data blocks remain unexamined, detection continues with the next data block.
S215. Take the N water quality arrays corresponding to the N largest outlier degrees of all data blocks examined so far as the N abnormal water quality arrays.
The present invention further provides an electronic device, comprising:
a processor;
a memory for storing a computer-readable program;
wherein the computer-readable program, when executed by the processor, causes the processor to implement the method of the above embodiment.
Those of ordinary skill in the art will understand that all or some of the steps and systems of the methods disclosed above may be implemented as software, firmware, hardware, or suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
The preferred embodiments of the present invention have been described in detail above, but the present invention is not limited to these embodiments. Those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present invention, and all such equivalent modifications or substitutions fall within the scope defined by the claims of the present invention.

Claims (6)

  1. A water quality anomaly detection method, characterized by comprising:
    S101. acquiring multiple water quality arrays to form a water quality data set, where every water quality array has the same dimension and includes at least one item of water quality data;
    S102. randomly selecting one water quality array in the water quality data set as a reference point;
    S103. calculating the distance between every water quality array in the water quality data set and the reference point, and forming a one-dimensional data set from all the distance values;
    S104. sorting all distance values of the one-dimensional data set in descending order to obtain an ordered one-dimensional data set, and sorting all water quality arrays of the water quality data set in that same order to obtain an ordered water quality data set;
    S105. determining the k nearest neighbors of each object in the ordered one-dimensional data set, with 1≤k≤D*1%, where D is the number of water quality arrays in the water quality data set;
    S106. calculating the distance between each object of the ordered one-dimensional data set and its k-th nearest neighbor to obtain the outlier degree of each object, the outlier degrees of all objects of the one-dimensional data set constituting the one-dimensional outlier degrees; selecting the N largest of these outlier degrees and taking the N-th largest as the pre-threshold, where the k-th nearest neighbor is the k-th of the k nearest neighbors;
    S107. dividing the ordered water quality data set into multiple data blocks, using the pre-threshold as the outlier degree threshold, and performing outlier detection on each data block in turn: determining the N largest outlier degrees of the data blocks examined so far according to the outlier degree threshold, updating the outlier degree threshold to the N-th largest outlier degree of the examined data blocks, and using the updated threshold as the criterion for outlier detection in the next data block, until all data blocks have been examined; the water quality arrays corresponding to the N largest outlier degrees of all data blocks being taken as the N abnormal water quality arrays.
  2. The water quality anomaly detection method according to claim 1, wherein determining the k nearest neighbors of each object in the ordered one-dimensional data set in step S105 comprises:
    denoting any object in the ordered one-dimensional data set as O, with k1 objects before object O and k2 objects after object O, where k1 ≥ 0 and k2 ≥ 0;
    when k1 ≥ k, searching the k objects immediately before O, and when k1 < k, searching the k1 objects before O;
    when k2 ≥ k, searching the k objects immediately after O, and when k2 < k, searching the k2 objects after O;
    calculating the distance between object O and each searched object, sorting the searched objects in ascending order of distance, and taking the k objects with the smallest distances as the k nearest neighbors of object O.
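The window search of claim 2 relies on the data being sorted: in an ordered one-dimensional data set, the k nearest neighbors of an object must lie among the k positions before it and the k positions after it. A minimal sketch follows; the function name `k_nearest_sorted_1d` is hypothetical and the code assumes `values` is already sorted.

```python
def k_nearest_sorted_1d(values, pos, k):
    """Return positions of the k nearest neighbors of values[pos] in a sorted list."""
    k1 = pos                      # number of objects before O
    k2 = len(values) - pos - 1    # number of objects after O

    # Search at most k objects on each side of O.
    before = range(pos - min(k, k1), pos)
    after = range(pos + 1, pos + 1 + min(k, k2))

    # Rank the searched objects by distance to O and keep the k closest.
    candidates = list(before) + list(after)
    candidates.sort(key=lambda q: abs(values[pos] - values[q]))
    return candidates[:k]
```

At most 2k distances are computed per object, instead of one distance for every other object in the data set.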
  3. The water quality anomaly detection method according to claim 1, wherein step S107 specifically comprises:
    S201. Divide the ordered water quality data set into B data blocks, each containing M water quality arrays, and set the outlier degree threshold = pre-threshold;
    S202. Set t = 1, where t denotes the t-th data block;
    S203. Determine whether t is 1; if so, go to step S205; if not, go to step S204;
    S204. Determine whether d0 + the outlier degree of the reference point < the outlier degree threshold, where d0 is the distance between the first water quality array in the t-th data block and the reference point; if so, go to step S215; if not, go to step S205;
    S205. Starting from the median object of the t-th data block of the ordered water quality data set, arrange the water quality data set in spiral order, where xj denotes the j-th water quality array of the data set arranged in spiral order, and set j = 1;
    S206. Set m = 1, where m denotes the position number of a water quality array in the initial t-th data block and Xm denotes the water quality array numbered m;
    S207. Determine whether Xm has been removed; if so, go to step S211; if not, go to step S208;
    S208. Calculate the distance between Xm and xj;
    S209. Determine whether j < k; if so, go to step S211; if not, update the temporary k nearest neighbors of Xm, update the temporary outlier degree of Xm to the distance between Xm and the k-th nearest neighbor among its temporary k nearest neighbors, and go to step S210;
    S210. Determine whether the temporary outlier degree of Xm is lower than the outlier degree threshold; if so, remove Xm from the t-th data block and go to step S211; if not, go to step S211;
    S211. Determine whether m is less than M; if so, set m = m + 1 and go to step S207; if not, go to step S212;
    S212. Determine whether j is less than D; if so, set j = j + 1 and go to step S206; if not, go to step S213;
    S213. When t = 1, determine the N largest outlier degrees in the t-th data block, take the N-th largest as the outlier degree threshold, and go to step S214; when t > 1, determine the N largest outlier degrees in data blocks 1 to (t-1), determine the N largest outlier degrees in data blocks 1 to t from the N largest outlier degrees in data blocks 1 to (t-1) and the N largest outlier degrees in the t-th data block, set the outlier degree threshold = the N-th largest outlier degree in data blocks 1 to t, and go to step S214;
    S214. Determine whether t is less than B; if so, set t = t + 1 and go to step S204; if not, go to step S215;
    S215. Take the N water quality arrays corresponding to the N largest outlier degrees of all data blocks detected so far as the N abnormal water quality arrays.
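The block-wise loop of steps S201 to S215 can be sketched in simplified form as follows. This is an illustration under stated assumptions rather than the claimed procedure: the names `spiral_indices` and `detect_top_n` are hypothetical, `math.dist` (Euclidean distance) stands in for the distance of claim 4, a candidate is never compared with itself, and the early-termination test of step S204 is omitted.

```python
import heapq
import math


def spiral_indices(center, size):
    """Yield 0..size-1 starting at `center` and alternating outward."""
    yield center
    for step in range(1, size):
        if center + step < size:
            yield center + step
        if center - step >= 0:
            yield center - step


def detect_top_n(data, k, n, block_size, threshold=0.0, dist=math.dist):
    """Return ([(outlier_degree, index), ...], final_threshold) for the top-N outliers."""
    top = []  # min-heap holding the N largest (outlier_degree, index) pairs so far
    for start in range(0, len(data), block_size):          # S201/S214: iterate blocks
        block = range(start, min(start + block_size, len(data)))
        center = (block.start + block.stop - 1) // 2        # S205: median object
        # Temporary k-NN distances per candidate, stored negated (max-heap).
        knn = {m: [] for m in block}
        for j in spiral_indices(center, len(data)):         # S205/S212: scan in spiral order
            for m in list(knn):                             # S206/S211: remaining candidates
                if m == j:
                    continue
                d = dist(data[m], data[j])                  # S208
                heap = knn[m]
                if len(heap) < k:
                    heapq.heappush(heap, -d)
                elif d < -heap[0]:
                    heapq.heapreplace(heap, -d)             # S209: update temporary k-NN
                # S210: the temporary degree only shrinks, so prune once below threshold.
                if len(heap) == k and -heap[0] < threshold:
                    del knn[m]
        for m, heap in knn.items():                         # S213: merge into global top N
            if len(heap) == k:
                heapq.heappush(top, (-heap[0], m))
                if len(top) > n:
                    heapq.heappop(top)
        if len(top) == n:
            threshold = max(threshold, top[0][0])           # N-th largest outlier degree
    return sorted(top, reverse=True), threshold
```

The pruning in the inner loop is what makes a good pre-threshold valuable: the temporary outlier degree of a candidate can only decrease as more objects are scanned, so a candidate whose temporary degree is already below the threshold cannot be among the N largest and need not be compared further.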
  4. The water quality anomaly detection method according to claim 1, wherein the distances between all arrays and the reference point are calculated using a method for calculating the distance between two water quality arrays, the method comprising:
    denoting two water quality arrays as x1 and x2, each expressed as an n-dimensional variable, x1 = (x_{11}, x_{12}, …, x_{1n}) and x2 = (x_{21}, x_{22}, …, x_{2n}), the distance between x1 and x2 being:
    Figure PCTCN2021075420-appb-100001
    where x_{11}, x_{12}, …, x_{1n} are the normalized values of the different physical quantities of water quality array x1, x_{21}, x_{22}, …, x_{2n} are the normalized values of the different physical quantities of water quality array x2, and dist(x1, x2) denotes the distance between water quality arrays x1 and x2.
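The formula itself appears only as an image in this text, so the following sketch rests on two explicit assumptions: the normalization is taken to be min-max scaling per physical quantity, and the distance is taken to be the Euclidean distance. Neither choice is confirmed by the claim wording, and the function names are hypothetical.

```python
import math


def min_max_normalize(rows):
    """Scale each column (physical quantity) of `rows` to [0, 1] (assumed normalization)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    # Guard against constant columns to avoid division by zero.
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [tuple((v - lo) / s for v, lo, s in zip(row, lows, spans)) for row in rows]


def dist(x1, x2):
    """Euclidean distance between two normalized water quality arrays (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))
```

Normalizing first keeps quantities with large numeric ranges, such as chemical oxygen demand, from dominating quantities with small ranges, such as total phosphorus.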
  5. The water quality anomaly detection method according to claim 1, wherein n is greater than or equal to 1, and each water quality array includes at least one of chemical oxygen demand data, ammonia nitrogen data, total phosphorus data, and dissolved oxygen data.
  6. An electronic device, comprising:
    a processor; and
    a memory for storing a computer-readable program;
    wherein, when the computer-readable program is executed by the processor, the processor is caused to implement the method according to any one of claims 1 to 5.
PCT/CN2021/075420 2020-12-30 2021-02-05 Method for detecting anomaly in water quality and electronic device WO2022141746A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011626167.0 2020-12-30
CN202011626167.0A CN112733904B (en) 2020-12-30 2020-12-30 Water quality abnormity detection method and electronic equipment

Publications (1)

Publication Number Publication Date
WO2022141746A1 true WO2022141746A1 (en) 2022-07-07

Family

ID=75609827

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/075420 WO2022141746A1 (en) 2020-12-30 2021-02-05 Method for detecting anomaly in water quality and electronic device

Country Status (2)

Country Link
CN (1) CN112733904B (en)
WO (1) WO2022141746A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116308952A (en) * 2023-03-08 2023-06-23 浪潮智慧科技有限公司 Water quality monitoring method and device based on unmanned ship
CN117171685A (en) * 2023-09-01 2023-12-05 武汉中核仪表有限公司 Operation monitoring method of turbidity measurement system
CN117807550A (en) * 2024-02-29 2024-04-02 山东宙雨消防科技股份有限公司 Intelligent quantitative detection method and system for building fire-fighting facilities

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114935697B (en) * 2022-07-25 2022-12-30 广东电网有限责任公司佛山供电局 Three-phase load unbalance identification method, system, equipment and medium
CN117650995B (en) * 2023-11-28 2024-06-14 佛山科学技术学院 Data transmission anomaly identification method based on outlier detection
CN117651256B (en) * 2023-11-28 2024-06-07 佛山科学技术学院 Node energy consumption monitoring method and system based on outlier detection

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030004902A1 (en) * 2001-06-27 2003-01-02 Nec Corporation Outlier determination rule generation device and outlier detection device, and outlier determination rule generation method and outlier detection method thereof
CN105138641A (en) * 2015-08-24 2015-12-09 河海大学 Angle-based high dimensional data outlier detection method
CN105426907A (en) * 2015-11-06 2016-03-23 河海大学 Fuzzy distance-based uncertain outlier detection method
CN105975519A (en) * 2016-04-28 2016-09-28 深圳大学 Multi-supporting point index-based outlier detection method and system
CN107480258A (en) * 2017-08-15 2017-12-15 佛山科学技术学院 A kind of metric space Outliers Detection method based on a variety of strong points
CN110070100A (en) * 2019-03-01 2019-07-30 广东奥博信息产业股份有限公司 A kind of agricultural weather Outliers Detection method and device that multiple-factor is integrated
CN110287238A (en) * 2019-06-26 2019-09-27 广东奥博信息产业股份有限公司 A kind of exception water quality detection method and system based on priori knowledge

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017185296A1 (en) * 2016-04-28 2017-11-02 深圳大学 Method and system for detecting outlier based on multiple support points index
CN105893213B (en) * 2016-06-22 2018-04-20 北京蓝海讯通科技股份有限公司 A kind of method for detecting abnormality, application and monitoring device
CN110737874B (en) * 2019-09-02 2021-04-20 中国科学院地理科学与资源研究所 Watershed water quality monitoring abnormal value detection method based on spatial relationship

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116308952A (en) * 2023-03-08 2023-06-23 浪潮智慧科技有限公司 Water quality monitoring method and device based on unmanned ship
CN116308952B (en) * 2023-03-08 2023-09-22 浪潮智慧科技有限公司 Water quality monitoring method and device based on unmanned ship
CN117171685A (en) * 2023-09-01 2023-12-05 武汉中核仪表有限公司 Operation monitoring method of turbidity measurement system
CN117171685B (en) * 2023-09-01 2024-02-09 武汉中核仪表有限公司 Operation monitoring method of turbidity measurement system
CN117807550A (en) * 2024-02-29 2024-04-02 山东宙雨消防科技股份有限公司 Intelligent quantitative detection method and system for building fire-fighting facilities
CN117807550B (en) * 2024-02-29 2024-05-17 山东宙雨消防科技股份有限公司 Intelligent quantitative detection method and system for building fire-fighting facilities

Also Published As

Publication number Publication date
CN112733904A (en) 2021-04-30
CN112733904B (en) 2022-03-25

Similar Documents

Publication Publication Date Title
WO2022141746A1 (en) Method for detecting anomaly in water quality and electronic device
CN108154198B (en) Knowledge base entity normalization method, system, terminal and computer readable storage medium
CN108880915B (en) Electric power information network safety alarm information false alarm determination method and system
CN110008247B (en) Method, device and equipment for determining abnormal source and computer readable storage medium
CN105512206A (en) Outlier detection method based on clustering
CN111289998A (en) Obstacle detection method, obstacle detection device, storage medium, and vehicle
US11887303B2 (en) Image processing model generation method, image processing method and device, and electronic device
KR20200107774A (en) How to align targeting nucleic acid sequencing data
CN107622185A (en) A kind of digital pcr density calculating method
CN105117485B (en) A kind of high-accuracy overall situation outlier detection algorithm based on k very neighbours
CN114063056A (en) Ship track fusion method, system, medium and equipment
CN113129335A (en) Visual tracking algorithm and multi-template updating strategy based on twin network
CN112906738A (en) Water quality detection and treatment method
CN114116829A (en) Abnormal data analysis method, abnormal data analysis system, and storage medium
CN110889118A (en) Abnormal SQL statement detection method and device, computer equipment and storage medium
CN115372995A (en) Laser radar target detection method and system based on European clustering
CN111666359B (en) POI candidate arrival point mining method, device and equipment
CN112529112B (en) Mineral identification method and device
CN113269238A (en) Data stream clustering method and device based on density peak value
WO2016112618A1 (en) Distance-based algorithm for solving representative node set in two dimensional space
CN110084157B (en) Data processing method and device for image re-recognition
CN112463564A (en) Method and device for determining correlation index influencing host state
CN110967674A (en) Vehicle-mounted radar array antenna failure detection method and device and vehicle-mounted radar
CN112015911B (en) Method for searching massive knowledge maps
CN113109761B (en) Trajectory-oriented calculation time reduction method based on multi-hypothesis tracking algorithm

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21912524

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21912524

Country of ref document: EP

Kind code of ref document: A1