CN114996318A - Automatic judgment method and system for processing mode of abnormal value of detection data - Google Patents

Automatic judgment method and system for processing mode of abnormal value of detection data

Info

Publication number
CN114996318A
Authority
CN
China
Prior art keywords
field
data
value
missing
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210815910.XA
Other languages
Chinese (zh)
Other versions
CN114996318B (en)
Inventor
高仕斌
占栋
李想
张金鑫
佘夏威
熊昊睿
黄瀚韬
冯中伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest Jiaotong University
Chengdu Tangyuan Electric Co Ltd
Original Assignee
Southwest Jiaotong University
Chengdu Tangyuan Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest Jiaotong University, Chengdu Tangyuan Electric Co Ltd filed Critical Southwest Jiaotong University
Priority to CN202210815910.XA priority Critical patent/CN114996318B/en
Publication of CN114996318A publication Critical patent/CN114996318A/en
Application granted granted Critical
Publication of CN114996318B publication Critical patent/CN114996318B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an automatic discrimination method and system for the processing mode of outliers in detection data. The method determines the type of each field; counts the proportion of missing values in each data field relative to the total amount of data in that field, and judges whether the field is available; if the field is available, it enters the next discrimination stage, and otherwise it does not. When a categorical field is available and has missing values, the proportion of missing values in the field is compared with the availability threshold R0, and the processing mode for the field's missing values is determined from the comparison result. When a numeric field is available, the processing modes for its outliers and missing values are determined by calculating the coefficient of variation and the proportion of missing values, respectively. By combining statistics with business rules on the basis of data analysis technology, the invention improves the efficiency of data analysis and reduces the burden on big data analysts and business experts.

Description

An automatic discrimination method and system for the processing mode of outliers in detection data

Technical field

The invention relates to the technical field of statistics and data mining, and in particular to an automatic discrimination method and system for the processing mode of outliers in detection data.

Background

At present, deciding how to process outliers in rail transit detection data requires a data analyst to first analyze every field of the detection data one by one to obtain the data type and distribution of each field. The analyst must then, with the assistance of business experts and with reference to the business background of each field, decide how the outliers and missing values of each field are to be processed. The drawback of this approach is that when the detection data has many dimensions or fields, it increases the burden on data analysts and business experts and reduces the efficiency of data analysis. The present invention therefore combines statistics with business rules and, on the basis of data analysis technology, constructs an automatic discrimination system and method for the processing of outliers and missing values in rail transit detection data.

Summary of the invention

To overcome the above defects of the prior art, the purpose of the present invention is to provide an automatic discrimination method, suitable for the rail transit field, for the processing mode of outliers in detection data. By combining statistics with business rules on the basis of data analysis technology, it constructs an automatic discrimination system for the processing of outliers and missing values in rail transit detection data, which improves the efficiency of data analysis, reduces the burden on big data analysts and business experts, and has great safety significance and practical application value.

The technical solution of the present invention is as follows:

S1. Determine the type of each field according to the business rules relevant to that field's data. The field types comprise determinate fields and indeterminate fields, where determinate fields comprise numeric fields, categorical fields and timestamp fields.

Further, step S1 comprises:

retrieving, from the business rule base, the business rules relevant to each field's data;

if the business rule base specifies the field type of the data field, taking the type specified in the business rules as the type of the data field;

if there is no business rule for the field's data, obtaining the data type of each non-missing value of the data field, the data type of each non-missing value being numeric, categorical or timestamp;

counting the non-missing values of each of the three data types, calculating the proportion of each count relative to the total number of non-missing values in the field, and taking the data type with the highest proportion as the field type; if the proportions of the three data types are equal, the field type is indeterminate.
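The type-determination rule of step S1 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names, the use of `None` for missing values, the ISO 8601 timestamp format, and the handling of two-way ties are all assumptions not stated in the source.

```python
from collections import Counter
from datetime import datetime

TYPES = ("numeric", "categorical", "timestamp")


def classify_value(value):
    """Classify one non-missing value as numeric, timestamp or categorical."""
    if isinstance(value, bool):
        return "categorical"  # assumption: booleans are labels, not numbers
    if isinstance(value, (int, float)):
        return "numeric"
    if isinstance(value, datetime):
        return "timestamp"
    text = str(value).strip()
    try:
        float(text)
        return "numeric"
    except ValueError:
        pass
    try:
        datetime.fromisoformat(text)  # assumption: timestamps are ISO 8601 strings
        return "timestamp"
    except ValueError:
        return "categorical"


def infer_field_type(values, rule_type=None):
    """Return the rule-base type if one is given, else the majority type."""
    if rule_type is not None:
        return rule_type  # the business rule base takes precedence
    present = [v for v in values if v is not None]  # None marks a missing value
    if not present:
        return "indeterminate"
    counts = Counter(classify_value(v) for v in present)
    shares = {t: counts[t] / len(present) for t in TYPES}
    best = max(shares.values())
    winners = [t for t in TYPES if shares[t] == best]
    # Assumption: any tie for the highest share (not only a three-way tie)
    # is treated as indeterminate.
    return winners[0] if len(winners) == 1 else "indeterminate"
```

Passing `rule_type` models the case where the business rule base already specifies the type; leaving it as `None` falls back to counting the non-missing values.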

S2. Count the proportion of missing values in each field relative to the total amount of data in the field, and judge whether the field is available; if the field is available, it enters the next discrimination stage, and otherwise it does not.

Further, when the missing-value proportion R is greater than the set availability threshold R0, the field is judged to be unavailable.

Further, the data of the field type determined above is analyzed: if the combined amount of data of the other two data types in the field, as a proportion of the total amount of data in the field, is greater than the availability threshold R0, the field is unavailable; if the field is available, the data of the other two data types in the field is converted to missing values.
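The availability check of step S2 might look like the sketch below. The function name, the `classify` callback and the use of `None` for missing values are illustrative assumptions; the source does not say whether the missing-value proportion is re-checked after off-type values are converted, so this sketch does not re-check it.

```python
def check_availability(values, field_type, r0, classify):
    """Return (available, cleaned_values) for one field.

    values     -- raw field values, with None marking a missing value
    field_type -- the type determined in step S1
    r0         -- availability threshold R0 (a fraction between 0 and 1)
    classify   -- hypothetical callback mapping a value to its data type
    """
    total = len(values)
    if total == 0:
        return False, []
    # Rule 1: too many missing values makes the field unavailable.
    missing = sum(1 for v in values if v is None)
    if missing / total > r0:
        return False, list(values)
    # Rule 2: too many values of the other two types makes it unavailable.
    off_type = sum(
        1 for v in values if v is not None and classify(v) != field_type
    )
    if off_type / total > r0:
        return False, list(values)
    # The field is available: off-type values are converted to missing values.
    cleaned = [
        v if v is not None and classify(v) == field_type else None
        for v in values
    ]
    return True, cleaned
```

A caller would supply its own classifier, for example `lambda v: "numeric" if isinstance(v, (int, float)) else "categorical"`.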

Further, a standard-state database of numeric fields is constructed from the data of the available numeric fields.

Further, N sets of good-quality inspection data are extracted from the historical detection data and aligned by detection position to obtain the standard-state database.

S3. When a categorical field is available and has missing values, compare the proportion R of missing values in the categorical field with the availability threshold R0, and determine the processing mode for the field's missing values from the comparison result.

Further, when the proportion R of missing values in the categorical field is less than the availability threshold R0, the missing values are filled with the mode of the categorical field;

when the proportion R of missing values in the categorical field is greater than or equal to the availability threshold R0, a Softmax classification model of the categorical field is constructed using the data of other fields, and the missing values of the categorical field are filled with the model's classification results.

S4. When a numeric field is available, determine the processing mode for its outliers by calculating the coefficient of variation, and the processing mode for its missing values by calculating the proportion of missing values.

Further, step S4 specifically comprises:

S41. Calculate the ratio of the standard deviation of the numeric field to its arithmetic mean to obtain the coefficient of variation CV.

The specific calculation formula is:

$CV = \sigma / \mu$,

where $\sigma$ is the standard deviation of the field data and $\mu$ is the arithmetic mean of the field data.

According to the threshold range in which the coefficient of variation falls, the determination method assigned to that range is used to identify the outliers of the numeric field.

S42. Compare the proportion R of missing values in the numeric field with the availability threshold R0, and determine the filling method for the field's missing values from the comparison result.

Further, step S41 comprises:

when the coefficient of variation CV is less than 15%, outliers are determined using the standard state;

when CV is greater than or equal to 15% and less than 35%, outliers are determined using the isolation forest algorithm;

when CV is greater than or equal to 35% and less than 50%, outliers are determined using a clustering algorithm;

when CV is greater than or equal to 50%, outliers are determined using the 3σ method.

Selecting the determination method according to the threshold range in which the coefficient of variation falls improves the efficiency of automatic discrimination.
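The selection rule of step S41 can be expressed as a short sketch. The function names and the returned labels are illustrative; the use of the population standard deviation and of the absolute value of the mean are assumptions, since the source only defines CV as the ratio of the standard deviation to the arithmetic mean.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """CV = standard deviation / arithmetic mean of the non-missing values."""
    mu = mean(values)
    if mu == 0:
        raise ValueError("CV is undefined when the mean is zero")
    # Assumptions: population standard deviation, and |mean| so that the
    # result stays non-negative for fields with a negative mean.
    return pstdev(values) / abs(mu)


def select_outlier_method(cv):
    """Map a CV value (as a fraction, 0.15 = 15%) to the method named in S41."""
    if cv < 0.15:
        return "standard_state"
    if cv < 0.35:
        return "isolation_forest"
    if cv < 0.50:
        return "clustering"
    return "three_sigma"
```

For example, the values `[2, 4, 4, 4, 5, 5, 7, 9]` have a mean of 5 and a population standard deviation of 2, giving CV = 0.4, which falls in the clustering range.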

Further, comparing the proportion R of missing values in the numeric field with the availability threshold R0 and determining the filling method for the field's missing values from the comparison result comprises:

when the first condition holds [condition on R and R0; formula image not reproduced], the missing values are filled with the mean of the field's non-missing data;

when the second condition holds [condition on R and R0; formula image not reproduced], an interpolation model is established between the numeric field and the detection position, and the missing values are filled by interpolation;

when the third condition holds [condition on R and R0; formula image not reproduced], a regression model of the numeric field is constructed using the data of other fields, and the missing values of the numeric field are filled using the regression model.

Compared with the prior art, the present invention has the following beneficial effects:

1. Expert experience is combined with business rules, so that determining how to process outliers and missing values in detection data is automated;

2. Starting from data quality and taking data availability into account, the discrimination results are more reliable;

3. Historical detection data is fully used in constructing numeric variables;

4. The automatic discrimination system is built in modules, which facilitates computer implementation.

Based on the above automatic discrimination method for the processing mode of outliers in detection data, the present invention also provides an automatic discrimination system for the processing mode of outliers in detection data, comprising:

a business rule discrimination module, used to set and store the business rules of each field, where the business rules include the field's data type and its value range or value set;

a field type automatic discrimination module, used to analyze the data types of data fields whose type is not specified in the business rules, so as to determine the field type, the field types comprising determinate fields and indeterminate fields, where determinate fields comprise numeric fields, categorical fields and timestamp fields;

a data field availability automatic discrimination module, used to assess the quality of each data field and thereby judge whether the field is meaningful for analysis;

a standard-state database module, used to determine the processing modes for outliers and missing values of numeric fields;

a data field processing mode automatic discrimination module, used to determine the specific processing mode for outliers and/or missing values in each data field type.

Further, analyzing the data types of data fields not specified in the business rules comprises analyzing the proportions of numeric, categorical and timestamp values among the non-missing values of each data field to obtain the field type of each data field.

Further, assessing the quality of each data field comprises judging the degree of data disorder, the proportion of missing values, and the presence of duplicate values.

Further, if the field's data is disordered and its type is indeterminate, the field is judged to be unavailable.

Further, when the field contains equal numbers of numeric and categorical values, the data is judged to be disordered, and if the business rules do not specify a type, the data type is indeterminate;

when the proportion of any single value among the non-missing values of the field exceeds a preset threshold, the data is judged to have too many duplicate values;

when the proportion of missing values among all the data of the field exceeds a preset threshold, the data is judged to have too many missing values and is unavailable.
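The duplicate-value and missing-value checks described above could be sketched as follows. The function name, the returned dictionary keys and both threshold parameters are hypothetical; the source only says that each check compares a proportion with a preset threshold.

```python
from collections import Counter


def quality_flags(values, dup_threshold, missing_threshold):
    """Flag a field whose data is dominated by one value or is mostly missing.

    values -- field values, with None marking a missing value
    """
    total = len(values)
    present = [v for v in values if v is not None]
    # Share of missing values among all values of the field.
    missing_ratio = (total - len(present)) / total if total else 1.0
    # Share of the single most frequent value among the non-missing values.
    if present:
        top_share = Counter(present).most_common(1)[0][1] / len(present)
    else:
        top_share = 0.0
    return {
        "too_many_duplicates": top_share > dup_threshold,
        "too_many_missing": missing_ratio > missing_threshold,
    }
```

With `values = ["a", "a", "a", "b", None]`, the most frequent value covers 3 of the 4 non-missing entries and 1 of the 5 entries is missing, so thresholds of 0.7 and 0.5 flag duplicates but not missing values.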

Further, the standard-state database is constructed from the data of the available numeric fields.

Further, N sets of good-quality inspection data are extracted from the historical detection data and aligned by detection position to obtain the standard-state database.

Description of the drawings

FIG. 1 is a flow chart of the method of the present invention.

Detailed description

The concept, specific embodiments and technical effects of the present invention are described clearly and completely below with reference to the embodiments and the drawing, so that the purpose, features and effects of the present invention can be fully understood.

Embodiment 1

As shown in FIG. 1, this embodiment provides an automatic discrimination method, suitable for the rail transit field, for the processing mode of outliers in detection data, comprising the following steps:

S1. Determine the type of each data field according to the business rules relevant to that data field, the field types comprising determinate fields and indeterminate fields, where determinate fields comprise numeric fields, categorical fields and timestamp fields;

S2. Count the proportion of missing values in each data field relative to the total amount of data in the field, and judge whether the field is available; if the field is available, it enters the next discrimination stage, and otherwise it does not;

S3. When a categorical field is available and has missing values, compare the proportion R of missing values in the categorical field with N times the availability threshold R0, and determine the processing mode for the field's missing values from the comparison result;

S4. When a numeric field is available, determine the processing mode for its outliers by calculating the coefficient of variation, and the processing mode for its missing values by calculating the proportion of missing values.

Embodiment 2

On the basis of Embodiment 1, the present invention provides a method for determining the data type, comprising:

retrieving, from the business rule base, the business rules relevant to each data field;

if the business rule base specifies the field type of the data field, taking the type specified in the business rules as the type of the data field;

if there is no business rule for the data field, obtaining the type of each non-missing value of the data field, the data type of each non-missing value being numeric, categorical or timestamp;

counting the non-missing values of each of the three data types, calculating the proportion of each count relative to the total number of non-missing values in the field, and taking the data type with the highest proportion as the field type of the data field; if the proportions of the three data types are equal, the field type of the data field is indeterminate.

Embodiment 3

On the basis of Embodiment 2, a method for judging whether a field is available is provided, specifically comprising:

when the missing-value proportion R is greater than the set availability threshold R0, the data field is judged to be unavailable.

Further, the data of the field type determined above is analyzed: if the combined amount of data of the other two data types in the data field, as a proportion of the total amount of data in the field, is greater than the availability threshold R0, the data field is unavailable; if it is available, the data of the other two data types in the data field is converted to missing values.

Embodiment 4

A standard-state database of numeric fields is constructed from the data of the available numeric fields.

Further, N sets of good-quality inspection data are extracted from the historical detection data and aligned by detection position to obtain the standard-state database.
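One way to align N inspection runs by detection position is sketched below. The data layout (one dictionary per run, keyed by position) and the decision to keep only positions present in every run are assumptions; the source does not specify how the alignment is carried out or how the resulting database is stored.

```python
def build_standard_state(runs):
    """Align several good-quality inspection runs by detection position.

    runs -- list of dicts, each mapping a detection position to the value
            measured at that position in one run (hypothetical layout)
    Returns a dict mapping each shared position to the list of values
    observed there, one per run, in run order.
    """
    if not runs:
        return {}
    # Assumption: only positions recorded in every run are kept.
    common = set(runs[0]).intersection(*runs[1:])
    return {pos: [run[pos] for run in runs] for pos in sorted(common)}
```

Each entry of the result then holds the N historical values for one position, which could serve as the reference against which a new measurement at that position is compared.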

Embodiment 5

On the basis of Embodiment 3, the processing mode for the missing values of a categorical field is determined as follows:

when the proportion R of missing values in the categorical field is less than N times the availability threshold R0, the missing values are filled with the mode of the categorical field;

when the proportion R of missing values in the categorical field is greater than or equal to N times the availability threshold R0, a Softmax classification model of the categorical field is constructed using the non-missing data of other fields, the categorical field is classified, and the missing values of the categorical field are filled with the model's classification results; in this solution, N is preferably 0.1.
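The decision in this embodiment, together with the mode-filling branch, can be sketched as follows. The function names and returned labels are illustrative, and `None` is assumed to mark a missing value; the Softmax classifier itself is not implemented here, and the sketch only reports when it would be chosen.

```python
from collections import Counter


def choose_categorical_fill(values, r0, n=0.1):
    """Pick the fill strategy for a categorical field that is already available.

    r0 -- availability threshold R0; n -- multiplier N (0.1 preferred in the text)
    """
    missing = sum(1 for v in values if v is None)
    if missing == 0:
        return "none"  # nothing to fill
    r = missing / len(values)
    return "mode" if r < n * r0 else "softmax_model"


def fill_with_mode(values):
    """Replace missing values with the most frequent non-missing value."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)
    mode = Counter(present).most_common(1)[0][0]
    return [mode if v is None else v for v in values]
```

With R0 = 0.3 and N = 0.1, a field with 1 missing value in 100 (R = 0.01) is filled with its mode, whereas a field with R = 0.5 would be routed to the classification model.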

Embodiment 6

On the basis of Embodiment 3, the processing modes for the missing values and outliers of a numeric field are determined as follows:

S41. Calculate the ratio of the standard deviation of the numeric field to its arithmetic mean to obtain the coefficient of variation CV.

The specific calculation formula is:

$CV = \sigma / \mu$,

where $\sigma$ is the standard deviation of the field data and $\mu$ is the arithmetic mean of the field data.

According to the threshold range in which the coefficient of variation falls, the determination method assigned to that range is used to identify the outliers of the numeric field.

S42. Compare the proportion R of missing values in the numeric field with the availability threshold R0, and determine the filling method for the field's missing values from the comparison result.

Further, step S41 comprises:

when CV < 15%, the discrimination result is to determine outliers using the standard state;

when 15% ≤ CV < 35%, the discrimination result is to determine outliers using the isolation forest algorithm;

when 35% ≤ CV < 50%, the discrimination result is to determine outliers using a clustering algorithm;

when CV ≥ 50%, the discrimination result is to determine outliers using the 3σ method.

Selecting the determination method according to the threshold range in which the coefficient of variation falls improves the efficiency of automatic discrimination.
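Of the four methods listed, the 3σ method is the simplest to illustrate. The sketch below flags values lying more than three standard deviations from the mean; the function name is illustrative, and the use of the population standard deviation computed over the same data (outliers included) is an assumption.

```python
from statistics import mean, pstdev


def three_sigma_outliers(values):
    """Return the indices of values farther than 3 sigma from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    return [i for i, v in enumerate(values) if abs(v - mu) > 3 * sigma]
```

For twenty readings of 10.0 followed by a single reading of 100.0, only the last index is flagged.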

Further, comparing the proportion R of missing values in the numeric field with the availability threshold R0 and determining the filling method for the field's missing values from the comparison result comprises:

when the first condition holds [condition on R and R0; formula image not reproduced], the missing values are filled with the mean of the non-missing data in the numeric field;

when the second condition holds [condition on R and R0; formula image not reproduced], an interpolation model is established between the numeric field and the detection position, and the missing values are filled by interpolation;

when the third condition holds [condition on R and R0; formula image not reproduced], a regression model of the numeric field is constructed using the data of other fields, and the missing values of the numeric field are filled using the regression model.
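The interpolation branch can be illustrated with a small sketch that fills missing values against the detection position. Linear interpolation, nearest-value filling at the two ends, the function name and the use of `None` for missing values are all assumptions; the source only says that an interpolation model is established between the numeric field and the detection position.

```python
def interpolate_missing(positions, values):
    """Fill missing values (None) by linear interpolation over position."""
    known = sorted(
        (p, v) for p, v in zip(positions, values) if v is not None
    )
    filled = []
    for p, v in zip(positions, values):
        if v is not None:
            filled.append(v)
            continue
        # Nearest known neighbours on each side of this position.
        left = max((k for k in known if k[0] < p), key=lambda k: k[0], default=None)
        right = min((k for k in known if k[0] > p), key=lambda k: k[0], default=None)
        if left is not None and right is not None:
            weight = (p - left[0]) / (right[0] - left[0])
            filled.append(left[1] + weight * (right[1] - left[1]))
        elif left is not None:
            filled.append(left[1])   # assumption: hold the last known value
        elif right is not None:
            filled.append(right[1])  # assumption: hold the first known value
        else:
            filled.append(None)      # no known values to interpolate from
    return filled
```

For positions `[0, 1, 2, 3]` and values `[1.0, None, 3.0, None]`, the gap at position 1 is filled with 2.0 and the trailing gap takes the last known value, 3.0.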

Compared with the prior art, the present invention has the following beneficial effects:

1. Expert experience is combined with business rules, so that determining how to process outliers and missing values in detection data is automated;

2. Starting from data quality and taking data availability into account, the discrimination results are more reliable;

3. Historical detection data is fully used in constructing numeric variables;

4. The automatic discrimination system is built in modules, which facilitates computer implementation.

实施例7Example 7

基于上述一种检测数据异常值处理方式的自动判别方法,本发明还提供了一种检测数据异常值处理方式的自动判别系统,包括:Based on the above-mentioned automatic discrimination method for the processing method of abnormal value of detection data, the present invention also provides an automatic discrimination system for processing method of abnormal value of detection data, including:

业务规则判别模块,用于设置并存储各个字段的业务规则,其中业务规则包括字段的数据类型、字段取值范围或集合;The business rule discrimination module is used to set and store the business rules of each field, wherein the business rules include the data type of the field, the value range or set of the field;

数据字段类型自动判别模块,用于分析业务规则中未明确数据字段的数据类型,以判别所述字段的字段类型,所述字段类型包括确定型字段和不确定型,其中确定型字段包括数值型字段、类别型字段和时间戳型字段;The data field type automatic identification module is used to analyze the data type of the unspecified data field in the business rules, so as to identify the field type of the field, the field type includes a definite type field and an indeterminate type, wherein the definite type field includes a numerical type fields, categorical fields and timestamp fields;

数据字段可用性自动判别模块,用于判别各个数据字段的质量情况,以判断各个数据字段是否具有分析意义;The data field availability automatic judgment module is used to judge the quality of each data field to judge whether each data field has analytical significance;

标准态数据库模块,用于判别数值型字段的异常值和缺失值处理方式;Standard database module, which is used to discriminate outliers and missing values of numeric fields;

数据字段处理方法自动判别模块,用于判别各个数据字段类型中异常值和/或缺失值的具体处理方式。The data field processing method automatic discrimination module is used to discriminate the specific processing method of outliers and/or missing values in each data field type.

进一步地,所述分析业务规则中未明确数据字段的数据类型,包括通过分析各个数据字段中非缺失值中数值型取值、类别型取值和时间戳型取值的占比,以得出各个数据字段的字段类型。Further, the data types of the data fields are not specified in the analysis business rules, including by analyzing the proportion of numerical values, categorical values and timestamp values in the non-missing values in each data field to obtain The field type of each data field.

进一步地,所述判别数据字段的质量情况包括数据混乱程度判别、数据缺失值占比判别、数据重复值判别。Further, the judging of the quality of the data fields includes judging the degree of confusion in the data, judging the proportion of missing values in the data, and judging the repeated values of the data.

进一步地,如果所述字段数据混乱且类型不确定,则判定所述字段为不可用。Further, if the field data is chaotic and the type is uncertain, it is determined that the field is unavailable.

Further, when the numbers of numeric and categorical values in a field are equal, the data is determined to be disordered; if, in addition, no type is specified in the business rules, the field's data type is indeterminate;

When the count of a single value in the field, as a proportion of the total number of non-missing values, exceeds a preset threshold, the field is determined to have too many duplicate values;

When the number of missing values in the field, as a proportion of the total number of data items, exceeds a preset threshold, the field is determined to have too many missing values and the data is unusable.
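The three quality checks above can be sketched in a few lines. This is an illustration under stated assumptions: the function name `assess_field_quality`, the default threshold values, the use of `None` as the only missing marker, and the requirement that the equal numeric and categorical counts be non-zero before a field counts as disordered are all hypothetical choices. The text does not say whether excessive duplicates alone make a field unusable, so the sketch only reports that flag.

```python
from collections import Counter


def assess_field_quality(values, type_counts, type_specified,
                         missing_threshold=0.5, duplicate_threshold=0.9):
    """Return quality flags for one field.

    values         -- raw field values, with None marking a missing value
    type_counts    -- e.g. {"numeric": 10, "categorical": 2, "timestamp": 0}
    type_specified -- True if the business rules name a type for this field
    """
    total = len(values)
    present = [v for v in values if v is not None]
    missing_ratio = (total - len(present)) / total if total else 1.0

    # Disorder: numeric and categorical values occur equally often.
    numeric = type_counts.get("numeric", 0)
    categorical = type_counts.get("categorical", 0)
    disordered = numeric > 0 and numeric == categorical
    indeterminate = disordered and not type_specified

    # Duplicates: share of the most frequent value among non-missing values.
    top_share = max(Counter(present).values()) / len(present) if present else 0.0

    too_many_missing = missing_ratio > missing_threshold
    return {
        "missing_ratio": missing_ratio,
        "too_many_missing": too_many_missing,
        "too_many_duplicates": top_share > duplicate_threshold,
        "disordered": disordered,
        "indeterminate": indeterminate,
        "usable": not too_many_missing and not indeterminate,
    }
```

A field such as `[None, None, None, 1]` has a missing ratio of 0.75, which exceeds the assumed default threshold of 0.5, so it is reported as unusable.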

Further, the standard-state database is built from the data of usable numeric fields.

Further, N inspection runs of good quality are extracted from the historical inspection data, and the inspection data are aligned by detection position to obtain the standard-state database.
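One way to picture the alignment step is the sketch below. It is only a hedged illustration: the text states that N good-quality runs are aligned by detection position, but does not say what is stored per position. Representing each run as a mapping from position to value, keeping only positions present in every run, and storing a per-position mean and population standard deviation are assumptions made for the example, as is the function name `build_standard_state`.

```python
from statistics import mean, pstdev


def build_standard_state(runs):
    """Align N inspection runs by detection position.

    runs -- list of dicts mapping detection position -> measured value,
            one dict per good-quality historical inspection run.
    Returns a dict: position -> {"samples", "mean", "std"}.
    """
    if not runs:
        return {}
    # Assumption: positions match exactly across runs; real data may need
    # nearest-position matching or resampling instead.
    common = set(runs[0])
    for run in runs[1:]:
        common &= set(run)

    database = {}
    for position in sorted(common):
        samples = [run[position] for run in runs]
        database[position] = {
            "samples": samples,
            "mean": mean(samples),
            "std": pstdev(samples),
        }
    return database
```

With two runs `{0: 1.0, 1: 2.0, 2: 3.0}` and `{0: 3.0, 1: 2.0}`, position 2 is dropped because it is missing from the second run, and position 0 gets a mean of 2.0.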

The embodiments of the present invention have been described in detail above, but the invention is not limited to these embodiments. Those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the invention, and all such equivalent modifications or substitutions fall within the scope defined by the claims of the present invention.

Claims (12)

1. An automatic discrimination method for the handling of outliers in detection data, characterized by comprising:
determining the type of each field according to the business rules associated with that field's data, the field types comprising determinate fields and indeterminate fields, wherein determinate fields include numeric fields, categorical fields, and timestamp fields;
computing the ratio R of the number of missing values in the field to the total amount of data in the field, and determining whether the field is usable; if the field is usable, proceeding to the next discrimination stage, and otherwise not proceeding to the next discrimination stage;
when a categorical field is usable and contains missing values, comparing the missing-value ratio R of the categorical field with the availability threshold R0, and determining the handling of the categorical field's missing values according to the comparison result;
when a numeric field is usable, determining the handling of outliers and missing values by computing the coefficient of variation and the missing-value ratio, respectively.

2. The automatic discrimination method according to claim 1, characterized in that a standard-state database for numeric fields is built from the data of usable numeric fields.

3. The automatic discrimination method according to claim 1, characterized in that:
if the field type is not determined in the business rule base, the data type of each non-missing value in the field is obtained, the data types including numeric, categorical, and timestamp;
from the amount of data of each of the three data types among the non-missing values, the proportion of each data type in the total non-missing data of the field is computed;
the field type is determined from the proportions of the data types in the field.

4. The automatic discrimination method according to claim 3, characterized in that determining the field type from the proportions of the data types in the field specifically comprises:
taking the data type with the highest proportion as the type of the determinate field;
if the proportions of the three data types are equal, the field is an indeterminate field.

5. The automatic discrimination method according to claim 1, characterized in that determining whether the field is usable comprises:
when the missing-value ratio R is greater than the set availability threshold R0, determining that the field is unusable.

6. The automatic discrimination method according to claim 5, characterized in that determining whether the field is usable further comprises:
computing the ratio of the combined count of the other two data types in the determinate field to the total amount of data in the field;
if this ratio is greater than the set availability threshold R0, the determinate field is unusable; otherwise the determinate field is usable.

7. The automatic discrimination method according to claim 6, characterized in that, when the determinate field is usable, the data of the other two data types in the determinate field are converted to missing values for processing.

8. The automatic discrimination method according to claim 1, characterized in that determining the handling of the categorical field's missing values according to the comparison result comprises:
when the missing-value ratio R of the categorical field is less than N times the availability threshold R0, filling the missing values with the mode of the categorical field;
when the missing-value ratio R of the categorical field is greater than or equal to N times the availability threshold R0, building a Softmax classification model for the categorical field from the data of other fields, and filling the missing values of the categorical field with the model's classification results.

9. The automatic discrimination method according to claim 1, characterized in that determining the handling of outliers and missing values by computing the coefficient of variation and the missing-value ratio, respectively, specifically comprises:
computing the ratio of the standard deviation to the arithmetic mean of the numeric field to obtain the coefficient of variation CV, and, according to the threshold range in which the CV falls, identifying the outliers of the numeric field with the detection method assigned to that range;
comparing the missing-value ratio R of the numeric field with the availability threshold R0, and filling the missing values of the numeric field according to the comparison result.

10. The automatic discrimination method according to claim 9, characterized in that identifying the outliers of the numeric field with the detection method assigned to the threshold range in which the CV falls specifically comprises:
when CV < 15%, identifying outliers using the standard state;
when 15% ≤ CV < 35%, identifying outliers using the isolation forest algorithm;
when 35% ≤ CV < 50%, identifying outliers using a clustering algorithm;
when CV ≥ 50%, identifying outliers using the 3σ method.

11. The automatic discrimination method according to claim 9, characterized in that:
when R < 0.1 R0, the missing values are filled with the mean of the field's non-missing data;
when 0.1 R0 ≤ R < 0.5 R0, an interpolation model is built from the numeric field and the detection position, and the missing values are filled by interpolation;
when R ≥ 0.5 R0, a regression model for the numeric field is built from the data of other fields, and the missing values of the numeric field are filled using the regression model.

12. An automatic discrimination system for the handling of outliers in detection data, characterized by comprising a business rule discrimination module, a data field type automatic discrimination module, a data field availability automatic discrimination module, a standard-state database module, and a data field processing mode automatic discrimination module;
the business rule discrimination module is configured to set and store the business rules of each field, where the business rules include the field's data type and its value range or value set;
the data field type automatic discrimination module is configured to analyze data fields whose data type is not specified in the business rules, so as to determine the field type of each such field, the field types comprising determinate fields and indeterminate fields, wherein determinate fields include numeric fields, categorical fields, and timestamp fields;
the data field availability automatic discrimination module is configured to assess the quality of each data field and thereby determine whether the field is meaningful for analysis;
the standard-state database module is used to determine how outliers and missing values of numeric fields are handled;
the data field processing mode automatic discrimination module is configured to determine the specific handling of outliers and/or missing values for each data field type.
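The threshold rules of claims 8, 10, and 11 amount to a small dispatch table, sketched below. The numeric boundaries (15%, 35%, 50%, 0.1 R0, 0.5 R0) come from the claims; everything else is an assumption for illustration, including the function names, the returned method labels, the use of the population standard deviation, the division by the absolute value of the mean, and the parameter `n` standing in for the multiplier N, whose value the claims do not specify.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """CV = standard deviation / arithmetic mean (population std assumed)."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return pstdev(values) / abs(m)


def outlier_method(values):
    """Pick the outlier detection method by CV range (claim 10)."""
    cv = coefficient_of_variation(values)
    if cv < 0.15:
        return "standard_state"
    if cv < 0.35:
        return "isolation_forest"
    if cv < 0.50:
        return "clustering"
    return "three_sigma"


def numeric_fill_method(r, r0):
    """Pick the numeric missing-value fill by missing ratio R (claim 11)."""
    if r < 0.1 * r0:
        return "mean"
    if r < 0.5 * r0:
        return "interpolation_by_position"
    return "regression_on_other_fields"


def categorical_fill_method(r, r0, n):
    """Pick the categorical fill (claim 8); n is the unspecified multiplier N."""
    return "mode" if r < n * r0 else "softmax_classifier"
```

For instance, `outlier_method([1, 2, 3, 4, 5])` yields a CV of about 0.47 and therefore selects the clustering branch, while `numeric_fill_method(0.1, 0.5)` falls in the middle band and selects interpolation. The functions only choose a method; the detectors and fill models themselves are outside this sketch.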
CN202210815910.XA 2022-07-12 2022-07-12 Automatic judgment method and system for processing mode of abnormal value of detection data Active CN114996318B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210815910.XA CN114996318B (en) 2022-07-12 2022-07-12 Automatic judgment method and system for processing mode of abnormal value of detection data

Publications (2)

Publication Number Publication Date
CN114996318A true CN114996318A (en) 2022-09-02
CN114996318B CN114996318B (en) 2022-11-04

Family

ID=83020719

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210815910.XA Active CN114996318B (en) 2022-07-12 2022-07-12 Automatic judgment method and system for processing mode of abnormal value of detection data

Country Status (1)

Country Link
CN (1) CN114996318B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118861019A (en) * 2024-07-05 2024-10-29 山西晋云互联科技有限公司 A method, system, device and medium for automatically verifying the quality of structured data
CN119126779A (en) * 2024-08-06 2024-12-13 台州爱鑫智能科技有限公司 A control method for an unmanned intelligent robot

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020169735A1 (en) * 2001-03-07 2002-11-14 David Kil Automatic mapping from data to preprocessing algorithms
US20040194061A1 (en) * 2003-03-31 2004-09-30 Hitachi, Ltd. Method for allocating programs
CN103440283A (en) * 2013-08-13 2013-12-11 江苏华大天益电力科技有限公司 Vacancy filling system for measured point data and vacancy filling method
CN105426425A (en) * 2015-11-04 2016-03-23 华中科技大学 Big data marketing method based on mobile signaling
CN106649579A (en) * 2016-11-17 2017-05-10 苏州航天系统工程有限公司 Time-series data cleaning method for pipe net modeling
CN107729293A (en) * 2017-09-27 2018-02-23 中南大学 A kind of geographical space method for detecting abnormal based on Multivariate adaptive regression splines
CN110086860A (en) * 2019-04-19 2019-08-02 武汉大学 A kind of data exception detection method and device under Internet of Things big data environment
CN110808084A (en) * 2019-09-19 2020-02-18 西安电子科技大学 A copy number variation detection method based on single-sample next-generation sequencing data
CN111177217A (en) * 2019-12-24 2020-05-19 平安信托有限责任公司 Data preprocessing method, device, computer equipment and storage medium
CN111680267A (en) * 2020-06-01 2020-09-18 四川大学 A three-stage advanced online identification method for abnormal dam safety monitoring data
CN111737249A (en) * 2020-08-24 2020-10-02 国网浙江省电力有限公司 Abnormal data detection method and device based on Lasso algorithm
CN112883340A (en) * 2021-04-30 2021-06-01 西南交通大学 Track quality index threshold value rationality analysis method based on quantile regression
CN113934716A (en) * 2021-09-27 2022-01-14 杭州电子科技大学 A smart campus-oriented time series data restoration method based on coefficient of variation constraints
CN114492552A (en) * 2020-11-12 2022-05-13 中移动信息技术有限公司 Method, device and equipment for training broadband user authenticity judgment model
CN114660378A (en) * 2022-02-28 2022-06-24 成都唐源电气股份有限公司 A Catenary Comprehensive Diagnosis Method Based on Multi-source Detection Parameters

Also Published As

Publication number Publication date
CN114996318B (en) 2022-11-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant