CN114996318A - Automatic judgment method and system for processing mode of abnormal value of detection data - Google Patents

Automatic judgment method and system for processing mode of abnormal value of detection data

Info

Publication number
CN114996318A
CN114996318A CN202210815910.XA CN202210815910A CN114996318A CN 114996318 A CN114996318 A CN 114996318A CN 202210815910 A CN202210815910 A CN 202210815910A CN 114996318 A CN114996318 A CN 114996318A
Authority
CN
China
Prior art keywords
field
data
value
type
missing
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210815910.XA
Other languages
Chinese (zh)
Other versions
CN114996318B (en)
Inventor
高仕斌
占栋
李想
张金鑫
佘夏威
熊昊睿
黄瀚韬
冯中伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest Jiaotong University
Chengdu Tangyuan Electric Co Ltd
Original Assignee
Southwest Jiaotong University
Chengdu Tangyuan Electric Co Ltd
Application filed by Southwest Jiaotong University and Chengdu Tangyuan Electric Co Ltd
Priority to CN202210815910.XA
Publication of CN114996318A
Application granted
Publication of CN114996318B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and system for automatically determining how abnormal values in detection data should be processed. The type of each field is determined; the proportion of missing values in each data field relative to the field's total data amount is counted to judge whether the field is available; an available field enters the next judgment stage, and an unavailable field does not. When a categorical field is available and contains missing values, the proportion of missing values in the field is compared with the availability threshold $R_0$, and the processing mode for the field's missing values is determined from the comparison result. When a numerical field is available, the processing modes for abnormal values and missing values are determined by calculating the coefficient of variation and the missing-value proportion, respectively. By combining statistics with business rules on the basis of data analysis technology, the method improves data analysis efficiency and reduces the burden on big data analysts and business experts.

Description

Automatic judgment method and system for processing mode of abnormal value of detection data
Technical Field
The invention relates to the technical field of statistics and data mining, and in particular to a method and system for automatically determining the processing mode for abnormal values in detection data.
Background
In existing methods for handling abnormal values in rail transit detection data, a data analyst must first analyze each field of the detection data one by one to obtain its data type and distribution. The analyst must then decide how to process the abnormal values and missing values of each data field by considering the field's business background, with the assistance of a business expert. The disadvantage of this approach is that when the detection data has many dimensions or fields, the burden on data analysts and business experts grows and the efficiency of data analysis falls. The invention therefore discloses a system and method, built on data analysis technology and combining statistics with business rules, for automatically determining how to process abnormal values and missing values in rail transit detection data.
Disclosure of Invention
To overcome the defects of the prior art, the invention aims to provide a method, suitable for the field of rail transit, for automatically determining the processing mode for abnormal values in detection data.
The technical scheme of the invention is as follows:
S1. Determine the type of each field according to the business rules relevant to that field's data. Field types comprise deterministic fields and indeterminate fields, and deterministic fields comprise numerical fields, categorical fields and timestamp fields.
Further, step S1 includes:
retrieving the relevant business rules of each field from the business rule base;
if the business rule base specifies the field type of the data field, the field type is the type specified in the business rule;
if the field has no business rule, acquiring the data type of each non-missing value of the data field, where the data type of a non-missing value is numerical, categorical or timestamp;
counting the non-missing values of each of the three data types and calculating each count's proportion of the field's total number of non-missing values, then taking the data type with the highest proportion as the field type; if the proportions of the three data types are equal, the field type is indeterminate.
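The type-inference rule of step S1 can be illustrated with a minimal Python sketch. The function names, the use of `None` as the missing-value marker, and the per-value classification rules in `value_type` are assumptions made for illustration and are not taken from the patent. The source only names the three-way tie; treating any tie for first place as indeterminate is likewise an assumption.

```python
from collections import Counter
from datetime import datetime

TYPES = ("numerical", "categorical", "timestamp")

def value_type(value):
    """Classify a single non-missing value (illustrative rules, an assumption)."""
    if isinstance(value, datetime):
        return "timestamp"
    # bool is a subclass of int in Python, so exclude it from "numerical".
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return "numerical"
    return "categorical"

def infer_field_type(values, rule_type=None):
    """Return the field type, preferring the business rule when one exists."""
    if rule_type is not None:
        return rule_type
    non_missing = [v for v in values if v is not None]
    if not non_missing:
        return "indeterminate"  # assumption: an empty field cannot be typed
    counts = Counter(value_type(v) for v in non_missing)
    shares = {t: counts[t] / len(non_missing) for t in TYPES}
    best = max(shares.values())
    winners = [t for t in TYPES if shares[t] == best]
    # Any tie for the highest share is treated as indeterminate (assumption).
    return winners[0] if len(winners) == 1 else "indeterminate"
```

For example, a field holding `[1, 2.5, "a", None]` would be typed as numerical, because two of its three non-missing values are numbers.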
S2. Count the proportion R of missing values in each field relative to the field's total data amount, and judge whether the field is available; if the field is available, it enters the next judgment stage, otherwise it does not.
Further, when the missing-value proportion R is greater than the set availability threshold $R_0$, the field is determined to be unavailable.
Further, the data of a field whose type has been determined is analyzed: if the data of the other two data types together account for a proportion of the field's total data amount greater than the availability threshold $R_0$, the field is unavailable; if the field is available, the data of the other two data types in the field is converted into missing values for processing.
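The two availability checks of step S2 can be sketched as follows. This is a minimal illustration under stated assumptions: `None` marks a missing value, `simple_type` is a hypothetical stand-in for the field-type analysis, and the value of `r0` is left to the caller because the source does not fix it.

```python
def simple_type(value):
    """Hypothetical type check: numbers are numerical, everything else categorical."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    return "numerical" if is_number else "categorical"

def check_availability(values, field_type, r0, type_of=simple_type):
    """Return (available, cleaned_values) for one field."""
    total = len(values)
    if total == 0:
        return False, []  # assumption: an empty field is unavailable
    # First check: proportion of missing values against the threshold R_0.
    missing_ratio = sum(v is None for v in values) / total
    if missing_ratio > r0:
        return False, list(values)
    # Second check: proportion of values whose type differs from the field type.
    other_ratio = sum(v is not None and type_of(v) != field_type for v in values) / total
    if other_ratio > r0:
        return False, list(values)
    # Available field: values of the other types are converted to missing values.
    cleaned = [v if v is not None and type_of(v) == field_type else None for v in values]
    return True, cleaned
```

With ten values of which one is missing and one is a stray string, the field passes both checks at `r0=0.3` and comes back with two missing values; at `r0=0.05` it is rejected.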
Further, a standard state database of the numerical fields is constructed from the data of the available numerical fields.
Furthermore, N runs of good-quality inspection data are extracted from the historical inspection data and aligned by inspection position to obtain the standard state database.
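Aligning several inspection runs by position can be sketched as below. The source does not describe the storage layout or how positions are matched, so representing each run as a dictionary from position to value and keeping only the positions present in every run are both assumptions made for illustration.

```python
def build_standard_state(runs):
    """Align inspection runs by position.

    runs: list of dicts mapping an inspection position to a measured value.
    Returns a dict mapping each position found in every run to the list of
    values recorded there, one value per run, in run order.
    """
    if not runs:
        return {}
    common = set(runs[0])
    for run in runs[1:]:
        common &= set(run)  # keep only positions covered by all runs (assumption)
    return {pos: [run[pos] for run in runs] for pos in sorted(common)}
```

Each entry then holds the N historical values for one position, which a later step could summarize (for instance by mean and spread) when judging new measurements against the standard state.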
S3. When a categorical field is available and contains missing values, compare the proportion R of missing values in the categorical field with the availability threshold $R_0$, and determine the processing mode for the field's missing values from the comparison result.
Further, when the missing-value proportion R of the categorical field is less than N times the availability threshold $R_0$, the missing values are filled with the mode of the categorical field;
when the missing-value proportion R of the categorical field is greater than or equal to N times the availability threshold $R_0$, a Softmax classification model of the categorical field is constructed from the data of the other fields, and the missing values of the categorical field are filled with the model's classification results.
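The choice between mode filling and a classification model in step S3 can be sketched as follows. The function names and the `None` missing marker are assumptions; the default `n=0.1` follows the preferred value stated in Example 5. The classifier itself is not built here, only selected.

```python
from collections import Counter

def categorical_fill_strategy(values, r0, n=0.1):
    """Pick the missing-value treatment for an available categorical field."""
    missing = sum(v is None for v in values)
    if missing == 0:
        return "none"
    r = missing / len(values)
    return "mode" if r < n * r0 else "softmax_classifier"

def fill_with_mode(values):
    """Replace missing values with the most frequent non-missing value."""
    mode = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [mode if v is None else v for v in values]
```

For a field of 100 values with one missing and `r0=0.3`, R is 0.01, which is below 0.03, so the mode is used.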
S4. When a numerical field is available, determine the processing modes for abnormal values and missing values by calculating the coefficient of variation and the missing-value proportion, respectively.
Further, step S4 specifically includes:
S41. Calculate the ratio of the standard deviation to the arithmetic mean of the numerical field to obtain the coefficient of variation CV;
the specific calculation formula is as follows:
$$CV = \frac{\sigma}{\mu}$$
where $\sigma$ is the standard deviation of the field data and $\mu$ is the arithmetic mean of the field data;
the abnormal values of the numerical field are then judged by the judgment method assigned to the threshold range in which the coefficient of variation falls;
S42. Compare the missing-value proportion R of the numerical field with the availability threshold $R_0$, and determine the filling mode for the field's missing values from the comparison result.
Further, step S41 includes:
when the coefficient of variation CV is less than 15%, abnormal values are judged against the standard state;
when the coefficient of variation CV is greater than or equal to 15% and less than 35%, abnormal values are judged by an isolation forest algorithm;
when the coefficient of variation CV is greater than or equal to 35% and less than 50%, abnormal values are judged by a clustering algorithm;
when the coefficient of variation CV is greater than or equal to 50%, abnormal values are judged by the 3σ method.
Selecting the judgment method according to the threshold range of the coefficient of variation improves the efficiency of automatic judgment.
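The CV calculation and the threshold dispatch of step S41 can be sketched as below. The function names and returned labels are illustrative. The source does not say whether the population or sample standard deviation is meant, so the use of `statistics.pstdev` (population) is an assumption, as is the `None` missing marker. A field whose mean is zero has no defined CV and would raise `ZeroDivisionError` here.

```python
import statistics

def coefficient_of_variation(values):
    """CV = standard deviation / arithmetic mean over the non-missing values."""
    data = [v for v in values if v is not None]
    return statistics.pstdev(data) / statistics.fmean(data)

def outlier_method(cv):
    """Map a CV (as a fraction, e.g. 0.15 for 15%) to the judgment method."""
    if cv < 0.15:
        return "standard_state"
    if cv < 0.35:
        return "isolation_forest"
    if cv < 0.50:
        return "clustering"
    return "three_sigma"
```

For the values 1 to 5 the mean is 3 and the population standard deviation is about 1.414, giving a CV of about 47%, which falls in the clustering range.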
Further, comparing the missing-value proportion R of the numerical field with the availability threshold $R_0$ and determining the filling mode from the comparison result includes:
when $R < 0.1R_0$, the missing values are filled with the mean of the field's non-missing data;
when [formula image: second condition on R], an interpolation model is established from the numerical field and the detection position, and the missing values are filled by interpolation;
when [formula image: third condition on R], a regression model of the numerical field is constructed from the data of the other fields, and the missing values of the numerical field are filled using the regression model.
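The three-way choice of filling mode can be sketched as follows. Only the first boundary, 0.1 times $R_0$, is stated in the text (claim 11); the boundary between interpolation and regression is given as a formula image in the original, so it appears here as a required parameter `upper_factor` with no default. Treating the three conditions as consecutive ranges of R is also an assumption.

```python
def numeric_fill_strategy(r, r0, upper_factor, lower_factor=0.1):
    """Pick the filling mode for the missing values of a numerical field.

    r: missing-value proportion of the field.
    r0: availability threshold.
    upper_factor: hypothetical multiple of r0 separating interpolation from
        regression; its value is not recoverable from the source text.
    lower_factor: multiple of r0 below which the mean is used (0.1 per claim 11).
    """
    if r < lower_factor * r0:
        return "mean"
    if r < upper_factor * r0:
        return "interpolation"
    return "regression"
```

A caller would supply `upper_factor` from the original publication or from project configuration, for example `numeric_fill_strategy(0.1, 0.3, upper_factor=0.5)`.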
Compared with the prior art, the invention has the beneficial effects that:
1. combining expert experience with business rules to realize automation of judgment of abnormal values and missing value processing modes of the detected data;
2. from the aspect of data quality, the judgment result is more reliable by combining the usability of data;
3. in the process of constructing the numerical variable, the historical detection data is fully utilized;
4. the automatic discrimination system is constructed in a modularized way, and is beneficial to the realization of a computer.
Based on the above method, the invention further provides a system for automatically determining the processing mode for abnormal values in detection data, comprising:
the business rule judging module, used for setting and storing the business rules of all fields, where the business rules comprise the data types of the fields and the field value ranges or sets;
the field type automatic judging module, used for analyzing the data types in a data field that the business rules do not define in order to judge the field type, where field types comprise deterministic fields and indeterminate fields, and deterministic fields comprise numerical fields, categorical fields and timestamp fields;
the data field availability automatic judging module is used for judging the quality condition of each data field so as to judge whether each data field has analytical significance;
the standard state database module is used for judging the abnormal value and missing value processing mode of the numerical field;
and the data field processing mode automatic judging module is used for judging the specific processing mode of the abnormal value and/or the missing value in each data field type.
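A business rule as described for the business rule judging module (a field's data type plus a value range or value set) could be represented as in the sketch below. The class, its attribute names, and the sample field names and limits in `RULE_BASE` are hypothetical and serve only to show the shape of such a rule base.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

@dataclass(frozen=True)
class BusinessRule:
    field_type: str                                   # "numerical", "categorical" or "timestamp"
    value_range: Optional[Tuple[float, float]] = None  # inclusive (low, high) for numerical fields
    allowed_values: Optional[FrozenSet[str]] = None    # value set for categorical fields

    def accepts(self, value) -> bool:
        """Check one non-missing value against the range or set, if any."""
        if self.value_range is not None:
            low, high = self.value_range
            return low <= value <= high
        if self.allowed_values is not None:
            return value in self.allowed_values
        return True

# Hypothetical rule base keyed by field name (names and limits are made up).
RULE_BASE = {
    "height_mm": BusinessRule("numerical", value_range=(0.0, 100.0)),
    "line_id": BusinessRule("categorical", allowed_values=frozenset({"up", "down"})),
}
```

A field that has no entry in the rule base would then fall through to the automatic field-type judgment.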
Further, analyzing the data type of the undetermined data field in the service rule, including analyzing the ratio of the numerical value type value, the category type value and the timestamp type value in the non-missing values in each data field, to obtain the field type of each data field.
Furthermore, the quality of each data field is judged from the degree of data disorder, the proportion of missing values and the repeated values in the data.
Further, if the field data is disordered and its type is uncertain, the field is determined to be unavailable.
Further, when the number of numerical values and the number of categorical values in the field are the same, the data is determined to be disordered, and if the business rules specify no type, the data type is determined to be uncertain;
when the proportion of the number of occurrences of a certain value in the field data to the total number of non-missing values exceeds a preset threshold, the data is judged to have too many repeated values;
and when the proportion of the number of missing values in the field data to the total number of data exceeds a preset threshold, the data is judged to have too many missing values and to be unavailable.
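The three quality checks of the availability module can be sketched as below. The names, the `None` missing marker and the `simple_type` helper are assumptions, and both thresholds are caller-supplied because the source only calls them preset. The source does not say whether excessive repeated values make a field unavailable, so that result is reported as a flag only.

```python
from collections import Counter

def simple_type(value):
    """Hypothetical type check: numbers are numerical, everything else categorical."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    return "numerical" if is_number else "categorical"

def assess_field_quality(values, rule_type, repeat_threshold, missing_threshold,
                         type_of=simple_type):
    """Return quality flags for one field."""
    non_missing = [v for v in values if v is not None]
    kinds = Counter(type_of(v) for v in non_missing)
    # Disordered: equal, non-zero counts of numerical and categorical values.
    disordered = kinds["numerical"] > 0 and kinds["numerical"] == kinds["categorical"]
    type_uncertain = disordered and rule_type is None
    top_share = max(Counter(non_missing).values()) / len(non_missing) if non_missing else 0.0
    missing_share = 1 - len(non_missing) / len(values) if values else 1.0
    too_many_missing = missing_share > missing_threshold
    return {
        "disordered": disordered,
        "type_uncertain": type_uncertain,
        "too_many_repeats": top_share > repeat_threshold,
        "too_many_missing": too_many_missing,
        "available": not (type_uncertain or too_many_missing),
    }
```

A field such as `[1, 2, "a", "b", None]` with no business rule is flagged as disordered and of uncertain type, and is therefore unavailable.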
Further, the standard state database is constructed by data of available numerical fields.
Furthermore, N runs of good-quality inspection data are extracted from the historical inspection data and aligned by inspection position to obtain the standard state database.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The concept, embodiments and technical effects of the present invention will be clearly and completely described below in conjunction with the embodiments and the accompanying drawings to fully understand the objects, features and effects of the present invention.
Example 1
As shown in FIG. 1, this embodiment provides a method, suitable for the field of rail transit, for automatically determining the processing mode for abnormal values in detection data, comprising the following steps:
S1. Determine the type of each data field according to the business rules relevant to that data field, where field types comprise deterministic fields and indeterminate fields, and deterministic fields comprise numerical fields, categorical fields and timestamp fields;
S2. Count the proportion R of missing values in each data field relative to the field's total data amount, and judge whether the field is available; if the field is available, it enters the next judgment stage, otherwise it does not;
S3. When a categorical field is available and contains missing values, compare the proportion R of missing values in the categorical field with N times the availability threshold $R_0$, and determine the processing mode for the field's missing values from the comparison result;
S4. When a numerical field is available, determine the processing modes for abnormal values and missing values by calculating the coefficient of variation and the missing-value proportion, respectively.
Example 2
On the basis of embodiment 1, the invention provides a data type determination method, which includes:
retrieving the relevant business rules for each data field from a business rule base;
if the business rule base specifies the field type of the data field, the field type is the type specified in the business rule;
if the business rule of the data field does not exist, acquiring the type of each non-missing value of the data field, wherein the data type of each non-missing value comprises a numerical type, a category type and a time stamp type;
calculating the proportion of the data quantity of the three data types to the total quantity of the field non-missing value data according to the acquired data quantity corresponding to the three data types of each non-missing value of the field, and taking the data type with the highest proportion as the field type of the data field; and if the proportions of the three data types are equal, the field type of the data field is an indeterminate type.
Example 3
On the basis of embodiment 2, a method for judging whether the field is available is provided, which specifically includes:
when the missing-value proportion R is greater than the set availability threshold $R_0$, the data field is determined to be unavailable.
Further, the data of a field whose type has been determined is analyzed: if the data of the other two data types together account for a proportion of the field's total data amount greater than the availability threshold $R_0$, the data field is unavailable; if the field is available, the data of the other two data types in the data field is converted into missing values for processing.
Example 4
A standard state database of the numerical fields is constructed from the data of the available numerical fields.
Furthermore, N runs of good-quality inspection data are extracted from the historical inspection data and aligned by inspection position to obtain the standard state database.
Example 5
On the basis of embodiment 3, the processing mode for the missing values of a categorical field is determined as follows:
when the missing-value proportion R of the categorical field is less than N times the availability threshold $R_0$, the missing values are filled with the mode of the categorical field;
when the missing-value proportion R of the categorical field is greater than or equal to N times the availability threshold $R_0$, a Softmax classification model of the categorical field is constructed from the non-missing data of the other fields, and the missing values of the categorical field are filled with the model's classification results; N is preferably 0.1 in this scheme.
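The prediction step of a Softmax classification model can be sketched as below. The source does not describe the model's features, training procedure or parameters, so this sketch assumes numeric features derived from the other fields and takes already-trained `weights` and `biases` as inputs; all names are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_category(features, weights, biases, labels):
    """Return (most probable label, class probabilities) for one record.

    weights holds one row of coefficients per class; biases holds one
    offset per class; labels names the classes in the same order.
    """
    scores = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(scores)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs
```

The predicted label for a record whose categorical value is missing would then be written into that field.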
Example 6
On the basis of embodiment 3, the processing modes for the missing values and abnormal values of a numerical field are determined as follows:
S41. Calculate the ratio of the standard deviation to the arithmetic mean of the numerical field to obtain the coefficient of variation CV;
the specific calculation formula is as follows:
$$CV = \frac{\sigma}{\mu}$$
where $\sigma$ is the standard deviation of the field data and $\mu$ is the arithmetic mean of the field data;
the abnormal values of the numerical field are then judged by the judgment method assigned to the threshold range in which the coefficient of variation falls;
S42. Compare the missing-value proportion R of the numerical field with the availability threshold $R_0$, and determine the filling mode for the field's missing values from the comparison result.
Further, step S41 includes:
when the coefficient of variation CV is less than 15%, abnormal values are judged against the standard state;
when the coefficient of variation CV is greater than or equal to 15% and less than 35%, abnormal values are judged by an isolation forest algorithm;
when the coefficient of variation CV is greater than or equal to 35% and less than 50%, abnormal values are judged by a clustering algorithm;
when the coefficient of variation CV is greater than or equal to 50%, abnormal values are judged by the 3σ method.
Selecting the judgment method according to the threshold range of the coefficient of variation improves the efficiency of automatic judgment.
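Of the four judgment methods listed, the 3σ method is simple enough to sketch in a few lines. This is a generic illustration of the rule, not the patent's implementation: the function name, the `None` missing marker and the use of the population standard deviation are assumptions.

```python
import statistics

def three_sigma_outliers(values):
    """Return the indices of values lying more than 3 standard deviations from the mean."""
    data = [v for v in values if v is not None]
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)  # population standard deviation (assumption)
    return [i for i, v in enumerate(values)
            if v is not None and abs(v - mu) > 3 * sigma]
```

For twenty readings of 10 followed by one reading of 100, the mean is about 14.3 and the standard deviation about 19.2, so only the last reading lies beyond three standard deviations.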
Further, comparing the missing-value proportion R of the numerical field with the availability threshold $R_0$ and determining the filling mode from the comparison result includes:
when $R < 0.1R_0$, the missing values are filled with the mean of the non-missing data in the numerical field;
when [formula image: second condition on R], an interpolation model is established from the numerical field and the detection position, and the missing values are filled by interpolation;
when [formula image: third condition on R], a regression model of the numerical field is constructed from the data of the other fields, and the missing values of the numerical field are filled using the regression model.
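Filling by interpolation over the detection position can be sketched as below. The source does not name the interpolation model, so linear interpolation between the nearest known neighbors is an assumption, as are the function name, the `None` missing marker and the choice to copy the nearest known value at the ends of the track.

```python
def interpolate_missing(positions, values):
    """Fill missing values by linear interpolation over the detection position."""
    known = [(p, v) for p, v in zip(positions, values) if v is not None]
    filled = []
    for p, v in zip(positions, values):
        if v is not None:
            filled.append(v)
            continue
        # Nearest known points on either side of the missing position.
        left = max((k for k in known if k[0] < p), key=lambda k: k[0], default=None)
        right = min((k for k in known if k[0] > p), key=lambda k: k[0], default=None)
        if left is not None and right is not None:
            t = (p - left[0]) / (right[0] - left[0])
            filled.append(left[1] + t * (right[1] - left[1]))
        elif left is not None or right is not None:
            # At either end only one neighbor exists; copy its value (assumption).
            filled.append((left if left is not None else right)[1])
        else:
            filled.append(None)  # no known values at all
    return filled
```

For positions `[0, 1, 2, 3]` and values `[1.0, None, 3.0, None]`, the gap at position 1 is filled with 2.0 and the trailing gap with 3.0.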
Compared with the prior art, the invention has the beneficial effects that:
1. combining expert experience with business rules to realize automation of judgment of abnormal values and missing value processing modes of the detection data;
2. from the aspect of data quality, the judgment result is more reliable by combining the usability of data;
3. in the process of constructing the numerical variable, the historical detection data is fully utilized;
4. the automatic discrimination system is constructed in a modularized way, and is beneficial to the realization of a computer.
Example 7
Based on the above method, the invention further provides a system for automatically determining the processing mode for abnormal values in detection data, comprising:
the business rule judging module, used for setting and storing the business rules of all fields, where the business rules comprise the data types of the fields and the field value ranges or sets;
the data field type automatic judging module, used for analyzing the data types in a data field that the business rules do not define in order to judge the field type, where field types comprise deterministic fields and indeterminate fields, and deterministic fields comprise numerical fields, categorical fields and timestamp fields;
the data field availability automatic judging module is used for judging the quality condition of each data field so as to judge whether each data field has analytical significance;
the standard state database module is used for judging the abnormal value and missing value processing mode of the numerical field;
and the data field processing method automatic judging module is used for judging the specific processing mode of abnormal values and/or missing values in each data field type.
Further, the analyzing the data type of the unspecified data field in the service rule includes analyzing the ratio of a numerical value type value, a category type value and a timestamp type value in the non-missing values in each data field to obtain the field type of each data field.
Further, the quality of each data field is judged from the degree of data disorder, the proportion of missing values and the repeated values in the data.
Further, if the field data is disordered and its type is uncertain, the field is determined to be unavailable.
Further, when the number of numerical values and the number of categorical values in the field are the same, the data is determined to be disordered, and if the business rules specify no type, the data type is determined to be uncertain;
when the proportion of the number of occurrences of a certain value in the field data to the total number of non-missing values exceeds a preset threshold, the data is judged to have too many repeated values;
and when the proportion of the number of missing values in the field data to the total number of data exceeds a preset threshold, the data is judged to have too many missing values and to be unavailable.
Further, the standard state database is constructed by data of available numerical fields.
Furthermore, N runs of good-quality inspection data are extracted from the historical inspection data and aligned by inspection position to obtain the standard state database.
The embodiments of the present invention have been described in detail, but the present invention is not limited to the embodiments, and those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present invention, and the equivalents or substitutions are included in the scope of the present invention defined by the claims.

Claims (12)

1. An automatic discrimination method for detecting a processing mode of an abnormal value of data, comprising:
determining each field type according to the related business rule of each field data, wherein the field type comprises a deterministic field and an indeterminate field, and the deterministic field comprises a numerical field, a categorical field and a timestamp field;
counting the proportion R of the number of missing values in the field in the total data amount of the field, and judging whether the field is available; if the field is available, entering the next judging stage, otherwise, not entering the next judging stage;
when a categorical field is available and contains missing values, the proportion R of missing values in the categorical field is compared with the availability threshold $R_0$, and the processing mode for the missing values of the categorical field is determined from the comparison result;
when a numerical field is available, the processing modes for missing values and abnormal values are determined by calculating the coefficient of variation and the missing-value proportion, respectively.
2. The method for automatically determining a detection data abnormal value processing mode according to claim 1, wherein: and constructing a standard state database of the numerical type field according to the data of the available numerical type field.
3. The method for automatically discriminating a processing mode for detecting an abnormal value of data according to claim 1, wherein:
if the field type is not determined in the service rule base, acquiring a data type corresponding to each non-missing value in the field, wherein the data type of the field comprises a numerical type, a category type and a time stamp type;
respectively calculating the proportion of the data quantity of the three data types to the total quantity of the non-missing value data in the field data according to the data quantity corresponding to the three data types of the non-missing value;
and judging the field type according to the proportion of the data quantity of the data type in the field.
4. The method for automatically discriminating a processing mode for detecting an abnormal value of data according to claim 3, comprising:
the judging the field type according to the ratio of the data quantity of the data type in the field specifically comprises:
taking the data type with the highest proportion as the type of the deterministic field;
and if the occupation ratios of the three data types are equal, the field type is an uncertain field.
5. The method for automatically discriminating a processing mode for detecting an abnormal value of data according to claim 1, wherein:
the judging whether the field is available comprises the following steps:
when the missing-value proportion R is greater than the set availability threshold $R_0$, the field is determined to be unavailable.
6. The method for automatically discriminating a processing mode for detecting an abnormal value of data according to claim 5, wherein: the determining whether the field is available further includes:
counting the proportion of the sum of the other two data types in the deterministic field to the total data amount of the field;
if the proportion is greater than the set availability threshold $R_0$, the deterministic field is not available; otherwise the deterministic field is available.
7. The method for automatically determining a mode of processing an abnormal value of detection data according to claim 6, wherein,
when the deterministic field is available,
the data of the other two data types in the deterministic field is converted into missing values for processing.
8. The method for automatically determining a mode of processing an abnormal value of detection data according to claim 1, wherein
determining the processing mode for the missing values of the categorical field according to the comparison result comprises:
when the missing-value proportion R of the categorical field is less than N times the availability threshold $R_0$, filling the missing values with the mode of the categorical field;
when the missing-value proportion R of the categorical field is greater than or equal to N times the availability threshold $R_0$, constructing a Softmax classification model of the categorical field from the data of the other fields, and filling the missing values of the categorical field with the model's classification results.
9. The method according to claim 1, wherein automatically determining the processing modes for missing values and abnormal values by calculating the coefficient of variation and the missing-data proportion comprises:
calculating the ratio of the standard deviation to the arithmetic mean of the numerical field to obtain the coefficient of variation CV, and judging the abnormal values of the numerical field with the judgment method set for the threshold range in which the coefficient of variation falls;
comparing the missing-data proportion R in the numerical field with the availability threshold R0, and filling the missing values of the numerical field according to the comparison result.
10. The method for automatically determining the processing mode for abnormal values of detection data according to claim 9, wherein
judging the abnormal values of the numerical field with the judgment method set for the threshold range in which the coefficient of variation falls specifically comprises:
when the coefficient of variation CV is less than 15%, judging the abnormal values of the data using the standard state;
when the coefficient of variation CV is greater than or equal to 15% and less than 35%, judging the abnormal values of the data using an isolation forest algorithm;
when the coefficient of variation CV is greater than or equal to 35% and less than 50%, judging the abnormal values of the data using a clustering algorithm;
when the coefficient of variation CV is greater than or equal to 50%, judging the abnormal values of the data using the 3σ method.
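The threshold dispatch of claims 9 and 10 can be sketched as follows. The claims do not say whether the standard deviation is the population or the sample form; this sketch assumes the population form and takes the absolute value of the mean so that CV is non-negative. The function names and returned labels are hypothetical, and only the 3σ branch is implemented.

```python
import statistics
from typing import List, Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """CV = standard deviation / arithmetic mean (population std, by assumption)."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.pstdev(values) / abs(mean)


def select_outlier_method(cv: float) -> str:
    """Map a CV value to the judgment method named in claim 10."""
    if cv < 0.15:
        return "standard_state"
    if cv < 0.35:
        return "isolation_forest"
    if cv < 0.50:
        return "clustering"
    return "three_sigma"


def three_sigma_outliers(values: Sequence[float]) -> List[int]:
    """Return indices of values more than 3 standard deviations from the mean."""
    mean = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [i for i, v in enumerate(values) if abs(v - mean) > 3 * sigma]
```

A caller would compute `coefficient_of_variation` on the non-missing values of a numerical field, pass the result to `select_outlier_method`, and then run the corresponding detector.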
11. The method for automatically determining the processing mode for abnormal values of detection data according to claim 9, wherein filling the missing values of the numerical field according to the comparison result comprises:
when R < 0.1R0, filling the missing values with the mean of the non-missing data of the field;
when 0.1R0 ≤ R < 0.5R0, establishing an interpolation model from the numerical field and the detection position, and filling the missing values by interpolation;
when R ≥ 0.5R0, constructing a regression model of the numerical field from the data of the other fields, and filling the missing values of the numerical field with the regression model.
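The three-way rule of claim 11 can be sketched as below. The claim does not name the interpolation model, so linear interpolation over the detection position is used here purely as an assumption; the regression branch is only labelled, not implemented, and all function names are hypothetical.

```python
import statistics
from typing import List, Optional, Sequence


def choose_numeric_fill(r: float, r0: float) -> str:
    """Pick the fill strategy from the missing-data proportion R and threshold R0."""
    if r < 0.1 * r0:
        return "mean"
    if r < 0.5 * r0:
        return "interpolation"
    return "regression"


def fill_with_mean(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Fill missing entries (None, by assumption) with the mean of the rest."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]


def fill_by_linear_interpolation(
    positions: Sequence[float], values: Sequence[Optional[float]]
) -> List[Optional[float]]:
    """Fill gaps by linear interpolation over the detection position.

    Gaps with no known neighbour on one side are left as None.
    """
    known = sorted((p, v) for p, v in zip(positions, values) if v is not None)
    filled = list(values)
    for i, (p, v) in enumerate(zip(positions, values)):
        if v is not None:
            continue
        left = [(kp, kv) for kp, kv in known if kp < p]
        right = [(kp, kv) for kp, kv in known if kp > p]
        if left and right:
            (p0, v0), (p1, v1) = left[-1], right[0]
            filled[i] = v0 + (v1 - v0) * (p - p0) / (p1 - p0)
    return filled
```

With `r0 = 0.3`, a field with 2% missing data is mean-filled, one with 10% is interpolated, and one with 20% falls through to the regression model built from the other fields.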
12. An automatic judgment system for the processing mode of abnormal values of detection data, characterized by comprising a business rule judging module, a data field type automatic judging module, a data field availability automatic judging module, a standard state database module, and a data field processing mode automatic judging module;
the business rule judging module is used for setting and storing the business rules of all fields, wherein the business rules comprise the data type of each field and the value range or value set of each field;
the data field type automatic judging module is used for analyzing the data type of a data field not defined in the business rules to judge the field type of that field, wherein the field type comprises a deterministic field and an indeterminate field, and the deterministic field comprises a numerical field, a categorical field and a timestamp field;
the data field availability automatic judging module is used for judging the quality of each data field so as to judge whether each data field has analytical significance;
the standard state database module is used for judging the processing mode for abnormal values and missing values of the numerical field;
and the data field processing mode automatic judging module is used for judging the specific processing mode for abnormal values and/or missing values in each data field type.
CN202210815910.XA 2022-07-12 2022-07-12 Automatic judgment method and system for processing mode of abnormal value of detection data Active CN114996318B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210815910.XA CN114996318B (en) 2022-07-12 2022-07-12 Automatic judgment method and system for processing mode of abnormal value of detection data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210815910.XA CN114996318B (en) 2022-07-12 2022-07-12 Automatic judgment method and system for processing mode of abnormal value of detection data

Publications (2)

Publication Number Publication Date
CN114996318A (en) 2022-09-02
CN114996318B (en) 2022-11-04

Family

ID=83020719

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210815910.XA Active CN114996318B (en) 2022-07-12 2022-07-12 Automatic judgment method and system for processing mode of abnormal value of detection data

Country Status (1)

Country Link
CN (1) CN114996318B (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020169735A1 (en) * 2001-03-07 2002-11-14 David Kil Automatic mapping from data to preprocessing algorithms
US20040194061A1 (en) * 2003-03-31 2004-09-30 Hitachi, Ltd. Method for allocating programs
CN103440283A (en) * 2013-08-13 2013-12-11 江苏华大天益电力科技有限公司 Vacancy filling system for measured point data and vacancy filling method
CN105426425A (en) * 2015-11-04 2016-03-23 华中科技大学 Big data marketing method based on mobile signaling
CN106649579A (en) * 2016-11-17 2017-05-10 苏州航天系统工程有限公司 Time-series data cleaning method for pipe net modeling
CN107729293A (en) * 2017-09-27 2018-02-23 中南大学 A kind of geographical space method for detecting abnormal based on Multivariate adaptive regression splines
CN110086860A (en) * 2019-04-19 2019-08-02 武汉大学 A kind of data exception detection method and device under Internet of Things big data environment
CN110808084A (en) * 2019-09-19 2020-02-18 西安电子科技大学 Copy number variation detection method based on single-sample second-generation sequencing data
CN111177217A (en) * 2019-12-24 2020-05-19 平安信托有限责任公司 Data preprocessing method and device, computer equipment and storage medium
CN111680267A (en) * 2020-06-01 2020-09-18 四川大学 Three-step online identification method for abnormal dam safety monitoring data
CN111737249A (en) * 2020-08-24 2020-10-02 国网浙江省电力有限公司 Abnormal data detection method and device based on Lasso algorithm
CN112883340A (en) * 2021-04-30 2021-06-01 西南交通大学 Track quality index threshold value rationality analysis method based on quantile regression
CN113934716A (en) * 2021-09-27 2022-01-14 杭州电子科技大学 Smart campus time series data-oriented repair method based on variation coefficient constraint
CN114492552A (en) * 2020-11-12 2022-05-13 中移动信息技术有限公司 Method, device and equipment for training broadband user authenticity judgment model
CN114660378A (en) * 2022-02-28 2022-06-24 成都唐源电气股份有限公司 Multi-source detection parameter-based contact network comprehensive diagnosis method

Also Published As

Publication number Publication date
CN114996318B (en) 2022-11-04

Similar Documents

Publication Publication Date Title
CN111882446B (en) Abnormal account detection method based on graph convolution network
CN108322347B (en) Data detection method, device, detection server and storage medium
CN109816031B (en) Transformer state evaluation clustering analysis method based on data imbalance measurement
CN107679734A (en) It is a kind of to be used for the method and system without label data classification prediction
CN111796957B (en) Transaction abnormal root cause analysis method and system based on application log
CN103593470B (en) The integrated unbalanced data flow classification algorithm of a kind of two degree
CN111176953B (en) Abnormality detection and model training method, computer equipment and storage medium
CN105426441B (en) A kind of automatic preprocess method of time series
CN114201374A (en) Operation and maintenance time sequence data anomaly detection method and system based on hybrid machine learning
CN115410342B (en) Landslide hazard intelligent early warning method based on real-time monitoring of crack meter
CN113762764A (en) Automatic grading and early warning system and method for safety risk of imported food
CN115222303B (en) Industry risk data analysis method and system based on big data and storage medium
CN102945222A (en) Poor information measurement data gross error discrimination method based on Grey System Theory
CN114996318B (en) Automatic judgment method and system for processing mode of abnormal value of detection data
CN117154716B (en) Planning method and system for accessing distributed power supply into power distribution network
CN111882289B (en) Device and method for measuring and calculating project data auditing index interval
CN112434886A (en) Method for predicting client mortgage loan default probability
CN113393169B (en) Financial industry transaction system performance index analysis method based on big data technology
CN111654853B (en) Data analysis method based on user information
CN114596152A (en) Method, device and storage medium for predicting debt subject default based on unsupervised model
CN114781667A (en) Multi-equipment full life cycle PHM health management and prediction maintenance platform
CN113987240A (en) Customs inspection sample tracing method and system based on knowledge graph
CN108737399B (en) Snort alarm data aggregation method based on corner mark random reading
CN109521312B (en) Non-technical line loss detection method, device and system
CN112926991A (en) Cascade group severity grade dividing method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant