WO2017020725A1 - Data detection method and apparatus - Google Patents

Data detection method and apparatus

Info

Publication number
WO2017020725A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
discrete data
detection
data set
discrete
Prior art date
Application number
PCT/CN2016/090826
Inventor
陈国俊
Original Assignee
阿里巴巴集团控股有限公司
陈国俊
Application filed by 阿里巴巴集团控股有限公司 and 陈国俊
Publication of WO2017020725A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2458: Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462: Approximate or statistical queries
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28: Databases characterised by their database models, e.g. relational or object models
    • G06F16/283: Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Definitions

  • The present application relates to the field of computer technologies, and in particular to a data detection method and apparatus.
  • Big data often includes many types of data. One important type is discrete data (in statistics, data is divided into continuous data and discrete data according to whether the variable's values are continuous). For example, personnel numbers and gender attribute values are discrete data.
  • In practice, discrete data may contain anomalies (abnormal discrete data is a kind of dirty data), and the causes vary. For example, some discrete data is generated by a business system; if some business logic in that system is faulty, the discrete data it generates may be abnormal.
  • In the prior art, the common way to check discrete data for dirty data is manual detection: after the discrete data is generated, it is divided in different ways through manual intervention, and a person judges whether the divided discrete data is abnormal.
  • Detecting discrete data through manual intervention is inefficient and inaccurate when the volume of discrete data is massive.
  • The embodiments of the present application provide a data detection method to address the low efficiency and accuracy of prior-art discrete data detection.
  • The embodiments of the present application further provide a data detection apparatus to address the same problem.
  • The data detection apparatus provided by the embodiments of the present application includes:
  • a receiving module, configured to receive a detection request for discrete data;
  • a determining module, configured to determine the discrete data corresponding to the detection request and the detection mode corresponding to that discrete data;
  • a detecting module, configured to check the discrete data according to the determined detection mode and determine whether the discrete data is abnormal.
  • With the data detection method and apparatus provided by the embodiments of the present application, a detection request for a discrete data set to be tested triggers automatic determination of that discrete data set and of the detection mode that matches it, and the detection mode is then used to check whether the discrete data set is abnormal.
  • FIG. 1 shows the data detection process provided by an embodiment of the present application;
  • FIG. 2a is a schematic diagram of the system architecture of the data detection method in a practical application, according to an embodiment of the present application;
  • FIG. 2b is a schematic structural diagram of a data detection apparatus according to an embodiment of the present application.
  • In the practical application scenario of the embodiments, discrete data is stored in a data warehouse for big data; the data warehouse can be regarded as a storage environment for big data.
  • The data warehouse can also summarize, reorganize, and integrate the data stored in it for use by different users.
  • The data warehouse provides massive data support for the network service provider's service system.
  • For example, discrete data in a service system can be stored in the data warehouse, and the service system can in turn extract the discrete data it needs from the data warehouse.
  • The discrete data stored in the data warehouse may be generated by the network service provider's own service system (for example, various kinds of business data), or by applications from different developers while they run.
  • When the service system needs to use this data, the data warehouse integrates and "produces" the corresponding data according to the service system's instructions. Specifically, the operation of integrating and generating discrete data can be completed by a data production system running on the data warehouse.
  • For example, a user runs a statistical task (which can be regarded as a process) in the network service provider's service system, and the task counts the amount the user has spent on goods at a specified website from one year ago until now. After the task runs, the production system in the data warehouse integrates and extracts that user's amount data from one year ago to the present and returns it to the user.
  • The amount values in this data are usually not continuous values within some numeric interval but discontinuous, discrete values, so the amount data is a kind of discrete data.
  • In the present application, the discrete data generated by the data production system of the data warehouse can be viewed as a discrete data set that contains multiple pieces of discrete data.
  • The amount data in the example above can be regarded as a discrete data set, and the amount the user paid for each item is a piece of discrete data in that set.
  • However, the discrete data stored in the data warehouse may be redundant or incorrect, which can cause anomalies in the discrete data set that the data warehouse produces by integration.
  • To keep anomalies in the integrated discrete data set from affecting subsequent processing, the embodiments of the present application provide a data detection method, as shown in FIG. 1.
  • FIG. 1 shows the data detection process provided by an embodiment of the present application. The process includes the following steps:
  • S101: Receive a detection request for a discrete data set.
  • After the data warehouse generates a discrete data set, it may generate a corresponding detection request for that set to ensure its accuracy, which triggers the receiver of the request to check the produced discrete data set.
  • As an optional approach, the detection of the discrete data set may be performed by a device with a data detection function (such as a server).
  • In practice, a single detection device cannot bear the workload of checking massive numbers of discrete data sets, so a distributed system or a server cluster (forming a detection system with a data detection function) can be used instead.
  • The data detection device described in the present application may be deployed in the data warehouse or in the network service provider's back-end service system; this does not limit the present application.
  • S102: Determine the discrete data set corresponding to the detection request and the detection mode corresponding to that discrete data set.
  • In practice, the data warehouse contains a large amount of original discrete data and can integrate and generate different discrete data sets at the same time. To keep these sets from being confused during detection, the data warehouse generates the detection request on the basis of the discrete data set it has already produced. Taking a detection device as the receiver of the detection request, the device can use the discrete data set identifier in the request to uniquely determine the corresponding discrete data set, that is, the set that the identifier refers to.
  • When only one discrete data set exists, the detection request may omit the identifier, and the receiver can respond to the request by treating that single set as the default discrete data set for the request.
  • Different discrete data sets differ in the type and composition of their discrete data, so applying a single detection mode to all of them may produce inaccurate results. In the embodiments of the present application, different detection modes can therefore be used for different discrete data sets. To that end, after the discrete data set corresponding to the detection request is determined, the detection mode corresponding to that set can be determined from the set itself.
  • The detection mode may be determined from a pre-established correspondence between detection modes and discrete data types, or from preset detection configuration information. This does not limit the present application.
  • S103: Check the discrete data set according to the determined detection mode to determine whether the discrete data is abnormal. If so, perform step S104; otherwise, perform step S105.
  • Because an abnormal discrete data set affects the accuracy of later data processing, step S104 is performed when an anomaly is detected. Conversely, for a normal discrete data set, step S105 can be performed.
  • If the discrete data set does not need to be processed according to the result, steps S104 and S105 may be skipped after the result is obtained.
  • S104: Process the discrete data set accordingly.
  • To minimize the effect of an abnormal discrete data set on subsequent processing, the abnormal set may be processed. For example, the upstream business logic may be repaired based on the abnormal set. As another example, a notification message may be sent to the developer user to report that an anomaly has occurred in the discrete data set, and the developer user can then correct and adjust the abnormal set.
  • The way discrete data sets are processed here does not limit the present application.
  • S105: Store the discrete data set.
  • Through these steps, after receiving a detection request for a discrete data set to be checked, the detection device determines that set and the detection mode that matches it according to the request, and then uses the detection mode to check the set and determine whether it is abnormal.
  • In practice, after the data warehouse organizes its stored data into discrete data sets, the generated sets are usually stored temporarily, as data tables, in different partitions of the data warehouse.
  • So that the data detection device can locate a generated discrete data set accurately, the data warehouse may carry storage location information, such as the identifier of the data table that holds the set and its partition in the data warehouse, in the detection request sent to the data detection device.
  • When the data detection device receives the detection request, it can locate the discrete data set from the storage location information carried in the request. That is, determining the discrete data set corresponding to the detection request in step S102 specifically includes: obtaining the storage location information of the discrete data set contained in the detection request, and looking up the discrete data set according to that information.
  • For example, a developer user runs a query task in the data warehouse to query the loan interest rate data of 5 specified users. The data production system queries the data warehouse and integrates the loan interest rate data set of these five users (a kind of discrete data set) according to the task, and stores the generated set as a data table in partition A of the data warehouse. Suppose the storage location information of the set is "loan interest rate table A-101". This information reflects the partition where the set is located ("A" in "A-101" denotes partition A of the data warehouse) and also gives the specific name of the data table ("101" in "A-101" is the table name). This example only illustrates the form of the storage location information and does not limit the present application.
  • After receiving the detection request, the data detection device can find the specific discrete data set according to the storage location information carried in the request.
  • In practice, different discrete data sets usually match different detection modes, so once the specific set has been found, its matching detection mode can be determined. In an optional approach, the detection mode for a discrete data set is configured by the corresponding developer user: the detection device provides different types of detection modes for the developer user to choose from. To improve accuracy, the developer user may select multiple detection modes for one discrete data set, so that the detection device performs multiple checks on it.
  • Determining the detection mode corresponding to the discrete data set specifically includes: obtaining detection configuration information that matches the discrete data set (the configuration information contains detection mode information), reading the detection mode information, and taking the detection mode that it identifies as the detection mode for the discrete data set.
  • Once the detection mode is determined, the data detection device can check the discrete data set.
  • Checking the discrete data set according to the determined detection mode, to determine whether it is abnormal, may include: determining, according to the detection mode, a specified feature of the corresponding discrete data in the set; collecting that feature as the sample data to be tested; and comparing preset standard data with the sample data to determine whether they match. If they match, the discrete data set is determined to be normal; otherwise, it is determined to be abnormal.
  • In other words, the method checks a specified feature of some or all of the discrete data in the set. If a specified feature of the discrete data is abnormal, the discrete data set is abnormal.
  • Here, "corresponding discrete data" means the discrete data that corresponds to the detection mode determined in step S102. For example, if the detection mode is "determine a specified feature of all discrete data in the set", the corresponding discrete data is all discrete data in the set; if the detection mode is "determine a specified feature of the discrete data in a certain subset of the set", the corresponding discrete data is all discrete data in that subset; and so on.
  • Some discrete data in a discrete data set falls into different categories, and the number of categories can reflect whether the set is abnormal.
  • In this case, the standard data is a preset standard number of categories, and determining whether the sample data to be tested matches the standard data means determining whether the number of categories of the corresponding discrete data matches the preset standard number.
  • For example, suppose a discrete data set is the gender data of five users, as shown in Table 1a:
  • Grouped by gender, the data falls into two groups, so the number of categories is 2.
  • The number of gender values is normally fixed, so the preset standard number of categories is 2 (only two genders are expected).
  • The number of categories obtained matches the preset standard number, so the gender data in Table 1a can be considered normal (in practice, if grouping by gender yields only one group, the sample data to be tested can also be considered normal).
  • If grouping by gender yields more than 2 categories, the sample data to be tested is abnormal, and so is the discrete data set.
  • As another example, suppose a discrete data set records how many times each type of test result occurred when an application was tested, as shown in Table 2:
  • For an application's test results, several types of abnormal results may be acceptable. If the test were judged only by the number of result categories (Table 2 has four kinds of test results, so the number of categories is 4), the accuracy of the application test would suffer.
  • In this case, the standard data is a preset standard rate-of-change interval for the number of categories, and determining whether the sample data to be tested matches the standard data means determining whether the rate of change of the number of categories of the corresponding discrete data falls within that interval.
  • For example, the number of test result categories in Table 2 is 4 (the specified feature is 4). Suppose the acceptable rate of change of abnormal results for the application's test results is [1, 3] (the preset standard rate-of-change interval). The specified feature of the discrete data in Table 2 (4) does not fall within the preset interval, so the test can be considered failed.
  • In the two cases above, the specified feature relates to the number of categories of discrete data in the set.
  • The specified feature can also relate to the data values of the discrete data.
  • In this case, the standard data is a standard data value, and determining whether the sample data to be tested matches the standard data means determining whether the data value of the corresponding discrete data conforms to the standard data value.
  • For example, the loan interest rate of each user in Table 3 is the data value of the discrete data. Suppose a user's loan interest rate must be at least 1.5 (the standard data value is 1.5). The loan interest rate of User 5 is below the standard data value of 1.5, so the loan interest rate data in Table 3 can be considered abnormal.
  • In some scenarios the data values of discrete data fluctuate within a certain range, so judging whether the set is abnormal from the size of the data value alone is not accurate.
  • In this case, the standard data is a preset standard rate-of-change interval for the data value, and determining whether the sample data to be tested matches the standard data means determining whether the rate of change of the data value of the discrete data falls within that interval.
  • For example, the number of occurrences of an abnormal test result category is the data value of the discrete data. Suppose that in the historical data the mean number of occurrences of the test result category "abnormal class" is 3 and that of "abnormal class 2" is 2, and that in this test the rate of change is 4 for the first category and 5 for "abnormal class 2". Suppose the standard rate-of-change interval is [1.5, 3.5]. The rates of change obtained in this test fall outside the standard interval, which indicates that the results of this test are abnormal.
  • In practice, the method can be implemented with the system architecture shown in FIG. 2a. In that architecture, the data production system running in the data warehouse sends a detection request to the discrete data monitoring system.
  • The monitoring trigger module in the discrete data monitoring system performs initialization according to the detection request, including verifying the information format of the request, determining the discrete data set corresponding to the request, and determining the corresponding detection mode.
  • The monitoring acquisition module then collects the sample data to be tested, and the monitoring verification module checks the sample data.
  • The monitoring acquisition module can collect sample data and standard data through a database (DB), the Open Data Processing Service (ODPS) platform, or Hive (a data warehouse tool). After the monitoring verification module determines that the discrete data set is normal, the set is stored in the DB for later use.
  • The embodiments of the present application further provide a data detection apparatus, as shown in FIG. 2b.
  • The data detection apparatus includes a receiving module 201, a determining module 202, and a detecting module 203, where:
  • The receiving module 201 is configured to receive a detection request for a discrete data set.
  • The determining module 202 is configured to determine the discrete data set corresponding to the detection request and the detection mode corresponding to that discrete data set.
  • The detecting module 203 is configured to check the discrete data set according to the determined detection mode to determine whether the discrete data set is abnormal. If so, the discrete data set is processed accordingly; otherwise, the discrete data set is stored.
  • The detection request carries the storage location information of the discrete data set. The determining module 202 is specifically configured to obtain the storage location information of the discrete data set contained in the detection request and to look up the discrete data set according to it.
  • Once the discrete data set is determined, the detection mode matching it can be determined. The determining module 202 is therefore specifically configured to obtain detection configuration information that matches the discrete data set, where the configuration information contains detection mode information, to read that detection mode information, and to determine the detection mode it identifies.
  • The detecting module 203 is specifically configured to determine, according to the detection mode, a specified feature of the corresponding discrete data in the discrete data set; to collect that feature as the sample data to be tested; and to compare preset standard data with the sample data to determine whether they match. If they match, the discrete data set is determined to be normal; otherwise, it is determined to be abnormal.
  • When the standard data is a preset standard number of categories, the detecting module 203 is specifically configured to determine whether the number of categories matches the preset standard number.
  • When the standard data is a preset standard rate-of-change interval for the number of categories, the detecting module 203 is specifically configured to determine whether the rate of change of the number of categories falls within that interval.
  • When the standard data is a standard data value, the detecting module 203 is specifically configured to determine whether the data value conforms to the standard data value.
  • When the standard data is a preset standard rate-of-change interval for the data value, the detecting module 203 is specifically configured to determine whether the rate of change of the data value falls within that interval.
  • In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • The memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer-readable medium, such as read-only memory (ROM) or flash memory.
  • Memory is an example of a computer-readable medium.
  • Computer-readable media include both permanent and non-persistent, removable and non-removable media, and can store information by any method or technology.
  • The information can be computer-readable instructions, data structures, program modules, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cartridges, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device.
  • As defined here, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
  • Embodiments of the present application can be provided as a method, a system, or a computer program product. The present application can therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment that combines software and hardware. Moreover, the application can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
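The four comparison modes described in the Definitions above (number of categories, rate of change of the number of categories, data value, and rate of change of the data value) can be sketched roughly as follows. This is only an illustrative sketch, not the applicant's implementation: all function names are hypothetical, the sample values are invented rather than taken from Tables 1a to 3, and the rate of change is assumed to be the ratio of the current value to its historical mean, because the text does not define how it is computed.

```python
from typing import Iterable, Sequence, Tuple


def category_count(values: Iterable[str]) -> int:
    """Specified feature: the number of distinct categories in the data."""
    return len(set(values))


def matches_category_count(values: Iterable[str], standard_count: int) -> bool:
    """Mode 1: compare the number of categories with a preset standard number.

    Following the gender example, fewer categories than the standard is still
    treated as normal; only exceeding the standard is abnormal.
    """
    return category_count(values) <= standard_count


def within_interval(value: float, interval: Tuple[float, float]) -> bool:
    """Modes 2 and 4: test a rate of change against a preset standard interval."""
    low, high = interval
    return low <= value <= high


def meets_standard_value(values: Sequence[float], minimum: float) -> bool:
    """Mode 3: every data value must conform to the standard data value.

    ASSUMPTION: the standard is a lower bound, as in the loan interest rate example.
    """
    return all(v >= minimum for v in values)


def rate_of_change(current: float, historical_mean: float) -> float:
    """ASSUMPTION: ratio of the current value to its historical mean.

    The source text does not define how the rate of change is computed.
    """
    return current / historical_mean


# Hypothetical sample data, not the contents of Tables 1a to 3.
genders = ["male", "female", "female", "male", "female"]
loan_rates = [2.0, 1.8, 3.1, 2.4, 1.2]

print(matches_category_count(genders, 2))                    # True: 2 categories
print(meets_standard_value(loan_rates, 1.5))                 # False: 1.2 < 1.5
print(within_interval(rate_of_change(12, 3), (1.5, 3.5)))    # False: 4.0 is outside
```

Each function returns True when the sample data matches the standard data, so a False result from any selected mode would mark the discrete data set as abnormal.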

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • Fuzzy Systems (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

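A data detection method and apparatus. The method includes: receiving a detection request for a discrete data set (S101); determining the discrete data set corresponding to the detection request and the detection mode corresponding to that discrete data set (S102); and checking the discrete data set according to the determined detection mode to determine whether the discrete data is abnormal (S103). The method replaces the prior-art approach in which discrete data is checked through manual intervention: the detection process can be performed automatically by a detection device, which makes detecting discrete data more convenient and also improves the efficiency and accuracy of detection.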

Description

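Data detection method and apparatus
Technical Field
The present application relates to the field of computer technologies, and in particular to a data detection method and apparatus.
Background
With the development of information technology, big data has become a new kind of data resource in the information industry. By processing big data accordingly (for example, through data mining and data integration), different network service providers can offer users a rich variety of data services.
Big data often includes many types of data. One important type is discrete data (in statistics, data is divided into continuous data and discrete data according to whether the variable's values are continuous). For example, personnel numbers and gender attribute values are discrete data.
In practice, discrete data may contain anomalies (abnormal discrete data is a kind of dirty data), and the causes vary. For example, some discrete data is generated by a business system; if some business logic in that system is faulty, the discrete data it generates may be abnormal.
In the prior art, the common way to check discrete data for dirty data is manual detection: after the discrete data is generated, it is divided in different ways through manual intervention, and a person judges whether the divided discrete data is abnormal.
Detecting discrete data through manual intervention is inefficient and inaccurate when the volume of discrete data is massive.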
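Summary
The embodiments of the present application provide a data detection method to address the low efficiency and accuracy of prior-art discrete data detection.
The embodiments of the present application further provide a data detection apparatus to address the same problem.
A data detection method provided by the embodiments of the present application includes:
receiving a detection request for discrete data;
determining the discrete data corresponding to the detection request and the detection mode corresponding to that discrete data;
checking the discrete data according to the determined detection mode to determine whether the discrete data is abnormal.
A data detection apparatus provided by the embodiments of the present application includes:
a receiving module, configured to receive a detection request for discrete data;
a determining module, configured to determine the discrete data corresponding to the detection request and the detection mode corresponding to that discrete data;
a detecting module, configured to check the discrete data according to the determined detection mode and determine whether the discrete data is abnormal.
With the data detection method and apparatus provided by the embodiments of the present application, a detection request for a discrete data set to be tested triggers automatic determination of that discrete data set and of the detection mode that matches it, and the detection mode is then used to check whether the discrete data set is abnormal. The method replaces the prior-art approach in which discrete data is checked through manual intervention: the detection process can be performed automatically by a detection device, which makes detecting discrete data more convenient and also improves the efficiency and accuracy of detection.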
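Brief Description of the Drawings
The drawings described here provide a further understanding of the present application and form a part of it. The illustrative embodiments of the present application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:
FIG. 1 shows the data detection process provided by an embodiment of the present application;
FIG. 2a is a schematic diagram of the system architecture of the data detection method in a practical application, according to an embodiment of the present application;
FIG. 2b is a schematic structural diagram of a data detection apparatus according to an embodiment of the present application.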
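Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the technical solutions are described clearly and completely below with reference to specific embodiments and the corresponding drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.
In the practical application scenario of the embodiments, discrete data is stored in a data warehouse for big data; the data warehouse can be regarded as a storage environment for big data. The data warehouse can also summarize, reorganize, and integrate the data stored in it for use by different users.
Note that the data warehouse provides massive data support for the network service provider's service system. For example, discrete data in the service system can be stored in the data warehouse, and the service system can in turn extract the discrete data it needs from the data warehouse. The discrete data stored in the data warehouse may be generated by the network service provider's own service system (for example, various kinds of business data), or by applications from different developers while they run.
When the service system needs to use this data, the data warehouse integrates and "produces" the corresponding data according to the service system's instructions. Specifically, the operation of integrating and generating discrete data can be completed by a data production system running on the data warehouse.
For example, a user runs a statistical task (which can be regarded as a process) in the network service provider's service system, and the task counts the amount the user has spent on goods at a specified website from one year ago until now. After the task runs, the production system in the data warehouse integrates and extracts that user's amount data from one year ago to the present and returns it to the user.
The amount values in this data are usually not continuous values within some numeric interval but discontinuous, discrete values, so the amount data is a kind of discrete data. In the present application, the discrete data generated by the data production system of the data warehouse can be viewed as a discrete data set that contains multiple pieces of discrete data. The amount data in the example above can be regarded as a discrete data set, and the amount the user paid for each item is a piece of discrete data in that set.
However, the discrete data stored in the data warehouse may be redundant or incorrect, which can cause anomalies in the discrete data set that the data warehouse produces by integration. To keep such anomalies from affecting subsequent processing, the embodiments of the present application provide a data detection method, as shown in FIG. 1.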
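FIG. 1 shows the data detection process provided by an embodiment of the present application. The process includes the following steps:
S101: Receive a detection request for a discrete data set.
After the data warehouse generates a discrete data set, it may generate a corresponding detection request for that set to ensure its accuracy, which triggers the receiver of the request to check the produced discrete data set.
As an optional approach in the embodiments, the detection of the discrete data set may be performed by a device with a data detection function (such as a server). In practice, a single detection device cannot bear the workload of checking massive numbers of discrete data sets, so a distributed system or a server cluster (forming a detection system with a data detection function) can be used instead.
The data detection device described in the present application may be deployed in the data warehouse or in the network service provider's back-end service system; this does not limit the present application.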
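S102: Determine the discrete data set corresponding to the detection request and the detection mode corresponding to that discrete data set.
Note that, in practice, the data warehouse contains a large amount of original discrete data and can integrate and generate different discrete data sets at the same time. To keep these sets from being confused during detection, the data warehouse generates the detection request on the basis of the discrete data set it has already produced. Taking a detection device as the receiver of the detection request, after the device receives the request it can use the discrete data set identifier in the request to uniquely determine the corresponding discrete data set, that is, the set that the identifier refers to.
When only one discrete data set exists, the detection request may omit the identifier, and the receiver can respond to the request by treating that single set as the default discrete data set for the request.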
由于不同的离散数据集合中离散数据的类型、数据构成均不相同,如果针对不同的离散数据集合,仅采用单一的检测方式,可能会造成检测结果不准确的情况,所以,在本申请实施例中,对不同的离散数据集合进行检测时,可以采用不同的检测方式。为达到该目的,经过上述步骤确定出检测请求所对应的离散数据集合之后,可以根据所述离散数据集合进一步确定该离散数据集合对应的检测方式。
本申请中,既可以根据预先建立的检测方式与离散数据类型之间的对应关系,来确定离散数据集合对应的检测方式;也可以根据预设的检测配置信息,来确定离散数据集合对应的检测方式。当然,这里并不构成对本申请的限定。
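S103: Detect the discrete data set according to the determined detection mode to judge whether the discrete data set is abnormal. If so, perform step S104; otherwise, perform step S105.
An abnormal discrete data set affects the accuracy of subsequent data processing, so step S104 is performed when an anomaly is detected in the discrete data set. A normal discrete data set proceeds to step S105.
In the embodiments of the present application, if processing the discrete data set according to the judgment result is not of concern, steps S104 and S105 may be skipped once the judgment result is obtained.
S104: Process the discrete data set accordingly.
In the embodiments of the present application, an abnormal discrete data set can be processed to minimize its effect on subsequent processing. For example, the upstream business logic can be repaired on the basis of the abnormal discrete data set. As another example, a notification message can be sent to the developer user to report that an anomaly has occurred in the discrete data set, after which the developer user can correct and adjust the abnormal set. The manner of processing the discrete data set does not limit the present application.
S105: Store the discrete data set.
A discrete data set without anomalies does not affect subsequent data processing and can be considered normal, so the detection device stores it for later use.
Through the above steps, in the embodiments of the present application, a detection device that has received a detection request for a discrete data set to be detected determines, from that request, the discrete data set to be detected and the detection mode matching it, and then uses that detection mode to check the set and determine whether it is abnormal. This replaces the manual intervention that the prior art requires for detecting discrete data, making detection more convenient while also improving its efficiency and accuracy.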
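Steps S101 to S105 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation described in the application: the class name `DetectionDevice`, the in-memory dictionaries standing in for the data warehouse and storage, and the use of a notification list for step S104 are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class DetectionDevice:
    """Hypothetical in-memory stand-in for the detection device."""

    # Discrete data sets, keyed by a hypothetical set identifier.
    data_sets: Dict[str, Sequence[float]]
    # One detection mode per set; each returns True when the set is normal.
    detection_modes: Dict[str, Callable[[Sequence[float]], bool]]
    stored: Dict[str, Sequence[float]] = field(default_factory=dict)
    notifications: List[str] = field(default_factory=list)

    def handle_request(self, set_id: str) -> bool:
        """Handle one detection request (S101); return True if the set is abnormal."""
        # S102: resolve the data set and its detection mode from the request.
        data = self.data_sets[set_id]
        is_normal = self.detection_modes[set_id]
        # S103: run the detection.
        abnormal = not is_normal(data)
        if abnormal:
            # S104: one possible handling, notifying the developer user.
            self.notifications.append(f"data set {set_id!r} is abnormal")
        else:
            # S105: keep the normal set for later use.
            self.stored[set_id] = data
        return abnormal
```

Returning the judgment from `handle_request` keeps S104 and S105 optional for callers that only need the result, which mirrors the remark that those steps may be skipped.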
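In practice, after the data warehouse has organized its stored data into a discrete data set, it usually stores the generated set temporarily, in the form of a data table, in one of the warehouse's partitions. So that the data detection device can reliably locate the generated set, the data warehouse can carry storage location information, such as the identifier of the data table holding the discrete data set to be detected and its partition in the data warehouse, in the detection request sent to the data detection device.
Thus, on receiving the detection request, the data detection device can determine and locate the discrete data set from the storage location information carried in the request. That is, in step S102 above, determining the discrete data set corresponding to the detection request specifically comprises: obtaining the storage location information of the discrete data set contained in the detection request, and looking up the discrete data set according to the storage location information.
For example, a developer user runs a query task in the data warehouse to query the loan interest rate data of 5 specified users. The data production system queries the data warehouse according to the task, integrates the results into a loan interest rate data set for these 5 users (a kind of discrete data set), and stores the set as a data table in partition A of the data warehouse. Suppose the storage location information of this loan interest rate data set is "loan interest rate table A-101". This information reflects both the partition in which the set is stored (the "A" in "A-101" denotes partition A of the data warehouse) and the specific name of the data table (the "101" in "A-101" is the name of the data table). This example only illustrates the form that storage location information can take and does not limit the present application.
On receiving the detection request, the data detection device can use the above storage location information carried in the request to find the specific discrete data set.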
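The lookup by storage location can be illustrated as follows. The sketch assumes the "A-101" form from the example (partition, hyphen, table name) and models the warehouse as nested dictionaries; `StorageLocation`, `parse_location`, and `find_data_set` are hypothetical names, and a real warehouse would expose its own lookup interface.

```python
from typing import Dict, List, NamedTuple


class StorageLocation(NamedTuple):
    partition: str  # e.g. "A"
    table: str      # e.g. "101"


def parse_location(code: str) -> StorageLocation:
    """Split a location code such as "A-101" into partition and table name."""
    partition, sep, table = code.partition("-")
    if not sep or not partition or not table:
        raise ValueError(f"malformed storage location: {code!r}")
    return StorageLocation(partition, table)


def find_data_set(
    warehouse: Dict[str, Dict[str, List[float]]], code: str
) -> List[float]:
    """Look up the discrete data set stored at the given location."""
    location = parse_location(code)
    return warehouse[location.partition][location.table]
```

For instance, with `warehouse = {"A": {"101": [1.9, 1.7, 1.8, 1.7, 0.9]}}`, calling `find_data_set(warehouse, "A-101")` returns the five loan interest rates.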
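In practice, different discrete data sets usually have different matching detection modes, so once the specific discrete data set has been found, its matching detection mode can be determined. In one optional approach in the embodiments of the present application, the detection mode for a discrete data set is usually configured by the corresponding developer user: the detection device offers different types of detection modes for the developer user to choose from. To improve the accuracy of detection, the developer user can select several detection modes for a given discrete data set, so that the detection device performs multiple checks on that set.
The detection modes selected by the developer user are saved in the corresponding configuration information in the form of detection mode information. In the embodiments of the present application, determining the detection mode corresponding to the discrete data set therefore specifically comprises: obtaining the detection configuration information that matches the discrete data set (the detection configuration information contains detection mode information), reading the detection mode information contained in the detection configuration information, and determining the detection mode corresponding to that detection mode information as the detection mode corresponding to the discrete data set.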
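One way to picture the configuration lookup: the configuration stores only detection mode information (here, mode names), and the device resolves each name to an executable check. The sketch below is an assumption-laden illustration; the mode names, the `AVAILABLE_MODES` registry, and the `DETECTION_CONFIG` mapping are hypothetical and not part of the application.

```python
from typing import Callable, Dict, List, Sequence

Check = Callable[[Sequence[float]], bool]  # returns True when the data is normal

# Detection modes offered by the device (hypothetical examples).
AVAILABLE_MODES: Dict[str, Check] = {
    "min_value_1_5": lambda values: all(v >= 1.5 for v in values),
    "at_most_two_categories": lambda values: len(set(values)) <= 2,
}

# Detection configuration chosen by developer users: set id -> mode names.
DETECTION_CONFIG: Dict[str, List[str]] = {
    "loan_rates": ["min_value_1_5"],
    "gender": ["at_most_two_categories"],
}


def detection_modes_for(set_id: str) -> List[Check]:
    """Read the mode information for a set and resolve it to detection modes."""
    mode_names = DETECTION_CONFIG[set_id]
    return [AVAILABLE_MODES[name] for name in mode_names]


def is_normal(set_id: str, values: Sequence[float]) -> bool:
    """A set is normal only if every configured detection mode passes."""
    return all(check(values) for check in detection_modes_for(set_id))
```

Keeping names in the configuration and functions in a registry lets a developer user attach several modes to one set without touching the detection code.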
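Once the data detection device has determined the discrete data set to be detected and the detection mode matching it, it can detect the discrete data set. In the embodiments of the present application, detecting the discrete data set according to the determined detection mode to judge whether it is abnormal may specifically comprise: determining, according to the detection mode, a designated feature of the corresponding discrete data in the discrete data set; collecting the designated feature of that discrete data as sample data to be tested; and comparing preset standard data with the sample data to be tested to judge whether they match. If they match, the discrete data set is judged normal; otherwise, it is judged abnormal.
In other words, the detection mode in the embodiments of the present application checks a designated feature of some or all of the discrete data in the discrete data set. If a designated feature of the discrete data is abnormal, the discrete data set is abnormal.
Here, "corresponding discrete data" means the discrete data that corresponds to the detection mode determined in step S102. For example, if the detection mode includes "determining the designated feature of all discrete data in the discrete data set", the corresponding discrete data is all discrete data in the set; if the detection mode includes "determining the designated feature of the discrete data in a certain subset of the discrete data set", the corresponding discrete data is all discrete data in that subset; and so on.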
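The general pattern (select the corresponding discrete data, collect a designated feature as the sample, compare it with standard data) can be sketched as a single function. The function and parameter names are hypothetical, and the feature and standard are passed in as plain callables purely for illustration.

```python
from typing import Callable, Iterable, Optional, TypeVar

Item = TypeVar("Item")
Feature = TypeVar("Feature")


def is_abnormal(
    data_set: Iterable[Item],
    extract_feature: Callable[[list], Feature],
    matches_standard: Callable[[Feature], bool],
    select: Optional[Callable[[Item], bool]] = None,
) -> bool:
    """Return True when the designated feature does not match the standard."""
    # "Corresponding discrete data": the whole set, or the subset picked by `select`.
    relevant = [x for x in data_set if select is None or select(x)]
    # Collect the designated feature as the sample to be tested.
    sample = extract_feature(relevant)
    # Compare the sample with the preset standard data.
    return not matches_standard(sample)
```

With amounts `[12, 30, 7, 250]`, the feature `max`, and the standard "at most 100", the whole set is judged abnormal, while restricting the check to the subset of amounts below 100 yields a normal result.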
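To explain the detection process clearly, different detection modes are described in detail below as examples.
I. Detecting the number of categories of the corresponding discrete data
In practice, some discrete data in a discrete data set fall into different categories, and in some cases the number of categories indicates whether the set is abnormal.
In this scenario, when the designated feature is the number of categories of the corresponding discrete data, the standard data is a preset standard number of categories. Judging whether the sample data to be tested matches the standard data then specifically means judging whether the number of categories of the corresponding discrete data matches the preset standard number of categories.
For example, suppose a discrete data set is the gender data of 5 users, as shown in Table 1a:
User      Gender
User 1    1
User 2    1
User 3    2
User 4    1
User 5    2
Table 1a
For the gender data in Table 1a, suppose the designated feature is the number of categories of the gender data. Collecting the gender data in Table 1a then yields the two groups of sample data to be tested shown in Table 1b:
Gender = 1    Gender = 2
User 1        User 3
User 2        User 5
User 4
Table 1b
In Table 1b, the gender data is divided by gender into two groups, so the number of categories is 2. The number of genders is normally fixed, so the preset standard number of categories is 2 (meaning there are only two genders), and the number of categories obtained by collecting on gender in Table 1b matches the preset standard number. The gender data in Table 1a can therefore be considered normal. (In practice, if collecting on gender yields only one group, the sample data to be tested can also be considered normal.) If, however, collecting on gender yields more than 2 categories, the sample data to be tested is abnormal, that is, the discrete data set is abnormal.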
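The grouping in Table 1b and the category-count check can be reproduced with a short sketch. It assumes that "matching" the standard number means having between one category and the standard number of categories, following the remark that a single group is also acceptable; the names `TABLE_1A`, `group_by_category`, and `category_count_is_normal` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Mapping

# Gender data from Table 1a.
TABLE_1A = {"User 1": 1, "User 2": 1, "User 3": 2, "User 4": 1, "User 5": 2}


def group_by_category(records: Mapping[str, Hashable]) -> Dict[Hashable, List[str]]:
    """Group record keys by category value, as in Table 1b."""
    groups: Dict[Hashable, List[str]] = defaultdict(list)
    for key, category in records.items():
        groups[category].append(key)
    return dict(groups)


def category_count_is_normal(records: Mapping[str, Hashable], standard_count: int) -> bool:
    """Normal when the number of categories is between 1 and the standard count."""
    return 1 <= len(group_by_category(records)) <= standard_count
```

Adding a sixth user with gender value 3 would produce three groups, and `category_count_is_normal` would then report the set as abnormal for a standard count of 2.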
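II. Detecting the rate of change of the number of categories of the corresponding discrete data
In some practical cases, the number of categories of the corresponding discrete data alone does not reliably indicate whether the discrete data set is abnormal. For example, suppose the discrete data set is the set of counts of how often each category of test result occurred when testing an application, as shown in Table 2:
Test result       Count
Anomaly type 9    1
Anomaly type 3    1
Anomaly type 5    2
Anomaly type 4    1
Table 2
Test results for an application are allowed to include several categories of anomaly. In that case, judging whether the test passes only from the number of categories of test results (Table 2 has 4 kinds of test result, so the number of categories is 4) would make the assessment of the application inaccurate.
In this scenario, when the designated feature is the rate of change of the number of categories of the corresponding discrete data, the standard data is a preset standard rate-of-change interval for the number of categories. Judging whether the sample data to be tested matches the standard data then specifically means judging whether the rate of change of the number of categories of the corresponding discrete data falls within that preset interval.
Continuing the example of Table 2, suppose the previous test of the application produced no abnormal results (the number of categories was 0), while this test produced 4 categories of test result in Table 2, so the designated feature, the change in the number of categories, is 4. Suppose the acceptable change in abnormal results between tests of this application is [1, 3] (that is, the preset standard rate-of-change interval is [1, 3]). The designated feature of the discrete data in Table 2 (4) does not fall within the preset interval, so this test can be considered failed.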
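The Table 2 example can be checked with a small sketch. Since the previous category count in the example is 0, a ratio is undefined, so the sketch assumes the "rate of change" here is the difference between the current and previous category counts; that reading, like the names `TABLE_2`, `category_count_change`, and `within_interval`, is an assumption made for illustration.

```python
from typing import Iterable

# Test results from Table 2: category -> number of occurrences.
TABLE_2 = {
    "Anomaly type 9": 1,
    "Anomaly type 3": 1,
    "Anomaly type 5": 2,
    "Anomaly type 4": 1,
}


def category_count_change(previous_count: int, current_categories: Iterable[str]) -> int:
    """Change in the number of distinct categories since the previous run."""
    return len(set(current_categories)) - previous_count


def within_interval(value: float, low: float, high: float) -> bool:
    """True when value lies in the closed interval [low, high]."""
    return low <= value <= high
```

With a previous count of 0, `category_count_change(0, TABLE_2)` gives 4, and `within_interval(4, 1, 3)` is false, which matches the conclusion that the test does not pass.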
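In both of the above modes, the designated feature relates to the number of categories of the discrete data in the set. The designated feature can also relate to the data values of the discrete data.
III. Detecting the data values of the corresponding discrete data
In this scenario, when the sample data to be tested is the data value of the corresponding discrete data, the standard data is a standard data value. Judging whether the sample data to be tested matches the standard data then specifically means judging whether the data value of the corresponding discrete data conforms to the standard data value.
For example, suppose the discrete data set is the loan interest rate data of 5 users, as shown in Table 3:
User      Loan interest rate
User 1    1.9
User 2    1.7
User 3    1.8
User 4    1.7
User 5    0.9
Table 3
The loan interest rate of each user in Table 3 is the data value of the discrete data. Suppose a user's loan interest rate must be at least 1.5 (that is, the standard data value is 1.5). User 5's loan interest rate of 0.9 is below the standard data value of 1.5, so the loan interest rate data in Table 3 can be considered abnormal.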
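A sketch of the data-value check on Table 3 follows. It treats the standard data value as a lower bound, as in the example; `TABLE_3` and `values_below_standard` are hypothetical names, and other standards (upper bounds, exact values) would need a different comparison.

```python
from typing import Dict, Mapping

# Loan interest rates from Table 3.
TABLE_3 = {"User 1": 1.9, "User 2": 1.7, "User 3": 1.8, "User 4": 1.7, "User 5": 0.9}


def values_below_standard(values: Mapping[str, float], minimum: float) -> Dict[str, float]:
    """Return the entries whose data value is below the standard data value."""
    return {key: value for key, value in values.items() if value < minimum}


def values_are_normal(values: Mapping[str, float], minimum: float) -> bool:
    """The set is normal only when no value falls below the standard."""
    return not values_below_standard(values, minimum)
```

Returning the offending entries rather than just a flag makes it easier to tell the developer user which records caused the set to be judged abnormal.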
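IV. Detecting the rate of change of the data values of the corresponding discrete data
As with the mode based on the rate of change of the number of categories, in some practical cases the data values of discrete data fluctuate within a certain range, so the magnitude of the data values alone does not reliably indicate whether the discrete data set is abnormal.
In this scenario, when the designated feature is the rate of change of the data values of the discrete data, the standard data is a preset standard rate-of-change interval for the data values. Judging whether the sample data to be tested matches the standard data then specifically means judging whether the rate of change of the data values of the discrete data falls within the preset standard rate-of-change interval.
For example, suppose the discrete data set is the number of occurrences of each category of test result obtained by testing an application, as shown in Table 4a:
Test result       Count
Anomaly type 1    12
Anomaly type 2    10
Table 4a
In Table 4a, the number of occurrences of each abnormal test result category is the data value of the discrete data. Suppose that, in the historical data, the mean number of occurrences is 3 for "Anomaly type 1" and 2 for "Anomaly type 2". In this test the rate of change is then 12 / 3 = 4 for type 1 and 10 / 2 = 5 for type 2. Suppose the standard rate-of-change interval is [1.5, 3.5]. Both rates of change fall outside that interval, which indicates that the results of this test are abnormal.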
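The Table 4a arithmetic can be reproduced as below. The sketch assumes the rate of change is the ratio of the current value to the historical mean, which is the reading that yields the figures 4 and 5 in the example, and it assumes every historical mean is nonzero. All names are hypothetical.

```python
from typing import Dict, Mapping

# Current counts from Table 4a and the historical means from the example.
TABLE_4A = {"Anomaly type 1": 12, "Anomaly type 2": 10}
HISTORICAL_MEAN = {"Anomaly type 1": 3, "Anomaly type 2": 2}


def change_rates(current: Mapping[str, float], mean: Mapping[str, float]) -> Dict[str, float]:
    """Ratio of each current value to its historical mean (means must be nonzero)."""
    return {key: current[key] / mean[key] for key in current}


def rates_outside_interval(rates: Mapping[str, float], low: float, high: float) -> Dict[str, float]:
    """Return the rates that fall outside the closed interval [low, high]."""
    return {key: rate for key, rate in rates.items() if not low <= rate <= high}
```

For the example data the rates are 4.0 and 5.0, and both are reported by `rates_outside_interval(..., 1.5, 3.5)`, so the set is judged abnormal.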
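The above are only optional approaches in the embodiments of the present application and do not limit it. With these detection modes, a discrete data set can be checked from different angles. In particular, in practice a user can configure several detection modes for a discrete data set to be tested, which improves the accuracy of detection. The configured detection modes are executed automatically by the detection device with no manual intervention during detection, which improves the efficiency of detecting discrete data sets.
The above is the data detection method provided by the embodiments of the present application. In practice, the method can be implemented with the system architecture shown in FIG. 2a. In that architecture, the data production system running on the data warehouse sends a detection request to the discrete data monitoring system. The monitoring trigger module in the discrete data monitoring system then performs initialization according to the request, including verifying the information format of the request, determining the discrete data set corresponding to the request, and determining the corresponding detection mode. After initialization, the monitoring collection module collects the sample data to be tested, and the monitoring check module then performs the detection on the sample data.
The monitoring collection module can collect the sample data and the standard data through a database (DataBase, DB), the Open Data Processing Service (ODPS) platform, or Hive (a data warehouse tool). When the monitoring check module finds the discrete data set normal, it stores the set in the DB for later use.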
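The division of work among the trigger, collection, and check modules might look like the following sketch. It is only an illustration under assumptions: plain dictionaries replace the DB, ODPS, and Hive back ends, the request is a dictionary with a hypothetical `set_id` field, and each configuration entry holds a hypothetical `feature` extractor and `standard` predicate.

```python
from typing import Any, Callable, Dict, Sequence

Config = Dict[str, Callable[[Any], Any]]


def trigger(request: Dict[str, Any], configs: Dict[str, Config]) -> Config:
    """Trigger module: validate the request format and resolve the detection mode."""
    if not isinstance(request.get("set_id"), str):
        raise ValueError("detection request must carry a string 'set_id'")
    return configs[request["set_id"]]


def collect(source: Dict[str, Sequence[Any]], set_id: str, config: Config) -> Any:
    """Collection module: extract the designated feature as the sample."""
    return config["feature"](source[set_id])


def check(sample: Any, config: Config) -> bool:
    """Check module: True when the sample matches the standard."""
    return bool(config["standard"](sample))


def run_monitoring(request, configs, source, db) -> bool:
    """Run the three modules in order; store the set in `db` when it is normal."""
    config = trigger(request, configs)
    sample = collect(source, request["set_id"], config)
    normal = check(sample, config)
    if normal:
        db[request["set_id"]] = source[request["set_id"]]
    return normal
```

Keeping validation in `trigger` means a malformed request is rejected before any data is read from the source.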
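Based on the same idea, the embodiments of the present application also provide a data detection apparatus, as shown in FIG. 2b.
In FIG. 2b, the data detection apparatus comprises a receiving module 201, a determining module 202, and a detection module 203, wherein:
The receiving module 201 is configured to receive a detection request for a discrete data set.
The determining module 202 is configured to determine the discrete data set corresponding to the detection request, and the detection mode corresponding to that discrete data set.
The detection module 203 is configured to detect the discrete data set according to the determined detection mode to judge whether the discrete data set is abnormal; if so, to process the discrete data set accordingly; otherwise, to store the discrete data set.
In the embodiments of the present application, the detection request carries storage location information of the discrete data set. In this case, the determining module 202 is specifically configured to obtain the storage location information of the discrete data set contained in the detection request and to look up the discrete data set according to the storage location information.
Once the discrete data set has been found, the detection mode matching it can be determined. The determining module 202 is therefore specifically configured to obtain detection configuration information matching the discrete data set, where the detection configuration information contains detection mode information; to read the detection mode information contained in the detection configuration information; and to determine the detection mode corresponding to the detection mode information.
In the embodiments of the present application, the detection module 203 is specifically configured to determine, according to the detection mode, a designated feature of the corresponding discrete data in the discrete data set; to collect the designated feature of the discrete data as sample data to be tested; and to compare preset standard data with the sample data to be tested to judge whether they match. If they match, the discrete data set is judged normal; otherwise, it is judged abnormal.
In one implementation, when the designated feature is the number of categories, the standard data is a preset standard number of categories. In this case, the detection module 203 is specifically configured to judge whether the number of categories matches the preset standard number of categories.
In one implementation, when the designated feature is the rate of change of the number of categories, the standard data is a preset standard rate-of-change interval for the number of categories. In this case, the detection module 203 is specifically configured to judge whether the rate of change of the number of categories falls within that preset interval.
In one implementation, when the designated feature is the data value, the standard data is a standard data value. In this case, the detection module 203 is specifically configured to judge whether the data value conforms to the standard data value.
In one implementation, when the designated feature is the rate of change of the data value, the standard data is a preset standard rate-of-change interval for the data value. In this case, the detection module 203 is specifically configured to judge whether the rate of change of the data value falls within the preset standard rate-of-change interval.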
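In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.
Memory may include non-permanent memory, random access memory (RAM), and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprise", "include", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that comprises a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises the element.
A person skilled in the art will understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. The present application may therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, it may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The above are merely embodiments of the present application and are not intended to limit it. A person skilled in the art may make various modifications and changes to the present application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the scope of its claims.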

Claims (16)

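  1. A data detection method, characterized by comprising:
    receiving a detection request for a discrete data set;
    determining the discrete data set corresponding to the detection request, and a detection mode corresponding to the discrete data set;
    detecting the discrete data set according to the determined detection mode to judge whether the discrete data set is abnormal.
  2. The method according to claim 1, characterized in that the detection request carries storage location information of the discrete data set;
    determining the discrete data set corresponding to the detection request specifically comprises:
    obtaining the storage location information of the discrete data set contained in the detection request;
    looking up the discrete data set according to the storage location information.
  3. The method according to claim 1, characterized in that determining the detection mode corresponding to the discrete data set specifically comprises:
    obtaining detection configuration information matching the discrete data set, wherein the detection configuration information contains detection mode information;
    reading the detection mode information contained in the detection configuration information;
    determining the detection mode corresponding to the detection mode information.
  4. The method according to claim 1, characterized in that detecting the discrete data set according to the determined detection mode to judge whether the discrete data set is abnormal specifically comprises:
    determining, according to the detection mode, a designated feature of corresponding discrete data in the discrete data set;
    collecting the designated feature of the discrete data as sample data to be tested;
    comparing preset standard data with the sample data to be tested to judge whether the sample data to be tested matches the standard data;
    if so, judging the discrete data set to be normal;
    otherwise, judging the discrete data set to be abnormal.
  5. The method according to claim 4, characterized in that, when the designated feature is a number of categories, the standard data is a preset standard number of categories;
    judging whether the sample data to be tested matches the standard data specifically comprises:
    judging whether the number of categories matches the preset standard number of categories.
  6. The method according to claim 4, characterized in that, when the designated feature is a rate of change of the number of categories, the standard data is a preset standard rate-of-change interval for the number of categories;
    judging whether the sample data to be tested matches the standard data specifically comprises:
    judging whether the rate of change of the number of categories falls within the preset standard rate-of-change interval for the number of categories.
  7. The method according to claim 4, characterized in that, when the designated feature is a data value, the standard data is a standard data value;
    judging whether the sample data to be tested matches the standard data specifically comprises:
    judging whether the data value conforms to the standard data value.
  8. The method according to claim 4, characterized in that, when the designated feature is a rate of change of the data value, the standard data is a preset standard rate-of-change interval for the data value;
    judging whether the sample data to be tested matches the standard data specifically comprises:
    judging whether the rate of change of the data value falls within the preset standard rate-of-change interval.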
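  9. A data detection apparatus, characterized by comprising:
    a receiving module, configured to receive a detection request for a discrete data set;
    a determining module, configured to determine the discrete data set corresponding to the detection request, and a detection mode corresponding to the discrete data set;
    a detection module, configured to detect the discrete data set according to the determined detection mode and to judge whether the discrete data set is abnormal.
  10. The apparatus according to claim 9, characterized in that the detection request carries storage location information of the discrete data set;
    the determining module is specifically configured to obtain the storage location information of the discrete data set contained in the detection request and to look up the discrete data set according to the storage location information.
  11. The apparatus according to claim 9, characterized in that the determining module is specifically configured to obtain detection configuration information matching the discrete data set, wherein the detection configuration information contains detection mode information; to read the detection mode information contained in the detection configuration information; and to determine the detection mode corresponding to the detection mode information.
  12. The apparatus according to claim 9, characterized in that the detection module is specifically configured to determine, according to the detection mode, a designated feature of corresponding discrete data in the discrete data set; to collect the designated feature of the discrete data as sample data to be tested; to compare preset standard data with the sample data to be tested to judge whether the sample data to be tested matches the standard data; if so, to judge the discrete data set to be normal; and otherwise, to judge the discrete data set to be abnormal.
  13. The apparatus according to claim 12, characterized in that, when the designated feature is a number of categories, the standard data is a preset standard number of categories;
    the detection module is specifically configured to judge whether the number of categories matches the preset standard number of categories.
  14. The apparatus according to claim 12, characterized in that, when the designated feature is a rate of change of the number of categories, the standard data is a preset standard rate-of-change interval for the number of categories;
    the detection module is specifically configured to judge whether the rate of change of the number of categories falls within the preset standard rate-of-change interval for the number of categories.
  15. The apparatus according to claim 12, characterized in that, when the designated feature is a data value, the standard data is a standard data value;
    the detection module is specifically configured to judge whether the data value conforms to the standard data value.
  16. The apparatus according to claim 12, characterized in that, when the designated feature is a rate of change of the data value, the standard data is a preset standard rate-of-change interval for the data value;
    the detection module is specifically configured to judge whether the rate of change of the data value falls within the preset standard rate-of-change interval.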
PCT/CN2016/090826 2015-08-05 2016-07-21 Data detection method and apparatus WO2017020725A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510474635.XA CN106445938B (zh) 2015-08-05 2015-08-05 Data detection method and apparatus
CN201510474635.X 2015-08-05

Publications (1)

Publication Number Publication Date
WO2017020725A1 true WO2017020725A1 (zh) 2017-02-09

Family

ID=57943765

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/090826 WO2017020725A1 (zh) 2015-08-05 2016-07-21 Data detection method and apparatus

Country Status (2)

Country Link
CN (1) CN106445938B (zh)
WO (1) WO2017020725A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107682349A (zh) * 2017-10-19 2018-02-09 广东小天才科技有限公司 一种检测干扰数据的方法及设备
CN111427928A (zh) * 2020-03-26 2020-07-17 京东数字科技控股有限公司 一种数据质量检测方法及装置

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101290611A (zh) * 2007-04-20 2008-10-22 中芯国际集成电路制造(上海)有限公司 数据中异常点的检测方法和装置
CN101571891A (zh) * 2008-04-30 2009-11-04 中芯国际集成电路制造(北京)有限公司 异常数据检验方法和装置
US20110032098A1 (en) * 2009-08-06 2011-02-10 Cheng-Yun Yang Portable electronic apparatus with a user physical status sensing and warning circuit
CN102319060A (zh) * 2011-09-19 2012-01-18 广州天绎智能科技有限公司 体温异常检测方法与检测系统
CN103020166A (zh) * 2012-11-26 2013-04-03 宁波电业局 一种电力实时数据异常检测方法
CN103684910A (zh) * 2013-12-02 2014-03-26 北京工业大学 一种基于工业控制系统网络流量的异常检测方法

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7065534B2 (en) * 2004-06-23 2006-06-20 Microsoft Corporation Anomaly detection in data perspectives
CN103076104B (zh) * 2012-11-15 2014-08-13 江苏省电力公司淮安供电公司 电力电缆温度在线监测数据的处理方法

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3663457A1 (de) 2018-12-06 2020-06-10 BSH Hausgeräte GmbH Wasserführendes haushaltsgerät
DE102018221092A1 (de) 2018-12-06 2020-06-10 BSH Hausgeräte GmbH Wasserführendes Haushaltsgerät
CN111541575A (zh) * 2020-04-30 2020-08-14 重庆富民银行股份有限公司 一种用于闭源网络设备的自动化巡检方法及系统
CN111541575B (zh) * 2020-04-30 2023-06-09 重庆富民银行股份有限公司 一种用于闭源网络设备的自动化巡检方法及系统
EP3954820A1 (de) 2020-08-14 2022-02-16 BSH Hausgeräte GmbH Fluidführendes haushaltsgerät
DE102020210389A1 (de) 2020-08-14 2022-02-17 BSH Hausgeräte GmbH Fluidführendes Haushaltsgerät
DE102022207949A1 (de) 2021-08-04 2023-02-09 BSH Hausgeräte GmbH Fluidführendes Haushaltsgerät
CN117236694A (zh) * 2023-09-26 2023-12-15 国家市场监督管理总局国家标准技术审评中心 一种基于大数据的国内外标准指标的比对方法及系统

Also Published As

Publication number Publication date
CN106445938B (zh) 2021-03-23
CN106445938A (zh) 2017-02-22

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16832213

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16832213

Country of ref document: EP

Kind code of ref document: A1