WO2015103880A1 - Method and system for automatically repairing batch data - Google Patents

Method and system for automatically repairing batch data

Info

Publication number
WO2015103880A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
rules
rule
correct
attribute
Prior art date
Application number
PCT/CN2014/084625
Other languages
English (en)
French (fr)
Inventor
卢长烛
贾西贝
Original Assignee
深圳市华傲数据技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市华傲数据技术有限公司
Publication of WO2015103880A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/221 Parsing markup language streams
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring

Definitions

  • The present invention relates to the field of data repair, and in particular to a method and system for automatically repairing batch data.
  • The master data systems of these group companies have a unified and strict data management mechanism.
  • The head office carefully cleans and maintains the master data.
  • However, some subsidiaries or branches do not have such a complete data management system.
  • Each subsidiary or branch often has its own input conventions when entering business data to be processed, so the data formats of different subsidiaries or branches cannot be kept consistent.
  • Moreover, because the subsidiaries and branches eventually aggregate their business data into the master data, errors are introduced into the master database during data entry.
  • Particularly when batch data is processed, inconsistent data standards or human factors cause data errors that degrade the quality of the company's overall data. A data repair method that monitors and repairs batch data at the time of entry is therefore needed.
  • The present invention has been made to address one of the above drawbacks.
  • The present invention provides a method and system for automatically repairing batch data: rules are used to filter the batch data, correct data is determined through interaction with the user, and the other uncertain data is then reviewed and repaired against the rules, so that the batch data is repaired automatically and data correctness and data quality are ensured.
  • An embodiment of the present invention provides a method for automatically repairing batch data, the method comprising: a step of detecting the current batch of data to be entered and triggering automatic repair; a step of filtering by rules and traversing, one by one, the data to be entered that may be incorrect; a step of determining correct data by interacting with the user; a step of reviewing other uncertain data according to the correct data and the rules, and marking erroneous data; and a step in which the subsystem updates the erroneous data according to reference data and enters the updated data into the subsystem's database.
  • Preferably, correct data is obtained by the rule filtering.
  • Preferably, the correct data obtained by rule filtering is determined by the absence of conflicts between rules.
  • Preferably, the data to be entered that may be incorrect is the data found by rule filtering to conflict between rules.
  • Preferably, the correct data determined through user interaction is designated from among the data to be entered that may be incorrect.
  • Further, reviewing other uncertain data according to the correct data and the rules specifically includes the following steps:
  • the firewall system acquires a list of known rules and the set of determined data attributes; the dependencies among the rules are determined according to their logical order; the rule set VSet of rules that can be applied directly is determined from the rules; VSet is traversed: if VSet is empty, the set of confirmed attributes is output; otherwise, the determined attributes corresponding to the rules in VSet are reviewed and repaired, the other rules that follow from those rules are found through the dependencies, and the determined attributes corresponding to those rules are put into VSet.
  • Preferably, the dependencies determined according to the logical order of the rules are determined from the attribute values of the rules.
  • Preferably, updating the erroneous data comprises updating the corresponding attribute of the erroneous data with the correct value of that data in the reference data.
  • By using rules to filter the batch data, interacting with the user to determine correct data, and then reviewing and repairing other uncertain data against the rules, the invention repairs batch data automatically and ensures data correctness and data quality.
  • Another embodiment of the present invention provides a system for automatically repairing batch data, the system comprising: a data detecting unit, configured to detect the current batch of data to be entered and trigger automatic repair; a data filtering unit, configured to filter by rules and traverse, one by one, the data to be entered that may be incorrect; a data interaction unit, configured to determine correct data by interacting with the user; a data review unit, configured to review other uncertain data according to the correct data and the rules, and to mark erroneous data; and a data update unit, configured so that the subsystem updates the erroneous data according to reference data and enters the updated data into the subsystem's database.
  • Preferably, the data filtering unit uses rule filtering to obtain correct data.
  • Further, the review of other uncertain data by the data review unit according to the correct data and the rules specifically includes the following steps: the firewall system acquires a list of known rules and the set of determined data attributes; the dependencies among the rules are determined according to their logical order; the rule set VSet of rules that can be applied directly is determined from the rules; VSet is traversed: if VSet is empty, the set of confirmed attributes is output; otherwise, the determined attributes corresponding to the rules in VSet are reviewed and repaired, the other rules that follow from those rules are found through the dependencies, and the determined attributes corresponding to those rules are put into VSet.
  • Preferably, the dependencies determined according to the logical order of the rules are determined from the attribute values of the rules.
  • By using rules to filter the batch data, interacting with the user to determine correct data, and then reviewing and repairing other uncertain data against the rules, the invention repairs batch data automatically and ensures data correctness and data quality.
  • FIG. 1 is a schematic flowchart of a method for automatically repairing batch data according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of a specific process for reviewing other uncertain data according to correct data and rules according to another embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a system for automatically repairing batch data according to another embodiment of the present invention.
  • The invention provides a method and a system for automatically repairing batch data. Rules are used to filter the batch data, correct data is determined through interaction with the user, and the other uncertain data is then reviewed and repaired against the rules, so that the batch data is repaired automatically and data correctness and data quality are ensured.
  • FIG. 1 is a schematic flowchart of the method for automatically repairing batch data provided by the present invention, which specifically includes the following steps.
  • Step S110: Detect the current batch of data to be entered and trigger automatic repair.
  • Step S120: Filter by rules and traverse, one by one, the data to be entered that may be incorrect.
  • In this step, rule filtering is first used to obtain the correct data.
  • The correct data obtained by rule filtering is determined by the absence of conflicts between rules.
  • The data to be entered that may be incorrect is the data found by rule filtering to conflict between rules. For example, the data to be entered is filtered according to the rules to find the records that match them: if there are two records whose attribute A is '0' and whose attribute B is '1', and one rule (A, A') -> (B, B') || () is known, then both attribute values of both records are correct. Otherwise, the records are data to be entered that may be incorrect.
  • Step S130: Determine the correct data by interacting with the user.
  • In this step, the correct data determined through user interaction is designated from among the possibly incorrect data obtained by the filtering in the previous step.
  • The correct data determined through user interaction includes the correct attributes in a record.
  • The correct attributes are judged from the user's experience, and the correct data determined through user interaction may be a single attribute of a given record. For example, suppose a record to be determined contains the attributes A, B, C, D, E, F, G, H, I.
  • The user may, based on experience, confirm any one of these attributes, or may confirm that all attributes of the record are correct.
  • Step S140: Review other uncertain data according to the correct data and the rules, and mark the erroneous data.
  • As shown in FIG. 2, reviewing other uncertain data according to the correct data and the rules specifically includes the following steps: acquiring a list of known rules and the set of determined data attributes; determining the dependencies among the rules according to their logical order; determining, from the rules, the rule set VSet of rules that can be applied directly; and traversing VSet: if VSet is empty, the set of confirmed attributes is output; otherwise, the determined attributes corresponding to the rules in VSet are reviewed and repaired, the other rules that follow from those rules are found through the dependencies, and the determined attributes corresponding to those rules are put into VSet.
  • The dependencies determined according to the logical order of the rules are determined from the attribute values of the rules.
  • Traversing the rules in a different order can lead to different derived review results.
  • The dependencies between the rules therefore need to be determined first from the attribute values of the rules. For example, suppose three rules are known: Rule1: (A, Am) -> (B, Bm) || (), Rule2: (B, Bm) -> (C, Cm) || (), and Rule3: (E, Em) -> (B, Bm) || (D = '0').
  • Rule2 depends on both Rule1 and Rule3. That is, the attribute value of Rule2 can be confirmed only after the attribute values contained in either Rule1 or Rule3 have been confirmed. For example, suppose the user interaction determines that the correct attribute is A. Rule1 is used first, so VSet = {Rule1} and attribute B can be confirmed. Because Rule2 depends on Rule1, Rule2 is then put into VSet, Rule1 having been used and removed, and attribute C can be confirmed. VSet is then empty, and the attribute set {B, C} is returned as the attributes confirmed by the review; if their values are wrong, they can be repaired.
  • Step S150: The subsystem updates the erroneous data according to the reference data and enters the updated data into the subsystem's database.
  • Updating the erroneous data includes updating the corresponding attribute of the erroneous data with the correct value of that data in the reference data.
  • By using rules to filter the batch data, interacting with the user to determine correct data, and then reviewing and repairing other uncertain data against the rules, the invention repairs batch data automatically and ensures data correctness and data quality.
  • As shown in FIG. 3, a system for automatically repairing batch data according to another embodiment of the present invention comprises the following units.
  • The data detecting unit 10 is configured to detect the current batch of data to be entered and trigger automatic repair.
  • The data filtering unit 20 is configured to filter by rules and traverse, one by one, the data to be entered that may be incorrect.
  • The data filtering unit 20 uses rule filtering to obtain correct data.
  • The data interaction unit 30 is configured to determine correct data by interacting with the user.
  • In the data interaction unit 30, the correct data determined through user interaction includes the correct attributes in a record; the correct attributes are judged from the user's experience, and the correct data may be a single attribute of a given record.
  • The user may, based on experience, confirm any one attribute, or may confirm that all attributes of the record are correct.
  • The data review unit 40 is configured to review other uncertain data according to the correct data and the rules, and to mark the erroneous data.
  • The review performed by the data review unit 40 specifically includes the following steps: acquiring a list of known rules and the set of determined data attributes; determining the dependencies among the rules according to their logical order; determining, from the rules, the rule set VSet of rules that can be applied directly; and traversing VSet: if VSet is empty, the set of confirmed attributes is output; otherwise, the determined attributes corresponding to the rules in VSet are reviewed and repaired, the other rules that follow from those rules are found through the dependencies, and the determined attributes corresponding to those rules are put into VSet.
  • The data review unit 40 determines the dependencies, which follow the logical order of the rules, from the attribute values of the rules.
  • The data updating unit 50 is configured so that the subsystem updates the erroneous data according to the reference data and enters the updated data into the subsystem's database.
  • By using rules to filter the batch data, interacting with the user to determine correct data, and then reviewing and repairing other uncertain data against the rules, the invention repairs batch data automatically and ensures data correctness and data quality.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

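The present invention provides a method for automatically repairing batch data, the method comprising: a step of detecting the current batch of data to be entered and triggering automatic repair; a step of filtering by rules and traversing, one by one, the data to be entered that may be incorrect; a step of determining correct data by interacting with the user; a step of reviewing other uncertain data according to the correct data and the rules, and marking erroneous data; and a step in which the subsystem updates the erroneous data according to reference data and enters the updated data into the subsystem's database. By using rules to filter the batch data and interacting with the user to determine correct data, and then reviewing and repairing other uncertain data against the rules, the invention repairs the batch data automatically and ensures data correctness and data quality. The present invention further provides a system for automatically repairing batch data.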

Description

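Method and system for automatically repairing batch data
Technical Field
The present invention relates to the field of data repair, and in particular to a method and system for automatically repairing batch data.
Background Art
Large group holding companies have a head office and a number of subsidiaries or branches spread across different locations. The master data systems of these group head offices all have a unified and strict data management mechanism. In addition, to improve the quality of the underlying data, make business data easier to analyze and process, and improve the accuracy of business data, the head office carefully cleans and maintains the master data. The subsidiaries and branches, however, do not have such a complete data management system. Each subsidiary or branch often has its own input conventions when entering business data to be processed, so the data formats of different subsidiaries or branches cannot be kept consistent. Moreover, because the subsidiaries and branches all eventually aggregate their business data into the master data, errors are introduced into the master database during data entry. Particularly when batch data is processed, inconsistent data standards or human factors cause data errors that degrade the quality of the company's overall data. A data repair method that monitors and repairs batch data at the time of entry is therefore needed.
Summary of the Invention
The present invention has been made to address one of the above drawbacks.
Accordingly, the present invention provides a method and system for automatically repairing batch data: rules are used to filter the batch data, correct data is determined through interaction with the user, and the other uncertain data is then reviewed and repaired against the rules, so that the batch data is repaired automatically and data correctness and data quality are ensured.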
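An embodiment of the present invention therefore provides a method for automatically repairing batch data, the method comprising: a step of detecting the current batch of data to be entered and triggering automatic repair; a step of filtering by rules and traversing, one by one, the data to be entered that may be incorrect; a step of determining correct data by interacting with the user; a step of reviewing other uncertain data according to the correct data and the rules, and marking erroneous data; and a step in which the subsystem updates the erroneous data according to reference data and enters the updated data into the subsystem's database.
Preferably, correct data is obtained by the rule filtering.
Preferably, the correct data obtained by rule filtering is determined by the absence of conflicts between rules.
Preferably, the data to be entered that may be incorrect is the data found by rule filtering to conflict between rules.
Preferably, the correct data determined through user interaction is designated from among the data to be entered that may be incorrect.
Further, reviewing other uncertain data according to the correct data and the rules specifically includes the following steps:
the firewall system acquires a list of known rules and the set of determined data attributes; the dependencies among the rules are determined according to their logical order; the rule set VSet of rules that can be applied directly is determined from the rules; VSet is traversed: if VSet is empty, the set of confirmed attributes is output; otherwise, the determined attributes corresponding to the rules in VSet are reviewed and repaired, the other rules that follow from those rules are found through the dependencies, and the determined attributes corresponding to those rules are put into VSet.
Preferably, the dependencies determined according to the logical order of the rules are determined from the attribute values of the rules.
Preferably, updating the erroneous data comprises updating the corresponding attribute of the erroneous data with the correct value of that data in the reference data.
By using rules to filter the batch data, interacting with the user to determine correct data, and then reviewing and repairing other uncertain data against the rules, the invention repairs batch data automatically and ensures data correctness and data quality.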
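Another embodiment of the present invention provides a system for automatically repairing batch data, the system comprising: a data detecting unit, configured to detect the current batch of data to be entered and trigger automatic repair; a data filtering unit, configured to filter by rules and traverse, one by one, the data to be entered that may be incorrect; a data interaction unit, configured to determine correct data by interacting with the user; a data review unit, configured to review other uncertain data according to the correct data and the rules, and to mark erroneous data; and a data update unit, configured so that the subsystem updates the erroneous data according to reference data and enters the updated data into the subsystem's database.
Preferably, the data filtering unit uses rule filtering to obtain correct data.
Further, the review of other uncertain data by the data review unit according to the correct data and the rules specifically includes the following steps: the firewall system acquires a list of known rules and the set of determined data attributes; the dependencies among the rules are determined according to their logical order; the rule set VSet of rules that can be applied directly is determined from the rules; VSet is traversed: if VSet is empty, the set of confirmed attributes is output; otherwise, the determined attributes corresponding to the rules in VSet are reviewed and repaired, the other rules that follow from those rules are found through the dependencies, and the determined attributes corresponding to those rules are put into VSet.
Preferably, the dependencies determined according to the logical order of the rules are determined from the attribute values of the rules.
By using rules to filter the batch data, interacting with the user to determine correct data, and then reviewing and repairing other uncertain data against the rules, the invention repairs batch data automatically and ensures data correctness and data quality.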
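Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a method for automatically repairing batch data according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of the specific process for reviewing other uncertain data according to correct data and rules, according to another embodiment of the present invention.
FIG. 3 is a schematic diagram of a system for automatically repairing batch data according to another embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it.
The present invention provides a method and system for automatically repairing batch data. Rules are used to filter the batch data, correct data is determined through interaction with the user, and the other uncertain data is then reviewed and repaired against the rules, so that the batch data is repaired automatically and data correctness and data quality are ensured.
FIG. 1 is a schematic flowchart of the method for automatically repairing batch data provided by the present invention, which specifically includes the following steps.
Step S110: Detect the current batch of data to be entered and trigger automatic repair.
Step S120: Filter by rules and traverse, one by one, the data to be entered that may be incorrect.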
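After automatic repair has been started for the current batch of data to be entered, rule filtering is first used in this step to obtain the correct data. The correct data obtained by rule filtering is determined by the absence of conflicts between rules. The data to be entered that may be incorrect is the data found by rule filtering to conflict between rules. For example, the data to be entered is filtered according to the rules to find the records that match them: if there are two records whose attribute A is '0' in both and whose attribute B is '1' in both, and one rule (A, A') -> (B, B') || () is known, then both attribute values of both records are correct. Otherwise, the records are data to be entered that may be incorrect.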
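The filtering in step S120 could be sketched as follows. This is only a minimal illustration under a stated assumption, not the patent's implementation: the rule (A, A') -> (B, B') || () is read here as "records that agree on attribute A should also agree on attribute B". The function name `split_by_rule` and the sample batch are hypothetical.

```python
from collections import defaultdict


def split_by_rule(records, lhs, rhs):
    """Partition records into (consistent, suspect).

    Assumed reading of the rule: records sharing the same `lhs` value
    must share the same `rhs` value; any group that disagrees is suspect.
    """
    # Collect every rhs value observed for each lhs value.
    seen = defaultdict(set)
    for rec in records:
        seen[rec[lhs]].add(rec[rhs])

    consistent, suspect = [], []
    for rec in records:
        # A group with exactly one rhs value has no conflict.
        if len(seen[rec[lhs]]) == 1:
            consistent.append(rec)
        else:
            suspect.append(rec)
    return consistent, suspect


batch = [
    {"A": "0", "B": "1"},
    {"A": "0", "B": "1"},
    {"A": "5", "B": "2"},
    {"A": "5", "B": "3"},
]
ok, suspect = split_by_rule(batch, "A", "B")
```

With this sample batch, the two records with A = '0' agree on B and are kept as correct, while the two records with A = '5' disagree on B and are passed on as possibly incorrect.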
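Step S130: Determine the correct data by interacting with the user.
In this step, the correct data determined through user interaction is designated from among the possibly incorrect data obtained by the filtering in the previous step. The correct data determined through user interaction includes the correct attributes in a record. The correct attributes are judged from the user's experience, and the correct data determined through user interaction may be a single attribute of a given record. For example, suppose a record to be determined contains the attributes A, B, C, D, E, F, G, H, I. The user may, based on experience, confirm any one of these attributes, or may confirm that all attributes of the record are correct.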
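One way to represent the outcome of the interaction in step S130 is as a set of attribute names the user vouches for. The sketch below assumes that representation; the function `confirm` and its `chosen` parameter are hypothetical and stand in for whatever user interface actually collects the answer.

```python
ATTRS = list("ABCDEFGHI")


def confirm(record, chosen=None):
    """Return the set of attribute names the user confirms as correct.

    `chosen=None` models the case where the user confirms the whole record;
    otherwise `chosen` lists the individual attributes the user confirms.
    """
    if chosen is None:
        return set(record)
    unknown = set(chosen) - set(record)
    if unknown:
        # Reject attribute names that do not exist in the record.
        raise KeyError(f"not in record: {sorted(unknown)}")
    return set(chosen)


record = {name: "0" for name in ATTRS}
one_attribute = confirm(record, ["A"])
whole_record = confirm(record)
```

Keeping the result as a plain set makes it easy to hand to the review step, which only needs to know which attributes are already trusted.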
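Step S140: Review other uncertain data according to the correct data and the rules, and mark the erroneous data.
As shown in FIG. 2, reviewing other uncertain data according to the correct data and the rules specifically includes the following steps: acquiring a list of known rules and the set of determined data attributes; determining the dependencies among the rules according to their logical order; determining, from the rules, the rule set VSet of rules that can be applied directly; and traversing VSet: if VSet is empty, the set of confirmed attributes is output; otherwise, the determined attributes corresponding to the rules in VSet are reviewed and repaired, the other rules that follow from those rules are found through the dependencies, and the determined attributes corresponding to those rules are put into VSet. The dependencies determined according to the logical order of the rules are determined from the attribute values of the rules. In this step, traversing the rules in a different order can lead to different derived review results, so the dependencies between the rules must first be determined from their attribute values. For example, suppose three rules are known: Rule1: (A, Am) -> (B, Bm) || (), Rule2: (B, Bm) -> (C, Cm) || (), and Rule3: (E, Em) -> (B, Bm) || (D = '0'). From the attributes of these three rules, Rule2 depends on both Rule1 and Rule3; that is, the attribute value of Rule2 can be confirmed only after the attribute values contained in either Rule1 or Rule3 have been confirmed. Suppose the user interaction determines that the correct attribute is A. Applying the rules, Rule1 is used first, so the current VSet = {Rule1}. Because VSet is not empty, Rule1 lets us derive that attribute B can be confirmed. Because Rule2 depends on Rule1, Rule2 is now an applicable rule and is put into VSet; at this point VSet = {Rule2}, Rule1 having been used and removed. Using VSet again confirms attribute C. VSet is now empty and no more rules are available. Finally, the attribute set {B, C} is returned as the attributes that can be confirmed by the review; if their values are wrong, they can be repaired.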
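The VSet traversal in this step reads like a worklist algorithm, and the Rule1/Rule2/Rule3 example can be replayed with a short sketch. The code below is an illustration under assumptions, not the patent's implementation: each rule is reduced to a premise attribute and a target attribute, the pattern condition (such as D = '0' in Rule3) is left out, and the names `Rule` and `derive_confirmed` are hypothetical.

```python
from collections import deque, namedtuple

# Simplified rule: confirming `lhs` allows `rhs` to be confirmed.
# Pattern conditions such as (D = '0') are omitted in this sketch.
Rule = namedtuple("Rule", "name lhs rhs")


def derive_confirmed(rules, confirmed):
    """Return the attributes that become confirmed, in derivation order."""
    confirmed = set(confirmed)
    # dependents[r] lists the rules whose premise is the target of rule r.
    dependents = {r.name: [s for s in rules if s.lhs == r.rhs] for r in rules}
    # VSet starts with the rules whose premise is already confirmed.
    vset = deque(r for r in rules if r.lhs in confirmed)
    used = set()
    derived = []
    while vset:
        rule = vset.popleft()
        if rule.name in used:
            continue
        used.add(rule.name)  # a used rule is removed from further consideration
        if rule.rhs not in confirmed:
            confirmed.add(rule.rhs)
            derived.append(rule.rhs)
        # Rules that depend on this one are now applicable.
        for nxt in dependents[rule.name]:
            if nxt.name not in used:
                vset.append(nxt)
    return derived


rules = [
    Rule("Rule1", "A", "B"),
    Rule("Rule2", "B", "C"),
    Rule("Rule3", "E", "B"),
]
result = derive_confirmed(rules, {"A"})
```

Starting from the user-confirmed attribute A, the worklist holds Rule1, then Rule2, then nothing, and `result` is `["B", "C"]`, matching the {B, C} set in the example above. Rule3 never fires because E is not confirmed.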
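Step S150: The subsystem updates the erroneous data according to the reference data and enters the updated data into the subsystem's database.
In this step, updating the erroneous data includes updating the corresponding attribute of the erroneous data with the correct value of that data in the reference data. In the example above, if the values of B and C are found to be wrong, the value of Bm is written to B and the value of Cm is written to C.
By using rules to filter the batch data, interacting with the user to determine correct data, and then reviewing and repairing other uncertain data against the rules, the invention repairs batch data automatically and ensures data correctness and data quality.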
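The update in step S150 amounts to copying reference values over the attributes found to be wrong. A minimal sketch follows, assuming records and reference data are plain dictionaries and that a mapping such as B -> Bm tells which reference attribute supplies each value; the function name `repair` and the sample values are hypothetical.

```python
def repair(record, reference, attr_map):
    """Copy reference values over mismatching attributes.

    `attr_map` maps a record attribute to the reference attribute that
    holds its correct value, e.g. {"B": "Bm", "C": "Cm"}.
    Returns the repaired copy and the list of attributes that changed.
    """
    fixed = dict(record)  # leave the input record untouched
    changed = []
    for attr, ref_attr in attr_map.items():
        if fixed.get(attr) != reference[ref_attr]:
            fixed[attr] = reference[ref_attr]
            changed.append(attr)
    return fixed, changed


record = {"A": "0", "B": "9", "C": "7"}
reference = {"Am": "0", "Bm": "1", "Cm": "2"}
fixed, changed = repair(record, reference, {"B": "Bm", "C": "Cm"})
```

Returning the list of changed attributes alongside the repaired copy keeps a trace of which values were marked as erroneous before the record is written to the database.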
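FIG. 3 is a schematic diagram of a system for automatically repairing batch data according to another embodiment of the present invention. The system comprises the following units.
Data detecting unit 10, configured to detect the current batch of data to be entered and trigger automatic repair.
Data filtering unit 20, configured to filter by rules and traverse, one by one, the data to be entered that may be incorrect. The data filtering unit 20 uses rule filtering to obtain correct data.
Data interaction unit 30, configured to determine correct data by interacting with the user.
In the data interaction unit 30, the correct data determined through user interaction includes the correct attributes in a record. The correct attributes are judged from the user's experience, and the correct data determined through user interaction may be a single attribute of a given record. For example, suppose a record to be determined contains the attributes A, B, C, D, E, F, G, H, I. The user may, based on experience, confirm any one of these attributes, or may confirm that all attributes of the record are correct.
Data review unit 40, configured to review other uncertain data according to the correct data and the rules, and to mark the erroneous data. The review performed by the data review unit 40 specifically includes the following steps: acquiring a list of known rules and the set of determined data attributes; determining the dependencies among the rules according to their logical order; determining, from the rules, the rule set VSet of rules that can be applied directly; and traversing VSet: if VSet is empty, the set of confirmed attributes is output; otherwise, the determined attributes corresponding to the rules in VSet are reviewed and repaired, the other rules that follow from those rules are found through the dependencies, and the determined attributes corresponding to those rules are put into VSet. The data review unit 40 determines the dependencies, which follow the logical order of the rules, from the attribute values of the rules.
Data update unit 50, configured so that the subsystem updates the erroneous data according to the reference data and enters the updated data into the subsystem's database.
By using rules to filter the batch data, interacting with the user to determine correct data, and then reviewing and repairing other uncertain data against the rules, the invention repairs batch data automatically and ensures data correctness and data quality.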
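The five units can be pictured as stages of one pipeline. The sketch below is an assumption about how they might be wired together, not a description of the actual system: each unit is modeled as a plain callable, and the class name `RepairSystem`, its field names, and the toy lambdas are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Record = Dict[str, str]


@dataclass
class RepairSystem:
    detect: Callable[[List[Record]], bool]                                # unit 10
    filter_: Callable[[List[Record]], Tuple[List[Record], List[Record]]]  # unit 20
    interact: Callable[[Record], Set[str]]                                # unit 30
    review: Callable[[Record, Set[str]], List[str]]                       # unit 40
    update: Callable[[Record, List[str]], Record]                         # unit 50

    def run(self, batch: List[Record]) -> List[Record]:
        # Unit 10: only start the repair when a batch is detected.
        if not self.detect(batch):
            return list(batch)
        # Unit 20: separate correct records from possibly incorrect ones.
        ok, suspect = self.filter_(batch)
        repaired = []
        for rec in suspect:
            confirmed = self.interact(rec)         # unit 30
            flagged = self.review(rec, confirmed)  # unit 40
            repaired.append(self.update(rec, flagged))  # unit 50
        return ok + repaired


# Toy stand-ins for the five units, for illustration only.
system = RepairSystem(
    detect=lambda batch: bool(batch),
    filter_=lambda batch: (
        [r for r in batch if r["B"] == "1"],
        [r for r in batch if r["B"] != "1"],
    ),
    interact=lambda rec: {"A"},
    review=lambda rec, confirmed: ["B"],
    update=lambda rec, flagged: {**rec, **{a: "1" for a in flagged}},
)
out = system.run([{"A": "0", "B": "1"}, {"A": "0", "B": "9"}])
```

Passing the units in as callables keeps each stage replaceable, which mirrors the way the text describes them as separate units with one responsibility each.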
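The above is a further detailed description of the present invention in connection with specific preferred embodiments, and the specific implementation of the invention should not be regarded as limited to this description. A person of ordinary skill in the art to which the invention belongs may make a number of simple deductions or substitutions without departing from the concept of the invention.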

Claims (12)

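  1. A method for automatically repairing batch data, characterized in that the method comprises the following steps:
    a step of detecting the current batch of data to be entered and triggering automatic repair;
    a step of filtering by rules and traversing, one by one, the data to be entered that may be incorrect;
    a step of determining correct data by interacting with the user;
    a step of reviewing other uncertain data according to the correct data and the rules, and marking erroneous data;
    a step in which the subsystem updates the erroneous data according to reference data and enters the updated data into the subsystem's database.
  2. The method according to claim 1, characterized in that correct data is obtained by the rule filtering.
  3. The method according to claim 1 or 2, characterized in that the correct data obtained by rule filtering is determined by the absence of conflicts between rules.
  4. The method according to claim 1, characterized in that the data to be entered that may be incorrect is the data found by rule filtering to conflict between rules.
  5. The method according to claim 1 or 4, characterized in that the correct data determined through user interaction is designated from among the data to be entered that may be incorrect.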
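  6. The method according to claim 1, characterized in that reviewing other uncertain data according to the correct data and the rules specifically includes the following steps:
    a step of acquiring a list of known rules and the set of determined data attributes;
    a step of determining the dependencies among the rules according to their logical order;
    a step of determining, from the rules, the rule set VSet of rules that can be applied directly;
    a step of traversing the rule set VSet: if VSet is empty, outputting the set of confirmed attributes; otherwise, reviewing and repairing the determined attributes corresponding to the rules in VSet, finding through the dependencies the other rules that follow from those rules, and putting the determined attributes corresponding to those rules into VSet.
  7. The method according to claim 6, characterized in that the dependencies determined according to the logical order of the rules are determined from the attribute values of the rules.
  8. The method according to claim 1 or 6, characterized in that updating the erroneous data comprises updating the corresponding attribute of the erroneous data with the correct value of that data in the reference data.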
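  9. A system for automatically repairing batch data, characterized in that the system comprises:
    a data detecting unit, configured to detect the current batch of data to be entered and trigger automatic repair;
    a data filtering unit, configured to filter by rules and traverse, one by one, the data to be entered that may be incorrect;
    a data interaction unit, configured to determine correct data by interacting with the user;
    a data review unit, configured to review other uncertain data according to the correct data and the rules, and to mark erroneous data;
    a data update unit, configured so that the subsystem updates the erroneous data according to reference data and enters the updated data into the subsystem's database.
  10. The system according to claim 9, characterized in that the data filtering unit uses rule filtering to obtain correct data.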
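  11. The system according to claim 9, characterized in that the review of other uncertain data by the data review unit according to the correct data and the rules specifically includes the following steps:
    a step of acquiring a list of known rules and the set of determined data attributes;
    a step of determining the dependencies among the rules according to their logical order;
    a step of determining, from the rules, the rule set VSet of rules that can be applied directly;
    a step of traversing the rule set VSet: if VSet is empty, outputting the set of confirmed attributes; otherwise, reviewing and repairing the determined attributes corresponding to the rules in VSet, finding through the dependencies the other rules that follow from those rules, and putting the determined attributes corresponding to those rules into VSet.
  12. The system according to claim 9 or 11, characterized in that the dependencies determined according to the logical order of the rules are determined from the attribute values of the rules.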
PCT/CN2014/084625 2014-01-07 2014-08-18 Method and system for automatically repairing batch data WO2015103880A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410006101.X 2014-01-07
CN201410006101.XA CN104253850A (zh) 2014-01-07 2014-01-07 Distributed task scheduling method and system

Publications (1)

Publication Number Publication Date
WO2015103880A1 (zh) 2015-07-16

Family

ID=52188378

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/084625 WO2015103880A1 (zh) 2014-01-07 2014-08-18 Method and system for automatically repairing batch data

Country Status (2)

Country Link
CN (1) CN104253850A (zh)
WO (1) WO2015103880A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108205541A (zh) * 2016-12-16 2018-06-26 北大方正集团有限公司 Method and device for scheduling distributed web crawler tasks

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
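CN104598320B (zh) * 2015-01-30 2018-11-30 北京正奇联讯科技有限公司 Task execution method and system based on a distributed system
CN106681823A (zh) * 2015-11-05 2017-05-17 田文洪 Load balancing method for handling MapReduce data skew
CN105975334A (zh) * 2016-04-25 2016-09-28 深圳市永兴元科技有限公司 Distributed task scheduling method and system
CN105893157B (zh) * 2016-04-29 2019-08-30 国家计算机网络与信息安全管理中心 Resource management and task scheduling system and method for an open distributed system
CN106095572B (zh) * 2016-06-08 2019-12-06 东方网力科技股份有限公司 Distributed scheduling system and method for big data processing
CN106779376A (zh) * 2016-12-02 2017-05-31 温瑭玮 Method for rapidly triggering server data retrieval and data analysis
CN108733469B (zh) * 2017-04-24 2021-09-03 北京京东尚科信息技术有限公司 Method and device for executing tasks in a distributed system
CN107483601A (zh) * 2017-08-28 2017-12-15 郑州云海信息技术有限公司 Implementation method and execution system for distributed timed tasks
CN110569252B (zh) * 2018-05-16 2023-04-07 杭州海康威视数字技术股份有限公司 Data processing system and method
CN109101333A (zh) * 2018-06-27 2018-12-28 北京蜂盒科技有限公司 Image feature extraction method and device, storage medium, and electronic device
CN110381134B (zh) * 2019-07-18 2022-05-17 湖南快乐阳光互动娱乐传媒有限公司 Scheduling method, system, scheduler, and CDN system
CN112448977A (zh) * 2019-08-30 2021-03-05 北京京东尚科信息技术有限公司 System, method, device, and computer-readable medium for assigning tasks
CN110912967A (zh) * 2019-10-31 2020-03-24 北京浪潮数据技术有限公司 Service node scheduling method, device, equipment, and storage medium
CN111143057B (zh) * 2019-12-13 2024-04-19 中国科学院深圳先进技术研究院 Heterogeneous cluster data processing method and system based on multiple data centers, and electronic device
CN111104225A (zh) * 2019-12-23 2020-05-05 杭州安恒信息技术股份有限公司 MapReduce-based data processing method, device, equipment, and medium
CN114981778A (zh) * 2020-01-14 2022-08-30 华为技术有限公司 Method for determining chip state, method for scheduling cluster resources, and devices thereof
KR20220002547A (ko) * 2020-03-11 2022-01-06 상하이 센스타임 인텔리전트 테크놀로지 컴퍼니 리미티드 Task scheduling method and apparatus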

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040073843A1 (en) * 2002-10-15 2004-04-15 Dean Jason Arthur Diagnostics using information specific to a subsystem
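CN102411600A (zh) * 2011-08-02 2012-04-11 暨南大学 Automatic data quality detection method based on implication rules
CN103716301A (zh) * 2013-12-04 2014-04-09 深圳市华傲数据技术有限公司 Firewall-based data repair method and system
CN103713967A (zh) * 2013-12-04 2014-04-09 深圳市华傲数据技术有限公司 Data firewall repair method and system based on rule optimization
CN103714415A (zh) * 2013-12-04 2014-04-09 深圳市华傲数据技术有限公司 Method and system for automatically repairing batch data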

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
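CN102541640B (zh) * 2011-12-28 2014-10-29 厦门市美亚柏科信息股份有限公司 Cluster GPU resource scheduling system and method
CN102521044B (zh) * 2011-12-30 2013-12-25 北京拓明科技有限公司 Distributed task scheduling method and system based on message middleware
KR101893982B1 (ko) * 2012-04-09 2018-10-05 삼성전자 주식회사 Distributed processing system, scheduler node and scheduling method of the distributed processing system, and program generation apparatus therefor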

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108205541A (zh) * 2016-12-16 2018-06-26 北大方正集团有限公司 Method and device for scheduling distributed web crawler tasks
CN108205541B (zh) * 2016-12-16 2020-12-04 北大方正集团有限公司 Method and device for scheduling distributed web crawler tasks

Also Published As

Publication number Publication date
CN104253850A (zh) 2014-12-31

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 14877989; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 14877989; Country of ref document: EP; Kind code of ref document: A1)