CN115357625A - Structured data comparison method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN115357625A
Authority
CN
China
Prior art keywords
comparison
data
structured data
field
structured
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211085688.9A
Other languages
Chinese (zh)
Other versions
CN115357625B (en
Inventor
陈帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
CCB Finetech Co Ltd
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp, CCB Finetech Co Ltd filed Critical China Construction Bank Corp
Priority to CN202211085688.9A priority Critical patent/CN115357625B/en
Publication of CN115357625A publication Critical patent/CN115357625A/en
Application granted granted Critical
Publication of CN115357625B publication Critical patent/CN115357625B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24564Applying rules; Deductive queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • G06F16/2365Ensuring data consistency and integrity
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/258Data format conversion from or to a database
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computer Security & Cryptography (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the technical field of data processing and provides a structured data comparison method and device, an electronic device, and a storage medium. The method comprises: acquiring a structured data set and the application scenario requirements corresponding to the structured data set, wherein the structured data set comprises first structured data and second structured data, the first structured data being reference source data and the second structured data being changed data; acquiring a field comparison rule based on the application scenario requirements, and parsing the structured data set based on the field comparison rule to obtain comparison item numbers and sorting fields; grouping the structured data set based on the comparison item numbers to obtain at least one group of comparison data; and, for each group of comparison data, sorting the comparison data based on the sorting field and comparing the data in sequence to obtain a comparison result. Field comparison rules meeting the requirements of various application scenarios can thus be supported, batch data comparison is supported, and comparison efficiency is improved.

Description

Structured data comparison method, device, electronic device and storage medium

Technical Field

The present application relates to the technical field of data processing, and in particular to a structured data comparison method, device, electronic device and storage medium.

Background

Structured data, also known as row data, has specific fields, can be represented and stored in a relational database, and can be logically expressed as a two-dimensional table. When structured data is migrated, the data before and after the migration must be compared to ensure that the migration is correct.

In the prior art, structured data can be compared by loading the structured data files into a relational database such as Oracle or MySQL, associating the two tables through a full join (SQL Full Join), and then comparing the strings in the two tables.
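As an illustration of this prior-art approach only, the following minimal sketch loads two small tables into an in-memory SQLite database and compares them through a full join. The table names `src` and `dst`, the columns, and the sample rows are assumptions made for this example; the full join is emulated with two left joins so that the sketch does not depend on native `FULL OUTER JOIN` support.

```python
import sqlite3

# Hypothetical tables: src holds the data before migration, dst the data after.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src (id TEXT PRIMARY KEY, payload TEXT);
    CREATE TABLE dst (id TEXT PRIMARY KEY, payload TEXT);
    INSERT INTO src VALUES ('1', 'alice'), ('2', 'bob'), ('3', 'carol');
    INSERT INTO dst VALUES ('1', 'alice'), ('2', 'bobby'), ('4', 'dave');
""")

# Full join emulated as: all src rows with their match (if any),
# plus the dst rows that have no match in src.
FULL_JOIN = """
    SELECT s.id, d.id, s.payload, d.payload
      FROM src s LEFT JOIN dst d ON s.id = d.id
    UNION ALL
    SELECT s.id, d.id, s.payload, d.payload
      FROM dst d LEFT JOIN src s ON s.id = d.id
     WHERE s.id IS NULL
"""


def full_join_diff(connection):
    """Return {id: status} for every record that differs between src and dst."""
    diff = {}
    for src_id, dst_id, before, after in connection.execute(FULL_JOIN):
        if dst_id is None:
            diff[src_id] = "deleted"      # present only in src
        elif src_id is None:
            diff[dst_id] = "added"        # present only in dst
        elif before != after:
            diff[src_id] = "updated"      # present in both, strings differ
    return diff


print(full_join_diff(conn))
```

Every row passes through a single database engine here, which is the dependence on stand-alone database performance that the application sets out to avoid.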

However, this comparison approach depends on the performance of a stand-alone database. When batch data must be compared, it takes a great deal of time and the comparison efficiency is poor.

Summary of the Invention

The present application provides a structured data comparison method, device, electronic device and storage medium that support field comparison rules meeting the requirements of various application scenarios and also support batch data comparison, thereby improving comparison efficiency.

In a first aspect, the present application provides a structured data comparison method, the method comprising:

acquiring a structured data set and the application scenario requirements corresponding to the structured data set, wherein the structured data set comprises first structured data and second structured data, the first structured data being reference source data and the second structured data being changed data;

acquiring a field comparison rule based on the application scenario requirements, and parsing the structured data set based on the field comparison rule to obtain comparison item numbers and sorting fields;

grouping the structured data set based on the comparison item numbers to obtain at least one group of comparison data; and

for each group of comparison data, sorting the comparison data based on the sorting field and comparing the data in sequence to obtain a comparison result.

Optionally, parsing the structured data set based on the field comparison rule to obtain comparison item numbers and sorting fields comprises:

acquiring the comparison fields specified in the field comparison rule, the comparison fields comprising a type field, a function field and a number field; and

acquiring the corresponding data in the structured data set based on the comparison fields, and preprocessing the data based on the field comparison rule to obtain the comparison item number and sorting field corresponding to the data.

Optionally, preprocessing the data based on the field comparison rule to obtain the comparison item number and sorting field corresponding to the data comprises:

identifying the strings to be compared in the data based on the type field, and concatenating the strings to be compared, the strings to be compared comprising primary key strings and business strings;

processing the concatenated strings to be compared based on the function field to obtain comparison strings; and

assembling the comparison strings using the number field, and identifying the comparison item number and sorting field corresponding to each assembled comparison string.
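The preprocessing steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the rule layout (`RULE`), the function names in `FUNCTIONS`, the separator, and the use of a caller-supplied integer as the sorting field are all hypothetical.

```python
# Hypothetical functions that a rule's function field may name.
FUNCTIONS = {"none": lambda s: s, "trim": str.strip, "upper": str.upper}

# Hypothetical rule: the type field marks key vs. business columns, the function
# field names a normalisation, the number field is the comparison item number.
RULE = [
    {"type": "key", "columns": ["id"]},
    {"type": "business", "columns": ["name"], "function": "trim", "number": 1},
    {"type": "business", "columns": ["city", "zip"], "function": "upper", "number": 2},
]


def preprocess(row, sort_field, rule=RULE, sep="|"):
    """Turn one row into (primary_key, item_number, sort_field, comparison_string) tuples."""
    key_parts, items = [], []
    for entry in rule:
        # Type field: pick the columns to compare and concatenate them.
        spliced = sep.join(row[column] for column in entry["columns"])
        if entry["type"] == "key":
            key_parts.append(spliced)
        else:
            # Function field: process the concatenated string.
            compared = FUNCTIONS[entry["function"]](spliced)
            # Number field: attach the comparison item number.
            items.append((entry["number"], compared))
    primary_key = sep.join(key_parts)
    return [(primary_key, number, sort_field, compared) for number, compared in items]


row = {"id": "42", "name": "  Ann ", "city": "Paris", "zip": "75001"}
for entry in preprocess(row, sort_field=0):
    print(entry)
```

Here each row yields one tuple per comparison item, so records from the reference data and the changed data can later be matched on the pair of primary key string and comparison item number.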

Optionally, grouping the structured data set based on the comparison item numbers comprises:

acquiring the primary key string corresponding to each piece of data in the structured data set, and grouping the structured data set based on the primary key strings and the comparison item numbers.
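A minimal sketch of this grouping step, assuming each entry is a tuple whose first two elements are the primary key string and the comparison item number (a layout chosen for this example only):

```python
from itertools import groupby
from operator import itemgetter


def group_entries(entries):
    """Group (primary_key, item_number, sort_field, value) tuples by their first two elements."""
    group_key = itemgetter(0, 1)
    # groupby only merges adjacent items, so bring equal keys together first.
    ordered = sorted(entries, key=group_key)
    return {key: list(members) for key, members in groupby(ordered, key=group_key)}


entries = [
    ("42", 1, 0, "Ann"),      # from the reference data
    ("42", 2, 0, "PARIS"),
    ("42", 1, 1, "Ann"),      # from the changed data
    ("7", 1, 1, "Bob"),
]
for key, members in group_entries(entries).items():
    print(key, members)
```

In a distributed program the same effect would presumably be obtained with a key-based shuffle (for example a `groupByKey`-style operation in Spark); the single-process version here only shows which entries end up together.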

Optionally, sorting the comparison data based on the sorting field and comparing the data in sequence to obtain a comparison result comprises:

acquiring the number of data entries in each group of comparison data, and determining whether the number of data entries is greater than 1;

if so, sorting the comparison data based on the sorting field and comparing the data in sequence to obtain a comparison result; and

if not, obtaining the comparison result based on the source of the comparison data.

Optionally, obtaining the comparison result based on the source of the comparison data comprises:

if the comparison data comes from the first structured data, determining that the comparison data is deleted data; and

if the comparison data comes from the second structured data, determining that the comparison data is newly added data.
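The decision logic for one group can be sketched as below. The dictionary layout of a group member (`source`, `sort`, `value`) and the result labels are assumptions for this illustration.

```python
def compare_group(members):
    """Classify one group of comparison data.

    Each member is a dict with hypothetical keys:
      source: "first" (reference data) or "second" (changed data)
      sort:   the sorting field
      value:  the comparison string
    """
    if len(members) > 1:
        # More than one entry: sort by the sorting field, then compare neighbours in order.
        ordered = sorted(members, key=lambda m: m["sort"])
        same = all(a["value"] == b["value"] for a, b in zip(ordered, ordered[1:]))
        return "consistent" if same else "inconsistent"
    # A single entry has no counterpart, so its source decides the result.
    return "deleted" if members[0]["source"] == "first" else "added"


before = {"source": "first", "sort": 0, "value": "Ann"}
after = {"source": "second", "sort": 1, "value": "Anne"}
print(compare_group([after, before]))
print(compare_group([before]))
print(compare_group([after]))
```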

Optionally, acquiring a structured data set and the application scenario requirements corresponding to the structured data set comprises:

acquiring real-time data from different databases through a standardized interface, and converting the format of the real-time data to obtain structured data sets; and

for the structured data sets corresponding to the different databases, acquiring the application scenario requirements corresponding to each structured data set.
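One way to picture the format conversion behind such a standardized interface is a set of small adapters that map each source's native record shape to a common row format. The sketch below is an assumption-laden illustration: the adapter names, the common format (a dict of strings), and the sample records are all hypothetical and are not taken from the application.

```python
def from_tuples(columns):
    """Adapter for sources that return positional tuples, such as a database cursor."""
    def convert(record):
        return {col: "" if val is None else str(val) for col, val in zip(columns, record)}
    return convert


def from_mapping(rename):
    """Adapter for sources that return mappings with source-specific column names."""
    def convert(record):
        return {rename.get(k, k): "" if v is None else str(v) for k, v in record.items()}
    return convert


def load(records, convert):
    """Convert every native record into the common row format."""
    return [convert(record) for record in records]


# Two hypothetical sources with different native shapes end up in the same format.
first = load([(1, "Ann", None)], from_tuples(["id", "name", "city"]))
second = load([{"ID": 1, "NAME": "Ann", "CITY": None}],
              from_mapping({"ID": "id", "NAME": "name", "CITY": "city"}))
print(first == second)
```

Normalising values to strings at this stage matches the string-based comparison described in the method, since later steps only concatenate and compare strings.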

Optionally, the method further comprises:

after the comparison result is obtained, determining whether the comparison result is consistent;

if so, comparing the structured data set again after a preset time interval and overwriting the previous comparison result with the new one, so as to verify the accuracy of the comparison; and

if not, generating an alarm prompt to remind the user to verify the comparison result.
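The follow-up check can be sketched as below. The callable `run_comparison`, the `store` dictionary, and the `alert` callback are hypothetical names introduced for this example; raising an alarm when the second run turns out inconsistent is also an assumption, since the text does not say what happens in that case.

```python
import time


def verify(run_comparison, store, alert, interval_seconds=0.0, sleep=time.sleep):
    """Run a comparison, re-check a consistent result once, and alert on inconsistency.

    run_comparison() is assumed to return a list of inconsistent records
    (an empty list means the data is consistent).
    """
    mismatches = run_comparison()
    store["latest"] = mismatches
    if mismatches:
        alert(mismatches)            # inconsistent: ask the user to verify
        return False
    sleep(interval_seconds)          # consistent: wait for the preset interval
    mismatches = run_comparison()
    store["latest"] = mismatches     # overwrite the previous comparison result
    if mismatches:
        alert(mismatches)
    return not mismatches


# Usage with canned results: consistent on the first run, inconsistent on the second.
results = iter([[], ["record 7"]])
alerts, store = [], {}
ok = verify(lambda: next(results), store, alerts.append, sleep=lambda seconds: None)
print(ok, alerts, store)
```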

In a second aspect, the present application provides a structured data comparison device, the device comprising:

an acquisition module, configured to acquire a structured data set and the application scenario requirements corresponding to the structured data set, wherein the structured data set comprises first structured data and second structured data, the first structured data being reference source data and the second structured data being changed data;

a parsing module, configured to acquire a field comparison rule based on the application scenario requirements, and to parse the structured data set based on the field comparison rule to obtain comparison item numbers and sorting fields;

a grouping module, configured to group the structured data set based on the comparison item numbers to obtain at least one group of comparison data; and

a comparison module, configured to, for each group of comparison data, sort the comparison data based on the sorting field and compare the data in sequence to obtain a comparison result.

In a third aspect, the present application provides an electronic device, comprising a processor and a memory communicatively connected to the processor;

the memory stores computer-executable instructions; and

the processor executes the computer-executable instructions stored in the memory to implement the method according to any one of the implementations of the first aspect.

In a fourth aspect, the present application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the method according to any one of the implementations of the first aspect.

In a fifth aspect, the present application provides a computer program product comprising program code which, when the computer program is run by a computer, executes the method according to any one of the implementations of the first aspect.

In summary, the present application provides a structured data comparison method, device, electronic device and storage medium. A structured data set and its corresponding application scenario requirements are acquired, and a field comparison rule is acquired based on the application scenario requirements. The structured data set is then parsed based on the field comparison rule to obtain comparison item numbers and sorting fields, and is grouped based on the comparison item numbers to obtain at least one group of comparison data. For each group of comparison data, the comparison data is sorted based on the sorting field and compared in sequence to obtain a comparison result. The structured data set comprises first structured data and second structured data, the first structured data being reference source data and the second structured data being changed data. In this way, field comparison rules meeting the requirements of various application scenarios are supported, batch data comparison is supported, and comparison efficiency is improved while the correctness of the comparison is ensured.

Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the application.

FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application;

FIG. 2 is a schematic flowchart of a structured data comparison method provided by an embodiment of the present application;

FIG. 3 is a schematic framework diagram of a standardized interface provided by an embodiment of the present application;

FIG. 4 is a schematic flowchart of a batch data comparison method provided by an embodiment of the present application;

FIG. 5 is a schematic flowchart of a real-time data comparison method provided by an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a structured data comparison device provided by an embodiment of the present application;

FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.

The above drawings show specific embodiments of the present application, which are described in more detail below. The drawings and the written description are not intended to limit the scope of the inventive concept in any way, but to explain the concept of the application to those skilled in the art with reference to specific embodiments.

Detailed Description

Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application. Rather, they are merely examples of devices and methods consistent with some aspects of the application as detailed in the appended claims.

To describe the technical solutions of the embodiments clearly, the embodiments of the present application use words such as "first" and "second" to distinguish between identical or similar items whose functions and effects are substantially the same. For example, a first device and a second device are named so only to distinguish different devices, and no order between them is implied. Those skilled in the art will understand that words such as "first" and "second" do not limit quantity or execution order, and do not require the items they qualify to be different.

It should be noted that, in the present application, words such as "exemplary" or "for example" are used to indicate an example, illustration or explanation. Any embodiment or design described in the present application as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, such words are intended to present the related concepts in a concrete manner.

In the present application, "at least one" means one or more, and "a plurality of" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean that A exists alone, that both A and B exist, or that B exists alone, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following" or a similar expression refers to any combination of the listed items, including any combination of single items or plural items. For example, "at least one of a, b, or c" may mean: a, b, c, a-b, a-c, b-c, or a-b-c, where each of a, b and c may be single or multiple.

The present application is introduced below with reference to the accompanying drawings. FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application, and the structured data comparison method provided by the application can be applied in the scenario shown in FIG. 1. The application scenario includes a first database 1011, a second database 1012, a data processing platform 102 and a display terminal 103. When data in the first database 1011 is migrated to the second database 1012, the data before and after the migration must be compared.

Specifically, the data processing platform 102 may acquire the reference source data from the first database 1011 and the migrated data from the second database 1012, and correspondingly acquire the configuration file for the application scenario. The configuration file contains field comparison rules suited to the application scenario that have been manually configured in advance, and includes the comparison fields to be compared and the comparison rules. The data processing platform 102 then parses the configuration file to obtain the field comparison rules, and uses them to compare the reference source data acquired from the first database 1011 with the migrated data acquired from the second database 1012 to obtain a comparison result. Inconsistent data is recorded, and an alarm prompt is generated and sent to the display terminal 103, so that the user can operate the display terminal 103 to look up and verify the inconsistent data.

It should be noted that the display terminal 103 may be a display device of the data processing platform 102 or a terminal device of the user; the embodiments of the present application do not specifically limit this. In the present application, the data processing platform 102 may compare data from multiple databases, or compare changed data within a single database; it is not limited to the data migration scenario between two databases described above. The embodiments of the present application do not specifically limit the number of databases or the application scenario.

The above terminal device may be a wireless terminal or a wired terminal. A wireless terminal may be a device that provides voice and/or other service data connectivity to a user, a handheld device with a wireless connection function, or another processing device connected to a wireless modem. A wireless terminal may communicate with one or more core network devices via a Radio Access Network (RAN). The wireless terminal may be a mobile terminal, such as a mobile phone (also called a "cellular" phone) or a computer with a mobile terminal, for example a portable, pocket-sized, handheld, computer-built-in or vehicle-mounted mobile device that exchanges voice and/or data with the radio access network. As further examples, the wireless terminal may be a Personal Communication Service (PCS) phone, a cordless phone, a Session Initiation Protocol (SIP) phone, a Wireless Local Loop (WLL) station, a Personal Digital Assistant (PDA) or a similar device. A wireless terminal may also be called a system, a subscriber unit, a subscriber station, a mobile station, a mobile, a remote station, a remote terminal, an access terminal, a user terminal, a user agent, or a user device (user equipment), which is not limited here. Optionally, the terminal device may also be a smartphone, a tablet computer or a similar device.

Comparison of structured data can therefore be applied in scenarios such as data migration or online maintenance transactions. Specifically, when structured data is migrated, the data before and after the migration must be compared to ensure that the migration is correct; when data is modified by an online maintenance transaction, the difference between the data before and after the modification must be visible at the front end in real time, that is, the difference must be recorded during the online maintenance transaction.

In one possible implementation, structured data is compared by loading the structured data files into a relational database such as Oracle or MySQL, associating the two tables through a full join, and then comparing the strings in the two tables.

However, this comparison approach requires the files to be loaded into the database and depends on the performance of a stand-alone database. When batch data must be compared, it takes a great deal of time and the comparison efficiency is poor.

It should be noted that comparing batch data also requires a variety of comparison rules to suit different comparison needs, but implementing complex comparison rules places a certain burden on developers. In addition, if real-time comparison is implemented within online transactions, it must be bound to the transaction business logic, which makes the online transactions very complex.

To address the above problems, the present application provides a structured data comparison method developed as a Spark program, that is, one that uses distributed technology in place of a stand-alone database. The Spark program loads and parses the field comparison rules for different application scenarios, and then parses the structured data set based on the field comparison rules to obtain comparison item numbers and sorting fields, where the structured data set includes the reference source data and the changed data. The structured data set is then grouped by comparison item number to obtain at least one group of comparison data, and the comparison data within each group is sorted by the sorting field and compared. In other words, the comparison data belonging to the same record and the same comparison item number is brought together, sorted, and then compared. This supports field comparison rules meeting the requirements of various application scenarios, which in turn supports flexible comparison of batch data and improves comparison efficiency.

下面以具体地实施例对本申请的技术方案进行详细说明。下面这几个具体的实施例可以相互结合,对于相同或相似的概念或过程可能在某些实施例中不再赘述。下面将结合附图,对本申请的实施例进行描述。The technical solution of the present application will be described in detail below with specific embodiments. The following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below in conjunction with the accompanying drawings.

图2为本申请实施例提供的一种结构化数据比对方法的流程示意图,如图2所示,所述结构化数据比对方法包括如下步骤:Fig. 2 is a schematic flow chart of a structured data comparison method provided in the embodiment of the present application. As shown in Fig. 2, the structured data comparison method includes the following steps:

S201、获取结构化数据集合和所述结构化数据集合对应的应用场景需求,所述结构化数据集合包括第一结构化数据和第二结构化数据,所述第一结构化数据为基准源数据,所述第二结构化数据为经过变动后的数据。S201. Acquire a structured data set and application scenario requirements corresponding to the structured data set, the structured data set includes first structured data and second structured data, and the first structured data is reference source data , the second structured data is changed data.

本申请实施例中,第一结构化数据和第二结构化数据可以来自同一数据库或不同数据库,结构化数据可以为数据库表或数据文件,所述第一结构化数据为基准源数据,即数据库中存储的原始数据,所述第二结构化数据为经过变动后的数据,如新增、删除、更新以及迁移后的数据,本申请实施例对此不做具体限定。In this embodiment of the application, the first structured data and the second structured data may come from the same database or different databases, the structured data may be a database table or a data file, and the first structured data is the reference source data, that is, the database The original data stored in , and the second structured data is changed data, such as newly added, deleted, updated and migrated data, which is not specifically limited in this embodiment of the present application.

Optionally, the second structured data may be change data captured in real time from the redo log by a real-time data change collection component, where the change data includes all fields of each record after it is added, deleted, or updated. In that case, the first structured data includes all fields of each record before it is added, deleted, or updated. In this way, the present application can collect real-time data and compare it in real time.

S202. Obtain field comparison rules based on the application scenario requirements, and parse the structured data set based on the field comparison rules to obtain comparison item numbers and sorting fields.

In this embodiment of the present application, the field comparison rules are a configuration file defined in advance, and different application scenarios correspond to different configuration files. The configuration file includes comparison fields and comparison rules. The comparison fields are the field attributes of the data in the structured data set, such as the field name, type field, function field, and number field. A comparison rule may process the data with the function named in the function field and then compare it in sequence, and/or combine the data by number field and then compare it in sequence, and/or compare in sequence the data corresponding to a given type field. This embodiment of the present application does not limit the specific content of the comparison fields and comparison rules, which depends on the business scenario, and complex and diverse field comparison rules can be configured.

In this step, parsing the structured data set based on the field comparison rules means extracting the comparison fields defined in the field comparison rules and using them to obtain the comparison item numbers and sorting fields of the structured data set. A comparison item number marks the fields of the comparison data that need to be compared. For example, 200010 indicates that the 9th to 10th fields of a piece of data need to be compared. A sorting field is a field of the data that can be used for sorting, such as the operator number, organization number, or update timestamp. It can be user-defined, and this embodiment of the present application does not specifically limit this.

S203. Group the structured data set based on the comparison item numbers to obtain at least one group of comparison data.

In this step, grouping the structured data set based on the comparison item numbers means assembling the character strings to be compared for each comparison item number, and then placing the comparison data that belongs to the same record and has the same comparison item number into one group, thereby obtaining at least one group of comparison data.

S204. For each group of comparison data, sort the comparison data based on the sorting field and compare it in sequence to obtain a comparison result.

In this step, the comparison data is sorted based on the sorting field. For example, where the sorting field is the data update timestamp, the data in each group is sorted chronologically by that timestamp, and the sorted data is then compared in sequence to obtain the comparison result. The comparison result includes consistent results and inconsistent results. For inconsistent results, the present application may also store the inconsistent data to facilitate subsequent queries.

Therefore, the present application provides a structured data comparison method that acquires a structured data set and its corresponding application scenario requirements, obtains field comparison rules based on those requirements, and parses the structured data set based on the field comparison rules to obtain comparison item numbers and sorting fields. The structured data set is then grouped by comparison item number to obtain at least one group of comparison data, and each group is sorted by the sorting field and compared in sequence to obtain a comparison result, where the structured data set includes reference source data and changed data. In this way, the method supports data comparison under field comparison rules that meet the needs of various application scenarios, supports batch data comparison, and improves comparison efficiency while ensuring the correctness of the comparison.

Optionally, parsing the structured data set based on the field comparison rules to obtain the comparison item numbers and sorting fields includes:

obtaining the comparison fields defined in the field comparison rules, where the comparison fields include a type field, a function field, and a number field;

obtaining the corresponding data in the structured data set based on the comparison fields, and preprocessing the data based on the field comparison rules to obtain the comparison item number and sorting field corresponding to the data.

In this embodiment of the present application, the comparison fields may include a type field, a function field, and a number field. The type field marks how a field takes part in the comparison, with values such as Y, N, K, S, C, HB, and D: Y indicates that the field is compared, N indicates that the field is not compared, K indicates that the field is a primary key, C indicates that the field is a fixed field (which can be user-defined), S indicates that the field has a corresponding comparison item number, HB indicates that the field takes part in the comparison as part of a combined field, and D indicates that the field is a sorting field. The function field is a function expression used to process the field, such as trim, to_char, or lower: to_char converts the field's data to a character type, trim removes spaces from the field's data, and lower converts the field's data to a lowercase string. If there is no function field, the field needs no processing. The number field identifies the comparison result of each business field, so comparison results can be filtered by this number. For example, the comparison item number for the 9th to 10th bytes of a piece of data is 200010.

By way of example, the comparison fields in the field comparison rules of this embodiment of the present application may be as shown in Table 1, and field comparison rules that meet the needs of different application scenarios can be built from these comparison fields:

Table 1

[Table 1 is provided as an image in the original publication: Figure BDA0003835310390000101]

In Table 1, the first column gives the English field name of each field in the data. These names are only examples, and the fields may be named in other ways, which this embodiment of the present application does not specifically limit. The second column gives the type field, the third column gives the function field, and the fourth column gives the number field. The type field, function field, and number field can all be user-defined. The above is only an example, and this embodiment of the present application does not specifically limit it.

It can be understood that if a field of the data in the structured data set has a function field, the corresponding function can be applied to that field, and the processed data is then used for the comparison. Optionally, commonly used string processing functions are preset in the field comparison rules, and the rules also support user-defined functions that extend the set of functions available to the function field. This embodiment of the present application does not specifically limit this.
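The function-field handling described above can be illustrated with a minimal Python sketch. The function names trim, to_char, and lower come from the text; the registry, the helper names, the user-defined strip_dashes function, and the exact behavior of each function (for example, trim is assumed here to remove only leading and trailing whitespace) are assumptions made for illustration, not the implementation of the application.

```python
# Hypothetical registry of preset field functions. Only the names trim,
# to_char and lower appear in the text; their semantics here are assumed.
FIELD_FUNCTIONS = {
    "trim": lambda value: str(value).strip(),
    "to_char": lambda value: str(value),
    "lower": lambda value: str(value).lower(),
}


def register_function(name, func):
    """Add a user-defined function alongside the preset ones."""
    FIELD_FUNCTIONS[name] = func


def apply_function(value, function_name=None):
    """Apply the rule's function field; no function field means no processing."""
    if not function_name:
        return value
    return FIELD_FUNCTIONS[function_name](value)


# A hypothetical user-defined extension.
register_function("strip_dashes", lambda value: str(value).replace("-", ""))

processed = [
    apply_function("  Alice ", "trim"),
    apply_function(42, "to_char"),
    apply_function("ABC", "lower"),
    apply_function("138-0000", "strip_dashes"),
    apply_function("raw"),
]
```

Keeping the functions in a lookup table is one way to let a configuration file refer to them by name while still allowing new functions to be added without changing the comparison logic.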

Therefore, this embodiment of the present application supports complex and diverse field comparison rules, and the data can be preprocessed using the comparison fields in the field comparison rules, which improves processing flexibility.

Optionally, preprocessing the data based on the field comparison rules to obtain the comparison item number and sorting field corresponding to the data includes:

identifying the character strings to be compared in the data based on the type field, and concatenating the character strings to be compared, where the character strings to be compared include a primary key string and business strings;

processing the concatenated character strings based on the function field to obtain comparison strings;

assembling the comparison strings using the number field, and identifying the comparison item number and sorting field corresponding to each assembled comparison string.

In this embodiment of the present application, the primary key string is the string data corresponding to the primary key, such as cst_id. A business string is the string data corresponding to the multiple business fields that make up a piece of data. For example, a contact address includes business fields such as country or region, province, city, county, and detailed address, for example Country A, Province B, City C, County D, Village E.

Optionally, if a piece of data is composed of multiple business fields, the fields are separated by |@|, for example the contact address data Country A|@|Province B|@|City C|@|County D|@|Village E. The present application therefore supports merging multiple business fields into a single field for comparison. If any one of the merged fields changes, the merged field as a whole is output for the data comparison. This embodiment of the present application does not specifically limit the separator used to combine the business fields, and the above is only an example.
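As a small illustration of the merged-field comparison described above, the following Python sketch joins several business fields with the |@| separator and compares the merged strings. The separator and the address example come from the text; the record layout, the field names, and the helper join_fields are hypothetical.

```python
SEPARATOR = "|@|"  # separator taken from the text; other separators are allowed

# Hypothetical field names for the contact address example.
ADDRESS_FIELDS = ["country", "province", "city", "county", "detail"]


def join_fields(record, field_names):
    """Merge several business fields into one comparison string."""
    return SEPARATOR.join(str(record[name]) for name in field_names)


before = {
    "country": "Country A",
    "province": "Province B",
    "city": "City C",
    "county": "County D",
    "detail": "Village E",
}
after = dict(before, city="City F")  # only one sub-field changes

joined_before = join_fields(before, ADDRESS_FIELDS)
joined_after = join_fields(after, ADDRESS_FIELDS)

# A change in any sub-field makes the merged field differ as a whole.
changed = joined_before != joined_after
```

Because the comparison is made on the merged string, a change to any one sub-field reports the whole address as changed, which matches the behavior described in the paragraph above.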

It can be understood that a piece of data can be split by byte into multiple business fields for comparison, so one business field in a piece of data may carry one business meaning while adjacent business fields carry other business meanings.

In this step, identifying the character strings to be compared based on the type field makes it possible to discard comparison information that is not needed, for example by identifying the primary key fields in the data and concatenating them into the primary key string. The comparison strings are then assembled using the number field, that is, the strings to be compared are assembled for each comparison item number, and the comparison item number and sorting field corresponding to each assembled comparison string are identified. The fixed output fields corresponding to a comparison string can also be identified and used in the data comparison to improve its accuracy.

Therefore, this embodiment of the present application supports complex comparison logic: data can be split into multiple business fields for comparison, functions can be applied to the data, and unneeded comparison information can be discarded, so the method applies to a wide range of scenarios.

Optionally, grouping the structured data set based on the comparison item numbers includes:

obtaining the primary key string corresponding to each piece of data in the structured data set, and grouping the structured data set based on the primary key string and the comparison item number.

It can be understood that the present application may also group the structured data set based on the primary key string alone.

Therefore, this embodiment of the present application can group the structured data set by the combination of primary key string and comparison item number, or by the primary key string or the comparison item number alone, which improves grouping accuracy.
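The grouping by primary key string and comparison item number can be sketched in plain Python as follows. This is an illustrative single-machine sketch, not the distributed implementation of the application; the row layout and the key names source, pk, item_no, value, and ts are hypothetical.

```python
from collections import defaultdict


def group_rows(rows):
    """Group rows that share the same primary key string and item number."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["pk"], row["item_no"])].append(row)
    return dict(groups)


# Hypothetical rows: "first" marks reference source data and "second"
# marks changed data.
rows = [
    {"source": "first", "pk": "C001", "item_no": "200010", "value": "a", "ts": 1},
    {"source": "second", "pk": "C001", "item_no": "200010", "value": "b", "ts": 2},
    {"source": "first", "pk": "C002", "item_no": "200010", "value": "x", "ts": 1},
]

groups = group_rows(rows)
```

Here the rows for C001 end up in one group of two, while C002 forms a group of one because it appears in only one of the two sources.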

Optionally, sorting the comparison data based on the sorting field and comparing it in sequence to obtain the comparison result includes:

obtaining the number of data items in each group of comparison data, and determining whether the number of data items is greater than 1;

if so, sorting the comparison data based on the sorting field and comparing it in sequence to obtain the comparison result;

if not, obtaining the comparison result based on the source of the comparison data.

By way of example, in the application scenario of Fig. 1, the data processing platform 102 groups the reference source data obtained from the first database 1011 and the migrated data obtained from the second database 1012 by comparison item number to obtain at least one group of comparison data. The data processing platform 102 can then obtain the number N of data items in each group and determine whether N is greater than 1. If so, the comparison data is sorted by the sorting field and compared in sequence to obtain the comparison result. If not, the comparison result is obtained according to whether the comparison data comes from the first database 1011 or the second database 1012. N is a positive integer greater than or equal to 1.

It can be understood that this embodiment of the present application may also set a preset requirement and decide, according to whether the number of data items meets it, whether the comparison data needs to be sorted and compared to obtain the comparison result. This embodiment of the present application does not specifically limit the preset requirement.

Therefore, this embodiment of the present application checks whether the number of data items in each group of comparison data meets the preset requirement. If it does, the comparison data is sorted by the sorting field and compared to obtain the comparison result. If it does not, the comparison result can be determined directly from the source of the comparison data, which improves comparison efficiency.

Optionally, obtaining the comparison result based on the source of the comparison data includes:

if the comparison data comes from the first structured data, determining that the comparison data is deleted data;

if the comparison data comes from the second structured data, determining that the comparison data is newly added data.

In this step, if the comparison data comes from the first structured data and does not exist in the second structured data, the comparison data is deleted data. If the comparison data comes from the second structured data and does not exist in the first structured data, the comparison data is newly added data.
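The decision logic for one group (sort and compare when the group holds more than one item, otherwise classify by source) can be sketched in Python as follows. The row layout, the key names, and the result labels are hypothetical and chosen only for illustration.

```python
def compare_group(group, sort_key="ts"):
    """Return a comparison result for one group of comparison data."""
    if len(group) == 1:
        # A single item exists on one side only: it was deleted if it comes
        # from the first (reference) data, added if from the second.
        return "deleted" if group[0]["source"] == "first" else "added"
    # More than one item: sort by the sorting field, then compare in sequence.
    ordered = sorted(group, key=lambda row: row[sort_key])
    mismatch = any(a["value"] != b["value"] for a, b in zip(ordered, ordered[1:]))
    return "inconsistent" if mismatch else "consistent"


only_in_first = [{"source": "first", "value": "x", "ts": 1}]
only_in_second = [{"source": "second", "value": "y", "ts": 1}]
changed_pair = [
    {"source": "second", "value": "b", "ts": 2},
    {"source": "first", "value": "a", "ts": 1},
]
same_pair = [
    {"source": "first", "value": "a", "ts": 1},
    {"source": "second", "value": "a", "ts": 2},
]

results = [
    compare_group(g)
    for g in (only_in_first, only_in_second, changed_pair, same_pair)
]
```

Single-item groups skip sorting and comparison entirely, which is where the efficiency gain described in the text comes from.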

Therefore, this embodiment of the present application can obtain the comparison result directly from the source of the comparison data, which improves the timeliness of the comparison.

Optionally, acquiring the structured data set and the application scenario requirements corresponding to the structured data set includes:

obtaining real-time data from different databases through a standardized interface, and converting the format of the real-time data to obtain structured data sets;

for the structured data sets corresponding to the different databases, obtaining the application scenario requirements corresponding to each structured data set.

In this embodiment of the present application, the standardized interface is a unified interface designed for accessing data in different databases. It is a code framework defined in advance, so accessing data in a different database only requires changing the corresponding parameter values in the framework.

In this step, the Kafka message middleware can be used to read the real-time change data of a database. However, the change data parsed from different database products differs considerably, so it needs to be standardized: the collected real-time data is converted into a structured data set that can be used for data comparison. After standardization the data has a uniform format, which allows the subsequent processing flow to be unified.

The data used in batch comparison is in text format, and the standardized output of the real-time comparison process is also in text format, and the two text file formats are the same.

By way of example, Fig. 3 is a schematic diagram of the framework of a standardized interface provided by an embodiment of the present application. As shown in Fig. 3, this embodiment provides a unified standardized interface for retrieving the change data of each kind of database. The change data of each database corresponds to an implementation class, and each implementation class is invoked through the standardized interface. For example, the implementation classes for the change data of databases such as MySQL, Oracle, DB2, and OceanBase include databaseType (data type), processinsertRec (insert), processDeleteRec (delete), processUpdateNotKeyRec (update of ordinary non-primary-key fields), and processUpdateKeyRec (update of primary key fields). For any type of database, the user therefore only needs to implement the standardized interface.
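One way to picture such a standardized interface is an abstract base class with one implementation class per database product. The Python sketch below mirrors the method names listed above in snake_case; the class names, the raw record layout, the standardized output format, and the DemoHandler implementation are all hypothetical and do not describe the log format of any real database.

```python
from abc import ABC, abstractmethod


class ChangeRecordHandler(ABC):
    """Hypothetical standardized interface, one subclass per database."""

    @abstractmethod
    def database_type(self): ...

    @abstractmethod
    def process_insert_rec(self, raw): ...

    @abstractmethod
    def process_delete_rec(self, raw): ...

    @abstractmethod
    def process_update_not_key_rec(self, raw): ...

    @abstractmethod
    def process_update_key_rec(self, raw): ...


class DemoHandler(ChangeRecordHandler):
    """Illustrative implementation for a made-up raw record layout."""

    def database_type(self):
        return "demo"

    def process_insert_rec(self, raw):
        return [{"op": "I", "before": None, "after": raw["new"]}]

    def process_delete_rec(self, raw):
        return [{"op": "D", "before": raw["old"], "after": None}]

    def process_update_not_key_rec(self, raw):
        return [{"op": "U", "before": raw["old"], "after": raw["new"]}]

    def process_update_key_rec(self, raw):
        # Assumed adaptation: a primary key update is emitted as a delete of
        # the old key followed by an insert of the new key.
        return self.process_delete_rec(raw) + self.process_insert_rec(raw)


handler = DemoHandler()
key_update = handler.process_update_key_rec({"old": {"id": 1}, "new": {"id": 2}})
```

With this shape, the comparison logic only ever sees the standardized records, and supporting another database product means adding one more subclass.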

It should be noted that different implementation classes yield different results. Taking the Oracle database as an example, the result of each DML operation includes transaction start and end markers. An insert or delete yields a single result statement. An update of an ordinary non-primary-key field yields one record from before the update or one record from after the update. An update of a primary key field yields three statements, V, D, and I. For the OceanBase database, an update of a primary key field yields one delete operation and one insert operation, while other DML operations yield a single record that contains the information from both before and after the change. Different implementation classes in different databases yield different results, which are not listed one by one here.

It can be understood that real-time comparison usually requires online maintenance transactions or a dedicated back-end comparison program to perform a customized comparison for each data change, which leads to high development and maintenance costs and a large amount of repeated work. In contrast, this embodiment of the present application performs real-time data comparison through the standardized interface and supports parsing data from different database products. The real-time comparison logic and the batch comparison logic are the same, share the core code, and use the same processing criteria, which reduces later operation and maintenance costs.

Optionally, the method further includes:

after the comparison result is obtained, determining whether the comparison result is consistent;

if so, comparing the structured data set again after a preset time interval, and overwriting the previous comparison result with the new one, so as to verify the accuracy of the comparison;

if not, generating an alarm prompt to remind the user to verify the comparison result.

In this embodiment of the present application, the preset time is the configured interval after which the comparison is run again. This embodiment does not limit its specific value, which may be, for example, one day. This embodiment also does not limit the content or delivery form of the alarm prompt. The alarm prompt may be generated from the inconsistent data, for example "Data 1 in table A is inconsistent with data 2 in table B", and may be delivered as a text message, a display box, or the like.

It should be noted that although real-time data comparison makes up for the poor timeliness of batch processing, real-time data may be lost. The present application therefore supports running the data comparison again after a preset time interval and overwriting the previous comparison result with the new one. Because the database write is idempotent, applications that read the data are unaffected by repeated comparisons, so data integrity is ensured along with data timeliness.
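The idempotent overwrite can be illustrated with a small Python sketch in which an in-memory dictionary stands in for the result table. The key of primary key string plus comparison item number, the record layout, and the helper store_results are assumptions made for illustration.

```python
def store_results(store, results):
    """Write results keyed by (primary key, item number), overwriting old ones."""
    for result in results:
        store[(result["pk"], result["item_no"])] = result
    return store


store = {}  # stands in for the result table
first_run = [{"pk": "C001", "item_no": "200010", "status": "inconsistent"}]
second_run = [{"pk": "C001", "item_no": "200010", "status": "consistent"}]

store_results(store, first_run)
store_results(store, second_run)
store_results(store, second_run)  # repeating a run leaves the store unchanged
```

Because each write replaces the entry for the same key, rerunning the comparison any number of times leaves exactly one result per key, which is why readers of the results do not notice the repetition.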

Therefore, by running the comparison again, this embodiment of the present application reduces the impact of possible real-time data loss and improves data integrity as well as data timeliness. It can also generate an alarm prompt that notifies the user of abnormal comparison results, so that the data can later be retrieved for verification, which improves convenience.

In combination with the above embodiments, the structured data comparison method provided by the present application supports both real-time and batch data comparison. Real-time comparison is implemented with Flink DataStream technology, and batch comparison is implemented with Flink DataSet technology. The real-time comparison logic and the batch comparison logic are the same, use the same comparison criteria, and share the core code. Specifically, Fig. 4 is a schematic flowchart of a batch data comparison method provided by an embodiment of the present application. As shown in Fig. 4, the batch data comparison method includes the following steps:

Step 1: The Spark program starts, loads and parses the configuration file (the application scenario requirements), reads the data of file A and file B from data files and/or the data of table a and table b from database tables, and parses the data according to the configuration file. Then go to step 2.

It should be noted that the data in file A and table a is the reference source data, and the data in file B and table b is the changed data. This embodiment of the present application may compare only the data in file A and file B, only the data in table a and table b, or both pairs separately. This embodiment of the present application does not specifically limit this.

Step 2: Merge the data of file A and file B, and/or merge the data of table a and table b. Group the input data (the structured data set) by primary key and comparison item number to obtain at least one group of comparison data. The comparison data within a group can be sorted by the data update timestamp. The number of data items in a group is denoted N, a positive integer greater than or equal to 1, and the comparison result is determined according to N. Then go to step 3.

Step 3: If the number of data items is 1 and the data comes from file A and/or table a, with no such data in file B and/or table b, the corresponding record has been deleted. If the data comes from file B and/or table b, with no such data in file A and/or table a, the corresponding record is newly added. If the number of data items is greater than 1, the sorted data is compared in sequence to obtain the comparison result, and the data for any inconsistent items can be recorded in the comparison result for the user to view.

Fig. 5 is a schematic flowchart of a real-time data comparison method provided by an embodiment of the present application. As shown in Fig. 5, the real-time data comparison method includes the following steps:

Step 1: The Spark Streaming program starts, loads and parses the configuration file (the application scenario requirements), and uses the Kafka message middleware to call the unified standardized interface and collect the change data and original data of the Oracle database in real time. To adapt to various database products, the change records are standardized into JSON strings with a uniform format. Correspondingly, the values of the data update type need to be unified and records that update the primary key need to be adapted, so that the change data and the original data are converted into a uniform format. Data in the Oracle database that has been modified is referred to as change data. The subsequent steps are then performed.
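The standardization of change records into uniformly formatted JSON strings can be sketched in Python as follows. The raw record layout, the unified operation codes I, D, and U, and the helper standardize are hypothetical; actual database products emit different raw formats that would each need their own adaptation.

```python
import json

# Hypothetical mapping from source-specific operation names to unified codes.
UNIFIED_OPS = {"insert": "I", "delete": "D", "update": "U"}


def standardize(raw):
    """Convert one raw change record into a uniformly formatted JSON string."""
    record = {
        "op": UNIFIED_OPS[raw["op"].lower()],
        "table": raw["table"],
        "before": raw.get("before"),
        "after": raw.get("after"),
    }
    # sort_keys gives every record the same key order regardless of source.
    return json.dumps(record, sort_keys=True, ensure_ascii=False)


line = standardize(
    {
        "op": "UPDATE",
        "table": "customer",
        "before": {"cst_id": "C001", "city": "City C"},
        "after": {"cst_id": "C001", "city": "City F"},
    }
)
parsed = json.loads(line)
```

Once every source produces records of this shape, the grouping, sorting, and comparison steps can be shared between the real-time and batch paths.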

It should be noted that the subsequent steps are similar to step 2 and step 3 of the embodiment shown in Fig. 4. For details, refer to the description of that embodiment, which is not repeated here.

The present application supports both real-time and batch comparison of structured data, and in both modes complex and diverse field comparison rules are expressed as configuration files. The barrier to learning and use is low, and users do not need to repeat development work. The present application also presets commonly used string processing functions, supports user-defined functions that extend the processing capability, and combines stream and batch processing for data comparison, which reduces later operation and maintenance costs and improves timeliness.

The preceding embodiments introduced the structured data comparison method provided by the embodiments of the present application. To implement the functions of that method, the electronic device serving as the execution subject may include a hardware structure and/or a software module, and may implement those functions in the form of a hardware structure, a software module, or a hardware structure plus a software module. Whether a given function is performed by a hardware structure, a software module, or a hardware structure plus a software module depends on the specific application and design constraints of the technical solution.

For example, Fig. 6 is a schematic structural diagram of a structured data comparison apparatus provided by an embodiment of the present application. As shown in Fig. 6, the apparatus includes an acquisition module 610, a parsing module 620, a grouping module 630, and a comparison module 640. The acquisition module 610 is configured to acquire a structured data set and the application scenario requirements corresponding to the structured data set, where the structured data set includes first structured data and second structured data, the first structured data is the reference source data, and the second structured data is the data after modification;

The parsing module 620 is configured to obtain field comparison rules based on the application scenario requirements, and to parse the structured data set based on the field comparison rules to obtain comparison item numbers and sorting fields;

The grouping module 630 is configured to group the structured data set based on the comparison item numbers to obtain at least one group of comparison data;

The comparison module 640 is configured to, for each group of comparison data, sort the comparison data based on the sorting field and compare the sorted data in sequence to obtain a comparison result.
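The grouping and comparison modules described above can be sketched as a small pipeline. The record layout (`item_no`, `sort_key`, `source`, `value`) and the result labels are assumptions made for illustration only.

```python
from collections import defaultdict


def compare(records):
    """Group records by comparison item number, sort each group, compare in sequence.

    Each record is a dict with the hypothetical keys 'item_no', 'sort_key',
    'source' and 'value'.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec["item_no"]].append(rec)

    results = {}
    for item_no, group in groups.items():
        if len(group) < 2:
            # A single record has no counterpart to compare against.
            results[item_no] = "unpaired"
            continue
        ordered = sorted(group, key=lambda r: r["sort_key"])
        # Compare each record with the next one in sorted order.
        same = all(a["value"] == b["value"] for a, b in zip(ordered, ordered[1:]))
        results[item_no] = "consistent" if same else "inconsistent"
    return results
```

In an optional embodiment the grouping key combines the primary key string with the comparison item number; in this sketch that would amount to grouping on a tuple such as `(rec["pk"], rec["item_no"])`.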

Optionally, the parsing module 620 includes an acquisition unit and a preprocessing unit;

Specifically, the acquisition unit is configured to obtain the comparison fields specified in the field comparison rules, where the comparison fields include a type field, a function field, and a number field;

The preprocessing unit is configured to obtain the corresponding data in the structured data set based on the comparison fields, and to preprocess the data based on the field comparison rules to obtain the comparison item number and sorting field corresponding to the data.

Optionally, the preprocessing unit is specifically configured to:

Identify the strings to be compared in the data based on the type field, and concatenate the strings to be compared, where the strings to be compared include a primary key string and a business string;

Process the concatenated strings based on the function field to obtain comparison strings;

Assemble the comparison strings using the number field, and identify the comparison item number and sorting field corresponding to the assembled comparison strings.
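One possible reading of these preprocessing steps is sketched below. The rule layout (`number`, `columns`, `function`), the `|` separator, and passing the sort key in as an argument are assumptions; the application does not state how the sorting field is derived.

```python
def preprocess_row(row, pk_columns, rules, sort_key, sep="|"):
    """Return one comparison record per rule for a single row (a dict).

    pk_columns: columns that the type field marks as primary key (assumed).
    rules: list of dicts with a 'number', the business 'columns' to
           concatenate, and an optional 'function' applied to the result.
    """
    # Concatenate the primary key columns into the primary key string.
    pk = sep.join(str(row[c]) for c in pk_columns)
    records = []
    for rule in rules:
        # Concatenate the business columns into the business string.
        spliced = sep.join(str(row[c]) for c in rule["columns"])
        # Apply the configured function, if any, to obtain the comparison string.
        func = rule.get("function")
        value = func(spliced) if func else spliced
        # Assemble the record and tag it with its comparison item number.
        records.append({"pk": pk, "item_no": rule["number"],
                        "sort_key": sort_key, "value": value})
    return records
```

For example, calling `preprocess_row` on a row from the first structured data with `sort_key=0` and on the matching row from the second with `sort_key=1` would yield records that sort with the reference data first.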

Optionally, the grouping module 630 is specifically configured to:

Obtain the primary key string corresponding to each piece of data in the structured data set, and group the structured data set based on the primary key string and the comparison item number.

Optionally, the comparison module 640 includes a judging unit, a comparison unit, and a determining unit;

Specifically, the judging unit is configured to obtain the number of data records in each group of comparison data and to judge whether that number is greater than 1;

The comparison unit is configured to, when the number of data records is greater than 1, sort the comparison data based on the sorting field and compare the sorted data in sequence to obtain a comparison result;

The determining unit is configured to, when the number of data records is equal to 1, obtain a comparison result based on the source of the comparison data.

Optionally, the determining unit is specifically configured to:

If the comparison data comes from the first structured data, determine that the comparison data is deleted data;

If the comparison data comes from the second structured data, determine that the comparison data is newly added data.
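The handling of a group that contains a single record can be sketched as follows. The source labels `"first"` and `"second"` and the result strings are hypothetical names for the first (reference) and second (changed) structured data.

```python
def classify_single(record):
    """Classify a group that holds exactly one record by where the record came from."""
    if record["source"] == "first":
        # Present only in the reference data: it was deleted on the changed side.
        return "deleted"
    if record["source"] == "second":
        # Present only in the changed data: it was newly added.
        return "added"
    raise ValueError(f"unknown source: {record['source']!r}")


def classify_group(group, compare_sorted):
    """Dispatch on the group size, as the judging unit does.

    compare_sorted is a caller-supplied function (an assumption of this sketch)
    that handles groups with more than one record.
    """
    if len(group) > 1:
        return compare_sorted(group)
    return classify_single(group[0])
```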

Optionally, the acquisition module 610 is specifically configured to:

Obtain real-time data from different databases through a standardized interface, and convert the format of the real-time data to obtain structured data sets;

For the structured data sets corresponding to the different databases, obtain the application scenario requirements corresponding to each structured data set.

Optionally, the apparatus further includes a judging module, which is configured to:

After the comparison result is obtained, judge whether the comparison result is consistent;

If so, compare the structured data set again after a preset interval, and overwrite the previous comparison result with the new one, so as to verify the accuracy of the comparison;

If not, generate an alarm prompt to remind the user to verify the comparison result.
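The judging module's behavior can be sketched as below. The callables `run_comparison` and `alert`, the `store` dict, and the `"consistent"` label are assumptions introduced for illustration; the `sleep` parameter exists only so the wait can be replaced in tests.

```python
import time


def check_and_recheck(run_comparison, store, alert, interval_seconds, sleep=time.sleep):
    """Run a comparison, then either re-run it after an interval or raise an alert.

    run_comparison: callable returning a dict of item number -> result label.
    store: dict in which the latest comparison result is kept.
    alert: callable invoked with the result when it is not consistent.
    """
    result = run_comparison()
    store["latest"] = result
    if all(label == "consistent" for label in result.values()):
        # Consistent: wait for the preset interval, compare again,
        # and overwrite the previous result with the new one.
        sleep(interval_seconds)
        store["latest"] = run_comparison()
    else:
        # Inconsistent: prompt the user to verify the result.
        alert(result)
    return store["latest"]
```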

For the specific implementation principles and effects of the structured data comparison apparatus provided by the embodiments of the present application, refer to the corresponding descriptions in the above embodiments; they are not repeated here.

An embodiment of the present application further provides an electronic device. Fig. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present application. As shown in Fig. 7, the electronic device may include a processor 701 and a memory 702 communicatively connected to the processor. The memory 702 stores a computer program, and the processor 701 executes the computer program stored in the memory 702 so that the processor 701 performs the method described in any of the above embodiments.

The memory 702 and the processor 701 may be connected through a bus 703.

An embodiment of the present application further provides a computer-readable storage medium that stores computer-executable instructions. When executed by a processor, the computer-executable instructions implement the method described in any of the foregoing embodiments of the present application.

An embodiment of the present application further provides a chip for running instructions. The chip is configured to perform the method performed by the electronic device in any of the foregoing embodiments of the present application.

An embodiment of the present application further provides a computer program product that includes program code. When a computer runs the computer program, the program code performs the method performed by the electronic device in any of the foregoing embodiments of the present application.

In the technical solution of the present application, the collection, storage, use, processing, transmission, provision, and disclosure of the financial data, user data, and other information involved all comply with relevant laws and regulations and do not violate public order and good customs.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into modules is only a division by logical function, and other divisions are possible in an actual implementation: multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or modules, and may be electrical, mechanical, or of another form.

Modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to implement the solution of this embodiment.

In addition, the functional modules in the embodiments of the present application may be integrated into one processing unit, each module may exist physically on its own, or two or more modules may be integrated into one unit. A unit formed from the above modules may be implemented in the form of hardware, or in the form of hardware plus software functional units.

An integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present application.

It should be understood that the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in the present application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.

The memory may include high-speed random access memory (RAM) and may also include non-volatile memory (NVM), for example at least one magnetic disk memory; it may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, an optical disc, or the like.

The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. Buses may be classified as address buses, data buses, control buses, and so on. For ease of representation, the bus in the drawings of the present application is not limited to a single bus or a single type of bus.

The storage medium may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc. The storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.

An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. The storage medium may also be an integral part of the processor. The processor and the storage medium may reside in an application-specific integrated circuit (ASIC). The processor and the storage medium may also exist as discrete components in an electronic device or a master control device.

The above are only specific implementations of the embodiments of the present application, but the protection scope of the embodiments of the present application is not limited thereto. Any change or replacement within the technical scope disclosed in the embodiments of the present application shall fall within the protection scope of the embodiments of the present application. Therefore, the protection scope of the embodiments of the present application shall be subject to the protection scope of the claims.

Claims (12)

1. A structured data comparison method, comprising:
acquiring a structured data set and application scenario requirements corresponding to the structured data set, wherein the structured data set comprises first structured data and second structured data, the first structured data is reference source data, and the second structured data is changed data;
acquiring a field comparison rule based on the application scenario requirements, and parsing the structured data set based on the field comparison rule to obtain a comparison item number and a sorting field;
grouping the structured data set based on the comparison item number to obtain at least one group of comparison data; and
for each group of comparison data, sorting the comparison data based on the sorting field and comparing the sorted data in sequence to obtain a comparison result.
2. The method of claim 1, wherein parsing the structured data set based on the field comparison rule to obtain a comparison item number and a sorting field comprises:
acquiring the corresponding comparison fields in the field comparison rule, wherein the comparison fields comprise a type field, a function field, and a number field; and
acquiring corresponding data in the structured data set based on the comparison fields, and preprocessing the data based on the field comparison rule to obtain the comparison item number and the sorting field corresponding to the data.
3. The method of claim 2, wherein preprocessing the data based on the field comparison rule to obtain the comparison item number and the sorting field corresponding to the data comprises:
identifying character strings to be compared in the data based on the type field, and concatenating the character strings to be compared, wherein the character strings to be compared comprise a primary key character string and a business character string;
processing the concatenated character strings based on the function field to obtain comparison character strings; and
assembling the comparison character strings by using the number field, and identifying the comparison item number and the sorting field corresponding to the assembled comparison character strings.
4. The method of claim 1, wherein grouping the structured data set based on the comparison item number comprises:
acquiring a primary key character string corresponding to each piece of data in the structured data set, and grouping the structured data set based on the primary key character string and the comparison item number.
5. The method of claim 1, wherein sorting the comparison data based on the sorting field and comparing the sorted data in sequence to obtain a comparison result comprises:
acquiring the number of data records in each group of comparison data, and judging whether the number of data records is greater than 1;
if so, sorting the comparison data based on the sorting field and comparing the sorted data in sequence to obtain a comparison result; and
if not, obtaining a comparison result based on the source of the comparison data.
6. The method of claim 5, wherein obtaining a comparison result based on the source of the comparison data comprises:
if the comparison data is from the first structured data, determining that the comparison data is deleted data; and
if the comparison data is from the second structured data, determining that the comparison data is newly added data.
7. The method of claim 1, wherein acquiring a structured data set and application scenario requirements corresponding to the structured data set comprises:
acquiring real-time data from different databases by using a standardized interface, and performing format conversion on the real-time data to obtain structured data sets; and
for the structured data sets corresponding to the different databases, acquiring the application scenario requirements corresponding to each structured data set.
8. The method according to any one of claims 1-7, further comprising:
after the comparison result is obtained, judging whether the comparison result is consistent;
if so, comparing the structured data set again after a preset time interval, and overwriting the previous comparison result with the newly obtained comparison result so as to verify the accuracy of the comparison; and
if not, generating an alarm prompt to remind the user to verify the comparison result.
9. A structured data comparison apparatus, comprising:
an acquisition module, configured to acquire a structured data set and application scenario requirements corresponding to the structured data set, wherein the structured data set comprises first structured data and second structured data, the first structured data is reference source data, and the second structured data is changed data;
a parsing module, configured to acquire a field comparison rule based on the application scenario requirements, and to parse the structured data set based on the field comparison rule to obtain a comparison item number and a sorting field;
a grouping module, configured to group the structured data set based on the comparison item number to obtain at least one group of comparison data; and
a comparison module, configured to, for each group of comparison data, sort the comparison data based on the sorting field and compare the sorted data in sequence to obtain a comparison result.
10. An electronic device, comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored by the memory to implement the method of any of claims 1-8.
11. A computer-readable storage medium having computer-executable instructions stored thereon, which when executed by a processor, perform the method of any one of claims 1-8.
12. A computer program product comprising a program code for performing the method of any one of claims 1-8 when the computer program is run by a computer.
CN202211085688.9A 2022-09-06 2022-09-06 Structured data comparison method, device, electronic device and storage medium Active CN115357625B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211085688.9A CN115357625B (en) 2022-09-06 2022-09-06 Structured data comparison method, device, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211085688.9A CN115357625B (en) 2022-09-06 2022-09-06 Structured data comparison method, device, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN115357625A true CN115357625A (en) 2022-11-18
CN115357625B CN115357625B (en) 2024-12-20

Family

ID=84005920

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211085688.9A Active CN115357625B (en) 2022-09-06 2022-09-06 Structured data comparison method, device, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN115357625B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116628107A (en) * 2023-05-30 2023-08-22 曙光云计算集团有限公司 Data comparison method, device, equipment and medium
CN118689912A (en) * 2024-08-27 2024-09-24 杭州乒乓智能技术有限公司 Data comparison method, device, business data comparison system and electronic device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170185712A1 (en) * 2014-02-06 2017-06-29 Genalice B.V. A method of storing/reconstructing a multitude of sequences in/from a data storage structure
CN111782657A (en) * 2020-07-08 2020-10-16 上海乾臻信息科技有限公司 Data processing method and device
CN112199935A (en) * 2020-09-24 2021-01-08 建信金融科技有限责任公司 Data comparison method and device, electronic equipment and computer readable storage medium
CN113642311A (en) * 2021-08-12 2021-11-12 北京奇艺世纪科技有限公司 Data comparison method and device, electronic equipment and storage medium
CN114461611A (en) * 2022-01-26 2022-05-10 连连(杭州)信息技术有限公司 A data comparison method, device, electronic device and storage medium

Also Published As

Publication number Publication date
CN115357625B (en) 2024-12-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant