CN116701220A - Data synchronization test method and device, electronic equipment and computer readable medium - Google Patents


Info

Publication number
CN116701220A
Authority
CN
China
Prior art keywords
data
determining
synchronization
scene
test
Prior art date
Legal status
Pending
Application number
CN202310725879.5A
Other languages
Chinese (zh)
Inventor
余尧尧
Current Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Priority date
Filing date
Publication date
Application filed by China Construction Bank Corp and CCB Finetech Co Ltd
Priority to CN202310725879.5A
Publication of CN116701220A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3688 Test management for test execution, e.g. scheduling of test suites
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data synchronization test method, a device, electronic equipment and a computer readable medium, and relates to the technical field of computers. A specific implementation comprises: receiving a data synchronization test request, acquiring corresponding test data, and determining a synchronization table type and a data dimension corresponding to the test data; determining a corresponding synchronization scene identifier based on the synchronization table type and the data dimension; determining the production service data volume corresponding to the synchronization scene identifier, and then determining the scene data volume according to the production service data volume; and executing a data synchronization test according to the scene data volume to determine the data synchronization time. The method and device can improve the efficiency and accuracy of evaluating the synchronization performance of data synchronization tools, and provide faster data support for selecting an efficient data synchronization tool.

Description

Data synchronization test method and device, electronic equipment and computer readable medium
Technical Field
The present application relates to the field of big data testing technologies, and in particular, to a data synchronization test method and apparatus, an electronic device, and a computer readable medium.
Background
When data are transferred and synchronized between a relational database and the big data ecosystem, a data synchronization tool is used. At present, however, the synchronization performance of such tools on business data cannot be evaluated effectively, so an efficient data synchronization tool cannot be selected in a targeted manner based on evaluation results.
In the process of implementing the present application, the inventor finds that at least the following problems exist in the prior art:
when data synchronization testing is performed, the synchronization performance of data synchronization tools is evaluated with low efficiency and low accuracy.
Disclosure of Invention
In view of this, the embodiments of the present application provide a data synchronization testing method, apparatus, electronic device, and computer readable medium, which can solve the problem of low efficiency and accuracy of evaluating synchronization performance of a data synchronization tool when performing a data synchronization testing service.
To achieve the above object, according to an aspect of the embodiments of the present application, there is provided a data synchronization test method, including:
receiving a data synchronization test request, acquiring corresponding test data, and determining a synchronization table type and a data dimension corresponding to the test data;
determining a corresponding synchronization scene identifier based on the synchronization table type and the data dimension;
determining the production service data volume corresponding to the synchronization scene identifier, and then determining the scene data volume according to the production service data volume;
and executing a data synchronization test according to the scene data volume to determine the data synchronization time.
Optionally, determining the synchronization table type and the data dimension corresponding to the test data includes:
determining a corresponding data source identifier according to the test data;
determining a processing type based on the data source identifier;
and determining the synchronization table type according to the processing type.
Optionally, determining the synchronization table type and the data dimension corresponding to the test data includes:
in response to the synchronization table type being a fact table, determining a business event corresponding to the fact table;
and determining the data dimension based on the business event.
Optionally, determining the synchronization table type according to the processing type includes:
in response to the processing type being batch processing, obtaining a test model and a table structure corresponding to the batch processing;
and determining the synchronization table type according to the test model and the table structure.
Optionally, determining the corresponding synchronization scene identifier includes:
determining a corresponding business logic type according to the synchronization table type and the data dimension;
and determining the synchronization scene identifier according to the business logic type.
Optionally, determining the scene data volume according to the production service data volume includes:
scaling down the production service data volume proportionally by a preset factor to obtain the scene data volume.
Optionally, determining the scene data volume according to the production service data volume includes:
determining the number of partitions corresponding to the synchronization scene identifier;
and dividing the production service data volume according to the number of partitions to obtain the scene data volume.
In addition, the application also provides a data synchronization testing device, which comprises:
a receiving unit configured to receive a data synchronization test request, acquire corresponding test data, and determine a synchronization table type and a data dimension corresponding to the test data;
a synchronization scene identifier determining unit configured to determine a corresponding synchronization scene identifier based on the synchronization table type and the data dimension;
a scene data volume determining unit configured to determine the production service data volume corresponding to the synchronization scene identifier, and then determine the scene data volume according to the production service data volume;
and an execution unit configured to execute a data synchronization test according to the scene data volume to determine the data synchronization time.
Optionally, the receiving unit is further configured to:
determine a corresponding data source identifier according to the test data;
determine a processing type based on the data source identifier;
and determine the synchronization table type according to the processing type.
Optionally, the receiving unit is further configured to:
in response to the synchronization table type being a fact table, determine a business event corresponding to the fact table;
and determine the data dimension based on the business event.
Optionally, the receiving unit is further configured to:
in response to the processing type being batch processing, obtain a test model and a table structure corresponding to the batch processing;
and determine the synchronization table type according to the test model and the table structure.
Optionally, the synchronization scene identifier determining unit is further configured to:
determine a corresponding business logic type according to the synchronization table type and the data dimension;
and determine the synchronization scene identifier according to the business logic type.
Optionally, the scene data volume determining unit is further configured to:
scale down the production service data volume proportionally by a preset factor to obtain the scene data volume.
Optionally, the scene data volume determining unit is further configured to:
determine the number of partitions corresponding to the synchronization scene identifier;
and divide the production service data volume according to the number of partitions to obtain the scene data volume.
In addition, the application also provides an electronic device for data synchronization testing, comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the data synchronization test method described above.
In addition, the application also provides a computer readable medium, on which a computer program is stored, which when executed by a processor implements the data synchronization test method as described above.
To achieve the above object, according to still another aspect of an embodiment of the present application, there is provided a computer program product.
The computer program product of the embodiment of the application comprises a computer program, and the data synchronization test method provided by the embodiment of the application is realized when the program is executed by a processor.
One embodiment of the above application has the following advantages or benefits. A data synchronization test request is received, the corresponding test data are acquired, and the synchronization table type and data dimension corresponding to the test data are determined. A corresponding synchronization scene identifier is then determined based on the synchronization table type and the data dimension. Next, the production service data volume corresponding to the synchronization scene identifier is determined, and the scene data volume is derived from it. Finally, a data synchronization test is executed according to the scene data volume to determine the data synchronization time. This improves the efficiency and accuracy of evaluating the synchronization performance of data synchronization tools, and provides faster data support for selecting an efficient data synchronization tool.
Further effects of the above-described non-conventional alternatives are described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the application and are not to be construed as unduly limiting the application. Wherein:
FIG. 1 is a schematic diagram of the main flow of a data synchronization test method according to one embodiment of the application;
FIG. 2 is a schematic diagram of the main flow of a data synchronization test method according to one embodiment of the application;
FIG. 3 is a schematic flow diagram of a data synchronization test method according to one embodiment of the application;
FIG. 4 is a schematic diagram of the main units of a data synchronization testing apparatus according to an embodiment of the present application;
FIG. 5 is an exemplary system architecture diagram in which embodiments of the present application may be applied;
FIG. 6 is a schematic diagram of a computer system suitable for implementing an embodiment of the application.
Detailed Description
Exemplary embodiments of the present application are described below with reference to the accompanying drawings. The description includes various details of the embodiments to aid understanding, and these details should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the application. For clarity and conciseness, descriptions of well-known functions and constructions are omitted below. In the technical solution of the application, the acquisition, analysis, use, transmission, storage and other handling of users' personal information comply with the relevant laws and regulations. Such information is used only for legal and reasonable purposes and is not shared, leaked or sold outside those legal uses. Its handling is also subject to the supervision and management of the regulatory authorities. Necessary measures should be taken to prevent illegal access to users' personal information, to ensure that personnel with access to it comply with the relevant laws and regulations, and to protect it. Once such personal information is no longer needed, risk should be minimized by limiting or even prohibiting its collection and/or by deleting it.
User privacy is protected by de-identifying data when it is used, including in certain related applications. Methods include removing specific identifiers, controlling the amount or specificity of the stored data, controlling how the data are stored, and/or other methods.
Fig. 1 is a schematic diagram of main flow of a data synchronization testing method according to an embodiment of the present application, and as shown in fig. 1, the data synchronization testing method includes:
step S101, receiving a data synchronization test request, acquiring corresponding test data, and determining a synchronization table type and a data dimension corresponding to the test data.
In this embodiment, the execution body of the data synchronization test method (for example, a server) may receive the data synchronization test request through a wired or wireless connection. Specifically, the data synchronization test request may be a request to test the synchronization performance of a synchronization tool. After receiving the request, the execution body can acquire the corresponding test data from it, and then obtain the corresponding synchronization table type and data dimension from the test data. The synchronization table type may be a fact table and/or a dimension table.

A fact table, short for fact data table, typically holds a large amount of data that can be aggregated and recorded. Its key feature is that it contains numeric data (facts), which can be summarized to provide historical data about business units. Each fact table contains a key portion that holds the primary keys of the related dimension tables as foreign keys. The dimension tables, in turn, contain the attributes of the fact records.

If the synchronization table type is a fact table, the data dimension may be determined from the dimension fields contained in the fact table. For example, if the fact table contains client, time and region dimension fields, the corresponding data dimensions are the client dimension, the time dimension and the region dimension. The embodiments of the application do not specifically limit the dimension fields contained in the fact table.
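The relationship between a fact table's dimension fields and the data dimensions can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation. The `SyncTable` class, the `"fact"`/`"dimension"` labels and the `data_dimensions` helper are hypothetical names, and the dimension fields of each table are assumed to be known in advance.

```python
from dataclasses import dataclass, field


@dataclass
class SyncTable:
    """A table to be synchronized (hypothetical structure for illustration)."""
    name: str
    table_type: str                      # "fact" or "dimension" (hypothetical labels)
    dimension_fields: list[str] = field(default_factory=list)


def data_dimensions(tables: list[SyncTable]) -> list[str]:
    """Collect the data dimensions from the dimension fields of every fact table."""
    dims: list[str] = []
    for table in tables:
        if table.table_type != "fact":
            continue                     # only fact tables contribute data dimensions here
        for name in table.dimension_fields:
            if name not in dims:         # keep first-seen order, drop duplicates
                dims.append(name)
    return dims


transactions = SyncTable("transactions", "fact", ["client", "time", "region"])
print(data_dimensions([transactions]))  # ['client', 'time', 'region']
```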
Step S102, determining the corresponding synchronization scene identifier based on the synchronization table type and the data dimension.
The synchronization table type and the data dimension are matched by similarity against preset scene descriptions. For example, scene description 2 may be complex join synchronization of a single fact table with multiple dimension tables. Suppose the synchronization table type is a fact table and the data dimensions are the client, time and region dimensions. Scene description 2 is then satisfied, and the corresponding synchronization scene identifier is 2.
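The text does not specify how the similarity between the synchronization table type, the data dimensions and a scene description is computed. As one possible choice, the sketch below encodes each scene description as a set of features and uses Jaccard similarity with an assumed threshold of 0.5. `SCENES`, `jaccard` and `match_scene` are hypothetical. The feature set for scene 1 is invented purely for contrast.

```python
from typing import Optional

# Hypothetical encoding of preset scene descriptions as feature sets.
SCENES: dict[int, set[str]] = {
    1: {"table:dimension"},
    2: {"table:fact", "dim:client", "dim:time", "dim:region"},
}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_scene(table_types: list[str], dimensions: list[str],
                threshold: float = 0.5) -> Optional[int]:
    """Return the id of the most similar scene, or None if nothing reaches the threshold."""
    features = {f"table:{t}" for t in table_types} | {f"dim:{d}" for d in dimensions}
    best_id, best_score = None, 0.0
    for scene_id, scene_features in SCENES.items():
        score = jaccard(features, scene_features)
        if score > best_score:
            best_id, best_score = scene_id, score
    return best_id if best_score >= threshold else None


print(match_scene(["fact"], ["client", "time", "region"]))  # 2
```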
Step S103, determining the production service data volume corresponding to the synchronization scene identifier, and then determining the scene data volume according to the production service data volume.
Specifically, the execution body may determine the scene description corresponding to the synchronization scene identifier. An example is a large wide table with complex business logic, including subqueries and metric calculations. The execution body then obtains the actual production service data volume corresponding to that scene description, for example, 30 kilobytes.
Specifically, determining the scene data volume according to the production service data volume includes: scaling down the production service data volume proportionally by a preset factor to obtain the scene data volume.
To improve data synchronization test efficiency, the production service data volume can be scaled down proportionally by a preset factor. For example, a factor of 10 reduces it from 30 kilobytes to 3 kilobytes, giving the scene data volume. The factor-of-10 reduction may preserve the field types present in the actual production service data. The actual production data volume under each field type is reduced by a factor of 10, so the resulting scene data volume contains the same field types as the actual production service data.
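Proportional scaling that preserves field types can be sketched as below. The sketch assumes the production service data volume is available as a per-field-type breakdown in kilobytes. The split of the 30 kilobytes into `string`, `decimal` and `timestamp` volumes is invented for illustration, and `scale_down` is a hypothetical helper.

```python
def scale_down(volume_by_field_type: dict[str, float],
               factor: float = 10) -> dict[str, float]:
    """Shrink the volume of every field type by the same factor.

    Every field type present in production is kept, so the scene data has the
    same field types as the production data, only a smaller amount of each.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return {ftype: volume / factor for ftype, volume in volume_by_field_type.items()}


# Hypothetical per-field-type breakdown of a 30 KB production volume.
production_kb = {"string": 20, "decimal": 8, "timestamp": 2}
scene_kb = scale_down(production_kb)
print(scene_kb)  # {'string': 2.0, 'decimal': 0.8, 'timestamp': 0.2}
```

Because every key of the input dictionary survives, the scene data keeps the same field types as the production data, and the total shrinks from 30 to 3 kilobytes.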
Step S104, executing a data synchronization test according to the scene data volume to determine the data synchronization time.
The time consumed by performing the data synchronization is measured based on three inputs: the obtained scene data volume, the storage space occupied by the test data, and the data synchronization efficiency.
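Measuring the synchronization time can be sketched with Python's `time.perf_counter`. The `sync` callable stands in for whatever invokes the tool under test; it and the KB/s throughput figure are assumptions for this sketch. The example uses `time.sleep` as a placeholder for a real synchronization run.

```python
import time
from typing import Callable


def measure_sync(sync: Callable[[], None], scene_volume_kb: float) -> dict[str, float]:
    """Run one synchronization and report the elapsed time and throughput."""
    start = time.perf_counter()
    sync()                                   # the tool under test performs the synchronization
    elapsed = time.perf_counter() - start
    throughput = scene_volume_kb / elapsed if elapsed > 0 else float("inf")
    return {"seconds": elapsed, "kb_per_second": throughput}


# Placeholder "synchronization" that just sleeps for 50 ms.
result = measure_sync(lambda: time.sleep(0.05), scene_volume_kb=3.0)
print(f"{result['seconds']:.2f} s, {result['kb_per_second']:.1f} KB/s")
```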
In this embodiment, a data synchronization test request is received, the corresponding test data are acquired, and the synchronization table type and data dimension corresponding to the test data are determined. A corresponding synchronization scene identifier is then determined based on the synchronization table type and the data dimension. Next, the production service data volume corresponding to the synchronization scene identifier is determined, and the scene data volume is derived from it. Finally, a data synchronization test is executed according to the scene data volume to determine the data synchronization time. This improves the efficiency and accuracy of evaluating the synchronization performance of data synchronization tools, and provides faster data support for selecting an efficient data synchronization tool.
Fig. 2 is a main flow diagram of a data synchronization testing method according to an embodiment of the present application, and as shown in fig. 2, the data synchronization testing method includes:
step S201, receiving a data synchronization test request, and obtaining corresponding test data.
After receiving the data synchronization test request, the execution body may acquire the corresponding test data. For example, the test data may be business data to be synchronized, such as logistics data or order data. The embodiments of the application do not specifically limit the content of the test data.
Step S202, corresponding data source identification is determined according to the test data.
Identification data are extracted from the test data and classified, and the data source identifier is determined from the identification data according to the classification. For example, the identification data may be WL, DD or SJY-A. WL may denote the logistics data category, DD may denote the order data category, and SJY-A may indicate that the data source is database A. SJY-A is therefore the data source identifier.
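The classification of identification data can be pictured as a lookup from identification code to category. The code table below contains only the three codes named in the text. `CATEGORY_BY_CODE`, `classify` and `data_source_identifier` are assumptions made for this sketch. So is the convention that a data-source category string starts with `"data source"`.

```python
from typing import Optional

# Hypothetical code table mapping each identification code to its classification.
CATEGORY_BY_CODE = {
    "WL": "logistics data",
    "DD": "order data",
    "SJY-A": "data source: database A",
}


def classify(codes: list[str]) -> dict[str, str]:
    """Classify each identification code; codes not in the table are marked 'unknown'."""
    return {code: CATEGORY_BY_CODE.get(code, "unknown") for code in codes}


def data_source_identifier(codes: list[str]) -> Optional[str]:
    """Return the first code whose classification marks it as a data source."""
    for code, category in classify(codes).items():
        if category.startswith("data source"):
            return code
    return None


print(data_source_identifier(["WL", "DD", "SJY-A"]))  # SJY-A
```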
Step S203, determining the processing type based on the data source identifier.
In the embodiments of the application, some data sources do not support batch operations. Specifically, the execution body can query a database that stores data-source-identifier-to-batch-operation key-value pairs and check whether the data source identifier SJY-A exists among the keys. If it exists, the processing type is determined to be batch processing. If it does not, the processing type is determined to be one that does not support batch processing.
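The lookup against the identifier-to-batch-operation store can be sketched with Python's built-in `sqlite3` module and an in-memory database. The table name `batch_support`, its columns and the returned labels are hypothetical. A real deployment might use a different key-value store.

```python
import sqlite3


def batch_support_store() -> sqlite3.Connection:
    """In-memory stand-in for the data source identifier -> batch operation store."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per data source that supports batch operations.
    conn.execute("CREATE TABLE batch_support (source_id TEXT PRIMARY KEY, op TEXT)")
    conn.execute("INSERT INTO batch_support VALUES (?, ?)", ("SJY-A", "batch"))
    return conn


def processing_type(conn: sqlite3.Connection, source_id: str) -> str:
    """Batch processing if the identifier is a key in the store, otherwise not supported."""
    row = conn.execute(
        "SELECT 1 FROM batch_support WHERE source_id = ?", (source_id,)
    ).fetchone()
    return "batch" if row else "batch not supported"


conn = batch_support_store()
print(processing_type(conn, "SJY-A"))  # batch
print(processing_type(conn, "SJY-B"))  # batch not supported
```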
Step S204, determining the type of the synchronous table according to the processing type.
Specifically, determining the synchronization table type according to the processing type includes: in response to the processing type being batch processing, obtaining a test model and a table structure corresponding to the batch processing; and determining the synchronization table type according to the test model and the table structure.
For example, the test models corresponding to batch processing may include a star model and a snowflake model. In the star model scenario, each dimension of a multidimensional dataset is synchronized directly with the fact table. In the snowflake model scenario, multiple join synchronizations are created between the dimension tables and the fact tables. The table structure may be a simple data structure or a complex data structure. When the processing type is batch processing, the synchronization table type may be determined according to the test model. For example, a star model may correspond to a fact table, and a snowflake model may correspond to dimension tables and fact tables. When the table structure is a complex data structure, the corresponding synchronization table type may be a large wide table containing data structures such as Map or JSON.
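One way to encode the rules above for batch processing is shown below. The string labels are hypothetical. The text does not say how the test model and the table structure are combined, so letting a complex table structure take precedence over the test model is an assumption of this sketch.

```python
def sync_table_types(processing_type: str, test_model: str, table_structure: str) -> list[str]:
    """Map a batch test model and table structure to synchronization table types (sketch)."""
    if processing_type != "batch":
        raise ValueError("these rules only cover batch processing")
    if table_structure == "complex":
        # Assumed precedence: complex structures map to large wide tables with Map/JSON columns.
        return ["wide table (Map)", "wide table (JSON)"]
    if test_model == "star":
        return ["fact table"]
    if test_model == "snowflake":
        return ["dimension table", "fact table"]
    raise ValueError(f"unknown test model: {test_model}")


print(sync_table_types("batch", "snowflake", "simple"))  # ['dimension table', 'fact table']
```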
In step S205, in response to the synchronization table type being the fact table, a business event corresponding to the fact table is determined.
A fact table is the central table in a data warehouse architecture. It contains numeric measures and the keys that link the facts to the dimension tables. A fact table holds data describing specific events within a business, such as banking transactions or product sales. When the execution body determines that the synchronization table type is a fact table, it can acquire each business event in the fact table. These are, for example, the columns of the fact table, such as the client dimension, time dimension, region dimension, transaction variety and transaction amount.
Step S206, determining the data dimension according to the business event.
The data dimension is determined according to the dimensions contained in the business events and their number. Illustratively, the number of data dimensions and the specific data dimension values are determined from the client dimension, time dimension, region dimension, transaction variety and transaction amount included in the business events. For example, the number of data dimensions is 5, and each dimension takes values such as:
the client dimension: 110, 243, 105, etc.;
the time dimension: 001, 085, 002, etc.;
the region dimension: 042, 031, 025, etc.;
the transaction variety: card consumption, withdrawal, deposit, etc.;
the transaction amount: 345, 1000, 1200, etc.
The embodiments of the application do not specifically limit the number of data dimensions corresponding to a business event or their values.
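Deriving the number of data dimensions and their values from the business events can be sketched as follows, treating each column of the fact table as one data dimension. The column names (`variety`, `amount`, etc.) and the `dimension_values` helper are hypothetical. The values are the examples listed above.

```python
def dimension_values(events: list[dict]) -> dict[str, list]:
    """Map each column (data dimension) to its distinct values in first-seen order."""
    values: dict[str, list] = {}
    for event in events:
        for column, value in event.items():
            seen = values.setdefault(column, [])
            if value not in seen:
                seen.append(value)
    return values


# Hypothetical business events (one dict per fact-table row).
events = [
    {"client": "110", "time": "001", "region": "042", "variety": "card consumption", "amount": 345},
    {"client": "243", "time": "085", "region": "031", "variety": "withdrawal", "amount": 1000},
    {"client": "105", "time": "002", "region": "025", "variety": "deposit", "amount": 1200},
]
dims = dimension_values(events)
print(len(dims))       # 5
print(dims["client"])  # ['110', '243', '105']
```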
Step S207, determining the corresponding synchronization scene identifier based on the synchronization table type and the data dimension.
The execution body can call a preset synchronization table type/data dimension/synchronization scene identifier correspondence table. From it, the execution body determines the synchronization scene identifier corresponding to the synchronization table type and the data dimension. For example, suppose the synchronization table types are 5 fact tables and 5 dimension tables and the data dimension is 5. The corresponding synchronization scene identifier is then scene 3.
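The correspondence-table lookup can be sketched as a dictionary keyed by the table composition and the number of data dimensions. Only the row given in the text is filled in: 5 fact tables, 5 dimension tables and 5 data dimensions map to scene 3. The `SceneKey` layout and the `lookup_scene` helper are assumptions.

```python
from typing import NamedTuple, Optional


class SceneKey(NamedTuple):
    """Hypothetical key layout for the preset correspondence table."""
    fact_tables: int
    dimension_tables: int
    data_dimensions: int


# Preset correspondence table; only the example row from the text is included.
SCENE_TABLE: dict[SceneKey, int] = {
    SceneKey(5, 5, 5): 3,
}


def lookup_scene(key: SceneKey) -> Optional[int]:
    """Return the synchronization scene identifier, or None if the table has no row."""
    return SCENE_TABLE.get(key)


print(lookup_scene(SceneKey(5, 5, 5)))  # 3
```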
Step S208, determining the production service data volume corresponding to the synchronization scene identifier, and then determining the scene data volume according to the production service data volume.
Specifically, determining the scene data volume according to the production service data volume includes: determining the number of partitions corresponding to the synchronization scene identifier; and dividing the production service data volume according to the number of partitions to obtain the scene data volume.
When the data volume is scaled down proportionally, the actual production service data volume may be partitioned according to a preset number of partitions, for example K. The production service data volume may be divided among the K partitions. Alternatively, it may be partitioned according to the upper limit of data volume that each of the K partitions can hold. Either way, this yields the scene data volume in each partition.
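The two partitioning options can be sketched as below, assuming volumes are expressed in whole units (for example, rows or kilobytes) and K is given. `split_evenly` and `split_by_capacity` are hypothetical helpers. In the capacity-based variant, any volume beyond K times the per-partition limit is simply left unplaced, which is one possible reading of the text.

```python
def split_evenly(total: int, k: int) -> list[int]:
    """Divide total across k partitions; partition sizes differ by at most one."""
    if k <= 0:
        raise ValueError("k must be positive")
    base, rem = divmod(total, k)
    return [base + 1 if i < rem else base for i in range(k)]


def split_by_capacity(total: int, k: int, capacity: int) -> list[int]:
    """Fill k partitions up to `capacity` each; volume beyond k * capacity is not placed."""
    sizes = []
    remaining = total
    for _ in range(k):
        take = min(capacity, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


print(split_evenly(10, 3))          # [4, 3, 3]
print(split_by_capacity(10, 3, 4))  # [4, 4, 2]
```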
Step S209, according to the scene data amount, a data synchronization test is executed, and further, the data synchronization time is determined.
The time consumed by performing the data synchronization is measured based on three inputs: the obtained scene data volume, the storage space occupied by the test data, and the data synchronization efficiency. This improves the efficiency and accuracy of evaluating the synchronization performance of data synchronization tools, and provides faster data support for selecting an efficient data synchronization tool.
Fig. 3 is a schematic flow chart of a data synchronization test method according to an embodiment of the application. As shown in fig. 3, the data synchronization test method in the embodiment of the application includes:
step S301, a data synchronization test request is received, corresponding test data is obtained, and a synchronization table type and a data dimension corresponding to the test data are determined.
Step S302, determining the corresponding business logic type according to the synchronization table type and the data dimension.
For example, the preset synchronization table type/data dimension/synchronization scene identifier correspondence table is shown in Table 1 below:
TABLE 1
Based on this correspondence table, the corresponding business logic type can be determined from the synchronization table type and the data dimension.
Specifically, the corresponding scene description can be determined according to the synchronization table type and the data dimension. The business logic type is then determined from that scene description in the preset correspondence table.
For example, suppose the synchronization table types are 1 fact table and 5 dimension tables. The corresponding scene description in the preset correspondence table may then be complex join synchronization of a single fact table with multiple dimension tables. The corresponding business logic type is: single fact table, multiple dimension tables, complex join synchronization. In other words, the business logic type may coincide with the scene description in the preset correspondence table.
Step S303: determine the synchronization scene identifier according to the service logic type.
Based on the preset correspondence table of synchronization table type, data dimension and synchronization scene identifier, the corresponding synchronization scene identifier is determined from the service logic type (that is, from the scene description in the correspondence table). For example, when the service logic type is complex join synchronization of a single fact table with multiple dimension tables, the synchronization scene identifier determined from the preset correspondence table is scene 2.
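Steps S302 and S303 amount to two lookups in the preset correspondence table. The Java sketch below models that table in memory and treats the scene description as the service logic type, as described above. The class and method names, and the data dimension value used in the usage example, are illustrative assumptions, because the contents of Table 1 are not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical in-memory version of the preset "synchronization table type -
 * data dimension - synchronization scene identifier" correspondence table.
 */
final class SceneCorrespondenceTable {

    // (table type, data dimension) -> scene description, which doubles as the service logic type
    private final Map<String, String> descriptionByKey = new HashMap<>();
    // scene description -> synchronization scene identifier
    private final Map<String, String> sceneIdByDescription = new HashMap<>();

    private static String key(String tableType, String dataDimension) {
        // NUL separator: unlikely to occur inside either value
        return tableType + '\u0000' + dataDimension;
    }

    void register(String tableType, String dataDimension, String sceneDescription, String sceneId) {
        descriptionByKey.put(key(tableType, dataDimension), sceneDescription);
        sceneIdByDescription.put(sceneDescription, sceneId);
    }

    /** Step S302: synchronization table type + data dimension -> service logic type. */
    Optional<String> serviceLogicType(String tableType, String dataDimension) {
        return Optional.ofNullable(descriptionByKey.get(key(tableType, dataDimension)));
    }

    /** Step S303: service logic type -> synchronization scene identifier. */
    Optional<String> sceneId(String serviceLogicType) {
        return Optional.ofNullable(sceneIdByDescription.get(serviceLogicType));
    }
}
```

Registering the row ("1 fact table + 5 dimension tables", a data dimension, "single fact table, multiple dimension tables, complex join synchronization", "scene 2") lets serviceLogicType(...) return that description, and sceneId(...) then returns scene 2. Two plain hash maps keep each lookup constant-time. A real implementation would more likely load the table from configuration or a database.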
Step S304: determine the production service data volume corresponding to the synchronization scene identifier, and then determine the scene data volume from the production service data volume.
Step S305: perform the data synchronization test according to the scene data volume, and thereby determine the data synchronization time.
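Steps S304 and S305 can be sketched as follows. The sketch assumes that the production service data volumes are available per scene identifier, and that the scene data volume is obtained by dividing by a reduction factor, which is one of the options the embodiments describe. SceneSyncTest, the LongConsumer that stands in for the actual synchronization job, and nanosecond timing are all hypothetical choices and are not part of the described method.

```java
import java.util.Map;
import java.util.function.LongConsumer;

/** Hypothetical sketch of steps S304 (scene data volume) and S305 (timed synchronization test). */
final class SceneSyncTest {

    /** Outcome of one test run. */
    static final class Result {
        final String sceneId;
        final long sceneRows;
        final long elapsedNanos;

        Result(String sceneId, long sceneRows, long elapsedNanos) {
            this.sceneId = sceneId;
            this.sceneRows = sceneRows;
            this.elapsedNanos = elapsedNanos;
        }
    }

    static Result run(String sceneId, Map<String, Long> productionRowsByScene,
                      long reductionFactor, LongConsumer syncJob) {
        Long productionRows = productionRowsByScene.get(sceneId);
        if (productionRows == null) {
            throw new IllegalArgumentException("no production data volume for " + sceneId);
        }
        if (reductionFactor <= 0) {
            throw new IllegalArgumentException("reductionFactor must be positive");
        }
        // S304: scene data volume derived from the production service data volume
        long sceneRows = productionRows / reductionFactor;
        // S305: run the synchronization job on that volume and measure the synchronization time
        long start = System.nanoTime();
        syncJob.accept(sceneRows);
        return new Result(sceneId, sceneRows, System.nanoTime() - start);
    }
}
```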
In the embodiment of the application, a service logic generation program may also be called with the synchronization table type and the data dimension to generate the corresponding service logic type. If the generated service logic type has no corresponding scene description in the preset correspondence table of synchronization table type, data dimension and synchronization scene identifier, a data expansion program may be called. This program generates a scene description and a scene name for that service logic type, and it generates an expanded synchronization scene identifier based on the existing identifiers. The expanded identifier, the scene name and the generated scene description are then associated with the service logic type and dynamically added to the preset correspondence table for subsequent queries. This improves the speed and accuracy of the data synchronization test.
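The following sketch shows one possible form of this dynamic extension of the correspondence table. It assumes that scene identifiers have the form "scene N" and that an expanded identifier is one more than the largest existing number. The text only says that the expanded identifier is generated based on the existing synchronization scene identifiers, so this numbering rule is an assumption, and so are all the names.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: resolve a service logic type to a scene, extending the table
 * with a new "scene N+1" entry when the description is not yet present.
 */
final class ExtensibleSceneTable {

    /** One row of the table: identifier, name and description of a synchronization scene. */
    static final class Scene {
        final String id;
        final String name;
        final String description;

        Scene(String id, String name, String description) {
            this.id = id;
            this.name = name;
            this.description = description;
        }
    }

    private final Map<String, Scene> byDescription = new HashMap<>();
    private int maxSceneNumber = 0;

    synchronized void register(int sceneNumber, String name, String description) {
        byDescription.put(description, new Scene("scene " + sceneNumber, name, description));
        maxSceneNumber = Math.max(maxSceneNumber, sceneNumber);
    }

    /**
     * Returns the existing scene for this description, or creates an expanded identifier
     * from the largest existing number and stores it for subsequent queries.
     */
    synchronized Scene resolveOrExtend(String description, String generatedName) {
        Scene existing = byDescription.get(description);
        if (existing != null) {
            return existing;
        }
        maxSceneNumber++;
        Scene extended = new Scene("scene " + maxSceneNumber, generatedName, description);
        byDescription.put(description, extended);
        return extended;
    }
}
```

Both methods are synchronized, so two concurrent test requests cannot be assigned the same expanded identifier.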
In the embodiment of the present application, the data synchronization test method may be based on the big data tool FlinkX and can be divided into three stages: design and construction of the environment dependency architecture, analysis and self-implementation of the FlinkX principle, and the test method strategy.
The embodiments of the application use the following terms:
Flink: a new-generation computing framework that is distributed, low-latency, high-throughput and highly reliable. It supports multiple deployment modes, such as local and standalone modes.
Distributed resource scheduling: resources can be scheduled in a distributed manner based on YARN, Mesos, Kubernetes (k8s) or similar systems, which improves resource utilization and job efficiency.
Distributed file system: a file system (Distributed File System) designed to run on general-purpose (commodity) hardware, such as the Hadoop Distributed File System (HDFS).
Task breakpoint resume: when a large volume of data is transmitted, network jitter may cause a task to fail. With breakpoint resume, the task continues from the point of failure.
FlinkX: a tool that uses a plug-in architecture to synchronize data among various heterogeneous data sources. Different source databases are abstracted into different Reader plug-ins, and different target databases are abstracted into different Writer plug-ins.
Task automatic assembly: according to the configuration of the synchronization task, the Template module loads the Reader plug-in for the source database and the Writer plug-in for the target database, so that the task is assembled automatically.
Checkpoint mechanism: breakpoint resume in the FlinkX framework is implemented on top of Flink's checkpoint mechanism.
ETL: short for Extract-Transform-Load, the complete process of extracting data from a source system, transforming it, and loading it into a data warehouse.
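The plug-in architecture and task automatic assembly described above can be illustrated with the simplified Java sketch below. ReaderPlugin, WriterPlugin, PluginTemplate and the "reader"/"writer" configuration keys are hypothetical. They do not reproduce FlinkX's actual classes or its job configuration format. Rows are plain Object[] arrays, and no Flink runtime is involved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical Reader plug-in: abstracts one kind of source database. */
interface ReaderPlugin {
    List<Object[]> read();
}

/** Hypothetical Writer plug-in: abstracts one kind of target database. */
interface WriterPlugin {
    void write(List<Object[]> rows);
}

/** Hypothetical template that assembles a job from the plug-ins named in the task configuration. */
final class PluginTemplate {
    private final Map<String, Supplier<ReaderPlugin>> readers = new HashMap<>();
    private final Map<String, Supplier<WriterPlugin>> writers = new HashMap<>();

    void registerReader(String sourceType, Supplier<ReaderPlugin> factory) {
        readers.put(sourceType, factory);
    }

    void registerWriter(String targetType, Supplier<WriterPlugin> factory) {
        writers.put(targetType, factory);
    }

    /** Loads the Reader/Writer pair named by the "reader" and "writer" keys and runs one transfer. */
    int runJob(Map<String, String> taskConfig) {
        Supplier<ReaderPlugin> reader = readers.get(taskConfig.get("reader"));
        Supplier<WriterPlugin> writer = writers.get(taskConfig.get("writer"));
        if (reader == null || writer == null) {
            throw new IllegalArgumentException("no plug-in registered for " + taskConfig);
        }
        List<Object[]> rows = new ArrayList<>(reader.get().read());
        writer.get().write(rows);
        return rows.size();
    }
}
```

With this design, supporting another heterogeneous data source only requires registering one more Reader or Writer factory. The template itself stays unchanged.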
1. Design and construction of the environment dependency architecture:
Hadoop 2.8.5, Flink 1.8 and JDK 1.8 are used. The Hadoop nodes are set up, and the Flink-related files are placed in the same directory of the YARN client on every DataNode and NodeManager node. FlinkX tasks are submitted in the same way as Flink tasks. Resource scheduling is monitored, and the related data is checked for acceptance, through the Flink monitoring page. Apache Hadoop YARN (Yet Another Resource Negotiator) is the newer Hadoop resource manager. It is a general-purpose resource management system that provides unified resource management and scheduling for upper-layer applications. Its introduction brings significant benefits to clusters in resource utilization, unified resource management, data sharing and other areas.
2. Analysis and self-implementation of the FlinkX principle:
1) Acquire the configuration file: a builder (Builder) is instantiated and then applies the configuration through the overridden set functions of the stream output format builder class (StreamOutputFormatBuilder).
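The builder pattern in step 1) can be sketched under the following assumptions. DemoOutputFormat, its two settings, and the configuration keys "table" and "batchInterval" are hypothetical, and the configuration file is represented as java.util.Properties. The builder instantiates the format first, fills it in through set functions, and validates the result in build().

```java
import java.util.Objects;
import java.util.Properties;

/** Hypothetical output format configured through a builder (not a FlinkX class). */
final class DemoOutputFormat {
    private String targetTable;
    private int batchInterval = 1;

    String targetTable() { return targetTable; }
    int batchInterval() { return batchInterval; }

    /** Builder that instantiates the format first and then fills it in through set functions. */
    static final class Builder {
        private final DemoOutputFormat format = new DemoOutputFormat();

        Builder setTargetTable(String table) {
            format.targetTable = table;
            return this;
        }

        Builder setBatchInterval(int interval) {
            format.batchInterval = interval;
            return this;
        }

        /** Applies values read from a configuration file; the key names are assumptions. */
        Builder fromConfig(Properties config) {
            setTargetTable(config.getProperty("table"));
            return setBatchInterval(Integer.parseInt(config.getProperty("batchInterval", "1")));
        }

        /** Validates the configured format before handing it out. */
        DemoOutputFormat build() {
            Objects.requireNonNull(format.targetTable, "table must be configured");
            if (format.batchInterval < 1) {
                throw new IllegalArgumentException("batchInterval must be >= 1");
            }
            return format;
        }
    }
}
```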
2) Building synchronous data output
The custom data output logic is built by analyzing the RichOutputFormat class that the stream output format (StreamOutputFormat) inherits from. The processing logic is completed in the following methods:
writeOneRecordInternational: defines the single-record processing logic;
writeAllRecordInternational: defines the batch processing logic;
closeLastInternational: defines the close (shutdown) processing logic.
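The three methods listed above can be arranged in a template-method base class, as sketched below. This is not FlinkX's RichOutputFormat. The class SketchRichOutputFormat, its batchInterval parameter and the buffering policy are assumptions, and only the three hook names follow the text. When batchInterval is 1, each record goes to the single-record hook. Otherwise records are buffered and handed to the batch hook, and close() flushes any remaining records before it calls the shutdown hook.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical base class (not FlinkX's RichOutputFormat): routes records to a
 * single-record hook or buffers them for a batch hook, depending on batchInterval.
 */
abstract class SketchRichOutputFormat {
    private final int batchInterval;
    /** Rows waiting for the next batch write. */
    protected final List<Object[]> buffer = new ArrayList<>();

    protected SketchRichOutputFormat(int batchInterval) {
        if (batchInterval < 1) {
            throw new IllegalArgumentException("batchInterval must be >= 1");
        }
        this.batchInterval = batchInterval;
    }

    /** Single-record processing logic. */
    protected abstract void writeOneRecordInternational(Object[] row);

    /** Batch processing logic; the pending rows are in {@code buffer}. */
    protected abstract void writeAllRecordInternational();

    /** Shutdown logic, e.g. releasing connections. */
    protected abstract void closeLastInternational();

    public final void writeRecord(Object[] row) {
        if (batchInterval == 1) {
            writeOneRecordInternational(row);
            return;
        }
        buffer.add(row);
        if (buffer.size() >= batchInterval) {
            flush();
        }
    }

    public final void close() {
        if (!buffer.isEmpty()) {
            flush(); // write the last, possibly partial, batch
        }
        closeLastInternational();
    }

    private void flush() {
        writeAllRecordInternational();
        buffer.clear();
    }
}
```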
3) Row (single-record) processing and batch processing in Flink
The data object processed by writeOneRecordInternational uses Flink's native org.apache.flink.types.Row structure. A Row is essentially an array and provides the following methods:
getArity: returns the number of fields in the Row;
getField: returns the value at a specified position;
toString: converts the values in the array into a single String, with the values separated by commas.
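The Row methods listed above can be mimicked with a small stand-in class, which lets the example run without Flink on the classpath. SimpleRow is hypothetical and is not org.apache.flink.types.Row. It only reproduces the three behaviors described here: field count, positional access, and a comma-separated toString.

```java
import java.util.StringJoiner;

/**
 * Minimal stand-in for org.apache.flink.types.Row, written for illustration only:
 * a fixed-length array of fields with positional access.
 */
final class SimpleRow {
    private final Object[] fields;

    SimpleRow(int arity) {
        this.fields = new Object[arity];
    }

    /** Number of fields in the row. */
    int getArity() {
        return fields.length;
    }

    /** Value at the given 0-based position. */
    Object getField(int pos) {
        return fields[pos];
    }

    void setField(int pos, Object value) {
        fields[pos] = value;
    }

    /** Joins the field values with commas; null fields print as "null". */
    @Override
    public String toString() {
        StringJoiner joiner = new StringJoiner(",");
        for (Object field : fields) {
            joiner.add(String.valueOf(field));
        }
        return joiner.toString();
    }
}
```

For example, a three-field row that holds 1, "a" and null prints as 1,a,null.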
In batch scenarios, concurrent processing can improve transmission performance: the data is processed concurrently through writeAllRecordInternational. Different data sources use different batch interval settings, and some data sources do not support batch operations at all. Batch processing requires a processing interval to be set: build.
3. Test method strategy:
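The concurrent batch writing described above could be implemented with a fixed thread pool, as in the sketch below. Several details are assumptions made for illustration: the thread pool itself, the splitting of rows into consecutive batches of batchInterval rows, and the Consumer that stands in for the batch write. A data source that does not support batch operations would need single-record writes instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

/** Hypothetical sketch: split rows into batches of batchInterval rows and write them concurrently. */
final class ConcurrentBatchWriter {

    /** Returns the number of batches written. */
    static int writeConcurrently(List<Object[]> rows, int batchInterval, int threads,
                                 Consumer<List<Object[]>> batchWriter) {
        if (batchInterval < 1 || threads < 1) {
            throw new IllegalArgumentException("batchInterval and threads must be >= 1");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int from = 0; from < rows.size(); from += batchInterval) {
                List<Object[]> batch = rows.subList(from, Math.min(from + batchInterval, rows.size()));
                pending.add(pool.submit(() -> batchWriter.accept(batch)));
            }
            for (Future<?> f : pending) {
                f.get(); // propagate any failure from a batch
            }
            return pending.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while writing batches", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("batch write failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The sketch waits on every Future, so a failure in any batch reaches the caller instead of being silently lost.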
In ETL operations, FlinkX serves multiple business scenarios. Eleven scenarios are defined according to business logic and complexity (as shown in Table 1). For each scenario, the data volume is the actual production business data volume reduced proportionally by a factor of 10. Partitioning is not considered (in practice, data can be taken per partition). The scenarios are then compared by the storage space occupied by the test data and by data synchronization efficiency, and the time consumed by each synchronization run is recorded.
In the embodiment of the application, the FlinkX data synchronization test scenarios are derived from ETL operation scenarios. FlinkX uses a plug-in architecture to synchronize data among heterogeneous data sources: different source databases are abstracted into different Reader plug-ins, and different target databases into different Writer plug-ins. Because of this, the test scenarios can closely reflect those of the business lines, and different data models can be simulated. This narrows the gap between the test scenarios and real business scenarios, reveals the real performance of FlinkX as a data synchronization tool, and makes the method applicable to different test environments.
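The per-scenario comparison can be organized as in the following sketch. ScenarioReport is hypothetical. The storage size and elapsed time are measurements supplied by the caller, and only the proportional reduction by a factor of 10 comes from the text. Throughput is computed in rows per second, so that scenarios of different sizes can be compared.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical per-scenario record for comparing test runs; storage and time are caller-supplied measurements. */
final class ScenarioReport {
    final String sceneId;
    final long sceneRows;
    final long storageBytes;
    final long elapsedMillis;

    ScenarioReport(String sceneId, long productionRows, long reductionFactor,
                   long storageBytes, long elapsedMillis) {
        if (reductionFactor <= 0) {
            throw new IllegalArgumentException("reductionFactor must be positive");
        }
        this.sceneId = sceneId;
        this.sceneRows = productionRows / reductionFactor; // e.g. a factor of 10, as in the text
        this.storageBytes = storageBytes;
        this.elapsedMillis = elapsedMillis;
    }

    /** Synchronization throughput in rows per second. */
    double rowsPerSecond() {
        return elapsedMillis == 0 ? Double.POSITIVE_INFINITY : sceneRows * 1000.0 / elapsedMillis;
    }

    /** Orders scenarios from the longest to the shortest synchronization time. */
    static List<ScenarioReport> slowestFirst(List<ScenarioReport> reports) {
        List<ScenarioReport> sorted = new ArrayList<>(reports);
        sorted.sort(Comparator.comparingLong((ScenarioReport r) -> r.elapsedMillis).reversed());
        return sorted;
    }
}
```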
Fig. 4 is a schematic diagram of main units of a data synchronization testing apparatus according to an embodiment of the present application. As shown in fig. 4, the data synchronization test device 400 includes a receiving unit 401, a synchronization scene identification determining unit 402, a scene data amount determining unit 403, and an executing unit 404.
The receiving unit 401 is configured to receive a data synchronization test request, acquire corresponding test data, and determine a synchronization table type and a data dimension corresponding to the test data.
The synchronization scene identification determination unit 402 is configured to determine a corresponding synchronization scene identification based on the synchronization table type and the data dimension.
The scene data amount determining unit 403 is configured to determine the production service data volume corresponding to the synchronization scene identifier, and then determine the scene data volume according to the production service data volume.
The execution unit 404 is configured to execute a data synchronization test according to the scene data volume, thereby determining the data synchronization time.
In some embodiments, the receiving unit 401 is further configured to: determine a corresponding data source identifier according to the test data; determine a processing type based on the data source identifier; and determine the synchronization table type according to the processing type.
In some embodiments, the receiving unit 401 is further configured to: in response to the synchronization table type being a fact table, determine the business event corresponding to the fact table; and determine the data dimension based on the business event.
In some embodiments, the receiving unit 401 is further configured to: in response to the processing type being batch processing, obtain the test model and table structure corresponding to the batch processing; and determine the synchronization table type according to the test model and the table structure.
In some embodiments, the synchronization scene identification determination unit 402 is further configured to: determine the corresponding service logic type according to the synchronization table type and the data dimension; and determine the synchronization scene identifier according to the service logic type.
In some embodiments, the scene data amount determination unit 403 is further configured to: proportionally reduce the production service data volume by a preset multiple to obtain the scene data volume.
In some embodiments, the scene data amount determination unit 403 is further configured to: determine the number of partitions corresponding to the synchronization scene identifier; and divide the production service data volume according to the number of partitions to obtain the scene data volume.
It should be noted that the data synchronization test method and the data synchronization test device of the present application correspond to each other in implementation, so the repeated content is not described again.
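When the scene data volume is derived from partitions, the production service data volume must be divided by the partition count. The sketch below shows one way to do this without losing rows to integer division. PartitionVolumes and the rule of giving the remainder to the first partitions are assumptions, since the text does not say how uneven divisions are handled. The scene data volume could then be, for example, one partition's share.

```java
/** Hypothetical sketch: divide a production data volume evenly across a scene's partitions. */
final class PartitionVolumes {

    /** Returns per-partition row counts; the remainder goes to the first partitions so the sum is preserved. */
    static long[] split(long productionRows, int partitions) {
        if (partitions < 1) {
            throw new IllegalArgumentException("partitions must be >= 1");
        }
        if (productionRows < 0) {
            throw new IllegalArgumentException("productionRows must be >= 0");
        }
        long base = productionRows / partitions;
        long remainder = productionRows % partitions;
        long[] volumes = new long[partitions];
        for (int i = 0; i < partitions; i++) {
            volumes[i] = base + (i < remainder ? 1 : 0);
        }
        return volumes;
    }
}
```

For instance, 10 rows over 3 partitions yields 4, 3 and 3 rows.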
Fig. 5 illustrates an exemplary system architecture 500 to which the data synchronization test method or the data synchronization test apparatus of the embodiments of the present application may be applied.
As shown in fig. 5, the system architecture 500 may include terminal devices 501, 502, 503, a network 504, and a server 505. The network 504 is used as a medium to provide communication links between the terminal devices 501, 502, 503 and the server 505. The network 504 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
A user may interact with the server 505 via the network 504 using the terminal devices 501, 502, 503 to receive or send messages or the like. Various communication client applications may be installed on the terminal devices 501, 502, 503, such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 501, 502, 503 may be various electronic devices that have a data synchronization test handler and support web browsing, including but not limited to smartphones, tablet computers, laptop computers and desktop computers.
The server 505 may be a server providing various services, such as a background management server (by way of example only) that supports data synchronization test requests submitted by users through the terminal devices 501, 502, 503. The background management server can receive a data synchronization test request, acquire the corresponding test data, and determine the synchronization table type and data dimension corresponding to the test data; determine the corresponding synchronization scene identifier based on the synchronization table type and the data dimension; determine the production service data volume corresponding to the synchronization scene identifier, and then determine the scene data volume according to the production service data volume; and execute a data synchronization test according to the scene data volume, thereby determining the data synchronization time. This improves the efficiency and accuracy of evaluating the synchronization performance of a data synchronization tool, and provides faster data support for selecting an efficient data synchronization tool.
It should be noted that, the data synchronization testing method provided in the embodiment of the present application is generally executed by the server 505, and accordingly, the data synchronization testing device is generally disposed in the server 505.
It should be understood that the number of terminal devices, networks and servers in fig. 5 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 6, there is illustrated a schematic diagram of a computer system 600 suitable for use in implementing an embodiment of the present application. The terminal device shown in fig. 6 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiment of the present application.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the computer system 600. The CPU 601, ROM 602, and RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611. The above-described functions defined in the system of the present application are performed when the computer program is executed by a Central Processing Unit (CPU) 601.
The computer readable medium shown in the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor, for example, described as: a processor includes a receiving unit, a synchronous scene identification determining unit, a scene data amount determining unit, and an executing unit. Wherein the names of the units do not constitute a limitation of the units themselves in some cases.
As another aspect, the present application also provides a computer-readable medium. The medium may be included in the apparatus described in the above embodiments, or it may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to: receive a data synchronization test request, acquire the corresponding test data, and determine the synchronization table type and data dimension corresponding to the test data; determine the corresponding synchronization scene identifier based on the synchronization table type and the data dimension; determine the production service data volume corresponding to the synchronization scene identifier, and then determine the scene data volume according to the production service data volume; and execute a data synchronization test according to the scene data volume, thereby determining the data synchronization time.
The computer program product of the present application comprises a computer program which, when executed by a processor, implements the data synchronization test method of the embodiments of the present application.
According to the technical scheme provided by the embodiment of the application, the efficiency and the accuracy of evaluating the synchronization performance of the data synchronization tool can be improved, and faster data support is provided for selecting the efficient data synchronization tool.
The above embodiments do not limit the scope of the present application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application should be included in the scope of the present application.

Claims (16)

1. A method for testing data synchronization, comprising:
receiving a data synchronization test request, acquiring corresponding test data, and determining a synchronization table type and a data dimension corresponding to the test data;
determining a corresponding synchronization scene identifier based on the synchronization table type and the data dimension;
determining the production service data volume corresponding to the synchronization scene identifier, and further determining the scene data volume according to the production service data volume;
and executing a data synchronization test according to the scene data volume, and further determining the data synchronization time.
2. The method of claim 1, wherein determining the synchronization table type and data dimension to which the test data corresponds comprises:
determining a corresponding data source identifier according to the test data;
determining a processing type based on the data source identifier;
and determining the synchronization table type according to the processing type.
3. The method of claim 2, wherein determining the synchronization table type and data dimension to which the test data corresponds comprises:
in response to the synchronization table type being a fact table, determining a business event corresponding to the fact table;
and determining the data dimension according to the business event.
4. The method of claim 2, wherein determining a synchronization table type based on the processing type comprises:
in response to the processing type being batch processing, obtaining a test model and a table structure corresponding to the batch processing;
and determining the synchronization table type according to the test model and the table structure.
5. The method of claim 1, wherein the determining the corresponding synchronization scene identification comprises:
determining a corresponding service logic type according to the synchronization table type and the data dimension;
and determining the synchronization scene identifier according to the service logic type.
6. The method of claim 1, wherein said determining a scene data volume from said production service data volume comprises:
and proportionally reducing the production service data volume by a preset multiple to obtain the scene data volume.
7. The method of claim 1, wherein said determining a scene data volume from said production service data volume comprises:
determining the number of partitions corresponding to the synchronization scene identifier;
and dividing the production service data volume according to the number of partitions to obtain the scene data volume.
8. A data synchronization testing apparatus, comprising:
the receiving unit is configured to receive a data synchronization test request, acquire corresponding test data and determine a synchronization table type and a data dimension corresponding to the test data;
a synchronization scene identification determination unit configured to determine a corresponding synchronization scene identification based on the synchronization table type and the data dimension;
a scene data amount determining unit configured to determine a production service data amount corresponding to the synchronization scene identifier, and further determine a scene data amount according to the production service data amount;
and an execution unit configured to execute a data synchronization test according to the scene data volume so as to determine the data synchronization time.
9. The apparatus of claim 8, wherein the receiving unit is further configured to:
determining a corresponding data source identifier according to the test data;
determining a processing type based on the data source identifier;
and determining the synchronization table type according to the processing type.
10. The apparatus of claim 9, wherein the receiving unit is further configured to:
in response to the synchronization table type being a fact table, determining a business event corresponding to the fact table;
and determining the data dimension according to the business event.
11. The apparatus of claim 9, wherein the receiving unit is further configured to:
in response to the processing type being batch processing, obtaining a test model and a table structure corresponding to the batch processing;
and determining the synchronization table type according to the test model and the table structure.
12. The apparatus of claim 8, wherein the synchronization scene identification determination unit is further configured to:
determining a corresponding service logic type according to the synchronization table type and the data dimension;
and determining the synchronization scene identifier according to the service logic type.
13. The apparatus of claim 8, wherein the scene data amount determination unit is further configured to:
and proportionally reducing the production service data volume by a preset multiple to obtain the scene data volume.
14. A data synchronization test electronic device, comprising:
one or more processors;
storage means for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7.
15. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-7.
16. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-7.
CN202310725879.5A 2023-06-19 2023-06-19 Data synchronization test method and device, electronic equipment and computer readable medium Pending CN116701220A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310725879.5A CN116701220A (en) 2023-06-19 2023-06-19 Data synchronization test method and device, electronic equipment and computer readable medium

Publications (1)

Publication Number Publication Date
CN116701220A true CN116701220A (en) 2023-09-05

Family

ID=87835498

Country Status (1)

Country Link
CN (1) CN116701220A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination