WO2022205938A1

WO2022205938A1 - Data acquisition method and apparatus, computer device, and storage medium

Info

Publication number: WO2022205938A1
Application number: PCT/CN2021/131752
Authority: WO
Inventors: 孙岩; 董光杰; 顾永飞; 杭军; 吴金迎; 钱津津
Original assignee: 苏宁易购集团股份有限公司
Priority date: 2021-03-30
Filing date: 2021-11-19
Publication date: 2022-10-06
Also published as: CN112948504B; CN112948504A

Abstract

A data acquisition method and apparatus, a computer device, and a storage medium. The method comprises: acquiring specified time interval information, and acquiring, from a service database according to the time interval information, first data updated in a corresponding time interval (S202); screening, from field information stored in an intermediate table, target field information that does not match the time interval information, the intermediate table comprising field information of a feature field of data to be acquired, and said data being data determined from the service database according to preset acquisition logic (S204); acquiring second data from the service database according to the target field information (S206); and integrating the first data and the second data to obtain target data (S208). Using the method can improve the efficiency of data acquisition.

Description

Data acquisition method, device, computer equipment and storage medium

Technical field

The present application relates to the technical field of data processing, and in particular, to a data acquisition method, device, computer equipment and storage medium.

Background art

With the development of data processing technology, data collection technology has emerged. Data collection generally uses collection tools to collect source data from a source database into the data warehouse of a big data platform.

In traditional data collection methods, the data update time is usually used as a filter condition to collect incremental data. However, incremental data is sometimes associated with part of the existing stock data. For example, when collecting return order data, the newly updated return order data and the original order data corresponding to each return order must be collected into the data warehouse together before they can be used for downstream statistical analysis. After the incremental data has been collected according to the data update time, the big data platform has to run a full computation over the historical stock data to obtain the part of the stock data that corresponds to the incremental data, which both consumes computing resources and reduces the efficiency of data collection.

Technical solutions

Based on this, it is necessary to provide a data collection method, device, computer equipment and storage medium that can improve the efficiency of data collection in response to the above technical problems.

A data collection method, the method comprising:

Acquiring specified time interval information, and collecting, from a business database according to the time interval information, first data updated in the corresponding time interval;

Screening, from the field information stored in an intermediate table, target field information that does not match the time interval information, where the intermediate table includes the field information of the characteristic fields of the data to be collected, and the data to be collected is the data determined from the business database according to preset collection logic;

Collecting second data from the business database according to the target field information;

Performing data integration processing on the first data and the second data to obtain target data.

In one embodiment, the above method further includes:

Store the target data in the partition table of the corresponding partition in the data warehouse.

In one embodiment, the above-mentioned intermediate table is a data table set in a data warehouse.

In one embodiment, before acquiring the specified time interval information, the above method further includes: acquiring preset collection logic information; determining data in the business database that conforms to the collection logic information as the data to be collected; and extracting field information from the characteristic fields of the data to be collected and storing it in the intermediate table.

In one embodiment, screening the target field information that does not match the time interval information from the field information stored in the intermediate table includes: acquiring task parameter information, reading the field information that matches the task parameter information from the intermediate table, and storing it in a temporary table; and screening the target field information that does not match the time interval information from the field information stored in the temporary table.

In one embodiment, collecting the second data from the business database according to the target field information includes: generating a structured query language statement by using the target field information as the value corresponding to a query condition; and collecting the second data from the business database according to the structured query language statement.

In one embodiment, after performing data integration processing on the first data and the second data, the above method further includes: performing data deduplication processing on the data after the data integration processing.

In one embodiment, after performing data integration processing on the first data and the second data, the above method further includes: comparing the data after the data integration processing with the data to be collected, and removing data different from the data to be collected.

A data acquisition device, the device comprising:

a first data acquisition module, configured to acquire specified time interval information, and collect the updated first data in the corresponding time interval from the business database according to the time interval information;

The field information acquisition module is used to screen, from the field information stored in the intermediate table, the target field information that does not match the time interval information, where the intermediate table includes the field information of the characteristic fields of the data to be collected, and the data to be collected is the data determined from the business database according to preset collection logic;

The second data collection module is used for collecting the second data from the business database according to the target field information;

The data integration processing module is used for performing data integration processing on the first data and the second data to obtain target data.

A computer device includes a memory, a processor, and a computer program stored in the memory and running on the processor. The processor implements the steps of the data acquisition method when the processor executes the computer program.

A computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, implements the steps of the above-mentioned data acquisition method.

Beneficial effects

The above data collection method, device, computer equipment and storage medium collect the corresponding first data according to a specified time interval, screen the target field information that is not in the specified time interval from an intermediate table containing the field information of the predetermined data to be collected, collect the corresponding second data according to the target field information, and finally integrate the first data and the second data to obtain the target data of the current collection task. With this solution, the data updated within the specified time interval and the historical data associated with it can be collected quickly, and a full computation over the historical data is no longer needed, thereby improving the efficiency of data collection.

Description of drawings

Fig. 1 is an application environment diagram of the data collection method in one embodiment;

Fig. 2 is a schematic flowchart of the data collection method in one embodiment;

Fig. 3 is a technical framework diagram of distributed data collection task execution in an application example;

Fig. 4 is a schematic flowchart of the data collection method in an application example;

Fig. 5 is a structural block diagram of the data collection device in one embodiment;

Fig. 6 is a diagram of the internal structure of a computer device in one embodiment.

Embodiments of the present invention

In order to make the purpose, technical solutions and advantages of the present application more clearly understood, the present application will be described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application, but not to limit the present application.

The data collection method provided in this application can be applied to the application environment shown in FIG. 1. The server 102 acquires the specified time interval information and collects, from the business database 104 according to the time interval information, the first data updated in the corresponding time interval. It then screens, from the field information stored in the intermediate table 106, the target field information that does not match the time interval information; the intermediate table 106 includes the field information of the characteristic fields of the data to be collected, and the data to be collected is the data determined from the business database according to the preset collection logic. The server collects the second data from the business database 104 according to the target field information, and performs data integration processing on the first data and the second data to obtain the target data. The server 102 may be implemented by an independent server or by a server cluster composed of multiple servers.

In one embodiment, as shown in FIG. 2 , a data collection method is provided, which is described by taking the method applied to the server in FIG. 1 as an example, including the following steps:

Step S202: Acquire specified time interval information, and collect first data updated in the corresponding time interval from the service database according to the time interval information.

The business database is a database that stores business data, and may be a relational database or a non-relational database. The business database may contain at least one business data table. The first data is data updated to a certain data table in the business database within a specified time interval.

Specifically, the user can collect data by using the time interval of data update as a condition for data filtering. The time interval information may be information indicating any valid time period or point in time. The server obtains the time interval information specified by the user, uses the time interval information as a condition for data screening, and collects the data updated in the time interval or at the time point corresponding to the time interval information in the business database as the first data.
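
Step S202 can be illustrated with a minimal sketch. It assumes an in-memory SQLite database standing in for the business database, a hypothetical `return_order` table with an `update_time` column, and a half-open interval `[start, end)`; none of these names come from the application itself.

```python
import sqlite3


def collect_first_data(conn, start, end):
    """Return rows of the hypothetical return_order table updated in [start, end)."""
    cur = conn.execute(
        "SELECT order_no, status, update_time FROM return_order "
        "WHERE update_time >= ? AND update_time < ? ORDER BY order_no",
        (start, end),
    )
    return cur.fetchall()


# Stand-in for the business database (assumption: SQLite, ISO-formatted timestamps).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE return_order (order_no TEXT, status TEXT, update_time TEXT)")
conn.executemany(
    "INSERT INTO return_order VALUES (?, ?, ?)",
    [
        ("R1", "refunded", "2021-03-28 10:00:00"),
        ("R2", "created", "2021-03-29 09:30:00"),
        ("R3", "created", "2021-03-29 18:45:00"),
    ],
)

# The specified time interval here is the whole day of 2021-03-29.
first_data = collect_first_data(conn, "2021-03-29 00:00:00", "2021-03-30 00:00:00")
```

Because the timestamps are stored as ISO-formatted text, plain string comparison orders them chronologically, which is why the range filter works without date parsing.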

Step S204: Screen, from the field information stored in the intermediate table, the target field information that does not match the time interval information, where the intermediate table includes the field information of the characteristic fields of the data to be collected, and the data to be collected is the data determined from the business database according to the preset collection logic.

The intermediate table is a data table in the database for storing intermediate calculation results. The data to be collected is the data to be collected determined from the data of the business database according to the user-defined or preset collection logic, and the purpose of determining the data to be collected is to frame the scope of data collection. The characteristic fields can be set adaptively according to different data types. For example, for return order data, the characteristic fields can include at least one of order number, order line number, stock number, table number, and order time. The field information can be the field value in the field.

Specifically, the server matches the field information stored in the intermediate table against the time interval information, selects the field information of the data to be collected that does not fall within the time interval corresponding to the time interval information, and uses the selected field information as the target field information. For example, if the time interval information is yesterday, the field information of the data to be collected that was not updated yesterday is selected from the intermediate table as the target field information.
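
The screening in step S204 might look like the following sketch. It assumes the intermediate table has already been read into a list of dictionaries, and that each entry carries a hypothetical `order_no` characteristic field plus a hypothetical `updated_on` date; the application does not prescribe this layout.

```python
from datetime import date, timedelta


def screen_target_fields(rows, start, end):
    """Return order numbers whose date falls outside the interval [start, end)."""
    return [r["order_no"] for r in rows if not (start <= r["updated_on"] < end)]


# Hypothetical contents of the intermediate table.
intermediate = [
    {"order_no": "O1", "updated_on": date(2021, 3, 20)},
    {"order_no": "O2", "updated_on": date(2021, 3, 29)},
    {"order_no": "O3", "updated_on": date(2021, 3, 25)},
]

# The time interval information is "yesterday", taken here as 2021-03-29.
yesterday = date(2021, 3, 29)
targets = screen_target_fields(intermediate, yesterday, yesterday + timedelta(days=1))
```

Only `O1` and `O3` remain as target field information, since `O2` was updated inside the interval and is already covered by the incremental collection of step S202.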

Step S206: Collect second data from the service database according to the target field information.

The second data refers to the data queried from the business database according to the target field information. Specifically, after acquiring the target field information, the server can use it as a data screening condition and query the business database for the business data that contains the target field information. The query can be written in a query language that matches the business database, for example an IN query in SQL (Structured Query Language), and the business data returned by the query is collected as the second data.
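
A minimal sketch of the IN query in step S206 follows, again with SQLite standing in for the business database. The `orders` table, its columns and the helper name `collect_second_data` are assumptions made for illustration. The target field values are bound as parameters rather than concatenated into the SQL text.

```python
import sqlite3


def collect_second_data(conn, order_nos):
    """Query the hypothetical orders table for the given order numbers with an IN clause."""
    if not order_nos:
        return []  # an empty IN list is not valid SQL, so skip the query
    placeholders = ", ".join("?" for _ in order_nos)
    sql = (
        "SELECT order_no, amount FROM orders "
        f"WHERE order_no IN ({placeholders}) ORDER BY order_no"
    )
    return conn.execute(sql, list(order_nos)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_no TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("O1", 10), ("O2", 20), ("O3", 30)],
)

# Target field information produced by the screening step.
second_data = collect_second_data(conn, ["O1", "O3"])
```

Only the placeholders are interpolated into the statement; the values themselves travel as bound parameters, which keeps the generated statement safe even if a field value contains quotes.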

Step S208: Perform data integration processing on the first data and the second data to obtain target data.

Specifically, data integration processing is performed on the collected first data and second data, and all the data in the data set obtained after the data integration is used as the target data of this data collection task.

In the above data collection method, the corresponding first data is collected according to a specified time interval, the target field information that is not in the specified time interval is screened from an intermediate table containing the field information of the predetermined data to be collected, and the corresponding second data is collected according to the target field information. Finally, the first data and the second data are integrated to obtain the target data of the current collection task. With this solution, the data updated within the specified time interval and the historical data associated with it can be collected quickly, without running a full computation over the historical data, thereby improving the efficiency of data collection.

In one embodiment, the above method further includes: storing the target data in the partition table of the corresponding partition in the data warehouse. In this embodiment, storing the collected target data in the corresponding partition table allows the data to be partitioned and split into tables quickly, reduces the computing resources that the data warehouse of the big data platform spends on partitioning data, and improves the efficiency of data processing. The partition table can be a data table in a Hive database. Data can be written to the partition table in a custom format or in the default format; a Hive data table in a custom format can avoid the data corruption that occurs when the content of some fields contains line breaks.
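
The line-break problem mentioned above can be illustrated with a small sketch. It writes rows to a directory named after the partition (the `dt=...` layout is a common Hive convention) and escapes line breaks and tabs before writing. Escaping is only one possible way to keep a delimited text file intact and is not necessarily the custom format the application refers to; the function names and file name are hypothetical.

```python
import os
import tempfile


def escape_field(value):
    """Escape backslashes, line breaks and tabs so one record stays on one line."""
    text = str(value)
    return (
        text.replace("\\", "\\\\")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )


def write_partition(base_dir, table, dt, rows):
    """Write rows as tab-separated text under <base_dir>/<table>/dt=<dt>/."""
    part_dir = os.path.join(base_dir, table, f"dt={dt}")
    os.makedirs(part_dir, exist_ok=True)
    path = os.path.join(part_dir, "part-00000.txt")
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for row in rows:
            f.write("\t".join(escape_field(v) for v in row) + "\n")
    return path


base = tempfile.mkdtemp()
path = write_partition(
    base,
    "return_order",
    "2021-03-29",
    [("R1", "line one\nline two"), ("R2", "ok")],
)
with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()
```

Without the escaping, the remark field of `R1` would spill onto a second physical line and a line-oriented reader would see three records instead of two.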

In one embodiment, the intermediate table is a data table set in a data warehouse. In this embodiment, the intermediate table is set in the data warehouse of the big data platform. It may be one or more data tables in the data warehouse, and its format is not limited; for example, it may be a Hive data table. In the traditional collection method, an intermediate table is created in each sub-database of the business database, and the data of a given table is then extracted by inner joining that table with the intermediate table. For example, when collecting return table data, the return table is queried with an inner join against the intermediate table, and the newly added return order data and the original order data corresponding to the newly added return orders are collected into the data warehouse for downstream statistical analysis of sales data.

In this embodiment, setting the intermediate table that stores intermediate data for data collection directly in the data warehouse makes it possible to remove the intermediate tables from each sub-database of the business database in the business system. Writing data to an intermediate table in the business database (the data source) requires the data source to be configured with both read and write permissions. Traditional data collection can therefore only use the primary database of the business database, which reduces system performance during collection and affects the normal operation of the business, and the write operations also reduce the security of the database. With the method of this embodiment, the intermediate table created in the business database is removed and the intermediate table no longer needs to be queried with an inner join. The standby business database can therefore be used for data collection without affecting the primary business database, which decouples collection from the business system and safeguards system security.

In one embodiment, before acquiring the specified time interval information, the above method further includes:

Obtain preset collection logic information; determine data in the business database that conforms to the collection logic information as data to be collected; extract field information from characteristic fields of the data to be collected and store it in an intermediate table.

In this embodiment, before each collection task starts, collection logic information that conforms to the business rules of that collection task can be preset, and the scope of data collection can be determined according to the preset collection logic information. That is, the data that satisfies the collection logic information is determined as the data to be collected, and the field information in the characteristic fields of the data to be collected is extracted and stored in the intermediate table. The characteristic fields can be specified in advance according to the collection task; for example, fields such as order number, order line number, stock number, table number or order time can be specified as characteristic fields.
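
Filling the intermediate table can be sketched as follows. Two in-memory SQLite databases stand in for the business database and the data warehouse, and the collection logic is expressed as a hypothetical SQL query that selects the original orders referenced by a return order. The table names, the `task` column and `fill_intermediate` are illustrative assumptions.

```python
import sqlite3

# Stand-in for the business database.
business = sqlite3.connect(":memory:")
business.executescript(
    """
    CREATE TABLE orders (order_no TEXT, order_time TEXT, amount INTEGER);
    CREATE TABLE return_order (return_no TEXT, order_no TEXT);
    INSERT INTO orders VALUES ('O1', '2021-03-01', 10);
    INSERT INTO orders VALUES ('O2', '2021-03-05', 20);
    INSERT INTO orders VALUES ('O3', '2021-03-29', 30);
    INSERT INTO return_order VALUES ('R1', 'O1');
    INSERT INTO return_order VALUES ('R2', 'O3');
    """
)

# Stand-in for the data warehouse that hosts the intermediate table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE intermediate (task TEXT, order_no TEXT, order_time TEXT)")

# Hypothetical collection logic: original orders that have a return order.
COLLECTION_LOGIC = (
    "SELECT order_no, order_time FROM orders "
    "WHERE order_no IN (SELECT order_no FROM return_order)"
)


def fill_intermediate(business, warehouse, task, logic_sql):
    """Extract characteristic fields of the data to be collected into the intermediate table."""
    rows = business.execute(logic_sql).fetchall()
    warehouse.executemany(
        "INSERT INTO intermediate VALUES (?, ?, ?)",
        [(task, order_no, order_time) for order_no, order_time in rows],
    )
    warehouse.commit()
    return len(rows)


count = fill_intermediate(business, warehouse, "return_task", COLLECTION_LOGIC)
stored = warehouse.execute(
    "SELECT order_no FROM intermediate ORDER BY order_no"
).fetchall()
```

Note that the business database is only read from; all writes go to the warehouse side, which matches the motivation given for keeping the intermediate table out of the business database.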

In one embodiment, screening the target field information that does not match the time interval information from the field information stored in the intermediate table includes: acquiring task parameter information, reading the field information that matches the task parameter information from the intermediate table, and storing it in a temporary table; and screening the target field information that does not match the time interval information from the field information stored in the temporary table.

In this embodiment, the task parameters are the parameters of the collection task that the user configures before the collection task is started. For example, data collection can be performed by a Spark task on the big data platform. When the Spark task is started, the server obtains the task parameters configured by the user and loads them into the Spark task. The task parameters can specify, for example, the business database to be queried, the field information of the source table to be collected, and the partition table to be written.

In this embodiment, the intermediate table may contain data to be collected that was determined in advance according to the collection logic information of several different collection tasks. By acquiring and loading the task parameters configured by the user before the collection task is started, the data to be collected for the current collection task can be obtained from the intermediate table and stored in a temporary table for subsequent processing. Setting task parameters also allows tasks to be executed in a distributed manner, which avoids the single-point task problem and improves the efficiency of data collection.
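
The use of task parameters and a temporary table might be sketched as below. SQLite replaces the Spark task and the warehouse table here, and the parameter keys (`source_table`, `target_partition`), the `temp_fields` table and the function name are assumptions for illustration only.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE intermediate (source_table TEXT, order_no TEXT, order_time TEXT)"
)
warehouse.executemany(
    "INSERT INTO intermediate VALUES (?, ?, ?)",
    [
        ("return_order", "O1", "2021-03-01 08:00:00"),
        ("return_order", "O2", "2021-03-29 12:00:00"),
        ("exchange_order", "O9", "2021-03-02 08:00:00"),
    ],
)

# Hypothetical task parameters configured by the user before the task starts.
task_params = {"source_table": "return_order", "target_partition": "dt=2021-03-29"}


def screen_with_temp_table(conn, params, start, end):
    """Copy the rows of the current task into a temporary table, then screen them."""
    conn.execute("DROP TABLE IF EXISTS temp_fields")
    conn.execute("CREATE TEMP TABLE temp_fields (order_no TEXT, order_time TEXT)")
    conn.execute(
        "INSERT INTO temp_fields "
        "SELECT order_no, order_time FROM intermediate WHERE source_table = ?",
        (params["source_table"],),
    )
    # Target field information: entries outside the interval [start, end).
    rows = conn.execute(
        "SELECT order_no FROM temp_fields "
        "WHERE order_time < ? OR order_time >= ? ORDER BY order_no",
        (start, end),
    ).fetchall()
    return [r[0] for r in rows]


targets = screen_with_temp_table(
    warehouse, task_params, "2021-03-29 00:00:00", "2021-03-30 00:00:00"
)
```

The entry `O9` belongs to a different collection task and never reaches the temporary table, so several tasks can share one intermediate table without seeing each other's data.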

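In one embodiment, collecting the second data from the business database according to the target field information includes: generating a structured query language statement by using the target field information as the value corresponding to a query condition; and collecting the second data from the business database according to the structured query language statement.

In this embodiment, by using the target field information as the value corresponding to the query condition and generating a structured query language statement, for example an IN query statement of a relational database, the corresponding data can be located quickly in the business database according to the target field information, which improves the efficiency of data collection.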

In one embodiment, after performing data integration processing on the first data and the second data, the above method further includes: performing data deduplication processing on the data after the data integration processing. In this embodiment, deduplicating the data removes duplicate and redundant data and improves the accuracy of data collection.
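
Integration followed by deduplication can be sketched in a few lines. The sketch assumes that rows are dictionaries, that a hypothetical `order_no` field identifies a record, and that the first occurrence of a key wins; the application does not specify which copy to keep.

```python
def integrate_and_dedup(first_data, second_data, key="order_no"):
    """Concatenate both data sets and keep the first row seen for each key."""
    merged = {}
    for row in list(first_data) + list(second_data):
        merged.setdefault(row[key], row)
    return list(merged.values())


first_data = [
    {"order_no": "O2", "status": "returned"},
    {"order_no": "O4", "status": "created"},
]
second_data = [
    {"order_no": "O1", "status": "paid"},
    {"order_no": "O2", "status": "paid"},  # duplicate key, dropped
]

target_data = integrate_and_dedup(first_data, second_data)
```

Listing the first data ahead of the second data means that, for a key present in both, the freshly updated row from the incremental collection is the one retained.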

In one embodiment, after performing data integration processing on the first data and the second data, the above method further includes: comparing the data after the data integration processing with the data to be collected, and removing the data that differs from the data to be collected. In this embodiment, comparing the predetermined data to be collected with the integrated target data excludes data that lies outside the collection scope, which further improves the accuracy of data collection.
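
The comparison against the data to be collected can be sketched as a membership filter. It assumes the scope is represented by the set of `order_no` values held in the intermediate table; the field name and function name are hypothetical.

```python
def keep_in_scope(integrated, scope_keys, key="order_no"):
    """Drop integrated rows whose key is not part of the data to be collected."""
    allowed = set(scope_keys)
    return [row for row in integrated if row[key] in allowed]


integrated = [
    {"order_no": "O1", "status": "paid"},
    {"order_no": "O2", "status": "returned"},
    {"order_no": "O4", "status": "created"},  # updated in the interval but out of scope
]

# Keys of the data to be collected, as recorded in the intermediate table.
scope_keys = ["O1", "O2"]

target_data = keep_in_scope(integrated, scope_keys)
```

Rows such as `O4` can enter the integrated set through the time-based incremental collection even though the collection logic never selected them, which is the case this filter handles.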

Below, the data collection method of the present application is further described in conjunction with an application example, as shown in Figures 3 to 4. Figure 3 shows a technical framework diagram of the execution of distributed data collection tasks in the application example, and Figure 4 shows a schematic flowchart of the data collection method in the application example, which specifically includes the following steps:

Step 1: Collect data into the intermediate table. The collection logic of the intermediate table can integrate the logic of multiple scenarios, such as the return table and the exchange table, and the corresponding collection logic can be defined according to business requirements. When data is updated, the related data in the extension table, payment table and other tables needs to be re-collected into the latest partition. As long as the intermediate table covers the data to be collected for the collection task, that data will be collected into the latest partition after the service is started.

Step 2: Read the intermediate table data. The Spark collection task reads the intermediate table data into memory to facilitate subsequent data processing.

Step 3: Incrementally collect the return table. This is the first step of business table data collection. Collection is performed according to the update time of the business table data; the incremental data collected is the data that was added or changed yesterday, and it is stored in memory to facilitate the subsequent summary.

Step 4: Collect the data that was not added yesterday. This is the second step of business table data collection. Part of the stock data in the business table is queried with an IN query and stored in memory to facilitate the subsequent summary.

Step 5: Combine and filter the data. The incremental data and the partial stock data collected in the previous two steps are read, summarized and deduplicated, and the result is then filtered against the intermediate table to exclude any data beyond what the intermediate table covers.

Step 6: Write to the target table. The final data from the previous step is automatically matched to the table format according to the target Hive table and target table format configured by the user, and the data is then written to the target partition table.
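
Steps 2 to 6 of the application example can be put together in one small sketch. It is a simplification under stated assumptions: SQLite and plain Python lists replace the Spark task and the Hive tables, the business table is a hypothetical `biz` table, the intermediate table is passed in as a list of `(order_no, update_time)` pairs, and the target partition table is modeled as a dictionary keyed by partition name.

```python
import sqlite3


def run_collection(business, intermediate_rows, start, end):
    """Run steps 2 to 5 and return the rows destined for the target partition."""
    # Step 2: intermediate table data held in memory.
    scope = {order_no for order_no, _ in intermediate_rows}

    # Step 3: incremental data updated in [start, end).
    incremental = business.execute(
        "SELECT order_no, status, update_time FROM biz "
        "WHERE update_time >= ? AND update_time < ?",
        (start, end),
    ).fetchall()

    # Step 4: stock data for intermediate entries outside the interval, via IN.
    targets = [no for no, t in intermediate_rows if not (start <= t < end)]
    stock = []
    if targets:
        marks = ", ".join("?" for _ in targets)
        stock = business.execute(
            f"SELECT order_no, status, update_time FROM biz WHERE order_no IN ({marks})",
            targets,
        ).fetchall()

    # Step 5: combine, deduplicate by order number, filter by the intermediate table.
    merged = {}
    for row in incremental + stock:
        merged.setdefault(row[0], row)
    return [row for no, row in sorted(merged.items()) if no in scope]


business = sqlite3.connect(":memory:")
business.execute("CREATE TABLE biz (order_no TEXT, status TEXT, update_time TEXT)")
business.executemany(
    "INSERT INTO biz VALUES (?, ?, ?)",
    [
        ("O1", "paid", "2021-03-01 08:00:00"),
        ("O2", "returned", "2021-03-29 12:00:00"),
        ("O3", "paid", "2021-03-02 08:00:00"),
        ("O4", "created", "2021-03-29 15:00:00"),
    ],
)
intermediate_rows = [("O1", "2021-03-01 08:00:00"), ("O2", "2021-03-29 12:00:00")]

result = run_collection(
    business, intermediate_rows, "2021-03-29 00:00:00", "2021-03-30 00:00:00"
)

# Step 6: write to the target partition (modeled as a dictionary entry).
target_table = {"dt=2021-03-29": result}
```

In this run, `O2` arrives through the incremental step, `O1` through the IN query, `O4` is dropped by the final filter because the intermediate table does not cover it, and `O3` is never read at all, so no full scan of the historical data takes place.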

It should be understood that although the steps in the flowcharts of FIGS. 2 and 4 are shown in sequence according to the arrows, these steps are not necessarily executed in the sequence shown by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited to that order, and the steps may be performed in other orders. Moreover, at least a part of the steps in FIGS. 2 and 4 may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily executed and completed at the same time, but may be executed at different times. The order of execution of these sub-steps or stages is not necessarily sequential either; they may be performed in turn or alternately with other steps, or with at least a part of the sub-steps or stages of other steps.

In one embodiment, as shown in FIG. 5, a data collection device is provided, including: a first data collection module 510, a field information acquisition module 520, a second data collection module 530, and a data integration processing module 540, wherein:

The first data collection module 510 is configured to obtain the specified time interval information, and collect the updated first data in the corresponding time interval from the service database according to the time interval information;

The field information acquisition module 520 is used to screen, from the field information stored in the intermediate table, the target field information that does not match the time interval information, where the intermediate table includes the field information of the characteristic fields of the data to be collected, and the data to be collected is the data determined from the business database according to the preset collection logic;

The second data collection module 530 is configured to collect the second data from the business database according to the target field information;

The data integration processing module 540 is configured to perform data integration processing on the first data and the second data to obtain target data.

In one embodiment, the data integration processing module 540 is further configured to store the target data in the partition table of the corresponding partition in the data warehouse.

In one embodiment, the first data collection module 510 is further configured to, before obtaining the specified time interval information, obtain preset collection logic information; determine the data in the business database that conforms to the collection logic information as the data to be collected; and extract field information from the characteristic fields of the data to be collected and store it in the intermediate table.

In one embodiment, the field information acquisition module 520 acquires task parameter information, reads the field information that matches the task parameter information from the intermediate table and stores it in the temporary table; and screens the target field information that does not match the time interval information from the field information stored in the temporary table.

In one embodiment, the second data collection module 530 uses the target field information as a value corresponding to the query condition to generate a structured query language; and collects the second data from the business database according to the structured query language.

In one embodiment, the data integration processing module 540 is further configured to perform data deduplication processing on the data after the data integration processing after performing the data integration processing on the first data and the second data.

In one embodiment, the data integration processing module 540 is further configured to, after performing data integration processing on the first data and the second data, compare the data after the data integration processing with the data to be collected, and remove the data that differs from the data to be collected.

For the specific limitation of the data collection device, reference may be made to the limitation of the data collection method above, which will not be repeated here. Each module in the above-mentioned data acquisition device can be implemented in whole or in part by software, hardware and combinations thereof. The above modules can be embedded in or independent of the processor in the computer device in the form of hardware, or stored in the memory in the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.

In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 6. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used to store business data. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements a data collection method.

Those skilled in the art can understand that the structure shown in FIG. 6 is only a block diagram of the part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

In one embodiment, a computer device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the following steps: acquiring specified time interval information, and collecting, from the business database according to the time interval information, the first data updated in the corresponding time interval; screening, from the field information stored in the intermediate table, the target field information that does not match the time interval information, where the intermediate table includes the field information of the characteristic fields of the data to be collected, and the data to be collected is the data determined from the business database according to the preset collection logic; collecting the second data from the business database according to the target field information; and performing data integration processing on the first data and the second data to obtain the target data.

In one embodiment, the processor further implements the following steps when executing the computer program: storing the target data in the partition table of the corresponding partition in the data warehouse.

In one embodiment, before acquiring the specified time interval information, the processor further implements the following steps when executing the computer program: acquiring preset collection logic information; determining data in the business database that conforms to the collection logic information as the data to be collected; and extracting field information from the characteristic fields of the data to be collected and storing it in the intermediate table.

In one embodiment, when the processor executes the computer program to screen the target field information that does not match the time interval information from the field information stored in the intermediate table, it specifically implements the following steps: acquiring task parameter information, reading the field information that matches the task parameter information from the intermediate table, and storing it in the temporary table; and screening the target field information that does not match the time interval information from the field information stored in the temporary table.

In one embodiment, when the processor executes the computer program to collect the second data from the business database according to the target field information, the following steps are specifically implemented: generating a structured query language statement by using the target field information as the value corresponding to the query condition; and collecting the second data from the business database according to the structured query language statement.

In one embodiment, after the processor executes the computer program to perform data integration processing on the first data and the second data, the processor further implements the following step: performing data deduplication processing on the data after the data integration processing.

In one embodiment, after the processor executes the computer program to perform data integration processing on the first data and the second data, it further implements the following steps: comparing the data after the data integration processing with the data to be collected, and removing the data that differs from the data to be collected.
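The two post-processing steps, deduplication and comparison against the data to be collected, can be sketched as below. The row layout, the rule that the most recently updated copy wins, and the use of a key set to represent the data to be collected are assumptions made for the example.

```python
def deduplicate(rows, key="order_id"):
    """Keep one row per key, preferring the most recently updated copy."""
    latest = {}
    for row in rows:
        kept = latest.get(row[key])
        if kept is None or row["update_time"] > kept["update_time"]:
            latest[row[key]] = row
    return list(latest.values())

def keep_expected(rows, expected_keys, key="order_id"):
    """Drop rows that are not part of the data to be collected."""
    return [row for row in rows if row[key] in expected_keys]

# Hypothetical first (incremental) and second (supplementary) data.
first = [
    {"order_id": 2, "update_time": "2021-03-02"},
    {"order_id": 9, "update_time": "2021-03-03"},
]
second = [
    {"order_id": 1, "update_time": "2021-01-05"},
    {"order_id": 2, "update_time": "2021-03-02"},
]
unique = deduplicate(first + second)
target_data = keep_expected(unique, expected_keys={1, 2})
```

Here order 2 appears in both inputs and is kept once, while order 9 was updated in the interval but is not part of the data to be collected and is removed.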

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When the computer program is executed by a processor, the following steps are implemented: acquiring specified time interval information, and collecting, from the business database according to the time interval information, first data updated in the corresponding time interval; filtering, from the field information stored in the intermediate table, target field information that does not match the time interval information, where the intermediate table includes field information of the characteristic fields of the data to be collected, and the data to be collected is data determined from the business database according to preset collection logic; collecting second data from the business database according to the target field information; and performing data integration processing on the first data and the second data to obtain target data.

In one embodiment, the computer program further implements the following steps when executed by the processor: storing the target data in the partition table of the corresponding partition in the data warehouse.

In one embodiment, before acquiring the specified time interval information, the computer program further implements the following steps when executed by the processor: acquiring preset collection logic information; determining the data in the business database that conforms to the collection logic information as the data to be collected; and extracting field information from the characteristic fields of the data to be collected and storing it in the intermediate table.

In one embodiment, when the computer program is executed by the processor to filter, from the field information stored in the intermediate table, the target field information that does not match the time interval information, the following steps are specifically implemented: acquiring task parameter information; reading, from the intermediate table, the field information that matches the task parameter information and storing it in a temporary table; and filtering, from the field information stored in the temporary table, the target field information that does not match the time interval information.

In one embodiment, when the computer program is executed by the processor to collect the second data from the business database according to the target field information, the following steps are specifically implemented: generating a structured query language (SQL) statement by using the target field information as the value of a query condition; and collecting the second data from the business database according to the SQL statement.

In one embodiment, after the computer program is executed by the processor to perform data integration processing on the first data and the second data, the following step is further implemented: performing data deduplication processing on the data after the data integration processing.

In one embodiment, after the computer program is executed by the processor to perform data integration processing on the first data and the second data, the following steps are further implemented: comparing the data after the data integration processing with the data to be collected, and removing the data that differs from the data to be collected.

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it may include the processes of the above method embodiments. Any reference to memory, storage, database or other medium used in the various embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in various forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as a combination of these technical features involves no contradiction, it should be considered to be within the scope described in this specification.

The above embodiments represent only several implementations of the present application, and although their descriptions are specific and detailed, they should not be construed as limiting the scope of the patent. It should be noted that those skilled in the art can make several modifications and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims

A data collection method, the method comprising:

acquiring specified time interval information, and collecting, from a business database according to the time interval information, first data updated in the corresponding time interval;

filtering, from field information stored in an intermediate table, target field information that does not match the time interval information, where the intermediate table includes field information of characteristic fields of data to be collected, and the data to be collected is data determined from the business database according to preset collection logic;

collecting second data from the business database according to the target field information; and

performing data integration processing on the first data and the second data to obtain target data.
The method according to claim 1, wherein the method further comprises:

storing the target data in the partition table of the corresponding partition in the data warehouse.
The method according to claim 2, wherein the intermediate table is a data table set in the data warehouse.
The method according to claim 1, characterized in that, before acquiring the specified time interval information, the method further comprises:

acquiring preset collection logic information;

determining the data in the business database that conforms to the collection logic information as the data to be collected; and

extracting field information from the characteristic fields of the data to be collected and storing it in the intermediate table.
The method according to claim 1, wherein the filtering of target field information that does not match the time interval information from the field information stored in the intermediate table comprises:

acquiring task parameter information, reading field information that matches the task parameter information from the intermediate table, and storing it in a temporary table; and

filtering, from the field information stored in the temporary table, the target field information that does not match the time interval information.
The method according to claim 1, wherein the collecting of the second data from the business database according to the target field information comprises:

generating a structured query language (SQL) statement by using the target field information as the value of a query condition; and

collecting the second data from the business database according to the SQL statement.
The method according to any one of claims 1 to 6, wherein after performing data integration processing on the first data and the second data, the method further comprises:

performing data deduplication on the data after the data integration processing; and/or

comparing the data after the data integration processing with the data to be collected, and removing the data that differs from the data to be collected.
A data collection device, characterized in that the device comprises:

a first data collection module, configured to acquire specified time interval information, and collect, from a business database according to the time interval information, first data updated in the corresponding time interval;

a field information acquisition module, configured to filter, from field information stored in an intermediate table, target field information that does not match the time interval information, where the intermediate table includes field information of characteristic fields of data to be collected, and the data to be collected is data determined from the business database according to preset collection logic;

a second data collection module, configured to collect second data from the business database according to the target field information; and

a data integration processing module, configured to perform data integration processing on the first data and the second data to obtain target data.
A computer device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.
A computer-readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 7 are implemented.