CN115455110A - Incremental data acquisition method and device, storage medium and processor - Google Patents

Incremental data acquisition method and device, storage medium and processor

Info

Publication number
CN115455110A
Authority
CN
China
Prior art keywords
data
incremental data
incremental
cursor
client
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210910737.1A
Other languages
Chinese (zh)
Inventor
朱德斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianyi Cloud Technology Co Ltd
Original Assignee
Tianyi Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianyi Cloud Technology Co Ltd
Priority to CN202210910737.1A
Publication of CN115455110A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data

Abstract

The application discloses an incremental data acquisition method and device, a storage medium and a processor. The method comprises the following steps: when a server detects that target data has generated incremental data, the server sends an incremental data acquisition signaling to a client, wherein the client is used for requesting the incremental data; the server receives an incremental data request sent by the client, wherein the incremental data request comprises the data set cursor from the client's previous acquisition of incremental data; and the server, in response to the incremental data request, searches for and acquires the incremental data according to the data set cursor, wherein the data set cursor comprises a plurality of parameter items used for searching for the incremental data. This solves the problem in the related art that, when clients request data synchronization from a server, the conditions under which the clients acquire a data set differ, so that the server must prepare incremental data for each client separately and incremental data preparation is inefficient.

Description

Incremental data acquisition method and device, storage medium and processor
Technical Field
The present application relates to the field of data synchronization, and in particular, to a method and an apparatus for obtaining incremental data, a storage medium, and a processor.
Background
In a cloud service scenario, a system comprises a cloud server and a plurality of clients. The clients request data set synchronization from the cloud server as needed, and the synchronization is either full or incremental. When the number of clients is large, however, the server has to push incremental data through a message queue or a long-lived connection. This approach must account for the differing conditions under which each client acquires the data set, so preparing the incremental data takes a long time, and idempotency of the incremental data on the client is difficult to guarantee. With a full acquisition scheme, parsing the data creates a memory bottleneck on the client, and querying the data puts memory and disk pressure on the server.
In the related art, when clients request data synchronization from a server, the conditions under which the clients acquire a data set differ, so the server must provide incremental data to each client separately and incremental data preparation is inefficient. No effective solution to this problem has yet been proposed.
Disclosure of Invention
The main purpose of the present application is to provide an incremental data acquisition method and system, so as to solve the problem in the related art that, when clients request data synchronization from a server, the conditions under which the clients acquire a data set differ, so that the server must provide incremental data to each client separately and incremental data preparation is inefficient.
In order to achieve the above object, according to one aspect of the present application, an incremental data acquisition method is provided, including: sending an incremental data acquisition signaling to a client when it is detected that target data has generated incremental data, wherein the client is used for requesting the incremental data; receiving an incremental data request sent by the client, wherein the incremental data request comprises the data set cursor from the client's previous acquisition of incremental data; and, in response to the incremental data request, searching for and acquiring the incremental data according to the data set cursor, wherein the data set cursor comprises a plurality of parameter items used for searching for the incremental data.
Optionally, before searching for and acquiring the incremental data according to the data set cursor in response to the incremental data request, the method further includes: judging whether the data set identifier in the data set cursor is consistent with the data set identifier currently used by the client, wherein the incremental data request further comprises the data set identifier currently used by the client, and the data set identifier identifies a modified version of the data set; initializing an offset cursor of the target data when the data set identifier in the data set cursor is consistent with the data set identifier currently used by the client, wherein the data set cursor further comprises the offset cursor of the target data, and the offset cursor is used for searching for incremental data of the target data; and, when the data set identifier in the data set cursor is inconsistent with the data set identifier currently used by the client, replacing the data set identifier in the data set cursor with the data set identifier currently used by the client and initializing the offset cursor of the target data.
Optionally, initializing the offset cursor of the target data includes: setting the search start time to the minimum creation time of the target data, setting the search end time to the current time, and judging whether the primary key identifier between the start time and the end time is empty, wherein the offset cursor comprises the search start time, the search end time and the search batch, and the batch corresponds to the primary key identifier; when the primary key identifier between the start time and the end time is empty, setting the search batch to the initial batch number and searching the target data; and, when the primary key identifier between the start time and the end time is not empty, determining the batch number to be searched according to the primary key identifier in the data set cursor and continuing to search the dimension of the target data from that batch number, wherein the data set cursor further comprises the primary key identifier of the data set of the incremental data last acquired by the client.
Optionally, after obtaining search data of multiple dimensions as the incremental data by traversing the offset cursors, the method includes: monitoring whether the target data generates new incremental data after the current time; and sending the search result to the client together with information indicating whether new incremental data has been generated, wherein the client uses the information to determine whether to generate, from the current data set cursor, an incremental data acquisition request for the next batch and send it to the server to acquire the new incremental data.
Optionally, searching for and acquiring the incremental data according to the data set cursor in response to the incremental data request includes: in response to the incremental data request, searching for the incremental data of the corresponding target data according to an offset cursor to obtain the search result of the current batch, wherein the data set cursor comprises a plurality of offset cursors, and each offset cursor corresponds to a dimension of the target data; determining the search data of the dimension to which the target data belongs according to the search result; and obtaining search data of multiple dimensions as the incremental data by traversing the offset cursors.
Optionally, after searching for the incremental data of the corresponding target data according to the offset cursor in response to the incremental data request and obtaining the search result of the current batch, the method further includes: judging whether the data volume of the search result is smaller than the preset maximum data volume per batch that is set in the data set cursor; when the data volume is not smaller than the preset data volume, continuing with the search of the next batch and determining its search result, until the data volume is smaller than the preset data volume; and when the data volume is smaller than the preset data volume, ending the search and determining the search data of the dimension to which the target data belongs from the search results of the batches searched.
Optionally, before initializing the offset cursor of the target data, the method further includes: performing serialization processing on the data set cursor to generate a data set cursor object, wherein the data set cursor object comprises a plurality of offset cursor objects, and the offset cursor objects are used for performing the initialization.
In order to achieve the above object, according to another aspect of the present application, an incremental data acquisition device is provided. The device includes: a monitoring module, used for sending an incremental data acquisition signaling to a client when it is detected that target data has generated incremental data, wherein the client is used for requesting the incremental data; a receiving module, used for receiving an incremental data request sent by the client, wherein the incremental data request comprises the data set cursor from the client's previous acquisition of incremental data; and an acquisition module, used for searching for and acquiring the incremental data according to the data set cursor in response to the incremental data request, wherein the data set cursor comprises a plurality of parameter items used for searching for the incremental data.
According to another aspect of the present application, there is also provided a computer-readable storage medium storing a program, wherein the program executes the incremental data acquisition method of any one of the above.
According to another aspect of the present application, there is also provided an electronic device, comprising one or more processors and a memory for storing one or more programs, wherein when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the incremental data acquisition method of any one of the above.
According to the present application, an incremental data acquisition signaling is sent to the client when it is detected that target data has generated incremental data, wherein the client is used for requesting the incremental data; an incremental data request sent by the client is received, wherein the incremental data request comprises the data set cursor from the client's previous acquisition of incremental data; and, in response to the incremental data request, the incremental data is searched for and acquired according to the data set cursor, wherein the data set cursor comprises a plurality of parameter items used for searching for the incremental data. After detecting that incremental data has been generated, the server sends the acquisition signaling to notify the client; if the client needs the incremental data, it sends a data request, and the server searches for and acquires the incremental data according to the data set cursor carried in the request. The information in the data set cursor makes the incremental data query more efficient, and the acquisition signaling starts the acquisition flow in real time, so that the actions of the client and the server are closely matched and incremental data preparation becomes more efficient. This solves the problem in the related art that, when clients request data synchronization from a server, the server must provide incremental data to each client separately and the time at which a client acquires data cannot be fully matched to the time at which the server prepares it, which makes incremental data preparation inefficient.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application. In the drawings:
FIG. 1 is a flow chart of a method for incremental data acquisition according to an embodiment of the present application;
FIG. 2 is a flow chart of an incremental data acquisition method provided in accordance with an embodiment of the present application;
FIG. 3 is a schematic diagram of an incremental data acquisition batch provided in accordance with an embodiment of the present application;
FIG. 4 is a schematic diagram of an incremental data acquisition device provided in accordance with an embodiment of the present application;
FIG. 5 is a schematic diagram of an electronic device provided according to an embodiment of the present application.
Detailed Description
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
In order to make the technical solutions of the present application better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be used. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that relevant information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for presentation, analyzed data, etc.) referred to in the present disclosure are information and data that are authorized by the user or sufficiently authorized by various parties. For example, an interface is provided between the system and the relevant user or organization, before obtaining the relevant information, an obtaining request needs to be sent to the user or organization through the interface, and after receiving the consent information fed back by the user or organization, the relevant information is obtained.
Examples
The present invention is described below with reference to preferred implementation steps. FIG. 1 is a flowchart of an incremental data acquisition method provided in an embodiment of the present application. As shown in FIG. 1, the method includes the following steps:
Step S101, sending an incremental data acquisition signaling to a client when it is detected that the target data has generated incremental data, wherein the client is used for requesting the incremental data;
Step S102, receiving an incremental data request sent by the client, wherein the incremental data request comprises the data set cursor from the client's previous acquisition of incremental data;
Step S103, in response to the incremental data request, searching for and acquiring the incremental data according to the data set cursor, wherein the data set cursor comprises a plurality of parameter items used for searching for the incremental data.
Through the above steps, the server sends an acquisition signaling to the client after detecting that incremental data has been generated, which notifies the client that incremental data is available. If the client needs the incremental data, it sends a data request, and the server searches for and acquires the incremental data according to the data set cursor carried in the request. The information in the data set cursor makes the incremental data query more efficient, and the acquisition signaling starts the acquisition flow in real time, so that the actions of the client and the server are closely matched and incremental data preparation becomes more efficient. This solves the problem in the related art that, when clients request data synchronization from a server, the server must provide incremental data to each client separately and the time at which a client acquires data cannot be fully matched to the time at which the server prepares it, which makes incremental data preparation inefficient.
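Steps S101 to S103 can be illustrated with a minimal in-memory sketch. It is not the implementation of this embodiment: the class name `IncrementalServer`, the signal string `INCREMENT_AVAILABLE`, and the reduction of the data set cursor to a single primary-key position are all assumptions made to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class IncrementalServer:
    """Hypothetical server that keeps rows as (primary key, payload) pairs."""
    rows: List[Tuple[int, str]] = field(default_factory=list)
    subscribers: List[Callable[[str], None]] = field(default_factory=list)

    def insert(self, payload: str) -> None:
        # S101: the target data produced an increment, so signal every client.
        self.rows.append((len(self.rows) + 1, payload))
        for notify in self.subscribers:
            notify("INCREMENT_AVAILABLE")  # assumed signal name

    def handle_request(self, cursor: Dict[str, int]) -> Dict[str, object]:
        # S102: the request carries the cursor from the client's last fetch.
        last_id = cursor.get("cursorTbId", 0)
        # S103: search only the rows that lie after the cursor position.
        data = [row for row in self.rows if row[0] > last_id]
        new_cursor = {"cursorTbId": data[-1][0] if data else last_id}
        return {"data": data, "cursor": new_cursor}
```

A client would register a callback in `subscribers`, call `handle_request` with its stored cursor when signaled, and keep the returned cursor for the next request.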
The above steps may be executed by a server device that has data processing and computing capabilities, such as a server, a processor, or a computer. In the cloud service scenario of this embodiment, the server device may serve multiple clients at the same time, and the clients request incremental data from the server for synchronization as needed. However, because the conditions under which the clients acquire data sets differ, the server has to provide incremental data to each client separately, and the time at which a client acquires data cannot be fully matched to the time at which the server prepares it, which makes incremental data preparation inefficient.
In this embodiment, the server monitors for incremental data on its own and sends the incremental data acquisition signaling to the client when it detects that the target data has generated incremental data. On receiving the signaling, the client knows that incremental data is available on the server, and it can set how acquisition is triggered according to user requirements. The client may send the incremental data request to the server automatically as soon as the signaling is received, which keeps client and server data as consistent as possible and keeps synchronization close to real time. Alternatively, after receiving the signaling, the client may decide whether to initiate the request according to its needs.
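The two trigger modes on the client side can be sketched as follows. The class `SyncClient`, its `auto` flag, and the method names are hypothetical and serve only to contrast requesting immediately on signaling with deferring the request.

```python
from typing import Callable


class SyncClient:
    """Hypothetical client that reacts to the acquisition signaling."""

    def __init__(self, request_increment: Callable[[], None], auto: bool = True):
        self.request_increment = request_increment  # sends the incremental data request
        self.auto = auto                            # True: request as soon as signaled
        self.pending = False                        # a signal arrived but was not acted on

    def on_signal(self) -> None:
        if self.auto:
            # Automatic mode: request immediately to stay close to real time.
            self.request_increment()
        else:
            # Deferred mode: remember the signal and let the caller decide later.
            self.pending = True

    def sync_if_pending(self) -> None:
        if self.pending:
            self.pending = False
            self.request_increment()
```

In deferred mode the application calls `sync_if_pending` whenever it chooses to synchronize, for example when the user opens the relevant view.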
The data set cursor may include the following parameter items: dataSetId, the data set identifier from the last acquisition of incremental data; cursor_i, the offset cursor i of the target data in a given dimension; cursorTbId, the table primary key cursor id within the time window; fromTime, the start time of the time window; and toTime, the end time of the time window. The server uses these parameters in the subsequent search for incremental data.
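One possible in-memory shape for these parameter items, together with a JSON round trip for carrying the cursor in a request, is sketched below. The nesting of the time window and primary key inside each offset cursor, the `limit` field, the integer timestamps, and the use of 0 to mean "no primary key yet" are assumptions, not details stated in this embodiment.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass
class OffsetCursor:
    fromTime: int = 0    # start of the time window (epoch seconds, assumed unit)
    toTime: int = 0      # end of the time window
    cursorTbId: int = 0  # table primary key cursor id; 0 means none yet (assumed)


@dataclass
class DataSetCursor:
    dataSetId: str = ""  # data set identifier from the last acquisition
    limit: int = 100     # assumed field: maximum rows per batch
    cursors: Dict[str, OffsetCursor] = field(default_factory=dict)  # one per dimension

    def to_json(self) -> str:
        # asdict() recurses into the nested OffsetCursor values.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "DataSetCursor":
        raw = json.loads(text)
        cursors = {name: OffsetCursor(**item)
                   for name, item in raw.get("cursors", {}).items()}
        return cls(dataSetId=raw.get("dataSetId", ""),
                   limit=raw.get("limit", 100),
                   cursors=cursors)
```

Keeping one `OffsetCursor` per dimension under a key such as `cursor_1` lets the server resume each dimension independently.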
In this embodiment, because the server actively notifies the client that incremental data has been generated, the times at which clients acquire the incremental data can be concentrated in a short period after the incremental data is generated, so that the time at which a client requests the incremental data matches the time at which the server generates it.
In step S102, the server receives an incremental data request sent by the client, where the incremental data request includes the data set cursor from the client's previous acquisition of incremental data. After receiving the request, the server responds to it by searching for and acquiring the incremental data according to the data set cursor, where the data set cursor includes a plurality of parameter items used for searching for the incremental data.
Optionally, before searching for and acquiring the incremental data according to the data set cursor in response to the incremental data request, the method further includes: judging whether the data set identifier in the data set cursor is consistent with the data set identifier currently used by the client, where the incremental data request further includes the data set identifier currently used by the client, and the data set identifier identifies a modified version of the data set. The purpose is to determine whether the data set identifier recorded in the data set cursor matches the one the client is currently using.
If the data set identifier in the data set cursor is consistent with the data set identifier currently used by the client, the data set currently used by the client can be determined from the data set identifier in the data set cursor. If the two are inconsistent, it cannot. Therefore, in the case of inconsistency, the data set identifier in the data set cursor is replaced with the data set identifier currently used by the client.
It should be noted that judging whether the data set identifier in the data set cursor is consistent with the data set identifier currently used by the client requires first determining the data set identifier currently used by the client. This judgment may be performed by the client or by the server. When it is performed by the client, the client requests the currently used data set identifier from the server in advance.
When the data set identifier in the data set cursor is consistent with the data set identifier currently used by the client, the offset cursor of the target data is initialized, where the data set cursor further includes the offset cursor of the target data, and the offset cursor is used for searching for incremental data of the target data. When the data set identifier in the data set cursor is inconsistent with the data set identifier currently used by the client, the data set identifier in the data set cursor is replaced with the data set identifier currently used by the client, and the offset cursor of the target data is then initialized.
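The two branches differ only in whether the identifier is overwritten first; both end by initializing the offset cursor. A minimal sketch, assuming the cursor is a plain dictionary and that `init_offset_cursor` is a hypothetical callback supplied by the caller:

```python
from typing import Any, Callable, Dict


def prepare_cursor(cursor: Dict[str, Any],
                   current_data_set_id: str,
                   init_offset_cursor: Callable[[Dict[str, Any]], None]) -> Dict[str, Any]:
    """Reconcile the data set identifier, then initialize the offset cursor."""
    if cursor.get("dataSetId") != current_data_set_id:
        # Inconsistent: adopt the identifier the client is currently using.
        cursor["dataSetId"] = current_data_set_id
    # Both branches initialize the offset cursor before the search starts.
    init_offset_cursor(cursor)
    return cursor
```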
Initializing the offset cursor of the target data prepares for the search for incremental data. Optionally, initializing the offset cursor of the target data includes: setting the search start time to the minimum creation time of the target data, setting the search end time to the current time, and judging whether the primary key identifier between the start time and the end time is empty, where the offset cursor includes the search start time, the search end time and the search batch, and the batch corresponds to the primary key identifier.
If the primary key identifier between the start time and the end time is empty, the data set containing the target data has never been acquired, and all data in that data set should be taken as the incremental data for this acquisition. Note that the incremental data request sent by the client carries the maximum amount of data to be acquired in a single batch, for example limit, so the incremental data has to be acquired batch by batch in sequence.
When the data set of the target data has not been acquired at any point since its creation, the target data has to be acquired starting from the initial batch, that is, from the beginning. The search batch is therefore set to the initial batch number, and the target data is searched.
If the primary key identifier is not empty, some data between the start time and the end time in the data set containing the target data has already been acquired, and the remaining data in that data set is the incremental data for this acquisition. The primary key identifier in the data set cursor is used to determine which batches have already been acquired and hence the batch number to search next; the search of the dimension of the target data then continues from that batch number, that is, acquisition of incremental data resumes after the batches already acquired. The data set cursor also includes the primary key identifier of the data set of the incremental data last acquired by the client, i.e., the dataSetId, for identifying the data already acquired.
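The initialization described above can be sketched as a single function. The row representation, the 1-based batch numbering, and the rule that maps a primary key to a batch number (position of the key among the keys in the window, divided by `limit`) are assumptions; this embodiment does not specify how the batch number is derived.

```python
import time
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, float]  # (primary key, creation time), an assumed representation


def init_offset_cursor(rows: List[Row],
                       cursor_tb_id: Optional[int],
                       limit: int,
                       now: Optional[float] = None) -> Dict[str, object]:
    """Set the time window and decide which batch to search first."""
    now = time.time() if now is None else now
    # Start time: minimum creation time of the target data; end time: now.
    from_time = min((created for _, created in rows), default=now)
    in_window = sorted(pk for pk, created in rows if from_time <= created <= now)
    if cursor_tb_id is None or cursor_tb_id not in in_window:
        # No primary key recorded inside the window: start from the initial batch.
        batch = 1
    else:
        # Assumed rule: count the rows up to the recorded key and skip the
        # batches they fill.
        fetched = in_window.index(cursor_tb_id) + 1
        batch = fetched // limit + 1
    return {"fromTime": from_time, "toTime": now,
            "cursorTbId": cursor_tb_id, "batch": batch}
```

Passing `now` explicitly keeps the function deterministic in tests; in a running server it would default to the current time.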
Optionally, after obtaining search data of multiple dimensions as the incremental data by traversing the offset cursors, the method includes: monitoring whether the target data generates new incremental data after the current time; and sending the search result to the client together with information indicating whether new incremental data has been generated, where the client uses the information to determine whether to generate, from the current data set cursor, an incremental data acquisition request for the next batch and send it to the server to acquire the new incremental data.
It should be noted that incremental data may be generated continuously, so a single acquisition triggered on detection may not capture all of the current incremental data. Sending the acquisition signaling again in that case wastes resources in the flow and leaves room for optimization in efficiency. Therefore, in this embodiment, after the incremental data is acquired, the server automatically checks whether new incremental data has been generated; if so, the acquired incremental data is sent to the client and acquisition of the new incremental data is triggered in parallel. In this way, incremental data that is generated continuously can be acquired automatically and continuously, which improves acquisition efficiency and streamlines the acquisition flow.
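The continuation logic can be sketched as a response that carries a flag plus a client loop that keeps requesting while the flag is set. The field name `hasNew`, the half-open time window, and the injected `clock` function are assumptions introduced for the example.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, float]  # (primary key, creation time), an assumed representation


def build_response(rows: List[Row], cursor: Dict[str, float]) -> Dict[str, object]:
    """Return the rows inside the window and whether newer rows already exist."""
    # Half-open window (fromTime, toTime] so consecutive windows never overlap.
    data = [r for r in rows if cursor["fromTime"] < r[1] <= cursor["toTime"]]
    # Anything created after toTime is new incremental data for a later request.
    has_new = any(r[1] > cursor["toTime"] for r in rows)
    return {"data": data, "hasNew": has_new}


def sync(rows: List[Row], start: float, clock: Callable[[], float]) -> List[Row]:
    """Client loop: keep requesting while the server reports new data."""
    received: List[Row] = []
    cursor = {"fromTime": start, "toTime": clock()}
    while True:
        response = build_response(rows, cursor)
        received.extend(response["data"])
        if not response["hasNew"]:
            return received
        # Slide the window forward; the clock is assumed to advance between calls.
        cursor = {"fromTime": cursor["toTime"], "toTime": clock()}
```

Because the flag travels with the data, the client can continue without waiting for another acquisition signaling from the server.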
Optionally, searching for and acquiring the incremental data according to the data set cursor in response to the incremental data request includes: in response to the incremental data request, searching for the incremental data of the corresponding target data according to an offset cursor to obtain the search result of the current batch, where the data set cursor includes a plurality of offset cursors, and each offset cursor corresponds to a dimension of the target data; determining the search data of the dimension to which the target data belongs according to the search result; and obtaining search data of multiple dimensions as the incremental data by traversing the offset cursors.
The dimension of the data can be understood as the type of the target data. Because a business system is complex and covers many services, the target data that makes up the incremental data may be of different types from several different services. The dimension characterizes information such as the data set or usage scenario of the target data and its data type.
The offset cursor can be understood as a search tool for searching the data set in which the target data is located. The data set is searched using the search batch together with the maximum data volume per batch, and all data after the last acquired batch is acquired as the incremental data of the target data.
Optionally, after searching for the incremental data of the corresponding target data according to the offset cursor in response to the incremental data request and obtaining the search result of the current batch, the method further includes: determining whether the data volume in the search result is smaller than the preset data volume, that is, the maximum data volume searched in each batch as set in the data set cursor; when the data volume is not smaller than the preset data volume, continuing with the search of the next batch and determining its search result, until the data volume is smaller than the preset data volume; and when the data volume is smaller than the preset data volume, ending the search and determining the search data of the dimension to which the target data belongs according to the search results of the batches searched.
It should be noted that incremental data is searched batch by batch, and each batch yields a search result containing the data found in that batch. However, incremental data may be generated continuously, and the generated incremental data carries no identifier that marks where it ends or when the search should stop. Whether the search for incremental data is complete can instead be determined from the data volume of the search result and the maximum data volume acquired in a single batch.
When the data volume is not smaller than the preset data volume, the data volume of the search result is in practice equal to the preset data volume, because a single batch never returns more than the maximum data volume. In this case it cannot be determined whether all incremental data has been acquired, so the search continues with the next batch and its search result is determined, until the data volume is smaller than the preset data volume. When the data volume is smaller than the preset data volume, the remaining data was not enough to fill a batch, which indicates that the incremental data search is complete; the search ends, and the search data of the dimension to which the target data belongs is determined according to the search results of the batches searched.
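The stopping rule described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the in-memory `TABLE`, and the use of `tb_id` as the resume position are assumptions made for the example.

```python
from typing import Callable, List, Sequence


def fetch_all_batches(
    fetch_batch: Callable[[int, int], Sequence[dict]], limit: int
) -> List[dict]:
    """Collect batches until one comes back with fewer than `limit` rows."""
    rows: List[dict] = []
    last_id = 0  # primary-key position of the last row already fetched
    while True:
        batch = fetch_batch(last_id, limit)
        rows.extend(batch)
        if len(batch) < limit:
            # A short (or empty) batch means nothing is left to fetch.
            break
        # A full batch cannot prove completion, so continue after its last row.
        last_id = batch[-1]["tb_id"]
    return rows


# Hypothetical in-memory source standing in for the database table.
TABLE = [{"tb_id": i} for i in range(1, 8)]


def fetch_from_table(after_id: int, limit: int) -> List[dict]:
    """Return up to `limit` rows whose tb_id is greater than `after_id`."""
    return [row for row in TABLE if row["tb_id"] > after_id][:limit]


result = fetch_all_batches(fetch_from_table, limit=3)
```

With seven rows and a batch size of three, the loop fetches two full batches and stops on the third, which holds a single row.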
Optionally, before initializing the offset cursor of the target data, the method further includes: serializing the data set cursor to generate a data set cursor object, where the data set cursor object includes a plurality of offset cursor objects and the offset cursor objects are used for the initialization processing.
Because the data set cursor is transmitted as a field whose format does not necessarily suit the usage scenario of the server, the data set cursor can be processed so that the server can use it effectively. For example, the server serializes the data set cursor into a data set cursor object: if the client stores the cursor as a JSON string and the backend is implemented in Java, a component that converts JSON into objects can be considered.
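As an illustration of converting between the cursor string held by the client and a cursor object used by the server, the following sketch uses Python dataclasses rather than the Java component mentioned above. The field names follow the data set cursor structure described for the alternative embodiment; the class names, the `cursors` mapping, and the zlib plus Base64 encoding are assumptions chosen for the example (the embodiment only suggests that a string compression algorithm may be considered to keep the cursor compact).

```python
import base64
import json
import zlib
from dataclasses import asdict, dataclass, field
from typing import Dict, Optional


@dataclass
class OffsetCursor:
    """Offset cursor of one dimension (hypothetical class name)."""
    cursorTbId: Optional[int] = None
    fromTime: Optional[str] = None
    toTime: Optional[str] = None


@dataclass
class DataSetCursor:
    """Data set cursor holding one offset cursor per dimension."""
    dataSetId: Optional[str] = None
    cursors: Dict[str, OffsetCursor] = field(default_factory=dict)


def encode_cursor(obj: DataSetCursor) -> str:
    """Object -> compact JSON -> zlib -> URL-safe Base64 string."""
    raw = json.dumps(asdict(obj), separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(zlib.compress(raw)).decode("ascii")


def decode_cursor(text: str) -> DataSetCursor:
    """Inverse of encode_cursor: string -> cursor object."""
    data = json.loads(zlib.decompress(base64.urlsafe_b64decode(text)))
    cursors = {name: OffsetCursor(**c) for name, c in data["cursors"].items()}
    return DataSetCursor(dataSetId=data["dataSetId"], cursors=cursors)


example = DataSetCursor(
    "ds-1", {"cursor_1": OffsetCursor(42, "2022-07-01 00:00:00", None)}
)
round_tripped = decode_cursor(encode_cursor(example))
```

The client only stores and returns the opaque string; the server is the only party that needs to understand its contents.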
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.
It should be noted that the present application also provides an alternative embodiment, which is described in detail below.
This embodiment provides an incremental data acquisition scheme that supports multi-dimensional acquisition based on time intervals, primary key cursors, sliding windows, and batch acquisition. The scheme helps ensure that the client acquires incremental data in real time and improves the server's performance in computing incremental data. Incremental update acquisition is triggered by signaling, which ensures both the real-time performance and the query performance of data acquisition.
Fig. 2 is a flowchart of an incremental data acquisition method provided in an embodiment of the present application, and as shown in fig. 2, the incremental data acquisition flow includes the following steps.
1) When the server determines that a data record has been added, updated, or deleted in some dimension, it sends an incremental data acquisition signaling for the specified dimension;
2) The client receives the incremental data acquisition signaling sent by the server and prepares to initiate a data set query. The query parameter format is shown in Table 1, the client query parameter description table:
Table 1 Client query parameter description table
Name | Description
myDataSetId | Current data set id of the client
limit | Query batch size
dataSetCursor | Data set cursor, generated by the server and stored by the client
The client query parameter structure is exemplified as follows:
[Image BDA0003773891910000091: example of the client query parameter structure]
3) The client passes the data set cursor (dataSetCursor) to query the incremental data set. It should be noted that, on the first query or when the data set id (dataSetId) has changed, the offset cursor i of the dimension's data (cursor_i) is set to null;
The data structure of the data set cursor includes: dataSetId, the data set id (of the incremental data acquired last time); cursor_i, the offset cursor i of a certain dimension's data; cursorTbId, the table primary key cursor id within the time window; fromTime, the start time of the time window; and toTime, the end time of the time window. Details are given in Table 2, the data structure description of the data set cursor.
Table 2 Data structure description of the data set cursor
[Image BDA0003773891910000101: Table 2]
The data structure of the dataset cursor is exemplified as follows:
[Image BDA0003773891910000102: example of the data set cursor data structure]
4) The server receives a data set query request of the client and prepares query data;
5) The server serializes the data set cursor (dataSetCursor) into a data set cursor object (dataSetCursorObj), that is, parses the cursor string into an object. For example, if the client stores the cursor as a JSON string and the server is implemented in Java, a component that converts JSON into objects can be considered. The attributes of the data set cursor object (dataSetCursorObj) are initialized in step 7);
6) Determine whether the data set id (dataSetId) in the data set cursor object (dataSetCursorObj) is equal to the data set id (myDataSetId) in the request parameters; if they are not equal, set the offset cursor i of the dimension's data (cursor_i) to an object whose attributes are all null;
7) Initializing a certain dimension data offset cursor i (cursor _ i);
the initialization algorithm references the following pseudo-code:
hasNext:=false
if fromTime==null then fromTime:=#{min_time}
if toTime==null then toTime:=#{current_time}
if cursorTbId==null then cursorTbId:=0L else hasNext:=true
FIG. 3 is a schematic diagram of incremental data acquisition batches according to an embodiment of the present application. As shown in FIG. 3, when cursorTbId in the query request parameter is not null, the query is still moving within a time window, so hasNext:=true, as for the second and third batches in FIG. 3; otherwise, hasNext:=false, and hasNext is then determined from the number of records in the query result, as for the first batch in FIG. 3.
The initialization algorithm refers to attribute information of the server, as shown in Table 3, the server attribute information table.
Table 3 Server attribute information table
Name | Description
minTime | Minimum time of service start
currentTime | Current time
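Steps 6) and 7) can be sketched together in Python as shown below. This is an illustrative translation of the pseudo-code under stated assumptions: the `OffsetCursor` class, the function name, and the use of timestamp strings are hypothetical, and `min_time` and `current_time` stand in for the server attributes of Table 3.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class OffsetCursor:
    """Offset cursor of one dimension (hypothetical class name)."""
    cursorTbId: Optional[int] = None
    fromTime: Optional[str] = None
    toTime: Optional[str] = None


def init_offset_cursor(
    cursor: OffsetCursor,
    cursor_data_set_id: Optional[str],
    my_data_set_id: str,
    min_time: str,
    current_time: str,
) -> Tuple[OffsetCursor, bool]:
    """Return the initialized cursor and the initial hasNext flag."""
    # Step 6: a different data set id invalidates the stored progress.
    if cursor_data_set_id != my_data_set_id:
        cursor = OffsetCursor()

    # Step 7: fill in defaults for missing attributes.
    has_next = False
    from_time = cursor.fromTime if cursor.fromTime is not None else min_time
    to_time = cursor.toTime if cursor.toTime is not None else current_time
    if cursor.cursorTbId is None:
        tb_id = 0
    else:
        # A stored primary-key cursor means a time window is still being read.
        tb_id = cursor.cursorTbId
        has_next = True
    return OffsetCursor(tb_id, from_time, to_time), has_next


MIN_TIME = "1970-01-01 00:00:00"
NOW = "2022-07-29 12:00:00"
first_query = init_offset_cursor(OffsetCursor(), None, "ds-1", MIN_TIME, NOW)
resumed = init_offset_cursor(
    OffsetCursor(15, "2022-07-01 00:00:00", "2022-07-29 11:00:00"),
    "ds-1", "ds-1", MIN_TIME, NOW,
)
```

A first query therefore covers the whole range from the minimum time to the current time, while a resumed query keeps its stored window and primary-key position.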
8) Execute the incremental query against the database using the query syntax. Query syntax example:
select tb.*
from tb
where 1=1
and tb.data_set_id=#{dataSetId}
and tb.modify_time>#{fromTime}
and tb.modify_time<=#{toTime}
and tb.tb_id>#{cursorTbId}
order by tb.tb_id asc
limit 0,#{limit};
The data table in the database has several columns, whose meanings are given in Table 4, the definition table for the columns of the data table in the database.
Table 4 Definition of the columns of the data table in the database
Name | Description
tb | Data table
tb_id | Table primary key id
modify_time | Update time of the record
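To make the query concrete, the sketch below runs an equivalent statement against an in-memory SQLite database. SQLite, the sample rows, the named parameters, and the storage of `modify_time` as sortable timestamp text are assumptions for the example; the `limit 0,#{limit}` form of the original is written as a plain `limit`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tb (tb_id integer primary key, data_set_id text, modify_time text)"
)
conn.executemany(
    "insert into tb (tb_id, data_set_id, modify_time) values (?, ?, ?)",
    [
        (1, "ds-1", "2022-07-29 10:00:01"),
        (2, "ds-1", "2022-07-29 10:00:02"),
        (3, "ds-1", "2022-07-29 10:00:03"),
        (4, "ds-1", "2022-07-29 10:00:04"),
        (5, "ds-1", "2022-07-29 10:00:05"),
        (6, "ds-2", "2022-07-29 10:00:06"),  # other data set, never returned
    ],
)

# Same filters as the query syntax example: data set, time window, key cursor.
QUERY = """
select tb.*
from tb
where tb.data_set_id = :dataSetId
  and tb.modify_time > :fromTime
  and tb.modify_time <= :toTime
  and tb.tb_id > :cursorTbId
order by tb.tb_id asc
limit :limit
"""


def query_batch(connection: sqlite3.Connection, params: dict) -> list:
    """Run one incremental batch query and return the rows as tuples."""
    return connection.execute(QUERY, params).fetchall()


params = {
    "dataSetId": "ds-1",
    "fromTime": "2022-07-29 00:00:00",
    "toTime": "2022-07-29 23:59:59",
    "cursorTbId": 0,
    "limit": 2,
}
first_batch = query_batch(conn, params)
# The next batch resumes after the largest tb_id of the previous batch.
second_batch = query_batch(conn, {**params, "cursorTbId": first_batch[-1][0]})
```

Ordering by the primary key and filtering on `tb_id > cursorTbId` is what lets each batch resume exactly where the previous one stopped within the same time window.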
9) Analyze the query result. As shown in FIG. 2, there are two branches, a and b:
a. If a query result exists, determine whether the number of records in the query result (sizeOf(dataSet)) is greater than or equal to #{limit}. If true, that is, the number of records is greater than or equal to #{limit}, the current time window may still contain data, and the next query fetches another batch from the current time window. The attributes of the custom cursor i are assigned as in the following pseudo-code:
fromTime unchanged, toTime unchanged, cursorTbId:=max(tb_id), hasNext:=true;
If false, that is, the number of query result records is less than #{limit}, go to step b.
b. The current time window has no more data; the time window is moved for the next query and cursorTbId is reset. The attributes of the custom cursor i are assigned as in the following pseudo-code:
fromTime:=toTime, toTime:=null, cursorTbId:=0L, hasNext:=hasNext||false; (as in the third batch of FIG. 3)
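The two branches of step 9) can be sketched as a single update function. This is an interpretation rather than the authoritative algorithm: it assumes that moving the time window means the old toTime becomes the new fromTime and toTime is cleared, and the function name and dictionary layout are hypothetical.

```python
from typing import List, Optional, Tuple


def advance_cursor(
    cursor: dict, batch_ids: List[int], limit: int, has_next: bool
) -> Tuple[dict, bool]:
    """Update one offset cursor after a batch; returns (cursor, hasNext)."""
    if batch_ids and len(batch_ids) >= limit:
        # Branch a: the window may still hold data, so stay in it and
        # resume after the largest primary key seen in this batch.
        return {**cursor, "cursorTbId": max(batch_ids)}, True
    # Branch b: the window is exhausted, so move it forward and reset
    # the primary-key cursor; hasNext keeps its incoming value.
    moved = {"fromTime": cursor["toTime"], "toTime": None, "cursorTbId": 0}
    return moved, has_next


start = {"fromTime": "t0", "toTime": "t1", "cursorTbId": 0}
full_batch = advance_cursor(start, [1, 2, 3], limit=3, has_next=False)
short_batch = advance_cursor(full_batch[0], [4], limit=3, has_next=True)
```

In this reading, a full batch keeps the window fixed and only advances the primary-key position, while a short batch closes the window so that the next query starts from the previous end time.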
10) Serialize the data set cursor object (dataSetCursorObj) into a data set cursor (dataSetCursor); a string compression algorithm may be considered so that the data set cursor is as compact as possible;
11) Return the query result. Its structure is shown in Table 5, the query result data structure description table, and includes the data set id (dataSetId), the data set cursor (dataSetCursor), and the query data set (dataSet_i);
Table 5 Query result data structure description table
Name | Description
hasNext | Whether the client should continue to query the next batch
dataSet_i | Incremental data set of a dimension
An example of a query result data structure is as follows:
[Image BDA0003773891910000121: example of the query result data structure]
12) The client processes the query result and stores the data set cursor;
13) The client checks the hasNext attribute in the query result; if it is true, the client jumps to step 3); otherwise, the client ends the query.
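The client side of steps 3), 12), and 13) amounts to a loop that keeps returning the stored cursor until hasNext is false. The sketch below illustrates this with a stand-in server; `client_sync`, `fake_server`, the single `dataSet` field, and the cursor encoded as a plain offset string are all assumptions for the example, not the data structures of the embodiment.

```python
from typing import Callable, Dict, List, Optional, Tuple


def client_sync(
    query_server: Callable[[Dict], Dict],
    my_data_set_id: str,
    limit: int,
    stored_cursor: Optional[str] = None,
) -> Tuple[List, Optional[str]]:
    """Query batch after batch until the server reports hasNext == False."""
    received: List = []
    cursor = stored_cursor  # None on the very first query
    while True:
        result = query_server(
            {"myDataSetId": my_data_set_id, "limit": limit, "dataSetCursor": cursor}
        )
        received.extend(result["dataSet"])   # step 12: process the result
        cursor = result["dataSetCursor"]     # step 12: store the cursor
        if not result["hasNext"]:            # step 13: stop or query again
            break
    return received, cursor


# Stand-in server: the cursor is simply an offset into a fixed record list.
RECORDS = [1, 2, 3, 4, 5]


def fake_server(request: Dict) -> Dict:
    start = int(request["dataSetCursor"] or 0)
    batch = RECORDS[start:start + request["limit"]]
    return {
        "dataSet": batch,
        "dataSetCursor": str(start + len(batch)),
        "hasNext": len(batch) >= request["limit"],
    }


rows, final_cursor = client_sync(fake_server, "ds-1", limit=2)
```

Because the cursor is opaque to the client, the server can change how progress is encoded without any change to this loop.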
The key elements of this embodiment are the signaling mechanism for asynchronous communication between the server and the client, the data set cursor data structure that stores the query progress, the table structure that supports incremental query, the incremental query SQL syntax, and the algorithm for determining incremental data query batches. The table structure supporting incremental query enables execution of the incremental query SQL syntax, and the signaling-driven design ensures that the client acquires the data set in real time. With the data set cursor data structure and the batch determination algorithm, whether a next batch exists is determined by comparing the size of the result set, which reduces the number of queries.
The embodiment of the present application further provides an incremental data obtaining device, and it should be noted that the incremental data obtaining device in the embodiment of the present application may be used to execute the method for obtaining incremental data provided in the embodiment of the present application. The incremental data acquisition device provided by the embodiment of the present application is described below.
Fig. 4 is a schematic diagram of an incremental data acquisition apparatus according to an embodiment of the present application, where as shown in fig. 4, the apparatus includes: a monitoring module 42, a receiving module 44, and an obtaining module 46, which are described in detail below.
The monitoring module 42 is configured to send an incremental data acquisition signaling to a client when monitoring that incremental data is generated in target data, where the client is configured to request the incremental data; a receiving module 44, configured to be connected to the monitoring module 42, and configured to receive an incremental data request sent by the client, where the incremental data request includes a data set cursor for the client to obtain incremental data last time; an obtaining module 46, configured to be connected to the receiving module 44, and configured to, in response to the incremental data request, search the incremental data according to the data set cursor, and obtain the incremental data, where the data set cursor includes a plurality of parameter items for searching the incremental data.
The incremental data acquisition device provided by the embodiment of the present application sends an incremental data acquisition signaling to a client when it detects that the target data has generated incremental data, where the client is configured to request the incremental data; receives an incremental data request sent by the client, where the incremental data request includes the data set cursor from the client's last acquisition of incremental data; and, in response to the incremental data request, searches for the incremental data according to the data set cursor and acquires the incremental data, where the data set cursor includes a plurality of parameter items for searching for the incremental data. After detecting that incremental data has been generated, the server sends an acquisition signaling to remind the client; if the client needs the incremental data, it sends a data request, and the incremental data is then searched for and acquired according to the data set cursor in that request. The information in the data set cursor improves the efficiency of the incremental data query, the acquisition signaling starts the acquisition flow in real time, and the actions of the client and the server are matched accurately, which improves the efficiency of incremental data preparation. This solves the problem in the related art that, when clients request data synchronization from the server, the server has to provide incremental data for each client separately and the time at which a client acquires data cannot be fully matched with the server's preparation time, so that incremental data preparation is inefficient.
The incremental data acquiring device comprises a processor and a memory, and the monitoring module 42, the receiving module 44, the acquiring module 46, and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize corresponding functions.
The processor includes a kernel, and the kernel retrieves the corresponding program unit from the memory. One or more kernels may be provided. By adjusting kernel parameters, the problem in the related art is solved, namely that when clients request data synchronization from the server, the conditions under which the clients acquire the data set differ, so the server has to provide incremental data for each client separately and the incremental data preparation efficiency is low.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
An embodiment of the present invention provides a computer-readable storage medium on which a program is stored, the program implementing an incremental data acquisition method when executed by a processor.
The embodiment of the invention provides a processor, which is used for running a program, wherein an incremental data acquisition method is executed when the program runs.
Fig. 5 is a schematic diagram of an electronic device according to an embodiment of the present application, and as shown in fig. 5, an embodiment of the present application provides an electronic device 50, which includes a processor, a memory, and a program stored in the memory and running on the processor, and when the processor executes the program, the processor implements any of the steps of the method.
The device in the application can be a server, a PC, a PAD, a mobile phone and the like.
The present application further provides a computer program product which, when executed on an incremental data acquisition device, is adapted to execute a program initialized with any of the above method steps.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable incremental data acquisition device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable incremental data acquisition device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable incremental data acquisition device to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable incremental data acquisition device to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer-readable medium does not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. An incremental data acquisition method, comprising:
sending an incremental data acquisition signaling to a client when monitoring that incremental data are generated by target data, wherein the client is used for requesting the incremental data;
receiving an incremental data request sent by the client, wherein the incremental data request comprises a data set cursor for acquiring incremental data last time by the client;
and responding to the incremental data request, searching the incremental data according to the data set cursor, and acquiring the incremental data, wherein the data set cursor comprises a plurality of parameter items for searching the incremental data.
2. The method of claim 1, wherein in response to the request for incremental data, prior to retrieving the incremental data by looking up the incremental data based on the dataset cursor, the method further comprises:
judging whether the data set identifier in the data set cursor is consistent with the data set identifier currently used by the client, wherein the incremental data request further comprises: a dataset identification currently used by the client, the dataset identification identifying a modified version of the dataset;
initializing an offset cursor of the target data under the condition that a data set identifier in the data set cursor is consistent with a data set identifier currently used by the client, wherein the data set cursor further comprises the offset cursor of the target data, and the offset cursor is used for searching incremental data of the target data;
and under the condition that the data set identification in the data set cursor is inconsistent with the data set identification currently used by the client, replacing the data set identification in the data set cursor with the data set identification currently used by the client, and initializing the offset cursor of the target data.
3. The method of claim 2, wherein initializing an offset cursor of the target data comprises:
setting the searched starting time as the minimum creating time of the target data, setting the searched ending time as the current time, and judging whether a primary key identifier between the starting time and the ending time is empty or not, wherein the offset cursor comprises the searched starting time and the searched ending time and a searched batch, and the batch corresponds to the primary key identifier;
under the condition that the primary key identifier between the starting time and the ending time is empty, setting the searched batch as the initial batch number, and searching the target data;
and under the condition that the primary key identifier between the starting time and the ending time is not empty, determining a batch number needing to be searched according to the primary key identifier in the data set cursor, and continuously searching the dimensionality of the target data according to the batch number, wherein the data set cursor also comprises the primary key identifier of the data set of the incremental data acquired last time by the client.
4. The method of claim 3, wherein obtaining lookup data for a plurality of dimensions by traversing the offset cursor as the incremental data comprises:
monitoring whether the target data generates new delta data after the current time;
and sending the search result to a client, and sending information whether to generate new incremental data to the client, wherein the client is used for determining whether to continue generating an incremental data acquisition request of the next batch according to the data set cursor at this time according to the information, and sending the request to a server for acquiring the new incremental data.
5. The method of claim 2, wherein in response to the request for incremental data, looking up the incremental data according to the dataset cursor, obtaining the incremental data comprises:
responding to the incremental data request, searching the incremental data of the corresponding target data according to the offset cursors to obtain a search result of the current batch, wherein the data set cursors comprise a plurality of offset cursors, and the offset cursors correspond to the dimensionality of the target data;
determining search data of the dimensionality to which the target data belongs according to the search result;
and acquiring search data of multiple dimensions as the incremental data by traversing the offset cursor.
6. The method of claim 4, wherein in response to the incremental data request, searching the corresponding incremental data of the target data according to the offset cursor, and after obtaining the search result of the current batch, the method further comprises:
judging whether the data volume in the search result is smaller than the maximum preset data volume searched in each batch set in the data set cursor;
under the condition that the data volume is not smaller than the preset data volume, continuing to search for the next batch, and determining the search result of the next batch until the data volume is smaller than the preset data volume;
and when the data volume is smaller than the preset data volume, finishing the search, and determining the search data of the dimension to which the target data belongs according to the search results of the searched batches.
7. The method of claim 2, wherein prior to initializing the offset cursor for the target data, the method further comprises:
and carrying out serialization processing on the data set cursor to generate a data set cursor object, wherein the data set cursor object comprises a plurality of offset cursor objects, and the offset cursor objects are used for executing initialization processing.
8. An incremental data acquisition apparatus, comprising:
the monitoring module is used for sending an incremental data acquisition signaling to a client when monitoring that incremental data is generated by target data, wherein the client is used for requesting the incremental data;
the receiving module is used for receiving an incremental data request sent by the client, wherein the incremental data request comprises a data set cursor for acquiring incremental data last time by the client;
and the acquisition module is used for responding to the incremental data request, searching the incremental data according to the data set cursor and acquiring the incremental data, wherein the data set cursor comprises a plurality of parameter items for searching the incremental data.
9. A computer-readable storage medium characterized by storing a program, wherein the program executes the incremental data acquisition method of any one of claims 1 to 7.
10. An electronic device comprising one or more processors and memory storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the incremental data acquisition method of any one of claims 1 to 7.
CN202210910737.1A 2022-07-29 2022-07-29 Incremental data acquisition method and device, storage medium and processor Pending CN115455110A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210910737.1A CN115455110A (en) 2022-07-29 2022-07-29 Incremental data acquisition method and device, storage medium and processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210910737.1A CN115455110A (en) 2022-07-29 2022-07-29 Incremental data acquisition method and device, storage medium and processor

Publications (1)

Publication Number Publication Date
CN115455110A true CN115455110A (en) 2022-12-09

Family

ID=84297064

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210910737.1A Pending CN115455110A (en) 2022-07-29 2022-07-29 Incremental data acquisition method and device, storage medium and processor

Country Status (1)

Country Link
CN (1) CN115455110A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination