CN112181979A - Data updating method and device, storage medium and electronic equipment - Google Patents

Info

Publication number
CN112181979A
Authority
CN
China
Prior art keywords
data
source
updated
data source
changed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010927867.7A
Other languages
Chinese (zh)
Other versions
CN112181979B (en)
Inventor
刘行
张桂贤
朱茵茵
王炜
陈超
张俊浩
张弓
王仲远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd
Priority to CN202010927867.7A
Publication of CN112181979A
Application granted
Publication of CN112181979B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2379 Updates performed during online database operations; commit processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

In the embodiments of this specification, a distributed system receives a data change message sent by any data source on which a first data wide table depends. Where the dependency relationships among data sources are complex, the data volume is large, and real-time updates are frequent, the data change message is used to determine the data in the primary data source that corresponds to the changed data as the data to be updated, and the primary key of the data to be updated is determined. Data corresponding to that primary key is then acquired from each data source on which the first data wide table depends, and the first data wide table is updated. In this way, the data affected by a changed data source is found quickly among data sources with complex relationships, important data is updated preferentially, multiple threads are used to accelerate updating, and the data is written to the wide table quickly, which effectively improves data updating efficiency.

Description

Data updating method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data updating method, an apparatus, a storage medium, and an electronic device.
Background
With the continuous development of internet technology, many internet-based services have emerged.
In a search service with a large data volume and real-time changes, integrating data from different sources and storage types into a wide table can improve the search efficiency of the search engine.
How to update the data of all data sources into the wide table quickly, and how to find the data affected by a changed data source among data sources with complex relationships, are urgent problems that every data updating system must face.
Disclosure of Invention
Embodiments of the present disclosure provide a data updating method, an apparatus, a storage medium, and an electronic device, so as to partially solve the problems in the prior art.
The embodiment of the specification adopts the following technical scheme:
In a data updating method provided by this specification, a distributed system is configured to maintain a first data wide table, the first data wide table depends on at least one data source, and the at least one data source includes a primary data source and a non-primary data source. The method includes:
for any data source on which the first data wide table depends, receiving, by the distributed system, a data change message sent by the data source, wherein the data change message is sent when data in the data source is changed;
determining a primary key of the changed data in the data source according to the data change message;
determining, according to the primary key of the changed data, data in the primary data source corresponding to the changed data as data to be updated;
determining a primary key of the data to be updated in the primary data source;
acquiring data corresponding to the primary key of the data to be updated from each data source on which the first data wide table depends; and
updating the first data wide table according to the primary key of the data to be updated and the acquired data.
Optionally, the distributed system subscribes to a data change message of the at least one data source in advance;
the receiving, by the distributed system, the data change message sent by the data source specifically includes:
and the distributed system receives a data change message sent by the subscribed data source.
Optionally, determining the data in the primary data source corresponding to the changed data specifically includes:
in the primary data source, determining data that has an association relationship with the primary key of the changed data as the data in the primary data source corresponding to the changed data.
Optionally, acquiring the data corresponding to the primary key of the data to be updated from each data source on which the first data wide table depends specifically includes:
adding the primary key of the data to be updated to the queue of the corresponding priority according to the priority of the data source where the changed data is located; and
for each queue, acquiring, according to the priority of the queue, data corresponding to the primary keys in the queue from each data source on which the first data wide table depends.
Optionally, before adding the primary key of the data to be updated to the queue of the corresponding priority, the method further includes:
determining that the primary key of the data to be updated does not exist in the queue of that priority.
Optionally, acquiring, in each data source on which the first data wide table depends, data corresponding to the primary key of the data to be updated, specifically including:
determining a thread corresponding to the primary key of the data to be updated according to the primary key of the data to be updated;
and acquiring data corresponding to the primary key of the data to be updated in each data source depended by the first data wide table through the determined thread.
Optionally, updating the first data wide table according to the primary key of the data to be updated and the acquired data specifically includes:
updating the first data wide table according to the primary key of the data to be updated and the acquired data, with the current time used as the timestamp of the data corresponding to the primary key of the data to be updated.
Optionally, when the primary data source is changed, the first data wide table is a newly created data wide table, and the method further includes:
acquiring the unchanged data in all the data sources, and updating the first data wide table with the time at which acquisition of the unchanged data started used as the timestamp of the data of all the data sources.
Optionally, when a non-primary data source is changed, the first data wide table is a data wide table created based on a snapshot of a second data wide table, and the method further includes:
taking the non-primary data source that is changed as a changed data source, and acquiring unchanged data from the changed data source as offline data; and
updating the first data wide table with the time at which acquisition of the offline data started used as the timestamp of the data of the changed data source.
A data updating apparatus provided by this specification is configured to maintain a first data wide table, the first data wide table depending on at least one data source, and the at least one data source including a primary data source and a non-primary data source. The apparatus includes:
a receiving module, configured to receive, for any data source on which the first data wide table depends, a data change message sent by the data source, where the data change message is sent when data in the data source is changed;
a first determining module, configured to determine a primary key of the changed data in the data source according to the data change message;
a second determining module, configured to determine, according to the primary key of the changed data, data in the primary data source corresponding to the changed data as data to be updated, and to determine the primary key of the data to be updated in the primary data source;
an acquisition module, configured to acquire data corresponding to the primary key of the data to be updated from each data source on which the first data wide table depends; and
an updating module, configured to update the first data wide table according to the primary key of the data to be updated and the acquired data.
The present specification provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the data update method described above.
The present specification provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the processor implements the data updating method described above.
The embodiment of the specification adopts at least one technical scheme which can achieve the following beneficial effects:
The embodiments of this specification perform data updating through a distributed system, which improves updating efficiency when the data volume is large and real-time updates are frequent. Further, the distributed system receives a data change message sent by any data source on which the first data wide table depends, determines through the data change message the data in the primary data source corresponding to the changed data as the data to be updated, and determines the primary key of the data to be updated.
Drawings
The accompanying drawings, which are included to provide a further understanding of the specification and constitute a part of this specification, illustrate embodiments of the specification and, together with the description, serve to explain the specification without limiting it. In the drawings:
FIG. 1 is a system framework diagram of a simple search service provided by embodiments of the present description;
FIG. 2 is a system framework diagram of a search engine supported by a data wide table according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a data update process provided by an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of the data update flow when the primary data source is changed, provided by an embodiment of the present specification;
FIG. 5 is a schematic diagram of the data update flow when a non-primary data source is changed, provided by an embodiment of the present specification;
FIG. 6 is a schematic structural diagram of a data updating apparatus according to an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of an electronic device provided in an embodiment of this specification.
Detailed Description
In order to make the objects, technical solutions and advantages of the present disclosure more clear, the technical solutions of the present disclosure will be clearly and completely described below with reference to the specific embodiments of the present disclosure and the accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present disclosure, and not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present specification without any creative effort belong to the protection scope of the present specification.
The technical solutions provided by the embodiments of the present description are described in detail below with reference to the accompanying drawings.
FIG. 1 shows a simple, theoretically feasible system framework for providing a search service. In the system shown in FIG. 1, a terminal sends a keyword entered by a user to a search engine, and the search engine searches the data sources directly for data related to the keyword.
However, in practical application scenarios, searching data directly from the data sources is inefficient and cannot meet the requirements of search services with high real-time demands. At present, therefore, a search engine is usually supported by a data wide table, as shown in FIG. 2.
FIG. 2 shows a system framework in which a data wide table supports the search engine. In FIG. 2, the data of each data source may be integrated into the data wide table in advance, and when the search engine receives the keyword, it can search the data wide table directly, which improves search efficiency.
As can be seen from the system shown in FIG. 2, the data in the data wide table originates from the data sources, so the data wide table depends on each data source. In addition, because different search services have different emphases, the importance of the data in each data source differs. The data source holding the more important data may be set as the primary data source, and the other data sources as non-primary data sources. The primary and non-primary data sources may be designated as needed.
Under the system framework described above and shown in fig. 2, the present specification provides a method for updating data by a distributed system, as shown in fig. 3.
Fig. 3 is a schematic diagram of a data update process provided in an embodiment of the present specification, including:
S300: For any data source on which the first data wide table depends, the distributed system receives a data change message sent by the data source, wherein the data change message is sent when the data in the data source is changed.
In the embodiments of this specification, the first data wide table, which supports the search engine, may be maintained by a distributed system. The distributed system may subscribe in advance to the data change messages of each data source on which the data wide table depends; when data in a data source is changed, the data source sends the subscribed data change message to the distributed system. The data change message may take the form of a binary log (Binlog) entry, a message queue (MQ) message, or the like. It contains the name of the data source where the changed data is located and the change information, and the change information may include the value of the data after the change, the primary key of the data in the data source, and so on.
S302: Determine the primary key of the changed data in the data source according to the data change message.
In a data source, the primary key is the unique identifier of a piece of data: it cannot be repeated, cannot be empty, and each piece of data has only one primary key. If the same data, or the same field of the same data, exists in different data sources, its primary key may differ between data sources.
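TABLE 1 (image not reproduced): data source A, with the fields merchant number (primary key) and merchant name
TABLE 2 (image not reproduced): data source B, with the fields order number (primary key), order amount and merchant name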
As shown in Table 1 and Table 2, the data in data source A consists of two fields, the merchant number and the merchant name, and the merchant number is the primary key of the data in data source A. The data in data source B consists of three fields, the order number, the order amount and the merchant name, and the order number is the primary key of the data in data source B. Although the merchant name "A" exists in both data source A and data source B, the primary keys of the data containing the merchant name "A" are different in the two data sources.
If the first piece of data in data source A is changed, specifically the merchant name "A" is changed to "C", data source A sends a corresponding data change message to the distributed system.
If the data change message includes the changed merchant name "C" and its corresponding primary key "100", then in step S302 the distributed system can determine directly from the data change message that the primary key of the changed data in data source A is "100". If the data change message includes only the changed merchant name "C", the distributed system can query data source A for the primary key of the data containing the changed merchant name "C"; the query result is likewise "100".
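The two cases of step S302 can be illustrated with a minimal Python sketch. The `ChangeMessage` shape, the `resolve_primary_key` helper and the in-memory dictionary standing in for data source A are all assumptions made for illustration; the specification does not define a message format or a query interface, and the second sample row is invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeMessage:
    """Hypothetical shape of a data change message (not defined by the specification)."""
    source: str                        # name of the data source that changed
    changed: dict                      # field name -> value after the change
    primary_key: Optional[str] = None  # may be absent from the message


# Stand-in for data source A after the change; only key "100" and the new
# merchant name "C" come from the example, the second row is made up.
SOURCES = {"A": [{"merchant_no": "100", "merchant_name": "C"},
                 {"merchant_no": "101", "merchant_name": "B"}]}
PK_FIELD = {"A": "merchant_no"}  # primary-key field of each data source


def resolve_primary_key(msg, sources=SOURCES, pk_field=PK_FIELD):
    # Case 1: the message carries the primary key directly.
    if msg.primary_key is not None:
        return msg.primary_key
    # Case 2: query the source for the row holding the changed values.
    for row in sources[msg.source]:
        if all(row.get(name) == value for name, value in msg.changed.items()):
            return row[pk_field[msg.source]]
    return None  # no matching row found
```

In both cases the sketch resolves the change of the merchant name to "C" to the primary key "100" in data source A.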
S304: According to the primary key of the changed data, determine the data in the primary data source corresponding to the changed data as the data to be updated.
In practical application scenarios, an association relationship exists between the primary key of one data source and the other data sources. The configuration file corresponding to the first data wide table records the association relationships among the data sources on which the first data wide table depends. According to these association relationships, the data in the primary data source that corresponds to the primary key of the changed data can be determined; specifically, the field in the primary data source that corresponds to that primary key can be determined.
Taking Table 1 and Table 2 as an example, data source B is set as the primary data source. The data in the configuration file consists of four fields: the order number, the order amount and the merchant name in data source B, and the merchant number in data source A. Since data source A and data source B both include the "merchant name" field, the association relationship between them is: if a piece of data in data source A and a piece of data in data source B have the same merchant name, the two pieces of data are associated.
If the first piece of data in data source A is changed, specifically the merchant name "A" is changed to "C", the distributed system determines from the data change message that the primary key of the changed data in data source A is "100". According to the association relationship between data source A and data source B, the data in the primary data source (data source B) associated with the data whose primary key is "100" in data source A is the first two pieces of data in data source B. That is, the merchant name "A" in data source A corresponds to the merchant name "A" in data source B because the field values are the same; the two pieces of data in data source B whose merchant name is "A" are therefore obtained, and these two pieces of data are the data to be updated.
S306: Determine the primary key of the data to be updated in the primary data source.
When the data to be updated in the primary data source is determined, the primary key of the data to be updated in the primary data source can be obtained.
As shown in Table 1 and Table 2, after the first two pieces of data in data source B are determined to be the data to be updated, their primary keys in data source B can be determined to be "100" and "101", respectively.
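Steps S304 and S306 can be sketched together as a lookup over the primary data source. This is only an illustration under stated assumptions: the helper name `find_keys_to_update`, the field names and the order amounts are hypothetical, and the association is reduced to equality on the merchant name as in the example; only the primary keys "100" and "101" and the merchant name "A" are taken from the text.

```python
def find_keys_to_update(primary_rows, pk_field, assoc_field, assoc_value):
    """Return the primary keys of primary-source rows associated with the changed data.

    A row is treated as associated when its association field equals the
    given value (here: the same merchant name).
    """
    return [row[pk_field] for row in primary_rows
            if row[assoc_field] == assoc_value]


# Stand-in for data source B (the primary data source); amounts and the
# third row are invented for the sketch.
source_b = [
    {"order_no": "100", "amount": 10, "merchant_name": "A"},
    {"order_no": "101", "amount": 20, "merchant_name": "A"},
    {"order_no": "102", "amount": 30, "merchant_name": "B"},
]

keys_to_update = find_keys_to_update(source_b, "order_no", "merchant_name", "A")
```

With the sample rows above, `keys_to_update` holds the order numbers of the first two rows of data source B, which matches the result described for step S306.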
S308: Acquire data corresponding to the primary key of the data to be updated from each data source on which the first data wide table depends.
An association relationship exists between the primary key of the primary data source and the other data sources. Once the primary key of the data to be updated in the primary data source is determined, the data associated with that primary key in each data source can be acquired according to the association relationship.
In practical application scenarios, when the data volume is large and updates arrive in real time, the distributed system consumes substantial system resources. Therefore, in the embodiments of this specification, after the primary key of the data to be updated is determined in step S306, the primary key may be added to a priority queue according to the priority of the data source where the changed data is located. For example, the queues may be divided into a high-priority queue, a normal queue and a low-priority queue. The high-priority queue is processed first, then the normal queue, and the low-priority queue last; for each queue in turn, the data corresponding to the primary keys in the queue is acquired from each data source on which the first data wide table depends.
It should be noted that, before the primary key of the data to be updated is added to the queue corresponding to the priority of the data source where the changed data is located, it must be determined that the primary key does not already exist in that queue. That is, the primary keys in the priority queue are deduplicated when a primary key is added: if the same primary key would appear more than once in the queue, only the newly added primary key is retained.
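The queueing and deduplication described above can be sketched as follows. The class name `DedupPriorityQueues`, the three priority labels and the use of an ordered dictionary per priority level are assumptions made for this sketch, not details given by the specification.

```python
from collections import OrderedDict

# Queues are processed in this order: high first, low last.
PRIORITIES = ("high", "normal", "low")


class DedupPriorityQueues:
    """One FIFO queue per priority level, holding each primary key at most once."""

    def __init__(self):
        self._queues = {p: OrderedDict() for p in PRIORITIES}

    def add(self, priority, primary_key):
        q = self._queues[priority]
        q.pop(primary_key, None)  # drop an older entry for the same key, if any
        q[primary_key] = None     # keep only the newly added entry, at the tail

    def drain(self):
        """Yield (priority, primary_key) pairs, highest priority first."""
        for p in PRIORITIES:
            q = self._queues[p]
            while q:
                key, _ = q.popitem(last=False)  # oldest remaining entry
                yield p, key
```

A short usage example: after `add("low", "7")`, `add("high", "100")`, `add("high", "101")` and a second `add("high", "100")`, draining yields "101" and then "100" from the high-priority queue, followed by "7" from the low-priority queue.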
When the data volume is large and updates arrive in real time, the distributed system uses multiple threads to speed up data updating. In the embodiments of this specification, the thread corresponding to the primary key of the data to be updated is determined from the primary key itself: the hash value of the primary key is computed and a thread is assigned according to that hash value, with different hash values corresponding to different threads, so that thread resources are not wasted. The assigned thread then acquires the data corresponding to the primary key of the data to be updated from each data source on which the first data wide table depends.
It should be noted that, if the data is successfully acquired through the thread, the primary key of the data to be updated is deleted in the priority queue; if the data is not successfully acquired through the thread, the primary key of the data to be updated in the priority queue can still be kept.
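A minimal sketch of hash-based thread assignment is given below. The specification does not name a hash function or say how hash values map to threads; taking `zlib.crc32` modulo a fixed number of workers is an assumption of this sketch, as are the names `worker_index`, `run_updates` and the `fetch` callback. Keys whose fetch fails are returned separately, standing in for primary keys that remain in the queue.

```python
import queue
import threading
import zlib


def worker_index(primary_key, n_workers):
    # zlib.crc32 is stable across runs, unlike the built-in hash() for str.
    return zlib.crc32(primary_key.encode("utf-8")) % n_workers


def run_updates(primary_keys, fetch, n_workers=4):
    """Fetch data for each key on the worker thread selected by the key's hash.

    Returns (results, failed): fetched data by key, and keys whose fetch
    raised an exception and would therefore stay queued.
    """
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results, failed = {}, []
    lock = threading.Lock()

    def worker(inbox):
        while True:
            key = inbox.get()
            if key is None:  # sentinel: no more work for this thread
                return
            try:
                data = fetch(key)
            except Exception:
                with lock:
                    failed.append(key)
            else:
                with lock:
                    results[key] = data

    threads = [threading.Thread(target=worker, args=(box,)) for box in inboxes]
    for t in threads:
        t.start()
    for key in primary_keys:
        inboxes[worker_index(key, n_workers)].put(key)
    for box in inboxes:
        box.put(None)
    for t in threads:
        t.join()
    return results, failed
```

One consequence of this design is that the same primary key is always handled by the same worker, so two updates for one key are never fetched concurrently.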
When the data volume is large and updates arrive in real time, the distributed system may use the two approaches above (priority queues and hash-assigned threads) separately or in combination, depending on the actual application scenario.
S310: Update the first data wide table according to the primary key of the data to be updated and the acquired data.
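TABLE 3 (image not reproduced): configuration file of the first data wide table, giving the source of each field: order number, order amount and merchant name from data source B, and merchant number from data source A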
In the embodiment of the present specification, a configuration file of the first data width table may be preset, and the data structure in the first data width table is specified by the configuration file, that is, the source of each field of the data in the first data width table is specified by the configuration file, as shown in table 3.
In table 3, the configuration file specifies that each field of the data in the first data width table respectively originates from the order number of data source B, the order amount of data source B, the merchant name in data source B, and the merchant number in data source a.
According to the association relationships among the data sources, after the data corresponding to the primary key of the data to be updated has been acquired from all the data sources, the first data wide table is updated according to the acquired data.
As shown in Table 3, the primary keys of the data to be updated in data source B are "100" and "101". In step S308, the data associated with "100" and "101" in data source B is acquired from all the data sources, and the corresponding data in the first data wide table is then updated according to the data structure specified in the configuration file.
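How a configuration file might drive the assembly of one wide-table row can be sketched as below. The configuration layout (a list of wide-table field, source name and source field), the field names and the sample values are assumptions; the specification states only which source each of the four fields comes from.

```python
# Hypothetical configuration: (wide-table field, data source, field in that source).
WIDE_TABLE_CONFIG = [
    ("order_no", "B", "order_no"),
    ("amount", "B", "amount"),
    ("merchant_name", "B", "merchant_name"),
    ("merchant_no", "A", "merchant_no"),
]


def build_wide_row(fetched, config=WIDE_TABLE_CONFIG):
    """Assemble one wide-table row.

    `fetched` maps a data source name to the row acquired from that source
    for a single primary key of the data to be updated.
    """
    return {wide_field: fetched[source][source_field]
            for wide_field, source, source_field in config}


# Rows acquired in step S308 for primary key "100"; values are illustrative.
row = build_wide_row({
    "B": {"order_no": "100", "amount": 10, "merchant_name": "C"},
    "A": {"merchant_no": "100", "merchant_name": "C"},
})
```

Keeping the field sources in configuration rather than in code means the wide-table structure can be changed without touching the assembly logic, which is one plausible reason for specifying it in a configuration file.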
Specifically, when the first data wide table is updated according to the primary key of the data to be updated and the acquired data, the current time may be used as the timestamp of the data corresponding to that primary key, where the current time is the time at which acquisition of the data according to the primary key started.
With the above method, on the one hand, the data change message is used to determine the data in the primary data source corresponding to the changed data as the data to be updated, the primary key of the data to be updated is determined, and the data corresponding to that primary key is acquired from each data source on which the first data wide table depends, so that the data affected by a changed data source is found quickly among data sources with complex relationships. On the other hand, the primary key of the data to be updated is added to a priority queue according to the priority of the data source where the changed data is located, the thread corresponding to each primary key in the queue is determined from the primary key, and the determined thread acquires the corresponding data from each data source on which the first data wide table depends, so that important data is updated preferentially and multiple threads accelerate updating when the data volume is large and updates arrive in real time. The data is thus written to the first data wide table quickly, which effectively improves data updating efficiency.
The first data wide table can be used to support a search engine, and the updating method above keeps its data up to date so that the data returned by the search engine is as accurate as possible. In another embodiment of this specification, the first data wide table may be a newly created data wide table that replaces an original data wide table; the original data wide table is referred to below as the second data wide table.
Specifically, the second data wide table supports the search service of the search engine, and when a data source on which the second data wide table depends is changed, the first data wide table can be newly created. Because the newly created first data wide table is to replace the original second data wide table, the first data wide table must, on the one hand, update and store incremental data (i.e., updated data) using the method shown in FIG. 3, and, on the other hand, receive the original data migrated from the second data wide table. After the migration is completed, the first data wide table can support the search engine and the second data wide table can be discarded. The distributed system therefore holds at least two data wide tables: the first data wide table and the second data wide table.
Having described how the first data wide table updates and stores incremental data in real time, the following describes how the data originally in the second data wide table is migrated to the first data wide table.
In the embodiments of this specification, when the primary data source is changed, the first data wide table is a newly created data wide table; the unchanged data in all the data sources is acquired, and the first data wide table is updated with the time at which acquisition of the unchanged data started used as the timestamp of the data of all the data sources. It should be noted that the process of migrating the original data to the first data wide table and the process of updating incremental data in the first data wide table in real time may be performed simultaneously, as shown in FIG. 4.
S402: According to the primary key of the data to be updated, the distributed system acquires the data corresponding to that primary key from the data sources.
S404: Update the first data wide table with the acquired data, using the current time as the timestamp of the data corresponding to the primary key of the data to be updated.
Steps S402 to S404 briefly summarize the real-time updating of incremental data in the first data wide table; the specific process is consistent with FIG. 3 and is not described again.
S406: Acquire the unchanged data in all the data sources.
S408: Update the first data wide table, using the time at which acquisition of the unchanged data in all the data sources started as the timestamp of the data of all the data sources.
Steps S406 to S408 are the process of migrating the original data to the first data wide table.
When the primary data source is changed, the data structure of the second data wide table may change considerably, and the data in the second data wide table may contain errors and can no longer be used, so the first data wide table may be a newly created empty table. Because the first data wide table is used to replace the original second data wide table, and the original data in the second data wide table cannot be used after the primary data source has changed, the migration does not copy the original data of the second data wide table directly into the first data wide table. Instead, the distributed system determines the primary keys of all data in the primary data source, acquires from all data sources the unchanged data (i.e., non-incremental data) corresponding to those primary keys, and updates the first data wide table with the acquired data, i.e., migrates the acquired data into the first data wide table. The newly created first data wide table also updates and stores incremental data (i.e., updated data) by the method shown in fig. 3. The data migration process shown in fig. 4 and the real-time update process shown in fig. 3 can run at the same time, and the timestamp is used as the basis for updating the first data wide table, which avoids timing conflicts between data updates. For incremental data, the timestamp is the time at which the data is updated into the first data wide table. For the data corresponding to the primary keys of all data of the primary data source, the timestamp is the time at which acquisition of that data started.
It should be noted that acquiring a large amount of data from a data source usually requires an offline import, and data updates usually cannot be performed while the data of the data source is being imported offline. In the embodiment of the present specification, however, the process of migrating the original data in the second data wide table to the first data wide table and the process of updating the incremental data in the first data wide table in real time may be performed simultaneously, so that data updates are not suspended and the speed of data updating is increased.
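The timestamp rule described above can be illustrated with a minimal Python sketch. It is not the implementation of this specification: the class `WideTable`, the function `migrate`, and the in-memory dictionaries are hypothetical names chosen for illustration, and the rule "a write is dropped when a strictly newer timestamp is already stored" is an assumption about how the timestamp is used as the basis for updating.

```python
from typing import Any, Dict, Optional, Tuple


class WideTable:
    """Toy in-memory wide table: primary key -> source name -> (timestamp, data)."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, Tuple[float, Any]]] = {}

    def upsert(self, key: str, source: str, data: Any, ts: float) -> bool:
        """Store data for (key, source) unless a strictly newer value is present."""
        row = self.rows.setdefault(key, {})
        current = row.get(source)
        if current is not None and current[0] > ts:
            return False  # assumed rule: a stale write never overwrites a newer one
        row[source] = (ts, data)
        return True

    def get(self, key: str, source: str) -> Optional[Any]:
        entry = self.rows.get(key, {}).get(source)
        return None if entry is None else entry[1]


def migrate(table: WideTable,
            sources: Dict[str, Dict[str, Any]],
            started_at: float) -> None:
    """Bulk-load unchanged data; every cell carries the time the load started."""
    for source_name, records in sources.items():
        for key, data in records.items():
            table.upsert(key, source_name, data, started_at)
```

Under these assumptions, a real-time update stamped with the current time always wins over bulk-loaded data stamped with the earlier load start time, regardless of which of the two writes reaches the table first.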
In this embodiment of the present specification, when a non-primary data source is changed, the first data wide table may be a data wide table created based on a snapshot of the second data wide table. The changed non-primary data source is taken as the changed data source, the unchanged data is acquired from the changed data source as offline data, and the first data wide table is updated using the time at which acquisition of the offline data started as the timestamp of the data of the changed data source. It should be noted that the process of acquiring the offline data from the changed data source and updating the first data wide table may be performed simultaneously with the process of updating the incremental data in the first data wide table in real time, as shown in fig. 5.
In fig. 5, the second data width table refers to the data width table in the in-use state, that is, the second data width table stores the original data therein and updates and stores the incremental data in real time.
S502: the first data width table is created based on the snapshot of the second data width table.
S504: and taking the non-main data source with the change as a changed data source, and acquiring the data which is not changed from the changed data source as offline data.
S506: and updating the first data width table by taking the time for starting to acquire the offline data as the time stamp of the data of the changed data source.
Steps S502 to S506 are processes of acquiring offline data from the changed data source and updating the first data wide table.
S508: and acquiring data corresponding to the primary key of the data to be updated from a data source according to the primary key of the data to be updated.
S510: and acquiring data corresponding to the primary key of the data to be updated from a data source according to the primary key of the data to be updated, and updating the first data wide table by taking the current time as the time stamp of the data corresponding to the primary key of the data to be updated.
Steps S508 to S510 are brief processes of updating the incremental data of the first data wide table in real time, and the specific processes are consistent with those in fig. 3, and are not described again.
When a non-primary data source is changed, the original data in the second data wide table does not need a large-scale update, the data structure of the second data wide table does not change greatly, and the table can continue to be used. Therefore, the distributed system can create the first data wide table based on a snapshot of the second data wide table, which shortens the data update time and avoids wasting resources. The distributed system updates the incremental data of both the first data wide table and the second data wide table in real time. The first data wide table cannot support the search service of the search engine while its offline data is being updated, so the second data wide table supports the search service during that period. If an error occurs while the offline data of the first data wide table is being updated, the first data wide table can be recreated based on a snapshot of the second data wide table. Incremental data of the first data wide table may be lost during recreation, but because the second data wide table receives the same real-time incremental updates as the first data wide table, recreating the first data wide table from a snapshot of the second data wide table avoids losing that incremental data.
The distributed system takes the changed non-primary data source as the changed data source and acquires the unchanged data from it as offline data. That is, the distributed system determines the primary keys of all data in the changed data source (i.e., the non-primary data source that changed), acquires the unchanged data (i.e., non-incremental data) corresponding to those primary keys, and updates the first data wide table with the acquired data. The data migration process shown in fig. 5 and the real-time update process shown in fig. 3 may run at the same time, and the timestamp is used as the basis for updating the first data wide table, which avoids timing conflicts between data updates. For incremental data, the timestamp is the time at which the data is updated into the first data wide table. For the data corresponding to the primary keys of all data in the changed data source, the timestamp is the time at which acquisition of the offline data started.
It should be noted that, after the data migration process shown in fig. 5 is completed, the updating of the second data width table is stopped, the first data width table can support the search engine, and the second data width table can be discarded.
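The snapshot-based flow of fig. 5 can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the implementation of this specification: tables are plain dictionaries, and the helper names `snapshot_table`, `load_offline`, `apply_increment`, and `write_cell` are hypothetical.

```python
import copy
from typing import Any, Dict, List, Tuple

Cell = Tuple[float, Any]            # (timestamp, data)
Table = Dict[str, Dict[str, Cell]]  # primary key -> source name -> cell


def write_cell(table: Table, key: str, source: str, data: Any, ts: float) -> None:
    """Assumed rule: keep the stored cell if its timestamp is strictly newer."""
    row = table.setdefault(key, {})
    old = row.get(source)
    if old is None or old[0] <= ts:
        row[source] = (ts, data)


def snapshot_table(second: Table) -> Table:
    """Create the first table as an independent copy of the second table."""
    return copy.deepcopy(second)


def load_offline(first: Table, changed_source: str,
                 offline_records: Dict[str, Any], started_at: float) -> None:
    """Reload only the changed source, stamped with the offline load start time."""
    for key, data in offline_records.items():
        write_cell(first, key, changed_source, data, started_at)


def apply_increment(tables: List[Table], key: str, source: str,
                    data: Any, now: float) -> None:
    """Apply one real-time update to every table that is being maintained."""
    for table in tables:
        write_cell(table, key, source, data, now)
```

In this sketch the second table keeps serving reads and receives the same increments as the first table, so the first table could be recreated from a fresh snapshot if the offline load failed.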
In this manner, when a data source is changed, the system does not need to go offline to update the data: the import of offline data and the real-time update of the data wide table can be carried out simultaneously, which saves a great deal of time and avoids loss of service. Data source changes are divided into primary data source changes and non-primary data source changes. When the primary data source is changed, the data corresponding to the primary keys of all data of the primary data source is acquired from all data sources and used to update the first data wide table. When a non-primary data source is changed, the first data wide table is created based on a snapshot of the second data wide table, and only the data corresponding to the primary keys of all data in the changed data source is acquired and used to update the first data wide table. The unchanged data in all data sources does not need to be acquired, which avoids wasting resources and further improves the efficiency of data updating.
Based on the same idea, the data updating method provided by the embodiment of the present specification further provides a corresponding apparatus, a storage medium, and an electronic device.
Fig. 6 is a schematic structural diagram of a data update apparatus provided in an embodiment of the present specification, where the apparatus is configured to maintain a first data width table, where the first data width table depends on at least one data source, and the at least one data source includes a primary data source and a non-primary data source, and the apparatus includes:
a receiving module 602, configured to receive, for any data source on which the first data wide table depends, a data change message sent by the data source, where the data change message is sent when data in the data source is changed;
a first determining module 604, configured to determine, according to the data change message, a primary key of the data that has been changed in the data source;
a second determining module 606, configured to determine, according to the primary key of the changed data, data in the primary data source corresponding to the changed data, as data to be updated, and determine the primary key of the data to be updated in the primary data source;
an obtaining module 608, configured to obtain, in each data source on which the first data width table depends, data corresponding to a primary key of the data to be updated;
the updating module 610 is configured to update the first data width table according to the primary key of the data to be updated and the acquired data.
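How the five modules could cooperate can be sketched in Python. This is a minimal sketch under stated assumptions, not the apparatus itself: `ChangeMessage`, `handle_change`, `resolve_main_keys`, and `fetchers` are hypothetical names, each data source is modeled as a function that returns the data for a primary key of the primary data source, and the wide table is a plain dictionary.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class ChangeMessage:
    source: str  # name of the data source whose data changed
    key: str     # primary key of the changed data in that source


def handle_change(
    msg: ChangeMessage,
    resolve_main_keys: Callable[[ChangeMessage], List[str]],
    fetchers: Dict[str, Callable[[str], Optional[Any]]],
    wide_table: Dict[str, Dict[str, Any]],
    now: float,
) -> List[str]:
    """Refresh the wide-table rows affected by one data change message."""
    updated = []
    # Second determining step: map the changed key to primary-source keys.
    for main_key in resolve_main_keys(msg):
        row = {}
        # Obtaining step: read every source the wide table depends on.
        for name, fetch in fetchers.items():
            data = fetch(main_key)
            if data is not None:
                row[name] = data
        # Updating step: write the assembled row with the current time.
        wide_table[main_key] = {"ts": now, "data": row}
        updated.append(main_key)
    return updated
```

A change in a non-primary source, such as a brand record, would thus refresh every primary-source row associated with that brand and leave other rows untouched.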
Optionally, the receiving module 602 is specifically configured to subscribe to the data change message of the at least one data source in advance by the distributed system, and the distributed system receives the data change message sent by the subscribed data source.
Optionally, the second determining module 606 is specifically configured to determine, in the primary data source, data having an association relationship with a primary key of the changed data, as data in the primary data source corresponding to the changed data.
Optionally, the obtaining module 608 is specifically configured to add the primary key of the data to be updated to the queue of the priority according to the priority of the data source where the changed data is located, and for each queue, obtain, according to the priority of the queue, data corresponding to the primary key in the queue in each data source that the first data width table depends on.
Optionally, the obtaining module 608 is further configured to determine, before adding the primary key of the data to be updated to the queue of the priority, that the primary key of the data to be updated does not already exist in the queue of the priority.
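The priority queues and the duplicate check can be sketched as follows. This Python sketch is illustrative only: the class name `PriorityKeyQueues` is hypothetical, and the convention that a smaller number means a higher priority is an assumption not stated in this specification.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Optional, Set


class PriorityKeyQueues:
    """One FIFO queue of primary keys per priority, without duplicate keys."""

    def __init__(self, priorities: Iterable[int]) -> None:
        # Assumption: a smaller number means a higher priority.
        self._order = sorted(priorities)
        self._queues: Dict[int, Deque[str]] = {p: deque() for p in self._order}
        self._pending: Dict[int, Set[str]] = {p: set() for p in self._order}

    def add(self, key: str, priority: int) -> bool:
        """Enqueue key unless it is already waiting in the queue of that priority."""
        if key in self._pending[priority]:
            return False
        self._pending[priority].add(key)
        self._queues[priority].append(key)
        return True

    def pop(self) -> Optional[str]:
        """Return the next key from the highest-priority non-empty queue."""
        for p in self._order:
            if self._queues[p]:
                key = self._queues[p].popleft()
                self._pending[p].discard(key)
                return key
        return None
```

Skipping a key that is already queued means that several change messages for the same row lead to a single read of the data sources, because that read fetches the latest data anyway.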
Optionally, the obtaining module 608 is further configured to determine, according to the primary key of the data to be updated, a thread corresponding to the primary key of the data to be updated, and obtain, through the determined thread, data corresponding to the primary key of the data to be updated in each data source that the first data width table depends on.
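One possible way to determine a thread from a primary key is to hash the key onto a fixed set of single-threaded workers. The Python sketch below assumes this hashing scheme, which this specification does not prescribe; `worker_index` and `KeyedExecutor` are hypothetical names.

```python
import zlib
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable


def worker_index(key: str, worker_count: int) -> int:
    """Map a primary key to a stable worker index (assumed scheme: CRC32 modulo)."""
    return zlib.crc32(key.encode("utf-8")) % worker_count


class KeyedExecutor:
    """Runs all tasks for the same primary key on the same single thread."""

    def __init__(self, worker_count: int) -> None:
        # One single-threaded pool per slot keeps the tasks of a slot in order.
        self._workers = [ThreadPoolExecutor(max_workers=1)
                         for _ in range(worker_count)]

    def submit(self, key: str, fn: Callable[..., Any], *args: Any) -> Future:
        index = worker_index(key, len(self._workers))
        return self._workers[index].submit(fn, *args)

    def shutdown(self) -> None:
        for worker in self._workers:
            worker.shutdown(wait=True)
```

Because every task for a given primary key lands on the same thread, two updates of the same row are executed in submission order and cannot interleave.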
Optionally, the obtaining module 608 is further configured to, when the main data source is changed, obtain unchanged data in all the data sources, where the first data width table is a newly created data width table.
Optionally, the obtaining module 608 is further configured to, when the non-primary data source is changed, use the first data width table as a data width table created based on a snapshot of a second data width table, use the changed non-primary data source as a changed data source, and obtain unchanged data from the changed data source as offline data.
Optionally, the updating module 610 is specifically configured to update the first data width table by taking the current time as a timestamp of data corresponding to the primary key of the data to be updated according to the primary key of the data to be updated and the acquired data.
Optionally, the updating module 610 is specifically configured to update the first data wide table according to the unchanged data in all the data sources, using the time at which acquisition of the unchanged data in all the data sources started as the timestamp of the data of all the data sources.
Optionally, the updating module 610 is specifically configured to update the first data wide table according to the offline data, using the time at which acquisition of the offline data started as the timestamp of the data of the changed data source.
The present specification also provides a computer readable storage medium storing a computer program which, when executed by a processor, is operable to perform the data update method provided in fig. 1 above.
Based on the data updating method shown in fig. 3, the embodiment of the present specification further provides a schematic structural diagram of the electronic device, shown in fig. 7. As shown in fig. 7, at the hardware level, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile memory, and may of course also include hardware required for other services. The processor reads the corresponding computer program from the non-volatile memory into the memory and then runs it to implement the data updating method described in fig. 3 above.
Of course, besides the software implementation, the present specification does not exclude other implementations, such as logic devices or a combination of software and hardware, and the like, that is, the execution subject of the following processing flow is not limited to each logic unit, and may be hardware or logic devices.
In the 1990s, an improvement to a technology could be clearly distinguished as either an improvement in hardware (e.g., an improvement in circuit structures such as diodes, transistors, and switches) or an improvement in software (an improvement in a method flow). However, as technology advances, many of today's method flow improvements can be regarded as direct improvements in hardware circuit structure. Designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Thus, it cannot be said that an improvement in a method flow cannot be realized by hardware physical modules. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. A designer programs a digital system onto a single PLD without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Furthermore, nowadays, instead of manually making integrated circuit chips, such programming is mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must be written in a specific programming language called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used at present.
It will also be apparent to those skilled in the art that hardware circuitry that implements the logical method flows can be readily obtained by merely slightly programming the method flows into an integrated circuit using the hardware description languages described above.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer-readable program code, the same functionality can be achieved by logically programming the method steps so that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may thus be considered a hardware component, and the means included therein for performing the various functions may also be considered structures within the hardware component. The means for performing the functions may even be regarded both as software modules implementing the method and as structures within the hardware component.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functions of the various elements may be implemented in the same one or more software and/or hardware implementations of the present description.
As will be appreciated by one skilled in the art, embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The description has been presented with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the description. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
This description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in the present specification are described in a progressive manner. For parts that are the same or similar among the embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiment is described relatively briefly because it is substantially similar to the method embodiment; for the relevant points, reference may be made to the corresponding description of the method embodiment.
The above description is only an example of the present specification, and is not intended to limit the present specification. Various modifications and alterations to this description will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present specification should be included in the scope of the claims of the present specification.

Claims (12)

1. A data update method, wherein a distributed system is configured to maintain a first data wide table, wherein the first data wide table depends on at least one data source, wherein the at least one data source comprises a primary data source and a non-primary data source, and wherein the method comprises:
for any data source on which the first data wide table depends, the distributed system receives a data change message sent by the data source, wherein the data change message is sent when data in the data source is changed;
determining a primary key of the changed data in the data source according to the data change message;
determining data corresponding to the changed data in the main data source as data to be updated according to the main key of the changed data;
determining a primary key of the data to be updated in the primary data source;
acquiring data corresponding to the primary key of the data to be updated in each data source depended by the first data wide table;
and updating the first data wide table according to the main key of the data to be updated and the acquired data.
2. The method of claim 1, wherein the distributed system pre-subscribes to data change messages of the at least one data source;
the receiving, by the distributed system, the data change message sent by the data source specifically includes:
and the distributed system receives a data change message sent by the subscribed data source.
3. The method of claim 1, wherein determining data in the primary data source that corresponds to the changed data comprises:
and in the main data source, determining data which has an association relation with a main key of the changed data as the data corresponding to the changed data in the main data source.
4. The method according to claim 1, wherein obtaining, in each data source on which the first data width table depends, data corresponding to the primary key of the data to be updated specifically includes:
adding the primary key of the data to be updated into a queue of the priority according to the priority of the data source where the changed data is located;
and aiming at each queue, acquiring data corresponding to the primary key in the queue from each data source depending on the first data wide table according to the priority of the queue.
5. The method of claim 4, wherein prior to adding the primary key of the data to be updated to the prioritized queue, the method further comprises:
and determining that the primary key of the data to be updated does not exist in the queue of the priority.
6. The method according to claim 1, wherein obtaining, in each data source on which the first data width table depends, data corresponding to the primary key of the data to be updated specifically includes:
determining a thread corresponding to the primary key of the data to be updated according to the primary key of the data to be updated;
and acquiring data corresponding to the primary key of the data to be updated in each data source depended by the first data wide table through the determined thread.
7. The method according to claim 1, wherein the updating the first data width table according to the primary key of the data to be updated and the acquired data specifically includes:
and updating the first data wide table by taking the current time as a time stamp of the data corresponding to the primary key of the data to be updated according to the primary key of the data to be updated and the acquired data.
8. The method of claim 1, wherein the first data width table is a newly created data width table when the primary data source is changed, the method further comprising:
and acquiring the unchanged data in all the data sources, and updating the first data wide table by using the time at which acquisition of the unchanged data in all the data sources started as the timestamp of the data of all the data sources.
9. The method of claim 1, wherein the first data width table is a data width table created based on a snapshot of a second data width table when a change occurs to the non-primary data source, the method further comprising:
taking a non-main data source which is changed as a changed data source, and acquiring unchanged data from the changed data source as offline data;
and updating the first data width table by taking the time for starting to acquire the offline data as the time stamp of the data of the changed data source.
10. An apparatus for updating data, the apparatus configured to maintain a first data width table, the first data width table dependent on at least one data source, the at least one data source comprising a primary data source and a non-primary data source; the device comprises:
a receiving module, configured to receive, for any data source on which the first data wide table depends, a data change message sent by the data source, where the data change message is sent when data in the data source is changed;
the first determining module is used for determining a primary key of the changed data in the data source according to the data change message;
a second determining module, configured to determine, according to the primary key of the changed data, data in the primary data source corresponding to the changed data, as data to be updated, and determine the primary key of the data to be updated in the primary data source;
the acquisition module is used for acquiring data corresponding to the primary key of the data to be updated in each data source depended by the first data wide table;
and the updating module is used for updating the first data wide table according to the primary key of the data to be updated and the acquired data.
11. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when being executed by a processor, carries out the method of any of the preceding claims 1-9.
12. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1-9 when executing the program.
CN202010927867.7A 2020-09-07 2020-09-07 Data updating method and device, storage medium and electronic equipment Active CN112181979B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010927867.7A CN112181979B (en) 2020-09-07 2020-09-07 Data updating method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010927867.7A CN112181979B (en) 2020-09-07 2020-09-07 Data updating method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN112181979A (en) 2021-01-05
CN112181979B (en) 2024-05-24

Family

ID=73924898

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010927867.7A Active CN112181979B (en) 2020-09-07 2020-09-07 Data updating method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN112181979B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113704265A (en) * 2021-08-31 2021-11-26 上海华力集成电路制造有限公司 Data maintenance method, system, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6446089B1 (en) * 1997-02-26 2002-09-03 Siebel Systems, Inc. Method of using a cache to determine the visibility to a remote database client of a plurality of database transactions
US20090083341A1 (en) * 2007-09-21 2009-03-26 International Business Machines Corporation Ensuring that the archival data deleted in relational source table is already stored in relational target table
CN105320680A (en) * 2014-07-15 2016-02-10 中国移动通信集团公司 Data synchronization method and device
CN107229721A (en) * 2017-06-02 2017-10-03 泰华智慧产业集团股份有限公司 Method and device for extracting changed data
CN109189835A (en) * 2018-08-21 2019-01-11 北京京东尚科信息技术有限公司 Method and apparatus for generating a data wide table in real time
CN110209677A (en) * 2018-02-06 2019-09-06 北京京东尚科信息技术有限公司 Method and apparatus for updating data

Also Published As

Publication number Publication date
CN112181979B (en) 2024-05-24

Similar Documents

Publication Publication Date Title
CN107450979B (en) Block chain consensus method and device
CN107402824B (en) Data processing method and device
CN108418851B (en) Policy issuing system, method, device and equipment
CN107391527B (en) Data processing method and device based on block chain
CN110875935B (en) Message publishing, processing and subscribing method, device and system
CN109344348B (en) Resource updating method and device
CN107038041B (en) Data processing method, error code dynamic compatibility method, device and system
CN108848244B (en) Page display method and device
CN108268289B (en) Parameter configuration method, device and system for web application
CN108845876B (en) Service distribution method and device
CN112597013A (en) Online development and debugging method and device
CN108304455B (en) Method, device and equipment for processing service request
CN115617799A (en) Data storage method, device, equipment and storage medium
CN107451204B (en) Data query method, device and equipment
CN111866169A (en) Service updating method, device and system
CN111459724A (en) Node switching method, device, equipment and computer readable storage medium
CN112181979A (en) Data updating method and device, storage medium and electronic equipment
CN109446271B (en) Data synchronization method, device, equipment and medium
CN110083602B (en) Method and device for data storage and data processing based on hive table
CN110022351B (en) Service request processing method and device
CN111930530A (en) Equipment message processing method, device and medium based on Internet of things
CN111273965A (en) Container application starting method, system and device and electronic equipment
CN111597200A (en) Data processing method, device and storage medium
CN110908429A (en) Timer operation method and device
CN114625410A (en) Request message processing method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant