CN116821146B - Apache Iceberg-based data list updating method and system - Google Patents

Apache Iceberg-based data list updating method and system

Info

Publication number
CN116821146B
Authority
CN
China
Prior art keywords
data
target data
files
file
result set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311113956.8A
Other languages
Chinese (zh)
Other versions
CN116821146A (en
Inventor
吕宴全
陈吉平
徐进挺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Daishu Technology Co ltd
Original Assignee
Hangzhou Daishu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Daishu Technology Co ltd
Priority to CN202311113956.8A
Publication of CN116821146A
Application granted
Publication of CN116821146B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses a data table column update method and system based on Apache Iceberg, relating to the technical field of data processing. The method comprises the following steps: constructing a column update file, and adding a first field representing updated column information to the metadata of all data files; acquiring each data file corresponding to the target data table, and screening a plurality of target data files from the data files according to the query condition; determining a data set to be updated according to the plurality of target data files, updating the data in the data set to be updated, and generating an update record; and extracting the primary key information and the first field information of the updated data from the update record and writing them into the column update file. By adding a new file type to Apache Iceberg, the application can record the complete update content by writing only the primary key and the updated column information, which avoids writing unmodified column information and effectively reduces the storage cost in the whole column update scenario.

Description

Apache Iceberg-based data list updating method and system
Technical Field
The application relates to the technical field of data processing, in particular to a data table column update method and system based on Apache Iceberg.
Background
In a data warehouse, a metric is an important measure for evaluating business performance and monitoring business data. When there are new business requirements, data source changes, data quality problems, or query performance optimizations, a metric column usually needs to be added or updated so as to better reflect the business requirements and the data changes.
In a traditional Hive offline data warehouse, a column is generally updated as follows: create a new table, copy the data of the original table into the new table, update the values of the specified column in the new table, delete the original table, and rename the new table to the name of the original table. However, this method is suitable only for offline scenarios and is time-consuming when the data volume is large. In addition, when this method is used for a column update, the original table must not be partitioned, because partition information cannot be copied into the new table. If the original table is partitioned, the partition data must first be exported to other files, the partitions deleted, the table updated, and the partition data finally imported again, which is even more time-consuming.
Disclosure of Invention
The application provides a data table column update method based on Apache Iceberg, which aims to solve the prior-art problems of long update times and high data storage overhead caused by rewriting all information when a data table column is updated.
In order to achieve the above purpose, the present application adopts the following technical scheme:
The application discloses an Apache Iceberg-based data table column update method, which comprises the following steps:
constructing column update files, and adding a first field representing updated column information in metadata of all data files, wherein all data files comprise the column update files;
acquiring each data file corresponding to the target data table, and screening a plurality of target data files from the data files according to the query condition;
determining a data set to be updated according to the target data files, updating the data in the data set to be updated and generating an update record;
and extracting the primary key information and the first field information of the updated data from the update record, and writing the primary key information and the first field information into the column update file.
Preferably, the method further comprises:
and generating a first sequence number according to the data update time, writing the first sequence number into all first files generated by the data update, and recording, in the metadata of each first file, the maximum value and the minimum value of each column in that file together with Bloom filter auxiliary information.
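The write-side bookkeeping described above can be sketched as follows. This is only an illustration under stated assumptions: the sequence number is taken to be the update time in epoch milliseconds, `TinyBloom`, `FileMetadata`, and `build_metadata` are hypothetical names, and the Bloom filter is a toy version rather than the structure Apache Iceberg or a file format would actually use.

```python
import hashlib
import time
from dataclasses import dataclass, field


class TinyBloom:
    """Toy Bloom filter: k hash probes into an m-bit integer bitmap."""

    def __init__(self, m_bits: int = 256, k: int = 3):
        self.m_bits, self.k, self.bits = m_bits, k, 0

    def _positions(self, value):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{value!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m_bits

    def add(self, value) -> None:
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


@dataclass
class FileMetadata:
    sequence_number: int
    column_min: dict = field(default_factory=dict)
    column_max: dict = field(default_factory=dict)
    column_bloom: dict = field(default_factory=dict)


def build_metadata(rows, update_time_ms=None):
    """Derive a sequence number from the update time and collect per-column stats."""
    # Assumption: the sequence number is the update time in epoch milliseconds.
    seq = int(time.time() * 1000) if update_time_ms is None else update_time_ms
    meta = FileMetadata(sequence_number=seq)
    for row in rows:
        for col, val in row.items():
            meta.column_min[col] = min(val, meta.column_min.get(col, val))
            meta.column_max[col] = max(val, meta.column_max.get(col, val))
            meta.column_bloom.setdefault(col, TinyBloom()).add(val)
    return meta


meta = build_metadata(
    [{"id": 3, "score": 70}, {"id": 9, "score": 55}],
    update_time_ms=1700000000000,
)
```

Every file produced by the same update would carry the same `sequence_number`, so a reader can later order the files without opening them.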
Preferably, the screening a plurality of target data files from the data files according to the query condition includes:
and comparing the maximum value and the minimum value of each column in each data file, together with the Bloom filter auxiliary information, with the query condition one by one, and removing the data in the data files that does not satisfy the query condition to obtain a plurality of target data files.
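The screening step above can be sketched as a pruning check over per-file statistics. The sketch makes several assumptions: `FileStats`, `may_match`, and `select_target_files` are hypothetical names, only three comparison operators are handled, and a plain Python set stands in for the Bloom filter (a real Bloom filter can return false positives but never false negatives).

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    path: str
    col_min: dict       # column name -> smallest value in the file
    col_max: dict       # column name -> largest value in the file
    maybe_values: dict  # column name -> set standing in for a Bloom filter


def may_match(stats: FileStats, column: str, op: str, value) -> bool:
    """Return False only when the file provably holds no matching row."""
    lo, hi = stats.col_min[column], stats.col_max[column]
    if op == "=":
        if value < lo or value > hi:
            return False
        # Membership probe: skip the file when the value is definitely absent.
        return value in stats.maybe_values[column]
    if op == ">":
        return hi > value
    if op == "<":
        return lo < value
    raise ValueError(f"unsupported operator: {op}")


def select_target_files(files, column, op, value):
    """Keep only the files that may contain rows satisfying the condition."""
    return [f for f in files if may_match(f, column, op, value)]


files = [
    FileStats("a.parquet", {"id": 1}, {"id": 10}, {"id": {1, 4, 10}}),
    FileStats("b.parquet", {"id": 11}, {"id": 20}, {"id": {11, 15, 20}}),
    FileStats("c.parquet", {"id": 1}, {"id": 30}, {"id": {1, 30}}),
]
targets = select_target_files(files, "id", "=", 15)
print([f.path for f in targets])  # ['b.parquet']
```

File `a.parquet` is rejected by the range check and `c.parquet` by the membership probe, so only one file has to be read.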
Preferably, the metadata of all the data files further comprises a sequence number field;
all the data files further include general data files and delete data files.
Preferably, the determining the data set to be updated according to the target data files includes:
determining the minimum sequence number among the target data files, and recording the mapping relation between each second sequence number and its corresponding target data files, wherein a second sequence number is any sequence number in the target data files that is larger than the minimum sequence number;
loading all target data files corresponding to the minimum sequence number to generate a first result set, and determining all target data files associated with each second sequence number according to the mapping relation;
and processing the first result set according to the type of each target data file associated with each second sequence number to obtain the data set to be updated.
Preferably, the processing the first result set according to the type of each target data file associated with each second sequence number to obtain a data set to be updated includes:
when the target data file associated with the second sequence number is a general data file, adding all data in the target data file to the first result set to obtain a second result set;
when the target data file associated with the second sequence number is a delete data file, deleting from the second result set all data matched by the target data file to obtain a third result set;
when the target data file associated with the second sequence number is a column update file, overwriting the information of the corresponding original fields in the third result set with the first field information in the target data file to obtain a fourth result set, and replacing the first result set with the fourth result set;
repeating the above steps until all target data files associated with all second sequence numbers are processed, and taking the fourth result set corresponding to the last processed second sequence number as the data set to be updated.
An Apache Iceberg-based data table column update system, comprising:
the creation module is used for constructing column update files and adding a first field representing updated column information into metadata of all data files, wherein all data files comprise the column update files;
the selecting module is used for acquiring each data file corresponding to the target data table and screening a plurality of target data files from the data files according to the query condition;
the updating module is used for determining a data set to be updated according to the plurality of target data files, updating the data in the data set to be updated and generating an updating record;
and the extraction module is used for extracting the primary key information and the first field information of the updated data from the update record and writing the primary key information and the first field information into the column update file.
Preferably, the system further comprises:
and the recording module is used for generating a first sequence number according to the data update time, writing the first sequence number into all first files generated by the data update, and recording, in the metadata of each first file, the maximum value and the minimum value of each column in that file together with Bloom filter auxiliary information.
An electronic device comprising a memory and a processor, the memory to store one or more computer instructions, wherein the one or more computer instructions are executable by the processor to implement an Apache Iceberg-based data table column update method of any of the above.
A computer readable storage medium storing a computer program which, when executed by a computer, causes the computer to implement an Apache Iceberg-based data table column updating method as set forth in any one of the preceding claims.
The application has the following beneficial effects:
The application performs the column update operation based on Apache Iceberg, which avoids scanning the full content of all data files to find the data to be updated, effectively reduces write latency, and shortens the time needed to complete a write. At the same time, by adding a new file type to Apache Iceberg, the complete update content can be recorded by writing only the primary key and the updated column information, which avoids writing unmodified column information and effectively reduces the storage cost in the whole column update scenario.
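The storage argument can be illustrated with a small calculation. Everything in the sketch is hypothetical: the rows, the column names, and the use of compact JSON as a stand-in for a real columnar file format, so the absolute byte counts carry no meaning beyond showing the direction of the effect.

```python
import json


def encoded_size(records) -> int:
    """Bytes needed to store the records as compact JSON (stand-in for a file format)."""
    return sum(len(json.dumps(r, separators=(",", ":")).encode()) for r in records)


# Hypothetical rows after an update that changed only the "score" column.
updated_rows = [
    {"id": i, "name": f"user-{i}", "city": "Hangzhou", "score": i * 2}
    for i in range(1000)
]

# Rewriting whole rows stores every column again.
full_rewrite = encoded_size(updated_rows)

# A column update file stores only the primary key and the updated column.
column_update = encoded_size(
    {"id": r["id"], "score": r["score"]} for r in updated_rows
)

print(full_rewrite, column_update)
```

The gap between the two numbers grows with the number and width of the columns that were not modified.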
Drawings
In order to illustrate the embodiments of the application or the technical solutions of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the application, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a data table column updating method based on Apache Iceberg provided by the application;
FIG. 2 is a flow chart of the construction of a data set to be updated in the present application;
FIG. 3 is a schematic diagram of a data table column updating system based on Apache Iceberg provided by the application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms "first," "second," and the like in the claims and the description of the application are used to distinguish between similar objects and not necessarily to describe a particular sequential or chronological order. It is to be understood that the terms so used may be interchanged where appropriate; they merely describe how objects of the same nature are distinguished in the embodiments of the application. Furthermore, the terms "comprise" and "have" and any variations thereof are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
As shown in FIG. 1, the present application provides a data table column updating method based on Apache Iceberg, which includes the following steps:
S110, constructing column update files, and adding a first field representing updated column information in metadata of all data files, wherein all data files comprise the column update files;
S120, acquiring each data file corresponding to the target data table, and screening a plurality of target data files from the data files according to the query condition;
S130, determining a data set to be updated according to the target data files, updating data in the data set to be updated and generating an update record;
S140, extracting the primary key information and the first field information of the updated data from the update record, and writing the primary key information and the first field information into the column update file.
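The file metadata that step S110 relies on can be sketched as a small data structure. The names below are assumptions made for illustration: `FileContent`, `FileMeta`, and `validate` are hypothetical, `equality_ids` stands for the primary key field, and `partial_ids` stands for the first field that lists the updated columns. The sketch does not reproduce the actual Apache Iceberg metadata layout.

```python
from dataclasses import dataclass
from enum import Enum


class FileContent(Enum):
    DATA = "data"                    # general data file
    DELETE = "delete"                # delete data file
    COLUMN_UPDATE = "column_update"  # file type added by the method


@dataclass(frozen=True)
class FileMeta:
    path: str
    content: FileContent
    sequence_number: int
    equality_ids: tuple = ()  # primary key columns
    partial_ids: tuple = ()   # "first field": columns carried by a column update file


def validate(meta: FileMeta) -> None:
    """Check the one constraint this sketch assumes for column update files."""
    if meta.content is FileContent.COLUMN_UPDATE:
        if not meta.equality_ids or not meta.partial_ids:
            raise ValueError(
                "a column update file needs primary key columns and updated columns"
            )


update_file = FileMeta(
    path="u-0001.parquet",
    content=FileContent.COLUMN_UPDATE,
    sequence_number=8,
    equality_ids=("id",),
    partial_ids=("score",),
)
validate(update_file)
```

Because the field exists on every file's metadata, general data files and delete data files simply leave `partial_ids` empty in this sketch.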
Apache Iceberg is a widely used open-source data lake solution that offers fast writes and queries, and it generates a sequence number corresponding to the operation time for each insert or delete operation. Two data file types are defined in Apache Iceberg: general data files and delete data files. The data files include an equalityIds field, which marks the primary key information, and a sequence number field, which marks the order in which the files were created. Apache Iceberg also merges all files as of a certain moment into complete data, writes that data into a new general data file, and cleans up the earlier files to reduce the number of operations that must be replayed. In this case, the same sequence number is written into the files corresponding to the data table, and this sequence number is recorded as the reference sequence number.
Further, determining the minimum sequence number among the plurality of target data files, and recording the mapping relation between each second sequence number and its corresponding target data files, wherein a second sequence number is any sequence number in the target data files that is larger than the minimum sequence number;
loading all target data files corresponding to the minimum sequence number to generate a first result set, and determining all target data files associated with each second sequence number according to the mapping relation;
and processing the first result set according to the type of each target data file associated with each second sequence number to obtain the data set to be updated.
In an exemplary embodiment, as shown in FIG. 2, all data files corresponding to the target data table at the time the task starts are obtained first. The maximum value and the minimum value of each column in each data file, together with the Bloom filter auxiliary information, are compared with the query condition, and data that does not satisfy the query condition is excluded to reduce unnecessary reads and speed up the query, which yields a plurality of target data files corresponding to the target data table. The minimum sequence number contained in the target data files is then determined and recorded as sequence number 0, which represents either the initial sequence number of the first write or the reference sequence number generated by a merge. Every sequence number in the target data files that is larger than the minimum sequence number is recorded as a second sequence number, and the mapping relation between each second sequence number and its target data files is recorded. All target data files corresponding to sequence number 0 are loaded and recorded as the first result set, and all target data files corresponding to each second sequence number are then found in turn according to the mapping relation.
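The grouping by sequence number can be sketched as follows. The function name `plan_replay`, the `(sequence_number, path)` input shape, and the file names are assumptions made for illustration only.

```python
from collections import defaultdict


def plan_replay(target_files):
    """Split (sequence_number, path) pairs into a base load and ordered later groups."""
    by_seq = defaultdict(list)
    for seq, path in target_files:
        by_seq[seq].append(path)

    base_seq = min(by_seq)                # the minimum, "sequence number 0" in the text
    base_files = by_seq.pop(base_seq)     # loaded to form the first result set
    later = dict(sorted(by_seq.items()))  # second sequence numbers -> their files
    return base_seq, base_files, later


base_seq, base_files, later = plan_replay(
    [
        (7, "d2.parquet"),
        (5, "d0.parquet"),
        (9, "u1.parquet"),
        (5, "d1.parquet"),
        (7, "del1.parquet"),
    ]
)
print(base_seq, base_files)  # 5 ['d0.parquet', 'd1.parquet']
print(later)                 # {7: ['d2.parquet', 'del1.parquet'], 9: ['u1.parquet']}
```

Sorting the mapping by key gives the order in which the later files must be applied on top of the first result set.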
Further, when the target data file associated with the second sequence number is a general data file, adding all data in the target data file to the first result set to obtain a second result set;
when the target data file associated with the second sequence number is a delete data file, deleting from the second result set all data matched by the target data file to obtain a third result set;
when the target data file associated with the second sequence number is a column update file, overwriting the information of the corresponding original fields in the third result set with the first field information in the target data file to obtain a fourth result set, and replacing the first result set with the fourth result set;
repeating the above steps until all target data files associated with all second sequence numbers are processed, and taking the fourth result set corresponding to the last processed second sequence number as the data set to be updated.
Then, the first result set is processed according to the type of each target data file associated with each second sequence number to obtain the data set to be updated. For any second sequence number, when the corresponding target data file is a general data file, all data in the target data file is added to the first result set res0 to obtain a second result set res1. When the corresponding target data file is a delete data file, the equalityIds field information in the target data file is matched against the data in the second result set, and the matched data is deleted from the second result set to obtain a third result set res2. When the corresponding target data file is a column update file, the column update file is loaded to obtain the update data, the equalityIds field information in the update data is matched against the third result set, and, for the matched data, the information of the corresponding original fields in the third result set is overwritten with the partialIds field information in the column update file to obtain a fourth result set res3. After all target data files corresponding to a given second sequence number have been processed, the first result set is replaced with the fourth result set, which becomes the new first result set. If unprocessed second sequence numbers remain, the above steps are repeated. The fourth result set corresponding to the last processed second sequence number is then output as the data set to be updated.
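The replay described above can be sketched with an in-memory result set keyed by primary key. The sketch rests on assumptions that go beyond the text: rows are plain dictionaries, the primary key is a single column named `id`, "adding" a row with an existing key replaces it, and the function name `replay` and the `kind` labels are hypothetical.

```python
def replay(base_rows, later_files, key="id"):
    """Apply files in ascending sequence-number order to a keyed result set.

    base_rows:   rows loaded from the files with the minimum sequence number
    later_files: list of (sequence_number, kind, rows), where kind is one of
                 "data", "delete", or "column_update"
    """
    result = {row[key]: dict(row) for row in base_rows}  # first result set
    # Within one sequence number: general data, then deletes, then column updates.
    order = {"data": 0, "delete": 1, "column_update": 2}
    for _seq, kind, rows in sorted(later_files, key=lambda f: (f[0], order[f[1]])):
        for row in rows:
            k = row[key]
            if kind == "data":
                result[k] = dict(row)   # add the row
            elif kind == "delete":
                result.pop(k, None)     # drop the row whose key matches
            elif k in result:
                result[k].update(row)   # overwrite only the updated columns
    return list(result.values())


base = [
    {"id": 1, "name": "a", "score": 10},
    {"id": 2, "name": "b", "score": 20},
]
later = [
    (3, "column_update", [{"id": 1, "score": 99}, {"id": 2, "score": 77}]),
    (2, "delete", [{"id": 2}]),
    (2, "data", [{"id": 3, "name": "c", "score": 30}]),
]
to_update = replay(base, later)
print(to_update)
# [{'id': 1, 'name': 'a', 'score': 99}, {'id': 3, 'name': 'c', 'score': 30}]
```

The column update for `id` 2 is ignored because that row was deleted at an earlier sequence number, which is why the files must be applied in sequence-number order.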
Finally, the data in the data set to be updated is modified in memory to obtain an update record, and the primary key information and the first field information are extracted from the update record and written into a column update file; columns that were not modified are not written into the column update file.
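The extraction step can be sketched as a projection of each update record onto the primary key columns and the modified columns. The function name `to_column_update_records`, the dictionary layout of the column update file, and the sample rows are assumptions for illustration; `equalityIds` is an assumed spelling for the primary key field and `partialIds` for the first field.

```python
def to_column_update_records(update_log, key_cols, updated_cols):
    """Project each update record onto primary key columns plus the modified columns."""
    keep = list(key_cols) + [c for c in updated_cols if c not in key_cols]
    return [{c: rec[c] for c in keep} for rec in update_log]


# Hypothetical update record: full rows after the in-memory modification of "score".
update_log = [
    {"id": 1, "name": "a", "city": "x", "score": 99},
    {"id": 3, "name": "c", "city": "y", "score": 31},
]

column_update_file = {
    "equalityIds": ["id"],    # primary key information
    "partialIds": ["score"],  # first field: which columns were updated
    "rows": to_column_update_records(update_log, ["id"], ["score"]),
}
print(column_update_file["rows"])
# [{'id': 1, 'score': 99}, {'id': 3, 'score': 31}]
```

The `name` and `city` columns never reach the output, which is the source of the storage saving.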
In this embodiment, the column update operation is performed based on Apache Iceberg, which avoids scanning the full content of all data files to find the data to be updated, effectively reduces write latency, and shortens the time needed to complete a write. At the same time, by adding a new file type to Apache Iceberg, the complete update content can be recorded by writing only the primary key and the updated column information, which avoids writing unmodified column information and effectively reduces the storage cost in the whole column update scenario.
As shown in FIG. 3, the present application further provides a data table column updating system based on Apache Iceberg, which includes:
the creation module is used for constructing column update files and adding a first field representing updated column information into metadata of all data files, wherein all data files comprise the column update files;
the selecting module is used for acquiring each data file corresponding to the target data table and screening a plurality of target data files from the data files according to the query condition;
the updating module is used for determining a data set to be updated according to the plurality of target data files, updating the data in the data set to be updated and generating an updating record;
and the extraction module is used for extracting the primary key information and the first field information of the updated data from the update record and writing the primary key information and the first field information into the column update file.
One embodiment of the above system may be as follows: the creation module constructs a column update file and adds a first field representing updated column information to the metadata of all data files, wherein all data files comprise the column update file; the selecting module acquires each data file corresponding to the target data table and screens a plurality of target data files from the data files according to the query condition; the updating module determines a data set to be updated according to the plurality of target data files, updates the data in the data set to be updated, and generates an update record; and the extraction module extracts the primary key information and the first field information of the updated data from the update record and writes them into the column update file.
The application also provides an electronic device comprising a memory and a processor, wherein the memory is used for storing one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the Apache Iceberg-based data table column update method described above.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the electronic device described above may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
The present application also provides a computer-readable storage medium storing a computer program which, when executed by a computer, implements an Apache Iceberg-based data table column updating method as described above.
By way of example, the computer program may be divided into one or more modules/units, which are stored in the memory and executed by the processor, with data transmitted through the input interface and the output interface, to carry out the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments describe the execution of the computer program in a computer device.
The computer device may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or the like. The computer device may include, but is not limited to, a memory and a processor. Those skilled in the art will appreciate that the present embodiment is merely an example of a computer device and does not limit it; the computer device may include more or fewer components, combine certain components, or use different components. For example, the computer device may also include an input device, a network access device, a bus, and the like.
The processor may be a central processing unit (Central Processing Unit, CPU), but may also be another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory may be an internal storage unit of the computer device, such as a hard disk or internal memory of the computer device. The memory may also be an external storage device of the computer device, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card (Flash Card) provided on the computer device. Further, the memory may include both an internal storage unit and an external storage device of the computer device. The memory is used to store the computer program and other programs and data required by the computer device, and may also be used to temporarily store program code in an output device. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.
The foregoing is merely illustrative of specific embodiments of the present application, and the scope of the present application is not limited thereto, but any changes or substitutions within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (8)

1. A data table column update method based on Apache Iceberg, characterized by comprising the following steps:
constructing column update files, and adding a first field representing updated column information in metadata of all data files, wherein all data files comprise the column update files;
acquiring each data file corresponding to the target data table, and screening a plurality of target data files from the data files according to the query condition;
determining a data set to be updated according to the target data files, updating the data in the data set to be updated and generating an update record, wherein the method comprises the following steps:
determining the minimum sequence number among the target data files, and recording the mapping relation between each second sequence number and its corresponding target data files, wherein a second sequence number is any sequence number in the target data files that is larger than the minimum sequence number;
loading all target data files corresponding to the minimum sequence number to generate a first result set, and determining all target data files associated with each second sequence number according to the mapping relation;
processing the first result set according to the type of each target data file associated with each second sequence number to obtain a data set to be updated;
the processing the first result set according to the type of each target data file associated with each second sequence number to obtain a data set to be updated includes:
when the target data file associated with the second sequence number is a general data file, adding all data in the target data file to the first result set to obtain a second result set;
when the target data file associated with the second sequence number is a delete data file, deleting from the second result set all data matched by the target data file to obtain a third result set;
when the target data file associated with the second sequence number is a column update file, overwriting the information of the corresponding original fields in the third result set with the first field information in the target data file to obtain a fourth result set, and replacing the first result set with the fourth result set;
repeating the above steps until all target data files associated with all second sequence numbers are processed, and taking the fourth result set corresponding to the last processed second sequence number as the data set to be updated;
and extracting the primary key information and the first field information of the updated data from the update record, and writing the primary key information and the first field information into the column update file.
2. The Apache Iceberg-based data table column update method of claim 1, further comprising:
and generating a first sequence number according to the data update time, writing the first sequence number into all first files generated by the data update, and recording, in the metadata of each first file, the maximum value and the minimum value of each column in that file together with Bloom filter auxiliary information.
3. The Apache Iceberg-based data table column update method of claim 1, wherein the screening a plurality of target data files from the data files according to query conditions comprises:
and comparing the maximum value and the minimum value of each column in each data file, together with the Bloom filter auxiliary information, with the query condition one by one, and removing the data in the data files that does not satisfy the query condition to obtain a plurality of target data files.
4. The Apache Iceberg-based data table column update method of claim 1, wherein the metadata of all data files further includes a sequence number field;
all the data files further include general data files and delete data files.
5. An Apache Iceberg-based data table column update system, comprising:
the creation module is used for constructing column update files and adding a first field representing updated column information into metadata of all data files, wherein all data files comprise the column update files;
the selecting module is used for acquiring each data file corresponding to the target data table and screening a plurality of target data files from the data files according to the query condition;
the updating module is used for determining a data set to be updated according to the plurality of target data files, updating the data in the data set to be updated, and generating an update record, which comprises:
determining the minimum sequence number among the target data files, and recording the mapping relation between each second sequence number in the target data files and its corresponding target data files, wherein a second sequence number is a sequence number in the target data files that is larger than the minimum sequence number;
loading all target data files corresponding to the minimum sequence number to generate a first result set, and determining all target data files associated with each second sequence number according to the mapping relation;
processing the first result set according to the type of each target data file associated with each second sequence number to obtain the data set to be updated;
wherein the processing of the first result set according to the type of each target data file associated with each second sequence number to obtain the data set to be updated includes:
when the target data file associated with the second sequence number is a general data file, adding all data in the target data file to the first result set to obtain a second result set;
when the target data file associated with the second sequence number is a delete data file, deleting from the second result set all data that matches the data in the target data file to obtain a third result set;
when the target data file associated with the second sequence number is a column update file, overwriting the corresponding original field information in the third result set with the first field information in the target data file to obtain a fourth result set, and replacing the first result set with the fourth result set;
repeating the above steps until all target data files associated with every second sequence number have been processed, and taking the fourth result set corresponding to the last-processed second sequence number as the data set to be updated;
and the extraction module is used for extracting the primary key information and the first field information of the updated data from the update record and writing the primary key information and the first field information into the column update file.
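The merge performed by the updating module can be sketched roughly as follows. This is a simplified model under several assumptions that the claim does not spell out: rows are dictionaries keyed by a primary key column, second sequence numbers are processed in ascending order, files at the minimum sequence number are general data files, and delete files list the primary keys to remove. `TargetFile` and `build_dataset_to_update` are hypothetical names, not Iceberg APIs.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any


@dataclass
class TargetFile:
    sequence_number: int
    kind: str  # "data" (general), "delete", or "column_update"
    rows: list[dict[str, Any]]


def build_dataset_to_update(files, primary_key="id"):
    """Replay target files in sequence-number order to build the result set."""
    if not files:
        return []
    min_seq = min(f.sequence_number for f in files)

    # Mapping: each second sequence number -> its associated target files.
    by_seq = defaultdict(list)
    for f in files:
        if f.sequence_number > min_seq:
            by_seq[f.sequence_number].append(f)

    # First result set: rows of all files at the minimum sequence number
    # (assumed here to be general data files).
    result = {}
    for f in files:
        if f.sequence_number == min_seq:
            for row in f.rows:
                result[row[primary_key]] = dict(row)

    # Within one sequence number: add data, then delete, then overwrite columns.
    order = {"data": 0, "delete": 1, "column_update": 2}
    for seq in sorted(by_seq):
        for f in sorted(by_seq[seq], key=lambda tf: order[tf.kind]):
            for row in f.rows:
                key = row[primary_key]
                if f.kind == "data":
                    result[key] = dict(row)      # second result set
                elif f.kind == "delete":
                    result.pop(key, None)        # third result set
                elif key in result:
                    result[key].update(row)      # fourth result set
    return list(result.values())
```

Keying the result set by primary key keeps each of the three file types a constant-time operation per row; a real engine would stream rows rather than hold them all in memory.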
6. The Apache Iceberg-based data table column update system of claim 5, further comprising:
and the recording module is used for generating a first sequence number according to the data update time, writing the first sequence number into all first files generated by the data update, and recording, in the metadata of each first file, the maximum value and the minimum value of each column in that file together with Bloom filter auxiliary information.
7. An electronic device comprising a memory and a processor, the memory being configured to store one or more computer instructions, wherein the one or more computer instructions are executable by the processor to implement the Apache Iceberg-based data table column update method of any one of claims 1-4.
8. A computer-readable storage medium storing a computer program, wherein the computer program when executed causes a computer to implement an Apache Iceberg-based data table column updating method according to any one of claims 1 to 4.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311113956.8A CN116821146B (en) 2023-08-31 2023-08-31 Apache Iceberg-based data list updating method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311113956.8A CN116821146B (en) 2023-08-31 2023-08-31 Apache Iceberg-based data list updating method and system

Publications (2)

Publication Number Publication Date
CN116821146A CN116821146A (en) 2023-09-29
CN116821146B CN116821146B (en) 2023-12-08

Family

ID=88115360

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311113956.8A Active CN116821146B (en) 2023-08-31 2023-08-31 Apache Iceberg-based data list updating method and system

Country Status (1)

Country Link
CN (1) CN116821146B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111522784A (en) * 2020-04-20 2020-08-11 支付宝(杭州)信息技术有限公司 Metadata synchronization method, device and equipment for unstructured data file
CN113986829A (en) * 2021-11-03 2022-01-28 浪潮云信息技术股份公司 Method for changing Hive data based on index
CN114153891A (en) * 2021-10-22 2022-03-08 上海铂铸信息科技有限公司 Time series data processing method
CN114579589A (en) * 2022-02-10 2022-06-03 杭州玳数科技有限公司 Method for realizing Update function in Trino Iceberg connection
CN114780563A (en) * 2022-04-19 2022-07-22 上海聚音信息科技有限公司 Zipper surface processing method and equipment based on data lake
CN114895850A (en) * 2022-05-09 2022-08-12 湖南兴盛优选网络科技有限公司 Method for optimizing writing of data lake
CN116028514A (en) * 2022-12-22 2023-04-28 北京东方国信科技股份有限公司 Data updating method and device
CN116521641A (en) * 2023-01-18 2023-08-01 浙江大华技术股份有限公司 Data lake-based data reading and writing method, data reading and writing device and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150261794A1 (en) * 2014-03-12 2015-09-17 Apple Inc. Generating or updating table data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Optimized methods for inserting and deleting records and data retrieving in quantum database; Amor Gueddana et al.; 2010 12th International Conference on Transparent Optical Networks; 1-5 *
Research on a JSON-based synchronization strategy for small heterogeneous databases; 黄志; 李涛; 宋瑶; 苏传程; 气象研究与应用 (01); 50-55 *

Also Published As

Publication number Publication date
CN116821146A (en) 2023-09-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant