CN116821146B - Apache Iceberg-based data table column updating method and system
- Publication number
- CN116821146B (application number CN202311113956.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- target data
- files
- file
- result set
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The application discloses a data table column updating method and system based on Apache Iceberg, relating to the technical field of data processing. The method comprises the following steps: constructing a column update file, and adding a first field representing updated column information to the metadata of all data files; acquiring each data file corresponding to the target data table, and screening a plurality of target data files from the data files according to the query condition; determining a data set to be updated according to the plurality of target data files, updating data in the data set to be updated, and generating an update record; and extracting the primary key information and the first field information of the updated data from the update record, and writing them into the column update file. By adding a new file type to Apache Iceberg, the application can record the complete update by writing only the primary key and the updated column information, which avoids writing unmodified column information and effectively reduces the storage cost of the whole column update scenario.
Description
Technical Field
The application relates to the technical field of data processing, in particular to a data table column updating method and system based on Apache Iceberg.
Background
In a data warehouse, metrics are an important means of measuring business performance and monitoring business data. When new business requirements arise, data sources change, data quality problems occur, or query performance needs to be optimized, metric columns generally have to be added or updated so that the warehouse better reflects business requirements and data changes.
In a traditional Hive offline data warehouse, a column is generally updated as follows: create a new table, copy the data of the original table into the new table, update the values of the specified column in the new table, delete the original table, and rename the new table to the name of the original table. However, this method is only suitable for offline scenarios and is time-consuming when the data volume is large. In addition, the method requires that the original table have no partitions, because partition information cannot be copied into the new table. If the original table is partitioned, the partition data must first be exported to other files, the partitions deleted, the table updated, and the partition data finally imported again, which is even more time-consuming.
Disclosure of Invention
The application provides a data table column updating method based on Apache Iceberg, which aims to solve the prior-art problems of long update times and high data storage overhead caused by rewriting all information whenever a data table column is updated.
In order to achieve the above purpose, the present application adopts the following technical scheme:
The application discloses an Apache Iceberg-based data table column updating method, which comprises the following steps:
constructing column update files, and adding a first field representing updated column information in metadata of all data files, wherein all data files comprise the column update files;
acquiring each data file corresponding to the target data table, and screening a plurality of target data files from the data files according to the query condition;
determining a data set to be updated according to the target data files, updating the data in the data set to be updated and generating an update record;
and extracting the primary key information and the first field information of the updated data from the update record, and writing the primary key information and the first field information into the column update file.
Preferably, the method further comprises:
and generating a first serial number according to the data updating time, writing the first serial number into all the first files generated by the data updating, and recording the maximum value and the minimum value of each column in each corresponding first file and bloom filter auxiliary information in metadata of each first file.
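As a minimal sketch of the bookkeeping described above, the following Python shows how per-column minimum and maximum values and Bloom filter information could be collected for a newly written file and stored next to its sequence number. The names `BloomFilter` and `build_file_metadata`, the filter size, and the hash scheme are illustrative assumptions; they are not part of Apache Iceberg or of the application.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class BloomFilter:
    """Tiny Bloom filter; the bit count and hash count are illustrative choices."""
    num_bits: int = 1024
    num_hashes: int = 3
    bits: int = 0  # bit array packed into a Python int

    def _positions(self, value):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def build_file_metadata(rows, sequence_number):
    """Collect per-column min/max and a Bloom filter for one newly written file."""
    columns = {}
    for row in rows:
        for column, value in row.items():
            if column not in columns:
                columns[column] = {"min": value, "max": value, "bloom": BloomFilter()}
            entry = columns[column]
            entry["min"] = min(entry["min"], value)
            entry["max"] = max(entry["max"], value)
            entry["bloom"].add(value)
    return {"sequence_number": sequence_number, "columns": columns}


# Hypothetical rows written by one update; the sequence number is passed in
# by the caller, who is assumed to derive it from the update time.
rows = [{"id": 3, "city": "Hangzhou"}, {"id": 7, "city": "Beijing"}]
meta = build_file_metadata(rows, sequence_number=5)
```

A Bloom filter can report a value as possibly present when it is not, but never the reverse, so it is only safe to use it for skipping files.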
Preferably, the screening a plurality of target data files from the data files according to the query condition includes:
and comparing the maximum value and the minimum value of each column in each data file and the auxiliary information of the bloom filter with the query condition one by one, and removing the data which do not accord with the query condition in each data file to obtain a plurality of target data files.
Preferably, the metadata of all the data files further comprises a serial number field;
all the data files further include general data files and delete data files.
Preferably, the determining the data set to be updated according to the target data files includes:
determining the minimum sequence number in the target data files, and recording the mapping relation between each second sequence number in the target data files and the corresponding target data files, wherein the second sequence number is the sequence number larger than the minimum sequence number in the target data files;
loading all target data files corresponding to the minimum serial numbers to generate a first result set, and determining all target data files associated with each second serial number according to the mapping relation;
and processing the first result set according to the type of each target data file associated with each second serial number to obtain a data set to be updated.
Preferably, the processing the first result set according to the type of each target data file associated with each second serial number to obtain a data set to be updated includes:
when the target data file associated with the second serial number is a general data file, adding all data in the target data file into the first result set to obtain a second result set;
when the target data file associated with the second serial number is a delete data file, deleting from the second result set all data that matches the target data file to obtain a third result set;
when the target data file associated with the second serial number is a column update file, covering the information corresponding to the original field in the third result set with the first field information in the target data file to obtain a fourth result set, and replacing the first result set with the fourth result set;
repeating the steps until all target data files associated with all the second serial numbers are processed, and taking a fourth result set corresponding to the finally processed second serial numbers as a data set to be updated.
An Apache Iceberg-based data table column update system, comprising:
the creation module is used for constructing column update files and adding a first field representing updated column information into metadata of all data files, wherein all data files comprise the column update files;
the selecting module is used for acquiring each data file corresponding to the target data table and screening a plurality of target data files from the data files according to the query condition;
the updating module is used for determining a data set to be updated according to the plurality of target data files, updating the data in the data set to be updated and generating an updating record;
and the extraction module is used for extracting the primary key information and the first field information of the updated data from the update record and writing the primary key information and the first field information into the column update file.
Preferably, the system further comprises:
and the recording module is used for generating a first serial number according to the data updating time, writing the first serial number into all the first files generated by the data updating, and recording the maximum value and the minimum value of each column in each corresponding first file and bloom filter auxiliary information in the metadata of each first file.
An electronic device comprising a memory and a processor, the memory to store one or more computer instructions, wherein the one or more computer instructions are executable by the processor to implement an Apache Iceberg-based data table column update method of any of the above.
A computer readable storage medium storing a computer program which, when executed by a computer, causes the computer to implement an Apache Iceberg-based data table column updating method as set forth in any one of the preceding claims.
The application has the following beneficial effects:
The application performs the column update operation based on Apache Iceberg, which avoids scanning the full content of all data files to find the data to be updated, and thus effectively reduces write latency and shortens the time to complete the write. At the same time, by adding a new file type to Apache Iceberg, the complete update can be recorded by writing only the primary key and the updated column information, which avoids writing unmodified column information and effectively reduces the storage cost of the whole column update scenario.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings described below show only some embodiments of the application, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a data table column updating method based on Apache Iceberg provided by the application;
FIG. 2 is a flow chart of the construction of a data set to be updated in the present application;
FIG. 3 is a schematic diagram of a data table column updating system based on Apache Iceberg provided by the application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms "first," "second," and the like in the claims and the description of the application are used to distinguish between similar objects and do not necessarily describe a particular sequential or chronological order. It is to be understood that terms so used are interchangeable where appropriate; they merely describe how objects of the same nature are distinguished in the embodiments of the application. Furthermore, the terms "comprise" and "have" and any variations thereof are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
As shown in fig. 1, the present application provides a data table column updating method based on Apache Iceberg, which includes the following steps:
S110, constructing column update files, and adding a first field representing updated column information in metadata of all data files, wherein all data files comprise the column update files;
S120, acquiring each data file corresponding to the target data table, and screening a plurality of target data files from the data files according to the query condition;
S130, determining a data set to be updated according to the target data files, updating data in the data set to be updated and generating an update record;
S140, extracting the primary key information and the first field information of the updated data from the update record, and writing the primary key information and the first field information into the column update file.
Apache Iceberg is a widely used open source data lake solution that offers fast writes and queries. For each insert or delete operation, it generates a sequence number corresponding to the operation time. Apache Iceberg defines two types of data files: general data files and delete data files. The metadata of a data file includes the field equalityIds, which marks the primary key information, and a sequence number field, which marks the order in which the files were created. Apache Iceberg also computes and merges all files as of a given moment into complete data, writes the result into a new general data file, and cleans up the earlier files to reduce the number of operations that must be replayed. In that case, the same sequence number is written into every file corresponding to the data table, and this sequence number is recorded as the reference sequence number.
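The file model described here, extended with the column update file that the application adds, can be sketched roughly as follows. This is an illustrative in-memory model only: the class names, the field names `equality_ids` and `partial_ids`, and the file paths are assumptions made for the example and do not reproduce Apache Iceberg's actual metadata classes.

```python
from dataclasses import dataclass
from enum import Enum


class FileType(Enum):
    DATA = "general data file"
    DELETE = "delete data file"
    COLUMN_UPDATE = "column update file"  # the file type added by the application


@dataclass
class FileEntry:
    path: str                 # hypothetical file location
    file_type: FileType
    sequence_number: int      # marks the order in which files were created
    equality_ids: tuple = ()  # columns that form the primary key
    partial_ids: tuple = ()   # the "first field": columns carried by a column update file


files = [
    FileEntry("update-0007.parquet", FileType.COLUMN_UPDATE, 7, ("id",), ("score",)),
    FileEntry("base-0000.parquet", FileType.DATA, 0, ("id",)),
    FileEntry("delete-0004.parquet", FileType.DELETE, 4, ("id",)),
]

# Replay order is ascending sequence number, starting from the base file.
ordered = sorted(files, key=lambda f: f.sequence_number)
```

Only the column update entry carries `partial_ids`; general and delete data files leave it empty in this sketch.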
Further, determining the minimum sequence number in the plurality of target data files, and recording the mapping relation between each second sequence number in the plurality of target data files and the corresponding target data files, wherein the second sequence number is the sequence number larger than the minimum sequence number in the plurality of target data files;
loading all target data files corresponding to the minimum serial numbers to generate a first result set, and determining all target data files associated with each second serial number according to the mapping relation;
and processing the first result set according to the type of each target data file associated with each second serial number to obtain a data set to be updated.
In an exemplary embodiment, as shown in FIG. 2, all data files corresponding to the target data table at the time the task starts are obtained first. The maximum value and minimum value of each column in each data file, together with the Bloom filter auxiliary information, are compared with the query condition, and data that cannot meet the query condition is excluded to reduce unnecessary reads and speed up the query, which yields a plurality of target data files corresponding to the target data table. The minimum sequence number contained in the target data files is then determined and recorded as sequence number 0; it is either the initial sequence number produced by the first write or the reference sequence number generated by a compute-and-merge operation. Every sequence number in the target data files that is greater than the minimum sequence number is recorded as a second sequence number, and the mapping between each second sequence number and its corresponding target data files is recorded. All target data files corresponding to sequence number 0 are loaded and recorded as the first result set res0, and the target data files corresponding to each second sequence number are then found in sequence according to the mapping.
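The screening and grouping in this embodiment can be sketched in Python as follows, for the simple case of an equality query condition on one column. The dictionary layout, the function names `may_match` and `plan`, and the use of a Python `set` as a stand-in for a Bloom filter are all assumptions made for illustration.

```python
from collections import defaultdict


def may_match(file_meta, column, value):
    """Return False only when the file provably holds no row with column == value."""
    stats = file_meta["stats"].get(column)
    if stats is None:
        return True  # no statistics recorded: the file cannot be skipped
    if value < stats["min"] or value > stats["max"]:
        return False  # outside the recorded min/max range
    members = stats.get("bloom")  # a set standing in for a Bloom filter
    return members is None or value in members


def plan(files, column, value):
    """Screen target files, then split them into the base group and later groups."""
    targets = [f for f in files if may_match(f, column, value)]
    if not targets:
        return [], {}
    seq0 = min(f["seq"] for f in targets)             # the minimum sequence number
    base = [f for f in targets if f["seq"] == seq0]   # files loaded into res0
    later = defaultdict(list)                         # second sequence number -> files
    for f in targets:
        if f["seq"] > seq0:
            later[f["seq"]].append(f)
    return base, dict(sorted(later.items()))


files = [
    {"path": "a", "seq": 0, "stats": {"id": {"min": 1, "max": 100, "bloom": {7, 42}}}},
    {"path": "b", "seq": 0, "stats": {"id": {"min": 200, "max": 300}}},            # skipped by range
    {"path": "c", "seq": 3, "stats": {"id": {"min": 1, "max": 100, "bloom": {9}}}},  # skipped by filter
    {"path": "d", "seq": 5, "stats": {"id": {"min": 40, "max": 50}}},
]
base, later = plan(files, "id", 42)
```

Here files `b` and `c` are skipped without being read, `a` forms the base group for sequence number 0, and `d` is mapped to second sequence number 5.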
Further, when the target data file associated with the second serial number is a general data file, adding all data in the target data file into the first result set to obtain a second result set;
when the target data file associated with the second serial number is a delete data file, deleting from the second result set all data that matches the target data file to obtain a third result set;
when the target data file associated with the second serial number is a column update file, covering the information corresponding to the original field in the third result set with the first field information in the target data file to obtain a fourth result set, and replacing the first result set with the fourth result set;
repeating the steps until all target data files associated with all the second serial numbers are processed, and taking a fourth result set corresponding to the finally processed second serial numbers as a data set to be updated.
Then, the first result set is processed according to the type of each target data file associated with each second sequence number to obtain the data set to be updated. For any second sequence number: when the corresponding target data file is a general data file, all data in the target data file is added to the first result set res0 to obtain a second result set res1. When the corresponding target data file is a delete data file, the equalityIds field information in the target data file is matched against the data in the second result set, and the matching data is deleted from the second result set to obtain a third result set res2. When the corresponding target data file is a column update file, the column update file is loaded to obtain the update data, the equalityIds field information in the update data is matched against the third result set, and, for the matched data, the information of the original fields in the third result set is overwritten with the partialIds field information in the column update file to obtain a fourth result set res3. After all target data files corresponding to a given second sequence number have been processed, the first result set is replaced with the fourth result set. If unprocessed second sequence numbers remain, the above steps are repeated with the fourth result set as the new first result set. Finally, the fourth result set corresponding to the last processed second sequence number is output as the data set to be updated.
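The replay just described can be sketched in Python as below. The sketch makes several simplifying assumptions that are not stated in the application: rows are dictionaries already loaded in memory, the primary key is a single column named `id`, rows are unique by key (so adding a row with an existing key replaces it), and the file dictionaries with `type`, `rows`, and `partial_ids` entries are a hypothetical layout.

```python
# Within one sequence number, apply general data files, then delete data files,
# then column update files, mirroring the res1 -> res2 -> res3 order.
TYPE_ORDER = {"data": 0, "delete": 1, "column_update": 2}


def build_update_set(base_rows, later_files, key="id"):
    """Replay files in ascending sequence number order on top of the base rows."""
    result = {row[key]: dict(row) for row in base_rows}  # res0, indexed by primary key
    for seq in sorted(later_files):
        for f in sorted(later_files[seq], key=lambda f: TYPE_ORDER[f["type"]]):
            if f["type"] == "data":
                for row in f["rows"]:            # add all rows (res1)
                    result[row[key]] = dict(row)
            elif f["type"] == "delete":
                for row in f["rows"]:            # drop rows whose key matches (res2)
                    result.pop(row[key], None)
            else:                                # column update file (res3)
                for row in f["rows"]:
                    if row[key] in result:       # only rows still present are updated
                        for col in f["partial_ids"]:
                            result[row[key]][col] = row[col]
        # `result` now plays the role of the new first result set for the next sequence number
    return list(result.values())


base_rows = [
    {"id": 1, "name": "a", "score": 10},
    {"id": 2, "name": "b", "score": 20},
]
later_files = {
    3: [{"type": "data", "rows": [{"id": 3, "name": "c", "score": 30}]}],
    4: [{"type": "delete", "rows": [{"id": 2}]}],
    6: [{"type": "column_update", "partial_ids": ["score"],
         "rows": [{"id": 1, "score": 99}, {"id": 2, "score": 0}]}],
}
to_update = build_update_set(base_rows, later_files)
```

In this example the row with `id` 2 is deleted at sequence number 4, so the later column update for that key has nothing to overwrite and is ignored.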
Finally, the data in the data set to be updated is modified in memory to obtain the update record, and the primary key information and the first field information are extracted from the update record and written into a column update file; columns that were not modified are not written into the column update file.
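A minimal sketch of this last step, assuming update records are in-memory dictionaries: only the primary key columns and the updated columns are projected out of each record. The function name `to_column_update_rows`, the column names, and the dictionary that stands in for the written file are hypothetical.

```python
def to_column_update_rows(update_records, equality_ids, partial_ids):
    """Keep only the primary key columns and the updated columns of each record."""
    keep = list(equality_ids) + [c for c in partial_ids if c not in equality_ids]
    return [{col: record[col] for col in keep} for record in update_records]


# Full rows after the in-memory update; only "score" was modified.
update_records = [
    {"id": 1, "name": "a", "city": "Hangzhou", "score": 99},
    {"id": 3, "name": "c", "city": "Beijing", "score": 35},
]
rows = to_column_update_rows(update_records, equality_ids=("id",), partial_ids=("score",))

# In-memory stand-in for the column update file that would be written.
column_update_file = {
    "type": "column_update",
    "equality_ids": ("id",),
    "partial_ids": ("score",),
    "rows": rows,
}
```

The unmodified `name` and `city` columns never reach the file, which is where the storage saving described in the application comes from.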
In this embodiment, the column update operation is performed based on Apache Iceberg, which avoids scanning the full content of all data files to find the data to be updated, effectively reduces write latency, and shortens the time to complete the write. At the same time, by adding a new file type to Apache Iceberg, the complete update can be recorded by writing only the primary key and the updated column information, which avoids writing unmodified column information and effectively reduces the storage cost of the whole column update scenario.
As shown in fig. 3, the present application further provides a data table column updating system based on Apache Iceberg, which includes:
the creation module is used for constructing column update files and adding a first field representing updated column information into metadata of all data files, wherein all data files comprise the column update files;
the selecting module is used for acquiring each data file corresponding to the target data table and screening a plurality of target data files from the data files according to the query condition;
the updating module is used for determining a data set to be updated according to the plurality of target data files, updating the data in the data set to be updated and generating an updating record;
and the extraction module is used for extracting the primary key information and the first field information of the updated data from the update record and writing the primary key information and the first field information into the column update file.
One embodiment of the above system may be: the creation module builds a column update file, and adds a first field representing updated column information in metadata of all data files, wherein all data files comprise the column update file; the method comprises the steps that a selection module obtains each data file corresponding to a target data table, and a plurality of target data files are selected from the data files according to query conditions; the updating module determines a data set to be updated according to the plurality of target data files, updates data in the data set to be updated and generates an updating record; the extraction module extracts the primary key information and the first field information of the updated data from the update record, and writes the primary key information and the first field information into the column update file.
The application also provides an electronic device comprising a memory and a processor, wherein the memory is used for storing one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the data table column updating method based on Apache Iceberg described above.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the electronic device described above may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
The present application also provides a computer-readable storage medium storing a computer program which, when executed by a computer, implements an Apache Iceberg-based data table column updating method as described above.
By way of example, a computer program may be divided into one or more modules/units, which are stored in the memory and executed by the processor, with data transferred through the I/O interface (an input interface and an output interface), to carry out the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, the instruction segments describing the execution of the computer program in a computer device.
The computer device may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or the like. The computer device may include, but is not limited to, a memory and a processor. It will be appreciated by those skilled in the art that the present embodiment is merely an example of a computer device and does not limit it; a computer device may include more or fewer components, combine certain components, or use different components. For example, a computer device may also include an input device, a network access device, a bus, and so on.
The processor may be a central processing unit (Central Processing Unit, CPU), but may also be another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory may be an internal storage unit of the computer device, such as a hard disk or internal memory of the computer device. The memory may also be an external storage device of the computer device, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card (Flash Card) provided on the computer device. Further, the memory may include both an internal storage unit and an external storage device of the computer device. The memory may be used to store the computer program and other programs and data required by the computer device, and may also be used to temporarily store program code in an output device. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and other media that can store program code.
The foregoing is merely illustrative of specific embodiments of the present application, and the scope of the present application is not limited thereto, but any changes or substitutions within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (8)
1. An Apache Iceberg-based data table column updating method, characterized by comprising the following steps:
constructing column update files, and adding a first field representing updated column information in metadata of all data files, wherein all data files comprise the column update files;
acquiring each data file corresponding to the target data table, and screening a plurality of target data files from the data files according to the query condition;
determining a data set to be updated according to the target data files, updating the data in the data set to be updated and generating an update record, wherein the method comprises the following steps:
determining the minimum sequence number in the target data files, and recording the mapping relation between each second sequence number in the target data files and the corresponding target data files, wherein the second sequence number is the sequence number larger than the minimum sequence number in the target data files;
loading all target data files corresponding to the minimum serial numbers to generate a first result set, and determining all target data files associated with each second serial number according to the mapping relation;
processing the first result set according to the type of each target data file associated with each second serial number to obtain a data set to be updated;
the processing the first result set according to the type of each target data file associated with each second serial number to obtain a data set to be updated includes:
when the target data file associated with the second serial number is a general data file, adding all data in the target data file into the first result set to obtain a second result set;
when the target data file associated with the second serial number is a delete data file, deleting from the second result set all data that matches the target data file to obtain a third result set;
when the target data file associated with the second serial number is a column update file, covering the information corresponding to the original field in the third result set with the first field information in the target data file to obtain a fourth result set, and replacing the first result set with the fourth result set;
repeating the steps until all target data files associated with all second serial numbers are processed, and taking a fourth result set corresponding to the finally processed second serial numbers as a data set to be updated;
and extracting the primary key information and the first field information of the updated data from the update record, and writing the primary key information and the first field information into the column update file.
2. The Apache Iceberg-based data table column updating method of claim 1, further comprising:
and generating a first serial number according to the data updating time, writing the first serial number into all the first files generated by the data updating, and recording the maximum value and the minimum value of each column in each corresponding first file and bloom filter auxiliary information in metadata of each first file.
3. The Apache Iceberg-based data table column updating method of claim 1, wherein the screening a plurality of target data files from the data files according to the query condition comprises:
and comparing the maximum value and the minimum value of each column in each data file and the auxiliary information of the bloom filter with the query condition one by one, and removing the data which do not accord with the query condition in each data file to obtain a plurality of target data files.
4. The Apache Iceberg-based data table column updating method of claim 1, wherein the metadata of all data files further includes a sequence number field;
all the data files further include general data files and delete data files.
5. An Apache Iceberg-based data table column update system, comprising:
a creation module configured to construct column update files and to add a first field representing updated column information to the metadata of all data files, wherein all the data files include the column update files;
a selecting module configured to acquire each data file corresponding to a target data table and to screen a plurality of target data files from the data files according to a query condition;
an updating module configured to determine a data set to be updated according to the plurality of target data files, update the data in the data set to be updated, and generate an update record, wherein determining the data set to be updated comprises:
determining the minimum sequence number among the target data files, and recording a mapping relation between each second sequence number in the target data files and the corresponding target data files, wherein a second sequence number is a sequence number in the target data files that is larger than the minimum sequence number;
loading all target data files corresponding to the minimum sequence number to generate a first result set, and determining all target data files associated with each second sequence number according to the mapping relation;
processing the first result set according to the type of each target data file associated with each second sequence number to obtain the data set to be updated;
wherein the processing of the first result set according to the type of each target data file associated with each second sequence number to obtain the data set to be updated comprises:
when the target data file associated with the second sequence number is a general data file, adding all data in the target data file to the first result set to obtain a second result set;
when the target data file associated with the second sequence number is a delete data file, deleting from the second result set all data that matches the target data file, to obtain a third result set;
when the target data file associated with the second sequence number is a column update file, overwriting the information of the corresponding original fields in the third result set with the first field information in the target data file to obtain a fourth result set, and replacing the first result set with the fourth result set;
repeating the above steps until the target data files associated with all second sequence numbers have been processed, and taking the fourth result set corresponding to the last processed second sequence number as the data set to be updated;
and an extraction module configured to extract the primary key information and the first field information of the updated data from the update record and to write the primary key information and the first field information into the column update file.
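A possible shape for the extraction module is sketched below. The update record format (`before` and `after` row images), the `"id"` primary-key column, and the `updated_columns` metadata name standing in for the "first field" are assumptions chosen for illustration; the claims do not fix any of them.

```python
from typing import List


def build_column_update_file(update_records: List[dict], key: str = "id") -> dict:
    """Keep only the primary key and the changed columns of each updated row."""
    rows = []
    updated_columns = set()
    for record in update_records:
        # Assumed record layout: full row images before and after the update.
        before, after = record["before"], record["after"]
        changed = {
            col: val
            for col, val in after.items()
            if col != key and before.get(col) != val
        }
        if not changed:
            continue     # nothing was modified for this row
        updated_columns.update(changed)
        rows.append({key: after[key], **changed})
    return {
        "kind": "column_update",
        # Stand-in for the "first field": which columns this file updates.
        "updated_columns": sorted(updated_columns),
        "rows": rows,
    }
```

Writing only the key and the changed columns keeps the column update file small compared with rewriting whole rows, which appears to be the motivation for a dedicated column update file.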
6. The Apache Iceberg-based data table column update system of claim 5, further comprising:
a recording module configured to generate a first sequence number according to the data update time, write the first sequence number into all first files generated by the data update, and record, in the metadata of each first file, the maximum value and the minimum value of each column of that file together with Bloom filter auxiliary information.
7. An electronic device comprising a memory and a processor, the memory being configured to store one or more computer instructions, wherein the one or more computer instructions, when executed by the processor, implement the Apache Iceberg-based data table column update method of any one of claims 1-4.
8. A computer-readable storage medium storing a computer program, wherein the computer program, when executed, causes a computer to implement the Apache Iceberg-based data table column update method of any one of claims 1-4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311113956.8A CN116821146B (en) | 2023-08-31 | 2023-08-31 | Apache Iceberg-based data list updating method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311113956.8A CN116821146B (en) | 2023-08-31 | 2023-08-31 | Apache Iceberg-based data list updating method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116821146A (en) | 2023-09-29 |
CN116821146B (en) | 2023-12-08 |
Family
ID=88115360
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311113956.8A Active CN116821146B (en) | 2023-08-31 | 2023-08-31 | Apache Iceberg-based data list updating method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116821146B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111522784A (en) * | 2020-04-20 | 2020-08-11 | 支付宝(杭州)信息技术有限公司 | Metadata synchronization method, device and equipment for unstructured data file |
CN113986829A (en) * | 2021-11-03 | 2022-01-28 | 浪潮云信息技术股份公司 | Method for changing Hive data based on index |
CN114153891A (en) * | 2021-10-22 | 2022-03-08 | 上海铂铸信息科技有限公司 | Time series data processing method |
CN114579589A (en) * | 2022-02-10 | 2022-06-03 | 杭州玳数科技有限公司 | Method for realizing Update function in Trino Iceberg connection |
CN114780563A (en) * | 2022-04-19 | 2022-07-22 | 上海聚音信息科技有限公司 | Zipper surface processing method and equipment based on data lake |
CN114895850A (en) * | 2022-05-09 | 2022-08-12 | 湖南兴盛优选网络科技有限公司 | Method for optimizing writing of data lake |
CN116028514A (en) * | 2022-12-22 | 2023-04-28 | 北京东方国信科技股份有限公司 | Data updating method and device |
CN116521641A (en) * | 2023-01-18 | 2023-08-01 | 浙江大华技术股份有限公司 | Data lake-based data reading and writing method, data reading and writing device and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150261794A1 (en) * | 2014-03-12 | 2015-09-17 | Apple Inc. | Generating or updating table data |
Non-Patent Citations (2)
Title |
---|
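Optimized methods for inserting and deleting records and data retrieving in quantum database; Amor Gueddana et al.; 2010 12th International Conference on Transparent Optical Networks; 1-5 * |
Research on a JSON-based synchronization strategy for small heterogeneous databases; Huang Zhi; Li Tao; Song Yao; Su Chuancheng; Journal of Meteorological Research and Application (01); 50-55 * |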
Also Published As
Publication number | Publication date |
---|---|
CN116821146A (en) | 2023-09-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107391628B (en) | | Data synchronization method and device |
CN108932236B (en) | | File management method and device |
CN103593440B (en) | | The reading/writing method and device of journal file |
CN108334609B (en) | | Method, device, equipment and storage medium for realizing JSON format data access in Oracle |
CN108228799B (en) | | Object index information storage method and device |
JP2014523024A (en) | | Incremental data extraction |
JP2005267600A5 (en) | | |
US11036699B2 (en) | | Method for computing distinct values in analytical databases |
CN111026568B (en) | | Data and task relation construction method and device, computer equipment and storage medium |
CN108536745B (en) | | Shell-based data table extraction method, terminal, equipment and storage medium |
US8347052B2 (en) | | Initializing of a memory area |
CN115114232A (en) | | Method, device and medium for enumerating historical version objects |
CN116821146B (en) | | Apache Iceberg-based data list updating method and system |
CN110704573B (en) | | Catalog storage method, catalog storage device, computer equipment and storage medium |
CN111190895B (en) | | Organization method, device and storage medium of column-type storage data |
CN109710626B (en) | | Data warehousing management method and device, electronic equipment and storage medium |
CN111290700A (en) | | Distributed data reading and writing method and system |
CN111651531B (en) | | Data importing method, device, equipment and computer storage medium |
US10853177B2 (en) | | Performant process for salvaging renderable content from digital data sources |
CN110032445B (en) | | Big data aggregation calculation method and device |
CN112965939A (en) | | File merging method, device and equipment |
CN112527900A (en) | | Method, device, equipment and medium for database multi-copy reading consistency |
CN110543622A (en) | | Text similarity detection method and device, electronic equipment and readable storage medium |
CN113127408A (en) | | Data conversion method and device |
CN111459949B (en) | | Data processing method, device and equipment for database and index updating method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||