CN114579614A - Real-time data full-scale acquisition method and device and computer equipment - Google Patents
Real-time data full-scale acquisition method and device and computer equipment
- Publication number
- CN114579614A (application CN202210128596.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- doris
- full
- wide table
- real
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24568—Data stream processing; Continuous queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a method, a device and computer equipment for acquiring full real-time data. The method comprises the following steps: monitoring and collecting binlog data of a MySQL database with the flinkcdc component; assigning a global sort id to the collected binlog data and, when writing it into the Kafka system, partitioning by the table's primary key, the primary key value being hash-partitioned so that records of the same table with the same primary key land in the same partition; creating a Doris wide table in which every field other than the primary key is declared as REPLACE_IF_NOT_NULL; creating a plurality of Flink stream tasks that write multiple streams of dwd-layer or dws-layer data from the Kafka system into the Doris wide table by Stream Load, so that data with the same primary key in the Doris wide table is overwritten and updated; and querying the Doris wide table according to actual requirements to obtain the full data. The beneficial effects of the invention are that it mainly solves the problem of MySQL cross-database analysis, and does so at low cost.
Description
Technical Field
The invention relates to the field of big data, and in particular to a method, a device and computer equipment for acquiring full real-time data.
Background
As the internet industry enters its second half, data timeliness is becoming ever more important to the fine-grained operation of enterprises. In a market as competitive as a battlefield, the ability to mine valuable information in real time from the massive data generated every day greatly helps an enterprise make decisions and adjust its operating strategy.
From the perspective of business intelligence, data results represent user feedback, so the timeliness of those results is particularly important. Fast data feedback helps decision makers decide more quickly and iterate software products more effectively, and the real-time data warehouse plays an irreplaceable role in this process.
Typically, a data warehouse is expected to hold data from the first day a new business goes online up to the present. Real-time stream processing, however, emphasizes the current processing state. The two are somewhat at odds, so the data timeliness of current offline data warehouses is very low.
Specifically:
(1) At present, a real-time wide table (data warehouse) is built with Flink + ClickHouse: the wide table is produced by joining the source tables in Flink and writing the results into ClickHouse. ClickHouse is responsible only for queries, its OLAP support for the wide table is insufficient, and its performance on operations such as table joins is poor.
(2) When Flink generates a wide table it is bound by the concept of time (a time window must be added and a time range set; otherwise the task misbehaves, because its state grows without bound and causes OOM). Delayed data may therefore fail to be associated, so the results can be inaccurate. This problem exists objectively.
(3) The Flink client collects the binlog and writes it to Kafka. Flink can run with multiple parallelism to improve efficiency, and Kafka can also be partitioned. When multiple threads collect several binlog records for the same primary key, the ordering of the data must be guaranteed; otherwise out-of-order data makes the finally persisted data wrong.
Take a simple business example: placing an order and paying for it.
The order is one message and the payment is another. Associating the two is a Flink SQL join, which requires a time range. Suppose payment must arrive within half an hour or it counts as failed. If the payment instead arrives a month or a year later, the order message would have to wait in memory the whole time. With a large data volume it cannot be held in memory indefinitely, so a join over the full historical data cannot be performed.
In summary, Flink + ClickHouse cannot handle associations that span a long time range, and this is a real pain point.
Disclosure of Invention
In view of this, the present invention provides a method for acquiring full real-time data, based on data warehouse construction and real-time stream data processing technology. The invention begins with.
To achieve the above purpose, the present invention provides a real-time data full-scale acquisition method, which is based on flinkcdc + Doris and comprises the following steps:
S101: access real-time binary log (binlog) data: monitor and collect binlog data of the MySQL database with the flinkcdc component;
S102: write the binlog data into the Kafka system: assign a global sort id to the collected binlog data and, when writing into the Kafka system, partition by the table's primary key, the primary key value being hash-partitioned so that records of the same table with the same primary key land in the same partition;
S103: create the Doris wide table: create a Doris wide table in which every field other than the primary key is declared as REPLACE_IF_NOT_NULL;
S104: write data into the Doris wide table: create a plurality of Flink stream tasks that write multiple streams of dwd-layer or dws-layer data from the Kafka system into the Doris wide table by Stream Load, so that data with the same primary key in the Doris wide table is overwritten and updated;
S105: acquire the full data: query the Doris wide table according to actual requirements to obtain the full data.
Further, in step S105, when the Doris wide table is queried, a cross-dimension association query is performed, using the fields that associate it with other tables as the query conditions.
Further, after data has been written into the Doris wide table, a new field can be added to the Doris wide table at any time in an asynchronous manner. The specific process is as follows:
S201: create a new field d in the Doris wide table; once field d has been created, it begins to receive the incremental data of the original task a;
S202: create an offline task b, which imports into field d the full historical data from before field d was created;
S203: during the import, field d receives incremental data and historical full data at the same time and in no guaranteed order; the incremental data keeps overwriting the historical full data of the same primary key until the import is finished;
S204: create a temporary Flink task c that consumes the data source; the temporary task c and the original task a write data into field d simultaneously; after the temporary task c has run for a period of time T it is closed, and the field update of the Doris wide table is complete.
A real-time data full-scale acquisition apparatus, the apparatus comprising:
a binlog data acquisition unit: accesses real-time binary log (binlog) data, monitoring and collecting binlog data of the MySQL database with the flinkcdc component;
a data partitioning unit: assigns a global sort id to the collected binlog data and, when writing into the Kafka system, partitions by the table's primary key, the primary key value being hash-partitioned so that records of the same table with the same primary key land in the same partition;
a Doris wide table creation unit: creates a Doris wide table in which every field other than the primary key is declared as REPLACE_IF_NOT_NULL;
a Doris wide table data filling unit: creates a plurality of Flink stream tasks that write multiple streams of dwd-layer or dws-layer data from the Kafka system into the Doris wide table by Stream Load, so that data with the same primary key in the Doris wide table is overwritten and updated;
a full data acquisition unit: queries the Doris wide table according to actual requirements to obtain the full data.
A computer device comprising a memory, a processor and a computer program stored in said memory and executable on said processor, said processor implementing the steps of any of said real-time data full-scale acquisition methods when executing said computer program.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any one of the real-time data full-scale acquisition methods.
The beneficial effects provided by the invention are as follows:
1. the problem of MySQL cross-database analysis is mainly solved, and the scheme is low in cost;
2. a full-data real-time wide table is generated based on Flink + Doris, and real-time statistics are computed directly on the wide table, which optimizes query speed and improves usage efficiency;
3. the downstream data synchronization problem caused by modifying an original table during real-time statistics is solved at minimum cost;
4. the generation of a wide table over historical full data is solved based on Doris.
Drawings
FIG. 1 is a schematic flow chart of a real-time data full-scale acquisition method according to the present invention;
FIG. 2 is a simplified example of creating a Doris wide table;
FIG. 3 shows the process of writing data into the wide table;
FIG. 4 is a schematic diagram of data update.
Detailed Description
To make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be further described with reference to the accompanying drawings.
First, the related terms are explained in a unified way as follows:
Flink is a top-level Apache open source project and a computing engine for real-time processing;
flinkcdc: the flink-cdc-connectors component developed by the Flink community, a source component that can read full data and incremental change data directly from databases such as MySQL and PostgreSQL. CDC (change data capture) monitors and captures changes to a database, records them completely in the order in which they occur, and writes them into message middleware for other services to subscribe to and consume;
the MySQL binlog is a log file in binary format that records the changes MySQL makes to the database;
Kafka is a high-throughput distributed publish-subscribe messaging system;
Doris is an MPP-based OLAP system that provides high-performance analysis and report query functions on large data sets at low cost.
The invention provides a real-time data full-scale acquisition method. For the basic idea, refer to FIG. 1, which is a flow chart of the method of the present invention.
S101: access real-time binary log (binlog) data: monitor and collect binlog data of the MySQL database with the flinkcdc component;
It should be noted that the flinkcdc component monitors the binlog of the MySQL database and sends the log data to the message middleware Kafka, after which an application is developed to consume and process the data. Alternatively, other existing commercial tools, such as Alibaba's Logtail tool, can be used for collection, followed by a consumer program.
It is easy to understand that, compared with conventional schemes that export data by querying the database, acquiring real-time change data by monitoring the binlog file of the MySQL database does not affect database performance and solves both the performance problem and the timeliness problem of data synchronization.
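As an illustration of the consumer side of step S101, the following minimal sketch extracts the table name, primary key and row image from one change event. It assumes that events are serialized as Debezium-style JSON (fields `before`, `after`, `source`, `op`, `ts_ms`); the patent does not specify the wire format, and the table `shop.orders`, the column `order_id` and the function `parse_change_event` are hypothetical names chosen for the example.

```python
import json

# Hypothetical change event in Debezium-style JSON (an assumed serialization;
# the source text does not state the format written to Kafka).
RAW_EVENT = json.dumps({
    "before": {"order_id": 1001, "status": "CREATED"},
    "after": {"order_id": 1001, "status": "PAID"},
    "source": {"db": "shop", "table": "orders"},
    "op": "u",
    "ts_ms": 1644537600000,
})


def parse_change_event(raw, primary_key):
    """Return (table, key, op, row) for one change event."""
    event = json.loads(raw)
    op = event["op"]
    # A delete carries its row image in "before"; other operations in "after".
    row = event["before"] if op == "d" else event["after"]
    table = "{db}.{table}".format(**event["source"])
    return table, row[primary_key], op, row


table, key, op, row = parse_change_event(RAW_EVENT, "order_id")
print(table, key, op, row)
```

The returned primary key is what a downstream step would use to choose the Kafka partition.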
S102: write the binlog data into the Kafka system: assign a global sort id to the collected binlog data and, when writing into the Kafka system, partition by the table's primary key, the primary key value being hash-partitioned so that records of the same table with the same primary key land in the same partition;
It should be noted that the binlog data of the MySQL database is written into the Kafka system in real time. In this process the binlog is ordered at second granularity, but if records within the same second arrive out of order, the final data will be wrong.
Therefore, a global sort id is assigned to the binlog as it is collected. Further, when writing into Kafka, the table's primary key is used as the partitioning strategy and the primary key value is hash-partitioned, so that records of the same table with the same primary key are guaranteed to be in the same partition. This ensures the ordering of the data, and in turn that data is also applied in order when Doris data is written and updated.
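The partitioning rule of step S102 can be sketched as follows. This is an illustration only: `zlib.crc32` stands in for whatever hash the real producer uses, an in-process counter stands in for the global sort id generator, and the partition count and function names are assumptions.

```python
import itertools
import zlib

NUM_PARTITIONS = 4              # hypothetical partition count
_sequence = itertools.count(1)  # stand-in for a global sort id generator


def partition_for(table, primary_key, num_partitions=NUM_PARTITIONS):
    """Hash table name plus primary key so equal keys map to one partition."""
    payload = f"{table}:{primary_key}".encode("utf-8")
    return zlib.crc32(payload) % num_partitions


def tag_event(table, primary_key, row):
    """Attach a global sort id and the target partition to a change event."""
    return {
        "sort_id": next(_sequence),
        "partition": partition_for(table, primary_key),
        "table": table,
        "key": primary_key,
        "row": row,
    }


events = [
    tag_event("orders", 1001, {"status": "CREATED"}),
    tag_event("orders", 1002, {"status": "CREATED"}),
    tag_event("orders", 1001, {"status": "PAID"}),
]
same_key = [e for e in events if e["key"] == 1001]
print(same_key)
```

Because both events for key 1001 share a partition, a single consumer sees them in order, and the sort id gives an explicit tie-breaker for records that fall within the same second.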
It should be further noted that the data in the Kafka system is divided into ods-layer, dwd-layer and dws-layer data, and that it has been processed by ETL and parsed into a standard format.
S103: create the Doris wide table: create a Doris wide table in which every field other than the primary key is declared as REPLACE_IF_NOT_NULL;
Referring to FIG. 2, FIG. 2 is a simplified example of creating a Doris wide table.
In FIG. 2, a wide table named "test" is created; it includes key1 and key2, and all fields other than the keys are declared as REPLACE_IF_NOT_NULL.
With this approach, once modification and deletion data from the data warehouse is ingested, the rows with the corresponding primary key can be overwritten and updated.
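The effect of declaring the non-key fields as REPLACE_IF_NOT_NULL can be illustrated with a small in-memory model. This is a sketch of the semantics only, not of Doris itself: a Python dict plays the wide table, `None` plays NULL, and the function `upsert` is a hypothetical helper.

```python
def upsert(table, key, values):
    """Merge one row into the table with REPLACE_IF_NOT_NULL-like semantics."""
    current = table.setdefault(key, {})
    for column, value in values.items():
        # A NULL (None) input leaves the stored value untouched.
        if value is not None:
            current[column] = value
    return current


wide_table = {}
upsert(wide_table, (1, 2), {"value1": "a", "value2": "b"})
upsert(wide_table, (1, 2), {"value1": None, "value3": "c"})  # value1 is kept
upsert(wide_table, (1, 2), {"value2": "B"})                  # value2 replaced
print(wide_table[(1, 2)])
```

Each writer can therefore send only the columns it owns and leave the rest NULL without erasing what other writers have stored for the same primary key.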
S104: write data into the Doris wide table: create a plurality of Flink stream tasks that write multiple streams of dwd-layer or dws-layer data from the Kafka system into the Doris wide table by Stream Load, so that data with the same primary key in the Doris wide table is overwritten and updated;
It should be noted that, to write multiple Flink streams into the Doris wide table at the same time, multiple Flink tasks are created, each writing dwd-layer or dws-layer data into Doris by Stream Load; data with the same primary key is overwritten and updated.
Referring to FIG. 3, FIG. 3 illustrates the process of writing data into the wide table.
For example, one embodiment includes three streams of data in total:
the first stream: key1, key2, value1, value2
the second stream: key1, key2, value3, value4
the third stream: key2, value5, value6;
The three streams are written into the wide table simultaneously.
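To make the per-stream write concrete, the sketch below builds, without sending, the HTTP request a writer for the first stream might issue to the Doris Stream Load endpoint. The host name, database name `dw`, credentials and label are hypothetical; the `columns` header lists only the fields this stream owns, on the assumption that the omitted columns are nullable and therefore load as NULL.

```python
import base64
import json


def build_stream_load(fe_host, db, table, columns, rows, label,
                      user="root", password=""):
    """Build (url, headers, body) for a Doris Stream Load PUT request."""
    url = f"http://{fe_host}/api/{db}/{table}/_stream_load"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        "Expect": "100-continue",
        "label": label,                # lets a retried load be deduplicated
        "format": "json",
        "strip_outer_array": "true",   # the body is a JSON array of rows
        "columns": ",".join(columns),  # only the columns this stream owns
    }
    body = json.dumps(rows).encode("utf-8")
    return url, headers, body


url, headers, body = build_stream_load(
    "fe.example.internal:8030", "dw", "test",
    ["key1", "key2", "value1", "value2"],
    [{"key1": 1, "key2": 2, "value1": "a", "value2": "b"}],
    label="stream1-batch-0001",
)
print(url)
print(headers["columns"])
```

A writer for the second stream would use the same builder with `value3` and `value4` in the column list, so that the two loads fill different fields of the same row.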
S105: acquire the full data: query the Doris wide table according to actual requirements to obtain the full data.
In step S105, when the Doris wide table is queried, a cross-dimension association query is performed, using the fields that associate it with other tables as the query conditions.
It should be noted that, at query time, the Doris wide table can be treated as one large table and joined with other tables (cross-dimension association query). The Doris wide table can also be optimized for a single label, similar to a rollup, to improve query speed.
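A cross-dimension association query of this kind can be sketched with an ordinary SQL join. In the example below an in-memory SQLite database stands in for Doris purely so that the query can be run locally; the table `dim_shop`, its columns and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wide_table (key1 INTEGER, key2 INTEGER,
                             value1 TEXT, value5 TEXT,
                             PRIMARY KEY (key1, key2));
    CREATE TABLE dim_shop (key2 INTEGER PRIMARY KEY, shop_name TEXT);
    INSERT INTO wide_table VALUES (1, 10, 'order-a', 'paid'),
                                  (2, 20, 'order-b', 'unpaid');
    INSERT INTO dim_shop VALUES (10, 'north'), (20, 'south');
""")

# Join the wide table to a dimension table on the associating field key2
# and filter on a dimension attribute.
rows = conn.execute("""
    SELECT w.key1, d.shop_name, w.value5
    FROM wide_table AS w
    JOIN dim_shop AS d ON d.key2 = w.key2
    WHERE d.shop_name = ?
    ORDER BY w.key1
""", ("north",)).fetchall()
print(rows)
conn.close()
```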
After data has been written into the Doris wide table, a new field can be added to the Doris wide table at any time in an asynchronous manner. The specific process is as follows:
S201: create a new field d in the Doris wide table; once field d has been created, it begins to receive the incremental data of the original task a;
S202: create an offline task b, which imports into field d the full historical data from before field d was created;
S203: during the import, field d receives incremental data and historical full data at the same time and in no guaranteed order; the incremental data keeps overwriting the historical full data of the same primary key until the import is finished;
S204: create a temporary Flink task c that consumes the data source; the temporary task c and the original task a write data into field d simultaneously; after the temporary task c has run for a period of time T it is closed, and the field update of the Doris wide table is complete.
In brief, if the historical values of a field in the wide table need to be modified, a new Doris table field is added and populated asynchronously.
A new field is added and historical data is first imported into it. A temporary task is then created to consume the incremental data and, alongside the original task, write into the new and old Doris fields. After running for a period of time (when the set time is reached, a script is called to kill the task), the temporary task is closed, the original incremental task is modified to include the new field, and it is restarted to continue consuming the data source, which guarantees that the data stays updated and synchronized.
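The ordering problem that steps S201 to S204 address can be pictured with a toy simulation. This is one possible reading of the procedure, not a statement of the authors' implementation: it assumes that the historical import can land after a newer incremental value for the same key, and that the temporary task c replays the incremental data so that the newer value wins again. The helper `upsert` and all sample values are hypothetical.

```python
def upsert(table, key, values):
    """Write non-NULL values only, mimicking REPLACE_IF_NOT_NULL."""
    current = table.setdefault(key, {})
    for column, value in values.items():
        if value is not None:
            current[column] = value


wide_table = {1: {"old": "x"}, 2: {"old": "y"}}

increments = [(1, "d-new-1")]               # produced after field d was created
history = [(1, "d-old-1"), (2, "d-old-2")]  # values from before field d existed

# S201: incremental data starts flowing into field d.
for key, value in increments:
    upsert(wide_table, key, {"d": value})

# S202/S203: offline task b imports history; here it lands after the
# increment for key 1, so a stale value overwrites the newer one.
for key, value in history:
    upsert(wide_table, key, {"d": value})
stale_value = wide_table[1]["d"]

# S204 (assumed reading): temporary task c replays the incremental data,
# which restores the newest value for keys that changed during the import.
for key, value in increments:
    upsert(wide_table, key, {"d": value})

print(stale_value, wide_table[1]["d"], wide_table[2]["d"])
```

Under this reading, the period T only needs to be long enough for the temporary task to catch up with the increments produced while the import was running.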
For a better explanation, refer to FIG. 4, which is a schematic diagram of data update.
As before, one embodiment includes three streams of data in total:
the first stream: key1, key2, value1, value2
the second stream: key1, key2, value3, value4
the third stream: key2, value5, value6;
Here the third stream has only key2. To update its data, the missing key can be supplemented from a dimension table (cross-dimension association query); see the top right-hand portion of FIG. 4.
That is, the third stream is joined with the dimension table to complete the key, becoming:
key1, key2, value5, value6. The process used is the one described in steps S201 to S204, through which the key completion for the third stream and the overwrite update of its data are finally realized.
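The key completion for the third stream can be sketched as a lookup against a dimension table. In this illustration a Python dict stands in for the dimension table, and the mapping from key2 to key1, the function `complete_key` and the sample record are hypothetical.

```python
DIM_KEY2_TO_KEY1 = {2: 1, 20: 10}  # hypothetical dimension table: key2 -> key1


def complete_key(record, dim=DIM_KEY2_TO_KEY1):
    """Fill in key1 for a record that carries only key2; None if unknown."""
    key1 = dim.get(record["key2"])
    if key1 is None:
        return None  # no dimension match; the caller decides how to handle it
    return {"key1": key1, **record}


third_stream = [{"key2": 2, "value5": "e", "value6": "f"}]
completed = [r for r in map(complete_key, third_stream) if r is not None]
print(completed)
```

Once a record carries both key1 and key2 it can be written to the wide table like the first two streams.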
A real-time data full-scale acquisition apparatus, the apparatus comprising:
a binlog data acquisition unit: accesses real-time binary log (binlog) data, monitoring and collecting binlog data of the MySQL database with the flinkcdc component;
a data partitioning unit: assigns a global sort id to the collected binlog data and, when writing into the Kafka system, partitions by the table's primary key, the primary key value being hash-partitioned so that records of the same table with the same primary key land in the same partition;
a Doris wide table creation unit: creates a Doris wide table in which every field other than the primary key is declared as REPLACE_IF_NOT_NULL;
a Doris wide table data filling unit: creates a plurality of Flink stream tasks that write multiple streams of dwd-layer or dws-layer data from the Kafka system into the Doris wide table by Stream Load, so that data with the same primary key in the Doris wide table is overwritten and updated;
a full data acquisition unit: queries the Doris wide table according to actual requirements to obtain the full data.
A computer device comprising a memory, a processor and a computer program stored in said memory and executable on said processor, said processor implementing the steps of any of said real-time data full-scale acquisition methods when executing said computer program.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any one of the real-time data full-scale acquisition methods.
The beneficial effects of the implementation of the invention are as follows:
1. the problem of MySQL cross-database analysis is mainly solved, and the scheme is low in cost;
2. a full-data real-time wide table is generated based on Flink + Doris, and real-time statistics are computed directly on the wide table, which optimizes query speed and improves usage efficiency;
3. the downstream data synchronization problem caused by modifying an original table during real-time statistics is solved at minimum cost;
4. the generation of a wide table over historical full data is solved based on Doris.
The above embodiments of the invention, and the features of those embodiments, may be combined with each other where no conflict arises.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (6)
1. A real-time data full-scale acquisition method, characterized by comprising the following steps:
S101: access real-time binary log (binlog) data: monitor and collect binlog data of the MySQL database with the flinkcdc component;
S102: write the binlog data into the Kafka system: assign a global sort id to the collected binlog data and, when writing into the Kafka system, partition by the table's primary key, the primary key value being hash-partitioned so that records of the same table with the same primary key land in the same partition;
S103: create the Doris wide table: create a Doris wide table in which every field other than the primary key is declared as REPLACE_IF_NOT_NULL;
S104: write data into the Doris wide table: create a plurality of Flink stream tasks that write multiple streams of dwd-layer or dws-layer data from the Kafka system into the Doris wide table by Stream Load, so that data with the same primary key in the Doris wide table is overwritten and updated;
S105: acquire the full data: query the Doris wide table according to actual requirements to obtain the full data.
2. The real-time data full-scale acquisition method according to claim 1, characterized in that: in step S105, when the Doris wide table is queried, a cross-dimension association query is performed, using the fields that associate it with other tables as the query conditions.
3. The real-time data full-scale acquisition method according to claim 1, characterized in that: after data has been written into the Doris wide table, a new field can be added to the Doris wide table at any time in an asynchronous manner, the specific process being as follows:
S201: create a new field d in the Doris wide table; once field d has been created, it begins to receive the incremental data of the original task a;
S202: create an offline task b, which imports into field d the full historical data from before field d was created;
S203: during the import, field d receives incremental data and historical full data at the same time and in no guaranteed order; the incremental data keeps overwriting the historical full data of the same primary key until the import is finished;
S204: create a temporary Flink task c that consumes the data source; the temporary task c and the original task a write data into field d simultaneously; after the temporary task c has run for a period of time T it is closed, and the field update of the Doris wide table is complete.
4. A real-time data full-scale acquisition apparatus, the apparatus comprising:
a binlog data acquisition unit: accesses real-time binary log (binlog) data, monitoring and collecting binlog data of the MySQL database with the flinkcdc component;
a data partitioning unit: assigns a global sort id to the collected binlog data and, when writing into the Kafka system, partitions by the table's primary key, the primary key value being hash-partitioned so that records of the same table with the same primary key land in the same partition;
a Doris wide table creation unit: creates a Doris wide table in which every field other than the primary key is declared as REPLACE_IF_NOT_NULL;
a Doris wide table data filling unit: creates a plurality of Flink stream tasks that write multiple streams of dwd-layer or dws-layer data from the Kafka system into the Doris wide table by Stream Load, so that data with the same primary key in the Doris wide table is overwritten and updated;
a full data acquisition unit: queries the Doris wide table according to actual requirements to obtain the full data.
5. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the real-time data full-scale acquisition method according to any one of claims 1 to 3 when executing the computer program.
6. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the real-time data full-scale acquisition method according to any one of claims 1 to 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210128596.8A CN114579614A (en) | 2022-02-11 | 2022-02-11 | Real-time data full-scale acquisition method and device and computer equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210128596.8A CN114579614A (en) | 2022-02-11 | 2022-02-11 | Real-time data full-scale acquisition method and device and computer equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114579614A true CN114579614A (en) | 2022-06-03 |
Family
ID=81773769
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210128596.8A Pending CN114579614A (en) | Real-time data full-scale acquisition method and device and computer equipment | 2022-02-11 | 2022-02-11 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114579614A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115033646A (en) * | 2022-08-11 | 2022-09-09 | 深圳联友科技有限公司 | Method for constructing real-time warehouse system based on Flink and Doris |
CN115062028A (en) * | 2022-07-27 | 2022-09-16 | 中建电子商务有限责任公司 | Method for multi-table join query in OLTP field |
CN115470217A (en) * | 2022-11-14 | 2022-12-13 | 云筑信息科技(成都)有限公司 | Method for solving change response problem of data bin model in real time |
CN116431654A (en) * | 2023-06-08 | 2023-07-14 | 中新宽维传媒科技有限公司 | Data storage method, device, medium and computing equipment based on integration of lake and warehouse |
CN117331513A (en) * | 2023-12-01 | 2024-01-02 | 蒲惠智造科技股份有限公司 | Data reduction method and system based on Hadoop architecture |
CN117792960A (en) * | 2024-02-23 | 2024-03-29 | 中国电子科技集团公司第三十研究所 | Historical flow statistics method and device based on domestic multi-core processor |
-
2022
- 2022-02-11 CN CN202210128596.8A patent/CN114579614A/en active Pending
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115062028A (en) * | 2022-07-27 | 2022-09-16 | 中建电子商务有限责任公司 | Method for multi-table join query in OLTP field |
CN115062028B (en) * | 2022-07-27 | 2023-01-06 | 中建电子商务有限责任公司 | Method for multi-table join query in OLTP field |
CN115033646A (en) * | 2022-08-11 | 2022-09-09 | 深圳联友科技有限公司 | Method for constructing real-time warehouse system based on Flink and Doris |
CN115033646B (en) * | 2022-08-11 | 2023-01-13 | 深圳联友科技有限公司 | Method for constructing real-time warehouse system based on Flink and Doris |
CN115470217A (en) * | 2022-11-14 | 2022-12-13 | 云筑信息科技(成都)有限公司 | Method for solving change response problem of data bin model in real time |
CN116431654A (en) * | 2023-06-08 | 2023-07-14 | 中新宽维传媒科技有限公司 | Data storage method, device, medium and computing equipment based on integration of lake and warehouse |
CN116431654B (en) * | 2023-06-08 | 2023-09-08 | 中新宽维传媒科技有限公司 | Data storage method, device, medium and computing equipment based on integration of lake and warehouse |
CN117331513A (en) * | 2023-12-01 | 2024-01-02 | 蒲惠智造科技股份有限公司 | Data reduction method and system based on Hadoop architecture |
CN117331513B (en) * | 2023-12-01 | 2024-03-19 | 蒲惠智造科技股份有限公司 | Data reduction method and system based on Hadoop architecture |
CN117792960A (en) * | 2024-02-23 | 2024-03-29 | 中国电子科技集团公司第三十研究所 | Historical flow statistics method and device based on domestic multi-core processor |
CN117792960B (en) * | 2024-02-23 | 2024-04-30 | 中国电子科技集团公司第三十研究所 | Historical flow statistics method and device based on domestic multi-core processor |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114579614A (en) | Real-time data full-scale acquisition method and device and computer equipment | |
US11829360B2 (en) | Database workload capture and replay | |
US11762882B2 (en) | System and method for analysis and management of data distribution in a distributed database environment | |
CN102880685B (en) | Method for interval and paging query of time-intensive B/S (Browser/Server) with large data size | |
Poess et al. | TPC-DI: the first industry benchmark for data integration | |
CN103154935B (en) | For inquiring about the system and method for data stream | |
CN102214176B (en) | Method for splitting and join of huge dimension table | |
CN103353873B (en) | Optimization implementation method and system based on the service of time measure data real-time query | |
US11934306B2 (en) | Object storage change-events | |
CN102779138B (en) | The hard disk access method of real time data | |
CN104572856A (en) | Converged storage method of service source data | |
CN104199978A (en) | System and method for realizing metadata cache and analysis based on NoSQL and method | |
CN113742325A (en) | Data warehouse construction method, device and system, electronic equipment and storage medium | |
CN114153809A (en) | Parallel real-time incremental statistic method based on database logs | |
CN106776810B (en) | Big data processing system and method | |
CN110851515B (en) | Big data ETL model execution method and medium based on Spark distributed environment | |
Meoni et al. | Exploiting Apache Spark platform for CMS computing analytics | |
CN114265875B (en) | Method for establishing wide table in real time based on stream data | |
Goncalves et al. | DottedDB: Anti-entropy without merkle trees, deletes without tombstones | |
CN116126901A (en) | Data processing method, device, electronic equipment and computer readable storage medium | |
CN116010452A (en) | Industrial data processing system and method based on stream type calculation engine and medium | |
CN114168595A (en) | Data analysis method and device | |
Ma et al. | Live data migration approach from relational tables to schema-free collections with mapreduce | |
CN116756247B (en) | Data restoration method, device, computer equipment and storage medium | |
US20240070180A1 (en) | Mutation-Responsive Documentation Regeneration Based on Knowledge Base |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||