CN114579614A - Real-time data full-scale acquisition method and device and computer equipment - Google Patents

Publication number
CN114579614A
Authority
CN
China
Prior art keywords
data
doris
full
wide table
real
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210128596.8A
Other languages
Chinese (zh)
Inventor
王祖正
汪健
吴凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Wuyi Yuntong Network Technology Co ltd
Original Assignee
Wuhan Wuyi Yuntong Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Wuyi Yuntong Network Technology Co ltd
Priority to CN202210128596.8A
Publication of CN114579614A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2455: Query execution
    • G06F16/24568: Data stream processing; Continuous queries
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23: Updating
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method, a device and computer equipment for acquiring the full volume of real-time data. The method comprises the following steps: monitoring and collecting binlog data of a MySQL database by using the Flink CDC component; assigning a global sort id to the collected binlog data and, when writing it into Kafka, using the primary key of the table as the partitioning strategy, with the primary key value hash-partitioned so that records of the same table with the same primary key fall in the same partition; creating a Doris wide table in which every field other than the primary key is declared as REPLACE_IF_NOT_NULL; creating a plurality of Flink stream tasks that write multiple streams of DWD-layer or DWS-layer data in Kafka into the Doris wide table by Stream Load, so that data with the same primary key in the Doris wide table is overwritten and updated; and querying the Doris wide table according to actual requirements to obtain the full data. The beneficial effects provided by the invention are that it mainly solves the problem of cross-database MySQL analysis, and does so at low cost.

Description

Real-time data full-scale acquisition method and device and computer equipment
Technical Field
The invention relates to the field of big data, and in particular to a method, a device and computer equipment for acquiring the full volume of real-time data.
Background
As the development of the internet enters its second half, data timeliness matters more and more to the fine-grained operation of enterprises. In a market as competitive as a battlefield, mining valuable information effectively and in real time from the massive data generated every day greatly helps enterprises decide on and adjust their operating strategies.
From the perspective of business intelligence, data results represent user feedback, and the timeliness of those results is particularly important. Obtaining data feedback quickly helps decision makers decide faster and iterate the corresponding software products better, and the real-time data warehouse plays an irreplaceable role in this process.
Typically, a data warehouse is intended to hold data from the first day a new business goes online up to the present. Real-time stream processing, however, emphasizes the current processing state. The two are at odds to some degree, so the data timeliness of current offline data warehouses is very low.
Specifically:
(1) At present, the real-time wide table (data warehouse) is built with Flink + ClickHouse: the wide table is generated by having Flink join the individual tables and write the results into ClickHouse, and ClickHouse is responsible only for queries. OLAP support on the wide table is insufficient, and the performance of table joins and similar operations is poor.
(2) When Flink generates the wide table, it is bound to a notion of time: a time window with a set time range must be added, otherwise the task runs into problems, because its state grows without bound and causes an OOM. Delayed data therefore cannot be joined, so inaccurate results are an objective fact.
(3) The Flink client collects the binlog and writes it to Kafka. Flink can be given a parallelism greater than one to improve efficiency, and Kafka topics can also be partitioned. When multiple threads collect several binlog records of the same primary key, the order of the data must be preserved; otherwise out-of-order data makes the data that finally lands wrong.
To take a simple example, consider a business flow of order placement plus payment.
The order is one message and the payment is another, and associating the two is a Flink SQL join. The join, however, has a time range: for example, payment must occur within half an hour or it is treated as failed. If payment could arrive a month or a year later, the order message would have to wait in memory the whole time; with a large data volume held in memory indefinitely, a join over the full historical data cannot be performed.
In summary, Flink + ClickHouse cannot handle associations that span a large time range, and this is a pain point.
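The windowed-join limitation described above can be illustrated with a minimal Python sketch. It is not Flink code: the `Event` class, the `interval_join` function and the sample timestamps are hypothetical names and values chosen for illustration, and the half-hour window follows the example in the text.

```python
from dataclasses import dataclass


@dataclass
class Event:
    order_id: str
    ts: int  # event time in seconds


def interval_join(orders, payments, window_seconds):
    """Match a payment to its order only if it arrives within the window."""
    order_ts = {o.order_id: o.ts for o in orders}
    matched, unmatched = [], []
    for p in payments:
        placed = order_ts.get(p.order_id)
        if placed is not None and 0 <= p.ts - placed <= window_seconds:
            matched.append(p.order_id)
        else:
            # The order state would already have been expired from memory.
            unmatched.append(p.order_id)
    return matched, unmatched


orders = [Event("o1", 0), Event("o2", 0)]
# o1 is paid after 10 minutes, o2 after roughly one month.
payments = [Event("o1", 600), Event("o2", 30 * 24 * 3600)]
matched, unmatched = interval_join(orders, payments, window_seconds=1800)
```

Widening the window so that `o2` matches would require keeping every open order in memory for that long, which is the state-growth problem the background describes.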
Disclosure of Invention
In view of this, the present invention provides a method for acquiring the full volume of real-time data, based on data warehouse construction and real-time stream data processing technology. The invention begins with.
To achieve the above purpose, the present invention provides a real-time data full-scale acquisition method based on Flink CDC + Doris, comprising the following steps:
S101: Access real-time binary log (binlog) data: monitor and collect binlog data of the MySQL database by using the Flink CDC component;
S102: Write the binlog data to Kafka: assign a global sort id to the collected binlog data and, when writing to Kafka, use the primary key of the table as the partitioning strategy, with the primary key value hash-partitioned so that records of the same table with the same primary key fall in the same partition;
S103: Create the Doris wide table: create a Doris wide table in which every field other than the primary key is declared as REPLACE_IF_NOT_NULL;
S104: Write data to the Doris wide table: create a plurality of Flink stream tasks that write multiple streams of DWD-layer or DWS-layer data in Kafka into the Doris wide table by Stream Load, so that data with the same primary key in the Doris wide table is overwritten and updated;
S105: Acquire the full data: query the Doris wide table according to actual requirements to obtain the full data.
Further, in step S105, when the Doris wide table is queried, a cross-dimension association query is performed by using the fields that associate it with other tables as query conditions.
Further, after data has been written to the Doris wide table, a new field can be added to the Doris wide table at any time in an asynchronous manner. The specific process is as follows:
S201: Create a new field d in the Doris wide table; once field d has been created, it starts receiving the incremental data of the original task a.
S202: Create a new offline task b, which imports into field d the full data that existed in the Doris wide table before field d was created;
S203: During the import, field d receives incremental data and historical full data at the same time. The two arrive out of order, so incremental data can keep being overwritten by historical full data of the same primary key until the import finishes;
S204: Create a new temporary Flink task c that consumes the data; the temporary task c and the original task a write data into field d at the same time. After the temporary task c has run for a period of time T, it is closed, and the Doris wide table has completed the field update.
A real-time data full-scale acquisition apparatus, the apparatus comprising:
a binlog data acquisition unit, configured to access real-time binary log (binlog) data by monitoring and collecting binlog data of the MySQL database with the Flink CDC component;
a data partitioning unit, configured to assign a global sort id to the collected binlog data and, when writing to Kafka, to use the primary key of the table as the partitioning strategy, with the primary key value hash-partitioned so that records of the same table with the same primary key fall in the same partition;
a Doris wide table creation unit, configured to create a Doris wide table in which every field other than the primary key is declared as REPLACE_IF_NOT_NULL;
a Doris wide table data filling unit, configured to create a plurality of Flink stream tasks that write multiple streams of DWD-layer or DWS-layer data in Kafka into the Doris wide table by Stream Load, so that data with the same primary key in the Doris wide table is overwritten and updated;
a full data acquisition unit, configured to query the Doris wide table according to actual requirements to obtain the full data.
A computer device comprising a memory, a processor and a computer program stored in said memory and executable on said processor, said processor implementing the steps of any of said real-time data full-scale acquisition methods when executing said computer program.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any one of the real-time data full-scale acquisition methods.
The beneficial effects provided by the invention are as follows:
1. It mainly solves the problem of cross-database MySQL analysis, and the scheme is low in cost;
2. It generates a full-volume real-time wide table (a wide table over the full data) based on Flink + Doris; real-time statistics run directly on the wide table, which optimizes query speed and improves efficiency of use;
3. It solves, at minimal cost, the downstream data synchronization problem caused by modifying an original table during real-time statistics;
4. It solves the generation of the wide table from historical full data, based on Doris.
Drawings
FIG. 1 is a schematic flow chart of the real-time data full-scale acquisition method of the present invention;
FIG. 2 is a simplified example of creating a Doris wide table;
FIG. 3 shows the process of writing data to the wide table;
FIG. 4 is a schematic diagram of data update.
Detailed Description
To make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be further described with reference to the accompanying drawings.
First, the related terms are explained together as follows:
Flink is a top-level Apache open source project and a computing engine for real-time processing;
Flink CDC: the flink-cdc-connectors component developed by the Flink community, a source component that can read full data and incremental change data directly from databases such as MySQL and PostgreSQL. CDC (change data capture) monitors and captures database changes, records them completely in the order in which they occur, and writes them to message middleware for other services to subscribe to and consume;
MySQL binlog is a log file in binary format that records the changes MySQL makes to the database;
Kafka is a high-throughput distributed publish-subscribe messaging system;
Doris is an MPP OLAP system that provides high-performance analysis and report query functions on large data sets at low cost.
The invention provides a real-time data full-scale acquisition method; for the basic idea, refer to FIG. 1, which is a flow chart of the method of the present invention.
S101: Access real-time binary log (binlog) data: monitor and collect binlog data of the MySQL database by using the Flink CDC component;
It should be noted that the Flink CDC component monitors the binlog of the MySQL database and sinks the log data to the message middleware Kafka, after which an application is developed to consume and process the data. Alternatively, other existing commercial tools, such as the logail tool from Ali, can be used, followed by a program developed for consumption.
It is easy to understand that, in the present application, acquiring changed data in real time by monitoring the binlog file of the MySQL database does not affect database performance and, compared with the conventional approach of exporting data through queries, solves both the performance and the timeliness problems of data synchronization.
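As an illustration of what the consuming application might do with a captured change, the following Python sketch flattens one change event into a row. It assumes a Debezium-style JSON envelope with `before`, `after`, `op` and `source` fields; that envelope, the `flatten_change_event` function and the `__op` and `__table` column names are assumptions for illustration and are not specified in the patent text.

```python
import json


def flatten_change_event(raw: str) -> dict:
    """Turn one change event into a flat row plus operation metadata."""
    event = json.loads(raw)
    op = event["op"]  # assumed codes: c = insert, u = update, d = delete
    # A delete carries the row image in "before"; other operations in "after".
    image = event["before"] if op == "d" else event["after"]
    row = dict(image)
    row["__op"] = op
    row["__table"] = event["source"]["table"]
    return row


sample = json.dumps({
    "before": None,
    "after": {"order_id": 1001, "status": "PAID"},
    "source": {"db": "shop", "table": "orders"},
    "op": "u",
})
row = flatten_change_event(sample)
```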
S102: Write the binlog data to Kafka: assign a global sort id to the collected binlog data and, when writing to Kafka, use the primary key of the table as the partitioning strategy, with the primary key value hash-partitioned so that records of the same table with the same primary key fall in the same partition;
It should be noted that the binlog data of the MySQL database is written into Kafka in real time. In this process the binlog is ordered at second-level granularity, but if the data within the same second is out of order, the final data will be wrong.
Therefore, when the binlog is collected, each collected record is assigned a global sort id. Further, when writing to Kafka, the primary key of the table is used as the partitioning strategy and the primary key value is hash-partitioned, which ensures that records of the same table with the same primary key are in the same partition. This ultimately keeps the data ordered, including when Doris data is written and updated.
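The partitioning rule can be sketched in Python as follows. This is a simplified stand-in rather than Kafka's own partitioner: `zlib.crc32` is used only because it is deterministic across processes, the in-process counter stands in for whatever global id generator a deployment would use, and the function names and the partition count of 8 are hypothetical.

```python
import itertools
import zlib

# Process-local counter standing in for a global sort id generator (assumption).
_sort_ids = itertools.count(1)


def assign_sort_id(event: dict) -> dict:
    """Attach a monotonically increasing sort id to a collected binlog event."""
    return {**event, "sort_id": next(_sort_ids)}


def choose_partition(table: str, primary_key: str, num_partitions: int) -> int:
    """Hash table name plus primary key so the same row always maps to one partition."""
    key = f"{table}:{primary_key}".encode("utf-8")
    return zlib.crc32(key) % num_partitions


events = [
    assign_sort_id({"table": "orders", "pk": "1001", "status": status})
    for status in ("CREATED", "PAID", "SHIPPED")
]
partitions = {choose_partition(e["table"], e["pk"], 8) for e in events}
sort_ids = [e["sort_id"] for e in events]
```

Because all three changes to row 1001 land in one partition, a consumer reads them in the order they were produced, and the sort id gives a tie-breaker for changes that share the same second.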
It should be further noted that the data in Kafka is divided into ODS-layer, DWD-layer and DWS-layer data, and that the data in Kafka is processed by ETL and parsed into a standard format.
S103: Create the Doris wide table: create a Doris wide table in which every field other than the primary key is declared as REPLACE_IF_NOT_NULL;
Referring to FIG. 2, FIG. 2 is a simplified example of creating a Doris wide table;
In FIG. 2, a wide table named "test" is created; it contains the keys key1 and key2, and all fields other than the keys are REPLACE_IF_NOT_NULL.
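FIG. 2 is not reproduced in this text, so the following Python sketch only approximates what such a table definition might look like. It builds a Doris aggregate-model CREATE TABLE statement as a string; the column types, the bucket count and the `build_wide_table_ddl` helper are assumptions, while the table name "test", the keys key1 and key2, and the REPLACE_IF_NOT_NULL aggregation come from the description.

```python
def build_wide_table_ddl(table, key_cols, value_cols, buckets=10):
    """Render a CREATE TABLE statement for a Doris aggregate-model wide table."""
    # Key columns carry no aggregation type; value columns are nullable and
    # keep their previous value whenever a NULL is loaded.
    cols = [f"  {name} {ctype} NOT NULL" for name, ctype in key_cols]
    cols += [f"  {name} {ctype} REPLACE_IF_NOT_NULL NULL" for name, ctype in value_cols]
    keys = ", ".join(name for name, _ in key_cols)
    return "\n".join([
        f"CREATE TABLE {table} (",
        ",\n".join(cols),
        ")",
        f"AGGREGATE KEY({keys})",
        f"DISTRIBUTED BY HASH({keys}) BUCKETS {buckets};",
    ])


ddl = build_wide_table_ddl(
    "test",
    key_cols=[("key1", "BIGINT"), ("key2", "BIGINT")],  # types are assumptions
    value_cols=[(f"value{i}", "VARCHAR(64)") for i in range(1, 5)],
)
```

A real deployment would also need cluster-specific settings such as the replication properties, which are left out here.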
In the present application, with this processing, after modifications and deletions from the data warehouse are ingested, the data of the corresponding primary key is overwritten and updated.
S104: Write data to the Doris wide table: create a plurality of Flink stream tasks that write multiple streams of DWD-layer or DWS-layer data in Kafka into the Doris wide table by Stream Load, so that data with the same primary key in the Doris wide table is overwritten and updated;
It should be noted that, when multiple Flink data streams are written into the Doris wide table at the same time, multiple Flink tasks are created, the multiple streams of DWD-layer or DWS-layer data are written into Doris by Stream Load, and data with the same primary key is overwritten and updated.
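To make the Stream Load step more concrete, the following Python sketch assembles the pieces of one Stream Load HTTP request without sending it. The host name, database name, credentials and the `build_stream_load_request` helper are hypothetical; the endpoint path and the `label`, `format`, `strip_outer_array` and `columns` headers follow the Stream Load interface as commonly documented for Doris, and port 8030 is assumed to be the frontend HTTP port.

```python
import base64
import json
import uuid


def build_stream_load_request(fe_host, db, table, rows, user, password, columns):
    """Return (url, headers, body) for one Stream Load batch; nothing is sent."""
    url = f"http://{fe_host}:8030/api/{db}/{table}/_stream_load"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"Basic {token}",
        "Expect": "100-continue",
        "label": f"{table}-{uuid.uuid4().hex}",  # unique label per batch
        "format": "json",
        "strip_outer_array": "true",
        # Each stream lists only the columns it carries; the other value
        # columns of the wide table are left for the other streams to fill.
        "columns": ",".join(columns),
    }
    body = json.dumps(rows).encode("utf-8")
    return url, headers, body


rows = [{"key1": 1, "key2": 2, "value1": "a", "value2": "b"}]
url, headers, body = build_stream_load_request(
    "fe.example.internal", "demo", "test", rows,
    user="loader", password="secret",
    columns=["key1", "key2", "value1", "value2"],
)
```

Each Flink task would issue requests of this shape for its own stream, with the request sent as an HTTP PUT.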
Referring to FIG. 3, FIG. 3 illustrates the process of writing data into the wide table.
For example, in one embodiment there are three data streams in total:
the first stream: key1, key2, value1, value2
the second stream: key1, key2, value3, value4
the third stream: key2, value5, value6;
the three streams are written into the wide table at the same time.
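The effect of writing several streams into one REPLACE_IF_NOT_NULL row can be modeled with a small Python sketch. It is an in-memory imitation of the merge semantics, not Doris itself; the `upsert` function, the column list and the sample values are hypothetical, and only the first two streams (which carry both keys) are shown.

```python
VALUE_COLUMNS = ["value1", "value2", "value3", "value4"]


def upsert(table: dict, key: tuple, values: dict) -> None:
    """Merge one record into the row for key, ignoring NULL (None) values."""
    row = table.setdefault(key, dict.fromkeys(VALUE_COLUMNS))
    for col, val in values.items():
        if val is not None:
            row[col] = val  # a non-NULL value replaces the stored one


wide = {}
# First stream fills value1 and value2.
upsert(wide, ("k1", "k2"), {"value1": "a", "value2": "b"})
# Second stream fills value3; value4 is NULL and leaves the column untouched.
upsert(wide, ("k1", "k2"), {"value3": "c", "value4": None})
# A later update from the second stream overwrites value3 only.
upsert(wide, ("k1", "k2"), {"value1": None, "value3": "c2"})
```

The row for the key ends up holding the latest non-NULL value of every column, regardless of which stream delivered it.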
S105: Acquire the full data: query the Doris wide table according to actual requirements to obtain the full data.
In step S105, when the Doris wide table is queried, a cross-dimension association query is performed by using the fields that associate it with other tables as query conditions.
It should be noted that, when the Doris wide table is queried, it can be treated as one large table and joined with other tables (cross-dimension association query), and the Doris wide table can also be optimized for a single label, similar to a rollup, to improve query speed.
After data has been written to the Doris wide table, a new field can be added to the Doris wide table at any time in an asynchronous manner. The specific process is as follows:
S201: Create a new field d in the Doris wide table; once field d has been created, it starts receiving the incremental data of the original task a.
S202: Create a new offline task b, which imports into field d the full data that existed in the Doris wide table before field d was created;
S203: During the import, field d receives incremental data and historical full data at the same time. The two arrive out of order, so incremental data can keep being overwritten by historical full data of the same primary key until the import finishes;
S204: Create a new temporary Flink task c that consumes the data; the temporary task c and the original task a write data into field d at the same time. After the temporary task c has run for a period of time T, it is closed, and the Doris wide table has completed the field update.
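One way to read steps S201 to S204 is sketched below in Python. The sketch assumes that the temporary task c re-consumes the increments that arrived while the historical import was running, which is an interpretation rather than something the text states outright; the `write_field_d` function and the sample values are hypothetical.

```python
def write_field_d(table: dict, pk: str, value) -> None:
    """Overwrite field d for a primary key unless the incoming value is NULL."""
    if value is not None:
        table.setdefault(pk, {})["d"] = value


table = {}
history = [("pk1", "old-1"), ("pk2", "old-2")]  # offline task b (S202)
increments = [("pk1", "new-1")]                 # original task a (S201)

# S203: arrival order is not guaranteed; here the historical import lands last
# and overwrites the newer incremental value for pk1.
for pk, value in increments + history:
    write_field_d(table, pk, value)
stale_value = table["pk1"]["d"]

# S204: temporary task c re-consumes the same increments and repairs the row.
for pk, value in increments:
    write_field_d(table, pk, value)
```

After the replay, rows touched during the import hold their incremental values and untouched rows keep their historical values, at which point the temporary task can be stopped.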
In brief: if the historical values of a field in the wide table need to be modified, a new field is added to the Doris table and the change is executed asynchronously.
A new field is added, and the historical data is first imported into the newly added Doris field. A temporary task is then created to consume the incremental data and, together with the original task, write to both the new and the old Doris fields. After it has run for a period of time (when the configured time is reached, a script is called to kill the task), the temporary task is closed. The original incremental task is then modified to include the newly written field, restarted, and resumes consuming the data source, which keeps data updates synchronized.
For better explanation, please refer to FIG. 4, which is a schematic diagram of data update;
As before, in one embodiment there are three data streams in total:
the first stream: key1, key2, value1, value2
the second stream: key1, key2, value3, value4
the third stream: key2, value5, value6;
Because the third stream has only key2, its data can be updated after the missing key is supplemented through a dimension table (cross-dimension association query); see the upper right part of FIG. 4.
That is, the third stream is associated with the dimension table to complete its key, becoming:
key1, key2, value5, value6. The process used here is also the one described in steps S201 to S204. Through this process, the key supplement for the third stream, which carries only key2, and the overwrite update of its data are finally achieved.
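The key completion for the third stream can be sketched in Python as a lookup against a dimension table. The `complete_key` function, the in-memory dictionary standing in for the dimension table and the sample values are hypothetical; in practice the lookup would be a join against a real dimension table.

```python
def complete_key(record: dict, dim_key1_by_key2: dict):
    """Look up key1 for a record that carries only key2; None if unknown."""
    key1 = dim_key1_by_key2.get(record["key2"])
    if key1 is None:
        return None  # cannot be written to the wide table yet
    return {"key1": key1, **record}


dimension = {"k2-a": "k1-a"}  # maps key2 to key1
third_stream_record = {"key2": "k2-a", "value5": 5, "value6": 6}
completed = complete_key(third_stream_record, dimension)
missing = complete_key({"key2": "k2-z", "value5": 0, "value6": 0}, dimension)
```

Once both keys are present, the record can be written to the wide table like records from the other two streams.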
A real-time data full-scale acquisition apparatus, the apparatus comprising:
a binlog data acquisition unit, configured to access real-time binary log (binlog) data by monitoring and collecting binlog data of the MySQL database with the Flink CDC component;
a data partitioning unit, configured to assign a global sort id to the collected binlog data and, when writing to Kafka, to use the primary key of the table as the partitioning strategy, with the primary key value hash-partitioned so that records of the same table with the same primary key fall in the same partition;
a Doris wide table creation unit, configured to create a Doris wide table in which every field other than the primary key is declared as REPLACE_IF_NOT_NULL;
a Doris wide table data filling unit, configured to create a plurality of Flink stream tasks that write multiple streams of DWD-layer or DWS-layer data in Kafka into the Doris wide table by Stream Load, so that data with the same primary key in the Doris wide table is overwritten and updated;
a full data acquisition unit, configured to query the Doris wide table according to actual requirements to obtain the full data.
A computer device comprising a memory, a processor and a computer program stored in said memory and executable on said processor, said processor implementing the steps of any of said real-time data full-scale acquisition methods when executing said computer program.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any one of the real-time data full-scale acquisition methods.
The beneficial effects of implementing the invention are as follows:
1. It mainly solves the problem of cross-database MySQL analysis, and the scheme is low in cost;
2. It generates a full-volume real-time wide table (a wide table over the full data) based on Flink + Doris; real-time statistics run directly on the wide table, which optimizes query speed and improves efficiency of use;
3. It solves, at minimal cost, the downstream data synchronization problem caused by modifying an original table during real-time statistics;
4. It solves the generation of the wide table from historical full data, based on Doris.
The above embodiments of the invention and the features within them may be combined with one another where no conflict arises.
The above description covers only the preferred embodiments of the present invention and is not to be construed as limiting it; any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present invention shall fall within its scope of protection.

Claims (6)

1. A real-time data full-scale acquisition method, characterized in that the method comprises the following steps:
S101: Access real-time binary log (binlog) data: monitor and collect binlog data of the MySQL database by using the Flink CDC component;
S102: Write the binlog data to Kafka: assign a global sort id to the collected binlog data and, when writing to Kafka, use the primary key of the table as the partitioning strategy, with the primary key value hash-partitioned so that records of the same table with the same primary key fall in the same partition;
S103: Create the Doris wide table: create a Doris wide table in which every field other than the primary key is declared as REPLACE_IF_NOT_NULL;
S104: Write data to the Doris wide table: create a plurality of Flink stream tasks that write multiple streams of DWD-layer or DWS-layer data in Kafka into the Doris wide table by Stream Load, so that data with the same primary key in the Doris wide table is overwritten and updated;
S105: Acquire the full data: query the Doris wide table according to actual requirements to obtain the full data.
2. The real-time data full-scale acquisition method according to claim 1, characterized in that: in step S105, when the Doris wide table is queried, a cross-dimension association query is performed by using the fields that associate it with other tables as query conditions.
3. The real-time data full-scale acquisition method according to claim 1, characterized in that: after data has been written to the Doris wide table, a new field can be added to the Doris wide table at any time in an asynchronous manner, the specific process being as follows:
S201: Create a new field d in the Doris wide table; once field d has been created, it starts receiving the incremental data of the original task a.
S202: Create a new offline task b, which imports into field d the full data that existed in the Doris wide table before field d was created;
S203: During the import, field d receives incremental data and historical full data at the same time. The two arrive out of order, so incremental data can keep being overwritten by historical full data of the same primary key until the import finishes;
S204: Create a new temporary Flink task c that consumes the data; the temporary task c and the original task a write data into field d at the same time. After the temporary task c has run for a period of time T, it is closed, and the Doris wide table has completed the field update.
4. A real-time data full-scale acquisition apparatus, the apparatus comprising:
a binlog data acquisition unit, configured to access real-time binary log (binlog) data by monitoring and collecting binlog data of the MySQL database with the Flink CDC component;
a data partitioning unit, configured to assign a global sort id to the collected binlog data and, when writing to Kafka, to use the primary key of the table as the partitioning strategy, with the primary key value hash-partitioned so that records of the same table with the same primary key fall in the same partition;
a Doris wide table creation unit, configured to create a Doris wide table in which every field other than the primary key is declared as REPLACE_IF_NOT_NULL;
a Doris wide table data filling unit, configured to create a plurality of Flink stream tasks that write multiple streams of DWD-layer or DWS-layer data in Kafka into the Doris wide table by Stream Load, so that data with the same primary key in the Doris wide table is overwritten and updated;
a full data acquisition unit, configured to query the Doris wide table according to actual requirements to obtain the full data.
5. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the real-time data full-scale acquisition method according to any one of claims 1 to 3 when executing the computer program.
6. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the real-time data full-scale acquisition method according to any one of claims 1 to 3.
CN202210128596.8A 2022-02-11 2022-02-11 Real-time data full-scale acquisition method and device and computer equipment Pending CN114579614A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210128596.8A CN114579614A (en) 2022-02-11 2022-02-11 Real-time data full-scale acquisition method and device and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210128596.8A CN114579614A (en) 2022-02-11 2022-02-11 Real-time data full-scale acquisition method and device and computer equipment

Publications (1)

Publication Number Publication Date
CN114579614A true CN114579614A (en) 2022-06-03

Family

ID=81773769

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210128596.8A Pending CN114579614A (en) 2022-02-11 2022-02-11 Real-time data full-scale acquisition method and device and computer equipment

Country Status (1)

Country Link
CN (1) CN114579614A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115062028A (en) * 2022-07-27 2022-09-16 中建电子商务有限责任公司 Method for multi-table join query in OLTP field
CN115062028B (en) * 2022-07-27 2023-01-06 中建电子商务有限责任公司 Method for multi-table join query in OLTP field
CN115033646A (en) * 2022-08-11 2022-09-09 深圳联友科技有限公司 Method for constructing real-time warehouse system based on Flink and Doris
CN115033646B (en) * 2022-08-11 2023-01-13 深圳联友科技有限公司 Method for constructing real-time warehouse system based on Flink and Doris
CN115470217A (en) * 2022-11-14 2022-12-13 云筑信息科技(成都)有限公司 Method for solving change response problem of data bin model in real time
CN116431654A (en) * 2023-06-08 2023-07-14 中新宽维传媒科技有限公司 Data storage method, device, medium and computing equipment based on integration of lake and warehouse
CN116431654B (en) * 2023-06-08 2023-09-08 中新宽维传媒科技有限公司 Data storage method, device, medium and computing equipment based on integration of lake and warehouse
CN117331513A (en) * 2023-12-01 2024-01-02 蒲惠智造科技股份有限公司 Data reduction method and system based on Hadoop architecture
CN117331513B (en) * 2023-12-01 2024-03-19 蒲惠智造科技股份有限公司 Data reduction method and system based on Hadoop architecture
CN117792960A (en) * 2024-02-23 2024-03-29 中国电子科技集团公司第三十研究所 Historical flow statistics method and device based on domestic multi-core processor
CN117792960B (en) * 2024-02-23 2024-04-30 中国电子科技集团公司第三十研究所 Historical flow statistics method and device based on domestic multi-core processor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination