WO2023082681A1 - Data processing method and apparatus based on batch-stream integration, computer device and medium - Google Patents

Data processing method and apparatus based on batch-stream integration, computer device and medium

Info

Publication number
WO2023082681A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
processing
layer
processed
module
Prior art date
Application number
PCT/CN2022/105078
Other languages
English (en)
Chinese (zh)
Inventor
罗静
王博一
王晓
霍星志
郭宇鹏
毛少将
Original Assignee
通号通信信息集团有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 通号通信信息集团有限公司
Publication of WO2023082681A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24568 Data stream processing; Continuous queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases

Definitions

  • the data to be processed is processed layer by layer to obtain the first data; in each processing layer, the data input to that layer is processed to obtain processed data, which is real-time data; based on the Flink stream, the processed real-time data is stored in the Hive module, and the processed data is input to the next processing layer; the first data is the processed data obtained by the last processing layer in the processing chain;
  • the first processing module is configured to process the data to be processed layer by layer according to the processing link to obtain first data
  • the offline data is obtained from the Hive module, and the offline data is used to correct the wrong data.
  • the real-time data is corrected. Since the data is passed and processed layer by layer, a change in the processing result of one processing layer changes the processing results of the subsequent processing layers, so the corrected data must be input to the next processing layer, which then re-processes it.
  • the data processing device can connect to visual display components (such as Tableau) and run custom queries over the full data set from a web client, thereby supporting the visual display of front-end data.
  • FIG. 3 is a schematic diagram of the first structure of the data processing device provided by the embodiment of the present disclosure.
  • the data processing device includes an acquisition module 101, a first processing module 102 and a second processing module 103; the second processing module 103 forms a data application layer; the first processing module 102 includes a plurality of processing layers, which together form a processing link; and each processing layer includes a first processing unit 1021 and a second processing unit 1022.
  • the acquiring module 101 is configured to acquire data to be processed, and the data to be processed includes real-time data.
  • the ODS layer, DWD layer and DWS layer are connected through the Kafka module, so that data is exchanged and passed layer by layer.
  • the second processing module 203 is located at the ADS layer and may be an OLAP module.
  • the query module 204 is respectively connected to the Hive module of each processing layer and the OLAP module of the ADS layer, so as to realize cross-source query.
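
The layered flow described in the excerpts above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the three layer functions, the in-memory `archive` dictionary (a stand-in for the per-layer Hive modules) and `cross_source_query` are hypothetical names, and the patent text does not specify the actual transformations or any Flink, Kafka or Hive API calls.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, int]
Layer = Callable[[List[Record]], List[Record]]


# Hypothetical layer logic; the source does not define these transformations.
def ods(records: List[Record]) -> List[Record]:
    # ODS layer: keep only raw records that carry a "value" field.
    return [r for r in records if "value" in r]


def dwd(records: List[Record]) -> List[Record]:
    # DWD layer: apply an assumed unit conversion (multiply by 10).
    return [{**r, "value": r["value"] * 10} for r in records]


def dws(records: List[Record]) -> List[Record]:
    # DWS layer: aggregate the detail records into one summary record.
    return [{"total": sum(r["value"] for r in records)}]


CHAIN: List[Tuple[str, Layer]] = [("ODS", ods), ("DWD", dwd), ("DWS", dws)]


def run_chain(pending: List[Record], archive: Dict[str, List[Record]]) -> List[Record]:
    """Process data layer by layer, archiving each layer's output."""
    data = pending
    for name, layer in CHAIN:
        data = layer(data)
        archive[name] = list(data)  # stand-in for storing into that layer's Hive module
    return data  # output of the last layer, i.e. the "first data"


def cross_source_query(archive, ads_result, layer_names):
    """Collect results from selected layer archives plus the application layer."""
    result = {name: archive[name] for name in layer_names}
    result["ADS"] = ads_result  # stand-in for the OLAP module of the ADS layer
    return result
```

Calling `run_chain([{"value": 1}, {"value": 2}], {})` returns the summary record produced by the last layer, while the dictionary passed as `archive` ends up holding one snapshot per layer that `cross_source_query` can read alongside the application-layer result.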

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Debugging And Monitoring (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure relates to a data processing method based on batch-stream integration, the method comprising: obtaining data to be processed; processing the data to be processed layer by layer according to a processing link to obtain first data, wherein, in each processing layer, the data input to the current processing layer is processed to obtain processed real-time data, the processed real-time data is stored in a Hive module based on a Flink stream, and the processed data is then input to the next processing layer; processing the first data in a data application layer to obtain second data; and, in response to detecting an error in the second data, correcting, according to the offline data of the current processing layer, the erroneous data in the processing layer where the data error occurred, and then inputting the corrected data to the next processing layer so that the next processing layer processes the input data. The present disclosure also relates to a data processing apparatus, a computer device and a medium.
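
The correction step at the end of the abstract (correct the faulty layer from its offline data, then feed the corrected data to the following layers) might look roughly like the sketch below. It is a minimal illustration, not the patented implementation: the `LAYERS` list, `run_from` and `correct_and_replay` are hypothetical names, the offline data is passed in as a plain list rather than read from Hive, and error detection in the second data is assumed to have already identified the faulty layer.

```python
from typing import Callable, Dict, List

Layer = Callable[[List[int]], List[int]]

# Hypothetical three-layer processing chain (the source does not define the logic).
LAYERS: List[Layer] = [
    lambda xs: [x for x in xs if x >= 0],  # layer 0: drop invalid readings
    lambda xs: [x * 2 for x in xs],        # layer 1: transform each value
    lambda xs: [sum(xs)],                  # layer 2: aggregate into the "first data"
]


def run_from(start: int, data: List[int], outputs: Dict[int, List[int]]) -> List[int]:
    """Run layers start, start+1, ... on data, recording each layer's output."""
    for i in range(start, len(LAYERS)):
        data = LAYERS[i](data)
        outputs[i] = data
    return data


def correct_and_replay(faulty_layer: int, offline: List[int],
                       outputs: Dict[int, List[int]]) -> List[int]:
    """Replace the faulty layer's real-time output with its offline data,
    then re-run only the downstream layers on the corrected data."""
    outputs[faulty_layer] = list(offline)
    return run_from(faulty_layer + 1, outputs[faulty_layer], outputs)
```

Only the layers after the faulty one are re-run, which reflects the point that a change in one layer's result propagates to every subsequent layer while upstream results stay valid.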
PCT/CN2022/105078 2021-11-09 2022-07-12 Data processing method and apparatus based on batch-stream integration, computer device and medium WO2023082681A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111318823.5 2021-11-09
CN202111318823.5A CN113779094B (zh) 2021-11-09 2021-11-09 Data processing method and apparatus based on batch-stream integration, computer device and medium

Publications (1)

Publication Number Publication Date
WO2023082681A1 (fr) 2023-05-19

Family

ID=78956925

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/105078 WO2023082681A1 (fr) 2021-11-09 2022-07-12 Data processing method and apparatus based on batch-stream integration, computer device and medium

Country Status (2)

Country Link
CN (1) CN113779094B (fr)
WO (1) WO2023082681A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
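CN117724706A (zh) * 2024-02-06 2024-03-19 湖南盛鼎科技发展有限责任公司 Method and system for batch-stream integrated, process-based real-time processing of massive data on heterogeneous platforms
CN118051554A (zh) * 2024-03-05 2024-05-17 合肥喆塔科技有限公司 Method, device and medium for building a real-time data warehouse based on FlinkSQL and Kudu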

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
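CN113779094B (zh) * 2021-11-09 2022-03-22 通号通信信息集团有限公司 Data processing method and apparatus based on batch-stream integration, computer device and medium
CN114416845A (zh) * 2022-01-19 2022-04-29 平安好医投资管理有限公司 Big data testing method and apparatus, electronic device and storage medium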

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
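CN103473480A (zh) * 2013-10-08 2013-12-25 武汉大学 Online monitoring data correction method based on an improved universal-gravitation support vector machine
US20150341231A1 (en) * 2014-05-21 2015-11-26 Asif Khan Distributed system architecture using event stream processing
CN112000636A (zh) * 2020-08-31 2020-11-27 民生科技有限责任公司 User behavior statistical analysis method based on Flink stream processing
CN113515363A (zh) * 2021-08-10 2021-10-19 中国人民解放军61646部队 Dynamic scheduling platform of a multi-level data processing system for high concurrency of heterogeneous tasks
CN113779094A (zh) * 2021-11-09 2021-12-10 通号通信信息集团有限公司 Data processing method and apparatus based on batch-stream integration, computer device and medium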

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10936585B1 (en) * 2018-10-31 2021-03-02 Splunk Inc. Unified data processing across streaming and indexed data sets
US11526539B2 (en) * 2019-01-31 2022-12-13 Salesforce, Inc. Temporary reservations in non-relational datastores
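CN112507029B (zh) * 2020-12-18 2022-11-04 上海哔哩哔哩科技有限公司 Data processing system and real-time data processing method
CN113220521A (zh) * 2021-02-04 2021-08-06 北京易车互联信息技术有限公司 Real-time monitoring system
CN112905595A (zh) * 2021-03-05 2021-06-04 腾讯科技(深圳)有限公司 Data query method and apparatus, and computer-readable storage medium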

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
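CN117724706A (zh) * 2024-02-06 2024-03-19 湖南盛鼎科技发展有限责任公司 Method and system for batch-stream integrated, process-based real-time processing of massive data on heterogeneous platforms
CN117724706B (zh) * 2024-02-06 2024-05-03 湖南盛鼎科技发展有限责任公司 Method and system for batch-stream integrated, process-based real-time processing of massive data on heterogeneous platforms
CN118051554A (zh) * 2024-03-05 2024-05-17 合肥喆塔科技有限公司 Method, device and medium for building a real-time data warehouse based on FlinkSQL and Kudu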

Also Published As

Publication number Publication date
CN113779094B (zh) 2022-03-22
CN113779094A (zh) 2021-12-10

Similar Documents

Publication Publication Date Title
WO2023082681A1 (fr) Data processing method and apparatus based on batch-stream integration, computer device and medium
US11422982B2 (en) Scaling stateful clusters while maintaining access
US11354314B2 (en) Method for connecting a relational data store's meta data with hadoop
US11836533B2 (en) Automated reconfiguration of real time data stream processing
US9418113B2 (en) Value based windows on relations in continuous data streams
US8321450B2 (en) Standardized database connectivity support for an event processing server in an embedded context
US8387076B2 (en) Standardized database connectivity support for an event processing server
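CN112507029B (zh) Data processing system and real-time data processing method
CN109656963B (zh) Metadata acquisition method, apparatus, device and computer-readable storage medium
CN106649630A (zh) Data query method and apparatus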
US20230144100A1 (en) Method and apparatus for managing and controlling resource, device and storage medium
CN106687955B (zh) Simplifying invocation of import procedures to transfer data from data sources to data targets
EP2883172A1 (fr) High-performance real-time relational database system and method for using same
US10394805B2 (en) Database management for mobile devices
CN110019267A (zh) Metadata updating method, apparatus and system, electronic device and storage medium
US11645179B2 (en) Method and apparatus of monitoring interface performance of distributed application, device and storage medium
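CN107346270B (zh) Method and system for cardinality estimation based on real-time computation
CN108629016B (zh) Big-data-oriented database control system supporting real-time stream computing, and computer program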
US10489179B1 (en) Virtual machine instance data aggregation based on work definition metadata
WO2017157111A1 (fr) Method, device and system for preventing loss of memory data
US8510426B2 (en) Communication and coordination between web services in a cloud-based computing environment
CN111125161A (zh) Real-time data processing method, apparatus, device and storage medium
US20220277009A1 (en) Processing database queries based on external tables
US11757959B2 (en) Dynamic data stream processing for Apache Kafka using GraphQL
CN113612832A (zh) Streaming data distribution method and system

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22891490

Country of ref document: EP

Kind code of ref document: A1