CN115168472A - Real-time report generation method and system based on Flink - Google Patents


Info

Publication number
CN115168472A
Authority
CN
China
Legal status
Pending
Application number
CN202210873540.5A
Other languages
Chinese (zh)
Inventor
解培佩
Current Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Application filed by Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN202210873540.5A
Publication of CN115168472A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP


Abstract

The application belongs to the technical field of real-time big data computing and relates to a Flink-based method for generating real-time reports, which comprises the following steps: when a Flink job generates a real-time data stream, transmitting the real-time data stream to a Kafka service system; writing the real-time data stream received by the Kafka service system into the disk structure of the Kafka service system; extracting, through ETL, the real-time data stream in the disk structure into a temporary intermediate layer for conversion processing to generate real-time list data; sending the real-time list data to a Druid database; when a data query operation is detected, identifying the type of the data query operation, generating a corresponding operation instruction according to the identified type, and invoking the Druid engine of the Druid database according to the operation instruction to generate a real-time report; and pushing the real-time report to a front-end platform. The application also provides a Flink-based real-time report generation system, a computer device, and a storage medium.

Description

Real-time report generation method and system based on Flink
Technical Field
The application relates to the technical field of real-time big data computing, and in particular to a Flink-based method and system for generating real-time reports, a computer device, and a storage medium.
Background
Traditional reporting projects combine a reporting tool, a data warehouse, and ETL (extract, transform, load), so the data takes a long time to be produced. If the data is instead read directly from the production system, it places heavy pressure on the production database, creates a performance bottleneck, and directly affects the business. As clients pay increasing attention to how current the source data is, the timeliness of real-time reports becomes more and more important.
Disclosure of Invention
The embodiments of the application aim to provide a Flink-based method and system for generating real-time reports, a computer device, and a storage medium, so as to solve the technical problem that real-time reports take a long time to generate.
To solve the above technical problem, an embodiment of the present application provides a Flink-based method for generating real-time reports, which adopts the following technical solution and comprises the following steps:
when a Flink job generates a real-time data stream, transmitting the real-time data stream to a Kafka service system;
writing the real-time data stream received by the Kafka service system into the disk structure of the Kafka service system;
extracting, through ETL, the real-time data stream in the disk structure into a temporary intermediate layer for conversion processing, and generating real-time list data;
sending the real-time list data to a Druid database;
when a data query operation is detected, identifying the type of the data query operation, generating a corresponding operation instruction according to the identified type, and invoking the Druid engine of the Druid database according to the operation instruction to generate a real-time report;
and pushing the real-time report to a front-end platform.
Further, after the step of extracting, through ETL, the real-time data stream in the disk structure into the temporary intermediate layer for conversion processing and generating the real-time list data, the method further includes:
transmitting the real-time list data to a Hive data warehouse through a Flume data pipeline for storage and backup;
when a fault occurs while the Flink job is running, querying, through a Hive tool, the differences between the real-time list data backed up in the Hive data warehouse and the real-time list data generated by the Kafka service system, to obtain target list data;
and backfilling the target list data into the Druid database.
Further, the step of transmitting the real-time list data to Flume for backup includes:
receiving the real-time list data and passing the received real-time list data to one or more channels in Flume;
and, after transmission through Flume, storing the real-time list data in the Hive data warehouse.
Further, the step of extracting, through ETL, the real-time data stream in the disk structure into the temporary intermediate layer for conversion processing and generating the real-time list data includes:
extracting, through ETL, the real-time data stream received by the Kafka service system into the temporary intermediate layer, and generating real-time list data in a standard format after cleaning, converting, and integrating the real-time data in the temporary intermediate layer;
and transmitting the real-time list data to the Kafka service system.
Further, the step of pushing the real-time report to the front-end platform includes:
setting an index (metric) column, a time column, and a dimension column of the real-time report;
and pushing the real-time report to the front-end platform using a logical combination of the index column, the time column, or the dimension column.
Further, after the step of pushing the real-time report to the front-end platform, the method further includes:
when a dynamic action such as dragging or pulling is performed on a preset identifier of the front-end platform, obtaining the index column, the time column, or the dimension column of the real-time report.
Further, after the step of pushing the real-time report to the front-end platform, the method further includes:
when the real-time content that the front-end platform generates from the real-time report does not match the target content corresponding to the data query operation, obtaining, through a Hive tool, the pre-stored list data stored in the Hive data warehouse, and locating a target fault position in the Druid database by comparing the pre-stored list data with the real-time list data obtained from the Druid database;
extracting the target fault data at the target fault position in the Druid database, and the target list data at the corresponding position in the Hive data warehouse;
replacing the target fault data with the target list data, and writing the corrected data back to the target fault position in the Druid database;
and when the real-time content that the front-end platform generates from the real-time report matches the target content corresponding to the data query operation, updating the pre-stored list data stored in the Hive data warehouse.
To solve the above technical problem, an embodiment of the present application further provides a Flink-based real-time report generation system, which includes:
a transmission module, configured to transmit a real-time data stream to the Kafka service system when a Flink job generates the real-time data stream;
a processing module, configured to write the real-time data stream received by the Kafka service system into the disk structure of the Kafka service system, and to extract, through ETL, the real-time data stream in the disk structure into a temporary intermediate layer for conversion processing to generate real-time list data;
a sending module, configured to send the real-time list data to the Druid database;
an extraction module, configured to identify the type of a data query operation when the data query operation is detected, generate a corresponding operation instruction according to the identified type, and invoke the Druid engine of the Druid database according to the operation instruction to generate a real-time report;
and a pushing module, configured to push the real-time report to the front-end platform.
To solve the above technical problem, an embodiment of the present application further provides a computer device comprising a memory and a processor, wherein the memory stores computer-readable instructions and the processor, when executing the computer-readable instructions, implements the steps of the Flink-based real-time report generation method described above.
To solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the steps of the Flink-based real-time report generation method described above.
Compared with the prior art, the embodiments of the application mainly have the following beneficial effects: when a Flink job generates a real-time data stream, the real-time data stream is transmitted to the Kafka service system; the real-time data stream received by the Kafka service system is written into its disk structure; the real-time data stream in the disk structure is extracted through ETL into a temporary intermediate layer for conversion processing to generate real-time list data; the real-time list data is sent to a Druid database; when a data query operation is detected, its type is identified, a corresponding operation instruction is generated according to that type, and the Druid engine of the Druid database is invoked according to the operation instruction to generate a real-time report; and the real-time report is pushed to a front-end platform. This achieves end-to-end (full-link) generation of real-time reports, improves the fault tolerance of the data, speeds up real-time report generation from the perspective of the whole data domain, and improves the timeliness and accuracy of the report data.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a Flink-based real-time report generation method;
FIG. 3 is a schematic diagram of the structure of an embodiment of a real-time report generation system based on Flink;
FIG. 4 is a schematic block diagram of one embodiment of a computer device according to the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that the method for generating a real-time report based on Flink provided in the embodiment of the present application is generally executed by a server, and accordingly, the system for generating a real-time report based on Flink is generally disposed in the server.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continuing reference to FIG. 2, a flowchart of one embodiment of a Flink-based real-time report generation method according to the present application is shown. The method for generating the real-time report based on the Flink comprises the following steps:
Step S201, when a Flink job generates a real-time data stream, transmitting the real-time data stream to the Kafka service system.
It should be noted that Flink (an open-source stream processing framework) is, at its core, a distributed streaming dataflow engine written in Java and Scala. Flink executes arbitrary dataflow programs in a data-parallel and pipelined manner. In such a stream processing system, the data (state) used by the computations is stored locally, for example in memory or on a local disk.
In this embodiment, Flink state works as follows: each operator subtask maintains the state belonging to that subtask, and one subtask cannot access the state of another. For example, when the number of parallel instances (subtasks) of an operator changes, the application has to be stopped and restarted with the target operator subtasks, and the state data held by the original operator subtasks is redistributed to the target operator subtasks.
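The rescaling behaviour described above can be illustrated with a simplified Python sketch of key-group based redistribution: keys are hashed into a fixed number of key groups, and each subtask owns a contiguous range of groups, so changing the parallelism only moves whole groups between subtasks. This is an illustration under stated assumptions, not Flink code: `zlib.crc32` stands in for Flink's internal key hashing, and `MAX_PARALLELISM`, `key_group`, `subtask_for` and `redistribute` are hypothetical names.

```python
# Simplified sketch of key-group based redistribution of keyed state.
# ASSUMPTION: zlib.crc32 is a stand-in for Flink's internal key hashing;
# only the idea of range-assigning key groups to subtasks is shown.
import zlib

MAX_PARALLELISM = 128  # hypothetical number of key groups (upper bound on parallelism)


def key_group(key: str) -> int:
    """Map a key to a key group; this mapping does not change on rescaling."""
    return zlib.crc32(key.encode("utf-8")) % MAX_PARALLELISM


def subtask_for(group: int, parallelism: int) -> int:
    """Give each subtask a contiguous range of key groups."""
    return group * parallelism // MAX_PARALLELISM


def redistribute(states: list[dict], new_parallelism: int) -> list[dict]:
    """Move every keyed state entry to the subtask that owns its key group."""
    new_states: list[dict] = [{} for _ in range(new_parallelism)]
    for old_state in states:
        for key, value in old_state.items():
            owner = subtask_for(key_group(key), new_parallelism)
            new_states[owner][key] = value
    return new_states


# Usage: state held by 2 subtasks is redistributed across 3 subtasks.
rescaled = redistribute([{"policy-1": 10, "policy-2": 4}, {"policy-3": 7}], 3)
```

Because the key-to-group mapping is fixed, no individual key has to be rehashed against the new parallelism; only the group-to-subtask ranges change.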
Flink state can be understood as a variable, held by an operator subtask on its current instance, that records the result produced by the history of data that has flowed through the operator. Specifically, when a new data record arrives, the computation combines it with the current result (i.e., the Flink state). In practice, Flink state is created and managed by operator subtasks: a subtask receives the input stream, obtains the corresponding state, and updates the state with the new computation result. For example, to sum the integer field of elements arriving within a time window, when the subtask receives a new element it reads the value already stored in the state (the historical sum), adds the value of the current input, and writes the result back as the updated Flink state.
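The windowed sum described above can be sketched as a plain-Python model of one operator subtask. It assumes tumbling windows of fixed length and an in-memory dict as the state store; a real Flink job would normally use keyed state (for example `ValueState`) or a built-in window aggregation instead. The class name `WindowSumSubtask` is hypothetical.

```python
# Plain-Python model of a stateful operator subtask that sums an integer
# field per key within tumbling time windows.
# ASSUMPTION: a dict stands in for Flink's managed keyed state.
from collections import defaultdict


class WindowSumSubtask:
    def __init__(self, window_ms: int):
        self.window_ms = window_ms
        # State: (key, window_start_ms) -> historical sum for that window.
        self.state: dict[tuple[str, int], int] = defaultdict(int)

    def process(self, key: str, value: int, ts_ms: int) -> int:
        """Combine a new element with the stored result and update the state."""
        window_start = ts_ms - ts_ms % self.window_ms
        slot = (key, window_start)
        historical_sum = self.state[slot]          # read current state
        self.state[slot] = historical_sum + value  # write updated state
        return self.state[slot]


op = WindowSumSubtask(window_ms=60_000)
op.process("premium", 5, ts_ms=1_000)    # 5
op.process("premium", 7, ts_ms=30_000)   # 12, same window
op.process("premium", 1, ts_ms=61_000)   # 1, the next window starts fresh
```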
In this embodiment, the Flink job processes the real-time data stream in three stages: receiving data, processing data, and outputting the processing result. Receiving data means reading from one or more data sources, such as HDFS and Kafka; processing data means executing the conversion operators the user requires; and outputting the processing result means writing the converted results to the Kafka service system.
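The receive, process and output stages can be modelled as a small pipeline. The sketch below is not the Flink API: it assumes plain Python iterables for the sources (standing in for HDFS or Kafka) and a list for the Kafka sink, and the operator functions `only_valid` and `to_cents` are hypothetical examples of user conversion operators.

```python
# Model of a job with three stages: receive, process, output.
# ASSUMPTION: iterables stand in for HDFS/Kafka sources and a list stands
# in for the Kafka sink topic.
from typing import Callable, Iterable, Iterator

Operator = Callable[[Iterator[dict]], Iterator[dict]]


def run_job(sources: list[Iterable[dict]], operators: list[Operator], sink: list[dict]) -> None:
    # Receive: read records from one or more data sources.
    stream: Iterator[dict] = (record for source in sources for record in source)
    # Process: chain the user's conversion operators in order (lazily).
    for op in operators:
        stream = op(stream)
    # Output: write the converted results to the sink.
    sink.extend(stream)


def only_valid(records: Iterator[dict]) -> Iterator[dict]:
    """Filter operator: drop records without an amount."""
    return (r for r in records if r.get("amount") is not None)


def to_cents(records: Iterator[dict]) -> Iterator[dict]:
    """Map operator: convert a decimal amount to integer cents."""
    return ({**r, "amount": round(r["amount"] * 100)} for r in records)


kafka_topic: list[dict] = []
run_job([[{"id": 1, "amount": 1.5}], [{"id": 2, "amount": None}]], [only_valid, to_cents], kafka_topic)
```

Chaining generators keeps each record flowing through all operators before the next one is read, which loosely mirrors the pipelined execution mentioned above.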
Step S202, writing the real-time data stream received by the Kafka service system into a disk structure of the Kafka service system;
Kafka (a high-throughput distributed publish-subscribe messaging system) acts as a storage system: because it decouples publishing messages from consuming them, published messages are stored first and consumed later. The real-time data stream written to Kafka is written to disk and replicated to ensure fault tolerance, and a producer can wait for acknowledgement until the message has been completely written. In this embodiment, the Kafka disk structure lets Druid (the data query system) control the position from which it reads the real-time data stream, instead of the read position being controlled by a client as in the prior art. Kafka can be regarded as a special-purpose distributed file system dedicated to log storage, replication, and propagation: it continuously receives the real-time data stream output by Flink, the data undergoes ETL processing according to business logic, and the result is written to the output, Druid (the data query system).
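The append-only log and the consumer-controlled read position can be shown with a toy model. It assumes an in-memory list instead of Kafka's on-disk segment files and ignores replication; `PartitionLog` and `Consumer` are hypothetical names, not the Kafka client API. The point it illustrates is that reading does not remove data, so the consumer (here, the Druid ingestion side) can rewind to any earlier offset.

```python
# Toy model of a Kafka partition: an append-only log addressed by offset.
# ASSUMPTION: an in-memory list replaces segment files; no replication.


class PartitionLog:
    def __init__(self):
        self._records: list[bytes] = []

    def append(self, record: bytes) -> int:
        """Append a record and return its offset (its position in the log)."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int) -> list[bytes]:
        """Return up to max_records starting at offset; reading never deletes."""
        return self._records[offset:offset + max_records]


class Consumer:
    def __init__(self, log: PartitionLog, start_offset: int = 0):
        self.log = log
        self.position = start_offset  # read position owned by the consumer

    def poll(self, max_records: int = 10) -> list[bytes]:
        batch = self.log.read(self.position, max_records)
        self.position += len(batch)
        return batch

    def seek(self, offset: int) -> None:
        """Rewind or skip ahead, e.g. to replay data after a failure."""
        self.position = offset


log = PartitionLog()
for i in range(3):
    log.append(f"r{i}".encode())
reader = Consumer(log)
first_batch = reader.poll(2)
```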
Step S203, extracting, through ETL, the real-time data stream in the disk structure into a temporary intermediate layer for conversion processing, and generating real-time list data;
In this embodiment, ETL (extract, transform, load) is used for data processing. It supports building complex applications that aggregate or join data, and helps with handling out-of-order data, reprocessing after code changes, and stateful Flink computation. The business logic describes the processing of each business feature as a combination of building blocks and basic call-processing modules.
It should be noted that ETL extracts the real-time data stream into the temporary intermediate layer, cleans, converts, and integrates it, and finally loads the result into the Kafka service system for online analytical processing and data mining. ETL processing thus converts the real-time data stream into a standard format.
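The clean, convert and integrate steps in the temporary intermediate layer can be sketched as three small functions. The field names (`policy_id`, `premium`, `ts`, `branch_code`) and the `BRANCHES` lookup table are hypothetical examples, not taken from the patent; the sketch only shows how raw records become list data in one standard format.

```python
# Sketch of cleaning, converting and integrating raw records in the
# temporary intermediate layer to produce list data in a standard format.
# ASSUMPTIONS: field names and the branch lookup table are hypothetical.
from datetime import datetime, timezone

BRANCHES = {"SZ": "Shenzhen", "SH": "Shanghai"}  # hypothetical dimension table


def clean(raw: list[dict]) -> list[dict]:
    """Drop records missing policy_id or premium, and repeats of a seen policy_id."""
    seen: set[str] = set()
    cleaned = []
    for r in raw:
        pid = r.get("policy_id")
        if pid and r.get("premium") is not None and pid not in seen:
            seen.add(pid)
            cleaned.append(r)
    return cleaned


def convert(rows: list[dict]) -> list[dict]:
    """Normalise types: premium to float, epoch seconds to ISO-8601 UTC."""
    return [{**r, "premium": float(r["premium"]),
             "ts": datetime.fromtimestamp(int(r["ts"]), tz=timezone.utc).isoformat()}
            for r in rows]


def integrate(rows: list[dict]) -> list[dict]:
    """Join each row with the dimension table to attach a branch name."""
    return [{**r, "branch": BRANCHES.get(r.get("branch_code"), "UNKNOWN")} for r in rows]


def etl(raw: list[dict]) -> list[dict]:
    return integrate(convert(clean(raw)))
```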
After the real-time data stream received by the Kafka service system is written into its disk structure, and the real-time data stream in the disk structure is extracted through ETL into the temporary intermediate layer for conversion processing to generate real-time list data, the real-time list data is transmitted to a Hive (data warehouse tool) data warehouse through a Flume (log collection system) data pipeline for storage and backup.
Flume contains one or more agents. Each agent is an independent daemon process that receives data from a client or from other agents and forwards it to a gateway node or to the agent of the next node. Received data is passed to one or more channels as Flume events; Flume supports multiple ways of receiving data, such as data serialization frameworks and remote procedure call frameworks. In this embodiment, Flume forwards the data to the next node, the Hive data warehouse.
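The agent structure described above (source, one or more channels, sink) can be sketched as a toy model. It assumes in-memory queues for the channels and a plain list standing in for the Hive data warehouse as the next hop; `Event` and `Agent` are hypothetical names, not Flume classes. Each received record is wrapped as an event (headers plus body) and copied into every channel (replicating fan-out), and a sink later drains one channel to the next hop.

```python
# Toy model of a Flume agent: source -> channel(s) -> sink -> next hop.
# ASSUMPTION: deques stand in for channels and a list stands in for Hive.
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Event:
    body: bytes
    headers: dict = field(default_factory=dict)


class Agent:
    def __init__(self, num_channels: int = 1):
        self.channels = [deque() for _ in range(num_channels)]

    def receive(self, body: bytes, **headers) -> None:
        """Source side: wrap data as an event and replicate it to every channel."""
        for ch in self.channels:
            ch.append(Event(body, dict(headers)))

    def drain(self, channel_index: int, next_hop: list) -> int:
        """Sink side: move all queued events from one channel to the next hop."""
        ch = self.channels[channel_index]
        moved = 0
        while ch:
            next_hop.append(ch.popleft())
            moved += 1
        return moved


agent = Agent(num_channels=2)
agent.receive(b"P1|SZ|100", table="realtime_list")
hive_backup: list[Event] = []
agent.drain(0, hive_backup)
```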
In this embodiment, the Hive data warehouse is used to store, query, and analyze large-scale data stored in Hadoop (a distributed system infrastructure). The Hive tool maps structured data files to database tables and provides SQL query functionality; it converts SQL statements into MapReduce jobs for execution, so the required content can be queried and analyzed with SQL.
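The idea of mapping a structured data file onto a table and querying it with SQL can be shown with Python's built-in `sqlite3` as a stand-in for Hive. This is an assumption made only so the sketch runs anywhere: Hive would declare a table over the file (for example with `ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'`) and compile the query into MapReduce jobs, whereas here the rows are loaded into an in-memory database. The table name `list_data` and its columns are hypothetical.

```python
# Stand-in sketch: map a '|'-delimited text file onto a table and query it
# with SQL. ASSUMPTION: sqlite3 replaces Hive; no MapReduce is involved.
import csv
import io
import sqlite3


def load_and_query(text: str, sql: str) -> list[tuple]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE list_data (policy_id TEXT, branch TEXT, premium REAL)")
    rows = csv.reader(io.StringIO(text), delimiter="|")  # one row per line
    conn.executemany("INSERT INTO list_data VALUES (?, ?, ?)", rows)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


file_contents = "P1|SZ|100\nP2|SZ|50\nP3|SH|20\n"
totals = load_and_query(
    file_contents,
    "SELECT branch, SUM(premium) FROM list_data GROUP BY branch ORDER BY branch",
)
```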
Specifically, in this embodiment, when a fault occurs while the Flink job is running, the Hive tool is used to query the differences between the real-time list data backed up in the Hive data warehouse and the real-time list data generated by the Kafka service system; the differing real-time list data is extracted as the target list data, and the target list data is backfilled into the Druid database. Because the Hive tool maps structured data files to database tables and provides complete SQL query functionality, when a fault occurs it can quickly query the required content, i.e. the data at the fault point, from the Hive data warehouse and promptly backfill it into the Druid database.
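The post-failure reconciliation can be sketched as a diff followed by a backfill. The sketch assumes records keyed by a hypothetical record id, plain dicts standing in for the Hive table, the Kafka-side output and the Druid datasource, and treats the Hive backup as the reference copy. In an actual Druid deployment segments are immutable, so a correction like this is typically applied by re-ingesting the affected time interval rather than by updating rows in place.

```python
# Sketch of reconciliation between the Hive backup and the list data
# produced via Kafka, followed by a backfill into Druid.
# ASSUMPTIONS: hypothetical record ids; dicts stand in for all three stores.


def find_target_records(hive_backup: dict[str, dict], kafka_side: dict[str, dict]) -> dict[str, dict]:
    """Backed-up records that are missing from, or differ in, the Kafka-side data."""
    return {rid: rec for rid, rec in hive_backup.items() if kafka_side.get(rid) != rec}


def backfill(druid: dict[str, dict], target: dict[str, dict]) -> None:
    """Write the target records into the Druid stand-in, overwriting stale copies."""
    druid.update(target)
```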
Step S204, sending the real-time list data to the Druid database;
The Druid database is mainly used for aggregation queries over large volumes of time-series data. Real-time list data is ingested into Druid in real time and can be queried immediately after it arrives; meanwhile, ingested real-time list data is immutable, which ensures its integrity. Typically, Druid organizes list data by time, so once an event has occurred and its list data has entered Druid, external systems can query it.
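Time-based aggregation of this kind can be illustrated with a sketch of Druid-style rollup: truncate the time column to a granularity, group rows by their dimension values, and sum the metric columns. It assumes epoch-second timestamps, minute granularity and hypothetical column names, and adds a `count` column recording how many raw rows each rolled-up row represents; it models the idea only, not Druid's ingestion code.

```python
# Sketch of Druid-style rollup: time bucket + dimensions -> summed metrics.
# ASSUMPTIONS: epoch-second "ts", minute granularity, hypothetical columns.


def rollup(rows: list[dict], dims: list[str], metrics: list[str], granularity_s: int = 60) -> list[dict]:
    buckets: dict[tuple, dict] = {}
    for row in rows:
        bucket_start = row["ts"] - row["ts"] % granularity_s   # time column
        key = (bucket_start,) + tuple(row[d] for d in dims)    # plus dimension columns
        agg = buckets.setdefault(key, {"count": 0, **{m: 0 for m in metrics}})
        agg["count"] += 1                                      # raw rows rolled up
        for m in metrics:                                      # metric columns are summed
            agg[m] += row[m]
    return [{"ts": k[0], **dict(zip(dims, k[1:])), **v}
            for k, v in sorted(buckets.items(), key=lambda kv: kv[0])]
```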
Step S205, when a data query operation is detected, identifying the type of the data query operation, generating a corresponding operation instruction according to the identified type, and invoking the Druid engine of the Druid database according to the operation instruction to generate a real-time report;
The Druid engine of the Druid database offers excellent timeliness and can return results to users within seconds. Connecting the processed real-time list data to the Druid engine for downstream users' report queries lowers the barrier to using the Druid engine and makes report queries more efficient. In this embodiment, the types of data query operation include dynamic actions, such as dragging, that the user performs on the display screen.
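Generating an operation instruction from the identified query type can be sketched as building a Druid native query. The operation types `trend`, `rank` and `pivot`, the datasource and column names, and the function `build_query` are hypothetical; `timeseries`, `topN`, `groupBy` and the `doubleSum` aggregator are real Druid native query and aggregator types. The sketch only produces the JSON; sending it to a Druid broker is out of scope.

```python
# Sketch: map the identified type of a front-end query operation to a
# Druid native query (the "operation instruction").
# ASSUMPTIONS: operation type names and datasource/column names are hypothetical.
import json


def build_query(op_type: str, datasource: str, interval: str, metric: str,
                dimensions: list[str], granularity: str = "minute") -> str:
    query = {
        "dataSource": datasource,
        "intervals": [interval],
        "granularity": granularity,
        "aggregations": [{"type": "doubleSum", "name": metric, "fieldName": metric}],
    }
    if op_type == "trend":      # a metric over time, no dimensions
        query["queryType"] = "timeseries"
    elif op_type == "rank":     # top values of a single dimension
        query.update(queryType="topN", dimension=dimensions[0],
                     metric=metric, threshold=10, granularity="all")
    elif op_type == "pivot":    # the user dragged dimensions into the report
        query.update(queryType="groupBy", dimensions=dimensions)
    else:
        raise ValueError(f"unsupported query operation type: {op_type}")
    return json.dumps(query)


instruction = build_query("pivot", "realtime_list", "2022-07-01/2022-07-02", "premium", ["branch"])
```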
Step S206, pushing the real-time report to the front-end platform.
Specifically, the Druid database serves both as a distributed data analysis platform and as a time-series database. Data in Druid is organized into a time column, dimension columns, and index (metric) columns: the time column identifies the time value of each row; dimension columns identify the category attributes of the row; and index columns are used for aggregation and calculation. When the Druid engine generates the real-time report, the index column, time column, and dimension column of the report are set, and the report is pushed to the front-end platform through a logical combination of the index column, the time column, or the dimension column.
When a dynamic action such as dragging or pulling is performed on a preset identifier of the front-end platform, the index column, time column, or dimension column changes; the report recombined after the change is the new real-time report, and its index column, time column, or dimension column is obtained. Besides dragging or pulling a preset identifier on the front-end platform, the real-time report can also be obtained by custom-configuring the index column, time column, or dimension column.
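Recombining the report after a drag action can be sketched as updating a report specification and re-aggregating. `ReportSpec`, `on_drag` and the `add`/`remove` action names are hypothetical, and the report is re-aggregated in memory from already rolled-up rows rather than by re-querying Druid; the sketch only shows how changing the dimension columns changes the grouping of the index columns.

```python
# Sketch of recombining a report when a dimension column is dragged in or out.
# ASSUMPTIONS: hypothetical names; in-memory re-aggregation instead of Druid.
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ReportSpec:
    time_column: str
    dimensions: tuple[str, ...]
    metrics: tuple[str, ...]


def on_drag(spec: ReportSpec, action: str, column: str) -> ReportSpec:
    """Return a new spec after a dimension is dragged into or out of the report."""
    if action == "add" and column not in spec.dimensions:
        return replace(spec, dimensions=spec.dimensions + (column,))
    if action == "remove":
        return replace(spec, dimensions=tuple(d for d in spec.dimensions if d != column))
    return spec


def build_report(spec: ReportSpec, rows: list[dict]) -> dict[tuple, dict]:
    """Group rows by (time, *dimensions) and sum each index (metric) column."""
    report: dict[tuple, dict] = {}
    for r in rows:
        key = (r[spec.time_column],) + tuple(r[d] for d in spec.dimensions)
        agg = report.setdefault(key, {m: 0 for m in spec.metrics})
        for m in spec.metrics:
            agg[m] += r[m]
    return report
```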
When the real-time content that the front-end platform generates from the real-time report does not match the target content corresponding to the data query operation, the pre-stored list data stored in the Hive data warehouse is obtained through the Hive tool, and the target fault position in the Druid database is located by comparing the pre-stored list data with the real-time list data obtained from the Druid database. The target fault data at the target fault position in the Druid database, and the target list data at the corresponding position in the Hive data warehouse, are extracted; the target fault data is replaced with the target list data, and the corrected data is written back to the target fault position in the Druid database. When the real-time content that the front-end platform generates from the real-time report matches the target content corresponding to the data query operation, the pre-stored list data stored in the Hive data warehouse is updated.
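Both branches of this check can be sketched together. The sketch assumes that a "position" is a hypothetical time-interval key holding a single total, and that dicts stand in for the Hive pre-stored list data and the Druid datasource; the names `locate_fault_positions` and `reconcile` are hypothetical. When the report is wrong, mismatching positions are located and overwritten from the backup; when it is right, the backup is refreshed from Druid.

```python
# Sketch of fault localisation and repair when the report does not match
# the expected content, plus refreshing the backup when it does.
# ASSUMPTIONS: positions are hypothetical interval keys; dicts stand in for
# the Hive pre-stored list data and the Druid datasource.


def locate_fault_positions(hive_snapshot: dict[str, int], druid: dict[str, int]) -> list[str]:
    """Positions whose Druid data is missing or differs from the Hive snapshot."""
    return sorted(p for p, v in hive_snapshot.items() if druid.get(p) != v)


def reconcile(report_matches: bool, hive_snapshot: dict[str, int], druid: dict[str, int]) -> list[str]:
    if report_matches:
        # Report is correct: refresh the pre-stored list data from Druid.
        hive_snapshot.clear()
        hive_snapshot.update(druid)
        return []
    faults = locate_fault_positions(hive_snapshot, druid)
    for position in faults:
        # Replace the faulty Druid data with the backed-up list data.
        druid[position] = hive_snapshot[position]
    return faults
```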
In this embodiment, when a Flink job generates a real-time data stream, the real-time data stream is transmitted to the Kafka service system; the real-time data stream received by the Kafka service system is written into its disk structure; the real-time data stream in the disk structure is extracted through ETL into a temporary intermediate layer for conversion processing to generate real-time list data; the real-time list data is sent to the Druid database; when a data query operation is detected, its type is identified, a corresponding operation instruction is generated according to that type, and the Druid engine of the Druid database is invoked according to the operation instruction to generate a real-time report; and the real-time report is pushed to the front-end platform. This achieves end-to-end (full-link) generation of real-time reports, improves the fault tolerance of the data, speeds up real-time report generation, and improves the timeliness and accuracy of the report data.
It should be emphasized that, to further ensure the privacy and security of the real-time report, the real-time report may also be stored in a node of a blockchain.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, where each block contains information about a batch of network transactions, used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and the like.
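The idea of blocks linked by a cryptographic method can be illustrated with a minimal hash chain in which each block stores a digest of one real-time report and the hash of the previous block, so altering any stored report breaks verification. This is a sketch under stated assumptions only: there is no consensus, networking or signing, and `make_block` and `verify` are hypothetical names.

```python
# Minimal hash-chain sketch: each block holds a digest of a report plus the
# previous block's hash. ASSUMPTION: illustration only, no consensus layer.
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def make_block(prev_hash: str, report: dict) -> dict:
    payload = json.dumps(report, sort_keys=True).encode("utf-8")
    block = {"prev_hash": prev_hash, "report_digest": hashlib.sha256(payload).hexdigest()}
    # The block hash covers the previous hash and the report digest.
    block["hash"] = hashlib.sha256(json.dumps(block, sort_keys=True).encode("utf-8")).hexdigest()
    return block


def verify(chain: list[dict], reports: list[dict]) -> bool:
    """Recompute every block from the reports and check the links."""
    if len(chain) != len(reports):
        return False
    prev = GENESIS
    for block, report in zip(chain, reports):
        if block != make_block(prev, report):
            return False
        prev = block["hash"]
    return True


reports = [{"total_premium": 150.0}, {"total_premium": 170.0}]
chain: list[dict] = []
prev_hash = GENESIS
for rep in reports:
    blk = make_block(prev_hash, rep)
    chain.append(blk)
    prev_hash = blk["hash"]
```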
The embodiments of the application can acquire and process related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by computer-readable instructions instructing the relevant hardware. The computer-readable instructions can be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a Read-Only Memory (ROM), or it may be a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in the sequence indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, there is no strict restriction on the order in which these steps are performed, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment and may be performed at different times. They are also not necessarily performed sequentially, and may instead be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a real-time report generating system based on Flink, where the embodiment of the system corresponds to the embodiment of the method shown in fig. 2.
As shown in fig. 3, the real-time report generating system 300 based on Flink according to this embodiment includes: a transmission module 301, a processing module 302, a sending module 303, an extraction module 304, and a push module 305. Wherein:
the transmission module 301 is configured to transmit the real-time data stream to the Kafka service system when the real-time data stream is generated by the Flink job;
a processing module 302, configured to write the real-time data stream received by the Kafka service system into the disk structure of the Kafka service system, and to extract the real-time data stream in the disk structure through ETL to a temporary intermediate layer for conversion processing, generating real-time list data;
a sending module 303, configured to send the real-time list data to the Druid database;
an extraction module 304, configured to identify the type of a data query operation when the data query operation is detected, generate a corresponding operation instruction according to the identified type, and invoke the Druid engine of the Druid database according to the operation instruction to generate a real-time report;
and the pushing module 305 is configured to push the real-time report to the front-end platform.
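The query-type identification performed by the extraction module 304 could, for example, map a front-end request onto one of Apache Druid's native query types (`timeseries`, `topN`, `groupBy`) and emit the corresponding JSON query. A client would then POST that query to the Druid Broker's `/druid/v2/` endpoint. The sketch below is an assumption about what such an "operation instruction" might look like. The request fields (`datasource`, `metric`, `interval`, `dimensions`, `top_n`), the `policy_detail` datasource, and the classification rules are hypothetical. Only the keys of the emitted query follow Druid's native query format.

```python
import json

# Hypothetical request shape sent by the front end, e.g.
# {"datasource": ..., "metric": ..., "interval": ..., "dimensions": [...], "top_n": 10}


def identify_query_type(request: dict) -> str:
    """Map a front-end request onto a Druid native query type."""
    if request.get("top_n"):
        return "topN"          # ranked list over a single dimension
    if request.get("dimensions"):
        return "groupBy"       # aggregation over one or more dimensions
    return "timeseries"        # metric over time only


def build_instruction(request: dict) -> dict:
    """Build a Druid native query (a JSON-ready dict) for the identified type."""
    qtype = identify_query_type(request)
    metric = request["metric"]
    query = {
        "queryType": qtype,
        "dataSource": request["datasource"],
        "intervals": [request["interval"]],
        "aggregations": [{"type": "doubleSum", "name": metric, "fieldName": metric}],
    }
    if qtype == "timeseries":
        query["granularity"] = request.get("granularity", "hour")
    elif qtype == "topN":
        # "metric" names the aggregator that topN ranks by
        query.update(granularity="all", dimension=request["dimensions"][0],
                     metric=metric, threshold=int(request["top_n"]))
    else:
        query.update(granularity="all", dimensions=list(request["dimensions"]))
    return query


if __name__ == "__main__":
    req = {"datasource": "policy_detail", "metric": "premium",
           "interval": "2022-07-22T00:00:00Z/2022-07-23T00:00:00Z",
           "dimensions": ["region"], "top_n": 5}
    print(json.dumps(build_instruction(req), indent=2))
```

Druid SQL is an alternative to native queries. Native JSON is used here because each query type maps directly onto one identified operation type.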
In this embodiment, when a Flink job generates a real-time data stream, the real-time data stream is transmitted to the Kafka service system. The real-time data stream received by the Kafka service system is written into the disk structure of the Kafka service system. The real-time data stream in the disk structure is then extracted through ETL to a temporary intermediate layer for conversion processing, which generates real-time list data, and the real-time list data is sent to the Druid database. When a data query operation is detected, the type of the data query operation is identified, a corresponding operation instruction is generated according to the identified type, and the Druid engine of the Druid database is invoked according to the operation instruction to generate a real-time report. Finally, the real-time report is pushed to the front-end platform. This achieves full-link generation of the real-time report, improves the fault tolerance of the data, increases the generation speed of the real-time report from the perspective of the full data domain, and improves both the timeliness and the accuracy of the report data.
In some optional implementations of this embodiment, the system 300 further includes:
a backup module, configured to transmit the real-time list data to the Hive data warehouse through a Flume data pipeline for storage and backup;
a query module, configured to use a Hive tool, when a fault occurs in the Flink job, to query the differences between the real-time list data backed up in the Hive data warehouse and the real-time list data generated via the Kafka service system, so as to obtain target list data;
and a backfill module, configured to backfill the target list data into the Druid database.
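The query and backfill steps can be pictured as a keyed set difference followed by an upsert. The minimal sketch below assumes that every list-data row carries a unique key (a hypothetical `id` field), that the Hive backup is the complete copy, and that a plain dict stands in for the Druid datasource. In a real deployment, correcting data in Druid usually means re-ingesting the affected time intervals rather than updating rows in place, so `backfill` only models the intended end state.

```python
def diff_backup(backup_rows: list, generated_rows: list, key: str = "id") -> list:
    """Return backup rows that are missing from, or differ in, the Kafka-generated list data."""
    generated = {row[key]: row for row in generated_rows}
    return [row for row in backup_rows if generated.get(row[key]) != row]


def backfill(druid_rows: dict, target_rows: list, key: str = "id") -> dict:
    """Upsert the target rows into a keyed stand-in for the Druid datasource."""
    for row in target_rows:
        druid_rows[row[key]] = dict(row)   # insert missing rows, overwrite stale ones
    return druid_rows


if __name__ == "__main__":
    hive_backup = [{"id": 1, "premium": 10}, {"id": 2, "premium": 20}, {"id": 3, "premium": 30}]
    from_kafka = [{"id": 1, "premium": 10}, {"id": 2, "premium": 99}]   # id 3 lost, id 2 wrong
    target = diff_backup(hive_backup, from_kafka)
    druid = {r["id"]: dict(r) for r in from_kafka}
    print(backfill(druid, target))
```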
In some optional implementations of this embodiment, the backup module includes:
a receiving unit, configured to receive the real-time list data and transmit the received real-time list data via Flume to one or more transmission paths;
and a storage unit, configured to store the real-time list data transmitted by Flume into the Hive data warehouse.
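Apache Flume agents are built from sources, channels, and sinks. A source's channel selector can either replicate every event to all of its channels (the default) or multiplex events to channels based on a header value. The sketch below mimics these two routing behaviours in plain Python to show how list data could fan out to "one or more transmission paths" before a sink writes it to Hive. It is not the Flume API. Treating each path as a Flume channel is an assumption, and the channel names, the `type` header, and the list standing in for a Hive table are hypothetical.

```python
from collections import defaultdict


def replicate(events: list, channels: list) -> dict:
    """Replicating selection: every event is copied to every channel."""
    return {ch: list(events) for ch in channels}


def multiplex(events: list, header: str, mapping: dict, default: str) -> dict:
    """Multiplexing selection: each event goes to the channel mapped from one header value."""
    routed = defaultdict(list)
    for event in events:
        routed[mapping.get(event["headers"].get(header), default)].append(event)
    return dict(routed)


def hive_sink(channel_events: list, table: list) -> None:
    """Stand-in for a sink that writes event bodies into a Hive table (here a list)."""
    table.extend(event["body"] for event in channel_events)


if __name__ == "__main__":
    events = [{"headers": {"type": "policy"}, "body": "p1"},
              {"headers": {"type": "claim"}, "body": "c1"}]
    hive_table: list = []
    hive_sink(replicate(events, ["hive", "audit"])["hive"], hive_table)
    print(hive_table)
```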
In some optional implementations of this embodiment, the processing module 302 includes:
the processing unit is used for extracting the real-time data stream received by the Kafka service system to the temporary intermediate layer through the ETL, and generating real-time list data in a standard format after cleaning, converting and integrating the real-time data in the temporary intermediate layer;
and a transmission unit, configured to transmit the real-time list data to the Kafka service system.
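The cleaning, converting, and integrating steps of the processing unit can be sketched as three small functions composed into one ETL pass, followed by publishing the standardized list data back to Kafka. This is a hypothetical illustration. The raw fields (`id`, `ts_ms`, `amount_cents`, `branch`), the branch lookup table, and the list standing in for a Kafka topic are assumptions, not part of the embodiment.

```python
import json
from datetime import datetime, timezone


def clean(rows: list) -> list:
    """Cleaning: drop incomplete records and duplicates (keyed on a hypothetical 'id')."""
    seen, kept = set(), []
    for row in rows:
        if row.get("id") is None or row.get("amount_cents") is None:
            continue                     # incomplete record
        if row["id"] in seen:
            continue                     # duplicate, e.g. a replayed message
        seen.add(row["id"])
        kept.append(row)
    return kept


def convert(rows: list) -> list:
    """Converting: normalize timestamps, units, and codes into one standard format."""
    return [
        {
            "id": row["id"],
            "event_time": datetime.fromtimestamp(row["ts_ms"] / 1000, tz=timezone.utc).isoformat(),
            "amount": row["amount_cents"] / 100,
            "branch_code": str(row["branch"]).strip(),
        }
        for row in rows
    ]


def integrate(rows: list, branch_names: dict) -> list:
    """Integrating: enrich each row from a dimension lookup table."""
    return [{**row, "branch_name": branch_names.get(row["branch_code"], "UNKNOWN")} for row in rows]


def to_list_data(raw_rows: list, branch_names: dict) -> list:
    return integrate(convert(clean(raw_rows)), branch_names)


def publish(topic_log: list, rows: list) -> None:
    """Send the standardized list data back to Kafka (an in-memory list stands in for the topic)."""
    topic_log.extend(json.dumps(row, sort_keys=True) for row in rows)


if __name__ == "__main__":
    raw = [{"id": 1, "ts_ms": 0, "amount_cents": 1250, "branch": " 01 "},
           {"id": 1, "ts_ms": 0, "amount_cents": 1250, "branch": "01"}]
    topic: list = []
    publish(topic, to_list_data(raw, {"01": "Branch A"}))
    print(topic)
```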
In some optional implementations of this embodiment, the pushing module 305 includes:
a setting unit, configured to set the metric column, time column, and dimension column of the real-time report;
and a pushing unit, configured to push the real-time report to the front-end platform using a logical set formed from the metric column, the time column, and/or the dimension column.
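One way to read the metric (index), time, and dimension columns is as a report schema. Rows are grouped by a time bucket plus the dimension values, and the metric columns are aggregated within each group, much as Druid rolls up rows by timestamp and dimensions. The sketch below assumes summation as the aggregation and a one-hour bucket. `ReportSchema`, `build_report`, and the field names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportSchema:
    time_column: str
    dimension_columns: tuple
    metric_columns: tuple
    bucket_seconds: int = 3600   # hypothetical time granularity: one hour


def build_report(rows: list, schema: ReportSchema) -> list:
    """Group rows by (time bucket, dimension values) and sum each metric column."""
    totals = defaultdict(lambda: {m: 0 for m in schema.metric_columns})
    for row in rows:
        bucket = row[schema.time_column] // schema.bucket_seconds * schema.bucket_seconds
        key = (bucket,) + tuple(row[d] for d in schema.dimension_columns)
        for m in schema.metric_columns:
            totals[key][m] += row[m]
    key_columns = (schema.time_column,) + tuple(schema.dimension_columns)
    # One output row per group, ordered by time bucket and then by dimension values.
    return [{**dict(zip(key_columns, key)), **metrics} for key, metrics in sorted(totals.items())]


if __name__ == "__main__":
    rows = [{"ts": 3700, "region": "EAST", "premium": 10},
            {"ts": 3800, "region": "EAST", "premium": 5},
            {"ts": 100, "region": "WEST", "premium": 7}]
    print(build_report(rows, ReportSchema("ts", ("region",), ("premium",))))
```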
In some optional implementations of this embodiment, the system 300 further includes:
and an operation module, configured to acquire the metric column, time column, or dimension column of the real-time report when a preset control on the front-end platform receives a dynamic action such as holding, pulling, or dragging.
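A minimal sketch of how the operation module might resolve a front-end drag-style action to a report column and add it to the current view. The action names, the event shape (`action`, `field`), and the selection dictionary are hypothetical. The sketch only shows the lookup from a dragged field to its column role (metric, time, or dimension).

```python
ROLES = ("metric", "time", "dimension")
DYNAMIC_ACTIONS = {"hold", "pull", "drag"}   # hypothetical action names sent by the front end


def resolve_column(event: dict, columns_by_role: dict):
    """Return (role, column) for the column a dynamic action targets, or None."""
    if event.get("action") not in DYNAMIC_ACTIONS:
        return None
    column = event.get("field")
    for role in ROLES:
        if column in columns_by_role.get(role, ()):
            return role, column
    return None


def apply_action(selection: dict, event: dict, columns_by_role: dict) -> dict:
    """Add the targeted column to the current report selection without mutating the input."""
    hit = resolve_column(event, columns_by_role)
    if hit is None:
        return selection
    role, column = hit
    updated = {r: list(cols) for r, cols in selection.items()}
    if column not in updated.setdefault(role, []):
        updated[role].append(column)
    return updated


if __name__ == "__main__":
    cols = {"metric": ("premium",), "time": ("ts",), "dimension": ("region", "channel")}
    print(apply_action({"metric": ["premium"]}, {"action": "drag", "field": "region"}, cols))
```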
In some optional implementations of this embodiment, the system 300 further includes:
a fault location module, configured to, when the real-time content generated by the front-end platform from the real-time report does not match the target content corresponding to the data query operation, acquire the pre-stored list data stored in the Hive data warehouse through a Hive tool, and locate the target fault position in the Druid database by comparing the pre-stored list data with the real-time list data acquired from the Druid database;
a data overwrite module, configured to extract the target fault data at the target fault position in the Druid database and the target list data at the corresponding position in the Hive data warehouse, replace the target fault data with the target list data, and backfill the target list data to the target fault position in the Druid database;
and the updating module is used for updating the pre-stored list data stored in the hive data warehouse when the real-time content generated by the front-end platform according to the real-time report accords with the target content corresponding to the data query operation.
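The fault location, data overwrite, and update modules can be sketched as one reconciliation step over keyed rows. This sketch assumes that the Hive pre-stored copy is treated as the trusted version, so faulty Druid rows are replaced by the corresponding Hive rows. It also assumes that "updating the pre-stored list data" means replacing the Hive copy with the Druid rows once the report is confirmed correct. Both readings, and the names `locate_faults`, `overwrite_faults`, and `reconcile`, are hypothetical.

```python
def locate_faults(prestored: dict, druid_rows: dict) -> list:
    """Keys whose Druid row is missing or differs from the Hive pre-stored copy."""
    return sorted(k for k, row in prestored.items() if druid_rows.get(k) != row)


def overwrite_faults(prestored: dict, druid_rows: dict, fault_keys: list) -> None:
    """Replace the faulty Druid rows with the corresponding Hive rows, in place."""
    for k in fault_keys:
        druid_rows[k] = dict(prestored[k])


def reconcile(report_matches: bool, prestored: dict, druid_rows: dict) -> list:
    """Repair Druid on a mismatch; refresh the Hive copy when the report is confirmed correct."""
    if report_matches:
        prestored.clear()
        prestored.update({k: dict(row) for k, row in druid_rows.items()})
        return []
    faults = locate_faults(prestored, druid_rows)
    overwrite_faults(prestored, druid_rows, faults)
    return faults


if __name__ == "__main__":
    hive = {1: {"premium": 10}, 2: {"premium": 20}}
    druid = {1: {"premium": 10}, 2: {"premium": -1}}
    print(reconcile(False, hive, druid), druid)
```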
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 4, fig. 4 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 4 comprises a memory 41, a processor 42, and a network interface 43, which are communicatively connected to one another via a system bus. It should be noted that FIG. 4 shows only the computer device 4 with components 41 to 43. It should be understood that not all of the shown components are required, and more or fewer components may be implemented instead. As those skilled in the art will understand, the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 41 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the computer device 4. Of course, the memory 41 may also include both internal and external storage devices of the computer device 4. In this embodiment, the memory 41 is generally used for storing an operating system and various application software installed on the computer device 4, such as computer readable instructions of a real-time report generation method based on Flink. Further, the memory 41 may also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 42 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to run the computer-readable instructions stored in the memory 41 or to process data, for example, to run the computer-readable instructions of the Flink-based real-time report generation method.
The network interface 43 may comprise a wireless network interface or a wired network interface, and the network interface 43 is generally used for establishing communication connection between the computer device 4 and other electronic devices.
In this embodiment, when a Flink job generates a real-time data stream, the real-time data stream is transmitted to the Kafka service system. The real-time data stream received by the Kafka service system is written into the disk structure of the Kafka service system. The real-time data stream in the disk structure is then extracted through ETL to a temporary intermediate layer for conversion processing, which generates real-time list data, and the real-time list data is sent to the Druid database. When a data query operation is detected, the type of the data query operation is identified, a corresponding operation instruction is generated according to the identified type, and the Druid engine of the Druid database is invoked according to the operation instruction to generate a real-time report. Finally, the real-time report is pushed to the front-end platform. This achieves full-link generation of the real-time report, improves the fault tolerance of the data, increases the generation speed of the real-time report, and improves both the timeliness and the accuracy of the report data.
The present application further provides another embodiment, which is to provide a computer-readable storage medium, wherein the computer-readable storage medium stores computer-readable instructions, which can be executed by at least one processor, so as to cause the at least one processor to execute the steps of the Flink-based real-time report generation method as described above.
In this embodiment, when a Flink job generates a real-time data stream, the real-time data stream is transmitted to the Kafka service system. The real-time data stream received by the Kafka service system is written into the disk structure of the Kafka service system. The real-time data stream in the disk structure is then extracted through ETL to a temporary intermediate layer for conversion processing, which generates real-time list data, and the real-time list data is sent to the Druid database. When a data query operation is detected, the type of the data query operation is identified, a corresponding operation instruction is generated according to the identified type, and the Druid engine of the Druid database is invoked according to the operation instruction to generate a real-time report. Finally, the real-time report is pushed to the front-end platform. This achieves full-link generation of the real-time report, improves the fault tolerance of the data, increases the generation speed of the real-time report from the perspective of the full data domain, and improves both the timeliness and the accuracy of the report data.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present application.
It should be understood that the embodiments described above are only some, rather than all, of the embodiments of the present application. The accompanying drawings show preferred embodiments of the application and do not limit its patent scope. This application may be embodied in many different forms, and these embodiments are provided so that the disclosure of the application will be understood thoroughly and completely. Although the application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or replace some of their technical features with equivalents. Any equivalent structure made using the contents of the specification and drawings of this application, whether applied directly or indirectly in other related technical fields, likewise falls within the protection scope of this application.

Claims (10)

1. A real-time report generation method based on Flink is characterized by comprising the following steps:
when a Flink job generates a real-time data stream, transmitting the real-time data stream to a Kafka service system;
writing the real-time data stream received by the Kafka service system into a disk structure of the Kafka service system;
extracting real-time data streams in the disk structure through the ETL to a temporary intermediate layer for conversion processing, and generating real-time list data;
sending the real-time list data to a Druid database;
when a data query operation is detected, identifying the type of the data query operation, generating a corresponding operation instruction according to the identified type, and invoking the Druid engine of the Druid database according to the operation instruction to generate a real-time report;
and pushing the real-time report to a front-end platform.
2. The Flink-based real-time report generation method according to claim 1, wherein, after the step of extracting the real-time data stream in the disk structure through ETL to a temporary intermediate layer for conversion processing and generating the real-time list data, the method further comprises:
transmitting the real-time list data to a Hive data warehouse through a Flume data pipeline for storage and backup;
when a fault occurs in the Flink job, querying, by means of a Hive tool, the differences between the real-time list data backed up in the Hive data warehouse and the real-time list data generated via the Kafka service system, and acquiring target list data;
and backfilling the target list data into the Druid database.
3. The Flink-based real-time report generation method according to claim 2, wherein the step of transmitting the real-time list data to the Hive data warehouse through a Flume data pipeline for storage and backup comprises:
receiving the real-time list data and transmitting the received real-time list data via Flume to one or more transmission paths;
and storing the real-time list data transmitted by Flume into the Hive data warehouse.
4. The Flink-based real-time report generation method according to claim 3, wherein the step of extracting the real-time data stream in the disk structure through ETL to a temporary intermediate layer for conversion processing and generating the real-time list data comprises:
extracting a real-time data stream received by the Kafka service system to a temporary intermediate layer through ETL, and generating real-time list data in a standard format after cleaning, converting and integrating the real-time data in the temporary intermediate layer;
and transmitting the real-time list data to the Kafka service system.
5. The Flink-based real-time report generation method according to claim 1, wherein said step of pushing the real-time report to the front-end platform comprises:
setting a metric column, a time column, and a dimension column of the real-time report;
and pushing the real-time report to the front-end platform using a logical set formed from the metric column, the time column, and/or the dimension column.
6. The Flink-based real-time report generation method according to claim 5, wherein after the step of pushing the real-time report to the front-end platform, the method further comprises:
and when a preset control on the front-end platform receives a dynamic action such as holding, pulling, or dragging, acquiring the metric column, time column, or dimension column of the real-time report.
7. The Flink-based real-time report generation method according to claim 2, wherein after the step of pushing the real-time report to the front-end platform, the method further comprises:
when the real-time content generated by the front-end platform from the real-time report does not match the target content corresponding to the data query operation, acquiring pre-stored list data stored in a Hive data warehouse through a Hive tool, and locating the target fault position in the Druid database by comparing the pre-stored list data with the real-time list data acquired from the Druid database;
extracting target fault data at the target fault position in the Druid database and target list data at the position in the Hive data warehouse corresponding to the target fault position;
replacing the target fault data with the target list data, and backfilling the target list data to the target fault position in the Druid database;
and when the real-time content generated by the front-end platform according to the real-time report accords with the target content corresponding to the data query operation, updating the pre-stored list data stored in the hive data warehouse.
8. A real-time report generation system based on Flink is characterized by comprising the following components:
the transmission module is used for transmitting the real-time data stream to the Kafka service system when a Flink job generates the real-time data stream;
the processing module is used for writing the real-time data stream received by the Kafka service system into a disk structure of the Kafka service system, extracting the real-time data stream in the disk structure through the ETL, transferring the real-time data stream to a temporary intermediate layer for conversion processing, and generating real-time list data;
the sending module is used for sending the real-time list data to the Druid database;
the extraction module is used for identifying the type of a data query operation when the data query operation is detected, generating a corresponding operation instruction according to the identified type, and invoking the Druid engine of the Druid database according to the operation instruction to generate a real-time report;
and the pushing module is used for pushing the real-time report to the front-end platform.
9. Computer device, characterized in that it comprises a memory in which computer readable instructions are stored and a processor, which when executing said computer readable instructions implements the steps of the Flink based real-time report generation method according to any of the claims 1 to 7.
10. A computer readable storage medium, characterized in that, the computer readable storage medium has stored thereon computer readable instructions, which when executed by a processor, implement the steps of the Flink-based real-time report generation method according to any of the claims 1 to 7.
CN202210873540.5A | Priority date 2022-07-22 | Filing date 2022-07-22 | Real-time report generation method and system based on Flink | Pending | CN115168472A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202210873540.5A (CN115168472A (en)) | 2022-07-22 | 2022-07-22 | Real-time report generation method and system based on Flink

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202210873540.5A (CN115168472A (en)) | 2022-07-22 | 2022-07-22 | Real-time report generation method and system based on Flink

Publications (1)

Publication Number | Publication Date
CN115168472A | 2022-10-11

Family

ID=83496615

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN202210873540.5A | Pending | CN115168472A (en) | 2022-07-22 | 2022-07-22 | Real-time report generation method and system based on Flink

Country Status (1)

Country | Link
CN (1) | CN115168472A (en)

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination