CN116150180A - Big data processing method and device based on data base - Google Patents

Info

Publication number
CN116150180A
Authority
CN
China
Prior art keywords
data
processing
clickhouse
big data
big
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211714571.2A
Other languages
Chinese (zh)
Inventor
江智
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Lingyunguang Industrial Intelligent Technology Co Ltd
Original Assignee
Suzhou Lingyunguang Industrial Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Lingyunguang Industrial Intelligent Technology Co Ltd
Priority to CN202211714571.2A
Publication of CN116150180A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a big data processing method and device based on a data base, and belongs to the technical field of electronics. The method comprises: using OLAP, performing at least one of OGG-based processing, Kafka-based processing, Flink-based processing and Canal-based processing on raw data to generate ClickHouse data; processing the ClickHouse data with an ETL tool; and outputting the processed ClickHouse data. By using OLAP and an ETL tool to collect and organize big data, the method supports queries over industrial Internet big data (TB scale and above) with fast response, can significantly improve data collection efficiency, is suitable for any big data query scenario, and offers high universality and good query performance.

Description

Big data processing method and device based on data base
Technical Field
The application belongs to the technical field of electronics, and particularly relates to a big data processing method and device based on a data base.
Background
Online transaction processing (OLTP) handles large volumes of relatively simple transactions, such as inserting, updating and deleting data, and simple queries (e.g., checking a balance at an ATM), and allows multiple users to access the same data while ensuring data integrity. In the related art, queries are mainly served by OLTP systems. However, OLTP systems require frequent periodic backups and continuous incremental backups, which greatly reduces the data query rate, so they cannot support real-time queries over data at the TB scale and above. This degrades both query performance and users' working efficiency.
Disclosure of Invention
The present application aims to solve at least one of the technical problems existing in the prior art. To this end, the application provides a big data processing method and device based on a data base, which can support queries over industrial Internet big data (TB scale and above) with fast response.
In a first aspect, the present application provides a big data processing method based on a data base, the method including:
using OLAP, performing at least one of OGG-based processing, Kafka-based processing, Flink-based processing and Canal-based processing on the raw data to generate ClickHouse data;
processing the ClickHouse data based on an ETL tool;
outputting the processed ClickHouse data.
According to the big data processing method based on the data base, OLAP and the ETL tool are used to collect and organize big data and obtain ClickHouse data. The method supports queries over industrial Internet big data (TB scale and above) with fast response, can significantly improve data collection efficiency, is suitable for any big data query scenario, and offers high universality and good query performance.
According to one embodiment of the present application, generating ClickHouse data by using OLAP to perform at least one of OGG-based processing, Kafka-based processing, Flink-based processing and Canal-based processing on the raw data includes:
in the case where the raw data comes from an Oracle database, processing the raw data based on OGG, Kafka and Flink in sequence to obtain the ClickHouse data.
According to one embodiment of the present application, generating ClickHouse data by using OLAP to perform at least one of OGG-based processing, Kafka-based processing, Flink-based processing and Canal-based processing on the raw data includes:
in the case where the raw data comes from a MySQL database, processing the raw data based on Canal, Kafka and Flink in sequence to obtain the ClickHouse data.
According to one embodiment of the present application, processing the ClickHouse data based on the ETL tool includes:
performing at least one of collecting, cleaning and summarizing on the ClickHouse data based on the ETL tool.
According to one embodiment of the present application, outputting the processed ClickHouse data includes:
receiving a first input from a user, where the first input is used to query target data;
in response to the first input, filtering the processed ClickHouse data to obtain the target data, and outputting the target data.
According to one embodiment of the present application, outputting the processed ClickHouse data includes:
displaying the processed ClickHouse data in report form;
and/or
outputting the processed ClickHouse data in Web form.
In a second aspect, the present application provides a big data processing apparatus based on a data base, the apparatus comprising:
a first processing module, configured to use OLAP to perform at least one of OGG-based processing, Kafka-based processing, Flink-based processing and Canal-based processing on the raw data to generate ClickHouse data;
a second processing module, configured to process the ClickHouse data based on an ETL tool;
a third processing module, configured to output the processed ClickHouse data.
According to the big data processing device based on the data base, OLAP and the ETL tool are used to collect and organize big data and obtain ClickHouse data. The device supports queries over industrial Internet big data (TB scale and above) with fast response, can significantly improve data collection efficiency, is suitable for any big data query scenario, and offers high universality and good query performance.
According to one embodiment of the application, the first processing module is configured to:
in the case where the raw data comes from an Oracle database, process the raw data based on OGG, Kafka and Flink in sequence to obtain the ClickHouse data.
According to one embodiment of the application, the first processing module is configured to:
in the case where the raw data comes from a MySQL database, process the raw data based on Canal, Kafka and Flink in sequence to obtain the ClickHouse data.
According to one embodiment of the application, the second processing module is configured to:
perform at least one of collecting, cleaning and summarizing on the ClickHouse data based on the ETL tool.
According to one embodiment of the present application, the apparatus further comprises:
a first receiving module, configured to receive a first input from a user, where the first input is used to query target data;
the third processing module is configured to, in response to the first input, filter the processed ClickHouse data to obtain the target data, and output the target data.
According to an embodiment of the present application, the third processing module is configured to:
display the processed ClickHouse data in report form;
and/or
output the processed ClickHouse data in Web form.
In a third aspect, the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the data base-based big data processing method according to the first aspect described above when the computer program is executed by the processor.
In a fourth aspect, the present application provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a data base based big data processing method as described in the first aspect above.
In a fifth aspect, the present application provides a chip, the chip including a processor and a communication interface, the communication interface being coupled to the processor, the processor being configured to execute a program or instructions to implement the data base-based big data processing method according to the first aspect.
In a sixth aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements a data-base based big data processing method as described in the first aspect above.
The above technical solutions in the embodiments of the present application at least have the following technical effects:
OLAP and the ETL tool are used to collect and organize big data and obtain ClickHouse data, so that queries over industrial Internet big data (TB scale and above) are supported with fast response, data collection efficiency can be significantly improved, and the method is suitable for any big data query scenario, with high universality and good query performance.
Additional aspects and advantages of the application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, wherein:
FIG. 1 is a schematic flow chart of a big data processing method based on a data base according to an embodiment of the present application;
FIG. 2 is a second schematic flow chart of a big data processing method based on a data base according to an embodiment of the present application;
FIG. 3 is a first schematic diagram of the results of a big data processing method based on a data base according to an embodiment of the present application;
FIG. 4 is a second schematic diagram of the results of a big data processing method based on a data base according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a big data processing device based on a data base according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 7 is a schematic hardware diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Technical solutions in the embodiments of the present application will be clearly described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application are within the scope of the protection of the present application.
The terms first, second and the like in the description and in the claims, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged, as appropriate, such that embodiments of the present application may be implemented in sequences other than those illustrated or described herein, and that the objects identified by "first," "second," etc. are generally of a type and not limited to the number of objects, e.g., the first object may be one or more. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/", generally means that the associated object is an "or" relationship.
The data base-based big data processing method, the data base-based big data processing device, the electronic equipment and the readable storage medium provided by the embodiment of the application are described in detail below by means of specific embodiments and application scenes with reference to the accompanying drawings.
The big data processing method based on the data base can be applied to the terminal, and can be executed by hardware or software in the terminal.
The execution main body of the big data processing method based on the data base may be an electronic device or a functional module or a functional entity in the electronic device capable of implementing the big data processing method based on the data base, and the electronic device in the embodiment of the present application includes, but is not limited to, a mobile phone, a tablet computer, a camera, a wearable device, and the like.
As shown in fig. 1, the big data processing method based on the data base includes: step 110, step 120 and step 130.
Step 110: using OLAP, perform at least one of OGG-based processing, Kafka-based processing, Flink-based processing and Canal-based processing on the raw data to generate ClickHouse data.
In this step, online analytical processing (OLAP) is an approach that uses multidimensional databases to support rapid reporting, enabling analysts to view information quickly, consistently and interactively from multiple perspectives in order to understand the data in depth.
OLAP is characterized by Fast Analysis of Shared Multidimensional Information (FASMI), and can support business intelligence (BI), data mining and other decision-support applications.
Here, F stands for Fast: the system can respond to most of a user's analysis requests within a few seconds;
A stands for Analysis: the user can define new ad hoc calculations without programming, include them as part of the analysis, and report the results in the desired form;
M stands for Multidimensional: the system provides multidimensional views and analysis of the data;
I stands for Information: information can be obtained in a timely manner, and large volumes of information can be managed.
OGG (Oracle GoldenGate) is a tool for data replication in heterogeneous data environments. It can parse the redo log of the source Oracle database directly, so that the incremental part of the data can be migrated without major changes to the original table structure.
Kafka is a distributed publish-subscribe messaging system characterized by high throughput and low latency.
Flink is an open-source distributed stream processing framework built for accurate stream processing applications; it is distributed, high-performance and available at all times.
Canal is used for data synchronization; it provides incremental data subscription and consumption based on parsing the incremental log of a MySQL database.
ClickHouse data is data stored in a ClickHouse database. ClickHouse is an open-source column-oriented OLAP database that can respond to large-volume queries within milliseconds: an aggregation query over 700 million rows returns results within 5 seconds, and a single-record query over 2 billion rows responds within seconds.
In this step, ClickHouse data is generated by performing at least one of OGG-based processing, Kafka-based processing, Flink-based processing and Canal-based processing on the raw data.
It should be noted that, in some embodiments, step 110 may include: based on the original data of different database sources, different data acquisition modes are adopted.
Wherein, different databases include an Oracle database and a MySQL database.
The implementation of step 110 is described below for each of these two cases.
As shown in fig. 2, in some embodiments, step 110 may include:
In the case where the raw data comes from an Oracle database, the raw data is processed by OGG, Kafka and Flink in sequence to obtain the ClickHouse data.
In this embodiment, Oracle is an efficient and reliable database solution suited to high throughput.
The Oracle database is a relational database management system; it is portable, easy to use and powerful, and is suitable for large, medium, small and microcomputer environments.
In actual execution, with continued reference to FIG. 2, the data captured by OGG may be sent to a Kafka queue, handed to Flink for stream processing, and written to ClickHouse in batches to generate the ClickHouse data.
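The batch-update step above can be illustrated with a minimal sketch. It is not the applicant's implementation and uses no Flink or ClickHouse API: `MicroBatcher`, `batch_size` and the `flush` callback are hypothetical names, and the callback stands in for whatever writes a batch to ClickHouse.

```python
from typing import Callable


class MicroBatcher:
    """Buffers change records and hands them to a writer in batches.

    `flush` is a hypothetical callback standing in for the batched
    write to ClickHouse; nothing here calls a real database.
    """

    def __init__(self, flush: Callable[[list], None], batch_size: int = 1000):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._flush = flush
        self._batch_size = batch_size
        self._buffer: list = []

    def add(self, record: dict) -> None:
        """Queue one record; write the batch once it is full."""
        self._buffer.append(record)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Write whatever is buffered, e.g. at shutdown or on a timer."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._flush(batch)
```

Writing in batches rather than row by row is the usual reason for such a buffer in front of a column-oriented store; the batch size of 1000 is an arbitrary placeholder.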
With continued reference to fig. 2, in some embodiments, step 110 may further include:
In the case where the raw data comes from a MySQL database, the raw data is processed by Canal, Kafka and Flink in sequence to obtain the ClickHouse data.
In this embodiment, continuing with the above example, for a MySQL database, Canal may be used to listen to the MySQL binlog and convert it into executable SQL that is inserted into ClickHouse to generate the ClickHouse data.
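As a rough illustration of turning a captured change event into executable SQL, the sketch below renders INSERT statements from a dictionary. The event layout (`type`, `database`, `table`, `data`) is an assumption loosely modeled on a JSON change message; the patent does not specify the format, and the function names are hypothetical.

```python
def sql_literal(value) -> str:
    """Render a Python value as a SQL literal; strings are quoted and escaped."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return str(value)
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{text}'"


def event_to_inserts(event: dict) -> list:
    """Turn one INSERT change event into INSERT statements, one per row.

    Assumed event shape (hypothetical):
    {"type": "INSERT", "database": ..., "table": ..., "data": [row, ...]}
    """
    if event.get("type") != "INSERT":
        return []  # updates and deletes need their own handling; omitted here
    table = f'{event["database"]}.{event["table"]}'
    statements = []
    for row in event.get("data") or []:
        columns = ", ".join(row)
        values = ", ".join(sql_literal(v) for v in row.values())
        statements.append(f"INSERT INTO {table} ({columns}) VALUES ({values})")
    return statements
```

Table and column names are taken from the event as-is, so the sketch assumes a trusted change stream; a production pipeline would more likely use parameterized or native batch inserts than string rendering.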
It can be understood that, in the present application, the open-source tool Canal monitors the MySQL database by listening to the MySQL binlog and converts the binlog into executable SQL that is inserted into ClickHouse to generate the ClickHouse data. This effectively overcomes the limitations of the MySQL database, broadens the application scenarios, and provides high universality.
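Taken together, the two cases of step 110 amount to choosing a capture path according to the source database. A minimal sketch of that routing follows; the stage names are plain labels from the text rather than API calls, and `PIPELINES` and `select_pipeline` are hypothetical names.

```python
# Stage labels follow the two cases described in the text.
PIPELINES = {
    "oracle": ["OGG", "Kafka", "Flink", "ClickHouse"],
    "mysql": ["Canal", "Kafka", "Flink", "ClickHouse"],
}


def select_pipeline(source_db: str) -> list:
    """Return the ordered processing stages for a given source database."""
    key = source_db.strip().lower()
    if key not in PIPELINES:
        raise ValueError(f"unsupported source database: {source_db!r}")
    return list(PIPELINES[key])  # copy so callers cannot mutate the table
```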
Step 120: process the ClickHouse data based on the ETL tool.
In this step, ETL (Extract-Transform-Load) is the process of extracting, transforming and loading data. ETL tools are tools for data extraction, transformation and loading.
The ETL tools may include Kettle, DataPipeline, Talend, Informatica PowerCenter and DataX, among others.
Kettle is an open-source tool that provides a rich set of components and can carry out complex computation flows. It runs on Windows, Linux and Unix, and extracts data efficiently and stably.
Kettle may further include Spoon, Pan, Chef and Kitchen.
Spoon supports designing ETL transformations through a graphical interface.
Pan supports batch runs of ETL transformations designed in Spoon (e.g., using a time scheduler).
Pan is a program that runs in the background without a graphical interface.
Chef supports creating jobs.
Jobs help automate the complex work of updating a data warehouse by combining individual transformations, jobs, scripts and so on.
Kitchen supports batch runs of jobs designed in Chef (e.g., using a time scheduler).
In actual execution, with continued reference to FIG. 2, the multi-dimensional summary statistics may be handed to the ETL tool Kettle, which processes the ClickHouse data to obtain the processed ClickHouse data.
Of course, in other embodiments, other ETL tools may be used to summarize the multi-dimensional data, and the tool may be selected based on actual requirements, which is not limited in this application.
In some embodiments, step 120 may further comprise: performing at least one of collecting, cleaning and summarizing on the ClickHouse data based on the ETL tool.
In this embodiment, the ETL tool may be used to collect, clean and summarize data from the various data sources.
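The collect, clean and summarize operations can be sketched in plain Python as follows. This only illustrates the three operations and does not show how Kettle or any other ETL tool works; the field names `line` and `qty` are hypothetical.

```python
from collections import defaultdict


def collect(*sources):
    """Collect: merge rows from several sources into one list."""
    return [row for source in sources for row in source]


def clean(rows):
    """Clean: drop rows with a missing key or non-numeric quantity; normalize text."""
    cleaned = []
    for row in rows:
        line = (row.get("line") or "").strip().upper()
        try:
            qty = float(row.get("qty"))
        except (TypeError, ValueError):
            continue  # quantity is missing or not a number
        if line:
            cleaned.append({"line": line, "qty": qty})
    return cleaned


def summarize(rows):
    """Summarize: total quantity per production line."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["line"]] += row["qty"]
    return dict(totals)
```

A typical call chains the three, for example `summarize(clean(collect(source_a, source_b)))`.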
In this application, ClickHouse, which implements OLAP technology, is used as the persistence layer. Thanks to ClickHouse's columnar storage and vectorized execution engine, a faster query rate can be achieved for data at the TB scale and above.
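To see why columnar storage favors analytical queries, consider the toy sketch below: once rows are pivoted into per-column lists, an aggregate over one field reads only that field. This is a conceptual illustration, not ClickHouse's storage format; `to_columns` is a hypothetical helper and assumes every row has the same keys.

```python
def to_columns(rows):
    """Pivot a list of row dictionaries into a dictionary of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


rows = [
    {"line": "A1", "qty": 5, "operator": "w01"},
    {"line": "B2", "qty": 7, "operator": "w02"},
]
columns = to_columns(rows)

# Row layout: every whole row is visited just to read one field.
total_from_rows = sum(row["qty"] for row in rows)
# Column layout: only the "qty" column is touched.
total_from_columns = sum(columns["qty"])
```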
Step 130: output the processed ClickHouse data.
In this step, the presentation layer queries ClickHouse, which meets the requirements on query complexity and response time.
FIG. 3 and FIG. 4 illustrate two example queries, where FIG. 3 shows a simple query and FIG. 4 shows a complex query.
Through repeated experiments, the inventor found that with ClickHouse, large-volume queries can be answered within milliseconds: an aggregation query over 700 million rows returns results within 5 seconds, and a single-record query over 2 billion rows responds within seconds, which significantly improves the query rate for big data.
In some embodiments, step 130 may include:
receiving a first input from a user, where the first input is used to query target data;
in response to the first input, filtering the target data from the processed ClickHouse data, and outputting the target data.
In this embodiment, the first input is used to query the target data.
Wherein the first input may be at least one of:
first, the first input may be a touch operation including, but not limited to, a click operation, a slide operation, a press operation, and the like.
In this embodiment, the receiving the first input of the user may be receiving a touch operation of the user in a display area of the terminal display screen.
To reduce the rate of accidental operations, the active area of the first input may be restricted to a specific region, such as the upper middle area of the user input interface; alternatively, while input information is displayed, a target control for generating a target data service interface may be shown on the current interface, and touching the target control performs the first input; alternatively, the first input may be defined as multiple consecutive taps on the display area within a target time interval.
Second, the first input may be a physical key input.
In this embodiment, the terminal is provided with physical keys (for example, on a mouse or keyboard), and receiving the first input of the user may mean receiving the user's press of the corresponding physical key; the first input may also be a combined operation of pressing several physical keys at the same time.
Third, the first input may be a voice input.
In this embodiment, the terminal may trigger a corresponding operation upon receiving a voice such as "query XX".
Of course, in other embodiments, the first input may also be in other forms, including but not limited to character input, etc., which may be specifically determined according to actual needs, which is not limited in this embodiment of the present application.
In actual execution, the first input may be a query instruction entered by the user. In response to the first input, the terminal runs the corresponding query against the ClickHouse database and outputs the target data required by the user, with short response time and a high query rate.
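One way to turn a user's query input into a filter over the processed data is to build a parameterized SELECT. The sketch below is an assumption-laden illustration: the column whitelist, the table name in the test, and the `%(name)s` placeholder style are all hypothetical and would need to match whichever ClickHouse client is actually used.

```python
ALLOWED_COLUMNS = {"line", "shift", "day"}  # hypothetical filterable columns


def build_query(table: str, filters: dict):
    """Build a SELECT with named placeholders from user-supplied filters.

    Returns (sql, params). Values never enter the SQL text; only
    whitelisted column names do. `table` is assumed to be trusted.
    """
    unknown = set(filters) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"cannot filter on: {sorted(unknown)}")
    clauses = [f"{col} = %({col})s" for col in sorted(filters)]
    sql = f"SELECT * FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, dict(filters)
```

Keeping values out of the SQL string and restricting column names to a whitelist is a common precaution when the filter comes directly from user input.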
In some embodiments, step 130 may include:
displaying the processed ClickHouse data in report form;
and/or
outputting the processed ClickHouse data in Web form.
In this embodiment, with continued reference to FIG. 2, in actual execution the queried data may be output as a report, on the Web, or in similar forms.
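The two output forms can be sketched with the standard library alone: a fixed-width text report and an HTML table for a Web page. The function names and the row layout are hypothetical, and the patent does not prescribe any particular rendering.

```python
from html import escape


def to_text_report(rows, columns):
    """Render rows as a fixed-width plain-text report."""
    widths = [max([len(c)] + [len(str(r[c])) for r in rows]) for c in columns]

    def fmt(values):
        cells = (str(v).ljust(w) for v, w in zip(values, widths))
        return " | ".join(cells).rstrip()

    lines = [fmt(columns), "-+-".join("-" * w for w in widths)]
    lines += [fmt([r[c] for c in columns]) for r in rows]
    return "\n".join(lines)


def to_html_table(rows, columns):
    """Render rows as an HTML table, escaping every cell value."""
    head = "".join(f"<th>{escape(c)}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(r[c]))}</td>" for c in columns) + "</tr>"
        for r in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"
```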
Of course, in other embodiments, the output may also be represented by a voice output or any other output form that can be implemented, and specifically may be set based on actual needs, which is not limited in this application.
During research and development, the inventor found that real-time queries are mainly supported by online transaction processing (OLTP), which is designed to handle large numbers of relatively simple transactions, such as inserting, updating and deleting data, and simple queries (e.g., checking a balance at an ATM).
OLTP allows multiple users to access the same data while ensuring data integrity. An OLTP system relies on concurrency control algorithms to ensure that no two users can change the same data at the same time and that all transactions are performed in the correct order. This prevents, for example, two users from booking the same room through an online booking system, and protects the holders of a joint bank account from accidental overdraft.
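The room-booking example can be made concrete with a small sketch in which a lock serializes the check-and-write step, so that only one of several concurrent requests succeeds. This is a single-process illustration of the idea only; real OLTP systems use database-level concurrency control, and `RoomBookings` is a hypothetical class.

```python
import threading


class RoomBookings:
    """Keeps at most one booking per room, even under concurrent requests."""

    def __init__(self):
        self._lock = threading.Lock()
        self._booked = {}

    def book(self, room: str, guest) -> bool:
        """Return True if the booking succeeded, False if the room was taken."""
        # The check and the write happen under one lock, so two callers
        # can never both see the room as free.
        with self._lock:
            if room in self._booked:
                return False
            self._booked[room] = guest
            return True
```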
OLTP emphasizes very fast processing, with response times in milliseconds. Specifically, the effectiveness of an OLTP system is measured by the total number of transactions that can be performed per second.
OLTP may also be used to provide index data sets for quick searching, retrieval, and querying.
In addition, OLTP systems handle large numbers of concurrent transactions, and a complete data backup must be available at any time, because any data loss or downtime can have serious and costly consequences. As a result, OLTP systems require frequent periodic backups and continuous incremental backups, which greatly reduces the data query rate and makes them unsuitable for real-time queries over data at the TB scale and above.
During research and development, the inventor also found that data volumes in the industrial Internet can reach the TB or even PB scale; the data are essentially never updated or deleted; the main applications are statistical reports, data analysis and production optimization; most deployments have only a few to a few dozen users, so concurrency is low; and there are many data sources, numerous forms of data persistence, and large differences in data structure.
In the present application, OLAP and the ETL tool are used to collect and organize big data and obtain ClickHouse data. ClickHouse supports queries over data at the TB and even PB scale with fast response. The ETL tool collects data from each data source, parses the heterogeneous data into homogeneous data, and consolidates it into ClickHouse views that are provided to the application layer. The method can therefore support queries over industrial Internet big data (TB scale and above) with fast response, can significantly improve data collection efficiency, is suitable for any big data query scenario, and offers high universality and good query performance. This solves the technical problem that conventional OLTP databases cannot query large volumes of data efficiently.
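Parsing heterogeneous data into homogeneous data can be pictured as mapping each source's field names onto one shared schema. The sketch below assumes two hypothetical sources, `mes` and `scada`, with made-up field names; none of these come from the patent.

```python
# Hypothetical per-source mappings from native field names to a shared schema.
FIELD_MAPS = {
    "mes": {"LINE_NO": "line", "QTY": "qty"},
    "scada": {"lineId": "line", "count": "qty"},
}


def normalize(source: str, record: dict) -> dict:
    """Map one source-specific record onto the shared schema."""
    mapping = FIELD_MAPS[source]
    missing = [name for name in mapping if name not in record]
    if missing:
        raise KeyError(f"{source} record is missing fields: {missing}")
    unified = {target: record[name] for name, target in mapping.items()}
    unified["source"] = source  # keep provenance for later auditing
    return unified
```

Once every record has the same keys, records from different sources can be loaded into one table or view and queried together.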
In addition, the method can also support distributed deployment and various table engines, has wide application scenes and is easy to realize.
According to the big data processing method based on the data base, OLAP and the ETL tool are used to collect and organize big data and obtain ClickHouse data. The method supports queries over industrial Internet big data (TB scale and above) with fast response, can significantly improve the efficiency of collecting and querying data, is suitable for any big data query scenario, and offers high universality and good query performance.
According to the big data processing method based on the data base, the execution main body can be a big data processing device based on the data base. In the embodiment of the present application, a big data processing device based on a data base is taken as an example to execute a big data processing method based on a data base.
The embodiment of the application also provides a big data processing device based on the data base.
As shown in fig. 5, the big data processing apparatus based on the data base includes: a first processing module 510, a second processing module 520, and a third processing module 530.
a first processing module 510, configured to perform at least one of OGG-based processing, Kafka-based processing, Flink-based processing, and Canal-based processing on the original data by using OLAP to generate ClickHouse data;
a second processing module 520, configured to process the ClickHouse data based on the ETL tool;
a third processing module 530, configured to output the processed ClickHouse data.
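The division into three modules can be sketched as three callables chained in order. This is a hedged illustration only: the class name `BigDataProcessingDevice` and the stand-in functions `generate`, `process`, and `output` are hypothetical, and each stand-in does trivial in-memory work rather than the processing performed by modules 510, 520, and 530.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

Row = dict


@dataclass
class BigDataProcessingDevice:
    """Chains three modules in the order 510 -> 520 -> 530."""
    first_module: Callable[[Iterable[Row]], List[Row]]
    second_module: Callable[[List[Row]], List[Row]]
    third_module: Callable[[List[Row]], str]

    def run(self, raw_rows: Iterable[Row]) -> str:
        generated = self.first_module(raw_rows)
        processed = self.second_module(generated)
        return self.third_module(processed)


def generate(raw_rows):
    # Stand-in for module 510: mark each row as loaded.
    return [dict(row, loaded=True) for row in raw_rows]


def process(rows):
    # Stand-in for module 520: drop rows without a measured value.
    return [row for row in rows if row.get("value") is not None]


def output(rows):
    # Stand-in for module 530: one "id=value" line per row.
    return "\n".join(f"{row['id']}={row['value']}" for row in rows)


device = BigDataProcessingDevice(generate, process, output)
result = device.run([{"id": "a", "value": 1}, {"id": "b", "value": None}])
```

Keeping the modules as separate callables means any one of them can be replaced without touching the other two.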
According to the big data processing device based on the data base provided by the embodiments of the present application, OLAP and the ETL tool are used to collect and organize big data into ClickHouse data, so that queries over industrial Internet big data (TB level and above) are supported with fast response, data collection efficiency is significantly improved, and the device is suitable for any big data query scenario, with high universality and good query performance.
In some embodiments, the first processing module 510 may also be configured to:
when the original data comes from an Oracle database, process the original data with OGG, Kafka, and Flink in sequence to obtain the ClickHouse data.
In some embodiments, the first processing module 510 may also be configured to:
when the original data comes from a MySQL database, process the original data with Canal, Kafka, and Flink in sequence to obtain the ClickHouse data.
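The source-dependent choice of processing chain described above can be sketched as a lookup followed by an ordered pass through the stages. The sketch is an assumption-laden illustration: `PIPELINES`, `select_pipeline`, and `run_pipeline` are hypothetical names, and each stage is represented only by its label, not by a call into OGG, Canal, Kafka, or Flink.

```python
# Stage order per source type, as stated in the description:
# Oracle -> OGG, Kafka, Flink; MySQL -> Canal, Kafka, Flink.
PIPELINES = {
    "oracle": ("OGG", "Kafka", "Flink"),
    "mysql": ("Canal", "Kafka", "Flink"),
}


def select_pipeline(source_type):
    """Return the ordered stage names for a given source database type."""
    key = source_type.strip().lower()
    if key not in PIPELINES:
        raise ValueError(f"no pipeline configured for source: {source_type!r}")
    return PIPELINES[key]


def run_pipeline(record, source_type):
    """Pass the record through each stage in order, recording the path taken."""
    result = dict(record, stages=[])
    for stage in select_pipeline(source_type):
        # A real stage would capture, buffer, or transform the change event;
        # this sketch only records that the stage was visited.
        result["stages"].append(stage)
    return result


oracle_row = run_pipeline({"id": 1}, "Oracle")
mysql_row = run_pipeline({"id": 2}, "MySQL")
```

Only the first stage differs between the two sources; Kafka and Flink are shared, so the downstream part of the chain can be common to both.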
In some embodiments, the second processing module 520 may also be configured to:
perform at least one of collecting, cleaning, and summarizing on the ClickHouse data based on the ETL tool.
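As a minimal sketch of what collecting, cleaning, and summarizing might look like, the following Python functions operate on in-memory rows. The row fields (`line`, `ts`, `count`) and the cleaning rules (dropping rows with a missing count and repeated rows) are assumptions chosen for illustration; the application does not specify them.

```python
def collect(batches):
    """Collect: merge batches from several sources into one list of rows."""
    return [row for batch in batches for row in batch]


def clean(rows):
    """Clean: drop rows with a missing count and repeated (line, ts) rows."""
    seen = set()
    cleaned = []
    for row in rows:
        if row.get("count") is None:
            continue
        key = (row["line"], row["ts"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


def summarize(rows):
    """Summarize: total count per production line."""
    totals = {}
    for row in rows:
        totals[row["line"]] = totals.get(row["line"], 0) + row["count"]
    return totals


# Hypothetical batches; the second L1 row in the first batch is a duplicate.
batches = [
    [{"line": "L1", "ts": 1, "count": 5}, {"line": "L1", "ts": 1, "count": 5}],
    [{"line": "L1", "ts": 2, "count": 3}, {"line": "L2", "ts": 1, "count": None}],
    [{"line": "L2", "ts": 2, "count": 4}],
]
totals = summarize(clean(collect(batches)))
```

Because each step takes and returns plain rows, any one of them can be applied on its own, which matches the "at least one of" wording.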
In some embodiments, the apparatus may further comprise:
a first receiving module, configured to receive a first input from a user, the first input being used to query target data;
the third processing module 530 may be further configured to, in response to the first input, filter the target data out of the processed ClickHouse data and output the target data.
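Filtering target data in response to a user query can be sketched as an equality filter over the processed rows. The function name `query_target`, the keyword-argument form of the first input, and the sample rows are hypothetical; an actual deployment would more likely express the filter as a query against ClickHouse.

```python
def query_target(rows, **criteria):
    """Return the rows whose fields equal every value given in the query."""
    return [
        row for row in rows
        if all(row.get(field) == value for field, value in criteria.items())
    ]


# Hypothetical processed rows.
processed = [
    {"device_id": "A1", "status": "ok", "temperature": 21.0},
    {"device_id": "B7", "status": "alarm", "temperature": 95.5},
    {"device_id": "C3", "status": "alarm", "temperature": 88.0},
]

# The "first input" is modeled here as keyword criteria.
target = query_target(processed, status="alarm")
```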
In some embodiments, the third processing module 530 may also be configured to:
display the processed ClickHouse data in report form;
and/or
output the processed ClickHouse data in Web form.
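The two output forms can be sketched as two renderers over the same rows: a fixed-width text table for a report and an HTML table for a Web page. The function names `render_report` and `render_web` and the sample rows are hypothetical, and the layouts are only one of many possible choices.

```python
from html import escape


def render_report(rows, columns):
    """Render rows as a fixed-width text table (report form)."""
    widths = [
        max([len(col)] + [len(str(row[col])) for row in rows])
        for col in columns
    ]
    header = "  ".join(col.ljust(w) for col, w in zip(columns, widths))
    lines = [header, "  ".join("-" * w for w in widths)]
    for row in rows:
        lines.append(
            "  ".join(str(row[col]).ljust(w) for col, w in zip(columns, widths))
        )
    return "\n".join(lines)


def render_web(rows, columns):
    """Render rows as an HTML table (Web form), escaping cell contents."""
    head = "".join(f"<th>{escape(col)}</th>" for col in columns)
    body = "".join(
        "<tr>"
        + "".join(f"<td>{escape(str(row[col]))}</td>" for col in columns)
        + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


rows = [{"device": "A1", "avg_temp": 21.0}, {"device": "B<7>", "avg_temp": 30.0}]
report = render_report(rows, ["device", "avg_temp"])
page = render_web(rows, ["device", "avg_temp"])
```

Escaping cell contents in the Web renderer keeps stored values from being interpreted as markup.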
The big data processing device based on the data base in the embodiments of the present application may be an electronic device, or may be a component in an electronic device, such as an integrated circuit or a chip. The electronic device may be a terminal, or may be a device other than a terminal. By way of example, the electronic device may be a mobile phone, tablet computer, notebook computer, palmtop computer, vehicle-mounted electronic device, mobile Internet device (MID), augmented reality (AR)/virtual reality (VR) device, robot, wearable device, ultra-mobile personal computer (UMPC), netbook, or personal digital assistant (PDA), and may also be a server, network attached storage (NAS), personal computer (PC), television (TV), teller machine, or self-service machine, which is not specifically limited in the embodiments of the present application.
The big data processing device based on the data base in the embodiments of the present application may be a device with an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.
The big data processing device based on the data base provided in the embodiment of the present application can implement each process implemented by the method embodiments of fig. 1 to 3, and in order to avoid repetition, a detailed description is omitted here.
In some embodiments, as shown in fig. 6, the embodiment of the present application further provides an electronic device 600, including a processor 601, a memory 602, and a computer program stored in the memory 602 and executable on the processor 601. When executed by the processor 601, the program implements each process of the foregoing embodiment of the big data processing method based on the data base and achieves the same technical effects; to avoid repetition, details are not repeated here.
The electronic device in the embodiment of the application includes the mobile electronic device and the non-mobile electronic device described above.
Fig. 7 is a schematic hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 700 includes, but is not limited to: radio frequency unit 701, network module 702, audio output unit 703, input unit 704, sensor 705, display unit 706, user input unit 707, interface unit 708, memory 709, and processor 710.
Those skilled in the art will appreciate that the electronic device 700 may also include a power source (e.g., a battery) for powering the various components. The power source may be logically connected to the processor 710 via a power management system, so that functions such as charge management, discharge management, and power consumption management are performed via the power management system. The electronic device structure shown in fig. 7 does not constitute a limitation of the electronic device; the electronic device may include more or fewer components than shown, may combine certain components, or may have a different arrangement of components, which is not described in detail herein.
Wherein the processor 710 is configured to:
perform at least one of OGG-based processing, Kafka-based processing, Flink-based processing, and Canal-based processing on the original data by using OLAP to generate ClickHouse data;
process the ClickHouse data based on the ETL tool;
output the processed ClickHouse data.
According to the electronic device provided by the embodiments of the present application, OLAP and the ETL tool are used to collect and organize big data into ClickHouse data, so that queries over industrial Internet big data (TB level and above) are supported with fast response, data collection efficiency is significantly improved, and the electronic device is suitable for any big data query scenario, with high universality and good query performance.
In some embodiments, processor 710 may also be configured to:
when the original data comes from an Oracle database, process the original data with OGG, Kafka, and Flink in sequence to obtain the ClickHouse data.
In some embodiments, processor 710 may also be configured to: when the original data comes from a MySQL database, process the original data with Canal, Kafka, and Flink in sequence to obtain the ClickHouse data.
In some embodiments, processor 710 may also be configured to:
perform at least one of collecting, cleaning, and summarizing on the ClickHouse data based on the ETL tool.
In some embodiments:
the user input unit 707 may be configured to receive a first input from a user, the first input being used to query target data;
the processor 710 may also be configured to, in response to the first input, filter the target data out of the processed ClickHouse data and output the target data.
In some embodiments, processor 710 may also be configured to:
display the processed ClickHouse data in report form;
and/or
output the processed ClickHouse data in Web form.
It should be appreciated that in embodiments of the present application, the input unit 704 may include a graphics processing unit (GPU) 7041 and a microphone 7042, with the graphics processor 7041 processing image data of still pictures or video obtained by an image capturing device (e.g., a camera) in a video capturing mode or an image capturing mode. The display unit 706 may include a display panel 7061, and the display panel 7061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 707 includes at least one of a touch panel 7071 and other input devices 7072. The touch panel 7071 is also referred to as a touch screen. The touch panel 7071 may include two parts, a touch detection device and a touch controller. Other input devices 7072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail herein.
The memory 709 may be used to store software programs as well as various data. The memory 709 may mainly include a first storage area storing programs or instructions and a second storage area storing data, wherein the first storage area may store an operating system, application programs or instructions required for at least one function (such as a sound playing function or an image playing function), and the like. Further, the memory 709 may include volatile memory or nonvolatile memory, or the memory 709 may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), or direct Rambus RAM (DRRAM). Memory 709 in embodiments of the present application includes, but is not limited to, these and any other suitable types of memory.
Processor 710 may include one or more processing units; the processor 710 integrates an application processor that primarily processes operations involving an operating system, user interface, application programs, etc., and a modem processor that primarily processes wireless communication signals, such as a baseband processor. It will be appreciated that the modem processor described above may not be integrated into the processor 710.
The embodiment of the present application further provides a non-transitory computer readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements each process of the foregoing embodiment of the big data processing method based on the data base, and can achieve the same technical effect, so that repetition is avoided, and details are not repeated here.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The embodiment of the application also provides a computer program product, which comprises a computer program, wherein the computer program is executed by a processor to realize the big data processing method based on the data base.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The embodiment of the application further provides a chip, which includes a processor configured to implement each process of the foregoing embodiment of the big data processing method based on the data base and can achieve the same technical effects; to avoid repetition, details are not repeated here.
It should be understood that the chips referred to in the embodiments of the present application may also be referred to as system-level chips, system chips, chip systems, or system-on-chip (SoC) chips.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed, but may also include performing the functions in a substantially simultaneous manner or in an opposite order depending on the functions involved, e.g., the described methods may be performed in an order different from that described, and various steps may also be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solutions of the present application may be embodied essentially or in a part contributing to the prior art in the form of a computer software product stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk), comprising several instructions for causing a terminal (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the methods described in the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described specific embodiments, which are merely illustrative and not restrictive. Guided by the present application, those of ordinary skill in the art can devise many further forms without departing from the spirit of the application and the scope of the claims.
In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "illustrative embodiments," "an example," "a particular example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In the present specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present application have been shown and described, it will be understood by those of ordinary skill in the art that numerous variations, modifications, substitutions and changes may be made to the embodiments without departing from the principles and spirit of the application, the scope of which is defined in the claims and their equivalents.

Claims (10)

1. A big data processing method based on a data base, comprising:
performing at least one of OGG-based processing, Kafka-based processing, Flink-based processing and Canal-based processing on the original data by using OLAP to generate ClickHouse data;
processing the ClickHouse data based on the ETL tool;
outputting the processed ClickHouse data.
2. The big data processing method based on the data base according to claim 1, wherein the generating ClickHouse data by performing at least one of OGG-based processing, Kafka-based processing, Flink-based processing, and Canal-based processing on the original data using OLAP includes:
under the condition that the original data comes from an Oracle database, processing the original data based on OGG, Kafka and Flink in sequence, and acquiring the ClickHouse data.
3. The big data processing method based on the data base according to claim 1, wherein the generating ClickHouse data by performing at least one of OGG-based processing, Kafka-based processing, Flink-based processing, and Canal-based processing on the original data using OLAP includes:
under the condition that the original data comes from a MySQL database, processing the original data based on Canal, Kafka and Flink in sequence, and acquiring the ClickHouse data.
4. A data base based big data processing method according to any of the claims 1-3, wherein said processing the ClickHouse data based on the ETL tool comprises:
performing at least one of collecting, cleaning, and summarizing on the ClickHouse data based on the ETL tool.
5. A data base based big data processing method according to any of the claims 1-3, characterized in that said outputting the processed ClickHouse data comprises:
receiving a first input of a user, wherein the first input is used for querying target data;
in response to the first input, filtering the processed ClickHouse data to obtain the target data, and outputting the target data.
6. A data base based big data processing method according to any of the claims 1-3, characterized in that said outputting the processed ClickHouse data comprises:
displaying the processed ClickHouse data in report form;
and/or
outputting the processed ClickHouse data in Web form.
7. A big data processing apparatus based on a data base, comprising:
a first processing module, configured to perform at least one of OGG-based processing, Kafka-based processing, Flink-based processing and Canal-based processing on the original data by using OLAP to generate ClickHouse data;
a second processing module, configured to process the ClickHouse data based on an ETL tool;
a third processing module, configured to output the processed ClickHouse data.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the data base based big data processing method according to any of claims 1-6 when executing the program.
9. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the data base based big data processing method according to any of claims 1-6.
10. A computer program product comprising a computer program which, when executed by a processor, implements a data base based big data processing method as claimed in any of claims 1-6.
CN202211714571.2A 2022-12-29 2022-12-29 Big data processing method and device based on data base Pending CN116150180A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211714571.2A CN116150180A (en) 2022-12-29 2022-12-29 Big data processing method and device based on data base

Publications (1)

Publication Number Publication Date
CN116150180A true CN116150180A (en) 2023-05-23

Family

ID=86357531

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211714571.2A Pending CN116150180A (en) 2022-12-29 2022-12-29 Big data processing method and device based on data base

Country Status (1)

Country Link
CN (1) CN116150180A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination