CN111522918A

CN111522918A - Data aggregation method and device, electronic equipment and computer readable storage medium

Info

Publication number: CN111522918A
Application number: CN202010331096.5A
Authority: CN
Inventors: �田�浩; 韩东
Original assignee: Tianjin Yiweike Information Technology Co ltd
Current assignee: Tianjin Yiweike Information Technology Co ltd
Priority date: 2020-04-24
Filing date: 2020-04-24
Publication date: 2020-08-11

Abstract

The application provides a data aggregation method, a data aggregation device, electronic equipment and a computer-readable storage medium, applied to the technical field of data warehouses. When a change in a data source is monitored, first target data are extracted according to a preset rule and the data in a dimension table stored by a full-text search engine are updated. In other words, only the changed data are aggregated, which avoids aggregating the data as a whole and reduces the amount of data processed. The data can therefore be aggregated in real time and the latest data obtained more efficiently, so that report generation or BI analysis can be performed on the latest data and the information lag of report generation or BI analysis is avoided.

Description

Data aggregation method and device, electronic equipment and computer readable storage medium

Technical Field

The present application relates to the field of data warehouse technologies, and in particular, to a data aggregation method, an apparatus, an electronic device, and a computer-readable storage medium.

Background

As enterprise traffic grows, it has become difficult for a single database to store the resulting explosion of data, so enterprises typically store their vast amounts of business data in distributed database clusters. To meet service requirements, service managers often need to aggregate data scattered across different databases and then produce or analyze reports based on the aggregated data.

With the current data aggregation method, service managers can either generate reports or perform BI analysis based on historically stored data, or first aggregate the data of each database as a whole and then generate reports or perform BI analysis. Both options have drawbacks. Reports or BI analysis based on historically stored data cannot reflect the latest data, so the analysis lags and is less accurate. Aggregating the data of each database as a whole before generating reports or performing BI analysis is time-consuming and therefore inefficient.

Disclosure of Invention

The application provides a data aggregation method, a data aggregation device, electronic equipment and a computer-readable storage medium, which can aggregate changed data in real time, improve the efficiency of acquiring the latest data, and avoid the information lag of analysis based on historical data. The technical scheme adopted by the application is as follows:

In a first aspect, a data aggregation method is provided, the method comprising:

when monitoring that the data source of the target database changes, extracting first target data according to a preset rule;

and updating the data in the dimension table stored by the full-text search engine based on the extracted first target data.

Specifically, when it is monitored that a data source of the target database changes, extracting first target data according to a predetermined rule includes:

determining a dimension model corresponding to each piece of changed data based on a routing rule, wherein the dimension model comprises master table information, slave table information, primary key information and associated parameter information;

and acquiring, based on the primary key information of all the corresponding dimension models, the associated parameters of the corresponding master table and/or slave table from the changed data in the data source to serve as the first target data.

Specifically, updating data in a dimension table stored by a full-text search engine based on the extracted first target data comprises the following steps:

and respectively updating corresponding data in the dimension table on the full-text search engine based on the type of the data corresponding to the first target data, wherein the type comprises master table data and slave table data.

Further, the method comprises:

extracting second target data from the target database based on all the dimension models;

and generating a dimension table based on the second target data and storing the dimension table to the full-text search engine.

Further, the method further comprises: receiving a report generation request and/or a BI analysis request;

generating a report based on the updated dimension table and/or performing BI analysis.

Wherein the database comprises a relational database and/or a non-relational database.

Wherein the full-text search engine is a Solr full-text search engine or an Elasticsearch full-text search engine.

In a second aspect, there is provided a data aggregation device, comprising,

the monitoring module is used for extracting first target data according to a preset rule when monitoring that a data source of the target database changes;

and the updating module is used for updating the data in the dimension table stored by the full-text search engine based on the extracted first target data.

Specifically, the monitoring module includes:

the determining unit is used for determining a dimension model corresponding to each piece of changed data based on the routing rule, wherein the dimension model comprises master table information, slave table information, primary key information and associated parameter information;

and the acquisition unit is used for acquiring, based on the primary key information of all the corresponding dimension models, the associated parameters of the corresponding master table and/or slave table from the changed data in the data source as the first target data.

Specifically, the updating module is specifically configured to update corresponding data in the dimension table on the full-text search engine based on a type of data corresponding to the first target data, where the type includes master table data and slave table data.

Further, the apparatus further comprises:

the extraction module is used for extracting second target data from the target database based on all the dimension models;

and the storage module is used for generating a dimension table based on the second target data and storing the dimension table to the full-text search engine.

Further, the apparatus further comprises:

the receiving module is used for receiving a report generation request and/or a BI analysis request;

and the analysis module is used for generating a report and/or performing BI analysis based on the updated dimension table.

The full-text search engine is a Solr full-text search engine or an Elasticsearch full-text search engine.

In a third aspect, an electronic device is provided, which includes:

one or more processors;

a memory;

one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, the one or more application programs being configured to perform the data aggregation method shown in the first aspect.

In a fourth aspect, a computer-readable storage medium is provided, which is used for storing computer instructions, which when run on a computer, make the computer perform the data aggregation method shown in the first aspect.

Compared with the prior art, in which acquiring the latest data is time-consuming and inefficient, the data aggregation method and device, the electronic equipment and the computer-readable storage medium provided by the application extract the first target data according to the preset rule when a change in the data source of the target database is monitored, and update the data in the dimension table stored by the full-text search engine based on the extracted first target data. In other words, only the changed data is aggregated, which avoids aggregating the data as a whole and reduces the amount of data processed. The data can thus be aggregated in real time and the latest data obtained more efficiently, so that report generation or BI analysis can be performed on the latest data and the information lag of report generation or BI analysis is avoided.

Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.

Drawings

The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

fig. 1 is a schematic flowchart of a data aggregation method according to an embodiment of the present application;

fig. 2 is a schematic structural diagram of a data aggregation device according to an embodiment of the present application;

fig. 3 is a schematic structural diagram of another data aggregation device according to an embodiment of the present application;

fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

Reference will now be made in detail to the embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.

As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.

To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.

An embodiment of the present application provides a data aggregation method. As shown in fig. 1, the method may include the following steps:

step S101, when monitoring that a data source of a target database changes, extracting first target data according to a preset rule;

Specifically, when a change in a data source of the target database is detected through a corresponding monitoring method, first target data is extracted from the changed data according to a predetermined rule. The monitoring may be implemented by monitoring database logs, that is, data affected by insert, update, delete, and other operations is identified by monitoring and parsing the binlog of the database. The target database may be one database or a plurality of databases; the plurality of databases may be of different types, and the monitoring rules of the databases may differ. The first target data may be all of the changed data, or only part of the changed data or some of its parameters.
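The extraction in step S101 can be illustrated with a minimal Python sketch. It assumes that the database log has already been parsed into row-level change events; the `ChangeEvent` shape, the function names, and the `watched_tables` parameter are hypothetical and are not defined by the application.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    """One row-level change as a log parser might emit it (hypothetical shape)."""
    database: str
    table: str
    operation: str                      # "insert", "update" or "delete"
    before: Optional[Dict[str, Any]]    # row image before the change, if any
    after: Optional[Dict[str, Any]]     # row image after the change, if any


def changed_fields(event: ChangeEvent) -> Dict[str, Any]:
    """Return only the columns whose values were added or modified."""
    if event.operation == "insert":
        return dict(event.after or {})
    if event.operation == "delete":
        return {}
    before = event.before or {}
    after = event.after or {}
    return {col: val for col, val in after.items()
            if col not in before or before[col] != val}


def extract_first_target_data(
    events: Iterable[ChangeEvent],
    watched_tables: Set[Tuple[str, str]],
) -> List[Dict[str, Any]]:
    """Keep changes on watched (database, table) pairs and reduce each to its changed columns."""
    result = []
    for event in events:
        if (event.database, event.table) not in watched_tables:
            continue  # this change is not covered by the predetermined rule
        row = event.after if event.after is not None else event.before
        result.append({
            "table": event.table,
            "operation": event.operation,
            "row": row,
            "changed": changed_fields(event),
        })
    return result
```

In this sketch the predetermined rule is reduced to a set of watched tables; a real rule could equally select specific columns or operations.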

And step S102, updating the data in the dimension table stored by the full-text search engine based on the extracted first target data.

Specifically, the full-text search engine stores historical data of the corresponding databases, and the respective databases may be deployed in a distributed manner. Full-text search engines are widely used mainstream search engines. Their working principle is as follows: an indexing program scans every word in a document and builds an index entry for each word that records how often and where the word occurs in the document; when a user submits a query, the retrieval program searches the previously built index and returns the matching results to the user.
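The indexing principle described above can be sketched as a small inverted index in Python. This is only an illustration of the general idea under simplifying assumptions (whitespace tokenization, lowercase matching); it is not how Solr or Elasticsearch is implemented, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_index(docs: Dict[str, str]) -> Dict[str, Dict[str, List[int]]]:
    """Map each word to the documents it occurs in and its positions there."""
    index: Dict[str, Dict[str, List[int]]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(position)
    return dict(index)


def search(index: Dict[str, Dict[str, List[int]]], word: str) -> List[Tuple[str, int]]:
    """Return (document id, occurrence count) pairs, most frequent first."""
    postings = index.get(word.lower(), {})
    hits = [(doc_id, len(positions)) for doc_id, positions in postings.items()]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))
```

A query is answered from the index alone, without rescanning the documents, which is why lookups stay fast as the stored data grows.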

Specifically, the update is performed based on the relationship between the first target data and the data in the dimension table. If the first target data replaces a certain attribute parameter of the historical data, the corresponding data in the dimension table may be updated according to the mapping relationship between the first target data and the corresponding data in the dimension table; if the first target data adds data on top of the historical data, the first target data may be added to the dimension table.
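The two cases of step S102 (replacing an attribute of an existing entry, or adding a new entry) can be sketched as follows. The in-memory dictionary stands in for the dimension table on the full-text search engine, and the names `DimensionTable` and `apply_change` are assumptions made for illustration.

```python
from typing import Any, Dict

# Hypothetical stand-in for the stored dimension table: primary key -> dimension row.
DimensionTable = Dict[str, Dict[str, Any]]


def apply_change(table: DimensionTable, primary_key: str,
                 first_target_data: Dict[str, Any]) -> str:
    """Update an existing dimension row in place, or add a new row."""
    if primary_key in table:
        # Existing historical data: overwrite only the attributes that changed.
        table[primary_key].update(first_target_data)
        return "updated"
    # No historical data for this key: add the first target data as a new row.
    table[primary_key] = dict(first_target_data)
    return "added"
```

Only the affected row is touched, so the cost of an update is independent of the size of the dimension table.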

Compared with the prior art, in which acquiring the latest data is time-consuming and inefficient, the data aggregation method provided by the embodiment of the present application extracts the first target data according to the predetermined rule when a change in the data source of the target database is monitored, and updates the data in the dimension table stored by the full-text search engine based on the extracted first target data. In other words, only the changed data is aggregated, which avoids aggregating the data as a whole and reduces the amount of data processed. The data can thus be aggregated in real time and the latest data obtained more efficiently, so that report generation or BI analysis can be performed on the latest data and the information lag of report generation or BI analysis is avoided.

The embodiment of the present application provides a possible implementation manner, and specifically, step S101 includes:

step S1011 (not shown in the figure), determining a dimension model corresponding to each piece of changed data based on the routing rule, where the dimension model includes master table information, slave table information, primary key information, and associated parameter information;

Specifically, when a change in a data source of the target database is monitored, the dimension model corresponding to each piece of changed data is determined by the corresponding routing rule. For example, if the changed data are data A, data B, and data C, the corresponding dimension models are dimension model A, dimension model B, and dimension model C. Each dimension model defines information including, but not limited to, master table information, slave table information, primary key information, and associated parameter information; because the changed source data to be extracted differs, the information defined by each dimension model may differ. There is only one master table, but there may be multiple slave tables. Both the master table and the slave tables contain the primary key information, through which the relationship between the master table and the slave tables is established. The associated parameters define the data to be extracted from the master table or the slave tables. Illustratively, the primary key information may be identity number information; when a change in the information of a certain client is monitored, the corresponding information can be obtained from the master table and/or the slave tables through the primary key information.
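A dimension model and a routing rule of the kind described above might be represented as in the following Python sketch. The field names, the table-name-based routing, and the example tables (`customer`, `orders`, `address`) are assumptions for illustration; the application does not prescribe a concrete data structure.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class DimensionModel:
    """Hypothetical container for the four kinds of information a model defines."""
    name: str
    master_table: str                                # exactly one master table
    slave_tables: Tuple[str, ...]                    # zero or more slave tables
    primary_key: str                                 # column shared by master and slaves
    associated_params: Dict[str, Tuple[str, ...]]    # table -> columns to extract

    def tables(self) -> Tuple[str, ...]:
        return (self.master_table,) + self.slave_tables


def build_routes(models: Iterable[DimensionModel]) -> Dict[str, List[DimensionModel]]:
    """Routing rule assumed here: a changed table routes to every model that uses it."""
    routes: Dict[str, List[DimensionModel]] = {}
    for model in models:
        for table in model.tables():
            routes.setdefault(table, []).append(model)
    return routes


def route(routes: Dict[str, List[DimensionModel]], table: str) -> List[DimensionModel]:
    """Return the dimension models affected by a change in the given table."""
    return routes.get(table, [])
```

Routing by table name is only one possible rule; a rule could also inspect the database name or the changed columns.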

Step S1012 (not shown in the figure), obtaining the associated parameters of the corresponding master table and/or slave table from the changed data in the data source based on the primary key information of all the corresponding dimension models, respectively, as the first target data.

Specifically, the changed data may be data in a master table and/or a slave table, and the associated parameters in the corresponding master table and/or slave table may be acquired as the first target data through the primary key information.
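Step S1012 can be sketched as a function that takes one changed row and returns the associated parameters keyed by the primary key. The dictionary layout of `model` and the output shape are hypothetical choices made for this example.

```python
from typing import Any, Dict, Optional


def extract_associated_params(model: Dict[str, Any], table: str,
                              changed_row: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Pick the associated parameters of a changed row, keyed by its primary key.

    `model` is a hypothetical dict with the keys "primary_key", "master_table"
    and "associated_params" (table name -> column names to extract).
    """
    key_column = model["primary_key"]
    if key_column not in changed_row:
        return None  # without the primary key the row cannot be related to the model
    columns = model["associated_params"].get(table, ())
    return {
        "primary_key": changed_row[key_column],
        "table": table,
        "is_master": table == model["master_table"],
        "params": {col: changed_row[col] for col in columns if col in changed_row},
    }
```

Columns that the dimension model does not list as associated parameters are dropped, so only the data needed by the dimension table is carried forward.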

With the embodiment of the application, the problem of how to extract the first target data according to the predetermined rule is solved, that is, the problem of how to obtain the first target data when the data source in the database is monitored to be changed is solved.

The embodiment of the present application provides a possible implementation manner, and specifically, step S102 includes:

step S1021 (not shown in the figure), based on the type of the data corresponding to the first target data, respectively update the corresponding data in the dimension table on the full-text search engine, where the type includes the master table data and the slave table data.

Specifically, according to different data types corresponding to the first target data, corresponding data in a dimension table on a full-text search engine are respectively updated; wherein, the data type can be master table data and slave table data; wherein the first target data may comprise master table data and/or slave table data.

For example, suppose the first target data includes both master table data and slave table data. For the master table data in the first target data, both the corresponding master table data and the corresponding slave table data on the full-text search engine need to be updated; for the slave table data in the first target data, only the corresponding slave table data on the full-text search engine is updated. The corresponding master table and slave table on the full-text search engine can be located by querying with the primary key information.
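One possible reading of this type-dependent update is sketched below in Python. It assumes, purely for illustration, that the dimension table is stored as denormalized documents in which every document (master or slave) carries a copy of the master fields; this layout and all names are assumptions and are not stated in the application.

```python
from typing import Any, Dict, List

Document = Dict[str, Any]


def update_dimension_docs(docs: List[Document], change: Dict[str, Any]) -> int:
    """Apply one change to the stored documents and return how many were touched.

    Each document is assumed to look like:
    {"primary_key": ..., "table": ..., "master_fields": {...}, "slave_fields": {...}}
    """
    touched = 0
    for doc in docs:
        if doc["primary_key"] != change["primary_key"]:
            continue  # located by primary key; other entities are not affected
        if change["is_master"]:
            # Master table data: refresh the master fields on the master document
            # and on every slave document that shares the primary key.
            doc["master_fields"].update(change["params"])
            touched += 1
        elif doc["table"] == change["table"]:
            # Slave table data: only the matching slave documents are updated.
            doc["slave_fields"].update(change["params"])
            touched += 1
    return touched
```

Under this layout a master change fans out to all related documents, while a slave change stays local to its own table.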

According to the embodiment of the application, the problem of how to update the corresponding data in the dimension table on the full-text search engine according to the first target data is solved.

The embodiment of the present application provides a possible implementation manner, and further, the method further includes:

step S103 (not shown in the figure), extracting second target data from the target database based on all the dimension models;

step S104 (not shown in the figure), a dimension table is generated based on the second target data and stored to the full-text search engine.

Specifically, when the historical data is extracted integrally, second target data is extracted from the target database based on all the dimension models, and a dimension table is generated based on the second target data and stored in the full-text search engine.
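The overall extraction of steps S103 and S104 can be sketched as a join of master and slave rows on the primary key. The `source` dictionary stands in for the target database, and the nested output layout is a hypothetical choice; storing the result to the full-text search engine is left out.

```python
from typing import Any, Dict, List


def build_dimension_table(model: Dict[str, Any],
                          source: Dict[str, List[Dict[str, Any]]]) -> Dict[Any, Dict[str, Any]]:
    """Build one dimension table from all rows of the master and slave tables.

    `model` is a hypothetical dict with "primary_key", "master_table",
    "slave_tables" and "associated_params"; `source` maps table name -> rows.
    """
    key = model["primary_key"]
    params = model["associated_params"]
    table: Dict[Any, Dict[str, Any]] = {}

    # One entry per master row, holding only the associated parameters.
    master_cols = params.get(model["master_table"], ())
    for row in source.get(model["master_table"], []):
        table[row[key]] = {
            "master": {c: row[c] for c in master_cols if c in row},
            "slaves": {},
        }

    # Attach slave rows to the master entry that shares their primary key.
    for slave in model["slave_tables"]:
        slave_cols = params.get(slave, ())
        for row in source.get(slave, []):
            entry = table.get(row.get(key))
            if entry is None:
                continue  # slave row without a matching master row is skipped
            entry["slaves"].setdefault(slave, []).append(
                {c: row[c] for c in slave_cols if c in row})
    return table
```

This full build is needed only once for the historical data; afterwards the incremental updates keep the dimension table current.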

According to the embodiment of the application, the problem of how to realize the integral aggregation of the historical data is solved.

step S105 (not shown in the figure), receiving a report generation request and/or a BI analysis request;

step S106 (not shown), generating report and/or performing BI analysis based on the updated dimension table.

Specifically, when a report generation request and/or a BI (business intelligence) analysis request is received, a report is generated and/or BI analysis is performed based on the updated dimension table; the report generation request and/or the BI analysis request may be implemented by the relevant service personnel by triggering the corresponding virtual key.
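As a simple illustration of generating a report from the updated dimension table, the following Python sketch totals a measure per group. The flat row layout and the column names used in the usage example are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


def totals_report(dimension_rows: Iterable[Dict[str, Any]],
                  group_by: str, measure: str) -> Dict[Any, float]:
    """Sum `measure` per distinct value of `group_by`, sorted by group."""
    totals: Dict[Any, float] = defaultdict(float)
    for row in dimension_rows:
        totals[row[group_by]] += row[measure]
    return dict(sorted(totals.items()))
```

For example, calling `totals_report(rows, "city", "amount")` on rows read from the dimension table yields one total per city, computed from whatever data the latest incremental update left in place.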

According to the embodiment of the application, reports can be generated and/or BI analysis can be performed in real time.

The embodiments of the present application provide a possible implementation manner, where the target database includes a relational database and/or a non-relational database.

Specifically, there may be multiple target databases, which may include relational and non-relational databases. The monitoring rules for relational and non-relational databases differ, and data is extracted when a change in the data source of the corresponding database is monitored.
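Because the monitoring rules differ by database type, one way to keep the downstream extraction uniform is to normalize each kind of raw change record into a common shape. The raw record layouts and all names in this Python sketch are hypothetical and do not correspond to the log format of any particular database.

```python
from typing import Any, Callable, Dict


def normalize_relational(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Normalize a change record from a relational database log (hypothetical layout)."""
    return {"source": raw["schema"], "table": raw["table"],
            "operation": raw["type"].lower(), "row": raw["row"]}


def normalize_document(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Normalize a change record from a non-relational document store (hypothetical layout)."""
    return {"source": raw["db"], "table": raw["collection"],
            "operation": raw["op"], "row": raw["document"]}


# One monitoring rule per database type.
NORMALIZERS: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {
    "relational": normalize_relational,
    "non_relational": normalize_document,
}


def normalize(kind: str, raw: Dict[str, Any]) -> Dict[str, Any]:
    """Dispatch a raw change record to the rule for its database type."""
    try:
        rule = NORMALIZERS[kind]
    except KeyError:
        raise ValueError(f"no monitoring rule for database type: {kind}") from None
    return rule(raw)
```

Supporting another database type then only requires registering one more normalizer.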

For the embodiment of the application, the data aggregation of the relational database and the non-relational database is realized.

The embodiment of the application provides a possible implementation manner, wherein the full-text search engine is a Solr full-text search engine or an Elasticsearch full-text search engine.

Specifically, the full-text search engine may be a Solr full-text search engine or an Elasticsearch full-text search engine, or another search engine capable of implementing the functions of the present application. Solr is a high-performance full-text search server developed in Java and based on Lucene; it extends Lucene with a richer query language, is configurable and extensible, optimizes query performance, and provides a complete management interface. Elasticsearch is a RESTful search engine built on the Apache Lucene library; it provides a distributed, multi-tenant-capable full-text search engine with an HTTP web interface (REST) and schema-free JSON documents. It is worth noting that, compared with the prior art in which BI analysis is performed on local storage, BI analysis based on the Elasticsearch full-text search engine can scale horizontally with the volume of analyzed data, so the analysis is not limited by data volume, which solves the prior-art problem of a limited data volume for BI analysis. Moreover, data queries based on Elasticsearch are very fast, which improves the efficiency of generating reports and performing BI analysis.
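As an illustration of how a changed dimension row could be sent over the REST interface, the Python sketch below builds (but does not send) a partial-update request for the Elasticsearch document update endpoint. It assumes Elasticsearch 7 or later, where the endpoint has the form `POST /<index>/_update/<id>`; the base URL, index name, and function name are hypothetical, and the application itself does not specify which request is used.

```python
import json
from typing import Any, Dict
from urllib.parse import quote
from urllib.request import Request


def build_update_request(base_url: str, index: str, doc_id: str,
                         fields: Dict[str, Any]) -> Request:
    """Build a partial-update request for one dimension-table document.

    "doc" carries only the changed fields; "doc_as_upsert" asks the engine to
    create the document from those fields if it does not exist yet.
    """
    body = json.dumps({"doc": fields, "doc_as_upsert": True}).encode("utf-8")
    url = f"{base_url}/{quote(index, safe='')}/_update/{quote(doc_id, safe='')}"
    return Request(url=url, data=body,
                   headers={"Content-Type": "application/json"}, method="POST")
```

The request object can be passed to `urllib.request.urlopen` against a running cluster; because only the changed fields are transmitted, the request size tracks the size of the change rather than the size of the stored document.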

For the embodiment of the application, the dimension table obtained by aggregation is stored in a Solr full-text search engine or an Elasticsearch full-text search engine, which provides the basis for subsequent report generation and BI analysis.

Fig. 2 shows a data aggregation device provided in an embodiment of the present application, where the device 20 includes: a monitoring module 201 and an updating module 202, wherein,

the monitoring module 201 is configured to extract first target data according to a predetermined rule when a change in a data source of a target database is monitored;

and the updating module 202 is used for updating the data in the dimension table stored by the full-text search engine based on the extracted first target data.

Compared with the prior art, in which acquiring the latest data is time-consuming and inefficient, the data aggregation device provided by the embodiment of the present application extracts the first target data according to the predetermined rule when a change in the data source of the target database is monitored, and updates the data in the dimension table stored by the full-text search engine based on the extracted first target data. In other words, only the changed data is aggregated, which avoids aggregating the data as a whole and reduces the amount of data processed. The data can thus be aggregated in real time and the latest data obtained more efficiently, so that report generation or BI analysis can be performed on the latest data and the information lag of report generation or BI analysis is avoided.

The data aggregation device of this embodiment may execute a data aggregation method provided in the above embodiments of this application, and the implementation principles thereof are similar, and are not described herein again.

As shown in fig. 3, an embodiment of the present application provides another data aggregation device, where the device 30 includes: a monitoring module 301 and an updating module 302, wherein,

the monitoring module 301 is configured to extract first target data according to a predetermined rule when a change in a data source of a target database is monitored;

Here, the function of the monitoring module 301 in fig. 3 is the same as or similar to that of the monitoring module 201 in fig. 2.

And an updating module 302, configured to update data in the dimension table stored in the full-text search engine based on the extracted first target data.

Wherein the update module 302 of fig. 3 has the same or similar function as the update module 202 of fig. 2.

The embodiment of the present application provides a possible implementation manner, and specifically, the monitoring module 301 includes:

a determining unit 3011, configured to determine, based on a routing rule, a dimension model corresponding to each piece of changed data, where the dimension model includes master table information, slave table information, primary key information, and associated parameter information;

an obtaining unit 3012, configured to obtain, based on the primary key information of all the corresponding dimension models, the associated parameters of the corresponding master table and/or slave table from the changed data in the data source as first target data.

The embodiment of the present application provides a possible implementation manner, and the updating module 302 is specifically configured to update, based on a type of data corresponding to the first target data, corresponding data in a dimension table on a full-text search engine, where the type includes master table data and slave table data.

The embodiment of the present application provides a possible implementation manner, and further, the apparatus 30 further includes:

an extraction module 303, configured to extract second target data from the target database based on all the dimension models;

and the storage module 304 is used for generating a dimension table based on the second target data and storing the dimension table to the full-text search engine.

a receiving module 305, configured to receive a report generation request and/or a BI analysis request;

and the analysis module 306 is used for generating reports and/or performing BI analysis based on the updated dimension table.

The embodiment of the application provides a data aggregation device. Compared with the prior art, in which acquiring the latest data is time-consuming and inefficient, the data aggregation device extracts the first target data according to the predetermined rule when a change in the data source of the target database is monitored, and updates the data in the dimension table stored by the full-text search engine based on the extracted first target data. In other words, only the changed data is aggregated, which avoids aggregating the data as a whole and reduces the amount of data processed. The data can thus be aggregated in real time and the latest data obtained more efficiently, so that report generation or BI analysis can be performed on the latest data and the information lag of report generation or BI analysis is avoided.

The embodiments of the present application provide a data aggregation device, which is suitable for the method shown in the above embodiments, and details are not described herein again.

An embodiment of the present application provides an electronic device. As shown in fig. 4, the electronic device 40 includes a processor 4001 and a memory 4003. The processor 4001 is coupled to the memory 4003, for example via a bus 4002. Further, the electronic device 40 may also include a transceiver 4004. It should be noted that, in practical applications, the transceiver 4004 is not limited to one, and the structure of the electronic device 40 does not constitute a limitation on the embodiment of the present application. In this embodiment, the processor 4001 is configured to implement the functions of the monitoring module and the updating module shown in fig. 2 or fig. 3, and the functions of the extraction module, the storage module, the receiving module, and the analysis module shown in fig. 3. The transceiver 4004 includes a receiver and a transmitter.

The processor 4001 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor 4001 may also be a combination that performs computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.

Bus 4002 may include a path that carries information between the aforementioned components. Bus 4002 may be a PCI bus, EISA bus, or the like. The bus 4002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 4, but this does not indicate only one bus or one type of bus.

Memory 4003 may be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disk storage, an optical disk storage (including compact disk, laser disk, optical disk, digital versatile disk, blu-ray disk, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.

The memory 4003 is used for storing application codes for executing the scheme of the present application, and the execution is controlled by the processor 4001. The processor 4001 is configured to execute application code stored in the memory 4003 to implement the functions of the data aggregation apparatus provided by the embodiment shown in fig. 2 or fig. 3.

Compared with the prior art that time is consumed for obtaining the latest data and efficiency is low, the electronic equipment extracts the first target data according to the preset rule when the change of the data source of the target database is monitored, and updates the data in the dimension table stored by the full-text search engine based on the extracted first target data. That is, according to the embodiment of the present application, when it is monitored that a data source changes, the first target data is extracted according to a predetermined rule, and data in a dimension table stored in a full-text search engine is updated, in other words, only the changed data is aggregated, aggregation of the entire data is avoided, data processing amount is reduced, and real-time aggregation of the data can be achieved, so that efficiency of obtaining the latest data is improved, further, report generation or BI analysis can be performed according to the latest data, and information hysteresis of report generation or BI analysis is avoided.

The embodiment of the present application provides an electronic device that is applicable to the above method embodiment, which will not be described in detail again here.

The present application provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the method shown in the above embodiments is implemented.

In the prior art, obtaining the latest data is time-consuming and inefficient. By contrast, when the embodiment of the present application detects a change in the data source of the target database, it extracts the first target data according to the predetermined rule and updates the data in the dimension table stored by the full-text search engine based on the extracted first target data. In other words, only the changed data is aggregated, so aggregating the data as a whole is avoided and the amount of data to be processed is reduced. The data can therefore be aggregated in real time, which improves the efficiency of obtaining the latest data. Reports can then be generated, or BI analysis performed, on the latest data, which avoids information lag in report generation or BI analysis.

The embodiment of the present application provides a computer-readable storage medium that is applicable to the above method embodiment, which will not be described in detail again here.

It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of the steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same time and may be performed at different times; nor are they necessarily performed sequentially, and they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

The foregoing describes only some embodiments of the present application. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present application, and these improvements and refinements should also be regarded as falling within the protection scope of the present application.

Claims

1. A data aggregation method, comprising:

2. The method of claim 1, wherein the extracting the first target data according to a predetermined rule when the data source of the target database is monitored to be changed comprises:

and acquiring the associated parameters of corresponding master tables and/or slave tables from the changed data in the data source respectively based on the primary key information of all the corresponding dimension models to serve as the first target data.

3. The method according to claim 1 or 2, wherein the updating data in the dimension table stored in the full text search engine based on the extracted first target data comprises:

4. A method according to claim 2, characterized in that the method comprises:

extracting second target data from the target database based on all dimension models;

and generating the dimension table based on the second target data and storing the dimension table to the full-text search engine.

5. The method according to any one of claims 1-4, characterized in that the method further comprises:

receiving a report generation request and/or a BI analysis request;

and generating a report and/or performing BI analysis based on the updated dimension table.

6. The method of claim 1, wherein the target database comprises a relational database and/or a non-relational database.

7. The method of claim 1, wherein the full-text search engine is a Solr full-text search engine or an Elasticsearch full-text search engine.

8. A data aggregation device, comprising:

9. An electronic device, comprising:

one or more processors;

a memory;

one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the data aggregation method according to any one of claims 1 to 7.

10. A computer-readable storage medium for storing computer instructions which, when executed on a computer, cause the computer to perform the data aggregation method of any one of claims 1 to 7.