CN114238258B - Database data processing method, device, computer equipment and storage medium

Info

Publication number: CN114238258B
Application number: CN202111452515.1A
Authority: CN
Inventors: 赵勇; 王金虎
Original assignee: Qichacha Technology Co ltd
Current assignee: Qichacha Technology Co ltd
Priority date: 2021-11-30
Filing date: 2021-11-30
Publication date: 2024-02-20
Anticipated expiration: 2041-11-30
Also published as: CN114238258A

Abstract

The present disclosure relates to a database data processing method, apparatus, computer device, storage medium and computer program product. The method comprises the following steps: collecting log data in a first database; pushing the log data to a second database, wherein the second database comprises statistical data tables respectively corresponding to different dimensions of different objects; according to the object information and the dimension information in the log data, storing the log data into a corresponding statistical data table, and determining current data; calculating a new statistical value corresponding to the statistical data table according to the historical statistical value, the data state change information, the numerical value increase and decrease information and the data increase and decrease information of the current data; and saving the new statistical value into a third database. The method saves a large number of data cleaning personnel, and because the process is carried out by a computer program, the probability of error is lower than when the work is done by many data cleaning personnel.

Description

Database data processing method, device, computer equipment and storage medium

Technical Field

The present disclosure relates to the field of electronic data processing technology, and in particular, to a database data processing method, apparatus, computer device, storage medium, and computer program product.

Background

With the development of big data technology, techniques for compiling statistics on dimension association data have emerged. Data associated with a given object or primary key may be divided into multiple dimensions. Establishing information dimensions makes it easier to classify, count, and use the effective information in the associated data. When compiling statistics on the information in each dimension, statistical values such as quantities, numbers of occurrences, and numbers of specific items (for example, the number of core personnel, the number of sheets of a list, and the number of pieces of patent information) may be collectively referred to as a count.

In existing count calculation methods, because the data of each dimension has different content characteristics and statistical requirements, the count is generally calculated by refresh logic that the data cleaning personnel responsible for each dimension write separately. A large number of data cleaning personnel must therefore take part in the count calculation, which is inefficient and error-prone.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a database data processing method, apparatus, computer device, computer-readable storage medium, and computer program product that are capable of efficiently and accurately performing count computation.

In a first aspect, the present disclosure provides a database data processing method. The method comprises the following steps:

collecting log data in a first database, wherein the log data comprises object association data, and the object association data comprises dimension association data;

pushing the log data to a second database, wherein the second database comprises statistical data tables respectively corresponding to different dimensions of different objects;

according to the object information and the dimension information in the log data, storing the log data into a corresponding statistical data table, and determining current data;

calculating a new statistical value corresponding to the statistical data table according to the historical statistical value, the data state change information, the numerical value increase and decrease information and the data increase and decrease information of the current data;

and saving the new statistical value into a third database.

In one embodiment, the second database is an open source distributed relational database and the third database is a distributed document storage database.

In one embodiment, the second database further includes a data refresh table for:

determining a target object and a target dimension based on the log data;

storing the target object and the target dimension into a data refreshing table;

and determining a statistical data table needing to calculate a statistical value according to the data refreshing table.

In one embodiment, the statistical data table comprises a classified statistical data table and an unclassified statistical data table.

In one embodiment, the pushing the log data to the second database comprises:

pushing the log data to a second database in the form of a message queue.

In a second aspect, the present disclosure also provides a database data processing apparatus. The device comprises:

the data acquisition module is used for acquiring log data in the first database, wherein the log data comprises object association data, and the object association data comprises dimension association data;

the data pushing module is used for pushing the log data to a second database, and the second database comprises statistical data tables respectively corresponding to different dimensions of different objects;

the data storage module is used for storing the log data into a corresponding statistical data table according to the object information and the dimension information in the log data and determining current data;

the calculation module is used for calculating a new statistical value corresponding to the statistical data table according to the historical statistical value, the data state change information, the numerical value increase and decrease information and the data increase and decrease information of the current data;

and the data summarizing module is used for storing the new statistical value into a third database.

In a third aspect, the present disclosure also provides a computer device. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:

and saving the new statistical value into a third database.

In a fourth aspect, the present disclosure also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:

and saving the new statistical value into a third database.

In a fifth aspect, the present disclosure also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of:

and saving the new statistical value into a third database.

With the above database data processing method, apparatus, computer device, storage medium, and computer program product, the statistical value is calculated from the log data rather than directly from the associated data of each dimension. The statistical value can therefore be calculated without writing refresh logic for the associated data of each dimension, which saves a large number of data cleaning personnel. Because the process is performed by a computer program, the probability of error is also lower than when the work is done by many data cleaning personnel.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.

FIG. 1 is a diagram of an application environment for a database data processing method in one embodiment;

FIG. 2 is a flow diagram of a database data processing method in one embodiment;

FIG. 3 is a flow chart of a database data processing method according to another embodiment;

FIG. 4 is a block diagram of a database data processing apparatus in one embodiment;

FIG. 5 is a block diagram of a database data processing apparatus in another embodiment;

FIG. 6 is an internal structural diagram of a computer device in one embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present disclosure more apparent, the present disclosure will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present disclosure.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.

The database data processing method provided by the embodiment of the disclosure can be applied to an application environment as shown in fig. 1. The data storage system may store data that the server 102 needs to process. The data storage system may be integrated on the server 102 or may be located on a cloud or other network server. The server 102 may include one or more data acquisition terminals that collect log data in the first database, the log data including object association data, the object association data including dimension association data. Server 102 pushes the log data to a second database that includes statistics tables that respectively correspond to different dimensions of different objects. The server 102 stores the log data into a corresponding statistical data table according to the object information and the dimension information in the log data, and determines the current data. The server 102 calculates a new statistic value corresponding to the statistic data table based on the history statistic value, the data state change information, the numerical value increase/decrease information, and the data increase/decrease information of the current data. The server 102 saves the new statistics into a third database. The server 102 may be implemented as a stand-alone server or as a server cluster of multiple servers.

In one embodiment, as shown in fig. 2, a database data processing method is provided, and an application environment of the method in fig. 1 is taken as an example for explanation, the method includes the following steps:

S202, collecting log data in a first database, wherein the log data comprises object association data, and the object association data comprises dimension association data.

The log data may refer to data that records modifications made to the contents of the database. The object association data may refer to data having an association relationship with an object. Dimension association data may refer to data associated with an object in one or more information dimensions.

Specifically, the dimension may refer to an information dimension. The first database may refer broadly to one or more databases storing dimension association data. The log data may be a binlog log. One or more service tables may be included in the first database, and each service table may have a corresponding binlog. The log data in the first database is collected and may include all binlog logs. The log data includes a modified record of object association data. The object may be a person or an organization, such as a corporate boss or a corporation. The dimension association data may be dimension association data of an individual, such as dimension association data of a boss of a certain company. The dimension association data may be dimension association data of an organization, such as dimension association data of a company. An object may have one or more associated information dimensions, and an associated information dimension may relate to one or more statistics. A particular piece of information may be dimension association information for one or more objects. A particular piece of information may influence the calculation of one or more statistics. The dimensions of different objects may have the same dimension name.

S204, pushing the log data to a second database, wherein the second database comprises statistical data tables respectively corresponding to different dimensions of different objects.

Specifically, a statistical data table may be referred to as a count table. Each dimension of each object may have a corresponding statistical data table. Statistical data tables may be of different types, and different types may have different table structures according to actual needs. The log data, which may include all fields, is pushed to the second database.

S206, storing the log data into a corresponding statistical data table according to the object information and the dimension information in the log data, and determining the current data.

Specifically, the log data is consumed and, according to the object information and dimension information it contains, stored in the statistical data table for the corresponding dimension of the corresponding object.
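The routing in S206 can be illustrated with a minimal in-memory sketch. It is not the patented implementation: the `LogRecord` fields, the `route` function, and the use of a dictionary keyed by (object, dimension) in place of real tables in the second database are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LogRecord:
    """Hypothetical, simplified change record; a real binlog event carries more fields."""
    object_id: str   # object the data is associated with, e.g. a company
    dimension: str   # information dimension, e.g. "patents"
    op: str          # "insert", "update" or "delete"
    row: dict        # the changed row


def route(records):
    """Group records into one stand-in 'statistical data table' per (object, dimension)."""
    tables = defaultdict(list)
    for rec in records:
        tables[(rec.object_id, rec.dimension)].append(rec)
    return dict(tables)


records = [
    LogRecord("company_a", "patents", "insert", {"id": 1}),
    LogRecord("company_a", "staff", "insert", {"id": 7}),
    LogRecord("company_b", "patents", "insert", {"id": 2}),
    LogRecord("company_a", "patents", "delete", {"id": 1}),
]
tables = route(records)
```

Each key of `tables` plays the role of one statistical data table, and the records stored under it are the current data that later steps consume.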

S208, calculating a new statistical value corresponding to the statistical data table according to the historical statistical value, the data state change information, the numerical value increase and decrease information and the data increase and decrease information of the current data.

The historical statistical value may refer to the value of the statistic before recalculation. The data increase and decrease information may refer to increases or decreases in the number of data items.

Specifically, the historical statistical value is the latest value of the statistic before recalculation. The log data in the statistical data tables is consumed, and the statistical value change amount for each statistical data table is determined from the data state change information, the numerical value increase and decrease information, and the data increase and decrease information of the current data. The change amount is added to the corresponding historical statistical value to obtain the new statistical value for each statistical data table. For example, when the data state of a piece of dimension association data in the statistical data table is changed from valid to invalid, the change amount may be determined to be -1, and adding -1 to the corresponding historical statistical value (for example, count_x) gives count_x - 1 as the new statistical value. When a piece of dimension association data is newly added to the statistical data table, the change amount may be determined to be 1, and adding 1 to the corresponding historical statistical value (for example, count_y) gives count_y + 1 as the new statistical value. When the statistical data table records the number of staff increasing by two, the change amount may be determined to be 2, and adding 2 to the corresponding historical statistical value (for example, count_z) gives count_z + 2 as the new statistical value; a decrease of two would correspondingly give a change amount of -2. The historical statistical values may be stored in the third database.
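The incremental calculation in S208 can be sketched as follows. The event dictionaries and their keys (`kind`, `op`, `old_valid`, `new_valid`, `old_value`, `new_value`) are hypothetical names chosen for this illustration; the disclosure only states that the change amount is derived from state changes, numerical increases and decreases, and data increases and decreases, and then added to the historical value.

```python
def change_amount(event):
    """Return the statistic change contributed by one change event.

    `event` is a hypothetical dict whose "kind" is one of:
      "state": the data state flips between valid and invalid
      "row":   a piece of data is added or removed
      "value": a numeric field (e.g. number of staff) goes up or down
    """
    kind = event["kind"]
    if kind == "state":
        # valid -> invalid gives -1, invalid -> valid gives +1, no flip gives 0
        return int(event["new_valid"]) - int(event["old_valid"])
    if kind == "row":
        return {"insert": 1, "delete": -1}[event["op"]]
    if kind == "value":
        return event["new_value"] - event["old_value"]
    raise ValueError(f"unknown event kind: {kind!r}")


def new_statistic(historical, events):
    """New statistical value = historical value + sum of change amounts."""
    return historical + sum(change_amount(e) for e in events)


events = [
    {"kind": "state", "old_valid": True, "new_valid": False},  # -1
    {"kind": "row", "op": "insert"},                           # +1
    {"kind": "value", "old_value": 5, "new_value": 7},         # +2
]
result = new_statistic(10, events)  # 10 - 1 + 1 + 2 = 12
```

Because only the change amount is computed, the full dimension data never has to be rescanned to refresh a count.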

S210, storing the new statistical value into a third database.

Specifically, the calculated statistical values are summarized and stored in a third database, so that the query and the use of the statistical values are facilitated.
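As a rough illustration of the summarizing step, the sketch below keeps one document per object with a count for each dimension. A plain dictionary stands in for the third database, and the document layout (`_id`, `counts`) and the function name `save_statistic` are assumptions, not details taken from the disclosure.

```python
def save_statistic(summary_store, object_id, dimension, value):
    """Upsert the latest statistical value into a per-object summary document."""
    doc = summary_store.setdefault(object_id, {"_id": object_id, "counts": {}})
    doc["counts"][dimension] = value
    return doc


summary_store = {}  # stand-in for a document collection in the third database
save_statistic(summary_store, "company_a", "patents", 12)
save_statistic(summary_store, "company_a", "staff", 40)
save_statistic(summary_store, "company_a", "patents", 13)  # overwrites the old count
```

Keeping all counts of one object in a single document means a query for that object's statistics needs only one lookup, which matches the stated goal of making the statistical values easy to query and use.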

In the above database data processing method, the statistical value is calculated from the log data rather than directly from the associated data of each dimension. The statistical value can therefore be calculated without writing refresh logic for the associated data of each dimension, which saves a large number of data cleaning personnel. Because the process is performed by a computer program, the probability of error is also lower than when the work is done by many data cleaning personnel.

Specifically, the second database is an open source distributed relational database, for example a TiDB database (TiDB is an open source distributed HTAP database, where HTAP stands for Hybrid Transactional and Analytical Processing). The third database is a distributed document storage database, for example a MongoDB database (MongoDB is a distributed document storage database written in C++).

In this embodiment, by using the open source distributed relational database as the second database, the beneficial effects of better meeting the requirement of large-scale data processing and better storing the statistical data table can be achieved. By using a distributed document storage database as the third database, the beneficial effects of facilitating the querying and use of statistics can be achieved.

In one embodiment, the second database further includes a data refresh table, and calculating a new statistic corresponding to the statistic table according to the log data and the historical statistic in the statistic table includes:

S302, determining a target object and a target dimension based on the log data.

S304, storing the target object and the target dimension into a data refreshing table.

S306, determining a statistical data table needing to calculate statistical values according to the data refreshing table.

Specifically, an object contained in the log data is determined as the target object, and an information dimension contained in the log data is determined as the target dimension, the target dimension being associated with the target object. The target object and the target dimension are stored in the data refresh table. According to the data refresh table, the statistical data tables whose data has changed are determined as target statistical data tables. This indicates where statistical values need to be calculated, so that only the statistical values corresponding to the target statistical data tables are calculated.
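The screening performed through the data refresh table can be sketched as a set of (target object, target dimension) pairs. The record keys `object_id` and `dimension`, the function names, and the use of a Python set in place of a database table are assumptions made for this illustration.

```python
def build_refresh_table(log_records):
    """Collect the distinct (target object, target dimension) pairs seen in the log data."""
    return {(rec["object_id"], rec["dimension"]) for rec in log_records}


def tables_to_recalculate(all_table_keys, refresh_table):
    """Keep only the statistical data tables whose data has changed."""
    return [key for key in all_table_keys if key in refresh_table]


log_records = [
    {"object_id": "company_a", "dimension": "patents"},
    {"object_id": "company_a", "dimension": "patents"},
    {"object_id": "company_b", "dimension": "staff"},
]
all_table_keys = [
    ("company_a", "patents"),
    ("company_a", "staff"),
    ("company_b", "patents"),
    ("company_b", "staff"),
]
refresh_table = build_refresh_table(log_records)
targets = tables_to_recalculate(all_table_keys, refresh_table)
```

Here only two of the four tables are selected, so the statistic calculation is skipped for the tables that received no changes.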

In this embodiment, by screening the statistical data table that needs to perform statistical value calculation, the beneficial effects of reducing the calculation amount and accelerating the statistical value calculation speed can be achieved.

In one embodiment, the statistical data table includes a classified statistical data table and an unclassified statistical data table.

A classified statistical data table may refer to a statistical data table that further classifies the stored information. An unclassified statistical data table may refer to a statistical data table that does not further classify the stored information.

Specifically, the statistical data table comprises a classified statistical data table and an unclassified statistical data table. For a classified statistical data table, each subdivided category may correspond to its own statistical value. The dimension association data is stored in a statistical data table of the appropriate type according to actual needs.
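The difference between the two table types can be shown with a small sketch: an unclassified table yields a single count, while a classified table yields one count per subdivided category. The row layout and the `type` field used as the category are hypothetical examples.

```python
from collections import Counter


def unclassified_count(rows):
    """An unclassified table has a single statistical value: the number of rows."""
    return len(rows)


def classified_counts(rows, category_field):
    """A classified table has one statistical value per subdivided category."""
    return Counter(row[category_field] for row in rows)


patent_rows = [
    {"id": 1, "type": "invention"},
    {"id": 2, "type": "utility_model"},
    {"id": 3, "type": "invention"},
]
total = unclassified_count(patent_rows)
by_type = classified_counts(patent_rows, "type")
```

In this sketch the sum of the per-category values equals the unclassified total, although the disclosure does not require the two table types to hold the same data.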

In this embodiment, the statistical data table includes a classified statistical data table and an unclassified statistical data table, so that dimension association data can be stored more clearly, data consumption is facilitated, and statistical value calculation is facilitated.

In one embodiment, the pushing the log data to the second database comprises:

pushing the log data to a second database in the form of a message queue.

Specifically, the log data is pushed to the second database in the form of a message queue. The message queue may be a Kafka message queue (Kafka is the name of a message queue system).
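The producer and consumer roles implied by the message queue can be sketched with Python's standard `queue.Queue`. This is only a single-process stand-in and does not use or imitate the Kafka client API; the function names `push` and `consume` and the list used as the second database are assumptions.

```python
import queue


def push(log_records, mq):
    """Producer side: enqueue each log record in the order it was collected."""
    for rec in log_records:
        mq.put(rec)


def consume(mq, store):
    """Consumer side: drain the queue and write each record into the store."""
    while True:
        try:
            rec = mq.get_nowait()
        except queue.Empty:
            break
        store.append(rec)
        mq.task_done()


mq = queue.Queue()
second_db = []  # stand-in for the second database
push([{"seq": 1}, {"seq": 2}, {"seq": 3}], mq)
consume(mq, second_db)
```

Decoupling the two sides through a queue lets log collection continue at its own pace while the writer into the second database catches up, which is the kind of stability and throughput benefit the embodiment describes.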

In this embodiment, the log data is pushed to the second database in the form of a message queue, so that the data push stability and push efficiency can be improved, and the requirement of large-scale data processing can be met.

It should be understood that, although the steps in the flowcharts of the above embodiments are shown in sequence as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; nor are these sub-steps or stages necessarily performed sequentially, as they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

Based on the same inventive concept, the embodiments of the present disclosure also provide a database data processing apparatus for implementing the database data processing method described above. The apparatus solves the problem in a manner similar to that described for the method, so for the specific limitations of the one or more apparatus embodiments provided below, reference may be made to the limitations of the database data processing method above, which are not repeated here.

In one embodiment, as shown in FIG. 4, there is provided a database data processing apparatus comprising: a data acquisition module 402, a data pushing module 404, a data storage module 406, a calculation module 408, and a data summarization module 410, wherein:

The data acquisition module 402 is configured to collect log data in the first database, where the log data includes object association data, and the object association data includes dimension association data.

The data pushing module 404 is configured to push the log data to a second database, where the second database includes statistical data tables respectively corresponding to different dimensions of different objects.

The data storage module 406 is configured to store the log data into a corresponding statistical data table according to the object information and the dimension information in the log data, and determine current data.

The calculation module 408 is configured to calculate a new statistical value corresponding to the statistical data table according to the historical statistical value, the data state change information, the numerical value increase and decrease information, and the data increase and decrease information of the current data.

The data summarization module 410 is configured to save the new statistical value into a third database.

In one embodiment, as shown in FIG. 5, the database data processing apparatus includes: a target determination module 502, a refresh table module 504, and a screening module 506, wherein:

The target determination module 502 is configured to determine a target object and a target dimension based on the log data.

The refresh table module 504 is configured to store the target object and the target dimension in a data refresh table.

The screening module 506 is configured to determine, according to the data refresh table, a statistical data table for which a statistical value needs to be calculated.

In one embodiment, the data pushing module 404 is configured to push the log data to a second database in the form of a message queue.

The modules in the database data processing apparatus described above may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored as software in a memory of the computer device, so that the processor can call them and perform the operations corresponding to each module.

In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in FIG. 6. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for running the operating system and computer programs stored in the non-volatile storage medium. The database of the computer device is used for storing dimension association data and related process data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a database data processing method.

Those skilled in the art will appreciate that the architecture shown in fig. 6 is merely a block diagram of a portion of the architecture associated with the disclosed aspects and is not limiting of the computer device to which the disclosed aspects apply, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.

In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.

In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:

and saving the new statistical value into a third database.

In one embodiment, the processor when executing the computer program further performs the steps of:

determining a target object and a target dimension based on the log data;

In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:

and saving the new statistical value into a third database.

In one embodiment, the computer program when executed by the processor further performs the steps of:

determining a target object and a target dimension based on the log data;

In one embodiment, a computer program product is provided comprising a computer program which, when executed by a processor, performs the steps of:

and saving the new statistical value into a third database.

determining a target object and a target dimension based on the log data;

It should be noted that, the user information (including, but not limited to, user equipment information, user personal information, etc.) and the data (including, but not limited to, data for analysis, stored data, presented data, etc.) related to the present disclosure are information and data authorized by the user or sufficiently authorized by each party.

Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in embodiments provided by the present disclosure may include at least one of non-volatile and volatile memory, among others. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magnetic random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (Phase Change Memory, PCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the various embodiments provided by the present disclosure may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors involved in the embodiments provided by the present disclosure may be general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic, quantum computing-based data processing logic, etc., without limitation thereto.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.

The foregoing examples express only a few embodiments of the present disclosure. Although they are described in specific detail, they are not to be construed as limiting the scope of the present disclosure. It should be noted that those skilled in the art can make variations and modifications without departing from the spirit of the disclosure, and these fall within the scope of the disclosure. Accordingly, the scope of the present disclosure should be determined from the following claims.

Claims

1. A database data processing method, the method comprising:

pushing the log data to a second database, wherein the second database comprises statistical data tables respectively corresponding to different dimensions of different objects; the second database further includes a data refresh table for:

determining a target object and a target dimension based on the log data;

determining a statistical data table needing to perform statistical value calculation according to the data refreshing table;

and saving the new statistical value into a third database.

2. The method of claim 1, wherein the second database is an open source distributed relational database and the third database is a distributed document storage database.

3. The method of claim 1, wherein the statistical data table comprises a classified statistical data table and an unclassified statistical data table.

4. The method of claim 1, wherein pushing the log data to a second database comprises:

pushing the log data to a second database in the form of a message queue.

5. A statistical apparatus for associating dimension data, the apparatus comprising:

the target determining module is used for determining a target object and a target dimension based on the log data;

the refreshing table module is used for storing the target object and the target dimension into a data refreshing table;

the screening module is used for determining a statistical data table needing to calculate a statistical value according to the data refreshing table;

6. The apparatus of claim 5, wherein the data pushing module is configured to push the log data to a second database in the form of a message queue.

7. The apparatus of claim 5, wherein the second database is an open source distributed relational database and the third database is a distributed document storage database.

8. The apparatus of claim 5, wherein the statistical data table comprises a classified statistical data table and an unclassified statistical data table.

9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 4 when the computer program is executed.

10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 4.