CN114238258A

CN114238258A - Database data processing method and device, computer equipment and storage medium

Info

Publication number: CN114238258A
Application number: CN202111452515.1A
Authority: CN
Inventors: 赵勇; 王金虎
Original assignee: Qichacha Technology Co ltd
Current assignee: Qichacha Technology Co ltd
Priority date: 2021-11-30
Filing date: 2021-11-30
Publication date: 2022-03-25
Anticipated expiration: 2041-11-30
Also published as: CN114238258B

Abstract

The present disclosure relates to a database data processing method, apparatus, computer device, storage medium, and computer program product. The method comprises the following steps: collecting log data in a first database; pushing the log data to a second database, wherein the second database comprises statistical data tables respectively corresponding to different dimensions of different objects; storing the log data into a corresponding statistical data table according to the object information and the dimension information in the log data, and determining the current data; calculating a new statistical value corresponding to the statistical data table according to the historical statistical value and the data state change information, the numerical value increase and decrease information and the data increase and decrease information of the current data; saving the new statistical value to a third database. By adopting the method, a large number of data cleaning personnel can be saved, and the process is carried out through a computer program, so that the error probability can be reduced compared with the processing by a large number of data cleaning personnel.

Description

Database data processing method and device, computer equipment and storage medium

Technical Field

The present disclosure relates to the field of electrical data processing technologies, and in particular, to a database data processing method, an apparatus, a computer device, a storage medium, and a computer program product.

Background

With the development of big data technology, techniques for compiling statistics on dimension associated data have emerged. Data associated with an object or primary key may be divided into a plurality of dimensions. Establishing information dimensions facilitates the classification, statistics, and effective use of the associated information. When compiling statistics on the information included in each dimension, quantities such as the number of items, the number of occurrences, and the number of records of specific content (for example, the number of core personnel, the number of lists, and the number of pieces of patent information) may be collectively referred to as a count.

In existing count calculation methods, because the data of each dimension has different content characteristics and statistical requirements, the data cleaning personnel responsible for each dimension generally write their own refresh logic to calculate the count value. A large number of data cleaning personnel are therefore required to participate in count calculation, which is inefficient and error-prone.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a database data processing method, an apparatus, a computer device, a computer readable storage medium, and a computer program product, which can efficiently and accurately perform count calculation.

In a first aspect, the present disclosure provides a database data processing method. The method comprises the following steps:

collecting log data in a first database, wherein the log data comprise object associated data, and the object associated data comprise dimension associated data;

pushing the log data to a second database, wherein the second database comprises statistical data tables respectively corresponding to different dimensions of different objects;

storing the log data into a corresponding statistical data table according to the object information and the dimension information in the log data, and determining the current data;

calculating a new statistical value corresponding to the statistical data table according to the historical statistical value and the data state change information, the numerical value increase and decrease information and the data increase and decrease information of the current data;

saving the new statistical value to a third database.

In one embodiment, the second database is an open source distributed relational database, and the third database is a distributed document storage database.

In one embodiment, the second database further comprises a data refresh table for:

determining a target object and a target dimension based on the log data;

storing the target object and the target dimension to a data refresh table;

and determining a statistical data table needing to be subjected to statistical value calculation according to the data refreshing table.

In one embodiment, the statistical data table includes a classified statistical data table and a non-classified statistical data table.

In one embodiment, the pushing the log data to the second database includes:

pushing the log data to the second database in the form of a message queue.

In a second aspect, the present disclosure also provides a database data processing apparatus. The apparatus comprises:

the data acquisition module is used for acquiring log data in a first database, wherein the log data comprises object associated data, and the object associated data comprises dimension associated data;

the data pushing module is used for pushing the log data to a second database, and the second database comprises statistical data tables respectively corresponding to different dimensions of different objects;

the data storage module is used for storing the log data into a corresponding statistical data table according to the object information and the dimension information in the log data and determining the current data;

the calculation module is used for calculating a new statistical value corresponding to the statistical data table according to a historical statistical value and the data state change information, the numerical value increase and decrease information and the data increase and decrease information of the current data;

and the data summarization module is used for saving the new statistical value to a third database.

In a third aspect, the present disclosure also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the following steps when executing the computer program:

saving the new statistical value to a third database.

In a fourth aspect, the present disclosure also provides a computer-readable storage medium. The computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:

saving the new statistical value to a third database.

In a fifth aspect, the present disclosure also provides a computer program product. The computer program product comprising a computer program which when executed by a processor performs the steps of:

saving the new statistical value to a third database.

According to the database data processing method, apparatus, computer device, storage medium, and computer program product described above, the statistical value is calculated from the log data rather than directly from the associated data of each dimension. The statistical value can therefore be calculated without writing refresh logic for the associated data of each dimension, which saves a large number of data cleaning personnel. Because the process is carried out by a computer program, the probability of errors can be reduced compared with processing by a large number of data cleaning personnel.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.

FIG. 1 is a diagram of an application environment of a database data processing method in one embodiment;

FIG. 2 is a schematic flow chart diagram illustrating a database data processing method according to one embodiment;

FIG. 3 is a schematic flow chart diagram illustrating a database data processing method according to another embodiment;

FIG. 4 is a block diagram showing the structure of a database data processing apparatus according to an embodiment;

FIG. 5 is a block diagram showing the structure of a database data processing apparatus according to another embodiment;

FIG. 6 is a diagram illustrating an internal structure of a computer device according to an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present disclosure more clearly understood, the present disclosure is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the disclosure and are not intended to limit the disclosure.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

The database data processing method provided by the embodiment of the disclosure can be applied to the application environment shown in fig. 1. Wherein the data storage system may store data that the server 102 needs to process. The data storage system may be integrated on the server 102, or may be located on the cloud or other network server. The server 102 may include one or more data acquisition terminals that acquire log data in a first database, the log data including object association data, the object association data including dimension association data. The server 102 pushes the log data to a second database comprising statistical data tables corresponding to different dimensions of different objects, respectively. The server 102 stores the log data into a corresponding statistical data table according to the object information and the dimension information in the log data, and determines the current data. The server 102 calculates a new statistical value corresponding to the statistical data table based on the historical statistical value and the data state change information, the numerical value increase/decrease information, and the data increase/decrease information of the current data. Server 102 saves the new statistical value to a third database. The server 102 may be implemented as a stand-alone server or as a server cluster comprised of multiple servers.

In one embodiment, as shown in fig. 2, a database data processing method is provided, which is described by taking the application environment in fig. 1 as an example, and includes the following steps:

S202, collecting log data in a first database, wherein the log data comprise object associated data, and the object associated data comprise dimension associated data.

The log data may refer to data for recording modified content of the database. The object association data may refer to data having an association relation with an object. Dimension association data may refer to data associated with the presence of an object in one or more information dimensions.

In particular, a dimension may refer to an information dimension. The first database may broadly refer to one or more databases storing dimension-related data. The log data may be a binlog log. The first database may include one or more service tables, and each service table may have a corresponding binlog log. Log data, which may include all binlog logs, in the first database is collected. The log data includes a modification record of the object association data. The object may be an individual or an organization, such as a company boss or a company. The dimension association data may be dimension association data of an individual, such as dimension association data of a company boss. The dimension association data may be dimension association data of an organization, such as dimension association data of a certain company. An object may have one or more associated information dimensions, and an associated information dimension may relate to one or more statistics. A particular piece of information may become dimension related information for one or more objects. A particular piece of information may affect the calculation of one or more statistical values. Dimensions of different objects may have the same dimension name.
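As a minimal sketch of what a collected log entry might carry, the following models one row-change event. The field names (`object_id`, `table`, `type`, `before`, `after`) and the shape of the decoded event are assumptions made for illustration; they are not the actual binlog format and are not specified by the disclosure.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class LogRecord:
    """One row-change event, loosely modeled on a row-based change log entry."""
    object_id: str                    # hypothetical key of the object (e.g. a company)
    dimension: str                    # hypothetical dimension name (e.g. "patent")
    op: str                           # "insert", "update" or "delete"
    before: Optional[Dict[str, Any]]  # row image before the change, if any
    after: Optional[Dict[str, Any]]   # row image after the change, if any


def parse_event(event: Dict[str, Any]) -> LogRecord:
    """Build a LogRecord from a hypothetical, already-decoded change event."""
    # Prefer the new row image; fall back to the old one for deletions.
    row = event.get("after") or event.get("before") or {}
    return LogRecord(
        object_id=str(row["object_id"]),
        dimension=event["table"],  # assumption: one service table per dimension
        op=event["type"],
        before=event.get("before"),
        after=event.get("after"),
    )
```

Here the service table name stands in for the dimension, which is one possible mapping; a real deployment could derive the dimension differently.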

S204, pushing the log data to a second database, wherein the second database comprises statistical data tables respectively corresponding to different dimensions of different objects.

In particular, the statistical data table may be referred to as a count table. Each dimension of each object may have a corresponding statistical data table. Statistical data tables may be of different types, and different types of statistical data tables may have different table structures according to actual needs. The log data is pushed to the second database; the full-field log data may be pushed.

S206, according to the object information and the dimension information in the log data, storing the log data into a corresponding statistical data table, and determining the current data.

Specifically, the log data is consumed and, according to the object information and the dimension information in the log data, stored into the statistical data table of the corresponding dimension of the corresponding object.
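The routing in step S206 can be sketched as follows, assuming each log record is a dictionary with hypothetical `object_id` and `dimension` keys and that a statistical data table is represented by a plain list keyed by the (object, dimension) pair. This is an in-memory illustration, not the storage layout of the second database.

```python
from typing import Any, Dict, List, Tuple

# Key: (object_id, dimension); value: the rows of that statistical data table.
StatTables = Dict[Tuple[str, str], List[Dict[str, Any]]]


def store_log_records(tables: StatTables,
                      records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Append each record to the table chosen by its object and dimension.

    Returns the records stored in this call, treated here as the "current data".
    """
    current: List[Dict[str, Any]] = []
    for rec in records:
        key = (rec["object_id"], rec["dimension"])
        # Create the table on first use, then append the record to it.
        tables.setdefault(key, []).append(rec)
        current.append(rec)
    return current
```

Keying tables by the pair rather than by dimension name alone reflects the statement that dimensions of different objects may share the same name.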

S208, calculating a new statistical value corresponding to the statistical data table according to the historical statistical value and the data state change information, the numerical value increase and decrease information and the data increase and decrease information of the current data.

Wherein, the historical statistic value may refer to a numerical value of the statistic value before recalculation. The data increase/decrease information may be information for increasing or decreasing the number of pieces of data.

Specifically, the historical statistical value is the latest value of the statistical value before recalculation, and may be stored in the third database. The log data in the statistical data tables is consumed, and the change amount of the statistical value corresponding to each statistical data table is determined according to the data state change information, the numerical value increase and decrease information, and the data increase and decrease information of the current data. The change amount is added to the corresponding historical statistical value to obtain the new statistical value for each statistical data table. For example, when the data state of a piece of dimension associated data is modified from valid to invalid, the change amount may be determined as -1, and adding -1 to the corresponding historical statistical value (e.g., count_X) gives the new statistical value count_X - 1. As another example, when a new piece of dimension associated data is added to the statistical data table, the change amount may be determined as 1, and adding 1 to the corresponding historical statistical value (e.g., count_Y) gives the new statistical value count_Y + 1. As a further example, when the statistical data table includes a record indicating that the number of staff has increased by two, the change amount may be determined as 2, and adding 2 to the corresponding historical statistical value (e.g., count_Z) gives the new statistical value count_Z + 2.
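The incremental calculation of step S208 can be illustrated with a short sketch. The record layout (a `kind` field with the values `state_change`, `value_change`, `row_added`, and `row_removed`, plus the accompanying fields) is a hypothetical encoding of the three kinds of change information; the disclosure does not prescribe one.

```python
from typing import Any, Dict, Iterable


def change_amount(rec: Dict[str, Any]) -> int:
    """Return the signed change that one current-data record contributes."""
    kind = rec["kind"]
    if kind == "state_change":
        # valid -> invalid removes one item; invalid -> valid restores it.
        if rec["old_valid"] and not rec["new_valid"]:
            return -1
        if not rec["old_valid"] and rec["new_valid"]:
            return 1
        return 0
    if kind == "value_change":
        # A counted numeric field changed, e.g. staff number went from 10 to 12.
        return rec["new_value"] - rec["old_value"]
    if kind == "row_added":
        return 1
    if kind == "row_removed":
        return -1
    raise ValueError(f"unknown record kind: {kind!r}")


def new_statistic(historical: int, current_data: Iterable[Dict[str, Any]]) -> int:
    """New statistical value = historical value + sum of all change amounts."""
    return historical + sum(change_amount(r) for r in current_data)
```

Because only the change amounts are summed, the cost of a refresh depends on the number of new log records rather than on the total size of the dimension's data.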

S210, storing the new statistical value into a third database.

Specifically, the calculated statistical values are collected and stored in a third database, so that the statistical values can be conveniently inquired and used.
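One way to picture step S210 is as an upsert of one document per object, with one field per dimension. The sketch below uses a plain dictionary as a stand-in for the third database; the document layout and the function name `save_statistic` are assumptions for illustration only.

```python
from typing import Dict

# Stand-in for the third database: object_id -> {dimension: statistical value}.
StatStore = Dict[str, Dict[str, int]]


def save_statistic(store: StatStore, object_id: str, dimension: str, value: int) -> None:
    """Insert or overwrite the statistical value for one object and dimension."""
    # Create the object's document on first write, then set the dimension field.
    document = store.setdefault(object_id, {})
    document[dimension] = value
```

Keeping all dimensions of an object in one document lets a query fetch every count for that object in a single lookup.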

In the database data processing method described above, the statistical value is calculated from the log data rather than directly from the associated data of each dimension. The statistical value can therefore be calculated without writing refresh logic for the associated data of each dimension, which saves a large number of data cleaning personnel. Because the process is carried out by a computer program, the probability of errors can be reduced compared with processing by a large number of data cleaning personnel.

In one embodiment, the second database is an open source distributed relational database and the third database is a distributed document storage database.

Specifically, the second database is an open-source distributed relational database, such as a TiDB database (TiDB is an open-source distributed HTAP database, where HTAP stands for Hybrid Transactional and Analytical Processing). The third database is a distributed document storage database, such as a MongoDB database (MongoDB is a distributed document storage database written in C++).

In this embodiment, the open-source distributed relational database is used as the second database, so that the beneficial effects of better meeting the large-scale data processing requirements and better storing the statistical data table can be achieved. By using the distributed document storage database as the third database, the advantageous effect of facilitating the query and use of the statistical value data can be achieved.

In one embodiment, the second database further includes a data refresh table, and calculating the new statistical value corresponding to the statistical data table according to the log data in the statistical data table and the historical statistical value includes:

S302, determining a target object and a target dimension based on the log data.

S304, storing the target object and the target dimension into a data refresh table.

S306, determining a statistical data table needing to be subjected to statistical value calculation according to the data refreshing table.

Specifically, an object included in the log data is determined as the target object, and an information dimension included in the log data is determined as the target dimension, the target dimension being associated with the target object. The target object and the target dimension are stored in the data refresh table. According to the data refresh table, a statistical data table whose data has changed is determined as a target statistical data table. This indicates where statistical values need to be calculated, so that only the statistical values corresponding to target statistical data tables are calculated.
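Steps S302 to S306 can be sketched with a set acting as the data refresh table. The record keys `object_id` and `dimension`, and the choice to clear the refresh table after reading it, are assumptions of this sketch rather than details given in the disclosure.

```python
from typing import Dict, Iterable, List, Set, Tuple

Target = Tuple[str, str]  # (object_id, dimension)


def record_targets(refresh_table: Set[Target],
                   log_records: Iterable[Dict[str, str]]) -> None:
    """S302/S304: note which (object, dimension) pairs the log data touched."""
    for rec in log_records:
        refresh_table.add((rec["object_id"], rec["dimension"]))


def tables_to_recalculate(refresh_table: Set[Target],
                          all_tables: Iterable[Target]) -> List[Target]:
    """S306: keep only the tables named in the refresh table, then clear it."""
    selected = [key for key in all_tables if key in refresh_table]
    refresh_table.clear()  # assumption: each entry triggers one recalculation
    return selected
```

Using a set means repeated changes to the same object and dimension collapse into a single pending refresh.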

In this embodiment, the statistical data table that needs to be subjected to statistical value calculation is screened, so that the beneficial effects of reducing the calculation amount and accelerating the calculation speed of the statistical value can be achieved.

In one embodiment, the statistical data table includes a classified statistical data table and a non-classified statistical data table.

The classified statistical data table may refer to a statistical data table that further classifies the stored information. The non-classified statistical data table may refer to a statistical data table that does not further classify the stored information.

Specifically, the statistical data table includes a classified statistical data table and a non-classified statistical data table. For a classified statistical data table, each sub-category may correspond to its own statistical value. A statistical data table of the appropriate type is used to store the corresponding dimension associated data according to actual needs.
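The difference between the two table types can be shown with a small sketch: a non-classified table yields a single count, while a classified table yields one count per sub-category. The `category` field name and the sample sub-categories are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def count_unclassified(rows: Iterable[Dict[str, str]]) -> int:
    """A non-classified table has one statistical value: the number of rows."""
    return sum(1 for _ in rows)


def count_classified(rows: Iterable[Dict[str, str]],
                     category_field: str = "category") -> Dict[str, int]:
    """A classified table has one statistical value per sub-category."""
    return dict(Counter(row[category_field] for row in rows))
```

For instance, a patent dimension stored in a classified table could report separate counts for each patent type instead of one total.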

In this embodiment, the statistical data table includes a classified statistical data table and a non-classified statistical data table, so that the beneficial effects of storing dimension associated data more clearly, facilitating data consumption, and facilitating statistical value calculation can be achieved.

In one embodiment, said pushing said log data to a second database comprises:

pushing the log data to the second database in the form of a message queue.

Specifically, the log data is pushed to the second database in the form of a message queue. The message queue may be a Kafka message queue (Kafka is a distributed message queue system).
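The producer and consumer roles can be illustrated with Python's standard `queue.Queue` standing in for the message queue and a list standing in for the second database. This is only an in-process analogy under those assumptions; it does not use or reproduce the Kafka API.

```python
import json
import queue
from typing import Any, Dict, List


def push_logs(q: "queue.Queue[str]", records: List[Dict[str, Any]]) -> None:
    """Producer side: serialize each log record and enqueue it as a message."""
    for rec in records:
        q.put(json.dumps(rec, sort_keys=True))


def consume_logs(q: "queue.Queue[str]", second_db: List[Dict[str, Any]]) -> int:
    """Consumer side: drain the queue into the second database stand-in.

    Returns the number of messages consumed in this call.
    """
    consumed = 0
    while True:
        try:
            message = q.get_nowait()
        except queue.Empty:
            return consumed
        second_db.append(json.loads(message))
        consumed += 1
```

Decoupling the two sides through a queue lets collection continue even when the consumer writing to the second database is temporarily slower.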

In this embodiment, the log data is pushed to the second database in the form of a message queue, so that the beneficial effects of improving the data pushing stability and efficiency and meeting the large-scale data processing requirements can be achieved.

It should be understood that, although the steps in the flowcharts of the embodiments described above are displayed sequentially as indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages are not necessarily performed sequentially, and may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

Based on the same inventive concept, the embodiment of the present disclosure further provides a database data processing apparatus for implementing the above-mentioned database data processing method. The implementation scheme for solving the problem provided by the device is similar to the implementation scheme described in the above method, so the specific limitations in one or more embodiments of the database data processing device provided below may refer to the limitations on the database data processing method in the above description, and are not described herein again.

In one embodiment, as shown in fig. 4, there is provided a database data processing apparatus including: data acquisition module 402, data push module 404, data storage module 406, calculation module 408 and data summarization module 410, wherein:

a data acquisition module 402, configured to collect log data in a first database, where the log data includes object association data, and the object association data includes dimension association data;

a data pushing module 404, configured to push the log data to a second database, where the second database includes statistical data tables respectively corresponding to different dimensions of different objects;

a data storage module 406, configured to store the log data into a corresponding statistical data table according to the object information and the dimension information in the log data, and determine current data;

a calculation module 408, configured to calculate a new statistical value corresponding to the statistical data table according to the historical statistical value and the data state change information, the numerical value increase/decrease information, and the data increase/decrease information of the current data; and

a data summarization module 410, configured to save the new statistical value to a third database.

In one embodiment, as shown in fig. 5, the database data processing apparatus includes: a target determination module 502, a refresh table module 504, and a screening module 506, wherein:

a target determination module 502, configured to determine a target object and a target dimension based on the log data;

a refresh table module 504, configured to store the target object and the target dimension to a data refresh table; and

a screening module 506, configured to determine, according to the data refresh table, a statistical data table for which statistical value calculation needs to be performed.

In one embodiment, the data pushing module 404 is configured to push the log data to the second database in the form of a message queue.

The modules in the database data processing apparatus can be implemented wholly or partially by software, hardware, or a combination thereof. The modules can be embedded in, or be independent of, a processor in the computer device in hardware form, or can be stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.

In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 6. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing dimension association data and related processing data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a database data processing method.

Those skilled in the art will appreciate that the architecture shown in fig. 6 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.

In one embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.

In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:

saving the new statistical value to a third database.

In one embodiment, the processor, when executing the computer program, further performs the steps of:

determining a target object and a target dimension based on the log data;

storing the target object and the target dimension to a data refresh table;

In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:

saving the new statistical value to a third database.

In one embodiment, the computer program when executed by the processor further performs the steps of:

determining a target object and a target dimension based on the log data;

storing the target object and the target dimension to a data refresh table;

In one embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, performs the steps of:

saving the new statistical value to a third database.

determining a target object and a target dimension based on the log data;

storing the target object and the target dimension to a data refresh table;

It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present disclosure are information and data that are authorized by the user or sufficiently authorized by each party.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, databases, or other media used in the embodiments provided by the present disclosure may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetic Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory can include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others. The databases involved in the embodiments provided by the present disclosure may include at least one of relational and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database and the like. The processors referred to in the embodiments provided by the present disclosure may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and the like.

The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to be within the scope of the present specification.

The above-mentioned embodiments express only several implementations of the present disclosure. Although their description is specific and detailed, they are not to be construed as limiting the scope of the present disclosure. It should be noted that those skilled in the art can make various changes and modifications without departing from the concept of the present disclosure, and these changes and modifications all fall within the scope of the present disclosure. Therefore, the protection scope of the present disclosure should be subject to the appended claims.

Claims

1. A database data processing method, the method comprising:

saving the new statistical value to a third database.

2. The method of claim 1, wherein the second database further comprises a data refresh table for:

determining a target object and a target dimension based on the log data;

storing the target object and the target dimension to a data refresh table;

3. The method of claim 1, wherein the second database is an open source distributed relational database and the third database is a distributed document storage database.

4. The method of claim 1, wherein the statistical data table comprises a classified statistical data table and a non-classified statistical data table.

5. The method of claim 1, wherein pushing the log data to a second database comprises:

pushing the log data to the second database in the form of a message queue.

6. A statistical apparatus for correlating dimensional data, the apparatus comprising:

7. The apparatus of claim 6, further comprising:

a target determination module for determining a target object and a target dimension based on the log data;

the refreshing table module is used for storing the target object and the target dimension to a data refreshing table;

and the screening module is used for determining, according to the data refresh table, a statistical data table for which statistical value calculation needs to be performed.

8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 5.

9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 5.

10. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 5 when executed by a processor.