CN114791915A

CN114791915A - Data aggregation method and device, computer equipment and storage medium

Info

Publication number: CN114791915A
Application number: CN202210708780.XA
Authority: CN
Inventors: 张民遐; 刘冲
Original assignee: Shenzhen Gaodeng Computer Technology Co ltd
Current assignee: Shenzhen Gaodeng Computer Technology Co ltd
Priority date: 2022-06-22
Filing date: 2022-06-22
Publication date: 2022-07-26
Anticipated expiration: 2042-06-22
Also published as: CN114791915B

Abstract

The application relates to a data collection method, a data collection device, computer equipment and a storage medium. The method comprises the following steps: when a data collection request is obtained, responding to the data collection request and determining request parameters of the data collection request, the request parameters comprising a time parameter; determining a target collection strategy corresponding to the data collection request according to the time parameter; determining, among multiple data sources, a target data source corresponding to the target collection strategy, and screening target collection data corresponding to the request parameters from the target data source; and performing collection processing on the target collection data to obtain a data collection result. By adopting the method, the accuracy of data collection can be improved.

Description

Data aggregation method and device, computer equipment and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a data aggregation method and apparatus, a computer device, and a storage medium.

Background

With the development of computer technology, users often perform transactions through online payment, which generates multiple pieces of business data. After an enterprise acquires the business data of its users, it needs to perform data aggregation processing, that is, to summarize the business data into the enterprise's designated account.

At present, when an enterprise needs to perform collection processing on business data, the enterprise generally directly obtains pre-stored data from a database, and then performs collection processing on the obtained data one by one. However, since the business data needs to be stored in the database in advance before being collected, the number of the business data in the real transaction process increases in real time, and thus, the problem that all the data cannot be collected accurately is caused. Therefore, how to aggregate the service data with the quantity increasing in real time so as to improve the accuracy of data aggregation is a problem to be solved by the present disclosure.

Disclosure of Invention

In view of the foregoing, it is necessary to provide a data aggregation method, apparatus, computer device, computer readable storage medium, and computer program product capable of improving accuracy of data aggregation.

In a first aspect, the present application provides a data aggregation method. The method comprises the following steps:

when a data collection request is obtained, responding to the data collection request, and determining request parameters of the data collection request; the request parameter comprises a time parameter;

determining a target collection strategy corresponding to the data collection request according to the time parameter;

determining a target data source corresponding to the target collection strategy in a plurality of data sources, and screening target collection data corresponding to the request parameters from the target data source;

and collecting the target collection data to obtain a data collection result.

In one embodiment, the determining a target aggregation policy corresponding to the data aggregation request according to the time parameter includes: acquiring a target time period; when the time parameter comprises the target time period, determining that the target collection strategy is a first collection strategy; the first collection strategy is a strategy for dynamically collecting the data in the target time period; the determining a target data source of a plurality of data sources corresponding to the target aggregation policy comprises: taking a preset database and a dynamic database as target data sources corresponding to the first aggregation strategy; the dynamic database stores data dynamically collected on distributed clusters.

In one embodiment, the determining a target aggregation policy corresponding to the data aggregation request according to the time parameter includes: acquiring a target time period; when the time parameter does not comprise the target time period, determining that the target collection strategy is a second collection strategy; the second aggregation strategy is a strategy for aggregating data in a non-target time period; the determining a target data source of a plurality of data sources corresponding to the target aggregation policy comprises: and taking a preset database as a target data source corresponding to the second aggregation strategy.

In one embodiment, the data aggregation method is performed by a data aggregation component; the data aggregation component comprises a storage module configured with the target data source; the storage module comprises at least one main node and at least one slave node managed by each main node; the screening out the target collection data corresponding to the request parameters from the target data source comprises the following steps: when the target collection strategy is a first collection strategy, acquiring a target association relation between a pre-configured main node and a request type; screening out a target main node corresponding to the data collection request from at least one main node of the storage module according to the target association relation; and screening target slave nodes from at least one slave node managed by the target master node according to the time parameter, and reading target collection data from the target slave nodes.

In one embodiment, the data aggregation method is performed by a data aggregation component; the data collection component comprises a calculation module and a preset database; the preset database comprises a collection data table; the collecting processing of the target collection data to obtain a data collection result comprises the following steps: triggering the computing module according to the request type in the request parameter, and screening out a target code file required by executing the collection processing from the code file set; initializing a collection data table in the preset database to obtain a temporary data table; filling the target collection data into the temporary data table to obtain a target data table; and running the codes indicated by the target code file to perform collection processing on the target data table to obtain a data collection result.

In one embodiment, the target collection data includes an account identification and at least one item category field; the collecting processing of the target collection data to obtain a data collection result comprises the following steps: determining an account identifier and at least one item category field corresponding to each target set data; grouping the target collection data according to the account identification to obtain at least one group of target collection data sets; the target collection dataset comprises at least one target collection data; for each group of target collection data sets in the multiple groups of target collection data sets, respectively superposing field data with the same item type field in the current target collection data set to obtain a collection sub-result corresponding to the current target collection data set; and integrating the corresponding collection sub-results of each target collection data set to obtain a data collection result.

In one embodiment, the data aggregation method is performed by a data aggregation component; the computing module and the storage module which are comprised by the data aggregation component run on the distributed cluster; the storage module comprises a plurality of main nodes; the computing module and the storage module carry out associated configuration through a code file; the association process of the code file comprises the following steps: through the computing module, when a code file association task is obtained, determining an operating environment and a code file corresponding to the code file association task, and sending the code file to the storage module; determining resources required by executing the code file according to the running environment through the computing module, and sending the resources to the storage module; determining a main node corresponding to the resource from a plurality of main nodes and a task type of the code file association task through the storage module, and associating the task type with the main node to obtain a target association relation; the task type is the same as the corresponding request type when the related request is executed through the code file; and acquiring the code file sent by the computing module through the storage module, and configuring at least one slave node in the master node according to the code file.

In one embodiment, the method further comprises: displaying a data collection interface through a terminal, a time area and an aggregation control being displayed in the data aggregation interface; responding, through the terminal, to a selection operation on the time area, and determining the time parameter selected by the selection operation; responding, through the terminal, to a trigger operation on the aggregation control, and generating a data collection request according to the time parameter; and acquiring, through the terminal, the data collection result obtained in response to the data collection request, and displaying the data collection result.

In one embodiment, the data aggregation result comprises a plurality of pieces of aggregated data; the aggregated data comprises an account identification and item category data; a history collection list is displayed in the data collection interface and comprises data information and header information; the displaying the data aggregation result comprises: when a data aggregation result is received, deleting the data information in the history collection list to obtain an empty list comprising only the header information; determining a target collection area for the aggregated data in the empty list according to the account identification in the aggregated data; and filling the item category data in the aggregated data into the target collection area.

In a second aspect, the present application further provides a data aggregation device. The device comprises:

the request acquisition module is used for responding to the data collection request and determining request parameters of the data collection request when the data collection request is acquired; the request parameter comprises a time parameter;

the strategy determining module is used for determining a target collection strategy corresponding to the data collection request according to the time parameter;

the result determining module is used for determining a target data source corresponding to the target collection strategy in a plurality of data sources and screening target collection data corresponding to the request parameters from the target data source; and collecting the target collection data to obtain a data collection result.

In a third aspect, the application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the following steps when executing the computer program:

and collecting the target collection data to obtain a data collection result.

In a fourth aspect, the present application further provides a computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of:

and collecting the target collection data to obtain a data collection result.

In a fifth aspect, the present application further provides a computer program product. The computer program product comprising a computer program which when executed by a processor performs the steps of:

and collecting the target collection data to obtain a data collection result.

According to the data collection method, the data collection device, the computer equipment, the storage medium and the computer program product, when the data collection request is obtained, the request parameters of the data collection request are determined by responding to the data collection request, wherein the request parameters comprise time parameters; further, according to the time parameter, a target collection strategy corresponding to the data collection request can be determined; and determining a target data source corresponding to the target collection strategy in the data sources, and screening target collection data corresponding to the request parameters from the target data source, so that the target collection data can be collected to obtain a data collection result. Because the target data source is determined based on the target collection strategy after the target collection strategy is determined, compared with the traditional mode that only the database of the pre-stored data is used as the data source, the target collection data corresponding to the request parameters can be accurately screened from the target data source, and the accuracy of data collection is improved; meanwhile, different target collection strategies can be determined according to time parameters, so that the problem of insufficient flexibility of data collection is solved.

Drawings

FIG. 1 is a diagram of an application environment of the data aggregation method in one embodiment;

FIG. 2 is a schematic flow chart diagram illustrating a data aggregation method in one embodiment;

FIG. 3 is a schematic flow diagram that illustrates the filtering of target aggregate data in one embodiment;

FIG. 4 is a schematic diagram of a storage module in one embodiment;

FIG. 5 is a schematic flowchart of a code file association configuration task in one embodiment;

FIG. 6 is a schematic diagram of a code file association configuration task in one embodiment;

FIG. 7 is a schematic diagram of a data aggregation interface in one embodiment;

FIG. 8 is a schematic flow chart diagram illustrating a data aggregation method in accordance with another embodiment;

FIG. 9 is a block diagram showing the structure of a data aggregating apparatus in one embodiment;

FIG. 10 is an internal structural diagram of a computer device in one embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more clearly understood, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

The data aggregation method provided by the embodiments of the application can be applied to the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 via a network. A data storage system may store the data that the server 104 needs to process. The data storage system may be integrated on the server 104, or may be placed on the cloud or on another network server. The terminal 102 is configured to generate a data aggregation request and send the data aggregation request to the server 104. When the data collection request is obtained, the server 104 determines a target collection policy corresponding to the data collection request. The server 104 is further configured to determine a target data source corresponding to the target collection policy, screen out target collection data corresponding to the request parameters from the target data source, and collect the target collection data to obtain a data collection result. The terminal 102 may be, but is not limited to, a personal computer, a notebook computer, a smart phone, a tablet computer, an Internet of Things device or a portable wearable device. The Internet of Things device may be a smart speaker, a smart television, a smart air conditioner, a smart in-vehicle device, and the like. The portable wearable device may be a smart watch, a smart bracelet, a head-mounted device, and the like. The server 104 may be implemented as a stand-alone server or as a server cluster composed of multiple servers.

In one embodiment, as shown in FIG. 2, a data aggregation method is provided. The method is described by taking its application to the server in FIG. 1 as an example, and includes the following steps:

Step 202, when a data collection request is obtained, determining request parameters of the data collection request in response to the data collection request; the request parameters include a time parameter.

The request parameters are used to trigger the server to execute the data collection task corresponding to the data collection request; when the request parameters differ, the tasks that the server is triggered to execute also differ.

Specifically, when the server obtains the data collection request, the server may parse the data collection request to obtain its request parameters. The time parameter included in the request parameters is obtained by parsing the time selected by the user in advance; for example, the data recognizable by the server after parsing may be January 1, 2022 to January 2, 2022.
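As a minimal sketch of the parsing described above, the following Python example extracts a time parameter from a request body. The JSON layout, the field names `type`, `start` and `end`, and the `RequestParams` class are assumptions made purely for illustration; the application does not specify a request format.

```python
import json
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RequestParams:
    request_type: str  # hypothetical field, e.g. "aggregation"
    start: date        # first day selected by the user
    end: date          # last day selected by the user (inclusive)


def parse_request(raw: str) -> RequestParams:
    """Parse a raw request body (assumed to be JSON) into request parameters."""
    body = json.loads(raw)
    start = date.fromisoformat(body["start"])
    end = date.fromisoformat(body["end"])
    if end < start:
        raise ValueError("end date precedes start date")
    return RequestParams(body.get("type", "aggregation"), start, end)


# The time parameter from the example in the text: January 1 to January 2, 2022.
params = parse_request(
    '{"type": "aggregation", "start": "2022-01-01", "end": "2022-01-02"}'
)
```

Keeping the parsed parameters in one immutable object is only one possible design; it lets the later steps (strategy selection and data screening) share the same time parameter.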

In one embodiment, the time parameter may be used to determine a target aggregation policy corresponding to each of the different data aggregation requests, and may also be used to filter the target aggregation data.

In one embodiment, when a user needs to query data collection conditions of different time periods, a collection control in a data collection interface in a terminal can be triggered, so that the terminal generates a data collection request and sends the data collection request to a server.

Step 204, determining a target collection strategy corresponding to the data collection request according to the time parameter.

Wherein the time parameter comprises at least one of a target time period and a historical time period; the target aggregation policy may include a first aggregation policy and a second aggregation policy.

In one embodiment, determining a target aggregation policy corresponding to the data aggregation request according to the time parameter includes: acquiring a target time period; when the time parameter comprises a target time period, determining that the target collection strategy is a first collection strategy; the first aggregation policy is a policy for dynamically aggregating data of a target time period.

The target time period may be a time period preset by the user, or a dynamically changing time period that tracks the current time at which the server receives the data aggregation request. For example, when the server receives the data collection request at 8:00 on the 2nd, the target time period is from 0:00 on the 2nd to 8:00 on the 2nd; when the server receives a data collection request again at 10:00 on the 2nd, the target time period changes to 0:00 on the 2nd to 10:00 on the 2nd. The time at which the data aggregation request is received corresponds in real time to the standard time of the national time service center.
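The dynamically changing target time period in this example can be sketched as follows. The function name `target_time_period` and the choice of January 2022 for "the 2nd" are illustrative assumptions, and the sketch covers only the dynamic case, not a period preset by the user.

```python
from datetime import datetime


def target_time_period(received_at: datetime) -> tuple[datetime, datetime]:
    """Return (0:00 of the day of receipt, time the request was received)."""
    day_start = received_at.replace(hour=0, minute=0, second=0, microsecond=0)
    return day_start, received_at


# Request received at 8:00 on the 2nd, then again at 10:00 on the 2nd.
first = target_time_period(datetime(2022, 1, 2, 8, 0))
second = target_time_period(datetime(2022, 1, 2, 10, 0))
```

The start of the period stays fixed at 0:00 while the end follows the time of receipt, which is why the same query issued later in the day covers more data.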

Specifically, the server retrieves the time at which the data collection request was acquired through a pre-configured time retrieval module, and obtains the target time period according to the retrieval result. The server compares the time parameter in the data collection request with the target time period, and determines the target collection strategy to be the first collection strategy when the time parameter includes the target time period. The first aggregation strategy is a strategy for dynamically aggregating data in the target time period, that is, a strategy for aggregating data over a period that changes as the target time period changes. For example, if the target time period is from 0:00 on the 2nd to 8:00 on the 2nd and the time parameter is from the 1st to the 2nd, the target aggregation policy may be determined to be the first aggregation policy.

In one embodiment, determining a target aggregation policy corresponding to the data aggregation request according to the time parameter includes: acquiring a target time period; when the time parameter does not comprise the target time period, determining that the target collection strategy is a second collection strategy; the second aggregation policy is a policy for aggregating data in a non-target time period.

Specifically, the server acquires the target time period through the pre-configured time retrieval module and compares the time parameter with the target time period. When the time parameter does not include the target time period, the server determines the target collection strategy to be the second collection strategy, which may be regarded as a strategy for collecting data in a historical time period. For example, when the target time period is from 0:00 on the 2nd to 8:00 on the 2nd and the time parameter is the 1st, the target aggregation policy may be determined to be the second aggregation policy.

In this embodiment, different aggregation strategies can be determined from the time parameter and the target time period, so that the target data source can subsequently be screened out accurately based on the chosen strategy. Providing several possible aggregation strategies improves the flexibility of data aggregation.
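The strategy selection can be sketched in Python as below. Interpreting "the time parameter includes the target time period" as an overlap between two time ranges is an assumption of this sketch, as are the `Strategy` and `choose_strategy` names.

```python
from datetime import datetime
from enum import Enum


class Strategy(Enum):
    FIRST = "dynamic aggregation of the target time period"
    SECOND = "aggregation of a non-target (historical) time period"


def choose_strategy(param_start: datetime, param_end: datetime,
                    target_start: datetime, target_end: datetime) -> Strategy:
    """Pick the first strategy when the requested range overlaps the target period."""
    overlaps = param_start <= target_end and target_start <= param_end
    return Strategy.FIRST if overlaps else Strategy.SECOND


# Target time period: 0:00 to 8:00 on the 2nd (January 2022 chosen for illustration).
target = (datetime(2022, 1, 2, 0, 0), datetime(2022, 1, 2, 8, 0))

# Time parameter "from the 1st to the 2nd" reaches into the target period.
mixed = choose_strategy(datetime(2022, 1, 1), datetime(2022, 1, 2, 23, 59, 59), *target)

# Time parameter "the 1st" lies entirely before the target period.
historical = choose_strategy(datetime(2022, 1, 1), datetime(2022, 1, 1, 23, 59, 59), *target)
```

Under these assumptions `mixed` is the first strategy and `historical` is the second, matching the two worked examples in the text.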

In one embodiment, a collection strategy selection interface is displayed in the terminal, and a target collection strategy selected by a selection operation is determined in response to the selection operation of a user on the collection strategy selection interface.

Step 206, determining a target data source corresponding to the target collection strategy in the plurality of data sources, and screening target collection data corresponding to the request parameter from the target data source.

And when the target collection strategies are different, the corresponding target data sources are also different.

Specifically, the server obtains a correspondence between target aggregation policies and target data sources, where the correspondence includes a first correspondence and a second correspondence. The server can determine a first target data source corresponding to the first aggregation policy according to the first correspondence, and a second target data source corresponding to the second aggregation policy according to the second correspondence. The first target data source may be a preset database and a dynamic database, and the second target data source may be a preset database. Once the server has determined the target data source, the target collection data can be screened from the target data source according to the time parameter in the request parameters.

In one embodiment, the user may configure the correspondence between the target aggregation policy and the target data source in advance.

In one embodiment, determining a target data source of the plurality of data sources corresponding to the target aggregation policy comprises: taking a preset database and a dynamic database as target data sources corresponding to the first aggregation strategy; the dynamic database stores data dynamically collected on the distributed cluster. Therefore, when the target collection strategy is the first collection strategy, the target collection data can be screened from the preset database and the dynamic database respectively according to the time parameter.

In one embodiment, if the time parameter only includes the target time period, the server only uses the dynamic database as the target data source corresponding to the first aggregation policy.

The dynamic database stores data dynamically acquired on the distributed cluster in a target time period, namely, when the target time period changes, the screened target collection data are different. Because the dynamic database adopts a distributed cluster form, the dynamic database can be ensured to realize load balance and can bear larger data access amount.

In one embodiment, determining a target data source of the plurality of data sources corresponding to the target aggregation policy comprises: and taking the preset database as a target data source corresponding to the second aggregation strategy.

Specifically, when an enterprise manages the business data of its users, a T+1 management mode is generally adopted: today's business data must first be stored in the preset database, and only on the next day can the business data be read from the preset database and subjected to data aggregation processing.
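The T+1 rule can be expressed as a one-line check. This is only an illustrative sketch; the function name `readable_from_preset` is hypothetical and the application does not describe how the rule is enforced.

```python
from datetime import date, timedelta


def readable_from_preset(business_day: date, today: date) -> bool:
    """Under T+1, data for day T can be read from the preset database from day T+1 on."""
    return today >= business_day + timedelta(days=1)


same_day = readable_from_preset(date(2022, 1, 2), today=date(2022, 1, 2))
next_day = readable_from_preset(date(2022, 1, 2), today=date(2022, 1, 3))
```

Here `same_day` is `False` and `next_day` is `True`, which is why data of the current day has to come from the dynamic database rather than the preset database.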

In this embodiment, the target data source corresponding to the target collection strategy is screened out from the different data sources, which ensures that the target collection data is acquired accurately. In addition, because the first collection strategy includes data dynamically acquired from the dynamic database, the subsequent collection processing can still be performed when the data changes in real time.
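The mapping from time parameter to target data sources described in the embodiments above can be sketched as follows. The two boolean inputs and the source names `preset_database` and `dynamic_database` are illustrative assumptions; in practice the correspondence may be configured by the user in advance.

```python
def select_data_sources(covers_target: bool, covers_history: bool) -> list[str]:
    """Map what the time parameter covers to the data sources to query."""
    if covers_target and covers_history:
        # First strategy: historical part from the preset database,
        # current part from the dynamic database.
        return ["preset_database", "dynamic_database"]
    if covers_target:
        # First strategy, time parameter limited to the target time period.
        return ["dynamic_database"]
    # Second strategy: historical data only.
    return ["preset_database"]


sources = select_data_sources(covers_target=True, covers_history=True)
```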

In one embodiment, when the server determines the target data source, a plurality of documents associated with the time parameter are screened from the target data source, and information in each document is analyzed respectively to obtain target collection data corresponding to each document.

Step 208, collecting the target collection data to obtain a data collection result.

Collection processing, also called data summarization, is the process of summarizing various business documents, financial documents and the like into an enterprise's designated account and performing statistical calculation on the business data included in those documents.

Specifically, the server parses the target collection data to obtain the account identifier and at least one item category field in the target collection data. The account identifier is used to judge whether multiple pieces of target collection data are collected into the same account; the item category field is used to judge how multiple pieces of target collection data are classified for statistical calculation. For example, the item category fields may be a recharge amount of 50 and a service fee of 10, where 50 is the field data in the recharge amount field. The server collects the target collection data according to the account identifier, that is, it collects the target collection data with the same account identifier into the same account. For the target collection data under the same account, the server performs statistical calculation on the field data by item category field, that is, it performs statistical calculation on the field data of the recharge amount field, on the field data of the service fee field, and so on. When the statistical calculation of the field data in the item category fields has been completed for the target collection data of every account, the data collection result is obtained.
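The grouping and statistical calculation can be sketched as below, assuming that the statistical calculation is a simple sum, that each piece of target collection data is a dictionary, and that amounts are integers. The field names `account_id`, `recharge_amount` and `service_fee` are hypothetical.

```python
from collections import defaultdict


def aggregate(records: list[dict]) -> dict[str, dict[str, int]]:
    """Group records by account and sum the field data of each item category field."""
    totals: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for record in records:
        account = record["account_id"]
        for field, value in record.items():
            if field != "account_id":
                totals[account][field] += value
    # Convert the nested defaultdicts into plain dictionaries for the result.
    return {account: dict(fields) for account, fields in totals.items()}


records = [
    {"account_id": "A", "recharge_amount": 50, "service_fee": 10},
    {"account_id": "A", "recharge_amount": 30, "service_fee": 5},
    {"account_id": "B", "recharge_amount": 20},
]
result = aggregate(records)
# result == {"A": {"recharge_amount": 80, "service_fee": 15},
#            "B": {"recharge_amount": 20}}
```

Each inner dictionary plays the role of a collection sub-result for one account, and the outer dictionary is the integrated data collection result.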

In the data collection method, when a data collection request is obtained, request parameters of the data collection request are determined by responding to the data collection request, wherein the request parameters comprise time parameters; further, according to the time parameter, a target collection strategy corresponding to the data collection request can be determined; and determining a target data source corresponding to the target collection strategy in the plurality of data sources, and screening target collection data corresponding to the request parameters from the target data source, so that the target collection data can be collected to obtain a data collection result. Because the target data source is determined based on the target collection strategy after the target collection strategy is determined, compared with the traditional mode that only a database of pre-stored data is used as the data source, the target collection data corresponding to the request parameters can be accurately screened from the target data source, and the accuracy of data collection is improved; meanwhile, different target collection strategies can be determined according to time parameters, so that the problem of insufficient flexibility of data collection is solved.

In one embodiment, the data aggregation method is performed by a data aggregation component; the data aggregation component comprises a storage module configured with a dynamic database. As shown in FIG. 3, screening out the target aggregation data corresponding to the request parameters from the target data source comprises the following steps:

Step 302, when the target aggregation policy is the first aggregation policy, acquiring a target association relationship between a pre-configured master node and the request type.

The request type is the type corresponding to the data collection request and the type corresponding to the code file required to execute the collection processing. The storage module comprises at least one master node and at least one slave node managed by each master node; each master node is responsible for cluster management and enables high availability of the architecture, while the slave nodes are responsible for data storage and for providing data read-write services. FIG. 4 is a schematic diagram of the storage module.

The target association relationship is established after a user triggers a code file association task for a given task type, whereupon the master node and the task type in the storage module can be configured. Different task types can be associated with the same master node or with different master nodes. The task type is a type associated with the request parameters; for example, when the request parameters correspond to a data aggregation request, the data aggregation request corresponds to an aggregation task type. Because the storage module is configured with the dynamic database, the server can, when the target collection policy is the first collection policy, screen the target collection data corresponding to the request parameters from the dynamic database based on the storage module.

In one embodiment, when the time parameter includes a target time period and a historical time period, the server filters target collection data corresponding to the request parameter from the dynamic database based on the storage module, and simultaneously filters target collection data corresponding to the request parameter from the preset database.

In one embodiment, the storage module may be a columnar distributed database, such as a Kudu database. The master node in the Kudu database is responsible for managing metadata, namely basic information of a plurality of table segments (tablets); the table segments are equivalent to slave nodes and are used to support partition viewing, capacity expansion and high data availability. The Kudu database can provide low-latency random reads and writes and efficient data analysis at the same time.

Step 304, screening out a target master node corresponding to the data collection request from the at least one master node of the storage module according to the target association relationship.

Specifically, referring to fig. 4, after determining the target association relationship, the server may directly screen the multiple host nodes of the storage module to obtain a target host node corresponding to the data aggregation task, that is, a target host node corresponding to the data aggregation request.

In one embodiment, the user may configure the target association relationship between the master node and the task type in advance.

And step 306, screening out target slave nodes from at least one slave node managed by the target master node according to the time parameter, and reading target collection data from the target slave nodes.

The columnar distributed database stores data in the form of tables. Each table is partitioned horizontally, that is, divided into a plurality of table segments, and each table segment is a slave node.

Specifically, since different slave nodes store target collection data for different time periods, when the server determines the time parameter, the target slave nodes associated with the time parameter can be screened from the plurality of slave nodes, and the target collection data is read from the target slave nodes.

In one embodiment, the time parameter and the target collection data may be stored as a mapping, for example in key-value pair format, and the mapping is stored in the slave node. The key is a keyword corresponding to the time parameter, and the value, which may be a character, an array, a complex object, and the like, is the target collection data corresponding to the time parameter.

In this embodiment, the storage module may screen out the target master node corresponding to the data aggregation request first in the first aggregation policy, so that after the target master node is determined, the target aggregation data is accurately read from the target slave node managed by the target master node, and the efficiency and accuracy of reading the target aggregation data are improved.
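As an illustration only, steps 304 and 306 can be sketched with plain Python dictionaries standing in for the storage module. The node names, the request types, the date-string keys and the function `read_target_data` are assumptions made for this sketch; they do not come from the application and do not use the Kudu API.

```python
# Hypothetical in-memory stand-in for the storage module:
# master node -> {time key -> records held by one slave node (table segment)}
STORAGE = {
    "master-a": {
        "2022-06-21": [{"account": "001", "recharge": 60}],
        "2022-06-22": [{"account": "001", "recharge": 40}],
    },
    "master-b": {
        "2022-06-22": [{"account": "009", "refund": 5}],
    },
}

# Preconfigured target association relationship: request type -> master node.
ASSOCIATION = {"aggregation": "master-a", "refund": "master-b"}


def read_target_data(request_type, time_keys):
    """Resolve the target master node, then read the matching slave nodes."""
    master = ASSOCIATION[request_type]   # step 304: pick the target master node
    slaves = STORAGE[master]
    rows = []
    for key in time_keys:                # step 306: pick slave nodes by time key
        rows.extend(slaves.get(key, []))
    return rows
```

A time key with no matching slave node simply contributes no rows, which keeps the lookup safe when the requested period has not been collected yet.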

In one embodiment, the collecting the target collected data to obtain a data collecting result includes: triggering the computing module according to the request type in the request parameters, and screening out a target code file required by executing the collection processing from the code file set; initializing a collection data table in a preset database to obtain a temporary data table; filling the target collection data into a temporary data table to obtain a target data table; and running the codes indicated by the target code file to perform collection processing on the target data table to obtain a data collection result.

The data collection component further comprises a computing module and a preset database. The computing module can be a distributed parallel computing engine, such as the Spark computing engine or the Impala computing engine. The preset database comprises a collection data table, which is an array defined in advance by a user and which stores the data collection result of the last collection processing.

Specifically, the server triggers the computing module according to the request type in the request parameter, and screens out the target code file required for executing the collection processing from the code file set, that is, the related class file in the spark computing engine, which is related to the data collection request, so that when the request parameter is different, the target code files screened out after triggering the computing module are also different. The server initializes the collection data table in the preset database to obtain a temporary data table, namely deletes the data collection result stored in the collection data table after the last collection processing, namely deletes the historical collection list, and creates a new temporary data table. And the server fills the target collection data into the temporary data table to obtain a target data table. The filling may be performed according to a time sequence of obtaining the target collection data, or may be performed according to an account identifier in the target collection data, and the like, which is not limited herein. And when the target data table is obtained, the server can operate the codes indicated by the target code file, and the target data table is subjected to collection processing to obtain a data collection result.

In one embodiment, a user writes an application program that calls the Spark API of the Spark computing engine; this application program is the target code file required when performing the aggregation processing, and its entry point is determined by a main function defined by the user.

In one embodiment, the user may set the database identifier, the database user name, the database password, and the database name of the preset database in advance.

In one embodiment, the data aggregation result obtained after the target data table is subjected to the aggregation processing is a new aggregation data table, and the new aggregation data table is stored in the preset database. When the data collection processing is carried out next time, the step of initializing the collection data table in the preset database is returned again, and the step of obtaining the temporary data table is continued.

In this embodiment, triggering the computing module screens out the target code file required for executing the collection processing; the target data table is then obtained from the collection data table in the preset database, and the code indicated by the target code file performs the collection processing on the target collection data, thereby improving the efficiency of real-time computation over massive data.
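The four operations of this embodiment (select the target code file, initialize the table, fill it, run the code) can be sketched as follows. This is a minimal sketch under stated assumptions: an in-memory SQLite database stands in for the preset database, a Python function stands in for the Spark class file, and the table name `temp_aggregation`, the mapping `CODE_FILES` and both function names are hypothetical.

```python
import sqlite3


def sum_by_account(conn):
    """Hypothetical 'target code file': total the amount per account."""
    return conn.execute(
        "SELECT account, SUM(amount) FROM temp_aggregation "
        "GROUP BY account ORDER BY account"
    ).fetchall()


# Code file set keyed by request type (illustrative names only).
CODE_FILES = {"aggregation": sum_by_account}


def run_aggregation(conn, request_type, target_rows):
    job = CODE_FILES[request_type]                          # select the target code file
    conn.execute("DROP TABLE IF EXISTS temp_aggregation")   # discard the previous result
    conn.execute("CREATE TABLE temp_aggregation (account TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO temp_aggregation VALUES (?, ?)", target_rows
    )                                                       # fill -> target data table
    return job(conn)                                        # run the code on the table


conn = sqlite3.connect(":memory:")
result = run_aggregation(
    conn, "aggregation", [("001", 60), ("001", 40), ("002", 20)]
)
```

Because the table is dropped and recreated on every call, a second run starts from an empty temporary table rather than accumulating rows from the previous aggregation.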

In one embodiment, the data aggregation component comprises a computing module and a storage module running on a distributed cluster; the computing module and the storage module carry out association configuration through a code file; as shown in fig. 5, the configuration process of the code file association task includes the following steps:

step 502, by the computing module, when the code file association task is obtained, determining the operating environment and the code file corresponding to the code file association task, and sending the code file to the storage module.

The code file association task is a task for associating the computing module and the storage module through the code file. Fig. 6 is a schematic diagram illustrating a principle of configuring a task associated with a code file, where a computing module includes a driver and a resource manager, a storage module includes a plurality of working nodes, and a context module included in the driver is a main entry point for all functions of the computing module, and is capable of directly interacting with the resource manager, applying for a resource from the resource manager, and sending the relevant code file to the working node in the storage module.

Specifically, when a code file association task is acquired, the driver in the computing module runs the application program that the user wrote for the Spark computing engine and that is associated with the code file. The driver then creates a context module, which prepares the operating environment of the application program, and sends the code file corresponding to the code file association task to the storage module.

Step 504, determining, by the computing module, the resource required for executing the code file according to the operating environment, and sending the resource to the storage module.

The resources can be CPU resources, memory resources and the like, and the resource manager is mainly used for managing the resources applied for by the application program. The created context module can communicate with the resource manager, and the resource manager allocates the resources required by the application program when executing the code file and sends the resources to the storage module.

Step 506, determining a master node corresponding to the resource from the plurality of master nodes and determining a task type of the code file association task through the storage module, and associating the task type with the master node to obtain a target association relationship.

Wherein the task type is associated with a corresponding request type when the associated request is executed through the code file. For example, when the task type of the code file association task is a data collection task, the corresponding request type when the code file executes the relevant request is a data collection request. The working nodes in the storage module comprise a plurality of main nodes and a plurality of slave nodes managed by each main node. Referring to fig. 6, the master node corresponds to an executor in a worker node, and the slave node corresponds to a task module managed by the executor. And the different working nodes can also realize the mutual access of data.

Specifically, the master node corresponding to the resources may be screened according to the actual resource amounts of the master nodes in the storage module. For example, the master node with the largest actual resource amount may be selected. Alternatively, each actual resource amount may be compared with the resources sent by the computing module; the master nodes whose resources exceed those sent by the computing module are taken as candidate master nodes, and the candidate master node with the smallest actual resource amount is taken as the target master node. The task type of the code file association task is then associated with the master node to obtain the target association relationship, so that different task types can be associated with different master nodes.

And step 508, acquiring the code file sent by the computing module through the storage module, and configuring at least one slave node in the master node according to the code file.

When the target association relation is obtained, the slave nodes managed by the master node can be configured according to the code file, namely the task modules managed by the executor are configured. Therefore, after different task types are associated with the same main node, a plurality of slave nodes can be configured under the same main node according to different code files. The executor is a process started for the code file association task on the working node, can run the code file association task, stores the code file corresponding to the code file association task in a memory or a disk storage, and can return a data collection result obtained after collection processing is executed through the code file to the computing module.

In one embodiment, the driver in the computing module is further configured to create at least one task and send the created tasks to the task modules in the executors, so that the scheduling of different tasks is coordinated among the processes of the plurality of executors.

In one embodiment, when the execution of the executor is finished, a driver in the computing module is responsible for closing the context module.

In this embodiment, the association between the storage module and the computing module is realized by configuring the code file in advance, that is, the configuration of the data collection component is completed, so that when the collection processing is executed and the target collection data is required, the data can be directly obtained according to the target association relationship, and the accuracy of subsequent data collection is improved.
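The candidate-based selection described for step 506 can be sketched as below. The sketch assumes that each master node reports a single numeric resource amount; the node names and the functions `pick_master` and `associate` are hypothetical and only illustrate the comparison, not the cluster scheduler itself.

```python
def pick_master(available, required):
    """Return the candidate master node with the smallest sufficient amount.

    `available` maps a hypothetical master-node name to its actual resource
    amount; `required` is the amount sent by the computing module.
    """
    # Candidates are the master nodes whose resources exceed the request.
    candidates = {name: amt for name, amt in available.items() if amt > required}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


def associate(task_type, available, required, association=None):
    """Record task type -> master node as the target association relationship."""
    association = {} if association is None else association
    master = pick_master(available, required)
    if master is not None:
        association[task_type] = master
    return association
```

Choosing the smallest sufficient candidate leaves the larger master nodes free for tasks that need more resources; selecting the largest node instead would be a one-line change from `min` to `max`.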

In one embodiment, the target collection data includes an account identification and at least one item category field; the item category field includes field data; collecting the target collection data to obtain a data collection result, wherein the data collection result comprises the following steps: determining account identification and at least one item category field corresponding to each target collection data; grouping the target collection data according to the account identification to obtain at least one group of target collection data sets; the target collection dataset comprises at least one target collection data; for each group of target collection data sets in the multiple groups of target collection data sets, respectively superposing field data with the same item type field in the current target collection data set to obtain a collection sub-result corresponding to the current target collection data set; and integrating the respective corresponding collection sub-results of the target collection data sets to obtain a data collection result.

Specifically, the server performs field identification on each piece of target collection data and identifies its account identifier and at least one item category field. For example, the account identifier of target collection data 1 is 001, and its item category fields include a recharge amount of 60 and a withdrawal amount of 10; the account identifier of target collection data 2 is 001, and its item category fields include a recharge amount of 40 and an operation service fee of 20; the account identifier of target collection data 3 is 002, and its item category fields include a recharge amount of 20, a basic service fee of 20, and so on. The server groups the target collection data having the same account identifier to obtain at least one target collection data set; that is, target collection data 1 and target collection data 2 form one set, and target collection data 3 forms another. For each target collection data set, the server superposes the field data having the same item category field to obtain the collection sub-result corresponding to that set. For example, the recharge amount 60 in target collection data 1 and the recharge amount 40 in target collection data 2 are superposed to give 100; since target collection data 2 does not include a withdrawal amount field, the withdrawal amount 10 in target collection data 1 is used as the collection sub-result of the withdrawal amount field, and likewise the operation service fee 20 in target collection data 2 is used as is. The collection sub-result corresponding to this target collection data set is therefore: account identifier 001, recharge amount 100, withdrawal amount 10, and operation service fee 20.

In the embodiment, after the field identification is accurately carried out on the target collection data, the target collection data with the same account identification are grouped, so that the efficiency and the accuracy of the subsequent calculation of the field data in the fields of different project types are improved.
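The grouping and superposition can be sketched in a few lines, using the values listed above for target collection data 1 to 3. The record layout (an `account` key and an `items` dictionary) and the English field names are assumptions made for this sketch.

```python
from collections import defaultdict


def aggregate(records):
    """Group records by account identifier and sum fields of the same category."""
    grouped = defaultdict(lambda: defaultdict(int))
    for record in records:
        account = record["account"]
        for field, value in record["items"].items():
            grouped[account][field] += value
    # Convert nested defaultdicts to plain dicts for the final result.
    return {account: dict(fields) for account, fields in grouped.items()}


records = [
    {"account": "001", "items": {"recharge": 60, "withdrawal": 10}},
    {"account": "001", "items": {"recharge": 40, "operation_fee": 20}},
    {"account": "002", "items": {"recharge": 20, "basic_fee": 20}},
]
result = aggregate(records)
```

A field that appears in only one record of a group, such as the withdrawal amount for account 001, passes through unchanged because it is added to an initial value of zero.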

In one embodiment, the method further comprises: displaying a data collection interface through a terminal; displaying a time area and a collection control in the data collection interface; responding, through the terminal, to a selection operation on the time area, and determining the time parameter selected by the selection operation; responding, through the terminal, to a triggering operation on the collection control, and generating a data collection request according to the time parameter; and acquiring, through the terminal, the data collection result obtained in response to the data collection request, and displaying the data collection result.

The terminal and the server can be independent of each other, and can also be a terminal comprising the server.

Specifically, a data aggregation interface is displayed in the terminal. Taking fig. 7 as an example, a time region 701 and an aggregation control 702 are displayed in the data aggregation interface. After the terminal responds to a selection operation of a user on the time region, the time parameter selected by the selection operation can be determined. After the time parameter is generated in the terminal, the terminal responds to the triggering operation of the user on the aggregation control and generates a corresponding data aggregation request. The terminal sends the data aggregation request to the server and receives the data aggregation result that the server sends after responding to the request, and then displays the received data aggregation result in the data aggregation interface. For example, continuing the example above, the aggregation sub-result in the data aggregation result (account identifier 001, recharge amount 100, withdrawal amount 10, operation service fee 20) may be displayed in the data aggregation interface.

In the embodiment, the terminal provides the data aggregation interface capable of generating the data aggregation request for the user, so that the user can conveniently select and operate different time parameters, and the user can quickly inquire the data aggregation condition in real time.

In one embodiment, the data aggregation result comprises a plurality of pieces of aggregated data; the aggregated data includes account identification and project category data; a historical collection list is displayed in the data collection interface and comprises data information and header information; displaying a data collection result, comprising: when a data aggregation result is received, deleting data information in the historical aggregation list to obtain an empty list only comprising header information; determining a target collection area of the collected data in the vacant list according to the account identification in the collected data; and filling the item category data in the collected data into the target collection area.

The historical collection list is the data collection result produced by the user's last query operation, that is, after the terminal responded to the user's previous triggering operation on the collection control; that data collection result is displayed in the historical collection list in the form of collected data.

Specifically, as shown in fig. 7, the history collection list includes header information 703 and data information 704, where the header information 703 includes an account name and item categories, and the item categories include a recharge amount, a cash-out amount, an operation service fee, a basic service fee, and the like; the data information 704 includes at least one piece of collected data, each piece of collected data includes an account identifier corresponding to an account name in the header information and at least one item category data corresponding to an item category in the header information. Wherein the item category data included in the different aggregated data is not necessarily the same.

And when the terminal receives the data collection result sent by the server, deleting the data information in the history collection list, only keeping the header information in the history collection list, and changing the history collection list into an empty list at the moment. And aiming at each piece of collected data included in the data collection result, determining a target collection area of the collected data in the vacant list according to the account identification in the collected data. And for at least one item category data included in each piece of collected data, the terminal fills each item category data in the collected data into a corresponding item category in the target collection area according to the header information in the empty list. For example, referring to fig. 7, when the account id in the already-collected data 1 is 001, the target collection area in the empty list is the first line area, and the charge amount data in the already-collected data 1 is 100, so that the charge amount data 100 is filled under the charge amount in the first line area.

In one embodiment, the item categories in the header information may be added according to actual situations, such as adding change amount, change use discount, service fee use discount, platform use fee, and the like, and the application is not limited herein for the item categories that can be added.

In this embodiment, when receiving different data aggregation results, the terminal may dynamically update and display the aggregated data in the data aggregation interface, so as to achieve the effect of intuitively and efficiently updating and processing the aggregated data.
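The list refresh described in this embodiment can be sketched as follows. The dictionary-based list model, the column names in `HEADER` and the function `refresh_list` are assumptions made for illustration; an actual terminal would update its own interface widgets instead.

```python
# Hypothetical header information: account column followed by item categories.
HEADER = ["account", "recharge", "withdrawal", "operation_fee", "basic_fee"]


def refresh_list(history, aggregated):
    """Clear the data rows, keep the header, and fill one row per account."""
    # Deleting the data information leaves an empty list with header only.
    table = {"header": history["header"], "rows": {}}
    for item in aggregated:
        # Target area: the row keyed by the account identifier.
        row = table["rows"].setdefault(
            item["account"], {col: None for col in table["header"][1:]}
        )
        # Fill each item category under the matching header column.
        for category, value in item["items"].items():
            if category in row:
                row[category] = value
    return table
```

Categories that an account does not report stay as `None`, which mirrors the statement that different pieces of collected data need not contain the same item category data.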

In one embodiment, fig. 8 is a flow chart illustrating a data aggregation method in another embodiment. The method comprises the following steps: S801: the server acquires a history aggregation list; S802: the server responds to the triggering operation of the user on the aggregation control; S803: triggering the class file in the Spark computing engine that is related to the data aggregation request; S804: initializing the aggregation data table in the preset database, deleting the history aggregation list, and creating a temporary data table; S805: reading target aggregation data from the Kudu database; S806: filling the target aggregation data into the temporary data table to obtain a target data table, and performing aggregation processing on the target data table to obtain a data aggregation result; S807: the server writes the data aggregation result back into the preset database; S808: the server responds to the refreshing operation of the user on the data aggregation interface, displays the data aggregation result, and returns to step S801, that is, step S801 is executed again when aggregation processing needs to be performed again.

It should be understood that, although the steps in the flowcharts related to the above embodiments are shown in sequence as indicated by the arrows, the steps are not necessarily performed in that sequence. Unless explicitly stated otherwise, the execution of the steps is not strictly limited to the order shown and described, and the steps may be performed in other orders. Moreover, at least a part of the steps in the flowcharts related to the above embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least a part of the sub-steps or stages of other steps.

Based on the same inventive concept, the embodiment of the application also provides a data aggregation device for realizing the data aggregation method. The implementation scheme for solving the problem provided by the apparatus is similar to the implementation scheme described in the above method, so specific limitations in one or more embodiments of the data aggregation apparatus provided below may refer to the limitations on the data aggregation method in the foregoing, and details are not described here.

In one embodiment, as shown in fig. 9, there is provided a data aggregation apparatus 900, including: a request acquisition module 902, a policy determination module 904, and a result determination module 906, wherein:

a request obtaining module 902, configured to, when a data aggregation request is obtained, determine a request parameter of the data aggregation request in response to the data aggregation request; the request parameter comprises a time parameter;

a policy determining module 904, configured to determine, according to the time parameter, a target aggregation policy corresponding to the data aggregation request;

a result determining module 906, configured to determine a target data source corresponding to the target aggregation policy in the plurality of data sources, and screen out target aggregation data corresponding to the request parameter from the target data source; and collecting the target collection data to obtain a data collection result.

In one embodiment, the policy determination module 904 includes a first policy module 9041 for obtaining a target time period; when the time parameter comprises a target time period, determining that the target collection strategy is a first collection strategy; the first collection strategy is a strategy for dynamically collecting data in a target time period; determining a target data source of the plurality of data sources corresponding to the target aggregation policy, comprising: taking a preset database and a dynamic database as target data sources corresponding to the first aggregation strategy; the dynamic database stores data dynamically collected over the distributed clusters.

In one embodiment, the policy determination module 904 includes a second policy module 9042 for obtaining a target time period; when the time parameter does not comprise the target time period, determining that the target collection strategy is a second collection strategy; the second collection strategy is a strategy for collecting data in a non-target time period; determining a target data source of the plurality of data sources corresponding to the target aggregation policy, comprising: and taking the preset database as a target data source corresponding to the second aggregation strategy.
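The choice between the first and second aggregation strategies made by these two policy modules can be sketched as a single function. Representing the time parameter as a set of period labels, the label `"today"` and the data source names are assumptions made for this sketch.

```python
def choose_policy(time_parameter, target_period):
    """Return (policy, data_sources) for a request's time parameter.

    `time_parameter` is a set of hypothetical period labels; `target_period`
    is the label of the period whose data is still collected dynamically.
    """
    if target_period in time_parameter:
        # First strategy: preset database plus the dynamic database.
        return "first", ["preset_database", "dynamic_database"]
    # Second strategy: historical data only, read from the preset database.
    return "second", ["preset_database"]
```

For example, `choose_policy({"today", "2022-05"}, "today")` selects the first strategy, while a request covering only `"2022-05"` falls back to the second.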

In one embodiment, the result determining module 906 includes a storing module 9061, configured to obtain a target association relationship between a preconfigured master node and a request type when the target aggregation policy is the first aggregation policy; screening out a target main node corresponding to the data collection request from at least one main node of the storage module according to the target association relation; and screening target slave nodes from at least one slave node managed by the target master node according to the time parameter, and reading target collection data from the target slave nodes.

In one embodiment, the result determining module 906 further includes a calculating module 9062, configured to trigger the calculating module according to a request type of the request parameter, and screen out, from the code file set, a target code file required when performing the collection processing; initializing a collection data table in a preset database to obtain a temporary data table; filling the target collection data into a temporary data table to obtain a target data table; and running codes indicated by the target code file to carry out collection processing on the target data table to obtain a data collection result.

In one embodiment, the result determining module 906 further includes a field processing module 9063, configured to determine an account identifier and at least one item category field corresponding to each target collection data; grouping the target collection data according to the account identifications to obtain at least one target collection data corresponding to each account identification; for each account identifier in the plurality of account identifiers, determining a plurality of target item category fields with the same item category from at least one target collection data corresponding to the current account identifier; respectively superposing the data in each target project category field to obtain a collection sub-result corresponding to the current account identification; and integrating the collection sub-results corresponding to the account identifications to obtain a data collection result.

In one embodiment, the data aggregation apparatus 900 includes a configuration module 908, configured to determine, through the computing module, when the code file association task is obtained, a running environment and a code file corresponding to the code file association task, and send the code file to the storage module; determining resources required by executing the code file according to the operating environment through a computing module, and sending the resources to a storage module; determining a main node corresponding to the resource from a plurality of main nodes and determining a task type of a code file association task through a storage module, and associating the task type with the main node to obtain a target association relation; and acquiring the code file sent by the computing module through the storage module, and configuring at least one slave node in the master node according to the code file.

In one embodiment, the data collecting apparatus 900 further includes a terminal displaying module 910, configured to display the data collecting interface through a terminal; displaying a time area and a collection control in the data collection interface; responding to selection operation aiming at the time region through the terminal, and determining a time parameter selected by the selection operation; responding to the triggering operation aiming at the collection control by the terminal, and generating a data collection request according to the time parameter; and acquiring a data collection result obtained in response to the data collection request through the terminal, and displaying the data collection result.

In one embodiment, the terminal displaying module 910 is further configured to delete the data information in the history aggregation list when the data aggregation result is received, so as to obtain an empty list only including the header information; determining a target collection area of the collected data in the vacant list according to the account identification in the collected data; and filling the item category data in the collected data into the target collection area.

The modules in the data aggregation apparatus can be wholly or partially realized by software, hardware, or a combination thereof. The modules can be embedded in, or independent of, a processor in the computer device in hardware form, or can be stored in a memory in the computer device in software form, so that the processor can call them and execute the operations corresponding to the modules.

In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 10. The computer device includes a processor, a memory, an Input/Output (I/O) interface, a communication interface, a display unit, and an Input apparatus. The processor, the memory and the input/output interface are connected by a system bus, and the communication interface, the display unit and the input device are connected by the input/output interface to the system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operating system and the computer program to run on the non-volatile storage medium. The input/output interface of the computer device is used for exchanging information between the processor and an external device. The communication interface of the computer device is used for communicating with an external terminal in a wired or wireless manner, and the wireless manner can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a data aggregation method. The display unit of the computer equipment is used for forming a visual picture, and can be a display screen, a projection device or a virtual reality imaging device, the display screen can be a liquid crystal display screen or an electronic ink display screen, the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on a shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.

Those skilled in the art will appreciate that the architecture shown in fig. 10 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply. A particular computing device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.

In one embodiment, a computer device is further provided, which includes a memory and a processor. The memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.

In one embodiment, a computer-readable storage medium is provided, in which a computer program is stored. When executed by a processor, the computer program carries out the steps of the above method embodiments.

In one embodiment, a computer program product or computer program is provided that includes computer instructions stored in a computer-readable storage medium. The computer instructions are read by a processor of a computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the steps in the above-mentioned method embodiments.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, databases, or other media used in the embodiments provided herein can include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetic Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory can include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others. The databases referred to in the various embodiments provided herein may include at least one of relational and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database and the like. The processors referred to in the embodiments provided herein may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, quantum-computing-based data processing logic devices, and the like.

The technical features of the above embodiments may be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, any combination should be considered within the scope of the present disclosure as long as the combined features do not contradict one another.

The above embodiments express only several implementations of the present application. Although they are described specifically and in detail, they are not to be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims

1. A method of data aggregation, the method comprising:

and collecting the target collection data to obtain a data collection result.

2. The method of claim 1, wherein determining the target collection strategy corresponding to the data collection request according to the time parameter comprises:

acquiring a target time period;

when the time parameter comprises the target time period, determining that the target collection strategy is a first collection strategy; the first collection strategy is a strategy for dynamically collecting the data of the target time period;

the determining a target data source, among a plurality of data sources, corresponding to the target collection strategy comprises:

taking a preset database and a dynamic database as target data sources corresponding to the first collection strategy; the dynamic database stores data dynamically collected over distributed clusters.

3. The method of claim 1, wherein determining the target collection strategy corresponding to the data collection request according to the time parameter comprises:

acquiring a target time period;

when the time parameter does not comprise the target time period, determining that the target collection strategy is a second collection strategy; the second collection strategy is a strategy for collecting data in a non-target time period;

and taking a preset database as a target data source corresponding to the second collection strategy.
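By way of illustration only, the strategy selection recited in claims 2 and 3 can be sketched as follows. The sketch assumes that the time parameter and the target time period are both inclusive date ranges and that "comprises" means the requested range fully contains the target time period. The names `Period`, `Strategy`, `select_strategy`, and `data_sources`, and the data source labels, are hypothetical and do not appear in the application.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Strategy(Enum):
    FIRST = "first"    # dynamic collection of data in the target time period
    SECOND = "second"  # collection of data outside the target time period


@dataclass(frozen=True)
class Period:
    start: date
    end: date  # inclusive

    def contains(self, other: "Period") -> bool:
        # Assumption: "comprises" is read as full containment of the other range.
        return self.start <= other.start and other.end <= self.end


def select_strategy(time_param: Period, target: Period) -> Strategy:
    """Pick the collection strategy from the request's time parameter."""
    return Strategy.FIRST if time_param.contains(target) else Strategy.SECOND


def data_sources(strategy: Strategy) -> list[str]:
    """Map a strategy to its target data sources (labels are illustrative)."""
    if strategy is Strategy.FIRST:
        return ["preset_database", "dynamic_database"]
    return ["preset_database"]
```

Under these assumptions, a request whose range includes the target time period reads from both the preset database and the dynamic database, while any other request reads from the preset database alone.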

4. The method of claim 1, wherein the data aggregation method is performed by a data aggregation component; the data aggregation component comprises a storage module configured with a dynamic database; the storage module comprises at least one master node and at least one slave node managed by each master node; the screening out the target collection data corresponding to the request parameters from the target data source comprises:

when the target collection strategy is a first collection strategy, acquiring a pre-configured target association relation between master nodes and request types;

screening out a target master node corresponding to the data collection request from the at least one master node of the storage module according to the target association relation;

and screening target slave nodes from at least one slave node managed by the target master node according to the time parameter, and reading target collection data from the target slave nodes.
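By way of illustration only, the two-stage lookup of claim 4 can be sketched as follows. The sketch assumes that the target association relation is a mapping from request type to master node name and that each slave node holds the records of one time bucket, identified by a string. These assumptions, and the names `MasterNode`, `SlaveNode`, and `read_target_data`, are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass, field


@dataclass
class SlaveNode:
    bucket: str  # hypothetical time bucket, e.g. "2022-06-22"
    records: list = field(default_factory=list)


@dataclass
class MasterNode:
    name: str
    slaves: list = field(default_factory=list)


def read_target_data(association, masters, request_type, time_param):
    """Resolve the master node by request type, then its slaves by time."""
    # Stage 1: the association relation maps a request type to a master node.
    master = masters[association[request_type]]
    # Stage 2: keep only slave nodes whose time bucket matches the time parameter.
    targets = [s for s in master.slaves if s.bucket == time_param]
    # Read the target collection data from the selected slave nodes.
    return [record for s in targets for record in s.records]
```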

5. The method of claim 1, wherein the data aggregation method is performed by a data aggregation component; the data aggregation component comprises a computing module and a preset database; the preset database comprises a collection data table; the collecting the target collection data to obtain a data collection result comprises:

triggering the computing module according to the request type in the request parameters, and screening out, from a code file set, a target code file required for executing the collection processing;

initializing a collection data table in the preset database to obtain a temporary data table;

filling the target collection data into the temporary data table to obtain a target data table;

and running the codes indicated by the target code file to carry out collection processing on the target data table to obtain a data collection result.
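By way of illustration only, the flow of claim 5 can be sketched with an in-memory SQLite database. The sketch assumes that a "code file" is simply a piece of SQL text selected by request type and that "initializing" the collection data table means creating an empty temporary table with the same columns. The table names, the column names, the `CODE_FILES` mapping, and the `aggregate` function are hypothetical and are not taken from the application.

```python
import sqlite3

# Hypothetical code file set: request type -> SQL text to run.
CODE_FILES = {
    "by_account": (
        "SELECT account_id, SUM(amount) FROM tmp_collection "
        "GROUP BY account_id ORDER BY account_id"
    ),
}


def aggregate(conn, request_type, target_rows):
    """Build a temporary table, fill it, and run the selected code on it."""
    # Screen out the target code file according to the request type.
    code = CODE_FILES[request_type]
    # Initialize: an empty temporary table with the collection table's columns.
    conn.execute("DROP TABLE IF EXISTS tmp_collection")
    conn.execute(
        "CREATE TEMP TABLE tmp_collection AS SELECT * FROM collection WHERE 0"
    )
    # Fill the target collection data into the temporary table.
    conn.executemany(
        "INSERT INTO tmp_collection (account_id, amount) VALUES (?, ?)",
        target_rows,
    )
    # Run the selected code against the filled table to get the result.
    return conn.execute(code).fetchall()
```

Because the temporary table is recreated on every call, each request in this sketch starts from an empty table and does not modify the rows of the collection data table itself.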

6. The method of claim 1, wherein the target collection data comprises an account identifier and at least one item category field; the item category field comprises field data; the collecting the target collection data to obtain a data collection result comprises:

determining the account identifier and the at least one item category field corresponding to each piece of target collection data;

grouping the target collection data according to the account identifier to obtain at least one target collection data set; each target collection data set comprises at least one piece of target collection data;

for each target collection data set, adding together the field data of the same item category field in the current target collection data set to obtain a collection sub-result corresponding to the current target collection data set;

and integrating the corresponding collection sub-results of each target collection data set to obtain a data collection result.
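By way of illustration only, the grouping and summation of claim 6 can be sketched as follows. The sketch assumes each piece of target collection data is a dictionary with an `account_id` key and an `items` dictionary mapping item category names to numeric field data. This record layout and the function name `aggregate_by_account` are hypothetical and are not taken from the application.

```python
from collections import defaultdict


def aggregate_by_account(records):
    """Group records by account, then sum field data per item category."""
    # Group the target collection data by account identifier.
    grouped = defaultdict(list)
    for record in records:
        grouped[record["account_id"]].append(record["items"])

    # For each group, add together field data of the same item category.
    result = {}
    for account_id, item_dicts in grouped.items():
        totals = defaultdict(int)
        for items in item_dicts:
            for category, value in items.items():
                totals[category] += value
        # Each per-account total is one collection sub-result.
        result[account_id] = dict(totals)

    # The integrated sub-results form the data collection result.
    return result
```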

7. The method of claim 1, wherein the data aggregation method is performed by a data aggregation component; a computing module and a storage module comprised in the data aggregation component run on a distributed cluster; the storage module comprises a plurality of master nodes and a plurality of slave nodes; the computing module and the storage module are associated and configured through a code file; the association process of the code file comprises:

through the computing module, when a code file association task is obtained, determining an operating environment and a code file corresponding to the code file association task, and sending the code file to the storage module;

determining resources required by executing the code file according to the running environment through the computing module, and sending the resources to the storage module;

determining, through the storage module, a master node corresponding to the resources from the plurality of master nodes and a task type of the code file association task, and associating the task type with the master node to obtain a target association relation; the task type is associated with a corresponding request type when the related request is executed through the code file;

and acquiring, through the storage module, the code file sent by the computing module, and configuring at least one slave node under the master node according to the code file.

8. The method of claim 1, further comprising:

displaying a data collection interface through a terminal; a time region and a collection control are displayed in the data collection interface;

responding, through the terminal, to a selection operation on the time region, and determining the time parameter selected by the selection operation;

responding, through the terminal, to a triggering operation on the collection control, and generating a data collection request according to the time parameter;

and acquiring a data collection result obtained in response to the data collection request through the terminal, and displaying the data collection result.

9. The method of claim 8, wherein the data collection result comprises a plurality of pieces of collected data; the collected data comprises an account identifier and item category data; a historical collection list is displayed in the data collection interface and comprises data information and header information; the displaying the data collection result comprises:

when the data collection result is received, deleting the data information in the historical collection list to obtain an empty list comprising only the header information;

determining a target collection area of the collected data in the empty list according to the account identifier in the collected data;

and filling the item category data in the collected data into the target collection area.
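By way of illustration only, the list refresh of claim 9 can be sketched as follows. The sketch assumes the historical collection list is a dictionary with a `header` list of column names and a `rows` list, and that the target collection area for a piece of collected data is the row belonging to its account identifier. This layout and the function name `refresh_history_list` are hypothetical and are not taken from the application.

```python
def refresh_history_list(history, results):
    """Drop old rows, keep the header, and fill one row per account."""
    # Deleting the data information leaves an empty list with only the header.
    header = list(history["header"])
    rows_by_account = {}

    for item in results:
        # The target collection area is the row for this account identifier.
        row = rows_by_account.setdefault(
            item["account_id"], dict.fromkeys(header)
        )
        row["account_id"] = item["account_id"]
        # Fill item category data into the matching header columns.
        for category, value in item["items"].items():
            if category in row:
                row[category] = value

    return {"header": header, "rows": list(rows_by_account.values())}
```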

10. A data-aggregation device, the device comprising:

11. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor realizes the steps of the method of any one of claims 1 to 9 when executing the computer program.

12. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 9.