CN114238286B - Data warehouse data processing method and device, electronic equipment and storage medium - Google Patents

Data warehouse data processing method and device, electronic equipment and storage medium

Info

Publication number
CN114238286B
Authority
CN
China
Prior art keywords
data
historical
task
history
description information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210184591.7A
Other languages
Chinese (zh)
Other versions
CN114238286A (en
Inventor
林晶晶
甘红伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lianlian Hangzhou Information Technology Co ltd
Original Assignee
Lianlian Hangzhou Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lianlian Hangzhou Information Technology Co ltd filed Critical Lianlian Hangzhou Information Technology Co ltd
Priority to CN202210184591.7A priority Critical patent/CN114238286B/en
Publication of CN114238286A publication Critical patent/CN114238286A/en
Application granted granted Critical
Publication of CN114238286B publication Critical patent/CN114238286B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/217Database tuning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/283Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiment of the application provides a data warehouse data processing method, a data warehouse data processing device, electronic equipment and a storage medium, wherein the method comprises the following steps: receiving a task request; the task request carries a task identifier; analyzing the target task based on the task identifier; the target task comprises a current execution file; the current execution file comprises execution description information and a data processing statement; determining data to be processed from the current data set based on the execution description information; determining a current data set based on a plurality of historical data sets; processing the data to be processed according to the data processing statement to obtain target data; and sending the target data. By the data warehouse data processing method, data processing can be performed based on the optimized processing task, and processing efficiency of the data warehouse task is improved.

Description

Data warehouse data processing method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data processing method, apparatus, system and storage medium for a data warehouse.
Background
With the advent of the big data era, data volumes are growing explosively and more and more services rely on big data capabilities for analysis. As a result, data warehouse tasks are becoming more numerous and their links more complex. A data warehouse stores a large amount of redundant data, and a data processing task often needs to call a large amount of data; as the demand for data processing services grows, so does the demand for data processing efficiency in the data warehouse.
Existing data warehouse models generally call data repeatedly from the same data source or process the same data repeatedly; in some cases, multiple tasks repeatedly call data from a large number of data sources. As a result, task processing efficiency is low and a large number of links are occupied. Both the data sources of the data warehouse and the corresponding processing tasks therefore need to be optimized to improve the processing efficiency of data warehouse tasks.
Disclosure of Invention
In view of the defects in the prior art, embodiments of the present disclosure provide a data warehouse data processing method, apparatus, system, and storage medium, which can perform data processing based on an optimized processing task, thereby improving the processing efficiency of the data warehouse task.
The embodiment of the application provides a data warehouse data processing method, which comprises the following steps: receiving a task request; the task request carries a task identifier; analyzing the target task based on the task identifier; the target task comprises a current execution file; the current execution file comprises execution description information and a data processing statement; determining data to be processed from the current data set based on the execution description information; determining a current data set based on a plurality of historical data sets; processing the data to be processed according to the data processing statement to obtain target data; and sending the target data.
Specifically, determining the data to be processed from the current data set based on the execution description information includes: acquiring a current data set identifier and a to-be-processed data identifier from the execution description information; determining a current data set from the data warehouse based on the current data set identifier; and determining the data to be processed from the current data set based on the data to be processed identifier.
Specifically, before reading the task corresponding to the task identifier based on the task identifier carried in the task request if the task request is detected, the method further includes: acquiring a historical task set; each historical task in the historical task set comprises a historical execution file; the history execution file comprises history description information and history processing statements; if history description information and history processing statements in a plurality of history tasks in the history task set meet preset conditions, determining a history data set identifier from the history description information in the plurality of history tasks; determining a plurality of historical data sets based on the historical data set identification; a current data set is generated based on the plurality of historical data sets and the historical processing statement.
Specifically, the method further comprises: generating a data processing statement corresponding to the current data set based on the historical processing statement; taking the history description information as the execution description information; generating a current execution file based on the execution description information and the data processing statement; and determining the target task of the associated task identifier according to the current execution file.
Specifically, if history description information and history processing statements in a plurality of history tasks in the history task set meet preset conditions, determining a history data set identifier from the history description information in the plurality of history tasks includes: if a plurality of historical tasks exist in the historical task set and the historical data set identifications contained in the historical description information of each historical task in the plurality of historical tasks are the same, acquiring the historical processing statement of each historical task; and if the history processing statements of each history task are the same, determining the historical data set identifier from the history description information in the plurality of history tasks.
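The preset condition described above (several historical tasks sharing the same historical data set identifiers and the same historical processing statement) can be illustrated with a minimal sketch. The class name, field names, and statement strings below are hypothetical placeholders; the source names only the concepts, not a concrete representation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class HistoryTask:
    # All field names are hypothetical; the source only names the concepts.
    task_id: str
    dataset_ids: frozenset  # historical data set identifiers from the description
    statement: str          # historical processing statement (placeholder text)


def find_mergeable_groups(tasks):
    """Group tasks whose data set identifiers and statements are both identical."""
    groups = defaultdict(list)
    for task in tasks:
        groups[(task.dataset_ids, task.statement)].append(task)
    # Only a group containing several tasks satisfies the "plurality" condition.
    return [group for group in groups.values() if len(group) > 1]


tasks = [
    HistoryTask("t1", frozenset({"daily_play"}), "SUM(play_count) BY month"),
    HistoryTask("t2", frozenset({"daily_play"}), "SUM(play_count) BY month"),
    HistoryTask("t3", frozenset({"daily_play"}), "MAX(play_count)"),
]
mergeable = find_mergeable_groups(tasks)
```

Here tasks t1 and t2 form one mergeable group, while t3 uses a different statement and is left alone. Exact string equality of statements is an assumption of this sketch; the source does not say how "the same" is tested.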
Specifically, the historical task is associated with the task identification, and the priority of the historical task is lower than that of the target task.
Specifically, analyzing the target task based on the task identifier includes: determining a related target task and a history task based on the task identifier; and if the number of times of the target task being analyzed is less than or equal to a preset threshold value within the preset time length, analyzing the target task based on the task identification.
Correspondingly, the embodiment of the present application provides a data warehouse data processing apparatus, and the apparatus includes: the receiving module is used for receiving the task request; the task request carries a task identifier; the analysis module is used for analyzing the target task based on the task identifier; the target task comprises a current execution file; the current execution file comprises execution description information and a data processing statement; the determining module is used for determining data to be processed from the current data set based on the execution description information; determining a current data set based on a plurality of historical data sets; the processing module is used for processing the data to be processed according to the data processing statement to obtain target data; and the sending module is used for sending the target data.
Specifically, determining the data to be processed from the current data set based on the execution description information includes: acquiring a current data set identifier and a to-be-processed data identifier from the execution description information; determining a current data set from the data warehouse based on the current data set identifier; and determining the data to be processed from the current data set based on the data to be processed identifier.
Specifically, the apparatus further comprises a preprocessing module configured to: acquiring a historical task set; each historical task in the historical task set comprises a historical execution file; the history execution file comprises history description information and history processing statements; if history description information and history processing statements in a plurality of history tasks in the history task set meet preset conditions, determining a history data set identifier from the history description information in the plurality of history tasks; determining a plurality of historical data sets based on the historical data set identifications; a current data set is generated based on the plurality of historical data sets and the historical processing statement.
Specifically, the preprocessing module is further configured to: generating a data processing statement corresponding to the current data set based on the historical processing statement; taking the history description information as the execution description information; generating a current execution file based on the execution description information and the data processing statement; and determining the target task of the associated task identifier according to the current execution file.
Specifically, if history description information and history processing statements in a plurality of history tasks in the history task set meet preset conditions, determining a history data set identifier from the history description information in the plurality of history tasks includes: if a plurality of historical tasks exist in the historical task set and historical data set identifications contained in historical description information of each historical task in the plurality of historical tasks are the same, acquiring a historical processing statement of each historical task; and if the history processing statements of each history task are the same, determining the historical data set identifier from the history description information in the plurality of history tasks.
Specifically, the historical task is associated with the task identification, and the priority of the historical task is lower than that of the target task.
Specifically, analyzing the target task based on the task identifier includes: determining a related target task and a history task based on the task identifier; and if the number of times of the target task being analyzed is less than or equal to a preset threshold value within the preset time length, analyzing the target task based on the task identification.
Accordingly, an embodiment of the present disclosure provides an electronic device, which includes a processor and a memory, where at least one instruction, at least one program, a code set, or an instruction set is stored in the memory, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the data warehouse data processing method.
Accordingly, embodiments of the present disclosure provide a computer-readable storage medium having at least one instruction, at least one program, a set of codes, or a set of instructions stored therein, which is loaded and executed by a processor to implement the data warehouse data processing method described above.
The embodiment of the application has the following beneficial effects:
(1) the data set obtained by preprocessing is called, so that the number of data sources required to be called in the task execution process is reduced, and the occupation of a link is reduced;
(2) the processing efficiency of the data warehouse is improved by executing simplified data processing statements;
(3) whether the target task or the historical task is analyzed is determined according to the number of times the target task has been analyzed, so that when the data warehouse has a large task processing load, existing data can be utilized to the greatest extent and tasks can be processed through a plurality of data processing paths, maximizing processing efficiency.
Drawings
In order to more clearly illustrate the technical solutions and advantages of the embodiments of the present application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a schematic application scenario diagram of a data warehouse data processing method provided in an embodiment of the present application;
Fig. 2 is a first flowchart of a data warehouse data processing method according to an embodiment of the present application;
Fig. 3 is a second flowchart of a data warehouse data processing method according to an embodiment of the present application;
Fig. 4 is a third flowchart of a data warehouse data processing method according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a data warehouse data processing apparatus according to an embodiment of the present application;
Fig. 6 is a hardware block diagram of a server of a data warehouse data processing method according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings. It should be apparent that the described embodiment is only one embodiment of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
An "embodiment" as referred to herein relates to a particular feature, structure, or characteristic that may be included in at least one implementation of the present application. In the description of the embodiments of the present application, it should be understood that the terms "upper", "lower", "left", "right", "top", "bottom", and the like indicate orientations or positional relationships based on those shown in the drawings, are used only for convenience in describing the present application and simplifying the description, and do not indicate or imply that the devices/systems or elements referred to must have a specific orientation or be constructed and operated in a specific orientation; they should therefore not be taken as limiting the present application. The terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. Moreover, the terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in other sequences than described or illustrated herein.
Furthermore, the terms "comprises," "comprising," and "having"/"is," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system/apparatus, article, or apparatus that comprises a list of steps or elements/modules is not necessarily limited to those steps or elements/modules expressly listed, but may include other steps or elements/modules not expressly listed or inherent to such process, method, article, or apparatus.
The following describes a specific embodiment of a data warehouse data processing method provided by the present application. Referring to fig. 1, fig. 1 is a schematic view of an application scenario of data warehouse data processing according to an embodiment of the present application. As shown in fig. 1, the application scenario includes a server 101 and a terminal 102. Optionally, the server 101 and the terminal 102 may be connected through a wireless link or a wired link, which is not limited in this disclosure.
In an alternative embodiment, the server 101 may receive a task request and invoke and parse a target task from the data warehouse based on a task identifier in the task request. The terminal 102 may transmit a task request to the server 101, and may receive target data transmitted by the server 101. The server 101 may also host an internal system for managing data processing tasks, which may initiate task requests at a preset time or frequency so as to periodically obtain the required target data from the data warehouse. Specifically, the server 101 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN (Content Delivery Network), and a big data and artificial intelligence platform. Optionally, the operating system running on the server 101 may include, but is not limited to, iOS, Linux, Windows, Unix, Android, and the like.
In an alternative embodiment, when the terminal 102 needs the target data, it may communicate with the server 101 and send a task request, so that the server 101 parses and executes the target task based on the task request and sends the target data to the terminal 102. The target task performed may be the processing of a data table in a data warehouse. In particular, the terminal 102 may include, but is not limited to, a smart phone, a desktop computer, a tablet computer, a laptop computer, a smart speaker, a digital assistant, an Augmented Reality (AR)/Virtual Reality (VR) device, a smart wearable device, and other types of electronic devices. Optionally, the operating system running on the electronic device may include, but is not limited to, Android, iOS, Linux, Windows, and the like.
In addition, it should be noted that fig. 1 is only one application environment of the data warehouse data processing method provided by the present disclosure, and in practical applications, other application environments may also be included, for example, the server 101 may send a task request through an internal system, receive the task request through a data processing system and execute a target task, and then send the target data to the internal system or other clients 102.
An exemplary flow of a data warehouse data processing method provided by the present application is described below. Fig. 2 is a first flowchart of a data warehouse data processing method provided in an embodiment of the present application, and the present specification provides methods or process operation steps as shown in the embodiment or the flowchart, but may include more or less operation steps based on conventional or non-inventive labor. The sequence of steps recited in the embodiments is only one of many execution sequences, and does not represent the only execution sequence, and in actual execution, the steps can be executed according to the method or the flow sequence shown in the embodiment or the figure, or executed in parallel (for example, a parallel processor or a multi-thread processing environment). Specifically, as shown in fig. 2, the method includes:
Step S201: receiving a task request.
Specifically, the task request may carry a task identifier. The task request may be sent to the data warehouse platform by a client or other terminal, or may be generated by an internal system of the data warehouse based on a preset time point or a preset frequency, such as periodically generating an internal statistical report.
Step S202: analyzing the target task based on the task identifier.
In a specific embodiment, the task identifiers may correspond to the target tasks one-to-one. The target tasks may include data processing tasks in a data warehouse. The target task may contain a current execution file, wherein the current execution file may include execution description information and data processing statements. A target task may contain one or more current execution files.
In an embodiment where the task description of the target task is generating an annual video delivery report, the target task may include a plurality of current execution files, which may be used respectively to determine a play trend graph for the last half year, a play partition ranking, and information on the most-played video. In an embodiment where the task description of the target task is a user popularity ranking, the target task may include one current execution file, which may be used to determine the user popularity ranking.
In an embodiment where the task description of the target task is counting power utilization nodes with three-phase imbalance, the target task may include one current execution file, which may be used to determine, among a plurality of power utilization nodes, those whose three-phase imbalance degree is greater than a preset threshold.
Step S203: determining the data to be processed from the current data set based on the execution description information.
In particular, the execution description information may be used to indicate which data in the current data set is to be acquired as the data to be processed. For the same execution description information, executing step S203 at different points in time may yield different data to be processed. In a particular embodiment, the execution description information may be the order transaction amount for the current month; on this basis, the data to be processed acquired in different months corresponds to different time periods.
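The point that the same execution description information yields different data at different times can be sketched as follows. This is only an illustration under the assumption that "the current month" is resolved into a half-open date range when the step runs; the function name is hypothetical and not taken from the source.

```python
from datetime import date


def current_month_range(today):
    """Return (first day of this month, first day of next month) for `today`."""
    start = today.replace(day=1)
    if start.month == 12:
        end = date(start.year + 1, 1, 1)
    else:
        end = date(start.year, start.month + 1, 1)
    return start, end


# The same description resolves to different periods in different months.
february = current_month_range(date(2022, 2, 15))
december = current_month_range(date(2022, 12, 31))
```

A query built from such a range would select February orders when run in February and December orders when run in December, without any change to the execution description information itself.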
Step S203 is further described below in conjunction with fig. 3. Fig. 3 is a second flowchart of a data warehouse data processing method according to an embodiment of the present application. As illustrated in fig. 3, an exemplary flow includes:
Step S301: acquiring the current data set identifier and the to-be-processed data identifier from the execution description information.
In an embodiment where the task description of the target task is play trend analysis for the last half year, the current data set identifier may be a monthly play data set, which may be presented in the form of a data table; that is, the data set identifier may correspond to a monthly play data table in the data warehouse. In another specific embodiment, the current data set identifier may be play data, and may correspond to both a monthly play data table and a daily video play data table in the data warehouse, where the monthly play data table has a higher priority than the daily video play data table; when the current data set is subsequently determined based on the current data set identifier, the monthly play data table may be preferentially determined as the current data set.
In a specific embodiment, the to-be-processed data identifier may be the numbers of the last six months, and may correspond to sub-table names or field names in the monthly play data table. In another specific embodiment, the to-be-processed data identifier may be the last half year and the current month may be June; on this basis, the months of data to be acquired from the monthly play data table may be determined, from the to-be-processed data identifier and the current month, to be January, February, March, April, May, and June.
In an embodiment where the task description of the target task is counting power utilization nodes with three-phase imbalance, the current data set identifier may correspond to a power three-phase imbalance table, and the to-be-processed data identifier may correspond to the three-phase imbalance data in that table.
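As a worked example of turning a "last half year" identifier into concrete months, the following sketch counts back from the current month, inclusive, which matches the example above (month 6 gives months 1 to 6). The function name and the year-wrapping behaviour are assumptions made for illustration.

```python
def last_n_months(year, month, n=6):
    """Return the last n (year, month) pairs ending at the given month, oldest first."""
    index = year * 12 + (month - 1)  # months counted on a single linear axis
    return [(i // 12, i % 12 + 1) for i in range(index - n + 1, index + 1)]


june = last_n_months(2022, 6)
march = last_n_months(2022, 3)
```

For June 2022 this gives January through June 2022; for March 2022 the range wraps into the previous year and starts at October 2021.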
Step S302: a current data set is determined from the data warehouse based on the current data set identification.
The current data set identifier may correspond to one or more data sets in the data warehouse. Specifically, the data set identifier may be monthly play data, or an identifier number corresponding to a monthly play data table; a monthly play data table may be determined from the data warehouse as the current data set based on the data set identifier.
In particular, the current data set may be determined based on a plurality of historical data sets. The data of the historical data set may be presented in the form of a historical table.
Step S303: determining the data to be processed from the current data set based on the to-be-processed data identifier.
The to-be-processed data identifier may correspond to at least one sub-table or at least one field in the current data set. Specifically, the to-be-processed data identifier may be the months of the last six months, or the numbers of the last six months; based on the to-be-processed data identifier, the data of the six sub-tables or the fields corresponding to the last six months can be taken from the current data set, namely the monthly play data table, and used as the data to be processed.
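Steps S301 to S303 can be sketched with an in-memory stand-in for the data warehouse. Everything named here (the `WAREHOUSE` and `DATASET_INDEX` dictionaries, the table names, the row fields) is hypothetical; the sketch only illustrates resolving an identifier to the highest-priority available table and then selecting rows by month.

```python
# Hypothetical in-memory stand-in for a data warehouse: table name -> rows.
WAREHOUSE = {
    "monthly_play": [{"month": m, "plays": 100 * m} for m in range(1, 13)],
}
# Hypothetical mapping from a data set identifier to candidate tables,
# listed from highest to lowest priority.
DATASET_INDEX = {"play_data": ["monthly_play", "daily_play"]}


def determine_pending_data(description):
    dataset_id = description["dataset_id"]  # S301: current data set identifier
    months = description["months"]          # S301: to-be-processed data identifier
    # S302: the first candidate table present in the warehouse is the current data set.
    table_name = next(name for name in DATASET_INDEX[dataset_id] if name in WAREHOUSE)
    current_set = WAREHOUSE[table_name]
    # S303: keep only the rows matching the to-be-processed data identifier.
    return [row for row in current_set if row["month"] in months]


pending = determine_pending_data({"dataset_id": "play_data", "months": {1, 2, 3, 4, 5, 6}})
```

In a real warehouse the lookup in S302 would more likely go through a catalog or metadata service; the list order used here is just one simple way to express table priority.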
The data warehouse data processing method according to the embodiment of the present application is explained below with reference to fig. 2:
Step S204: processing the data to be processed according to the data processing statement to obtain target data.
Specifically, the target data may be a calculation result calculated based on the data to be processed, or an output result of inputting the data to be processed into a trained model. In another specific embodiment, the target data may also be a visualization graph or an analysis result chart obtained based on the data to be processed.
In a particular embodiment, the data processing statement may be configured to generate a monthly play volume line graph based on the monthly play data for the last six months. In another specific embodiment, the data processing statement may be further configured to determine the nodes with a three-phase imbalance degree greater than a preset threshold from the three-phase imbalance degree data of all the nodes.
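The second example of a data processing statement, selecting nodes whose three-phase imbalance degree exceeds a preset threshold, amounts to a simple filter. The sketch below assumes the data to be processed is a mapping from node name to imbalance degree; the node names and values are invented for illustration.

```python
def nodes_over_threshold(imbalance_by_node, threshold):
    """Return the names of nodes whose imbalance degree is strictly above the threshold."""
    return sorted(
        node for node, degree in imbalance_by_node.items() if degree > threshold
    )


# Invented sample data: node name -> three-phase imbalance degree.
sample = {"node-a": 0.05, "node-b": 0.22, "node-c": 0.15, "node-d": 0.31}
flagged = nodes_over_threshold(sample, threshold=0.15)
```

The comparison is strict, following the "greater than a preset threshold" wording, so a node exactly at the threshold (node-c here) is not flagged.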
Step S205: sending the target data.
Specifically, the target data may be sent to a client that issued the task request or to a data warehouse internal system.
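Taken together, steps S201 to S205 can be sketched as a single request handler. This is a minimal illustration, not the claimed implementation: the dictionary layout, the key names, and the use of a Python callable in place of a data processing statement are all assumptions.

```python
def handle_task_request(request, tasks, warehouse):
    """Run steps S202 to S204 and return the target data to be sent in S205."""
    task = tasks[request["task_id"]]  # S202: look up the target task by identifier
    target_data = []
    for exec_file in task["execution_files"]:
        description = exec_file["description"]
        current_set = warehouse[description["dataset_id"]]
        # S203: select the data to be processed from the current data set.
        pending = [row for row in current_set if row["month"] in description["months"]]
        # S204: apply the data processing statement (a callable in this sketch).
        target_data.append(exec_file["statement"](pending))
    return target_data


warehouse = {"monthly_play": [{"month": m, "plays": 10 * m} for m in range(1, 13)]}
tasks = {
    "report-1": {
        "execution_files": [
            {
                "description": {"dataset_id": "monthly_play", "months": {1, 2, 3}},
                "statement": lambda rows: sum(row["plays"] for row in rows),
            }
        ]
    }
}
result = handle_task_request({"task_id": "report-1"}, tasks, warehouse)
```

A target task with several current execution files would simply contribute several entries to the returned list, one per file.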
A data warehouse data processing method provided in an embodiment of the present application is further described below with reference to fig. 4. Fig. 4 is a third flowchart of a data warehouse data processing method according to an embodiment of the present application. Before step S201, the method may further include steps S401 to S408, which preprocess the history execution file to obtain the current execution file and preprocess the historical data sets to obtain the current data set. Steps S401 to S404 and steps S405 to S408 may be executed in response to an execution instruction, or may be executed periodically by a preprocessing system of the data warehouse. As shown in fig. 4, an exemplary flow includes:
Step S401: acquiring a historical task set.
Specifically, each historical task in the set of historical tasks may include a historical execution file. The history execution file may include history description information and history processing statements. A historical task may include one or more history execution files. In an embodiment where the task description of the historical task is generating an annual video delivery report, the target task may include a plurality of current execution files, which may be used respectively to determine a play trend graph for the last half year, a play partition ranking, and information on the most-played video. In an embodiment where the task description of the historical task is a user popularity ranking, the target task may include one current execution file, which may be used to determine the user popularity ranking.
In a particular embodiment, the historical task may be associated with a task identifier, and the historical task may have a lower priority than the target task. The historical task and the current task corresponding to the same task identifier may correspond to the same task description; the history description information of the historical task and the execution description information of the current task may be the same or different, and the history processing statements of the historical task and the data processing statements of the current task may be different. Specifically, in step S202, after the task identifier is received, both the target task and the historical task corresponding to the task identifier may be determined based on the task identifier, and the target task is analyzed in preference to the historical task because its priority is higher. In this embodiment, step S202 may further include: determining the associated target task and historical task based on the task identifier; if the number of times the target task has been analyzed within a preset time length is less than or equal to a preset threshold, analyzing the target task based on the task identifier; and if the number of times the target task has been analyzed within the preset time length is greater than the preset threshold, analyzing the historical task based on the task identifier.
In the embodiment of the application, whether the target task or the historical task is analyzed is determined according to the number of times the target task has been analyzed. When the data warehouse has a large task load, existing data can thus be used to the greatest extent and tasks can be processed through a plurality of data processing paths, which maximizes processing efficiency.
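The threshold rule described for step S202 can be sketched as follows. This is a minimal illustration, not the implementation of the application: the class name `TaskResolver`, the sliding-window bookkeeping, and the string return values are all assumptions introduced here, and the count is taken before the current request is recorded.

```python
import time
from collections import deque


class TaskResolver:
    """Hypothetical helper: picks the target or the historical task for a task identifier."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold            # preset threshold
        self.window_seconds = window_seconds  # preset time length
        self._analysis_times = {}             # task identifier -> timestamps of target-task analyses

    def resolve(self, task_id, now=None):
        now = time.monotonic() if now is None else now
        times = self._analysis_times.setdefault(task_id, deque())
        # Forget analyses that fall outside the preset time length.
        while times and now - times[0] >= self.window_seconds:
            times.popleft()
        if len(times) <= self.threshold:
            # The target task has higher priority and is still under the threshold.
            times.append(now)
            return "target"
        # Too many recent analyses of the target task: fall back to the historical task.
        return "historical"
```

With a threshold of 2, the first three requests inside the window resolve to the target task and later ones fall back to the historical task until older entries expire.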
Step S402: if the historical description information and the historical processing statements in the plurality of historical tasks in the historical task set meet preset conditions, determining a historical data set identifier from the historical description information in the plurality of historical tasks.
Specifically, a plurality of historical tasks can be obtained by analyzing the historical task set, and the historical description information and the historical processing statements of the plurality of historical tasks are obtained. The set of historical tasks may correspond to data warehouse tasks, and the historical tasks may be submodules of tasks in the data warehouse tasks.
In a specific embodiment, if a plurality of historical tasks exist in the historical task set, and the historical data set identifiers contained in the historical description information of each historical task in the plurality of historical tasks are the same, the same historical data set identifier is determined as the historical data set identifier.
In another specific implementation manner, if a plurality of historical tasks exist in a historical task set and historical data set identifications contained in historical description information of each historical task in the plurality of historical tasks are the same, a historical processing statement of each historical task is obtained; and if the history processing statements of each history task are the same, determining the historical data set identifier from the history description information in the plurality of history tasks.
Specifically, the historical data set identifier included in the history description information of each historical task may correspond to video daily play data, and the history processing statements of each historical task may calculate monthly play amounts based on daily play amounts, so that the historical data set identifier may be determined to be the video daily play data. In another particular embodiment, the historical data set identifier may correspond to three-phase power data of a plurality of nodes, and the history processing statement may calculate a three-phase imbalance based on the three-phase power data, so that the historical data set identifier may be determined to be the three-phase power data of the plurality of nodes.
In another specific implementation manner, if a plurality of historical tasks exist in a historical task set and the history description information of each historical task in the plurality of historical tasks contains the same historical data set identifier, the history processing statements of each historical task are obtained; the number of history processing statements corresponding to the same historical data set identifier is counted from the history processing statements of each historical task; and if the number of history processing statements is larger than a preset number, the same historical data set identifier is determined as the historical data set identifier.
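The two preset conditions described so far (identical data set identifiers, optionally combined with identical processing statements) can be sketched as a simple comparison. The dictionary layout with `dataset_ids` and `statements` keys and the function name are assumptions made for illustration only.

```python
def shared_dataset_identifiers(tasks, require_same_statements=True):
    """Return the data set identifiers shared by all tasks, or None if the condition fails.

    Each task is a hypothetical dict with tuple-valued "dataset_ids" and "statements".
    """
    if len(tasks) < 2:
        return None  # the condition requires a plurality of historical tasks
    first = tasks[0]
    for task in tasks[1:]:
        if task["dataset_ids"] != first["dataset_ids"]:
            return None
        if require_same_statements and task["statements"] != first["statements"]:
            return None
    return first["dataset_ids"]
```

Passing `require_same_statements=False` corresponds to the first embodiment, where only the identifiers are compared.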
Specifically, the same historical data set identifier that may be included in the historical description information of each historical task may correspond to video daily play data, the number of history processing statements that process the video daily play data may be counted, and if the number of history processing statements is greater than a preset number, it may be determined that the historical data set identifier is the video daily play data.
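The counting variant can be sketched in the same spirit. Here each history processing statement is assumed to be tagged with the identifier of the data set it processes; that tagging, the key names, and the function name are hypothetical.

```python
from collections import Counter


def frequent_dataset_identifiers(tasks, preset_number):
    """Return identifiers processed by more than preset_number history processing statements."""
    counts = Counter()
    for task in tasks:
        for statement in task["statements"]:
            counts[statement["dataset_id"]] += 1
    # Keep only identifiers whose statement count exceeds the preset number.
    return sorted(ds for ds, n in counts.items() if n > preset_number)
```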
In particular, in the above embodiment of determining the historical data set identifier, there may be an association relationship between a plurality of historical data sets corresponding to the historical data set identifier. The association relationship may refer to the presence of the same field information in the two data sets, for example, in the embodiment where the two data sets are an order detail table and a user information table, respectively, the same field may be present in the order detail table and the user information table, and the field may be user id field information. In the case where the data set is a data table, the data or information in the columns corresponding to the same fields in both tables may be consistent.
In the embodiment of the application, the data sets or data tables which are always used in pairs in the task are determined, so that preprocessing or merging processing is performed to obtain a new table, namely the current data set, and the current data set can be called in the updated task, so that a large amount of repeated operation in the data processing process is avoided.
Step S403: a plurality of historical data sets is determined based on the historical data set identification.
The historical data set identifier may correspond to one or more data sets in the data warehouse. Specifically, the data set identifier may include the single-video daily play data of each video, and may also include an identification number corresponding to each single-video daily play data; a plurality of video daily play data tables may be determined from the data warehouse as the plurality of historical data sets based on the data set identifier.
Step S404: a current data set is generated based on the plurality of historical data sets and the historical processing statement.
Specifically, data in a plurality of historical data sets may be processed based on a historical processing statement to obtain a processing result; a current data set is generated based on the processing result.
In a specific embodiment, the plurality of historical data sets may be a plurality of video daily play data tables, and the history processing statements may include a statement for calculating a monthly play amount based on the daily play amount, and the statement is also a history processing statement corresponding to the same historical data set identifier, i.e., the single video daily play data. The data of the video daily playing data tables can be processed based on the history processing statements to obtain monthly playing data of all videos, and a monthly playing data table is generated based on the obtained monthly playing data to serve as a current data set.
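As an illustration of this embodiment, the aggregation from several daily play tables into one monthly play table might look like the sketch below. The row layout (`video_id`, ISO-formatted `date`, `plays`) and the function name are assumptions; the application does not specify a schema.

```python
from collections import defaultdict


def build_monthly_play_table(daily_tables):
    """Aggregate several daily play tables into one monthly play table."""
    monthly = defaultdict(int)
    for table in daily_tables:
        for row in table:
            month = row["date"][:7]  # "YYYY-MM" prefix of an ISO date string
            monthly[(row["video_id"], month)] += row["plays"]
    return [
        {"video_id": video_id, "month": month, "plays": plays}
        for (video_id, month), plays in sorted(monthly.items())
    ]
```

Once this table exists, later tasks can read monthly values directly instead of re-summing the daily rows.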
In another specific embodiment, the association relationship of the plurality of historical data sets may be obtained. The association relationship includes a primary-secondary table relationship, that is, a primary table and a secondary table are determined from the plurality of historical data sets. The secondary table may be merged into the primary table, and the merged wide table may serve as the current data set.
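A merge of a secondary table into a primary table over a shared field can be sketched as a left join. The example assumes the order detail table is the primary table and the user information table is the secondary table, joined on a user id field; the function name and the rule that primary values win on conflicting columns are choices made here, not stated in the application.

```python
def merge_into_wide_table(primary, secondary, key):
    """Left-join the secondary table into the primary table on a shared field."""
    lookup = {row[key]: row for row in secondary}
    wide = []
    for row in primary:
        merged = dict(lookup.get(row[key], {}))  # secondary columns, if a match exists
        merged.update(row)                        # primary columns take precedence
        wide.append(merged)
    return wide
```

Primary rows without a match are kept as they are, so the wide table has exactly one row per primary row.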
In particular, the current data set may be prioritized over the plurality of historical data sets.
In the embodiment of the application, the current data set is obtained by preprocessing or merging the plurality of historical data sets, so that the data of the plurality of historical data sets can be prevented from being repeatedly calculated when a data warehouse task is executed, the waste of calculation power is avoided, and the data processing efficiency is increased.
Step S405: generating a data processing statement corresponding to the current data set based on the historical processing statement.
In a specific embodiment, the current data set is a monthly play data table, and the statements corresponding to the same history data set identifier in the history processing statements may be deleted. The history processing sentences may include sentences for calculating a monthly play amount based on a daily play amount, and may also include sentences for generating a play amount line graph in units of months based on the monthly play amount; the sentence for calculating the monthly play amount based on the daily play amount may be deleted in step S405.
In another specific implementation, the current data set may be obtained by merging a plurality of historical data sets in step S404, and in this embodiment, statements corresponding to the plurality of historical data sets may be modified into statements corresponding to the current data set based on the current data set. The statements corresponding to the plurality of historical data sets may include a statement that calculates a total monthly play amount based on a daily play amount of each video in the plurality of video daily play data tables, and the data processing statement corresponding to the modified current data set may include a statement that calculates a monthly play amount based on a daily play amount of each video in the current data set.
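Both ways of deriving the data processing statements can be sketched with statements modeled as small records. The fields `name`, `inputs`, and `output`, and both function names, are hypothetical; real statements would more likely be SQL text that needs proper rewriting.

```python
def drop_materialized(history_statements, materialized_outputs):
    """Remove statements whose result is already stored in the current data set."""
    return [s for s in history_statements if s["output"] not in materialized_outputs]


def retarget_inputs(statements, historical_ids, current_id):
    """Point statements that read the historical data sets at the merged current data set."""
    result = []
    for s in statements:
        inputs = []
        for name in s["inputs"]:
            new_name = current_id if name in historical_ids else name
            if new_name not in inputs:  # several historical tables collapse into one input
                inputs.append(new_name)
        result.append({**s, "inputs": inputs})
    return result
```

The first function matches the case where the monthly play table is precomputed, and the second matches the case where several daily tables were merged into one table.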
Step S406: the history description information is regarded as the execution description information.
In a specific embodiment, the historical data set identifier in the historical description information may be playing data, and the historical pending data identifier in the historical description information may be last half year. In this embodiment, the history description information may be regarded as the execution description information. The historical data set identification may correspond to both the historical data set and the current data set, and in performing step S203, i.e., determining the data to be processed from the current data set based on the execution description information, the current data set may be read from the data warehouse based on the priority of the current data set.
In another specific embodiment, the historical data set identifier in the history description information may be the video daily play data of each video, and the historical to-be-processed data identifier in the history description information may be January 1 to June 30. In this embodiment, the history description information may be updated based on the current data set to obtain the execution description information. When the current data set is a monthly play data table, the historical data set identifier may be updated to monthly play data as the current data set identifier of the execution description information, and the historical to-be-processed data identifier may be updated to January, February, March, April, May, and June as the to-be-processed data identifier of the execution description information.
Step S407: generating a current execution file based on the execution description information and the data processing statement.
Specifically, the data processing statement of the current execution file may include the data processing statement corresponding to the current data set in step S405.
Step S408: determining the target task associated with the task identifier according to the current execution file.
In one particular embodiment, the target task may be generated based on the current execution file. The task identifier associated with the historical task may also be associated with the target task. Specifically, the historical task may have a lower priority than the target task.
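Updating a daily date range to a list of months can be sketched as below. The dictionary keys `pending_range`, `dataset_id`, and `pending_ids`, the function names, and the choice of the year 2021 in the usage are all assumptions for illustration.

```python
from datetime import date


def months_in_range(start, end):
    """List the months ("YYYY-MM") touched by the inclusive date range."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            month = 1
            year += 1
    return months


def to_execution_description(history_description, current_dataset_id):
    """Build execution description information that targets the current data set."""
    start, end = history_description["pending_range"]
    return {
        "dataset_id": current_dataset_id,
        "pending_ids": months_in_range(start, end),
    }
```

For a half-year range this yields six month identifiers, which then select rows from the monthly table instead of roughly 180 daily rows per video.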
Steps S401 to S408 may be executed before step S201, and may implement preprocessing on the history execution file to obtain the current execution file; and preprocessing the historical data set to obtain a current data set.
In the embodiment of the application, the historical execution file can be optimized to obtain the current execution file, and the current execution file is executed when the task request is received. In the process, the number of data sources required to be called in the task execution process can be reduced by calling the newly generated data set, so that the occupation of a link is reduced; and the processing efficiency of the data warehouse can be improved by executing the simplified data processing statement. It should be noted that, the data type or the data method of the data warehouse task processing is not limited in the present application, the task description of the target task is not limited to the description set forth above, and in some alternative embodiments, the data warehouse data processing method of the present application may implement efficient processing on other types of data based on the current data set and the current execution file obtained by preprocessing.
Correspondingly, the application provides a data warehouse data processing device. Fig. 5 is a schematic structural diagram of a data warehouse data processing apparatus according to an embodiment of the present application. As illustrated in fig. 5, the data warehouse data processing apparatus 500 may include:
a receiving module 501, configured to receive a task request; the task request carries a task identifier;
an analysis module 502 for analyzing the target task based on the task identifier; the target task comprises a current execution file; the current execution file comprises execution description information and a data processing statement;
a determining module 503, configured to determine to-be-processed data from the current data set based on the execution description information; the current data set is determined based on a plurality of historical data sets;
the processing module 504 is configured to process the data to be processed according to the data processing statement to obtain target data;
a sending module 505, configured to send the target data.
Specifically, determining the data to be processed from the current data set based on the execution description information includes: acquiring a current data set identifier and a to-be-processed data identifier from the execution description information; determining a current data set from the data warehouse based on the current data set identification; and determining the data to be processed from the current data set based on the data to be processed identifier.
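The lookup performed by the determining module can be sketched with the data warehouse modeled as a dictionary of tables. The key names, the `pending_field` parameter, and the function name are hypothetical; a real warehouse would be queried rather than held in memory.

```python
def select_pending_data(warehouse, description, pending_field="month"):
    """Pick the to-be-processed rows of the current data set named in the description."""
    # Determine the current data set from the warehouse by its identifier.
    current_dataset = warehouse[description["dataset_id"]]
    # Keep only the rows that match the to-be-processed data identifiers.
    wanted = set(description["pending_ids"])
    return [row for row in current_dataset if row[pending_field] in wanted]
```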
Specifically, the apparatus may further include a preprocessing module configured to: acquire a historical task set, where each historical task in the historical task set comprises a history execution file, and the history execution file comprises history description information and history processing statements; if the history description information and history processing statements in a plurality of historical tasks in the historical task set meet preset conditions, determine a historical data set identifier from the history description information in the plurality of historical tasks; determine a plurality of historical data sets based on the historical data set identifier; and generate a current data set based on the plurality of historical data sets and the history processing statements.
Specifically, the preprocessing module may be further configured to: generate a data processing statement corresponding to the current data set based on the history processing statement; take the history description information as the execution description information; generate a current execution file based on the execution description information and the data processing statement; and determine the target task associated with the task identifier according to the current execution file.
Specifically, if history description information and history processing statements in a plurality of history tasks in the history task set meet preset conditions, determining a history data set identifier from the history description information in the plurality of history tasks includes: if a plurality of historical tasks exist in the historical task set and historical data set identifications contained in historical description information of each historical task in the plurality of historical tasks are the same, acquiring a historical processing statement of each historical task; and if the history processing statements of each history task are the same, determining the historical data set identifier from the history description information in the plurality of history tasks.
Specifically, the historical task is associated with the task identification, and the priority of the historical task is lower than that of the target task.
Specifically, parsing the target task based on the task identification may include: determining a related target task and a history task based on the task identifier; and if the number of times of the target task being analyzed is less than or equal to a preset threshold value within the preset time length, analyzing the target task based on the task identification.
The apparatus embodiments and method embodiments of the present application may be based on the same concept.
Accordingly, an embodiment of the present disclosure further provides an electronic device, which includes a processor and a memory, where the memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the data warehouse data processing method.
The method provided by the embodiment of the application can be executed in a computer terminal, a server, or a similar computing device. Taking running on a server as an example, fig. 6 is a hardware structure block diagram of a server for the data warehouse data processing method provided in the embodiment of the present application. As shown in fig. 6, the server 600 may vary considerably depending on configuration or performance, and may include one or more Central Processing Units (CPUs) 610 (the CPU 610 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 630 for storing data, and one or more storage media 620 (e.g., one or more mass storage devices) for storing applications 623 or data 622. The memory 630 and the storage medium 620 may be transient or persistent storage. The program stored on the storage medium 620 may include one or more modules, each of which may include a series of instruction operations for the server. Still further, the central processor 610 may be configured to communicate with the storage medium 620 to execute a series of instruction operations in the storage medium 620 on the server 600. The server 600 may also include one or more power supplies 660, one or more wired or wireless network interfaces 650, one or more input/output interfaces 640, and/or one or more operating systems 621, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, and so forth.
The input/output interface 640 may be used to receive or transmit data via a network. Specific examples of the network may include a wireless network provided by a communication provider of the server 600. In one example, the input/output interface 640 includes a network interface controller (NIC) that may be connected to other network devices via a base station to communicate with the internet. In another example, the input/output interface 640 may be a Radio Frequency (RF) module, which is used for communicating with the internet wirelessly.
It will be understood by those skilled in the art that the structure shown in fig. 6 is only an illustration and is not intended to limit the structure of the electronic device. For example, server 600 may also include more or fewer components than shown in FIG. 6, or have a different configuration than shown in FIG. 6.
The present application provides a storage medium, which may be disposed in a server to store at least one instruction, at least one program, a code set, or a set of instructions related to implementing a data warehouse data processing method in the method embodiment, where the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by the processor to implement the data warehouse data processing method.
Specifically, in this embodiment, the storage medium may be located in at least one of a plurality of network servers of a computer network. Optionally, in this embodiment, the storage medium may include, but is not limited to, various media that can store program code, such as a USB flash drive, a Read-Only Memory (ROM), a removable hard disk, a magnetic disk, or an optical disk.
In the present invention, unless otherwise expressly stated or limited, the terms "connected" and "connected" are to be construed broadly, e.g., as meaning either a fixed connection or a removable connection, or an integral part; can be mechanically or electrically connected; either directly or indirectly through intervening media, either internally or in any other relationship. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
It should be noted that the foregoing order of the embodiments of the present application is for description only and does not represent the relative merits of the embodiments. Specific embodiments are described in this specification, and other embodiments are also within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
All the embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment is described with emphasis on differences from other embodiments. In particular, for the embodiments of the apparatus/system, since they are based on embodiments similar to the method embodiments, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiments.
The foregoing is a preferred embodiment of the present invention, and it should be noted that it would be apparent to those skilled in the art that various modifications and enhancements can be made without departing from the principles of the invention, and such modifications and enhancements are also considered to be within the scope of the invention.

Claims (9)

1. A data warehouse data processing method, the method comprising:
acquiring a historical task set; each historical task in the historical task set comprises a historical execution file; the history execution file comprises history description information and history processing statements;
if history description information and history processing statements in a plurality of history tasks in the history task set meet preset conditions, determining a history data set identifier from the history description information in the plurality of history tasks;
determining a plurality of historical data sets based on the historical data set identification;
generating a current dataset based on the plurality of historical datasets and the historical processing statement;
receiving a task request; the task request carries a task identifier;
analyzing a target task based on the task identification; the target task comprises a current execution file; the current execution file comprises execution description information and a data processing statement;
determining data to be processed from the current data set based on the execution description information;
processing the data to be processed according to the data processing statement to obtain target data;
sending the target data;
wherein the plurality of historical data sets are processed in a single historical task and there are a plurality of the single historical tasks in a historical task set.
2. The data warehouse data processing method of claim 1, wherein the determining the data to be processed from the current data set based on the execution description information comprises:
acquiring a current data set identifier and a to-be-processed data identifier from the execution description information;
determining the current data set from a data warehouse based on the current data set identification;
determining the data to be processed from the current data set based on the data to be processed identification.
3. The data warehouse data processing method of claim 1, wherein after the generating the current data set based on the plurality of historical data sets and the historical processing statement, the method further comprises:
generating the data processing statement corresponding to the current data set based on the historical processing statement;
regarding the history description information as the execution description information;
generating the current execution file based on the execution description information and the data processing statement;
and determining the target task associated with the task identifier according to the current execution file.
4. The data warehouse data processing method of claim 1, wherein if historical description information and historical processing statements in a plurality of historical tasks in the historical task set satisfy a preset condition, determining a historical data set identifier from the historical description information in the plurality of historical tasks comprises:
if a plurality of historical tasks exist in the historical task set and historical data set identifications contained in historical description information of each historical task in the plurality of historical tasks are the same, acquiring a historical processing statement of each historical task;
and if the history processing statements of each history task are the same, determining a history data set identifier from the history description information in the plurality of history tasks.
5. The data warehouse data processing method of claim 1, wherein the historical tasks are associated with the task identifiers, and wherein the historical tasks have a lower priority than the target tasks.
6. The data warehouse data processing method of claim 5, wherein parsing the target task based on the task identification comprises:
determining the associated target task and the historical task based on the task identification;
and if the number of times of the target task being analyzed is less than or equal to a preset threshold value within a preset time length, analyzing the target task based on the task identifier.
7. A data warehouse data processing apparatus, the apparatus comprising:
the acquisition module is used for acquiring a historical task set; each historical task in the historical task set comprises a historical execution file; the history execution file comprises history description information and history processing statements;
the identification determining module is used for determining the identification of the historical data set from the historical description information in the plurality of historical tasks if the historical description information and the historical processing statements in the plurality of historical tasks in the historical task set meet preset conditions;
a historical data set determination module for determining a plurality of historical data sets based on the historical data set identification;
a current data set generating module for generating a current data set based on the plurality of historical data sets and the historical processing statement;
the receiving module is used for receiving the task request; the task request carries a task identifier;
the analysis module is used for analyzing the target task based on the task identifier; the target task comprises a current execution file; the current execution file comprises execution description information and a data processing statement;
the determining module is used for determining data to be processed from the current data set based on the execution description information; the current data set is determined based on a plurality of historical data sets;
the processing module is used for processing the data to be processed according to the data processing statement to obtain target data;
and the sending module is used for sending the target data.
8. An electronic device, comprising a processor and a memory, wherein at least one instruction, at least one program, set of codes, or set of instructions is stored in the memory, and wherein the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by the processor to implement the data warehouse data processing method of any of claims 1-6.
9. A computer readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the data warehouse data processing method of any of claims 1-6.
CN202210184591.7A 2022-02-28 2022-02-28 Data warehouse data processing method and device, electronic equipment and storage medium Active CN114238286B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210184591.7A CN114238286B (en) 2022-02-28 2022-02-28 Data warehouse data processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210184591.7A CN114238286B (en) 2022-02-28 2022-02-28 Data warehouse data processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114238286A CN114238286A (en) 2022-03-25
CN114238286B true CN114238286B (en) 2022-08-05

Family

ID=80748225

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210184591.7A Active CN114238286B (en) 2022-02-28 2022-02-28 Data warehouse data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114238286B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115695432B (en) * 2023-01-04 2023-04-07 河北华通科技股份有限公司 Load balancing method and device, electronic equipment and storage medium

Citations (10)

Publication number Priority date Publication date Assignee Title
CN107038218A (en) * 2017-03-17 2017-08-11 腾讯科技(深圳)有限公司 report processing method and system
CN107665233A (en) * 2017-07-24 2018-02-06 上海壹账通金融科技有限公司 Database data processing method, device, computer equipment and storage medium
CN109388637A (en) * 2018-09-21 2019-02-26 北京京东金融科技控股有限公司 Data warehouse information processing method, device, system, medium
CN111190932A (en) * 2019-12-16 2020-05-22 北京淇瑀信息科技有限公司 Privacy cluster query method and device and electronic equipment
CN111475534A (en) * 2020-05-12 2020-07-31 北京爱笔科技有限公司 Data query method and related equipment
CN111831464A (en) * 2019-04-22 2020-10-27 阿里巴巴集团控股有限公司 Data operation control method and device
CN112434195A (en) * 2020-11-30 2021-03-02 天津狮拓信息技术有限公司 Data analysis method and device, electronic equipment and computer readable storage medium
CN112860727A (en) * 2021-02-20 2021-05-28 平安科技(深圳)有限公司 Data query method, device, equipment and medium based on big data query engine
CN112965982A (en) * 2021-03-16 2021-06-15 中国平安财产保险股份有限公司 Table processing method, device, equipment and storage medium
CN113420051A (en) * 2021-06-30 2021-09-21 网易(杭州)网络有限公司 Data query method and device, electronic equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109542428B (en) * 2018-10-16 2024-06-11 北京神州数码云科信息技术有限公司 Service processing method, device, computer equipment and storage medium
GB201818997D0 (en) * 2018-11-22 2019-01-09 Palantir Technologies Inc Providing external access to a prcoessing platform
CN110119310A (en) * 2019-04-12 2019-08-13 深圳壹账通智能科技有限公司 Method for distributing system resource, device, computer readable storage medium and server
CN111563101B (en) * 2020-07-11 2020-12-29 阿里云计算有限公司 Execution plan optimization method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN114238286A (en) 2022-03-25

Similar Documents

Publication Publication Date Title
CN112800095B (en) Data processing method, device, equipment and storage medium
CN112148693A (en) Data processing method, device and storage medium
CN113839977A (en) Message pushing method and device, computer equipment and storage medium
CN114417408A (en) Data processing method, device, equipment and storage medium
CN114238286B (en) Data warehouse data processing method and device, electronic equipment and storage medium
CN110689268A (en) Method and device for extracting indexes
CN112528067A (en) Graph database storage method, graph database reading method, graph database storage device, graph database reading device and graph database reading equipment
CN114398520A (en) Data retrieval method, system, device, electronic equipment and storage medium
CN114547069A (en) Data query method and device, electronic equipment and storage medium
US20140214826A1 (en) Ranking method and system
CN112579422B (en) Scheme testing method and device, server and storage medium
CN110728118B (en) Cross-data-platform data processing method, device, equipment and storage medium
CN110909072A (en) Data table establishing method, device and equipment
CN114896347A (en) Data processing method and device, electronic equipment and storage medium
CN114817003A (en) Test information processing method, device, equipment and storage medium
CN113961797A (en) Resource recommendation method and device, electronic equipment and readable storage medium
CN115794806A (en) Gridding processing system, method and device for financial data and computing equipment
CN112579673A (en) Multi-source data processing method and device
CN113760484A (en) Data processing method and device
CN111552674A (en) Log processing method and device
WO2014117566A1 (en) Ranking method and system
CN114490095B (en) Request result determination method and device, storage medium and electronic device
CN116303811A (en) Data processing method and device, electronic equipment and storage medium
CN117056663B (en) Data processing method and device, electronic equipment and storage medium
CN112308431B (en) Big data index management method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant