CN116383471B - Method and system for extracting data by data browser in large data scene of resource management industry - Google Patents

Info

Publication number
CN116383471B
Authority
CN
China
Prior art keywords
data
directed acyclic
acyclic graph
execution
sql
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310647082.8A
Other languages
Chinese (zh)
Other versions
CN116383471A (en)
Inventor
花磊
余家奎
芦辉
许一锴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Boyun Technology Co ltd
Original Assignee
Jiangsu Boyun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Boyun Technology Co ltd
Priority to CN202310647082.8A
Publication of CN116383471A
Application granted
Publication of CN116383471B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

Embodiments of this specification provide a method and a system for extracting data by a data browser in a big data scenario of the asset management industry. The method comprises: acquiring a data model to be extracted; generating a structural directed acyclic graph corresponding to the data model to be extracted; constructing a data directed acyclic graph based on the structural directed acyclic graph according to request parameters; optimizing the data directed acyclic graph to generate an execution directed acyclic graph; and performing calculation based on the execution directed acyclic graph to obtain a calculation result corresponding to the data extraction request. The method has the advantage of improving data extraction efficiency in the asset management industry.

Description

Method and system for extracting data by data browser in large data scene of resource management industry
Technical Field
This specification relates to the field of data processing, and in particular to a method and a system for extracting data by a data browser in a big data scenario of the asset management industry.
Background
Asset management business refers to an asset manager operating a customer's assets in the manner and under the conditions, requirements and limitations agreed in the asset management contract, providing the customer with securities, funds and other financial products, and collecting fees. In the asset management industry, a data model can be constructed from a combination of multiple indexes. Indexes of the same data model can belong to the same dimension or to different dimensions, the result of one index can serve as a parameter of another index, and different data models can be constructed through various flexible combinations. At present, data extraction for such a data model can only be executed according to a pre-configured business process, which is inefficient.
Therefore, there is a need for a method and a system for extracting data by a data browser in a big data scenario of the asset management industry, so as to improve data extraction efficiency in the asset management industry.
Disclosure of Invention
One of the embodiments of the present disclosure provides a method for extracting data by a data browser in a big data scenario of the asset management industry, the method including: acquiring a data model to be extracted; generating a structural directed acyclic graph corresponding to the data model to be extracted; constructing a data directed acyclic graph based on the structural directed acyclic graph according to request parameters; optimizing the data directed acyclic graph to generate an execution directed acyclic graph; and performing calculation based on the execution directed acyclic graph to obtain a calculation result corresponding to the data extraction request.
In some embodiments, generating the structural directed acyclic graph corresponding to the data model to be extracted includes: judging whether a structural directed acyclic graph corresponding to the data model to be extracted is cached; and if it is not cached, generating the structural directed acyclic graph corresponding to the data model to be extracted according to the index definitions of the data model to be extracted.
In some embodiments, constructing the data directed acyclic graph based on the structural directed acyclic graph according to the request parameters includes: merging SQL index extraction queries that satisfy preset merging conditions into one SQL index extraction query.
In some embodiments, constructing the data directed acyclic graph based on the structural directed acyclic graph according to the request parameters includes: performing an associated (join) query between the original SQL and a parameter table, and rewriting the corresponding parameters in the original SQL into the corresponding fields of the parameter table.
In some embodiments, constructing the data directed acyclic graph based on the structural directed acyclic graph according to the request parameters includes: splitting the batch parameters into individual parameters, filling them into the SQL statements, and assembling all the SQL statements by means of UNION.
In some embodiments, optimizing the data directed acyclic graph to generate the execution directed acyclic graph includes: merging and splitting a plurality of nodes that execute concurrently at the same level.
In some embodiments, optimizing the data directed acyclic graph to generate the execution directed acyclic graph includes: acquiring an SQL statement, parsing the SQL statement into a syntax tree, optimizing the SQL statement based on relational algebra, and performing cost-based query optimization.
In some embodiments, performing calculation based on the execution directed acyclic graph to obtain the calculation result corresponding to the data extraction request includes: a management node determining a plurality of execution tasks based on the execution directed acyclic graph; the management node distributing the plurality of execution tasks to a plurality of working nodes; and the management node receiving execution results from the plurality of working nodes and generating the calculation result corresponding to the data extraction request.
In some embodiments, performing calculation based on the execution directed acyclic graph to obtain the calculation result corresponding to the data extraction request includes: when the data volume of the request parameters is greater than a preset data volume threshold, splitting the request parameters to generate a plurality of parameter groups, generating a plurality of tasks based on the plurality of parameter groups and the execution directed acyclic graph, executing the plurality of tasks offline, and generating the calculation result corresponding to the data extraction request.
One of the embodiments of the present disclosure provides a system for extracting data by a data browser in a big data scenario of the asset management industry, including: a generation engine for acquiring a data model to be extracted and generating a structural directed acyclic graph corresponding to the data model to be extracted; a merging engine for constructing a data directed acyclic graph based on the structural directed acyclic graph according to request parameters; an execution engine for optimizing the data directed acyclic graph to generate an execution directed acyclic graph; and a calculation engine for performing calculation based on the execution directed acyclic graph to obtain a calculation result corresponding to the data extraction request.
Drawings
The present specification will be further elucidated by way of example embodiments, which will be described in detail by means of the accompanying drawings. The embodiments are not limiting, in which like numerals represent like structures, wherein:
FIG. 1 is a block diagram of a system for extracting data by a data browser in a big data scenario of the asset management industry, according to some embodiments of the present description;
FIG. 2 is a flow chart of a method for extracting data by a data browser in a big data scenario of the asset management industry, according to some embodiments of the present description;
FIG. 3 is a flow chart of generating a directed acyclic graph according to some embodiments of the present description;
FIG. 4 is a schematic diagram of a data directed acyclic graph of a data model according to some embodiments of the present description;
FIG. 5 is a schematic diagram of a directed acyclic graph of a data model after node merging according to some embodiments of the present description;
FIG. 6 is a flow diagram illustrating an optimization retrofit of SQL nodes according to some embodiments of the present description;
FIG. 7 is a flow chart of a calculation engine calculating a calculation result corresponding to a data extraction request according to some embodiments of the present description;
FIG. 8 is a schematic diagram of a management node and a plurality of work nodes shown in accordance with some embodiments of the present description;
FIG. 9 is a flow chart of a task engine computing a computation result corresponding to a data extraction request, according to some embodiments of the present description;
FIG. 10 is a schematic diagram of a data model 1 shown in accordance with some embodiments of the present description;
FIG. 11 is a schematic illustration of an initial execution directed acyclic graph corresponding to data model 1 shown according to some embodiments of the present description;
FIG. 12 is a schematic diagram of a corresponding final execution directed acyclic graph of data model 1 shown according to some embodiments of the present description;
FIG. 13 is a block diagram of an electronic device according to some embodiments of the present description.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present specification, the drawings that are required to be used in the description of the embodiments will be briefly described below. It is apparent that the drawings in the following description are only some examples or embodiments of the present specification, and it is possible for those of ordinary skill in the art to apply the present specification to other similar situations according to the drawings without inventive effort. Unless otherwise apparent from the context of the language or otherwise specified, like reference numerals in the figures refer to like structures or operations.
As used in this specification and the claims, the terms "a," "an," and/or "the" do not refer specifically to the singular and may include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; they do not constitute an exclusive list, and a method or apparatus may include other steps or elements.
Flowcharts are used in this specification to describe the operations performed by the system according to embodiments of the present specification. It should be appreciated that the preceding or following operations are not necessarily performed in the exact order shown. Rather, the steps may be processed in reverse order or simultaneously. Also, other operations may be added to these processes, or one or more operations may be removed from them.
FIG. 1 is a block diagram of a system for extracting data by a data browser in a big data scenario of the asset management industry, according to some embodiments of the present description. As shown in FIG. 1, the system may include a generation engine, a merge engine, an execution engine, a calculation engine, and a task engine.
The generation engine may be configured to acquire the data model to be extracted and generate a structural directed acyclic graph corresponding to the data model to be extracted.
The merge engine may be configured to construct a data directed acyclic graph based on the structural directed acyclic graph according to the request parameters.
The execution engine may be configured to optimize the data directed acyclic graph to generate an execution directed acyclic graph.
The calculation engine can be used for calculating based on the execution directed acyclic graph, and obtaining a calculation result corresponding to the data extraction request.
When the data volume of the request parameters is larger than a preset data volume threshold, the task engine can split the request parameters to generate a plurality of parameter sets, generate a plurality of tasks based on the plurality of parameter sets and the execution directed acyclic graph, execute the plurality of tasks offline, and generate a calculation result corresponding to the data extraction request.
For further description of the generation engine, the merge engine, the execution engine, the calculation engine, and the task engine, see FIG. 2 and its associated description, which are not repeated here.
It should be noted that the above description of the system for extracting data by the data browser in the big data scenario of the asset management industry, and of its modules, is for convenience of description only and does not limit the present specification to the scope of the illustrated embodiments. It will be appreciated by those skilled in the art that, given the principles of the system, the modules may be combined arbitrarily, or a subsystem may be constructed and connected with other modules, without departing from such principles. In some embodiments, the generation engine, the merge engine, the execution engine, the calculation engine, and the task engine disclosed in FIG. 1 may be different modules in one system, or one module may implement the functions of two or more of these modules. For example, the engines may share one memory module, or each engine may have its own memory module. Such variations are within the scope of the present description.
FIG. 2 is a flow chart of a method for extracting data by a data browser in a big data scenario of the asset management industry, according to some embodiments of the present description. The operations of the method presented below are illustrative. In some embodiments, the process may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed. In addition, the order of the operations of the method shown in FIG. 2 and described below is not limiting. As shown in FIG. 2, the method for extracting data by the data browser in the big data scenario of the asset management industry may include the following steps.
Step 210, a data model to be extracted is obtained. In some embodiments, step 210 may be performed by a generation engine.
The indexes of the same data model can belong to the same dimension or to different dimensions, the result of one index can serve as a parameter of another index, and different data models can be built through various flexible combinations.
An index is a value computed according to a business-logic definition; the value may be an amount, descriptive information, or the like, and has no fixed form. The logic of an index may be implemented in a domain-specific language (e.g., SQL), as a formula over already defined indexes, or in a general-purpose programming language (e.g., Java, Python). Indexes are divided into atomic indexes and derived indexes. A derived index is a composite index configured on the basis of other indexes; for example, the logic configured for a derived index C may be: derived index C = index A + index B.
A dimension is the main statistical perspective of the data model, and the dimension determines the number of records in the data model. For example, for a data model named the bond data model, the bond is the dimension: there is one record in the data model for each record in the dimension. As another example, a data model has the following fields: bond code, bond name, net price, trading venue, valuation, and so on. Assuming the bond code is the dimension field, it determines how many records the data model has, and the bond name is a column attached to the dimension. The dimension of this data model thus consists of two columns, bond code and bond name, while net price, trading venue and valuation are three indexes of the data model. A dimension may also be a composite dimension, i.e., a dimension made up of multiple columns. For example, when the holdings of bonds in different portfolios are calculated, portfolio and bond form a composite dimension, because the same bond may be held in different portfolios and one portfolio may hold multiple bonds.
After a user initiates a data extraction request for a certain data model, the generation engine can take the data model as a data model to be extracted.
Step 220, generating a structural directed acyclic graph corresponding to the data model to be extracted. In some embodiments, step 220 may be performed by a generation engine.
The directed acyclic graph (DAG) construction mechanism organizes tasks into a directed acyclic graph so that they are executed in a particular order. Each task has one or more inputs and outputs, and each task may have multiple predecessor tasks and multiple successor tasks, with the dependencies between tasks represented by directed edges. A task can begin executing only after all of its predecessor tasks have completed, and the successor tasks of a task can begin executing only after that task has completed. Executing along a DAG can effectively improve execution efficiency, because the relationships between tasks are explicit and the executor can arrange the execution order of the tasks more effectively.
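The execution rule described above can be sketched in a few lines of Python. This is only a minimal illustration using the standard-library `graphlib` module; the node names and the `execution_waves` helper are assumptions made for the example and are not part of the described system.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each key maps to the set of predecessors it depends on.
dependencies = {
    "A": set(),
    "B": set(),
    "C": set(),
    "D": {"C"},
}

def execution_waves(deps):
    """Return successive lists of tasks whose predecessors have all completed."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks that may run concurrently
        waves.append(ready)
        sorter.done(*ready)                 # mark them complete to unlock successors
    return waves

waves = execution_waves(dependencies)
print(waves)  # [['A', 'B', 'C'], ['D']]
```

Each inner list is a group of tasks that may run concurrently; a task only appears once every one of its predecessors has been marked done.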
The structural DAG is a DAG of the dependency relationships between indexes, generated from the index definitions; it is the static DAG of the corresponding data model.
FIG. 3 is a flow chart of generating a directed acyclic graph according to some embodiments of the present disclosure. As shown in FIG. 3, in some embodiments, the generation engine may first judge whether a structural directed acyclic graph corresponding to the data model to be extracted is cached. If it is not cached, the generation engine generates the structural directed acyclic graph according to the index definitions of the data model to be extracted. If it is cached, the cached structural directed acyclic graph corresponding to the data model to be extracted can be obtained directly.
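The cache check described above might look like the following sketch. The in-memory dictionary cache, the `MODEL_DEFINITIONS` structure and the function names are assumptions for illustration only; the source does not specify how the cache or the index definitions are stored.

```python
# Hypothetical index definitions: index name -> names of the indexes it takes as parameters.
MODEL_DEFINITIONS = {
    "model_1": {"A": [], "B": [], "C": [], "D": ["C"], "E": ["C"], "F": ["E"]},
}

_structural_dag_cache = {}  # assumed in-memory cache keyed by data model id
build_count = 0             # counts how often a structural DAG is actually built

def build_structural_dag(index_definitions):
    """Build the static dependency graph from the index definitions."""
    global build_count
    build_count += 1
    return {name: frozenset(deps) for name, deps in index_definitions.items()}

def get_structural_dag(model_id):
    """Return the cached structural DAG, generating and caching it on a miss."""
    dag = _structural_dag_cache.get(model_id)
    if dag is None:
        dag = build_structural_dag(MODEL_DEFINITIONS[model_id])
        _structural_dag_cache[model_id] = dag
    return dag

first = get_structural_dag("model_1")   # cache miss: the DAG is generated
second = get_structural_dag("model_1")  # cache hit: the same object is returned
print(build_count)  # 1
```

Because the structural DAG depends only on the index definitions and not on request parameters, it can be reused across requests for the same data model.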
Step 230, constructing a data directed acyclic graph based on the structural directed acyclic graph according to the request parameters. In some embodiments, step 230 may be performed by a merge engine.
The data DAG may be a DAG generated by optimizing the structural DAG for different request parameters.
In some embodiments, optimizing the structural DAG may include: merging SQL index extraction queries that satisfy preset merging conditions into one SQL index extraction query.
For example, multiple SQL index extraction queries of the same data model that satisfy the following conditions may be merged into one SQL statement for execution:
(1) The queried tables are identical.
(2) The WHERE conditions of the queries are exactly the same.
By way of example only, the SQL logic of the extraction query for index A is as follows:
The SQL logic of the extraction query for index B is as follows:
In this case, the merge engine optimizes the queries for index A and index B into the following SQL, which obtains the values of both indexes A and B at the same time:
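As a minimal sketch of this kind of merge, the following Python groups index queries by table and WHERE clause and emits one SELECT per group. The table names, column names and the tuple representation of a query are all hypothetical and are not the listings of the original specification; `sqlite3` is used only so the example can be run.

```python
import sqlite3
from collections import defaultdict

# Hypothetical index queries: (index name, select expression, table, WHERE clause).
INDEX_QUERIES = [
    ("A", "SUM(net_price)", "bond_holding", "trade_date = '2023-06-01'"),
    ("B", "SUM(valuation)", "bond_holding", "trade_date = '2023-06-01'"),
    ("C", "COUNT(*)", "bond_info", "market = 'SH'"),
]

def merge_queries(queries):
    """Merge queries that share the same table and the same WHERE clause."""
    groups = defaultdict(list)
    for name, expr, table, where in queries:
        groups[(table, where)].append(f"{expr} AS {name}")
    return [
        f"SELECT {', '.join(cols)} FROM {table} WHERE {where}"
        for (table, where), cols in groups.items()
    ]

merged = merge_queries(INDEX_QUERIES)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bond_holding "
    "(bond_code TEXT, trade_date TEXT, net_price REAL, valuation REAL)"
)
conn.executemany(
    "INSERT INTO bond_holding VALUES (?, ?, ?, ?)",
    [
        ("B1", "2023-06-01", 100.0, 101.0),
        ("B2", "2023-06-01", 50.0, 52.0),
        ("B3", "2023-05-31", 1.0, 1.0),
    ],
)
row = conn.execute(merged[0]).fetchone()  # one scan returns both index values
print(merged[0])
print(row)  # (150.0, 153.0)
```

Indexes A and B share a table and a WHERE clause, so they collapse into one statement; index C queries a different table and stays separate.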
In some embodiments, executions of the same SQL with different parameters may be merged into one SQL statement for execution. The parameter merging strategy may be divided into a simple strategy and a parameter table strategy.
Simple strategy: the "=" operator in the SQL statement is rewritten to an "IN" operator. This strategy applies only when there is exactly one batch parameter (non-batch parameters are not restricted), the batch parameter does not appear in a subquery, and the operator applied to the batch parameter is "=". The SQL statement may contain a LIMIT clause, but the limit value must be 1.
By way of example only, the original SQL is as follows:
Request parameters:
The optimized SQL is as follows:
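The simple strategy can be illustrated with the sketch below, which rewrites a single `column = ?` predicate into `column IN (?, ?, ...)`. The table, the column names and the `rewrite_equals_to_in` helper are hypothetical, and the plain string replacement stands in for whatever SQL rewriting the actual system performs.

```python
import sqlite3

# Hypothetical original SQL with exactly one batch parameter compared with "=".
original_sql = "SELECT bond_code, net_price FROM bond_quote WHERE bond_code = ?"
batch_values = ["B1", "B3"]

def rewrite_equals_to_in(sql, column, count):
    """Replace `column = ?` with `column IN (?, ?, ...)` holding `count` placeholders."""
    target = f"{column} = ?"
    if sql.count(target) != 1:
        raise ValueError("the simple strategy needs exactly one '=' batch parameter")
    placeholders = ", ".join("?" for _ in range(count))
    return sql.replace(target, f"{column} IN ({placeholders})")

merged_sql = rewrite_equals_to_in(original_sql, "bond_code", len(batch_values))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bond_quote (bond_code TEXT, net_price REAL)")
conn.executemany(
    "INSERT INTO bond_quote VALUES (?, ?)",
    [("B1", 100.0), ("B2", 99.5), ("B3", 101.2)],
)
rows = conn.execute(merged_sql + " ORDER BY bond_code", batch_values).fetchall()
print(merged_sql)
print(rows)  # [('B1', 100.0), ('B3', 101.2)]
```

One round trip replaces one query per batch value, which is the point of the merge.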
Parameter table strategy: an associated (join) query is performed between the original SQL and a parameter table, and the corresponding parameters in the original SQL are rewritten into the corresponding fields of the parameter table. The parameter table rewrite strategy supports most scenarios; the following scenarios are not currently supported:
1) The statement contains a LIMIT clause other than LIMIT 1; only LIMIT 1 is currently supported for optimization.
2) Aggregate functions are applied to two different columns in the same query.
By way of example only, the original SQL is:
The original SQL is optimized as follows:
As yet another example, the original SQL is:
The original SQL is optimized as follows:
As yet another example, the original SQL is:
The original SQL is optimized as follows:
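A rough sketch of the parameter table idea, under assumed table and column names: the batch values are loaded into a temporary parameter table, and the parameterized predicate is replaced by a join against that table. The hand-written `rewritten_sql` below stands in for the automatic rewrite and is not taken from the original specification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE holding (portfolio_id TEXT, bond_code TEXT, amount REAL);
INSERT INTO holding VALUES
    ('P1', 'B1', 10), ('P1', 'B2', 5), ('P2', 'B1', 7), ('P3', 'B9', 1);
CREATE TEMP TABLE param_table (portfolio_id TEXT);
""")

# Hypothetical original SQL, executed once per parameter value.
original_sql = "SELECT SUM(amount) FROM holding WHERE portfolio_id = ?"

# Hypothetical rewrite: the parameter becomes a field of the joined parameter table.
rewritten_sql = """
SELECT p.portfolio_id, SUM(h.amount)
FROM holding AS h
JOIN param_table AS p ON h.portfolio_id = p.portfolio_id
GROUP BY p.portfolio_id
ORDER BY p.portfolio_id
"""

batch = ["P1", "P2"]
conn.executemany("INSERT INTO param_table VALUES (?)", [(v,) for v in batch])

one_by_one = [(v, conn.execute(original_sql, (v,)).fetchone()[0]) for v in batch]
joined = conn.execute(rewritten_sql).fetchall()
print(joined)  # [('P1', 15.0), ('P2', 7.0)]
```

The joined query returns the same values as running the original statement once per parameter, but in a single execution.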
In some embodiments, optimizing the structural DAG may include: splitting the batch parameters into individual parameters, filling them into the SQL statements, and assembling all the SQL statements by means of UNION. SQL statements that share the same SQL fingerprint but differ in their parameters have the same structure. Specifically, for each group of batch parameters, the original dynamic parameters are replaced, a UNION statement is constructed, and the columns are then supplemented.
By way of example only, the original SQL is:
The parameters are as follows:
The optimized SQL is as follows:
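One possible reading of this strategy is sketched below: each parameter value gets its own copy of the statement, an extra column carries the parameter value so that result rows can be attributed to it, and the copies are joined with UNION ALL. The table, the columns and the choice of UNION ALL are assumptions made for this illustration.

```python
import sqlite3

def build_union(param_values):
    """Build one UNION ALL statement with a branch per parameter value."""
    # Hypothetical single-parameter statement, with the parameter echoed as a column.
    branch = (
        "SELECT ? AS trade_date, MAX(net_price) AS max_price "
        "FROM bond_quote WHERE trade_date = ?"
    )
    sql = " UNION ALL ".join(branch for _ in param_values)
    args = [value for p in param_values for value in (p, p)]
    return sql, args

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bond_quote (bond_code TEXT, trade_date TEXT, net_price REAL)")
conn.executemany(
    "INSERT INTO bond_quote VALUES (?, ?, ?)",
    [
        ("B1", "2023-06-01", 100.0),
        ("B2", "2023-06-01", 101.5),
        ("B1", "2023-06-02", 99.0),
    ],
)

sql, args = build_union(["2023-06-01", "2023-06-02"])
rows = sorted(conn.execute(sql, args).fetchall())
print(rows)  # [('2023-06-01', 101.5), ('2023-06-02', 99.0)]
```

Because every branch has the same structure, the statement can be generated mechanically from the original SQL and the list of parameter values.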
It can be appreciated that merging the structural DAG based on the request parameters produces a new DAG, namely the data DAG. Compared with the structural DAG, the data DAG has undergone merging optimization, which lays a foundation for efficient subsequent execution.
Step 240, optimizing the data directed acyclic graph to generate an execution directed acyclic graph. In some embodiments, step 240 may be performed by an execution engine.
The execution directed acyclic graph may be the final execution DAG generated by optimizing the data DAG.
In some embodiments, the execution engine may merge and split multiple nodes that execute concurrently at the same level.
For example, FIG. 4 is a schematic diagram of a data directed acyclic graph of a data model according to some embodiments of the present disclosure. As shown in FIG. 4, node A, node B and node C may execute concurrently, and node D executes after node C has executed. If the three indexes of node A, node B and node C were merged into a single node, a problem would arise for node D, which executes after node C. To solve this problem, as shown in FIG. 5, which is a schematic diagram of the directed acyclic graph of the data model after node merging, the execution engine merges node A and node B and retains node C. The execution order is then as follows: the merged node of A and B executes concurrently with node C, and node D executes after node C has executed.
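The merge in this example can be sketched as follows. The rule used here, merging only those same-level nodes that no other node depends on, is an assumption inferred from the A/B/C/D example; the source does not state the general merging and splitting rule, and the `merge_same_level` helper is hypothetical.

```python
def merge_same_level(deps):
    """Merge root nodes without successors; keep nodes that others depend on.

    deps maps each node to the set of its predecessor nodes.
    """
    has_successor = {p for preds in deps.values() for p in preds}
    mergeable = sorted(
        n for n, preds in deps.items() if not preds and n not in has_successor
    )
    if len(mergeable) < 2:
        return {n: set(p) for n, p in deps.items()}
    merged_name = "+".join(mergeable)
    result = {n: set(p) for n, p in deps.items() if n not in mergeable}
    result[merged_name] = set()
    return result

data_dag = {"A": set(), "B": set(), "C": set(), "D": {"C"}}
execution_dag = merge_same_level(data_dag)
print(execution_dag)  # {'C': set(), 'D': {'C'}, 'A+B': set()}
```

Node C is left alone because node D depends on it, so D can start as soon as C finishes, independently of the merged A+B node.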
The execution engine may be built on the open source framework Calcite. FIG. 6 is a flow diagram of the optimization and rewriting of SQL nodes according to some embodiments of the present disclosure. As shown in FIG. 6, in some embodiments, the execution engine may acquire an SQL statement, parse the SQL statement into a syntax tree, optimize the SQL statement based on relational algebra, and perform cost-based query optimization. The optimized SQL is placed in a cache to improve optimization performance, and different SQL statements can be optimized in parallel.
Step 250, performing calculation based on the execution directed acyclic graph to obtain a calculation result corresponding to the data extraction request. In some embodiments, step 250 may be performed by a calculation engine or a task engine.
FIG. 7 is a flow chart of a calculation engine calculating a calculation result corresponding to a data extraction request according to some embodiments of the present disclosure, and FIG. 8 is a schematic diagram of a management node and a plurality of working nodes according to some embodiments of the present disclosure. As shown in FIG. 7 and FIG. 8, in some embodiments, when the data volume of the request parameters is less than or equal to a preset data volume threshold, the calculation engine performs the calculation using a pseudo-coroutine mechanism, which further improves calculation performance and avoids the overhead of kernel context switching. The calculation engine may include a management node (Manager) and a plurality of working nodes (Worker). The management node determines a plurality of execution tasks based on the execution directed acyclic graph and distributes them to the plurality of working nodes. The management node then receives the execution results from the plurality of working nodes and generates the calculation result corresponding to the data extraction request.
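The manager and worker interaction can be sketched with Python coroutines. This is only an illustration: `asyncio` coroutines stand in for the pseudo-coroutine mechanism, the queues stand in for whatever channel connects the nodes, and the task payloads are placeholders rather than real SQL executions.

```python
import asyncio
from graphlib import TopologicalSorter

async def worker(worker_id, tasks, results):
    """Working node: take a task, compute it, report the result to the manager."""
    while True:
        node = await tasks.get()
        await asyncio.sleep(0)  # stand-in for executing the node's SQL
        await results.put((node, f"value_of_{node}", worker_id))

async def manager(deps, worker_count=2):
    """Management node: dispatch ready nodes and collect the workers' results."""
    tasks, results = asyncio.Queue(), asyncio.Queue()
    workers = [
        asyncio.create_task(worker(i, tasks, results)) for i in range(worker_count)
    ]
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    collected, order = {}, []
    while sorter.is_active():
        for node in sorter.get_ready():  # nodes whose predecessors are done
            tasks.put_nowait(node)
        node, value, _worker_id = await results.get()
        collected[node] = value
        order.append(node)
        sorter.done(node)                # may unlock successor nodes
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return collected, order

execution_dag = {"A+B": set(), "C": set(), "D": {"C"}}
collected, order = asyncio.run(manager(execution_dag))
print(sorted(collected))
```

All coroutines run on one thread and yield to each other cooperatively, which is one way to avoid the cost of switching between operating-system threads.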
FIG. 9 is a flow chart of a task engine calculating a calculation result corresponding to a data extraction request according to some embodiments of the present disclosure. As shown in FIG. 9, in some embodiments, when the data volume of the request parameters is greater than the preset data volume threshold, the task engine splits the request parameters to generate a plurality of parameter groups, generates a plurality of tasks based on the plurality of parameter groups and the execution directed acyclic graph, executes the plurality of tasks offline, and generates the calculation result corresponding to the data extraction request.
It can be understood that a very complex data model may contain hundreds of index columns, and the logic configured for each column may itself be very complex. When the result data of such a data model must be obtained or viewed quickly, direct extraction by real-time calculation can hardly meet the performance requirement. The task engine therefore provides offline batch calculation, storage and query capabilities, which ensures the extraction efficiency of complex data models. The task engine supports both scheduled and manual execution, and the input parameters of a batch run of the data model can be specified dynamically and flexibly. For example, for a data model whose dimension is the bond, a batch run over all available bonds may be configured to execute at 10 pm every day.
For scenarios with a large volume of request parameters, the request parameters of a single execution are divided into groups and the maximum number of parameters in a single group is limited, in order to prevent the program from processing too much request data at once and causing high system load or downtime. For example, if the bond data model has ten thousand bond entries and a single task can process at most 500 bond entries, the bond entries are split into 20 parameter groups.
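The grouping in this example amounts to fixed-size chunking. The sketch below reproduces the ten-thousand-bond example; the `split_parameters` name and the generated bond codes are illustrative only.

```python
def split_parameters(params, max_group_size):
    """Split the request parameters into groups of at most max_group_size items."""
    if max_group_size <= 0:
        raise ValueError("max_group_size must be positive")
    return [
        params[i:i + max_group_size]
        for i in range(0, len(params), max_group_size)
    ]

bond_codes = [f"BOND{i:05d}" for i in range(10_000)]  # hypothetical bond entries
groups = split_parameters(bond_codes, 500)
print(len(groups))  # 20
```

When the number of parameters is not an exact multiple of the group size, the last group is simply smaller.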
The data model is parsed and analyzed to obtain the corresponding execution directed acyclic graph, and node tasks are generated according to the number of nodes in the execution directed acyclic graph, with each node corresponding to one task. The task of a node may be in one of the following states: to be executed, execution success, execution failure, to be activated.
For one data model, the total number of tasks generated = the number of nodes in the execution directed acyclic graph × the number of parameter groups. For example, for a data model with ten thousand bond entries, 20 parameter groups and an execution directed acyclic graph of 10 nodes, 20 × 10 = 200 tasks are generated in theory. When all 200 tasks have been processed, the batch run of the data model is complete.
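The task count can be reproduced with a small sketch that creates one task record per combination of parameter group and node. The record layout, the 10-node chain and the choice of initial state (nodes with predecessors start as "to be activated") are assumptions for illustration.

```python
from itertools import product

def generate_tasks(deps, group_count):
    """Create one task record per (parameter group, DAG node) pair."""
    tasks = []
    for group_id, (node, preds) in product(range(group_count), deps.items()):
        # Assumption: nodes with predecessors wait for activation; roots can run.
        state = "to be activated" if preds else "to be executed"
        tasks.append({"group": group_id, "node": node, "state": state})
    return tasks

# Hypothetical execution DAG with 10 nodes arranged in a chain N0 -> N1 -> ... -> N9.
deps = {f"N{i}": ({f"N{i - 1}"} if i else set()) for i in range(10)}
tasks = generate_tasks(deps, group_count=20)
runnable = sum(t["state"] == "to be executed" for t in tasks)
print(len(tasks), runnable)  # 200 20
```

With 20 parameter groups and 10 nodes this yields the 200 tasks of the example, of which only the root node of each group is immediately runnable.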
After the task of a node has executed successfully, if the current node has a successor node task, the state of that successor task is changed from to be activated to to be executed. The task execution engine continuously pulls tasks that are to be executed or in an abnormal state and executes them. After each abnormal execution, the detailed cause of the exception and the number of executions are recorded. While the number of exceptions has not reached the retry upper limit, the engine keeps retrying the task until the retry succeeds or the number of exceptions reaches the upper limit.
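The state transitions and retry behavior described above can be sketched as a small loop over the tasks of one parameter group. The record fields, the `max_retries` value and the polling loop are assumptions; a real engine would persist task states rather than hold them in memory.

```python
def run_group(deps, action, max_retries=3):
    """Run the tasks of one parameter group; returns task records keyed by node."""
    tasks = {
        node: {
            "state": "to be activated" if preds else "to be executed",
            "attempts": 0,
            "errors": [],
        }
        for node, preds in deps.items()
    }
    progressed = True
    while progressed:
        progressed = False
        for node, task in tasks.items():
            retryable = (
                task["state"] == "execution failure"
                and task["attempts"] < max_retries
            )
            if task["state"] != "to be executed" and not retryable:
                continue
            progressed = True
            task["attempts"] += 1
            try:
                action(node)
            except Exception as exc:
                task["state"] = "execution failure"
                task["errors"].append(str(exc))  # record the cause of the exception
                continue
            task["state"] = "execution success"
            # Activate successors whose predecessors have all succeeded.
            for succ, preds in deps.items():
                if node in preds and all(
                    tasks[p]["state"] == "execution success" for p in preds
                ):
                    tasks[succ]["state"] = "to be executed"
    return tasks

calls = {}

def flaky(node):
    """Hypothetical task body: node C fails on its first attempt only."""
    calls[node] = calls.get(node, 0) + 1
    if node == "C" and calls[node] == 1:
        raise RuntimeError("temporary failure")

result = run_group({"C": set(), "D": {"C"}}, flaky)
print(result["C"]["attempts"], result["D"]["state"])  # 2 execution success
```

Node D is only activated after node C eventually succeeds, and a task that keeps failing stops being retried once it reaches `max_retries` attempts.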
For offline task configuration, the name of the physical database table in which the data model is stored offline can be configured, together with the table field name corresponding to each dimension column and index column. When tasks are executed, the results are written to storage one by one according to this correspondence. A unique index is generally created on the unique dimension column to improve query efficiency.
After the batch run of the data model is completed, a query on the data model no longer performs the complex calculation logic; it directly queries the data stored offline, which greatly improves query efficiency.
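A minimal sketch of such offline storage, using Python's built-in `sqlite3` module with an in-memory database, is given below. The table name `bond_model_result`, the column names, the `column_mapping` dictionary, and the sample values are hypothetical; the description does not specify which database is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bond_model_result (bond_code TEXT, yield_rate REAL, duration REAL)"
)
# Unique index on the dimension column, so lookups by bond are fast
# and a re-run overwrites the earlier row instead of duplicating it.
conn.execute("CREATE UNIQUE INDEX idx_bond_code ON bond_model_result (bond_code)")

# Hypothetical mapping from data model columns to physical table fields.
column_mapping = {"bond": "bond_code", "yield": "yield_rate", "duration": "duration"}


def store_row(row: dict) -> None:
    """Write one result row according to the configured column mapping."""
    # Column names come only from the trusted mapping; values are bound as parameters.
    columns = [column_mapping[name] for name in row]
    placeholders = ", ".join("?" for _ in columns)
    sql = (
        f"INSERT OR REPLACE INTO bond_model_result ({', '.join(columns)}) "
        f"VALUES ({placeholders})"
    )
    conn.execute(sql, list(row.values()))


store_row({"bond": "B001", "yield": 3.1, "duration": 4.0})
store_row({"bond": "B002", "yield": 2.7, "duration": 6.5})

# After the batch run, a query reads the stored result instead of recomputing it.
stored = conn.execute(
    "SELECT yield_rate FROM bond_model_result WHERE bond_code = ?", ("B001",)
).fetchone()
```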
By way of example only, fig. 10 is a schematic diagram of data model 1 according to some embodiments of the present description. As shown in fig. 10, data model 1 includes six indexes A, B, C, D, E, and F, where indexes D and E take the calculation result of index C as an input to their calculation, and index F takes the result of index E as an input. The corresponding initial execution directed acyclic graph of data model 1 is shown in fig. 11. A formula-type or derived index is built from other indexes through various logic combinations, for example index F = index X1 + index X2 and index D = index X2 + index X3. The final execution directed acyclic graph corresponding to data model 1 is shown in fig. 12. Because index D and index F both reference index X2, the references to X2 are automatically merged into one node when the execution DAG is built, which avoids executing the same index repeatedly.
For the above scenario, when a request triggers extraction of the data model, the execution DAG is built according to the definition of the data model, and the nodes are then executed in sequence according to the relationships in the execution DAG. The execution order is as follows:
1) A, B, and C are executed in parallel.
2) After C completes, D and E are executed concurrently.
3) After E completes, F is executed.
4) After F completes, X1 and X2 are executed concurrently.
5) After D completes, X2 and X3 are executed concurrently.
Since D and F both depend on X2, for whichever index executes later, if the execution result of X2 is already available in the context, the result from the earlier index's execution can be reused directly.
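The level-by-level execution and the reuse of the X2 result can be sketched in Python with the standard `graphlib` module (Python 3.9 or later). The sketch runs each level serially, uses a placeholder value for every sub-index, and treats X1, X2, and X3 as sub-indexes evaluated when their formula index runs; these choices, and the names `formula_parts`, `evaluate_sub_index`, and `execution_waves`, are assumptions for illustration rather than the engine's actual design.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each node maps to the set of nodes it depends on (figs. 10 and 11).
dependencies: Dict[str, Set[str]] = {
    "A": set(), "B": set(), "C": set(),
    "D": {"C"}, "E": {"C"}, "F": {"E"},
}
# Formula indexes and the sub-indexes they are built from.
formula_parts: Dict[str, List[str]] = {"F": ["X1", "X2"], "D": ["X2", "X3"]}

context: Dict[str, float] = {}   # shared execution context
compute_calls: List[str] = []    # records which sub-indexes were really computed


def evaluate_sub_index(name: str) -> float:
    """Return the cached result if present, otherwise compute and cache it."""
    if name not in context:
        compute_calls.append(name)
        context[name] = 1.0  # placeholder for the real index calculation
    return context[name]


def execution_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group nodes into waves; nodes in one wave have no mutual dependencies."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        for node in ready:  # a real engine could run these concurrently
            for part in formula_parts.get(node, []):
                evaluate_sub_index(part)
        sorter.done(*ready)
    return waves


waves = execution_waves(dependencies)
print(waves)  # [['A', 'B', 'C'], ['D', 'E'], ['F']]
```

In this sketch D happens to run before F, so D computes X2 and F reuses it; the reuse works the same way if the order is reversed, because the lookup only checks whether the result is already in the context.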
The method for extracting data by the data browser in the large data scene of the resource management industry has the following beneficial effects:
1. Improved data extraction performance where data models in the resource management industry are formed dynamically: the corresponding DAG is generated dynamically from the data model; the DAG and the input parameter information are analyzed to optimize or rewrite indexes and generate the corresponding data DAG; the data DAG is then optimized and transformed into the corresponding execution DAG; and finally the calculation engine distributes and executes the tasks to extract the data. Processing through this whole flow can greatly improve execution efficiency.
2. Clear management of task relationships: the complex execution flow is visualized, and DAG technology makes it clear how the whole data model is calculated, so the data model is easier to understand and manage.
3. Greater reuse of node execution results: however a user defines a data model, intelligent analysis of the nodes optimizes the DAG from the structure level to the data level, which eliminates repeated execution of nodes and improves the execution efficiency of the whole DAG.
4. An efficient engine for calculating and extracting data model indexes: the DAG construction mechanism, merging engine, execution engine, calculation engine, splitting engine, task scheduling engine, efficient pseudo-coroutine task distribution mechanism, and other technologies enhance the robustness of the program and greatly improve the extraction efficiency of the data model. An offline task execution engine is also provided for complex data models: the data model is executed offline in batches through visual configuration and the parameter splitting engine, the execution results are stored, and extraction of stored indexes can respond extremely quickly.
5. Node traceability: a visual node monitoring mechanism can be provided that exposes the optimized result of each execution node and the input and output parameters of each node, so that users can better monitor and check node execution. When users or operation and maintenance personnel find that the result of a data model is inconsistent with expectations, they can inspect the optimized DAG and the input and output parameters of each node to see which node deviates from expectations, and thereby determine whether the index configuration logic or the program is at fault, which improves task monitoring.
It should be noted that the above description of the method for extracting data by the data browser in the large data scene of the resource management industry is only for illustration and description, and does not limit the application scope of the present description. Various modifications and changes to the method may be made by those skilled in the art under the guidance of this specification. However, such modifications and variations are still within the scope of the present description.
Fig. 13 is a schematic structural diagram of an electronic device according to some embodiments of the present description. The device shown in fig. 13 is an example of a hardware device to which aspects of the present invention can be applied. Electronic devices are intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 13, the electronic device includes a computing unit that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) or a computer program loaded from a storage unit into a Random Access Memory (RAM). In the RAM, various programs and data required for the operation of the device may also be stored. The computing unit, ROM and RAM are connected to each other by a bus. An input/output (I/O) interface is also connected to the bus.
A plurality of components in the electronic device are connected to the I/O interface, including an input unit, an output unit, a storage unit, and a communication unit. The input unit may be any type of device capable of inputting information to the electronic device, and may receive input numeric or character information and generate key signal inputs related to user settings and/or function controls of the electronic device. The output unit may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, video/audio output terminals, vibrators, and/or printers. Storage units may include, but are not limited to, magnetic disks and optical disks. The communication unit allows the electronic device to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth(TM) devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.
The computing unit may be any of a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing units include, but are not limited to, Central Processing Units (CPUs), Graphics Processing Units (GPUs), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processors, controllers, microcontrollers, and the like. The computing unit performs the various methods and processes described above. For example, in some embodiments, the method for extracting data by the data browser in the large data scene of the resource management industry may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as a storage unit. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device via the ROM and/or the communication unit. In some embodiments, the computing unit may be configured by any other suitable means (e.g., by means of firmware) to perform the method for extracting data by the data browser in the large data scene of the resource management industry.
While the basic concepts have been described above, it will be apparent to those skilled in the art that the foregoing detailed disclosure is by way of example only and is not intended to be limiting. Although not explicitly described herein, various modifications, improvements, and adaptations to the present disclosure may occur to one skilled in the art. Such modifications, improvements, and adaptations are suggested within this specification and are therefore intended to fall within the spirit and scope of the exemplary embodiments of the present invention.
Meanwhile, the specification uses specific words to describe the embodiments of the specification. Reference to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic is associated with at least one embodiment of the present description. Thus, it should be emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various positions in this specification are not necessarily referring to the same embodiment. Furthermore, certain features, structures, or characteristics of one or more embodiments of the present description may be combined as suitable.
Furthermore, the order in which the elements and sequences are processed, the use of numbers and letters, or other designations in the description are not intended to limit the order in which the processes and methods of the description are performed unless explicitly recited in the claims. While certain presently useful inventive embodiments have been discussed in the foregoing disclosure by way of various examples, it is to be understood that such details are merely illustrative and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements included within the spirit and scope of the embodiments of the present disclosure. For example, while the system components described above may be implemented by hardware devices, they may also be implemented solely by software solutions, such as installing the described system on an existing server or mobile device.
Likewise, it should be noted that in order to simplify the presentation disclosed in this specification and thereby aid in understanding one or more inventive embodiments, various features are sometimes grouped together in a single embodiment, figure, or description thereof. This method of disclosure, however, does not imply that the subject matter of the present description requires more features than are recited in the claims. Indeed, an embodiment may have fewer than all of the features of a single embodiment disclosed above.
Finally, it should be understood that the embodiments described in this specification are merely illustrative of the principles of the embodiments of this specification. Other variations are possible within the scope of this description. Thus, by way of example, and not limitation, alternative configurations of embodiments of the present specification may be considered as consistent with the teachings of the present specification. Accordingly, the embodiments of the present specification are not limited to only the embodiments explicitly described and depicted in the present specification.

Claims (7)

1. The method for extracting data by the data browser in the large data scene in the resource management industry is characterized by comprising the following steps:
acquiring a data model to be extracted;
generating a structure directed acyclic graph corresponding to the data model to be extracted;
constructing a data directed acyclic graph based on the structure directed acyclic graph according to the request parameters;
optimizing the data directed acyclic graph to generate an execution directed acyclic graph;
based on the execution directed acyclic graph, calculating to obtain a calculation result corresponding to the data extraction request;
the building the data directed acyclic graph based on the structure directed acyclic graph according to the request parameters comprises the following steps:
for a plurality of SQL index extraction queries meeting preset merging conditions, merging the SQL index extraction queries into one SQL index extraction query, wherein the preset merging conditions comprise that the queried tables are identical and the WHERE conditions of the queries are identical;
performing an associated query on the original SQL and the parameter table, and rewriting the corresponding parameters in the original SQL into the corresponding fields in the parameter table;
splitting batch parameters into single parameters, filling them into SQL statements, and assembling all the SQL statements in a unit mode;
the calculating based on the execution directed acyclic graph to obtain a calculation result corresponding to the data extraction request comprises the following steps:
when the data volume of the request parameter is smaller than or equal to a preset data volume threshold, the management node of the calculation engine determines a plurality of execution tasks based on the execution directed acyclic graph, distributes the plurality of execution tasks to a plurality of working nodes, receives execution results from the plurality of working nodes, and generates calculation results corresponding to the data extraction request.
2. The method for extracting data by the data browser in the large data scene in the resource management industry according to claim 1, wherein the generating a structure directed acyclic graph corresponding to the data model to be extracted includes:
judging whether a structure directed acyclic graph corresponding to the data model to be extracted is cached;
and if the structure directed acyclic graph corresponding to the data model to be extracted is not cached, generating the structure directed acyclic graph corresponding to the data model to be extracted according to each index definition of the data model to be extracted.
3. The method for extracting data by the data browser in the large data scene in the resource management industry according to claim 1 or 2, wherein the optimizing the data directed acyclic graph to generate the execution directed acyclic graph comprises:
and combining and splitting the plurality of nodes which are executed at the same level and concurrently.
4. The method for extracting data by the data browser in the large data scene in the resource management industry according to claim 1 or 2, wherein the optimizing the data directed acyclic graph to generate the execution directed acyclic graph comprises:
acquiring an SQL statement, performing syntax tree analysis on the SQL statement, optimizing the SQL statement based on relational algebra theory, and performing cost-based query optimization.
5. The method for extracting data by the data browser in the large data scene in the resource management industry according to claim 1 or 2, wherein the calculating based on the execution directed acyclic graph to obtain a calculation result corresponding to a data extraction request includes:
the management node determines a plurality of execution tasks based on the execution directed acyclic graph;
the management node distributes the plurality of execution tasks to a plurality of working nodes;
and the management node receives execution results from the plurality of working nodes and generates calculation results corresponding to the data extraction requests.
6. The method for extracting data by the data browser in the large data scene in the resource management industry according to claim 1 or 2, wherein the calculating based on the execution directed acyclic graph to obtain a calculation result corresponding to a data extraction request includes:
when the data quantity of the request parameters is larger than a preset data quantity threshold, splitting the request parameters to generate a plurality of parameter groups, generating a plurality of tasks based on the plurality of parameter groups and the execution directed acyclic graph, executing the plurality of tasks offline, and generating a calculation result corresponding to the data extraction request.
7. The system for extracting data by a data browser in a large data scene in resource management industry is characterized by comprising the following components:
the generation engine is used for acquiring a data model to be extracted and generating a structure directed acyclic graph corresponding to the data model to be extracted;
the merging engine is used for constructing a data directed acyclic graph based on the structure directed acyclic graph according to the request parameters;
the execution engine is used for optimizing the data directed acyclic graph and generating an execution directed acyclic graph;
the calculation engine is used for calculating based on the execution directed acyclic graph to obtain a calculation result corresponding to the data extraction request;
the constructing a data directed acyclic graph based on the structure directed acyclic graph according to request parameters includes:
for a plurality of SQL index extraction queries meeting preset merging conditions, merging the SQL index extraction queries into one SQL index extraction query, wherein the preset merging conditions comprise that the queried tables are identical and the WHERE conditions of the queries are identical;
performing an associated query on the original SQL and the parameter table, and rewriting the corresponding parameters in the original SQL into the corresponding fields in the parameter table;
splitting batch parameters into single parameters, filling them into SQL statements, and assembling all the SQL statements in a unit mode;
the calculating based on the execution directed acyclic graph to obtain a calculation result corresponding to the data extraction request comprises the following steps:
when the data volume of the request parameter is smaller than or equal to a preset data volume threshold, the management node of the calculation engine determines a plurality of execution tasks based on the execution directed acyclic graph, distributes the plurality of execution tasks to a plurality of working nodes, receives execution results from the plurality of working nodes, and generates calculation results corresponding to the data extraction request.
CN202310647082.8A 2023-06-02 2023-06-02 Method and system for extracting data by data browser in large data scene of resource management industry Active CN116383471B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310647082.8A CN116383471B (en) 2023-06-02 2023-06-02 Method and system for extracting data by data browser in large data scene of resource management industry

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310647082.8A CN116383471B (en) 2023-06-02 2023-06-02 Method and system for extracting data by data browser in large data scene of resource management industry

Publications (2)

Publication Number Publication Date
CN116383471A CN116383471A (en) 2023-07-04
CN116383471B true CN116383471B (en) 2023-08-25

Family

ID=86971419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310647082.8A Active CN116383471B (en) 2023-06-02 2023-06-02 Method and system for extracting data by data browser in large data scene of resource management industry

Country Status (1)

Country Link
CN (1) CN116383471B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109063056A (en) * 2018-07-20 2018-12-21 阿里巴巴集团控股有限公司 A kind of data query method, system and terminal device
US20220358178A1 (en) * 2021-08-04 2022-11-10 Beijing Baidu Netcom Science Technology Co., Ltd. Data query method, electronic device, and storage medium
CN115794393A (en) * 2022-11-28 2023-03-14 北京锐安科技有限公司 Method, device, server and storage medium for executing business model

Also Published As

Publication number Publication date
CN116383471A (en) 2023-07-04

Similar Documents

Publication Publication Date Title
CN109345377B (en) Data real-time processing system and data real-time processing method
CN105117286B (en) The dispatching method of task and streamlined perform method in MapReduce
CN102375731B (en) Coding-free integrated application platform system
Chen et al. MRGIS: A MapReduce-Enabled high performance workflow system for GIS
CN108804630B (en) Industry application-oriented big data intelligent analysis service system
CN109725899B (en) Data stream processing method and device
US10339137B2 (en) System and method for caching and parameterizing IR
CN112287015B (en) Image generation system, image generation method, electronic device, and storage medium
CN109740765B (en) Machine learning system building method based on Amazon network server
CN114416855A (en) Visualization platform and method based on electric power big data
CN111858608A (en) Data management method, device, server and storage medium
CN113094116B (en) Deep learning application cloud configuration recommendation method and system based on load characteristic analysis
CN108829505A (en) A kind of distributed scheduling system and method
CN106874067A (en) Parallel calculating method, apparatus and system based on lightweight virtual machine
CN108804601A (en) Power grid operation monitors the active analysis method of big data and device
CN116775041B (en) Real-time decision engine implementation method based on stream calculation and RETE algorithm
CN116383471B (en) Method and system for extracting data by data browser in large data scene of resource management industry
US10489416B2 (en) Optimizing and managing execution of hybrid flows
CN115438995B (en) Business processing method and equipment for clothing customization enterprise based on knowledge graph
CN110879753A (en) GPU acceleration performance optimization method and system based on automatic cluster resource management
CN116166813A (en) Management method, system, equipment and storage medium for big data automation operation and maintenance
CN113420419B (en) Business process model analysis method under micro-service scene
CN112130849B (en) Code automatic generation method and device
CN111290868B (en) Task processing method, device and system and flow engine
CN112668285A (en) Method and device for generating fund daily report by combining RPA and AI and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant