CN116383471B

CN116383471B - Method and system for extracting data by data browser in large data scene of resource management industry

Info

Publication number: CN116383471B
Application number: CN202310647082.8A
Authority: CN
Inventors: 花磊; 余家奎; 芦辉; 许一锴
Original assignee: Jiangsu Boyun Technology Co ltd
Current assignee: Jiangsu Boyun Technology Co ltd
Priority date: 2023-06-02
Filing date: 2023-06-02
Publication date: 2023-08-25
Anticipated expiration: 2043-06-02
Also published as: CN116383471A

Abstract

Embodiments of this specification provide a method and a system for extracting data by a data browser in a big data scenario of the asset management industry. The method comprises: acquiring a data model to be extracted; generating a structural directed acyclic graph corresponding to the data model to be extracted; constructing a data directed acyclic graph based on the structural directed acyclic graph according to request parameters; optimizing the data directed acyclic graph to generate an execution directed acyclic graph; and performing calculation based on the execution directed acyclic graph to obtain a calculation result corresponding to the data extraction request. The method has the advantage of improving data extraction efficiency in the asset management industry.

Description

Method and system for extracting data by data browser in large data scene of resource management industry

Technical Field

This specification relates to the field of data processing, and in particular to a method and a system for extracting data by a data browser in a big data scenario of the asset management industry.

Background

The asset management business refers to the activity in which an asset manager operates a customer's assets in the manner and under the conditions, requirements, and limitations agreed in the asset management contract, provides securities, funds, and other financial products to the customer, and collects fees. In the asset management industry, a data model can be constructed from a combination of several indexes. The indexes of one data model may lie in the same dimension or in different dimensions, the result of one index may serve as a parameter of another index, and different data models can be constructed through various flexible combinations. At present, data extraction for a data model can only be executed according to a pre-configured business process, and its efficiency is low.

Therefore, there is a need for a method and a system for extracting data by a data browser in a big data scenario of the asset management industry, so as to improve data extraction efficiency in the asset management industry.

Disclosure of Invention

One of the embodiments of this specification provides a method for extracting data by a data browser in a big data scenario of the asset management industry, the method including: acquiring a data model to be extracted; generating a structural directed acyclic graph corresponding to the data model to be extracted; constructing a data directed acyclic graph based on the structural directed acyclic graph according to request parameters; optimizing the data directed acyclic graph to generate an execution directed acyclic graph; and performing calculation based on the execution directed acyclic graph to obtain a calculation result corresponding to the data extraction request.

In some embodiments, generating the structural directed acyclic graph corresponding to the data model to be extracted includes: determining whether a structural directed acyclic graph corresponding to the data model to be extracted is cached; and, if it is not cached, generating the structural directed acyclic graph according to the index definitions of the data model to be extracted.

In some embodiments, constructing the data directed acyclic graph based on the structural directed acyclic graph according to the request parameters includes: merging SQL index data-extraction queries that satisfy a preset merging condition into a single SQL index data-extraction query.

In some embodiments, constructing the data directed acyclic graph based on the structural directed acyclic graph according to the request parameters includes: performing an associated (join) query between the original SQL and a parameter table, and rewriting the corresponding parameters in the original SQL into the corresponding fields of the parameter table.

In some embodiments, constructing the data directed acyclic graph based on the structural directed acyclic graph according to the request parameters includes: splitting batch parameters into single parameters, filling them into the SQL statement, and assembling all the resulting SQL statements by UNION.

In some embodiments, optimizing the data directed acyclic graph to generate the execution directed acyclic graph includes: merging and splitting multiple nodes that execute concurrently at the same level.

In some embodiments, optimizing the data directed acyclic graph to generate the execution directed acyclic graph includes: acquiring an SQL statement, parsing the SQL statement into a syntax tree, optimizing the SQL statement based on relational algebra theory, and performing cost-based query optimization.

In some embodiments, the calculating, based on the execution directed acyclic graph, to obtain a calculation result corresponding to the data extraction request includes: the management node determines a plurality of execution tasks based on the execution directed acyclic graph; the management node distributes the plurality of execution tasks to a plurality of working nodes; and the management node receives execution results from the plurality of working nodes and generates calculation results corresponding to the data extraction requests.

In some embodiments, the calculating, based on the execution directed acyclic graph, to obtain a calculation result corresponding to the data extraction request includes: when the data quantity of the request parameters is larger than a preset data quantity threshold, splitting the request parameters to generate a plurality of parameter groups, generating a plurality of tasks based on the plurality of parameter groups and the execution directed acyclic graph, executing the plurality of tasks offline, and generating a calculation result corresponding to the data extraction request.

One of the embodiments of this specification provides a system for extracting data by a data browser in a big data scenario of the asset management industry, including: a generation engine, configured to acquire a data model to be extracted and generate a structural directed acyclic graph corresponding to the data model to be extracted; a merge engine, configured to construct a data directed acyclic graph based on the structural directed acyclic graph according to request parameters; an execution engine, configured to optimize the data directed acyclic graph and generate an execution directed acyclic graph; and a calculation engine, configured to perform calculation based on the execution directed acyclic graph and obtain a calculation result corresponding to the data extraction request.

Drawings

The present specification will be further elucidated by way of example embodiments, which will be described in detail by means of the accompanying drawings. The embodiments are not limiting, in which like numerals represent like structures, wherein:

FIG. 1 is a block diagram of a system for extracting data by a data browser in a big data scenario of the asset management industry, according to some embodiments of the present description;

FIG. 2 is a flow chart of a method for extracting data by a data browser in a big data scenario of the asset management industry, according to some embodiments of the present description;

FIG. 3 is a flow chart of generating a directed acyclic graph according to some embodiments of the present description;

FIG. 4 is a schematic diagram of a data directed acyclic graph of a data model according to some embodiments of the present description;

FIG. 5 is a schematic diagram of a directed acyclic graph of a data model after node merging according to some embodiments of the present description;

FIG. 6 is a flow diagram of optimizing and transforming SQL nodes according to some embodiments of the present description;

FIG. 7 is a flow chart of a calculation engine calculating a calculation result corresponding to a data extraction request according to some embodiments of the present description;

FIG. 8 is a schematic diagram of a management node and a plurality of work nodes shown in accordance with some embodiments of the present description;

FIG. 9 is a flow chart of a task engine computing a computation result corresponding to a data extraction request, according to some embodiments of the present description;

FIG. 10 is a schematic diagram of a data model 1 shown in accordance with some embodiments of the present description;

FIG. 11 is a schematic illustration of an initial execution directed acyclic graph corresponding to data model 1 shown according to some embodiments of the present description;

FIG. 12 is a schematic diagram of a corresponding final execution directed acyclic graph of data model 1 shown according to some embodiments of the present description;

FIG. 13 is a block diagram of an electronic device according to some embodiments of the present description.

Detailed Description

In order to more clearly illustrate the technical solutions of the embodiments of the present specification, the drawings that are required to be used in the description of the embodiments will be briefly described below. It is apparent that the drawings in the following description are only some examples or embodiments of the present specification, and it is possible for those of ordinary skill in the art to apply the present specification to other similar situations according to the drawings without inventive effort. Unless otherwise apparent from the context of the language or otherwise specified, like reference numerals in the figures refer to like structures or operations.

As used in this specification and the claims, the terms "a," "an," and/or "the" do not refer specifically to the singular and may include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that explicitly identified steps and elements are included; they do not constitute an exclusive list, and a method or apparatus may include other steps or elements.

Flowcharts are used in this specification to describe the operations performed by the system according to embodiments of the present specification. It should be appreciated that the preceding or following operations are not necessarily performed precisely in order. Rather, the steps may be processed in reverse order or simultaneously. Other operations may also be added to these processes, or one or more operations may be removed from them.

FIG. 1 is a block diagram of a system for extracting data by a data browser in a big data scenario of the asset management industry, according to some embodiments of the present description. As shown in FIG. 1, the system may include a generation engine, a merge engine, an execution engine, a calculation engine, and a task engine.

The generation engine may be configured to acquire the data model to be extracted and generate a structural directed acyclic graph corresponding to the data model to be extracted.

The merge engine may be configured to construct a data directed acyclic graph based on the structural directed acyclic graph according to the request parameters.

The execution engine may be configured to optimize the data directed acyclic graph to generate an execution directed acyclic graph.

The calculation engine may be configured to perform calculation based on the execution directed acyclic graph and obtain a calculation result corresponding to the data extraction request.

When the data volume of the request parameters is larger than a preset data volume threshold, the task engine can split the request parameters to generate a plurality of parameter sets, generate a plurality of tasks based on the plurality of parameter sets and the execution directed acyclic graph, execute the plurality of tasks offline, and generate a calculation result corresponding to the data extraction request.

For further description of the generation engine, the merge engine, the execution engine, the calculation engine, and the task engine, see FIG. 2 and its associated description, which are not repeated here.

It should be noted that the above description of the system for extracting data by a data browser in a big data scenario of the asset management industry and of its modules is provided only for convenience of description and does not limit this specification to the scope of the illustrated embodiments. Those skilled in the art will appreciate that, given the principles of the system, the modules may be combined arbitrarily, or a subsystem may be formed and connected to other modules, without departing from these principles. In some embodiments, the generation engine, the merge engine, the execution engine, the calculation engine, and the task engine disclosed in FIG. 1 may be different modules of one system, or a single module may implement the functions of two or more of them. For example, the engines may share one storage module, or each engine may have its own storage module. Such variations are within the scope of the present description.

FIG. 2 is a flow chart of a method for extracting data by a data browser in a big data scenario of the asset management industry, according to some embodiments of the present description. The operations of the method presented below are illustrative. In some embodiments, the process may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed. In addition, the order of the operations shown in FIG. 2 and described below is not limiting. As shown in FIG. 2, the method may include the following steps.

Step 210, a data model to be extracted is obtained. In some embodiments, step 210 may be performed by a generation engine.

The indexes of the same data model can be in the same dimension or different dimensions, the result of one index can be used as the parameter of another index, and different data models can be built through various flexible combinations.

An index is a value computed according to a business logic definition (caliber). The value may be an amount, informational data, and the like; there is no fixed form of result. The logic of an index may be implemented in a domain-specific language (e.g., SQL), may apply formula operations to already defined indexes, or may be implemented in a general-purpose programming language (e.g., Java, Python). Indexes are divided into atomic indexes and derived indexes. A derived index is a composite index configured on the basis of other indexes; for example, the logic configured for a derived index C may be: derived index C = index A + index B.

A dimension is the main perspective from which the data model is aggregated, and it determines the volume (number of records) of the data model. For example, if a data model is a bond data model, the bond is the dimension, and the data model has as many records as there are bonds in the dimension. For another example, suppose a data model has the following fields: bond code, bond name, net price, trading venue, valuation, and so on. If the bond code is the dimension field, it determines how many records the data model has, and the bond name is a column attached to the dimension. The dimension of this data model therefore consists of two columns, bond code and bond name, while net price, trading venue, and valuation are three indexes of the data model. A dimension may also be a composite dimension, i.e., a dimension made up of multiple columns. For example, when the data model computes the holdings of bonds in different portfolios, portfolio and bond together form a composite dimension, because the same bond may be held in different portfolios and one portfolio may hold multiple bonds.

After a user initiates a data extraction request for a certain data model, the generation engine can take the data model as a data model to be extracted.

Step 220, generating a structural directed acyclic graph corresponding to the data model to be extracted. In some embodiments, step 220 may be performed by a generation engine.

The directed acyclic graph (DAG) construction mechanism organizes tasks into a directed acyclic graph and executes them in a specific order. Each task has one or more inputs and outputs, and each task may have multiple predecessor tasks and multiple successor tasks; the dependencies between tasks are represented by directed edges. A task can begin executing only after all of its predecessor tasks have completed, and its successor tasks can begin executing only after it has completed. Executing tasks as a DAG graph can improve the execution efficiency of an application program, because it makes the relationships between tasks explicit and allows the executor to arrange the execution order of the tasks more effectively.
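The ordering rule described above can be illustrated with a minimal Python sketch. It groups nodes into levels so that every node appears after all of its predecessors; nodes in the same level have no dependencies on each other and could run concurrently. The function name `execution_levels` and the node names are hypothetical and are not taken from the original publication.

```python
from collections import defaultdict


def execution_levels(nodes, edges):
    """Group DAG nodes into levels; a node is placed only after all its predecessors."""
    indegree = {n: 0 for n in nodes}
    successors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    ready = sorted(n for n in nodes if indegree[n] == 0)
    levels, placed = [], 0
    while ready:
        levels.append(ready)
        placed += len(ready)
        next_ready = []
        for node in ready:
            for succ in successors[node]:
                indegree[succ] -= 1
                if indegree[succ] == 0:  # all predecessors have completed
                    next_ready.append(succ)
        ready = sorted(next_ready)

    if placed != len(nodes):
        raise ValueError("the graph contains a cycle and is not a DAG")
    return levels
```

For nodes A, B, C, D with a single edge from C to D, the sketch places A, B, and C in the first level and D in the second.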

The structural DAG graph may be a DAG graph of the dependency relationships between different indexes, generated according to the index definitions; it is the static DAG graph of the corresponding data model.

FIG. 3 is a flow chart of generating a directed acyclic graph according to some embodiments of the present description. As shown in FIG. 3, in some embodiments, the generation engine may first determine whether a structural directed acyclic graph corresponding to the data model to be extracted is cached. If it is not cached, the generation engine generates the structural directed acyclic graph according to the index definitions of the data model to be extracted. If it is cached, the generation engine may directly obtain the cached structural directed acyclic graph.
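The cache check can be sketched as follows. This is a minimal illustration under stated assumptions: the cache is a plain in-memory dictionary keyed by a model name, and an index definition is reduced to the list of indexes it depends on. The names `get_structure_dag`, `build_structure_dag`, and `_structure_dag_cache` are hypothetical.

```python
# Hypothetical in-memory cache: model name -> structural DAG
# (each index mapped to the set of indexes it depends on).
_structure_dag_cache = {}


def build_structure_dag(index_definitions):
    """Build a dependency map from index definitions (index -> iterable of dependencies)."""
    return {index: set(deps) for index, deps in index_definitions.items()}


def get_structure_dag(model_name, index_definitions):
    """Return the cached structural DAG if present; otherwise build and cache it."""
    dag = _structure_dag_cache.get(model_name)
    if dag is None:
        dag = build_structure_dag(index_definitions)
        _structure_dag_cache[model_name] = dag
    return dag
```

Because the structural DAG depends only on the index definitions and not on request parameters, a real system would also need to invalidate the cache entry when the model definition changes; that step is omitted here.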

Step 230, constructing a data directed acyclic graph based on the structural directed acyclic graph according to the request parameters. In some embodiments, step 230 may be performed by a merge engine.

The data DAG graph may be a DAG graph generated by optimizing the structural DAG graph for different request parameters.

In some embodiments, optimizing the structural DAG graph may include: merging SQL index data-extraction queries that satisfy a preset merging condition into a single SQL index data-extraction query.

For example, multiple SQL index data-extraction queries of the same data model may be merged into one SQL statement for execution when both of the following conditions are met:

(1) The queried tables are identical.

(2) The WHERE conditions of the queries are exactly the same.

By way of example only, the SQL logic of the data-extraction query for index A is as follows:

The SQL logic of the data-extraction query for index B is as follows:

In this case, the merge engine optimizes the queries for index A and index B into the following SQL, which obtains the values of both indexes A and B at the same time:
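The two merging conditions can be illustrated with a small Python sketch. The table names, column names, and the `MetricQuery` structure below are hypothetical and are not the SQL of the original publication; the sketch only assumes that each index query is already available in a structured form (select expression, table, WHERE condition).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricQuery:
    metric: str      # output column alias, e.g. "A"
    expression: str  # select expression computing the index
    table: str       # queried table
    where: str       # WHERE condition, compared as plain text here


def merge_metric_queries(queries):
    """Merge queries that share the same table and exactly the same WHERE condition."""
    groups = {}
    for q in queries:
        groups.setdefault((q.table, q.where), []).append(q)

    merged = []
    for (table, where), members in groups.items():
        columns = ", ".join(f"{m.expression} AS {m.metric}" for m in members)
        merged.append(f"SELECT {columns} FROM {table} WHERE {where}")
    return merged
```

Comparing WHERE conditions as text is a simplification; two conditions that are logically equal but written differently would not be merged by this sketch.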

In some embodiments, queries that use the same SQL with different parameters may be merged into one SQL statement for execution. The parameter merging strategy may be divided into a simple strategy and a parameter table strategy.

Simple strategy: the "=" operator in the SQL statement is rewritten into an "IN" operator. This strategy applies only when there is exactly one batch parameter (non-batch parameters are not restricted), the batch parameter does not appear in a sub-query, and the operator applied to the batch parameter is "=". The SQL statement may contain a LIMIT clause, but the LIMIT value must be 1.

By way of example only, the original SQL is as follows:

Request parameters:

The optimized SQL is as follows:
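The simple strategy can be sketched in Python as follows. This is only an illustration: the `:name` placeholder syntax, the function name `rewrite_equals_to_in`, and the sample table and column names are assumptions, and the check for sub-queries and LIMIT clauses described above is not implemented.

```python
import re


def rewrite_equals_to_in(sql, param, values):
    """Rewrite a single '= :param' comparison into 'IN (:param_0, :param_1, ...)'.

    Returns the rewritten SQL and the mapping of new placeholders to values.
    """
    pattern = re.compile(r"=\s*:" + re.escape(param) + r"\b")
    if len(pattern.findall(sql)) != 1:
        raise ValueError("the simple strategy expects exactly one '= :param' occurrence")

    names = [f"{param}_{i}" for i in range(len(values))]
    in_list = ", ".join(":" + name for name in names)
    return pattern.sub(f"IN ({in_list})", sql), dict(zip(names, values))
```

Keeping the values as bound placeholders, rather than inlining them into the SQL text, is a design choice of this sketch to avoid quoting problems.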

Parameter table strategy: an associated (join) query is performed between the original SQL and a parameter table, and the corresponding parameters in the original SQL are rewritten into the corresponding fields of the parameter table. The parameter table rewriting strategy currently supports most scenarios; the scenarios not currently supported are:

1) The SQL contains a LIMIT clause other than LIMIT 1; only LIMIT 1 is currently supported.

2) Two different columns are queried simultaneously using aggregate functions.

By way of example only, the original SQL is:

The original SQL is optimized as follows:

As yet another example, the original SQL is:

The original SQL is optimized as follows:

As yet another example, the original SQL is:

The original SQL is optimized as follows:
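The idea of the parameter table strategy can be illustrated with Python's built-in `sqlite3` module. The `holding` table, the `param_table` temporary table, and the rewritten query below are hypothetical and are not the SQL of the original publication; the sketch only shows how a per-parameter query of the form `SELECT SUM(amount) FROM holding WHERE bond_code = :code` could be replaced by one join against a table that holds all requested parameter values.

```python
import sqlite3


def make_demo_connection():
    """Create an in-memory database with a hypothetical `holding` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE holding (bond_code TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO holding VALUES (?, ?)",
        [("B001", 10.0), ("B001", 5.0), ("B002", 7.0), ("B003", 1.0)],
    )
    return conn


def total_amount_per_code(conn, codes):
    """Answer the per-code query for all codes with one join against a parameter table."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS param_table (code TEXT PRIMARY KEY)")
    conn.execute("DELETE FROM param_table")
    conn.executemany(
        "INSERT OR IGNORE INTO param_table (code) VALUES (?)", [(c,) for c in codes]
    )
    # The parameter ':code' of the original query becomes the field p.code.
    rows = conn.execute(
        "SELECT p.code, SUM(h.amount) "
        "FROM holding h JOIN param_table p ON h.bond_code = p.code "
        "GROUP BY p.code"
    ).fetchall()
    return dict(rows)
```

With an inner join, a requested code that has no matching rows is simply absent from the result; whether that matches the behaviour of the original system is not stated in the source.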

In some embodiments, optimizing the structural DAG graph may include: splitting batch parameters into single parameters, filling them into the SQL statement, and assembling all the resulting SQL statements by UNION. SQL statements that share the same SQL fingerprint but carry different parameters have the same structure. Specifically, for each group of batch parameters, the original dynamic parameters are replaced, a UNION statement is constructed, and the columns are then supplemented.

By way of example only, the original SQL is:

The parameters are as follows:

The optimized SQL is as follows:
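The split-and-assemble step can be sketched in Python, reading the assembly as a SQL UNION of one statement per parameter value. Several points are assumptions of this sketch and not statements of the original publication: the `:name` placeholder syntax, the use of `UNION ALL`, the extra leading column that records which parameter value produced each row, and the sample table and column names.

```python
import re


def assemble_union(sql_template, param, values):
    """Build one UNION ALL statement from a template and a batch of parameter values."""
    if not sql_template.startswith("SELECT "):
        raise ValueError("this sketch only handles templates starting with 'SELECT '")

    placeholder = re.compile(":" + re.escape(param) + r"\b")
    parts, bound = [], {}
    for i, value in enumerate(values):
        name = f"{param}_{i}"
        # Replace the dynamic parameter with a per-value placeholder.
        stmt = placeholder.sub(":" + name, sql_template)
        # Supplement a column identifying the parameter value of each row.
        stmt = stmt.replace("SELECT ", f"SELECT :{name} AS {param}, ", 1)
        parts.append(stmt)
        bound[name] = value
    return " UNION ALL ".join(parts), bound
```

`UNION ALL` is used here because the supplemented parameter column already distinguishes the rows of different parameter values, so duplicate elimination is not needed in this sketch.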

It can be appreciated that merging the structural DAG graph based on the request parameters forms a new DAG graph, namely the data DAG graph. Compared with the structural DAG graph, the data DAG graph has undergone merging optimization, which lays a foundation for efficient subsequent execution.

Step 240, optimizing the data directed acyclic graph to generate an execution directed acyclic graph. In some embodiments, step 240 may be performed by an execution engine.

The execution directed acyclic graph may be the final execution DAG graph generated by optimizing the data DAG graph.

In some embodiments, the execution engine may merge and split multiple nodes that execute concurrently at the same level.

For example, FIG. 4 is a schematic diagram of a data directed acyclic graph of a data model according to some embodiments of the present description. As shown in FIG. 4, node A, node B, and node C may execute concurrently, and node D executes after node C. Node merging could merge the three indexes of node A, node B, and node C into one node, but this raises a problem: node D would then have to wait for the entire merged node to finish rather than for node C alone. To solve this problem, FIG. 5 is a schematic diagram of the directed acyclic graph of the data model after node merging. As shown in FIG. 5, the execution engine merges node A and node B and retains node C. The execution order of the nodes is then as follows: the merged node of node A and node B and node C execute concurrently, and node D executes after node C has finished.
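The merge-and-split decision in this example can be sketched as follows. The rule used here, merging only the concurrent nodes that have no successor and keeping every node with a successor separate, is an assumption inferred from the A, B, C, D example; the original publication does not state the rule in general form. The function name `merge_concurrent_nodes` is hypothetical.

```python
def merge_concurrent_nodes(level_nodes, edges):
    """Merge same-level nodes without successors; keep nodes with successors on their own."""
    has_successor = {src for src, _ in edges}
    mergeable = [n for n in level_nodes if n not in has_successor]
    kept = [n for n in level_nodes if n in has_successor]

    groups = []
    if mergeable:
        groups.append(tuple(mergeable))  # one merged node
    groups.extend((n,) for n in kept)    # retained nodes, so successors wait only for them
    return groups
```

For the nodes of FIG. 4 as described above, the sketch returns a merged node for A and B and a separate node for C.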

The execution engine may be built on the open-source framework Calcite. FIG. 6 is a flow diagram of optimizing and transforming SQL nodes according to some embodiments of the present description. As shown in FIG. 6, in some embodiments, the execution engine may acquire an SQL statement, parse it into a syntax tree, optimize it based on relational algebra theory, and apply cost-based query optimization. The optimized SQL is put into a cache to improve optimization performance, and different SQL statements can be optimized in parallel.

Step 250, performing calculation based on the execution directed acyclic graph to obtain a calculation result corresponding to the data extraction request. In some embodiments, step 250 may be performed by a calculation engine or a task engine.

FIG. 7 is a flow chart of a calculation engine calculating a calculation result corresponding to a data extraction request according to some embodiments of the present description, and FIG. 8 is a schematic diagram of a management node and a plurality of working nodes according to some embodiments of the present description. As shown in FIG. 7 and FIG. 8, in some embodiments, when the data amount of the request parameters is less than or equal to a preset data amount threshold, the calculation engine performs the calculation using a pseudo-coroutine mechanism, which avoids the overhead of kernel context switching and further improves calculation performance. The calculation engine may include a management node (Manager) and a plurality of working nodes (Worker). The management node determines a plurality of execution tasks based on the execution directed acyclic graph, distributes the execution tasks to the working nodes, receives the execution results from the working nodes, and generates the calculation result corresponding to the data extraction request.
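The manager and worker roles can be sketched with Python's `asyncio`, whose coroutines are scheduled cooperatively in user space. This is only an analogy for the pseudo-coroutine mechanism mentioned above, not the original implementation. The function names, the representation of the execution DAG as a list of levels, and the `actions` mapping from node name to a callable are all assumptions of this sketch.

```python
import asyncio


async def _worker(queue, results):
    """Working node: take tasks from the queue and report results to the manager."""
    while True:
        node, action = await queue.get()
        try:
            results[node] = action()
        except Exception as exc:  # keep the failure as the node's result
            results[node] = exc
        finally:
            queue.task_done()


async def _manager(levels, actions, worker_count):
    """Management node: distribute the tasks of each level, then collect the results."""
    queue = asyncio.Queue()
    results = {}
    workers = [asyncio.create_task(_worker(queue, results)) for _ in range(worker_count)]
    try:
        for level in levels:
            for node in level:
                queue.put_nowait((node, actions[node]))
            await queue.join()  # a level starts only after the previous one is done
    finally:
        for w in workers:
            w.cancel()
        await asyncio.gather(*workers, return_exceptions=True)
    return results


def run_levels(levels, actions, worker_count=3):
    return asyncio.run(_manager(levels, actions, worker_count))
```

The actions in this sketch are plain synchronous callables, so they run one at a time on the event loop; real I/O-bound index queries would need to be awaitable for the workers to overlap.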

FIG. 9 is a flow chart of a task engine calculating a calculation result corresponding to a data extraction request according to some embodiments of the present description. As shown in FIG. 9, in some embodiments, when the data amount of the request parameters is greater than the preset data amount threshold, the task engine splits the request parameters to generate a plurality of parameter sets, generates a plurality of tasks based on the parameter sets and the execution directed acyclic graph, executes the tasks offline, and generates the calculation result corresponding to the data extraction request.

It can be understood that a very complex data model may contain hundreds of index columns, and the logic configured for each column may itself be very complex. When the result data of such a data model must be obtained or viewed quickly, extracting it directly through real-time calculation can hardly meet the performance requirement. The task engine therefore provides offline batch calculation, storage, and query capabilities, which safeguards the extraction efficiency of complex data models. The task engine supports both scheduled and manual execution, and the parameters of a batch run of the data model can be specified dynamically and flexibly. For example, for a data model with a bond dimension, a batch run over all bonds may be configured to execute at 10 pm every day.

For scenarios with a large volume of request parameters, to prevent a single run from processing too much request data and causing high system load or downtime, the request parameters of a single execution are divided into groups, and the maximum number of parameters in a single group is limited. For example, if the bond data model has ten thousand bond entries and a single task can process at most 500 bond entries, the bond entries are split into 20 parameter sets.

The data model is parsed and sorted out to obtain the corresponding execution directed acyclic graph, and node tasks are generated according to the nodes of the execution directed acyclic graph, with each node corresponding to one task. The task state of a node may be: to be executed, execution success, execution failure, or to be activated.

For one data model, the total number of tasks generated = the number of nodes of the execution directed acyclic graph × the number of parameter sets. For example, for a data model with ten thousand bond entries, 20 parameter sets, and 10 nodes in the execution directed acyclic graph, 20 × 10 = 200 tasks are theoretically generated. When all 200 tasks have been processed, the batch run of the data model is complete.
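The arithmetic of the example can be reproduced with a short Python sketch. The function names `split_parameters` and `generate_tasks` and the generated bond codes are hypothetical.

```python
def split_parameters(params, max_group_size):
    """Split request parameters into groups of at most max_group_size items."""
    if max_group_size <= 0:
        raise ValueError("max_group_size must be positive")
    return [params[i:i + max_group_size] for i in range(0, len(params), max_group_size)]


def generate_tasks(dag_nodes, parameter_groups):
    """Create one task per (DAG node, parameter group) pair."""
    return [
        (node, group_index)
        for group_index in range(len(parameter_groups))
        for node in dag_nodes
    ]


bond_codes = [f"B{i:05d}" for i in range(10_000)]  # ten thousand bond entries
groups = split_parameters(bond_codes, 500)          # at most 500 entries per task
tasks = generate_tasks([f"node{i}" for i in range(10)], groups)
print(len(groups), len(tasks))
```

Running the sketch prints `20 200`, matching the 20 parameter sets and 200 tasks of the example.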

After the task of a node executes successfully, if the node has a successor node task, the task state of the successor node is changed from to be activated to to be executed. The task execution engine continuously pulls tasks that are to be executed or in an abnormal state and executes them. After each abnormal execution, the detailed reason for the abnormality and the number of executions are recorded. As long as the number of abnormalities has not reached the retry limit, the execution engine keeps retrying the task until it succeeds or the number of abnormalities reaches the limit.
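The state transitions and the retry rule can be sketched in Python as follows. The four states are those named in the description; everything else is an assumption of this sketch, including the function name `run_tasks`, the `successors` mapping, the retry limit of 3, and the simplification that a successor is activated as soon as one predecessor succeeds.

```python
from enum import Enum


class TaskState(Enum):
    TO_BE_ACTIVATED = "to be activated"
    TO_BE_EXECUTED = "to be executed"
    SUCCESS = "execution success"
    FAILURE = "execution failure"


def run_tasks(actions, successors, roots, max_retries=3):
    """Run node tasks, activating successors on success and retrying failures."""
    states = {node: TaskState.TO_BE_ACTIVATED for node in actions}
    for node in roots:
        states[node] = TaskState.TO_BE_EXECUTED
    failures = {node: [] for node in actions}  # recorded failure reasons per task

    while True:
        runnable = [
            node for node, state in states.items()
            if state is TaskState.TO_BE_EXECUTED
            or (state is TaskState.FAILURE and len(failures[node]) < max_retries)
        ]
        if not runnable:
            return states, failures
        for node in runnable:
            try:
                actions[node]()
            except Exception as exc:
                failures[node].append(str(exc))
                states[node] = TaskState.FAILURE
            else:
                states[node] = TaskState.SUCCESS
                for succ in successors.get(node, []):
                    if states[succ] is TaskState.TO_BE_ACTIVATED:
                        states[succ] = TaskState.TO_BE_EXECUTED
```

A task that keeps failing stays in the failure state once the limit is reached, and its successors remain in the to-be-activated state.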

For offline task configuration, the physical database table in which the data model is stored offline can be configured, together with the table field corresponding to each dimension column and index column. When tasks are executed, the results are written to the database according to this mapping. A unique index is generally created on the unique dimension columns to improve query efficiency.

After the batch run of the data model is complete, a query on the data model no longer performs the complex calculation logic; it directly queries the data stored offline, which greatly improves query efficiency.

By way of example only, FIG. 10 is a schematic diagram of a data model 1 according to some embodiments of the present description. As shown in FIG. 10, data model 1 includes six indexes A, B, C, D, E, and F, where indexes D and E take the calculation result of index C as an input parameter, and index F takes the result of index E as an input parameter. The initial execution directed acyclic graph corresponding to data model 1 is shown in FIG. 11. A formula-type or derived index is constructed from other indexes through various logical combinations, for example index F = index X1 + index X2 and index D = index X2 + index X3. The final execution directed acyclic graph corresponding to data model 1 is shown in FIG. 12. Because index D and index F both reference index X2, the references are automatically merged into one node in the execution DAG graph, which solves the problem of executing the same index repeatedly.

For the above scenario, when a request triggers extraction of the data model, the execution DAG graph is built according to the definition of the data model, and the nodes are then executed according to the relationships in the execution DAG graph. The execution sequence is as follows:

1) A, B, and C are executed in parallel.

2) After C finishes, D and E are executed concurrently.

3) After E finishes, F is executed.

4) After F finishes, X1 and X2 are executed concurrently.

5) After D finishes, X2 and X3 are executed concurrently.

Since both D and F depend on X2, an index that executes later can directly reuse the result of the earlier execution if the execution result of X2 is already available in the context.
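The reuse of X2 through a shared context can be sketched in Python. The index names follow the example (F = X1 + X2, D = X2 + X3); the numeric values, the function name `evaluate`, and the representation of a derived index as a list of the indexes it sums are assumptions of this sketch.

```python
def evaluate(index, definitions, context, trace):
    """Evaluate an index, reusing any result already stored in the shared context."""
    if index in context:
        return context[index]  # result of an earlier execution is reused

    definition = definitions[index]
    if callable(definition):   # atomic index: actually execute it
        trace.append(index)
        value = definition()
    else:                      # derived index: sum of the indexes it references
        value = sum(evaluate(dep, definitions, context, trace) for dep in definition)

    context[index] = value
    return value


definitions = {
    "X1": lambda: 1.0,
    "X2": lambda: 2.0,
    "X3": lambda: 3.0,
    "F": ["X1", "X2"],
    "D": ["X2", "X3"],
}
context, trace = {}, []
f_value = evaluate("F", definitions, context, trace)
d_value = evaluate("D", definitions, context, trace)
```

After both calls, `trace` lists each atomic index once, which shows that X2 is executed for F and then reused for D.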

The method for extracting the data by the data browser in the large data scene in the resource management industry has the following beneficial effects:

1. Improved data extraction performance when data models are formed dynamically in the resource management industry: the corresponding DAG graph is dynamically generated from the data model; the DAG graph and the input parameter information are analyzed to optimize or rewrite the indexes and generate the corresponding data DAG graph; optimization and transformation are then applied to form the corresponding execution DAG graph; finally, the calculation engine distributes and executes the tasks to extract the data. Processing the whole flow in this way can greatly improve execution efficiency.

2. Clear task relationship management: the complex execution flow is visualized, and the DAG graph makes clear how the whole data model is calculated, so the data model is easier to understand and manage.

3. Improved reuse of node execution results: regardless of how a user defines a data model, intelligent analysis of the nodes optimizes the DAG from the structure DAG graph through to the data-level DAG graph, which resolves repeated execution of nodes and improves the execution efficiency of the whole DAG graph.

4. An efficient engine for calculating and extracting data model indexes: technologies such as the DAG graph construction mechanism, the merging engine, the execution engine, the calculation engine, the splitting engine, the task scheduling engine, and an efficient pseudo-coroutine task distribution mechanism enhance the robustness of the program and greatly improve the extraction efficiency of the data model. A task offline execution engine is also provided for complex data models: offline data models are executed in batches through visual configuration and the parameter splitting engine, and the execution results are stored, so that extraction of stored indexes can achieve an extremely fast response.

5. Node traceability: a visualized node monitoring mechanism can be provided, exposing the optimized result of each execution node and the input and output parameter information of each node, so that a user can better monitor and check how the nodes execute. When users or operation and maintenance personnel find that the result of the data model is inconsistent with expectations, they can inspect the optimized DAG graph and the input and output parameters of each node to see which node deviates from expectations, and thereby determine whether the index configuration logic or the program is at fault, which improves task monitoring.

It should be noted that the above description of the method for extracting data by the data browser in the big data scenario of the resource management industry is only for illustration and description, and does not limit the application scope of the present disclosure. Various modifications and changes to the method for extracting data by the data browser may be made by those skilled in the art under the guidance of this specification. However, such modifications and changes are still within the scope of the present description.

Fig. 13 is a schematic structural diagram of an electronic device according to some embodiments of the present description. The electronic device shown in fig. 13 is an example of a hardware device that can be applied to aspects of the present invention. Electronic devices are intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.

As shown in fig. 13, the electronic device includes a computing unit that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) or a computer program loaded from a storage unit into a Random Access Memory (RAM). In the RAM, various programs and data required for the operation of the device may also be stored. The computing unit, ROM and RAM are connected to each other by a bus. An input/output (I/O) interface is also connected to the bus.

A plurality of components in the electronic device are connected to the I/O interface, including an input unit, an output unit, a storage unit, and a communication unit. The input unit may be any type of device capable of inputting information to the electronic device, and may receive input numeric or character information and generate key signal inputs related to user settings and/or function controls of the electronic device. The output unit may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, video/audio output terminals, vibrators, and/or printers. The storage unit may include, but is not limited to, magnetic disks and optical disks. The communication unit allows the electronic device to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth(TM) devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.

The computing unit may be any of a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing units include, but are not limited to, Central Processing Units (CPUs), Graphics Processing Units (GPUs), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processors, controllers, microcontrollers, and the like. The computing unit performs the various methods and processes described above. For example, in some embodiments, the method for extracting data by the data browser in the big data scenario of the resource management industry may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device via the ROM and/or the communication unit. In some embodiments, the computing unit may be configured by any other suitable means (e.g., by means of firmware) to perform the method for extracting data by the data browser in the big data scenario of the resource management industry.

While the basic concepts have been described above, it will be apparent to those skilled in the art that the foregoing detailed disclosure is by way of example only and is not intended to be limiting. Although not explicitly described herein, various modifications, improvements, and adaptations to the present disclosure may occur to one skilled in the art. Such modifications, improvements, and adaptations are suggested by this specification and are therefore intended to be included within the spirit and scope of the exemplary embodiments of the present invention.

Meanwhile, the specification uses specific words to describe the embodiments of the specification. Reference to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic is associated with at least one embodiment of the present description. Thus, it should be emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various positions in this specification are not necessarily referring to the same embodiment. Furthermore, certain features, structures, or characteristics of one or more embodiments of the present description may be combined as suitable.

Furthermore, the order in which the elements and sequences are processed, the use of numbers and letters, or other designations in the description are not intended to limit the order in which the processes and methods of the description are performed unless explicitly recited in the claims. While certain presently useful inventive embodiments have been discussed in the foregoing disclosure by way of various examples, it is to be understood that such details are merely illustrative and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements included within the spirit and scope of the embodiments of the present disclosure. For example, while the system components described above may be implemented by hardware devices, they may also be implemented solely by software solutions, such as installing the described system on an existing server or mobile device.

Likewise, it should be noted that in order to simplify the presentation disclosed in this specification and thereby aid in understanding one or more inventive embodiments, various features are sometimes grouped together in a single embodiment, figure, or description thereof. This method of disclosure, however, is not intended to imply that more features than are presented in the claims are required for the present description. Indeed, an embodiment may have fewer than all of the features of a single embodiment disclosed above.

Finally, it should be understood that the embodiments described in this specification are merely illustrative of the principles of the embodiments of this specification. Other variations are possible within the scope of this description. Thus, by way of example, and not limitation, alternative configurations of embodiments of the present specification may be considered as consistent with the teachings of the present specification. Accordingly, the embodiments of the present specification are not limited to only the embodiments explicitly described and depicted in the present specification.

Claims

1. A method for extracting data by a data browser in a big data scenario of the resource management industry, characterized by comprising the following steps:

acquiring a data model to be extracted;

generating a structural directed acyclic graph corresponding to the data model to be extracted;

constructing a data directed acyclic graph based on the structure directed acyclic graph according to the request parameters;

optimizing the data directed acyclic graph to generate an execution directed acyclic graph;

based on the execution directed acyclic graph, calculating to obtain a calculation result corresponding to the data extraction request;

wherein the constructing of the data directed acyclic graph based on the structure directed acyclic graph according to the request parameters comprises the following steps:

for a plurality of SQL index lifting queries meeting preset merging conditions, merging the SQL index lifting queries into an SQL index lifting query, wherein the preset merging conditions comprise identical tables of the queries and identical WHERE conditions of the queries;

performing associated query on the original SQL and the parameter table, and rewriting corresponding parameters in the original SQL into corresponding fields in the parameter table;

splitting the batch parameters into single parameters, filling them into SQL statements, and assembling all the SQL statements in a unit mode;

the calculating based on the execution directed acyclic graph to obtain a calculation result corresponding to the data extraction request comprises the following steps:

when the data volume of the request parameter is smaller than or equal to a preset data volume threshold, the management node of the calculation engine determines a plurality of execution tasks based on the execution directed acyclic graph, distributes the plurality of execution tasks to a plurality of working nodes, receives execution results from the plurality of working nodes, and generates calculation results corresponding to the data extraction request.

2. The method for extracting data by a data browser in a big data scenario of the resource management industry according to claim 1, wherein the generating of the structure directed acyclic graph corresponding to the data model to be extracted includes:

judging whether a structural directed acyclic graph corresponding to the data model to be extracted is cached or not;

and if the structure directed acyclic graph corresponding to the data model to be extracted is not cached, generating the structure directed acyclic graph corresponding to the data model to be extracted according to each index definition of the data model to be extracted.

3. The method for extracting data by a data browser in a big data scenario of the resource management industry according to claim 1 or 2, wherein the optimizing of the data directed acyclic graph to generate the execution directed acyclic graph comprises:

merging and splitting a plurality of nodes that are at the same level and executed concurrently.

4. The method for extracting data by a data browser in a big data scenario of the resource management industry according to claim 1 or 2, wherein the optimizing of the data directed acyclic graph to generate the execution directed acyclic graph comprises:

acquiring an SQL statement, performing syntax tree analysis on the SQL statement, optimizing the SQL statement based on relational algebra theory, and performing cost-based query optimization.

5. The method for extracting data by a data browser in a big data scenario of the resource management industry according to claim 1 or 2, wherein the calculating based on the execution directed acyclic graph to obtain a calculation result corresponding to a data extraction request includes:

the management node determines a plurality of execution tasks based on the execution directed acyclic graph;

the management node distributes the plurality of execution tasks to a plurality of working nodes;

and the management node receives execution results from the plurality of working nodes and generates calculation results corresponding to the data extraction requests.

6. The method for extracting data by a data browser in a big data scenario of the resource management industry according to claim 1 or 2, wherein the calculating based on the execution directed acyclic graph to obtain a calculation result corresponding to a data extraction request includes:

when the data quantity of the request parameters is larger than a preset data quantity threshold, splitting the request parameters to generate a plurality of parameter groups, generating a plurality of tasks based on the plurality of parameter groups and the execution directed acyclic graph, executing the plurality of tasks offline, and generating a calculation result corresponding to the data extraction request.

7. A system for extracting data by a data browser in a big data scenario of the resource management industry, characterized by comprising the following components:

the generation engine is used for acquiring a data model to be extracted and generating a structure directed acyclic graph corresponding to the data model to be extracted;

the merging engine is used for constructing a data directed acyclic graph based on the structure directed acyclic graph according to the request parameters;

the execution engine is used for optimizing the data directed acyclic graph and generating an execution directed acyclic graph;

the calculation engine is used for calculating based on the execution directed acyclic graph to obtain a calculation result corresponding to the data extraction request;

wherein the constructing of the data directed acyclic graph based on the structure directed acyclic graph according to the request parameters includes: