WO2022156542A1 - 数据访问方法、系统和存储介质 - Google Patents

数据访问方法、系统和存储介质 Download PDF

Info

Publication number
WO2022156542A1
WO2022156542A1 (PCT/CN2022/070540, CN2022070540W)
Authority
WO
WIPO (PCT)
Prior art keywords
data
request
subtask
engine
probe
Prior art date
Application number
PCT/CN2022/070540
Other languages
English (en)
French (fr)
Inventor
张伟
Original Assignee
北京沃东天骏信息技术有限公司
北京京东世纪贸易有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京沃东天骏信息技术有限公司 and 北京京东世纪贸易有限公司
Publication of WO2022156542A1 publication Critical patent/WO2022156542A1/zh

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval of structured data, e.g. relational data
    • G06F16/28: Databases characterised by their database models, e.g. relational or object models
    • G06F16/284: Relational databases
    • G06F16/25: Integrating or interfacing systems involving database management systems
    • G06F16/254: Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

Definitions

  • The present application is based on CN application number 202110081569.5, filed on January 21, 2021, and claims its priority; the disclosure of that CN application is hereby incorporated into the present application in its entirety.
  • the present disclosure relates to the technical field of databases, and in particular, to a data access method, system and storage medium.
  • In the related art, an ETL data warehouse realizes data sharing across different databases in order to support the data warehousing service 100.
  • the operation process includes:
  • Extract: data is extracted level by level from the data nodes (cities 131-134, provinces 111-112, top level 101) and stored; except for the top-level data node, all other data nodes need to support incremental extraction.
  • An object of the present disclosure is to propose a data access architecture that improves the efficiency of data access when there are multiple data nodes.
  • A data access system is provided, including: a data platform configured to receive a data request, send it to a data engine, and receive the request execution result fed back by the data engine; a data engine configured to split the data request into multiple request subtasks, send the request subtasks to the data probes, summarize the subtask execution results from the data probes, generate the request execution result, and feed it back to the data platform; and multiple data probes, each configured to obtain its request subtasks, call the corresponding data node to execute them, generate the subtask execution results, and feed those results back to the data engine.
  • the data probes are deployed in a distributed manner, and the intranet environment of each database corresponds to one data probe.
  • At least one of the data platform or the data engine is a distributed cluster deployment.
  • the data probe is configured to: receive feedback data from the data nodes; and generate a subtask execution result based on the data model of the data platform according to the feedback data.
  • The data engine is configured to publish the message queue of the request subtasks to the subtask topic; the data probe is configured to filter the message queue of the subtask topic based on preset message filtering rules and obtain the request subtasks belonging to itself.
  • The data probe is further configured to publish the subtask execution result to the result topic; the data engine is further configured to obtain, according to the message queue of the result topic, the subtask execution results of the request subtasks belonging to the same data request, and to aggregate the obtained subtask execution results into a request execution result that conforms to the data model of the data platform.
  • The data platform is further configured to perform at least one of the following: establish a standard model and deliver it to the data engine; generate the correspondence between each data probe and its data node, as well as the rules of each corresponding data node, and send them to the corresponding data probe; receive data requests from external applications; after receiving the request execution result from the data engine, feed the request execution result back to the external application; or, after receiving a data request, perform data request verification and, if the verification passes, send the data request to the data engine.
  • At least one of the data platform, data engine or data probe is deployed in a virtualized manner.
  • A data access method is provided, including: the data platform sends a data request to the data engine; the data engine splits the data request into multiple request subtasks and sends them to the data probes; each data probe obtains its request subtasks, calls the corresponding data node to execute them, generates the subtask execution results, and feeds them back to the data engine; and the data engine summarizes the subtask execution results from the data probes, generates the request execution result, and feeds it back to the data platform.
  • generating the subtask execution result includes: receiving feedback data from the data node; and generating the subtask execution result based on the data model of the data platform according to the feedback data.
  • Sending the request subtasks to the data probes includes: the data engine publishes the message queue of the request subtasks to the subtask topic. Obtaining the request subtasks includes: the data probe filters the message queue of the subtask topic based on preset message filtering rules and obtains the request subtasks belonging to itself.
  • Feeding the subtask execution result back to the data engine includes: the data probe publishes the subtask execution result to the result topic. Summarizing the subtask execution results and generating the request execution result includes: the data engine obtains, according to the message queue of the result topic, the subtask execution results of the request subtasks belonging to the same data request, and aggregates them into a request execution result that conforms to the data model of the data platform.
  • The data access method further includes at least one of the following: the data platform establishes a standard model and sends it to the data engine; the data platform generates the correspondence between each data probe and its data node, as well as the rules of each corresponding data node, and sends them to the corresponding data probe; the data platform receives a data request from an external application and, after receiving the request execution result from the data engine, feeds the request execution result back to the external application; after receiving a data request, the data platform performs data request verification and sends the data request to the data engine if the verification passes.
  • A data access system is provided, comprising: a memory; and a processor coupled to the memory, the processor being configured to perform any one of the above data access methods based on instructions stored in the memory.
  • A non-transitory computer-readable storage medium is provided, on which computer program instructions are stored; when the instructions are executed by a processor, the steps of any one of the above data access methods are implemented.
  • FIG. 1 is a schematic diagram of database level deployment in the related art.
  • FIG. 2 is a schematic diagram of some embodiments of the disclosed data access system.
  • FIG. 3 is a schematic diagram of other embodiments of the disclosed data access system.
  • 4A-4C are schematic diagrams of the operation of some embodiments of the data access system of the present disclosure.
  • FIG. 5 is a schematic diagram of further embodiments of the disclosed data access system.
  • FIG. 6 is a schematic diagram of further embodiments of the disclosed data access system.
  • FIG. 7 is a flowchart of some embodiments of the data access method of the present disclosure.
  • FIG. 8 is a flowchart of other embodiments of the disclosed data access method.
  • In the related art, when the data structure of a leaf data node changes, the leaf data node and all of its superior data structures need to be modified; moreover, the extraction, transformation and loading of data are asynchronous operations with a certain delay, which becomes more pronounced across multiple levels, resulting in poor real-time performance.
  • The data of the top-level data warehouse service may also become incomplete or incorrect; for example, if a data node is down, its leaf data nodes cannot synchronize data with the upper-level node.
  • A schematic diagram of some embodiments of the data access system of the present disclosure is shown in FIG. 2.
  • the data platform 21 can receive the data request and send it to the data engine, and receive the request execution result fed back by the data engine.
  • the data platform 21 may be centrally deployed, and open user access interfaces and external application calling interfaces for users to initiate requests and accept calls from external applications.
  • the data platform 21 may be a separate server, or may be deployed in a virtual manner to reduce physical resource requirements and deployment costs.
  • data platform 21 may be deployed in a distributed cluster manner, thereby improving service capability and system robustness.
  • data requests may include data query (call) requests, data statistics requests, and the like.
  • The data engine 22 can split the data request into multiple request subtasks and send them to the data probes; it summarizes the subtask execution results from the data probes, generates the request execution result, and feeds it back to the data platform.
  • database call statement rules are stored in the data engine 22, and tasks can be split through statement analysis to obtain multiple request subtasks.
  • The data request may be split according to the target data nodes of the query, or according to the data probes corresponding to those target data nodes, so that each request subtask targets only a single data node, or only data nodes accessible by the same data probe; this ensures that each data probe can execute its request subtasks smoothly and completely.
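The per-node splitting described above can be sketched as follows; this is a minimal illustration, and the function and field names are invented for the example rather than taken from the disclosure.

```python
# Hypothetical sketch of request splitting: one subtask per target data node,
# so that each subtask only targets a single node.
def split_request(request_id, target_nodes):
    """Split a data request into one subtask per target data node."""
    return [
        {"request_id": request_id,
         "subtask_id": f"{request_id}-{i}",
         "node": node}
        for i, node in enumerate(target_nodes)
    ]

# A request touching three data nodes yields three independent subtasks.
subtasks = split_request("req-1", ["city-131", "city-132", "province-111"])
```

Each subtask keeps its parent request id so the results can later be regrouped into one request execution result.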
  • the data engine 22 may be a separate server, or may be deployed in a virtual manner to reduce physical resource requirements and deployment costs.
  • the data engine 22 may be deployed in a distributed cluster manner, thereby improving service capability and system robustness.
  • the data probes 231 to 23n can obtain the request subtask, call the corresponding data node to execute the request subtask, generate the subtask execution result, and feed the subtask execution result back to the data engine.
  • the data probes are deployed in a distributed manner, for example, they can be deployed nearby according to the location where the data nodes are deployed, so as to improve the access efficiency to the data nodes.
  • Each data probe can be preconfigured with message filtering rules; the data engine 22 provides all request subtasks to every data probe, and each probe filters them according to its own rules to obtain the request subtasks it needs to execute.
  • the data probes 231-23n may be independent servers, or may be deployed in a virtual manner to reduce physical resource requirements and deployment costs.
  • the intranet environment of each database can be made to correspond to a data probe, thereby breaking through the access restrictions of the intranet environment and expanding the scope of data invocation.
  • The data engine can split the data request, and the probe associated with a data node executes the corresponding request subtask, realizing non-intrusive access to the data nodes without any data synchronization operation; this improves the real-time performance of data requests and the request efficiency.
  • the data nodes do not need to interact with each other and will not affect each other.
  • The leaf data nodes can also feed back data, ensuring to the greatest extent the reliability of task execution and the integrity of the feedback data.
  • The data platform can generate a standard model as required, and each data probe generates, as far as possible, subtask execution results that conform to the standard model according to the data fed back by its data node.
  • For example, suppose data node A has 50 parameters, of which 46 are included in the standard model, and data node B has 80 parameters, of which 70 are included in the standard model. The data probe querying node A then feeds back a subtask execution result with the 46 model-conforming parameters, and the data probe querying node B feeds back a result with the 70 model-conforming parameters. This ensures that data services execute smoothly, improves the success rate, and reduces the requirements on the similarity and completeness of data nodes.
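The partial-conformance idea above (a node feeds back only the model fields it actually has) can be sketched as a simple projection; the field names here are invented for illustration.

```python
# Illustrative sketch: a probe keeps only the fields its data node shares
# with the standard model, so a node lacking some model fields can still
# feed back a usable result.
def to_standard_model(node_record, standard_fields):
    """Project a node record onto the standard-model fields it supports."""
    return {k: v for k, v in node_record.items() if k in standard_fields}

standard_fields = {"id", "name", "qty", "price"}
# node A's record has a local-only field and is missing "price"
node_a = {"id": 1, "name": "x", "qty": 5, "local_flag": "A"}
result = to_standard_model(node_a, standard_fields)
```

The result contains only model-conforming parameters, mirroring how node A would feed back 46 of its 50 parameters.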
  • A schematic diagram of other embodiments of the data access system of the present disclosure is shown in FIG. 3.
  • the data platform cluster 31 includes a plurality of data platforms 311 to 31p, where p is a positive integer greater than 1. Each data platform in the data platform cluster 31 runs in distributed logic. In some embodiments, the data platform cluster 31 has the functions of business standard model management, virtual network probe topology management and monitoring, database instance management and monitoring, virtual data service management and query management.
  • The administrator can create a new standard model through the data platform cluster 31 and, for each data probe, establish a mapping relationship between the standard model and the data node's data structure, so that the data probe can convert data between the standard model and the data node according to the calling rules. The administrator can also establish a data service, bind a standard model to the data requests of that service, and publish the data service to be invoked, thereby improving the scalability of the system.
  • the data platform cluster 31 may deliver the correspondence between each data probe and the data node, and the rules (data structure) of each corresponding data node to the corresponding data probe.
  • The data engine cluster 32 includes a plurality of data engines 321 to 32q, where q is a positive integer greater than 1. Each data engine in the data engine cluster 32 runs in distributed logic. In some embodiments, the data engine cluster 32 has the functions of parsing database statements (e.g., V-SQL), disassembling data requests, distributing sub-query tasks, and aggregating sub-query results.
  • each data probe may have a data node's internal and external network proxy engine, a sub-query execution engine, and a database instance information collection probe.
  • the data probes 331-33n can accept configuration information from the data platform cluster 31 as a basis for subsequent execution of the request subtask and feedback of the execution result of the subtask.
  • The data nodes accessed by the data probes 331-33n need not be connected to one another, thereby expanding the scope of data calling.
  • The data nodes accessed by the data probes 331 to 33n may have a hierarchical relationship, as shown in FIG. 1; data nodes then no longer need to call each other, intrusion into and influence on intermediate data nodes are avoided, and tasks no longer fail because of the performance of an intermediate data node, which improves the reliability of task execution and the integrity of the feedback data.
  • Such a data access system can support standard model configuration based on a data platform, improve user friendliness and scalability; and improve the system's carrying capacity and parallel processing capacity through cluster deployment.
  • When the data structure of a leaf node changes, other data nodes need not be notified and need not make any modifications, which does not affect the use of data warehouse services; since no data synchronization operation is required, the real-time performance of data queries is not affected by synchronization, improving the reliability and efficiency of data requests.
  • The distributed coordination of the data platform, the data engine, and the data probes can be implemented using ZooKeeper.
  • ZooKeeper is software that provides coordination services for distributed applications, including configuration maintenance, naming service, distributed synchronization, and group services; using it reduces the implementation difficulty of the data access system and improves implementation efficiency.
  • each data engine can publish tasks based on MQ (Message Queue, message queue) rules.
  • the message queue of the subtask is published to the subtask Topic.
  • Each data probe can filter tasks based on MQ rules, for example, based on preset message filtering rules, filter message queues from subtask topics, and obtain their own request subtasks.
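The topic-plus-filter pattern described above can be sketched in a few lines; the "node" message header is an assumed convention for the example, not something the disclosure specifies.

```python
# Minimal sketch of per-probe message filtering on a shared subtask topic:
# every probe sees the full message stream but consumes only the messages
# for the data nodes it serves.
def consume_own_subtasks(messages, served_nodes):
    """Apply a probe's message filter rule to the subtask topic."""
    return [m for m in messages if m["node"] in served_nodes]

subtask_topic = [
    {"subtask_id": "s1", "node": "city-131"},
    {"subtask_id": "s2", "node": "province-111"},
    {"subtask_id": "s3", "node": "city-131"},
]
# The probe serving city-131 consumes only s1 and s3.
mine = consume_own_subtasks(subtask_topic, served_nodes={"city-131"})
```

In a real MQ deployment the broker would typically evaluate such filter rules server-side, so each probe never even receives the other probes' subtasks.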
  • Such a system can use message queues to implement asynchronous task processing and decouple the system, which reduces service pressure and ensures the reliability of task execution.
  • each data engine can also collect execution results based on MQ rules. For example, according to the message queue of the result topic, the subtask execution result of the request subtask belonging to the same data request is obtained; according to the obtained subtask execution result, the request execution result conforming to the data model of the data platform is aggregated.
  • Each data probe can perform task execution result feedback based on MQ rules, such as publishing subtask execution results to the result topic for collection by each data engine.
  • Such a system can use message queues to realize asynchronous data feedback, and the system is decoupled, which reduces the service pressure and ensures the reliability of the feedback data.
  • the data access system includes a data platform 41 , a data engine 42 and a data probe 43 .
  • the data platform 41 (eg, the virtual management platform v-ms) mainly includes configuration management 411 , service engine 412 and network topology management 413 .
  • v-ms uses distributed cluster deployment to improve data service performance and reliability.
  • Configuration management 411 provides management functions for standard models and mapping relationships (e.g., one or more of creation, update, offline, and deployment). Distributed deployment via ZooKeeper guarantees configuration and coordination between v-ms and the multiple v-agents (virtual data probes). Configuration management 411 can also deliver the standard model and the mapping relationships to the v-engine (virtual data engine).
  • the service engine 412 can provide data service management functions, including service registration, service monitoring, service online, service offline, service processing, and virtual data service query manager (supports writing v-sql query data on the page), etc.
  • The service engine 412 can receive service invocation requests from external applications and respond to them; it issues query tasks to the v-engine and receives the query results returned by the v-engine.
  • Network topology management 413 can provide data node network topology management, visualization, database node health and data monitoring; receive monitoring data of all v-agents, and store them in a relational database for statistical analysis and visualization.
  • the data engine (virtual data engine v-engine) 42 is distributed in a cluster.
  • an appropriate number of v-engine instances in the cluster may be defined according to the query volume of the data service and the performance index of the actual service.
  • The v-engine mainly includes a task receiver 421, a parser 422 and a data collector 423. The modules communicate with each other and with the v-agents through MQ; each module is processed in a multi-threaded manner to improve the throughput and utilization of the system.
  • The task receiver 421 can receive a query task from v-ms, perform service verification of the task, and on success send the task MQ message to the task topic.
  • The parser 422 can asynchronously consume the task MQ messages in the task topic.
  • A task is composed of V-SQL; following the V-SQL grammar rules, the task MQ message is parsed into multiple subtasks according to the parsing rules.
  • The subtask MQ messages are sent to the subtask topic, and each v-agent uses the MQ message filter rules to consume its own subtask MQ messages.
  • The data collector 423 can consume and parse the MQ messages on the result topic, to which the v-agents send the subtask execution results. Since a query task is split into multiple subtasks, the data collector needs to collect the return results of all of them; it then aggregates the multiple results into a data structure conforming to the common standard model and returns it to v-ms.
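The data collector's behavior can be sketched as grouping result messages by request id and releasing a request only once all of its subtasks have reported; the message fields and counting scheme here are invented for illustration.

```python
from collections import defaultdict

# Hypothetical sketch of the data collector: group subtask results from the
# result topic by request id and merge a request's rows only once every
# subtask of that request has returned.
def aggregate_results(result_messages, expected_counts):
    buckets = defaultdict(list)
    for msg in result_messages:
        buckets[msg["request_id"]].append(msg["rows"])
    # keep only requests whose every subtask has reported
    return {
        rid: [row for part in parts for row in part]
        for rid, parts in buckets.items()
        if len(parts) == expected_counts[rid]
    }

messages = [
    {"request_id": "req-1", "rows": [{"id": 1}]},
    {"request_id": "req-1", "rows": [{"id": 2}]},
    {"request_id": "req-2", "rows": [{"id": 9}]},  # still missing a subtask
]
complete = aggregate_results(messages, expected_counts={"req-1": 2, "req-2": 2})
```

Only req-1 is released here; req-2 stays buffered until its second subtask result arrives on the topic.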
  • The data probe 43 and its information connections with other units are shown in FIG. 4C:
  • the main functions of the data probe (virtual data probe v-agent) 43 include an enterprise internal and external network agent engine, a sub-query execution engine, and a database instance information collection probe.
  • Data probe 43 includes configuration engine 431 , task engine 432 and probe module 433 .
  • The number of v-agents equals the number of databases in the data network: one v-agent is deployed in the intranet environment of each database.
  • the configuration engine 431 can receive the configuration information of the standard model and the mapping relationship distributed by the v-ms in the listening mode.
  • the task engine 432 contains three important logical modules: task configuration parser, task processor, and data encapsulator.
  • the task configuration parser is mainly used to parse v-SQL.
  • v-SQL is an important content in subtasks.
  • v-SQL has a set of detailed grammar rules to support the diversity of queries.
  • the parser supports all the grammar rules of v-SQL, reads the general standard model and mapping relationship of the configuration engine, parses and maps v-SQL (the data structure of the general model is used in the middle) into the data structure of the database node, and passes it to the task processor;
  • the task processor can receive the database query instruction passed in by the task configuration parser, access the data node data in the intranet, and return the query result to the data encapsulator;
  • the data encapsulator can receive the data returned by the task processor, read the general standard model and mapping relationship of the configuration engine, map the data of the data node into the data of the general standard model, and return it to v-engine.
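The data encapsulator's mapping step can be sketched as a field rename driven by the delivered mapping relationship; the mapping contents below are invented for the example.

```python
# Illustrative sketch of the data encapsulator: rename node-local column
# names to standard-model names using the mapping relationship delivered
# by the platform, dropping columns the model does not know.
def encapsulate(node_rows, node_to_model):
    """Map node-local field names onto standard-model field names."""
    return [
        {node_to_model[k]: v for k, v in row.items() if k in node_to_model}
        for row in node_rows
    ]

mapping = {"usr_nm": "user_name", "amt": "amount"}
rows = encapsulate([{"usr_nm": "alice", "amt": 3, "tmp_col": 0}], mapping)
```

The task configuration parser would apply the same mapping in the opposite direction, turning model-level v-SQL into the node's own column names before querying.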
  • The probe module 433 can act as an internal and external network agent, regularly check the health status and data status of the database nodes through task polling, and feed the results back to v-ms.
  • the probe can interact with v-ms, and the interaction method can be asynchronous MQ, which reduces the pressure on the v-ms server and ensures the reliability of data.
  • Such a data access system changes the traditional implementation of database data sharing through a combination of architecture design, distributed deployment and big data platform techniques, greatly reducing the human and physical cost of implementation and maintenance while improving the security of business databases and the real-time nature of data services.
  • The data access system can be deployed on a virtual architecture, for example using ZooKeeper for scheduling, thereby avoiding the burden of physical device configuration and reducing maintenance costs.
  • A schematic structural diagram of an embodiment of the data access system of the present disclosure is shown in FIG. 5.
  • the implementation body of each part in the data access system includes a memory 501 and a processor 502 .
  • the memory 501 may be a magnetic disk, a flash memory or any other non-volatile storage medium.
  • The memory is used to store instructions for the corresponding embodiments of the data access method described below.
  • the processor 502 is coupled to the memory 501 and may be implemented as one or more integrated circuits, such as a microprocessor or microcontroller.
  • the processor 502 is used to execute the instructions stored in the memory, which can improve the real-time performance of data request and improve the request efficiency.
  • each part of the data access system 600 includes a memory 601 and a processor 602 .
  • Processor 602 is coupled to memory 601 through bus 603.
  • the data access system 600 can also be connected to an external storage device 605 through a storage interface 604 for recalling external data, and can also be connected to a network or another computer system (not shown) through a network interface 606 . It will not be described in detail here.
  • the data instructions are stored in the memory, and then the above-mentioned instructions are processed by the processor, which can improve the real-time performance of the data request and improve the request efficiency.
  • A flowchart of some embodiments of the data access method of the present disclosure is shown in FIG. 7.
  • In step 701, the data platform sends the data request to the data engine.
  • the data request may be initiated by the user on the data platform, or sent to the data platform through an external application.
  • After receiving the data request, the data platform performs data request verification; if the verification passes, the data request is sent to the data engine.
  • the verification may include analyzing the business type corresponding to the request, which may be implemented by judging whether it conforms to the standard template of at least one business of the data platform.
  • permission verification may also be performed to determine whether the user and application corresponding to the request have the permission to request the corresponding data.
  • In step 702, the data engine splits the data request into multiple request subtasks and sends them to the data probes.
  • The data request may be split according to the target data nodes of the query, or according to the data probes corresponding to those target data nodes, so that each request subtask targets only a single data node, or only data nodes accessible by the same data probe; this ensures that each data probe can execute its request subtasks smoothly and completely.
  • In step 703, the data probe obtains its request subtasks, invokes the corresponding data node to execute them, generates the subtask execution results, and feeds them back to the data engine.
  • Each data probe can be preconfigured with message filtering rules; the data engine provides all request subtasks to every data probe, and each probe filters them according to its own rules to obtain the request subtasks it needs to execute.
  • In step 704, the data engine aggregates the subtask execution results from the data probes, generates the request execution result, and feeds it back to the data platform.
  • The request execution result needs to conform to the standard model of the data platform, so that the data platform can recognize it.
  • the data platform can feed back the request execution result to the external application.
  • The data engine can split the data request, and the probe associated with a data node executes the corresponding request subtask, realizing non-intrusive access to the data nodes without any data synchronization operation; this improves the real-time performance of data requests and the request efficiency.
  • After receiving the feedback data from the data node, the data probe can generate the subtask execution result based on the feedback data and the data model of the data platform, thereby overcoming the problem of inconsistent data structures and language rules across data nodes, ensuring that the data platform obtains results in a unified standard, and improving the adaptability and application scope of the system.
  • The data engine may publish the message queue of the split request subtasks to the subtask topic; the data probe filters the message queue of the subtask topic based on preset message filtering rules and obtains its own request subtasks.
  • After the data probe generates the subtask execution result, it can publish the result to the result topic. The data engine then summarizes the subtask execution results from the data probes and generates the request execution result: according to the message queue of the result topic, it obtains the subtask execution results of the request subtasks belonging to the same data request and aggregates them into a request execution result that conforms to the data model of the data platform.
  • The data platform can establish a standard model and send it to the data engine; it can also generate the correspondence between each data probe and its data node, as well as the rules of each corresponding data node, and send them to the corresponding data probe; and it receives data requests from external applications.
  • The data platform can support users in configuring the standard model and synchronizing it to the relevant nodes, and in updating data node rules and their correspondence with data probes in time, which improves user friendliness and scalability and further improves the reliability of data request execution; in addition, since multiple data probes can execute request subtasks in parallel, the execution efficiency of tasks is further improved.
  • A flowchart of other embodiments of the data access method of the present disclosure is shown in FIG. 8.
  • In step 801, the data platform establishes a standard model and sends it to the data engine.
  • Standard models may include syntax standards, data structure standards, and identifier standards for the same meaning.
  • The standard model can also be delivered to the data probes, so that each data probe can generate subtask execution results that conform to it. For example, if different data nodes represent "Yes" and "No" as "1, 0" and "Y, N" respectively, a unified standard is established, for example using "1, 0".
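The identifier unification in the example above can be sketched as a small normalization table applied by the probe before feeding results back; the table and function are an assumed convention for illustration.

```python
# Sketch of identifier unification: different nodes encode yes/no as
# "1"/"0" or "Y"/"N"; the probe normalizes every value to the unified
# standard "1"/"0" before feedback.
_YES_NO = {"1": "1", "0": "0", "Y": "1", "N": "0"}

def normalize_flag(value):
    """Map any supported yes/no encoding onto the standard "1"/"0"."""
    return _YES_NO[str(value).strip().upper()]

# values as they might arrive from different data nodes
normalized = [normalize_flag(v) for v in ["Y", "n", "1", 0]]
```

The same table-driven approach extends to any identifier whose encoding varies across data nodes, with the tables delivered as part of the node rules.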
  • In step 802, the data platform generates the correspondence between each data probe and its data nodes, as well as the rules of each corresponding data node, and delivers them to the corresponding data probe.
  • The rules of a data node may include its grammar rules, data structure rules, and the meaning of each of the node's identifiers.
  • In step 803, the data platform receives the data request from the external application and performs data request verification.
  • The verification may include analyzing the business category of the request, for example by judging whether it conforms to a standard template of at least one business of the data platform.
  • Permission verification may also be performed to determine whether the user and application behind the request are authorized to request the corresponding data.
  • In step 804, it is determined whether the verification passes. If so, step 805 is executed; otherwise, execution of the current request can be terminated and an error reported.
  • In step 805, the data platform routes the data request to the data engine.
  • In step 806, the data engine splits the data request into multiple request subtasks and, based on the topic mode of MQ, publishes the message queue of the request subtasks to the subtask topic.
  • In step 807, the data probe filters the message queue from the subtask topic based on its preset message filtering rule to obtain its own request subtasks.
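  The publish-and-filter step can be sketched as follows (a plain list stands in for the MQ broker, and the `probe` tag is an assumed message attribute; a real deployment would use the broker's own filter mechanism):

```python
# The engine publishes every request subtask to one subtask topic; each
# probe consumes only the messages matching its preset filter rule.
subtask_topic = [
    {"request_id": "r1", "probe": "probe_a", "stmt": "SELECT id FROM t1"},
    {"request_id": "r1", "probe": "probe_b", "stmt": "SELECT id FROM t2"},
]

def consume(topic, probe_id):
    """Apply a probe's message filtering rule to the topic's queue."""
    return [msg for msg in topic if msg["probe"] == probe_id]

print(consume(subtask_topic, "probe_a"))  # only probe_a's subtask remains
```

  Because every probe reads the same topic, adding a probe requires only a new filter rule, not a new queue.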
  • In step 808, the data probe invokes the corresponding data node to execute the request subtask and receives feedback data from the data node.
  • In step 809, the data probe generates a subtask execution result from the feedback data, based on the data model (i.e., the standard model) of the data platform.
  • The data probe can convert the feedback data using the rules of the current data node and the standard model of the data platform, generating a subtask execution result that conforms to the standard model.
  • In step 810, based on the topic mode of MQ, the data probe publishes the subtask execution result to the result topic.
  • In step 811, the data engine obtains, from the message queue of the result topic, the subtask execution results of the request subtasks belonging to the same data request.
  • In step 812, the data engine aggregates the subtask execution results into a request execution result conforming to the data model of the data platform.
  • If the data request is a data query, the data in the subtask execution results can be merged to generate a request execution result conforming to the standard model of the data platform; in some embodiments, if the data request is data statistics, second-pass statistics are computed over the statistics in each subtask execution result to generate a request execution result conforming to the standard model of the data platform.
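  The two aggregation branches just described can be sketched as one function (the result shapes — a `rows` list for queries and a `count` field for statistics — are illustrative assumptions):

```python
def aggregate(request_type, subtask_results):
    """Merge subtask results: concatenate rows for a query request,
    run second-pass statistics (here, a sum) for a statistics request."""
    if request_type == "query":
        rows = []
        for result in subtask_results:
            rows.extend(result["rows"])
        return {"rows": rows}
    if request_type == "statistics":
        return {"count": sum(result["count"] for result in subtask_results)}
    raise ValueError(f"unknown request type: {request_type}")

print(aggregate("statistics", [{"count": 3}, {"count": 5}]))  # {'count': 8}
```

  The key point is that statistics are aggregated from per-node statistics, not from raw rows, so no node's full data ever leaves its intranet.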
  • In step 813, after receiving the request execution result from the data engine, the data platform feeds the request execution result back to the external application.
  • With this method, the data engine can split the data request, and the probe associated with each data node executes the corresponding request subtask, achieving non-intrusive access to the data nodes without any data synchronization, which improves the real-time performance and efficiency of data requests.
  • The data nodes do not need to interact with one another and do not affect one another.
  • When an intermediate data node fails, the leaf data nodes can still feed back data, ensuring to the greatest extent the reliability of task execution and the integrity of the feedback data.
  • When the data structure of a leaf node changes, other data nodes need not be notified or modified, and the data warehouse service is unaffected; message queues enable asynchronous task processing and system decoupling, reducing service pressure while ensuring reliable task execution.
  • In one embodiment, the present disclosure further provides a computer-readable storage medium on which computer program instructions are stored; when executed by a processor, the instructions implement the steps of the method in the corresponding embodiments of the data access method.
  • Embodiments of the present disclosure may be provided as a method, apparatus, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • The methods and apparatus of the present disclosure may be implemented in many ways.
  • For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware.
  • The above order of steps for the method is for illustration only; the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise.
  • In some embodiments, the present disclosure can also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing methods according to the present disclosure.
  • Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.


Abstract

The present disclosure provides a data access method, system, and storage medium, relating to the field of database technology. A data access system of the present disclosure includes: a data platform configured to receive a data request, send it to a data engine, and receive the request execution result fed back by the data engine; a data engine configured to split the data request to obtain multiple request subtasks, send the request subtasks to each data probe, aggregate the subtask execution results from each data probe to generate a request execution result, and feed it back to the data platform; and multiple data probes configured to obtain the request subtasks, invoke the corresponding data nodes to execute the request subtasks, generate subtask execution results, and feed them back to the data engine.

Description

Data access method, system and storage medium
Cross-reference to related applications
This application is based on, and claims priority to, CN application No. 202110081569.5 filed on January 21, 2021, the disclosure of which is hereby incorporated into this application in its entirety.
Technical field
The present disclosure relates to the field of database technology, and in particular to a data access method, system, and storage medium.
Background
A modern enterprise runs multiple information systems whose business data is stored in databases, among which relational databases are the most common. Relational databases come in many types, with some differences between them. As shown in FIG. 1, an ETL data warehouse enables data sharing across different databases to support a data warehouse service 100. Its operation includes:
(1) Extract: data is extracted and stored level by level upward along the data nodes (cities 131-134, provinces 111-112, top level 101); all data nodes except the top-level node must support incremental extraction;
(2) Transform: data extracted from the leaf nodes (131-134) is transformed and mapped into the data structure defined by the current data node;
(3) Load: the data transformed in step (2) is written into the database of the current data node.
Summary
One object of the present disclosure is to provide a data access architecture that improves the efficiency of data access when there are multiple data nodes.
According to an aspect of some embodiments of the present disclosure, a data access system is provided, including: a data platform configured to receive a data request, send it to a data engine, and receive the request execution result fed back by the data engine; a data engine configured to split the data request to obtain multiple request subtasks, send the request subtasks to each data probe, aggregate the subtask execution results from each data probe to generate a request execution result, and feed it back to the data platform; and multiple data probes configured to obtain the request subtasks, invoke the corresponding data nodes to execute the request subtasks, generate subtask execution results, and feed the subtask execution results back to the data engine.
In some embodiments, the data probes are deployed in a distributed manner, with one data probe for the intranet environment of each database.
In some embodiments, at least one of the data platform or the data engine is deployed as a distributed cluster.
In some embodiments, the data probe is configured to: receive feedback data from the data node; and generate the subtask execution result from the feedback data based on the data model of the data platform.
In some embodiments, the data engine is configured to publish a message queue of the request subtasks to a subtask topic; the data probe is configured to filter the message queue from the subtask topic based on a preset message filtering rule to obtain the request subtasks belonging to itself.
In some embodiments, the data probe is further configured to publish subtask execution results to a result topic; the data engine is further configured to: obtain, from the message queue of the result topic, the subtask execution results of the request subtasks belonging to the same data request; and aggregate the obtained subtask execution results into a request execution result conforming to the data model of the data platform.
In some embodiments, the data platform is further configured to perform at least one of the following: establishing a standard model and delivering it to the data engine; generating the correspondence between each data probe and its data nodes, as well as the rules of each corresponding data node, and delivering them to the corresponding data probe; receiving a data request from an external application, and, after receiving the request execution result from the data engine, feeding the request execution result back to the external application; or, after receiving the data request, performing data request verification and, if the verification passes, sending the data request to the data engine.
In some embodiments, at least one of the data platform, the data engine, or the data probes is deployed using virtualization.
According to an aspect of some embodiments of the present disclosure, a data access method is provided, including: a data platform sending a data request to a data engine; the data engine splitting the data request to obtain multiple request subtasks and sending the request subtasks to each data probe; a data probe obtaining the request subtasks, invoking the corresponding data node to execute the request subtasks, generating subtask execution results, and feeding the subtask execution results back to the data engine; and the data engine aggregating the subtask execution results from each data probe to generate a request execution result and feeding it back to the data platform.
In some embodiments, generating the subtask execution result includes: receiving feedback data from the data node; and generating the subtask execution result from the feedback data based on the data model of the data platform.
In some embodiments, sending the request subtasks to each data probe includes: the data engine publishing a message queue of the request subtasks to a subtask topic; and the data probe obtaining the request subtasks includes: the data probe filtering the message queue from the subtask topic based on a preset message filtering rule to obtain the request subtasks belonging to itself.
In some embodiments, feeding the subtask execution results back to the data engine includes: the data probe publishing the subtask execution results to a result topic; and the data engine aggregating the subtask execution results from each data probe to generate the request execution result includes: the data engine obtaining, from the message queue of the result topic, the subtask execution results of the request subtasks belonging to the same data request; and aggregating the obtained subtask execution results into a request execution result conforming to the data model of the data platform.
In some embodiments, the data access method further includes at least one of the following: the data platform establishing a standard model and delivering it to the data engine; the data platform generating the correspondence between each data probe and its data nodes, as well as the rules of each corresponding data node, and delivering them to the corresponding data probe; the data platform receiving a data request from an external application, and, after receiving the request execution result from the data engine, feeding the request execution result back to the external application; or the data platform, after receiving the data request, performing data request verification and, if the verification passes, sending the data request to the data engine.
According to an aspect of some embodiments of the present disclosure, a data access system is provided, including: a memory; and a processor coupled to the memory, the processor being configured to execute any of the data access methods described above based on instructions stored in the memory.
According to an aspect of some embodiments of the present disclosure, a non-transitory computer-readable storage medium is provided, on which computer program instructions are stored; when executed by a processor, the instructions implement the steps of any of the data access methods described above.
Brief description of the drawings
The drawings described here are provided for a further understanding of the present disclosure and constitute a part of the present disclosure. The illustrative embodiments of the present disclosure and their descriptions serve to explain the present disclosure and do not unduly limit it. In the drawings:
FIG. 1 is a schematic diagram of a hierarchical database deployment in the related art.
FIG. 2 is a schematic diagram of some embodiments of the data access system of the present disclosure.
FIG. 3 is a schematic diagram of other embodiments of the data access system of the present disclosure.
FIGS. 4A-4C are operational schematic diagrams of some embodiments of the data access system of the present disclosure.
FIG. 5 is a schematic diagram of still other embodiments of the data access system of the present disclosure.
FIG. 6 is a schematic diagram of yet other embodiments of the data access system of the present disclosure.
FIG. 7 is a flowchart of some embodiments of the data access method of the present disclosure.
FIG. 8 is a flowchart of other embodiments of the data access method of the present disclosure.
Detailed description
The technical solutions of the present disclosure are described in further detail below with reference to the drawings and embodiments.
The inventors found that, in the data sharing approaches of the related art, all data nodes except the top-level node must support incremental data extraction, which is somewhat intrusive to the databases. All nodes except the leaf data nodes must maintain an independent data warehouse storing all the data of their subordinate nodes, which consumes a large amount of storage resources, and transmitting the full data also increases network bandwidth usage. The more node levels there are, the more data is stored redundantly; as data volume grows, storage costs rise and both database and application performance degrade.
In addition, when the data structure of a leaf data node changes, the leaf node and all the data structures above it must be modified. Extraction, transformation, and loading are asynchronous operations with a certain delay, which becomes more pronounced with more levels, so real-time performance is poor. Moreover, if any of the extract, transform, or load steps fails, the data of the top-level data warehouse service becomes incomplete or incorrect; for example, if a data node goes down, its leaf data nodes cannot synchronize data to the superior node.
FIG. 2 is a schematic diagram of some embodiments of the data access system of the present disclosure.
The data platform 21 can receive a data request, send it to the data engine, and receive the request execution result fed back by the data engine. In some embodiments, the data platform 21 may be deployed centrally and expose a user access interface and an external-application call interface, allowing users to initiate requests and external applications to make calls.
In some embodiments, the data platform 21 may be a standalone server, or may be deployed in a virtualized manner to reduce physical resource requirements and deployment costs.
In some embodiments, the data platform 21 may be deployed as a distributed cluster to improve service capacity and system robustness. In some embodiments, data requests may include data query (call) requests, data statistics requests, and so on.
The data engine 22 can split a data request to obtain multiple request subtasks, send the request subtasks to each data probe, aggregate the subtask execution results from each data probe to generate a request execution result, and feed it back to the data platform.
In some embodiments, the data engine 22 stores database call statement rules and can split a task through statement analysis to obtain multiple request subtasks. In some embodiments, a data request may be split by the target data nodes of the query, or by the data probes corresponding to those target nodes, so that each request subtask targets only one data node, or only the data nodes accessible to one data probe, ensuring that the probe can execute its request subtasks smoothly and completely.
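A minimal sketch of the node-based splitting policy described above (the request shape and the `targets`/`node`/`table` field names are assumptions for illustration): grouping a request's targets by data node yields one subtask per node.

```python
from collections import defaultdict

def split_request(request):
    """Group a request's targets by data node: one subtask per node."""
    by_node = defaultdict(list)
    for target in request["targets"]:
        by_node[target["node"]].append(target["table"])
    return [{"node": node, "tables": tables} for node, tables in by_node.items()]

request = {"targets": [
    {"node": "city_131", "table": "orders"},
    {"node": "city_131", "table": "users"},
    {"node": "city_132", "table": "orders"},
]}
print(split_request(request))  # two subtasks, one per city node
```

Splitting by probe instead of by node would only change the grouping key; the guarantee is the same — no subtask spans nodes its probe cannot reach.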
In some embodiments, the data engine 22 may be a standalone server, or may be deployed in a virtualized manner to reduce physical resource requirements and deployment costs.
In some embodiments, the data engine 22 may be deployed as a distributed cluster to improve service capacity and system robustness.
The data probes 231-23n (n being a positive integer greater than 1) can obtain request subtasks, invoke the corresponding data nodes to execute them, generate subtask execution results, and feed the results back to the data engine. In some embodiments, the data probes are deployed in a distributed manner, for example close to where their data nodes are deployed, improving access efficiency.
In some embodiments, each data probe may have a preset message filtering rule; the data engine 22 can make all request subtasks available to every probe, and each probe filters them with its own rule to obtain the subtasks it must execute.
In some embodiments, the data probes 231-23n may be standalone servers, or may be deployed in a virtualized manner to reduce physical resource requirements and deployment costs.
In some embodiments, one data probe may be assigned to the intranet environment of each database, breaking through intranet access restrictions and expanding the range of data that can be called.
In such a data access system, the data engine can split the data request, and the probe associated with each data node executes the corresponding request subtask, achieving non-intrusive access to the data nodes without any data synchronization, which improves the real-time performance and efficiency of data requests. In addition, data nodes need not interact with one another and do not affect one another; when an intermediate data node fails, the leaf data nodes can still feed back data, ensuring to the greatest extent the reliability of task execution and the integrity of the feedback data.
In some embodiments, because the data structure and data completeness of each node differ, the data platform can generate a standard model as needed, and each probe generates, from the data its node feeds back, a subtask execution result that conforms to the standard model and is as rich as possible. For example, for one piece of information the standard model has 100 parameters; data node A has 50 parameters, 46 of which are in the standard model, while data node B has 80 parameters, 70 of which are in the standard model. The probe querying node A then feeds back a model-conformant subtask execution result with 46 parameters, and the probe querying node B feeds back a model-conformant result with 70 parameters. This allows the service to execute successfully even when data structures differ and data is incomplete, raising the success rate and lowering the requirements on node similarity and completeness.
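The parameter-subset behavior in the example above can be sketched as a projection onto the standard model's field set (the field names and the model are illustrative assumptions):

```python
# The standard model's field set; each node returns whichever of these
# fields it actually has, so every result conforms to the model even
# when nodes differ in completeness.
STANDARD_FIELDS = {"id", "name", "price", "stock"}

def project(record: dict) -> dict:
    """Keep only the fields defined by the standard model."""
    return {key: value for key, value in record.items() if key in STANDARD_FIELDS}

node_a_row = {"id": 1, "name": "x", "local_code": "A7"}  # a partial node
print(project(node_a_row))  # {'id': 1, 'name': 'x'}
```

A node with 46 model fields thus yields a 46-field result and a node with 70 yields a 70-field result, both valid instances of the same model.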
FIG. 3 is a schematic diagram of other embodiments of the data access system of the present disclosure.
The data platform cluster 31 includes multiple data platforms 311-31p, p being a positive integer greater than 1. The data platforms in cluster 31 run with distributed logic. In some embodiments, the data platform cluster 31 provides business standard-model management, virtual network probe topology management and monitoring, database instance management and monitoring, virtual data service management, and query management.
In some embodiments, an administrator can create a standard model through the data platform cluster 31 and, for each data probe, establish a mapping between the standard model and the node's data structure, so that the probe can convert data between the standard model and the node's call rules; the administrator can also create a data service, establish a data request standard model for that service, and publish the service so it can be called, improving the extensibility of the system.
In some embodiments, the data platform cluster 31 can deliver the correspondence between each data probe and its data nodes, together with each corresponding node's rules (data structure), to the corresponding probe.
The data engine cluster 32 includes multiple data engines 321-32q, q being a positive integer greater than 1. The data engines in cluster 32 run with distributed logic. In some embodiments, the data engine cluster 32 parses database statements (such as V-SQL), splits data requests, dispatches sub-query tasks, and aggregates sub-query results.
Data probes 331-33n, n being a positive integer greater than 1. In some embodiments, each data probe may include an intranet/extranet proxy engine for its data node, a sub-query execution engine, and a database instance information collection probe. In some embodiments, the probes 331-33n accept configuration information from the data platform cluster 31 as the basis for subsequently executing request subtasks and feeding back subtask execution results.
In some embodiments, the data nodes accessed by probes 331-33n need not be connected to one another, which expands the range of data that can be called. In some embodiments, those nodes may form a hierarchy, as shown in FIG. 1; the nodes then never need to call one another, avoiding intrusion into and impact on the intermediate data nodes, avoiding task failures caused by the performance of an intermediate node, and improving the reliability of task execution and the integrity of the feedback data.
Such a data access system supports platform-based standard-model configuration, improving user friendliness and extensibility, and improves capacity and parallel processing through cluster deployment. When a leaf node's data structure changes, no other data node needs to be notified or modified, and the data warehouse service is unaffected; with no data synchronization, queries are never delayed by synchronization, ensuring the reliability and efficiency of data requests.
In some embodiments, ZooKeeper may be used for distributed coordination of the data platform, data engine, and data probes. ZooKeeper is software that provides consistency services for distributed applications, including configuration maintenance, naming, distributed synchronization, and group services, which lowers the implementation difficulty of the data access system and raises implementation efficiency.
In some embodiments, each data engine may publish tasks according to MQ (Message Queue) rules, for example using MQ's topic mode (the "publish-subscribe" pattern) to publish the message queue of request subtasks to the subtask topic. Each data probe may filter tasks according to MQ rules, for example filtering the message queue from the subtask topic based on a preset message filtering rule to obtain its own request subtasks.
Such a system uses message queues for asynchronous task processing and system decoupling, which reduces service pressure while ensuring the reliability of task execution.
In some embodiments, each data engine may likewise collect execution results according to MQ rules, for example obtaining, from the message queue of the result topic, the subtask execution results belonging to the same data request, and aggregating them into a request execution result conforming to the platform's data model. Each probe may feed back execution results according to MQ rules, for example publishing subtask execution results to the result topic for the engines to collect.
Such a system uses message queues for asynchronous data feedback and system decoupling, which reduces service pressure while ensuring the reliability of the feedback data.
FIGS. 4A-4C are schematic diagrams of some embodiments of the units in the data access system of the present disclosure. The data access system includes a data platform 41, a data engine 42, and a data probe 43.
The data platform 41 and its information connections to the other units are shown in FIG. 4A:
The data platform 41 (e.g., the virtual management platform v-ms) mainly comprises configuration management 411, a service engine 412, and network topology management 413. v-ms is deployed as a distributed cluster to improve data service performance and reliability.
Configuration management 411 provides management of standard models and mappings (for example one or more of creation, update, offlining, and deployment). Distributed deployment via ZooKeeper keeps configuration consistent and coordinated between v-ms and the multiple v-agents (virtual data probes). Configuration management 411 can also deliver the standard model and mappings to the v-engine (virtual data engine).
The service engine 412 provides data service management, including service registration, monitoring, onlining, offlining, and processing, as well as a virtual data service query manager (supporting v-sql queries written on a page). The service engine 412 receives call service requests from external applications and responds to them, dispatches query tasks to the v-engine, and receives the query results the v-engine returns.
Network topology management 413 provides management, visualization, and health and data monitoring of the data node network topology; it receives monitoring data from all v-agents and stores it in a relational database for statistical analysis and visualization.
The data engine 42 and its information connections to the other units are shown in FIG. 4B:
The data engine (virtual data engine v-engine) 42 is deployed as a distributed cluster. In some embodiments, an appropriate number of v-engine instances in the cluster can be defined according to the query volume of the data services and the actual service performance metrics.
The v-engine mainly comprises a task receiver 421, a parser 422, and a data collector 423. The modules communicate with one another, and with the v-agents, via MQ. Each module is multi-threaded to improve system throughput and utilization.
The task receiver 421 receives query tasks from v-ms, performs business verification of each task, and on success publishes the task message to the task topic.
The parser 422 asynchronously consumes task messages from the task topic. A task consists of V-SQL; following the V-SQL grammar and the parsing rules, the parser splits the task message into multiple subtasks, one for each v-agent node, and publishes the subtask messages to the subtask topic. Each v-agent uses MQ message-filter rules to consume only the subtask messages that belong to it.
The data collector 423 consumes and parses the messages on the result topic, to which the v-agents send their subtask execution results. Because one query task is split into multiple subtasks, the collector must gather the returned results of all of them; it then aggregates the multiple results into a data structure conforming to the common standard model and returns it to v-ms.
The data probe 43 and its information connections to the other units are shown in FIG. 4C:
The main functions of the data probe (virtual data probe v-agent) 43 include an enterprise intranet/extranet proxy engine, a sub-query execution engine, and a database instance information collection probe. The data probe 43 comprises a configuration engine 431, a task engine 432, and a probe module 433.
The number of v-agents equals the number of databases in the data network; one v-agent must be deployed in the intranet environment of each database.
The configuration engine 431 receives, in listening mode, the standard-model and mapping configuration distributed by v-ms.
The task engine 432 contains three important logical modules: a task configuration parser, a task processor, and a data wrapper.
The task configuration parser mainly parses v-SQL, the key content of a subtask. v-SQL has a detailed set of grammar rules supporting diverse queries. The parser supports all v-SQL grammar rules; it reads the common standard model and mappings from the configuration engine, parses and maps the v-SQL (written against the common model's data structure) into the data structure of the database node, and passes the result to the task processor.
The task processor receives the database query instruction from the task configuration parser, accesses the data of the data node on the intranet, and returns the query result to the data wrapper.
The data wrapper receives the data returned by the task processor, reads the common standard model and mappings from the configuration engine, maps the node's data into common-standard-model data, and returns it to the v-engine.
Since enterprises generally have an intranet and an extranet, with business databases usually deployed on the intranet, the probe 433 acts as an intranet/extranet proxy. Through task polling, it periodically checks the health status and data status of the database node and feeds the results back to v-ms. The probe can interact with v-ms asynchronously via MQ, which reduces the pressure on the v-ms server while ensuring data reliability.
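One polling cycle of the probe can be sketched as below (the health-check callable and the list standing in for the asynchronous MQ channel back to v-ms are assumptions; a real probe would run this on a timer):

```python
def poll_once(check_node, report_queue):
    """Run one polling cycle: check node health, report asynchronously."""
    status = "up" if check_node() else "down"
    report_queue.append({"status": status})  # stands in for an MQ send to v-ms
    return status

reports = []
print(poll_once(lambda: True, reports))   # prints "up"
print(poll_once(lambda: False, reports))  # prints "down"
```

Reporting through a queue rather than calling v-ms directly is what keeps the probe's monitoring traffic from adding synchronous load on the management platform.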
Through its distinctive architecture, distributed deployment, and combination of big-data platform technologies, such a data access system changes the traditional way database data sharing is implemented, greatly reducing the labor and physical resource costs of implementation and maintenance while improving the security of business databases and the real-time performance of data services.
In some embodiments, the data access system may be deployed on a virtual architecture, for example scheduled with ZooKeeper, avoiding the burden of configuring physical devices and reducing maintenance costs.
FIG. 5 is a structural schematic diagram of one embodiment of the data access system of the present disclosure. The implementation body of each part of the data access system includes a memory 501 and a processor 502. The memory 501 may be a magnetic disk, flash memory, or any other non-volatile storage medium; it stores the instructions of the corresponding embodiments of the data access method executed by the corresponding part below. The processor 502, coupled to the memory 501, may be implemented as one or more integrated circuits, such as a microprocessor or microcontroller. The processor 502 executes the instructions stored in the memory, improving the real-time performance and efficiency of data requests.
In one embodiment, as shown in FIG. 6, the implementation body of each part of the data access system 600 includes a memory 601 and a processor 602 coupled via a BUS 603. The data access system 600 may also be connected to an external storage device 605 through a storage interface 604 to call external data, and to a network or another computer system (not shown) through a network interface 606. Details are not described further here.
In this embodiment, data instructions are stored by the memory and processed by the processor, improving the real-time performance and efficiency of data requests.
FIG. 7 is a flowchart of some embodiments of the data access method of the present disclosure.
In step 701, the data platform sends the data request to the data engine. In some embodiments, the data request may be initiated by a user on the data platform or sent to the data platform by an external application.
In some embodiments, after receiving the data request, the data platform performs data request verification and, if the verification passes, sends the request to the data engine. Verification may include analyzing the business category of the request, for example by judging whether it conforms to the standard template of at least one business of the data platform. In some embodiments, permission verification may also be performed to determine whether the user and application behind the request are authorized to request the corresponding data.
In step 702, the data engine splits the data request to obtain multiple request subtasks and sends them to each data probe. In some embodiments, the request may be split by the target data nodes of the query, or by the probes corresponding to those target nodes, so that each request subtask targets only one data node, or only the data nodes accessible to one probe, ensuring the probe can execute its request subtasks smoothly and completely.
In step 703, the data probe obtains its request subtasks, invokes the corresponding data node to execute them, generates subtask execution results, and feeds the results back to the data engine. In some embodiments, each probe has a preset message filtering rule; the engine can make all request subtasks available to every probe, and each probe filters them with its own rule to obtain the subtasks it must execute.
In step 704, the data engine aggregates the subtask execution results from each probe to generate a request execution result and feeds it back to the data platform. In some embodiments, the request execution result must conform to the data platform's standard model so the platform can interpret it. In some embodiments, the platform may feed the request execution result back to the external application.
With this method, the data engine can split the data request, and the probe associated with each data node executes the corresponding request subtask, achieving non-intrusive access to the data nodes without any data synchronization, improving the real-time performance and efficiency of data requests.
In some embodiments, after receiving feedback data from a data node, the probe can generate a subtask execution result from that data based on the data platform's data model, overcoming inconsistencies in data structure and language rules across nodes, ensuring the platform receives execution results in a uniform standard, and improving the adaptability and applicability of the system.
In some embodiments, the data engine can publish the message queue of the split request subtasks to the subtask topic; each probe filters the message queue with its preset message filtering rule to obtain its own subtasks. This enables asynchronous communication between the engine and the probes, reduces server load, smooths traffic peaks, and improves runtime performance.
In some embodiments, after generating a subtask execution result, the probe can publish it to the result topic. The data engine then aggregates the results from the probes: it obtains, from the message queue of the result topic, the subtask execution results belonging to the same data request, and aggregates them into a request execution result conforming to the data platform's data model.
In some embodiments, the data platform can establish the standard model and deliver it to the data engine; it can also generate the correspondence between each probe and its data nodes, as well as each corresponding node's rules, and deliver them to the corresponding probe; and it receives data requests from external applications.
With this method, the data platform supports user configuration of the standard model and synchronizes it to the relevant nodes, promptly updating data node rules and the correspondences with data probes, which improves user friendliness and extensibility and further improves the reliability of data request execution; moreover, since multiple probes can execute request subtasks in parallel, task execution efficiency is further improved.
FIG. 8 is a flowchart of other embodiments of the data access method of the present disclosure.
In step 801, the data platform establishes the standard model and delivers it to the data engine. The standard model may include syntax standards, data structure standards, and identifier standards for a given meaning. The standard model may also be delivered to the data probes so that they can generate subtask execution results conforming to it. For example, if different data nodes represent "yes"/"no" as "1"/"0" and as "Y"/"N" respectively, a unified standard is established, for example "1"/"0".
In step 802, the data platform generates the correspondence between each data probe and its data nodes, as well as each corresponding node's rules, and delivers them to the corresponding probe. In some embodiments, a data node's rules may include its syntax rules, data structure rules, and the meaning of each of its identifiers.
In step 803, the data platform receives a data request from an external application and performs data request verification. In some embodiments, verification may include analyzing the business category of the request, for example by judging whether it conforms to the standard template of at least one business of the data platform. In some embodiments, permission verification may also be performed to determine whether the user and application behind the request are authorized to request the corresponding data.
In step 804, it is determined whether the verification passes. If so, step 805 is executed; otherwise, execution of the current request may be terminated and an error reported.
In step 805, the data platform routes the data request to the data engine.
In step 806, the data engine splits the data request to obtain multiple request subtasks and, based on MQ's topic mode, publishes the message queue of the request subtasks to the subtask topic.
In step 807, the data probe filters the message queue from the subtask topic based on its preset message filtering rule to obtain its own request subtasks.
In step 808, the data probe invokes the corresponding data node to execute the request subtask and receives the feedback data from the data node.
In step 809, the data probe generates a subtask execution result from the feedback data, based on the data platform's data model (i.e., the standard model). In some embodiments, the probe can convert the feedback data using the rules of the current data node and the platform's standard model, producing a subtask execution result that conforms to the standard model.
In step 810, based on MQ's topic mode, the data probe publishes the subtask execution result to the result topic.
In step 811, the data engine obtains, from the message queue of the result topic, the subtask execution results of the request subtasks belonging to the same data request.
In step 812, the data engine aggregates the subtask execution results into a request execution result conforming to the data platform's data model. In some embodiments, if the data request is a data query, the data in the subtask execution results can be merged to generate a request execution result conforming to the platform's standard model; in some embodiments, if the data request is data statistics, second-pass statistics are computed over the statistics in each subtask execution result to generate a model-conformant request execution result.
In step 813, after receiving the request execution result from the data engine, the data platform feeds the request execution result back to the external application.
With this method, the data engine can split the data request, and the probe associated with each data node executes the corresponding request subtask, achieving non-intrusive access to the data nodes without any data synchronization, improving the real-time performance and efficiency of data requests. In addition, in such a data access system the data nodes never interact and do not affect one another; when an intermediate data node fails, the leaf data nodes can still feed back data, ensuring to the greatest extent the reliability of task execution and the integrity of the feedback data. When a leaf node's data structure changes, no other data node needs to be notified or modified, and the data warehouse service is unaffected; message queues enable asynchronous task processing and system decoupling, reducing service pressure while ensuring reliable task execution.
In one embodiment, the present disclosure further provides a computer-readable storage medium on which computer program instructions are stored; when executed by a processor, the instructions implement the steps of the method in the corresponding embodiments of the data access method. Those skilled in the art should understand that embodiments of the present disclosure may be provided as a method, an apparatus, or a computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present disclosure is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in that memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on it to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The present disclosure has thus been described in detail. Some details well known in the art are omitted so as not to obscure its concept. Based on the above description, those skilled in the art will fully understand how to implement the technical solutions disclosed here.
The methods and apparatus of the present disclosure may be implemented in many ways, for example by software, hardware, firmware, or any combination of the three. The above order of steps is for illustration only; the steps of the methods of the present disclosure are not limited to that order unless otherwise specifically stated. Furthermore, in some embodiments, the present disclosure may be implemented as programs recorded on a recording medium, the programs including machine-readable instructions for implementing methods according to the present disclosure; accordingly, the present disclosure also covers a recording medium storing a program for executing a method according to the present disclosure.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solutions of the present disclosure. Although the present disclosure has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the specific implementations may still be modified, or some technical features replaced with equivalents, without departing from the spirit of the technical solutions of the present disclosure, and all such changes fall within the scope of the technical solutions claimed by the present disclosure.

Claims (15)

  1. A data access system, comprising:
    a data platform configured to receive a data request and send it to a data engine, and to receive a request execution result fed back by the data engine;
    a data engine configured to split the data request to obtain multiple request subtasks, send the request subtasks to each data probe, aggregate the subtask execution results from each data probe to generate a request execution result, and feed it back to the data platform; and
    multiple data probes configured to obtain the request subtasks, invoke corresponding data nodes to execute the request subtasks, generate subtask execution results, and feed the subtask execution results back to the data engine.
  2. The data access system according to claim 1, wherein the data probes are deployed in a distributed manner, with one data probe for the intranet environment of each database.
  3. The data access system according to claim 1, wherein at least one of the data platform or the data engine is deployed as a distributed cluster.
  4. The data access system according to claim 1, wherein the data probe is configured to:
    receive feedback data from the data node; and
    generate the subtask execution result from the feedback data based on a data model of the data platform.
  5. The data access system according to claim 1, wherein:
    the data engine is configured to publish a message queue of the request subtasks to a subtask topic; and
    the data probe is configured to filter the message queue from the subtask topic based on a preset message filtering rule to obtain the request subtasks belonging to itself.
  6. The data access system according to claim 5, wherein:
    the data probe is further configured to publish subtask execution results to a result topic; and
    the data engine is further configured to:
    obtain, according to the message queue of the result topic, the subtask execution results of the request subtasks belonging to the same data request; and
    aggregate the obtained subtask execution results into a request execution result conforming to the data model of the data platform.
  7. The data access system according to any one of claims 1 to 6, wherein the data platform is further configured to perform at least one of the following:
    establishing a standard model and delivering it to the data engine;
    generating the correspondence between each data probe and its data nodes, as well as the rules of each corresponding data node, and delivering them to the corresponding data probe;
    receiving a data request from an external application, and, after receiving the request execution result from the data engine, feeding the request execution result back to the external application; or
    after receiving the data request, performing data request verification and, if the verification passes, sending the data request to the data engine.
  8. The data access system according to any one of claims 1 to 6, wherein at least one of the data platform, the data engine, or the data probes is deployed using virtualization.
  9. A data access method, comprising:
    sending, by a data platform, a data request to a data engine;
    splitting, by the data engine, the data request to obtain multiple request subtasks, and sending the request subtasks to each data probe;
    obtaining, by a data probe, the request subtasks, invoking a corresponding data node to execute the request subtasks, generating subtask execution results, and feeding the subtask execution results back to the data engine; and
    aggregating, by the data engine, the subtask execution results from each data probe to generate a request execution result, and feeding it back to the data platform.
  10. The data access method according to claim 9, wherein the generating the subtask execution result comprises:
    receiving feedback data from the data node; and
    generating the subtask execution result from the feedback data based on a data model of the data platform.
  11. The data access method according to claim 9, wherein:
    the sending the request subtasks to each data probe comprises: publishing, by the data engine, a message queue of the request subtasks to a subtask topic; and
    the obtaining, by the data probe, the request subtasks comprises: filtering, by the data probe, the message queue from the subtask topic based on a preset message filtering rule to obtain the request subtasks belonging to itself.
  12. The data access method according to claim 11, wherein:
    the feeding the subtask execution results back to the data engine comprises: publishing, by the data probe, the subtask execution results to a result topic; and
    the aggregating, by the data engine, the subtask execution results from each data probe to generate the request execution result comprises:
    obtaining, by the data engine according to the message queue of the result topic, the subtask execution results of the request subtasks belonging to the same data request; and
    aggregating the obtained subtask execution results into a request execution result conforming to the data model of the data platform.
  13. The data access method according to any one of claims 9 to 12, further comprising at least one of the following:
    establishing, by the data platform, a standard model and delivering it to the data engine;
    generating, by the data platform, the correspondence between each data probe and its data nodes, as well as the rules of each corresponding data node, and delivering them to the corresponding data probe;
    receiving, by the data platform, a data request from an external application, and, after receiving the request execution result from the data engine, feeding the request execution result back to the external application; or
    after receiving the data request, performing, by the data platform, data request verification and, if the verification passes, sending the data request to the data engine.
  14. A data access system, comprising:
    a memory; and
    a processor coupled to the memory, the processor being configured to execute the method according to any one of claims 9 to 13 based on instructions stored in the memory.
  15. A non-transitory computer-readable storage medium on which computer program instructions are stored, the instructions, when executed by a processor, implementing the steps of the method according to any one of claims 9 to 13.
PCT/CN2022/070540 2021-01-21 2022-01-06 Data access method, system and storage medium WO2022156542A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110081569.5 2021-01-21
CN202110081569.5A CN113761079A (zh) Data access method, system and storage medium

Publications (1)

Publication Number Publication Date
WO2022156542A1 true WO2022156542A1 (zh) 2022-07-28

Family

ID=78786444

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/070540 WO2022156542A1 (zh) Data access method, system and storage medium

Country Status (2)

Country Link
CN (1) CN113761079A (zh)
WO (1) WO2022156542A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113761079A (zh) 数据访问方法、系统和存储介质 (Data access method, system and storage medium)
CN117331926B (zh) * 2023-12-01 2024-03-01 太平金融科技服务(上海)有限公司 一种数据稽核方法、装置、电子设备和存储介质 (Data auditing method and apparatus, electronic device and storage medium)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107273540A (zh) * 2017-07-05 2017-10-20 北京三快在线科技有限公司 Distributed search and index update method, system, server and computer device
CN107766451A (zh) * 2017-09-26 2018-03-06 广西电网有限责任公司电力科学研究院 Cross-database association retrieval method for electric power big data
CN109815254A (zh) * 2018-12-28 2019-05-28 北京东方国信科技股份有限公司 Big-data-based cross-region task scheduling method and system
CN110516142A (zh) * 2019-08-29 2019-11-29 深圳前海微众银行股份有限公司 Data query method, apparatus, device and medium
CN113761079A (zh) * 2021-01-21 2021-12-07 北京沃东天骏信息技术有限公司 Data access method, system and storage medium


Also Published As

Publication number Publication date
CN113761079A (zh) 2021-12-07

Similar Documents

Publication Publication Date Title
US11411897B2 (en) Communication method and communication apparatus for message queue telemetry transport
CN103034735B (zh) Big data distributed file export method
WO2022156542A1 (zh) Data access method, system and storage medium
US8730819B2 (en) Flexible network measurement
US9817703B1 (en) Distributed lock management using conditional updates to a distributed key value data store
US20180278725A1 (en) Converting a single-tenant application for multi-tenant use
Grover et al. Data Ingestion in AsterixDB.
WO2007115477A1 (fr) Method and system for data synchronization
US9438665B1 (en) Scheduling and tracking control plane operations for distributed storage systems
WO2019153488A1 (zh) Service configuration management method and apparatus, storage medium and server
US10158709B1 (en) Identifying data store requests for asynchronous processing
CN110581893B (zh) Data transmission method and apparatus, routing device, server and storage medium
US10747739B1 (en) Implicit checkpoint for generating a secondary index of a table
CN108573029B (zh) Method, apparatus and storage medium for obtaining network access relationship data
CN109325077A (zh) System for implementing a real-time data warehouse based on canal and kafka
EP3172682B1 (en) Distributing and processing streams over one or more networks for on-the-fly schema evolution
WO2017092384A1 (zh) Method and apparatus for distributed storage in a cluster database
US20180337840A1 (en) System and method for testing filters for data streams in publisher-subscriber networks
US11816511B1 (en) Virtual partitioning of a shared message bus
EP4094161A1 (en) Method and apparatus for managing and controlling resource, device and storage medium
CN112685499A (zh) Process data synchronization method, apparatus and device for a work business flow
CN113901078A (zh) Business order association query method, apparatus, device and storage medium
US9898614B1 (en) Implicit prioritization to rate-limit secondary index creation for an online table
CN117294763A (zh) Cloud desktop terminal management method for forwarding terminal request information based on a proxy service
CN112637130A (zh) Data exchange method and system based on consumption queues

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22742021

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22742021

Country of ref document: EP

Kind code of ref document: A1