CN109739878B

CN109739878B - Big data query method, device, server and storage medium

Info

Publication number: CN109739878B
Application number: CN201811526906.1A
Authority: CN
Inventors: 杨文博
Original assignee: Beijing Dajia Internet Information Technology Co Ltd
Current assignee: Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2018-12-13
Filing date: 2018-12-13
Publication date: 2020-12-01
Anticipated expiration: 2038-12-13
Also published as: CN109739878A

Abstract

The disclosure relates to a big data query method, a big data query device, a server and a storage medium. The method comprises the following steps: receiving a natural-language data query request from a user side; generating, according to the data query request, a data calculation instruction tree comprising a data calculation instruction set; distributing the data calculation instructions in the data calculation instruction tree to the corresponding computing task nodes for execution; acquiring the execution results of the computing task nodes, and performing summary computation on the execution results according to the data calculation instruction tree to obtain a data query result; and returning the data query result to the user side. The method and the device can effectively query and analyze data stored in different locations, read and integrate data from the perspective of the global business, effectively handle analysis scenarios within an enterprise that need data from different departments, and associate data across business fields during enterprise data statistics.

Description

Big data query method, device, server and storage medium

Technical Field

The present disclosure relates to the field of data query technologies, and in particular, to a big data query method, apparatus, server, and storage medium.

Background

In the big data era, enterprises need to fully utilize the results of big data analysis and mining to realize data-driven customer analysis, market sales, product innovation, management and operation and the like.

At present, a business statistical analysis report system based on traditional BI (Business Intelligence) enables a business department to obtain relevant data reports in a timely and sufficient manner, so as to grasp information about projects, users, markets and the like. However, traditional enterprise data statistics has obvious limitations in cross-business data association: it is often limited to query operations on a database, data operation and utilization are limited to local data, and data cannot be integrated and read from the perspective of the global business. In addition, business personnel are constrained by factors such as business isolation of data and system isolation, so it is difficult for them to freely operate on and interpret global data in an existing report system according to business requirements.

Disclosure of Invention

In order to overcome the problems in the related art, the present disclosure provides a big data query method, apparatus, server and storage medium.

According to a first aspect of the embodiments of the present disclosure, a big data query method is provided, including:

receiving a data query request of a natural language mode of a user side;

generating a data calculation instruction tree comprising a data calculation instruction set according to the data query request;

distributing the data calculation instructions in the data calculation instruction tree to corresponding calculation task nodes for execution according to the data calculation instruction tree;

acquiring an execution result of the computing task node, and performing summary computation on the execution result according to the data computing instruction tree to obtain a data query result;

and returning the data query result to the user side.

Optionally, the generating a data computation instruction tree including a data computation instruction set according to the data query request includes:

converting the data query request into a structured query data expression statement;

and generating a data calculation instruction tree comprising a data calculation instruction set according to the structured query data expression statement, wherein the data calculation instruction tree comprises a data dependency relationship and a calculation process dependency relationship.

Optionally, the converting the data query request into a structured query data expression statement includes:

performing text segmentation and service semantic annotation on the data query request to obtain segmentation and semantic annotation results;

performing context analysis on the data query request to perform service logic completion on the word segmentation and semantic annotation result to obtain a service logic completion result;

and generating a structured query data expression statement according to the word segmentation and semantic annotation result and the service logic completion result.

Optionally, the generating a data computation instruction tree including a data computation instruction set according to the structured query data expression statement includes:

determining a calculation task node where the data to be inquired is located according to the structured inquiry data expression statement and the enterprise data knowledge graph, and determining a data dependency relationship and a calculation process dependency relationship, wherein the enterprise data knowledge graph comprises metadata information of enterprise data;

and determining a data calculation instruction set according to the calculation task node, the data dependency relationship and the calculation process dependency relationship of the data to be inquired, and generating a data calculation instruction tree comprising the data calculation instruction set.

Optionally, the enterprise data knowledge graph includes a business logic layer, an analysis system layer, and a data index layer;

the business logic layer comprises the relation between the business group to which the data belongs and the business metadata;

the analysis system layer comprises an analysis method and analysis system definition, text description and service explanation of the analysis system, and service associated information of the analysis system;

the data index layer comprises definition of data indexes, text description of the data indexes, calculation specifications of the data indexes, storage paths, calculation time and historical spans of the data indexes.
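The three layers described above can be pictured as nested record types. The following is a minimal sketch only: the class names, the field names, the sample values, and the choice to nest the index layer under the analysis system layer under the business logic layer are all illustrative assumptions, not a structure prescribed by the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataIndex:
    """Data index layer: definition, description, calculation specification, storage and timing."""
    name: str
    description: str
    calc_spec: str
    storage_path: str
    calc_time: str
    history_span: str


@dataclass
class AnalysisSystem:
    """Analysis system layer: analysis method, description and associated business information."""
    name: str
    method: str
    description: str
    related_business: List[str] = field(default_factory=list)
    indexes: Dict[str, DataIndex] = field(default_factory=dict)


@dataclass
class BusinessGroup:
    """Business logic layer: a business group and its business metadata."""
    name: str
    business_metadata: Dict[str, str] = field(default_factory=dict)
    analysis_systems: Dict[str, AnalysisSystem] = field(default_factory=dict)


@dataclass
class EnterpriseKnowledgeGraph:
    groups: Dict[str, BusinessGroup] = field(default_factory=dict)

    def find_index(self, group: str, system: str, index: str) -> DataIndex:
        """Walk business group -> analysis system -> data index."""
        return self.groups[group].analysis_systems[system].indexes[index]


# Hypothetical sample content; none of these values come from the disclosure.
graph = EnterpriseKnowledgeGraph()
app = BusinessGroup(name="Mobile App", business_metadata={"owner": "hypothetical-team"})
liveness = AnalysisSystem(name="liveness", method="Aggregation",
                          description="User activity analysis")
liveness.indexes["login_uv"] = DataIndex(
    name="login_uv", description="Unique logged-in users",
    calc_spec="count(distinct user_id)", storage_path="dc1/app_db/login_daily",
    calc_time="daily 02:00", history_span="90d")
app.analysis_systems["liveness"] = liveness
graph.groups["Mobile App"] = app
```

With such a structure, resolving a data index to its storage path is a three-step lookup, for example `graph.find_index("Mobile App", "liveness", "login_uv").storage_path`.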

Optionally, the method further includes:

and collecting metadata information of enterprise data, sorting the collected metadata information according to the enterprise data knowledge graph, and storing the sorted metadata information into the enterprise data knowledge graph.

Optionally, the collecting metadata information of enterprise data, sorting the collected metadata information according to the enterprise data knowledge graph, and storing the sorted metadata information into the enterprise data knowledge graph includes:

monitoring and collecting metadata information of enterprise data;

according to the metadata information standard of an enterprise, cleaning and aligning the metadata information;

extracting service logic in the cleaned and aligned metadata information according to the cleaned and aligned metadata information, and storing the service logic into a service logic layer of the enterprise data knowledge graph;

determining an analysis system in the business logic and a classification corresponding to the analysis system according to the analysis system definition, and storing the analysis system and the classification corresponding to the analysis system to an analysis system layer in an enterprise data knowledge graph;

and extracting and aligning the data indexes under the classification corresponding to the analysis system to unify the name, the text description and the calculation specification of the data indexes, and storing the name, the text description and the calculation specification of the data indexes into a data index layer of the enterprise data knowledge graph.
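The cleaning and alignment of data indexes described above might look roughly like the following sketch. It assumes that the enterprise standard can be expressed as an alias table and that, when several records describe the same index, the first non-empty value wins; the alias table, the record fields and these rules are hypothetical and are not specified by the disclosure.

```python
from typing import Dict, List

# Hypothetical enterprise standard: alias -> canonical data index name.
CANONICAL_NAMES: Dict[str, str] = {
    "uv": "login_uv",
    "unique visitors": "login_uv",
    "login_uv": "login_uv",
}


def clean(record: Dict[str, str]) -> Dict[str, str]:
    """Trim whitespace from every field and lowercase the index name."""
    cleaned = {key: value.strip() for key, value in record.items()}
    cleaned["name"] = cleaned["name"].lower()
    return cleaned


def align(records: List[Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Merge records that refer to the same index under one canonical name."""
    index_layer: Dict[str, Dict[str, str]] = {}
    for record in map(clean, records):
        name = CANONICAL_NAMES.get(record["name"], record["name"])
        entry = index_layer.setdefault(name, {"name": name})
        for key in ("description", "calc_spec"):
            if record.get(key) and key not in entry:  # first non-empty value wins
                entry[key] = record[key]
    return index_layer


# Two departments report the same index under different names.
collected = [
    {"name": " UV ", "description": "", "calc_spec": "count(distinct user_id)"},
    {"name": "Unique Visitors", "description": "Unique logged-in users", "calc_spec": ""},
]
index_layer = align(collected)
```

After alignment, both records collapse into a single `login_uv` entry with a unified name, text description and calculation specification.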

Optionally, after allocating the data computation instruction in the data computation instruction tree to the corresponding computation task node for execution according to the data computation instruction tree, the method further includes:

acquiring corresponding data according to the data calculation instruction through a local data calculation management module deployed in a calculation task node; or

summarizing and calculating, by a local data calculation management module deployed in the computing task node, the data acquired by other computing task nodes according to the data calculation instruction.

Optionally, the summarizing and calculating data obtained by other computing task nodes by the local data computing management module deployed in the computing task node according to the data computing instruction includes:

receiving, through the local data computing management module deployed in the computing task node, the data to be summarized sent by other computing task nodes, performing missing-value completion and dimension normalization on the data to be summarized according to the data computing instruction, and summarizing and calculating the processed data.

Optionally, the performing missing-value completion on the data to be summarized includes:

if the data to be summarized are data with different densities, filling the missing values of the data to be summarized by mean interpolation filling, sparse alignment or distributed interpolation filling;

and if the data to be summarized are the data obtained by different summarizing methods, carrying out heuristic calculation and completion on the data to be summarized.
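As an illustration of the mean interpolation option named above, the sketch below aligns two series of different density to a common key set, fills each gap with the mean of that series' observed values, and then sums the aligned series. The function names, the sample data and the choice of summation as the summary operation are assumptions; the disclosure does not fix any of them.

```python
from statistics import mean
from typing import Dict, List, Optional


def mean_fill(series: Dict[str, Optional[float]], keys: List[str]) -> Dict[str, float]:
    """Align a series to `keys`, filling gaps with the mean of its observed values."""
    observed = [v for v in series.values() if v is not None]
    if not observed:
        raise ValueError("cannot mean-fill a series with no observed values")
    fill = mean(observed)
    return {k: series[k] if series.get(k) is not None else fill for k in keys}


def summarize(series_list: List[Dict[str, Optional[float]]],
              keys: List[str]) -> Dict[str, float]:
    """Fill every series, then sum them key by key."""
    filled = [mean_fill(s, keys) for s in series_list]
    return {k: sum(f[k] for f in filled) for k in keys}


dense = {"mon": 10.0, "tue": 12.0, "wed": 14.0}
sparse = {"mon": 4.0, "wed": 8.0}  # no value reported for "tue"
days = ["mon", "tue", "wed"]
totals = summarize([dense, sparse], days)
# The gap in `sparse` is filled with (4.0 + 8.0) / 2 = 6.0 before summing.
```

Mean filling is the simplest of the three listed options; sparse alignment and distributed interpolation would replace only the `mean_fill` step.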

Optionally, returning the data query result to the user side includes:

determining a data display template of the data query result according to a user analysis object, a data analysis operator and a data calculation instruction of a result data organization pattern in the data calculation instruction tree;

organizing the data query result into a presentation style of the data presentation template according to the data presentation template to obtain a data presentation result of the data query result;

and sending the data display result to the user side.

Optionally, the computing task node includes a data center and a database.

According to a second aspect of the embodiments of the present disclosure, there is provided a big data query apparatus, including:

the query request receiving module is configured to receive a data query request of a natural language mode of a user side;

the instruction tree generating module is configured to generate a data calculation instruction tree comprising a data calculation instruction set according to the data query request;

the global data calculation management module is configured to distribute the data calculation instructions in the data calculation instruction tree to corresponding calculation task nodes for execution according to the data calculation instruction tree;

the result summarizing module is configured to acquire an execution result of the computing task node, and summarize and calculate the execution result according to the data computing instruction tree to obtain a data query result;

and the data result display module is configured to return the data query result to the user side.

Optionally, the instruction tree generating module includes:

a query analysis unit configured to convert the data query request into a structured query data expression statement;

a computational process analysis engine configured to generate a data computation instruction tree including a set of data computation instructions from the structured query data expression statement, the data computation instruction tree including data dependencies and computational process dependencies.

Optionally, the query analysis unit is specifically configured to:

Optionally, the computational process analysis engine is specifically configured to:

Optionally, the device further comprises:

an enterprise data knowledge graph module configured to collect metadata information of enterprise data, sort the collected metadata information according to the enterprise data knowledge graph, and store the sorted metadata information into the enterprise data knowledge graph.

Optionally, the enterprise data knowledge-graph module includes:

a metadata acquisition unit configured to monitor and collect metadata information of enterprise data;

the cleaning and aligning unit is configured to clean and align the metadata information according to the metadata information standard of the enterprise;

the business logic extraction unit is configured to extract the business logic in the cleaned and aligned metadata information according to the cleaned and aligned metadata information, and store the business logic in a business logic layer of the enterprise data knowledge graph;

the analysis system determining unit is configured to determine an analysis system in the business logic and a classification corresponding to the analysis system according to the analysis system definition, and store the analysis system and the classification corresponding to the analysis system to an analysis system layer in an enterprise data knowledge graph;

and the data index extraction unit is configured to extract and align the data indexes under the classification corresponding to the analysis system so as to unify the names, the text descriptions and the calculation specifications of the data indexes, and store the names, the text descriptions and the calculation specifications of the data indexes into a data index layer of the enterprise data knowledge graph.

Optionally, the apparatus further comprises:

the local data calculation management module is deployed in the calculation task node and configured to acquire corresponding data according to the data calculation instruction; or performing summary calculation on the data acquired by other calculation task nodes according to the data calculation instruction.

Optionally, the local data calculation management module includes:

and the summarizing and calculating unit is configured to receive the data to be summarized sent by other calculation task nodes, perform data deficiency value completion and dimension normalization processing on the data to be summarized according to the data calculation instruction, and perform summarizing and calculation on the processed data.

Optionally, the summary calculating unit includes:

the missing value supplementing subunit is configured to perform missing value supplementing on the data to be summarized by adopting mean value interpolation supplementing, sparse alignment or distributed interpolation supplementing if the data to be summarized are data with different densities; and if the data to be summarized are the data obtained by different summarizing methods, carrying out heuristic calculation and completion on the data to be summarized.

Optionally, the data result presenting module is specifically configured to:

and sending the data display result to the user side.

Optionally, the computing task node includes a data center and a database.

According to a third aspect of the embodiments of the present disclosure, there is provided a server, including:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to perform the big data query method of the first aspect.

According to a fourth aspect of embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium, wherein instructions of the storage medium, when executed by a processor of a mobile terminal, enable the mobile terminal to perform a big data query method as described in the first aspect.

According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program which, when executed, performs the steps of the big data query method of the first aspect.

The technical solution provided by the embodiments of the present disclosure can have the following beneficial effects. A natural-language data query request is received from a user side; a data calculation instruction tree comprising a data calculation instruction set is generated according to the data query request; the data calculation instructions in the tree are distributed to the corresponding computing task nodes for execution; the execution results of the computing task nodes are obtained and summarized according to the data calculation instruction tree to obtain a data query result; and the data query result is returned to the user side. In this way, data stored in different locations can be effectively queried and analyzed, data can be read and integrated from the perspective of the global business, scenarios within an enterprise that need data from different departments can be effectively analyzed, and data can be associated across business fields during enterprise data statistics. The constraints of data organization isolation, physical isolation, business isolation and the like that a user faces during data query are eliminated, so that data of any dimension can be operated on and analyzed.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.

FIG. 1 is a flow diagram illustrating a big data query method in accordance with an exemplary embodiment;

FIG. 2 is a flow diagram illustrating a method for big data queries in accordance with an exemplary embodiment;

FIG. 3 is a flow diagram of converting a data query request into a structured query data expression statement in an exemplary embodiment;

FIG. 4 is a flow diagram of generating a tree of data computation instructions from a structured query data expression statement in an exemplary embodiment;

FIG. 5 is a flow diagram of collecting and collating metadata information for enterprise data into an enterprise knowledge graph in an exemplary embodiment;

FIG. 6 is a flow diagram illustrating a method for big data queries in accordance with an exemplary embodiment;

FIG. 7 is a block diagram illustrating the structure of a big data query device, according to an example embodiment;

FIG. 8 is a block diagram illustrating a configuration of a server according to an example embodiment.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.

Fig. 1 is a flowchart illustrating a big data query method according to an exemplary embodiment. The big data query method is used in a server, and its application scenario is as follows. Different departments of an enterprise undertake different business responsibilities and, based on business requirements, each relevant department can obtain its own independent data to support statistical analysis. The data analysis results of the business lines are generally stored in persistent media (such as relational databases and files) that are isolated by business, by department, or even across data centers, and ordinary business personnel lack the technical capability to integrate and summarize such data. However, in-depth data analysis of a single business often involves other related data results; for example, a financial analyst needs a comprehensive view of a product's sales volume, user reputation, supply chain conditions and the like.

As shown in fig. 1, the method includes the following steps.

In step S11, a data query request in the natural language mode at the user end is received.

The user can input a natural-language statement through the user side as the data query request. For example, the data query request input by the user may be: "I want to subscribe to weekly liveness email briefings of group mobile App users at 10 am every Monday."

Through the data query request of the natural language mode, the user can use the business type language or the natural language understood by the user to query the required data analysis result.

In step S12, a data calculation instruction tree including a data calculation instruction set is generated according to the data query request.

Natural language processing is performed on the data query request; for example, word segmentation and semantic annotation are performed to decompose the request, the storage locations of the data to be queried are determined, and a data calculation instruction is generated for each storage location to obtain a data calculation instruction set. A data calculation instruction tree comprising the data calculation instruction set is then generated according to the data dependency relationship and the calculation process dependency relationship. Each storage location of the data to be queried is a computing task node in the data calculation instruction tree.

In step S13, the data computation instruction in the data computation instruction tree is assigned to the corresponding computation task node for execution according to the data computation instruction tree.

The data calculation instruction tree comprises calculation task nodes and corresponding data calculation instructions. The computing task node comprises a data center and a database.


Through the global data calculation management module, data integration and summary calculation can be started and managed bottom-up according to the data calculation instruction tree: each data calculation instruction is distributed to its corresponding computing task node, and the computing task node executes the instruction to obtain the corresponding data. A computing task node can obtain the data to be queried locally, can perform summary calculation on data obtained by other computing task nodes, or can perform summary calculation on both locally obtained data and data obtained by other computing task nodes. Executing data calculation according to the instruction tree allows data to be queried and computed in a distributed manner, so that data in different storage locations can be obtained and processing speed is improved.
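The bottom-up execution order can be illustrated with a small in-process sketch: leaf instructions read local data and parent instructions summarize their children's results, which corresponds to a post-order traversal of the tree. The `Instruction` type, the `fetch` and `sum` operation names, the node names and the sample values are hypothetical, and the sketch runs sequentially in one process rather than dispatching work to separate computing task nodes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Instruction:
    node: str                 # computing task node that runs this instruction
    op: str                   # "fetch" reads local data, "sum" summarizes children
    metric: str = ""
    children: List["Instruction"] = field(default_factory=list)


# Hypothetical local stores standing in for data centers and databases.
LOCAL_DATA: Dict[str, Dict[str, float]] = {
    "db_sales": {"weekly_active_users": 1200.0},
    "db_mobile": {"weekly_active_users": 3400.0},
}


def execute(instr: Instruction, trace: List[str]) -> float:
    """Post-order traversal: children run first, then the parent summarizes."""
    child_results = [execute(child, trace) for child in instr.children]
    trace.append(f"{instr.node}:{instr.op}")
    if instr.op == "fetch":
        return LOCAL_DATA[instr.node][instr.metric]
    if instr.op == "sum":
        return sum(child_results)
    raise ValueError(f"unknown op {instr.op!r}")


# The server acts as the root and summarizes two leaf nodes.
tree = Instruction(node="server", op="sum", children=[
    Instruction(node="db_sales", op="fetch", metric="weekly_active_users"),
    Instruction(node="db_mobile", op="fetch", metric="weekly_active_users"),
])
trace: List[str] = []
result = execute(tree, trace)
```

The recorded `trace` shows both leaf fetches completing before the root summary runs, which is the bottom-up ordering described above.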

In step S14, the execution results of the computation task nodes are obtained, and the execution results are summarized and computed according to the data computation instruction tree, so as to obtain a data query result.

The server may be a root node in the data calculation instruction tree, and performs summary calculation on the execution results of the calculation task nodes to obtain a data query result corresponding to the data query request of the user.

In step S15, the data query result is returned to the user end.

The data query result can be returned to the user side directly, or it can first be matched to the display form required by the user and then returned to the user side.

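Optionally, returning the data query result to the user side includes:

determining a data display template for the data query result according to the user analysis object, the data analysis operator and the data calculation instruction for the result data organization pattern in the data calculation instruction tree;

organizing the data query result into the presentation style of the data display template to obtain a data display result of the data query result;

and sending the data display result to the user side.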

The data display template can be matched by the data result display module, which stores the correspondence between data analysis operators, user analysis objects, result data organization patterns and data display templates, and uses it to determine the data display template for the data query result. For example, if the user analysis object is user analysis, the data analysis operator is summary analysis and the result data organization pattern is briefing, the data display template obtained is user summary display template 1. In a data display template, each analyzed index has a corresponding legend, table and file format; the template is selected automatically according to the user's query and scenario, and the display style expected by the user is generated. For example, a "user activity weekly comparison report" is generated automatically as a visual report from a template, in which the "user login pv/uv" index is displayed as a line graph with hour/day granularity and a sliding time-window option is added.
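The stored correspondence can be thought of as a lookup table keyed by the three attributes. The sketch below assumes a plain dictionary, a fallback template for unmatched combinations, and a trivial text renderer; the template identifiers and the fallback behavior are hypothetical.

```python
from typing import Dict, Tuple

# (user analysis object, data analysis operator, result organization pattern) -> template
TemplateKey = Tuple[str, str, str]

TEMPLATES: Dict[TemplateKey, str] = {
    ("user", "Aggregation", "Briefing"): "user_summary_template_1",
    ("user", "Compare", "Detailed"): "user_comparison_template_1",
}


def match_template(analysis_obj: str, operator: str, schema: str,
                   default: str = "generic_table_template") -> str:
    """Look up the display template; fall back to a generic one (assumption)."""
    return TEMPLATES.get((analysis_obj, operator, schema), default)


def render(template: str, rows: Dict[str, float]) -> str:
    """Organize the query result under the chosen template as plain text."""
    lines = [f"[{template}]"] + [f"{name}: {value}" for name, value in rows.items()]
    return "\n".join(lines)


template = match_template("user", "Aggregation", "Briefing")
report = render(template, {"login_uv": 4600.0})
```

A real template would carry legends, chart types and file formats per index; here the template is reduced to a name so that only the matching step is shown.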

And sending the data display result to the user side, so that the user side can display the data display result in a data display form desired by the user.

The big data query method provided by this exemplary embodiment receives a natural-language data query request from a user side, generates a data calculation instruction tree comprising a data calculation instruction set according to the data query request, distributes the data calculation instructions in the tree to the corresponding computing task nodes for execution, obtains the execution results of the computing task nodes, summarizes the execution results of each computing task node according to the data calculation instruction tree to obtain a data query result, and returns the data query result to the user side. In this way, data stored in different locations can be effectively queried and analyzed, data can be read and integrated from the perspective of the global business, scenarios within an enterprise that need data from different departments can be effectively analyzed, and data can be associated across business fields during enterprise data statistics. The constraints of data organization isolation, physical isolation, business isolation and the like that a user faces during data query are eliminated, so that data of any dimension can be operated on and analyzed.

FIG. 2 is a flow diagram illustrating a method for big data query, as shown in FIG. 2, including the following steps, according to an example embodiment.

In step S21, a data query request in the natural language mode at the user end is received.

The specific content of this step is the same as that of step S11 in the above exemplary embodiment, and is not described here again.

In step S22, the data query request is converted into a structured query data expression statement.

The information described by the structured query data expression statement includes:

1) the identity of the requester, the user group to which the requester belongs, and the current context information;

2) the computation schedule of the user's query operation, such as a single start or a timed start, and the effective time and/or expiration time;

3) the business scope of the data the user wants to query, including the enterprises, departments and product lines to which the data to be queried belongs and a specific list of data owners;

4) the object described by the data the user accesses, such as user data, product data, financial data or supply chain data;

5) the data analysis operator corresponding to the data the user requests; the available data analysis operators include data comparison (Compare), data summarization (Aggregation), key factors (Factor) and the like;

6) the data display form expected by the user, including mail, report, file or API;

7) the data organization form expected by the user, including brief, ordinary or detailed.
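The seven kinds of information listed above could be carried in a single record type. The following sketch is illustrative only: the class and field names, the `Operator` enumeration and the serialization method are assumptions, and the sample identifiers are borrowed from the partial statement this embodiment gives for the example request.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Operator(Enum):
    COMPARE = "Compare"
    AGGREGATION = "Aggregation"
    FACTOR = "Factor"


@dataclass
class StructuredQuery:
    requester: str                         # 1) requester identity
    user_group: str                        # 1) user group of the requester
    context: str                           # 1) current context information
    schedule: str                          # 2) single or timed start
    business_scope: Dict[str, List[str]]   # 3) enterprises, departments, product lines
    analysis_object: str                   # 4) described object, e.g. user data
    operator: Operator                     # 5) data analysis operator
    result_formats: List[str]              # 6) mail, report, file or API
    result_schema: str = "Briefing"        # 7) brief, ordinary or detailed

    def to_statement(self) -> str:
        """Serialize the query-related fields as `key:value` lines."""
        lines = [f"{key}:{','.join(ids)}" for key, ids in self.business_scope.items()]
        lines.append(f"Analysis_Obj:{self.analysis_object}")
        lines.append(f"Analysis_Method:{self.operator.value}")
        lines.append(f"Result Format:{'+'.join(self.result_formats)}")
        lines.append(f"Result Schema:{self.result_schema}")
        return "\n".join(lines)


query = StructuredQuery(
    requester="analyst", user_group="ECS", context="Web Client",
    schedule="every Monday 10:00",
    business_scope={"Enterprise_ID": ["002", "035"],
                    "Department_ID": ["0021", "0358"],
                    "Product_ID": ["156", "386"]},
    analysis_object="user", operator=Operator.AGGREGATION,
    result_formats=["Email", "Web Report"],
)
statement = query.to_statement()
```

Keeping the operator as an enumeration restricts item 5) to the operators the text names, while the other fields stay free-form strings in this sketch.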

FIG. 3 is a flow diagram of converting a data query request into a structured query data expression statement in an exemplary embodiment, as shown in FIG. 3, which may include the steps of:

in step S221, text segmentation and service semantic annotation are performed on the data query request to obtain segmentation and semantic annotation results.

Text word segmentation is performed on the natural-language data query request to obtain word segmentation results, and business semantic annotation is performed on each word segmentation result to obtain the word segmentation and semantic annotation results. For example, for the natural-language data query request "I want to subscribe to weekly liveness email briefings of group mobile App users at 10 am every Monday", the word segmentation and semantic annotation results are converted into standardized input semantic components in the form shown in Table 1: I (Requester), want (V), 10 am every Monday (Task Scheduling), subscribe (V), group (Enterprise), Mobile App (Business group), user (Analysis Object), week (Data Time Span), liveness (Analysis Method), mail (Result Format), brief (Result Schema).

Table 1. Word segmentation and business semantic annotation
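One simple way to picture segmentation plus annotation is a dictionary-based greedy longest-match tagger, sketched below on the English rendering of the example request. This is a stand-in technique chosen for illustration and is not stated by the disclosure; the lexicon contains only the phrases of the example, and a real system would rely on a much larger business ontology.

```python
from typing import Dict, List, Tuple

# Phrase-to-tag lexicon built from the example; its coverage is an assumption.
LEXICON: Dict[str, str] = {
    "i": "Requester",
    "want": "V",
    "subscribe": "V",
    "10 am every monday": "Task Scheduling",
    "group": "Enterprise",
    "mobile app": "Business group",
    "users": "Analysis Object",
    "weekly": "Data Time Span",
    "liveness": "Analysis Method",
    "email": "Result Format",
    "briefings": "Result Schema",
}
MAX_PHRASE_LEN = max(len(phrase.split()) for phrase in LEXICON)


def annotate(request: str) -> List[Tuple[str, str]]:
    """Greedy longest-match segmentation; words outside the lexicon are skipped."""
    words = request.lower().replace(",", " ").split()
    result: List[Tuple[str, str]] = []
    i = 0
    while i < len(words):
        for n in range(min(MAX_PHRASE_LEN, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in LEXICON:
                result.append((phrase, LEXICON[phrase]))
                i += n
                break
        else:
            i += 1  # no lexicon entry starts here; treat the word as a stop word
    return result


request = ("I want to subscribe to weekly liveness email briefings "
           "of group mobile App users at 10 am every Monday")
annotations = annotate(request)
```

Trying the longest candidate phrase first is what lets "mobile app" and "10 am every monday" be tagged as single units instead of word by word.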

In step S222, the data query request is subjected to context analysis, so as to perform service logic completion on the segmentation and semantic annotation result, and obtain a service logic completion result.

Still taking the above data query request as an example, context analysis of the request determines that the group to which the user belongs is ECS and the department is E2E, and that the user is browsing KPI report data through a Web client. The context analysis result is: User Group: Lenovo:ECS:E2E:Analyst; Query Scene: Web Client + KPI Report. When business logic completion is performed on the word segmentation and semantic annotation results, the business group to which the queried data belongs is set, according to the user's group, to all departments of the ECS group, and, according to the current user scenario, a Web report is generated from the data result. The business logic completion result is: Data Group: Lenovo:ECS:All_Dep:App; Result Schema: Web Report + Email.
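The completion step in this example can be sketched as two rules: widen the data group to all departments of the user's group, and add a Web report when the user is on the Web client. The function name, the dictionary keys (including the `Data Result Style` key used to avoid overwriting the annotated result schema) and the two rules as coded are assumptions drawn from this single example.

```python
from typing import Dict


def complete_business_logic(annotations: Dict[str, str],
                            context: Dict[str, str]) -> Dict[str, str]:
    """Fill in fields the request left implicit, using the user's context."""
    enterprise, group, _department, _role = context["user_group"].split(":")
    completed = dict(annotations)
    # Rule 1: query all departments of the user's group, not only the user's own.
    completed["Data Group"] = (
        f"{enterprise}:{group}:All_Dep:{annotations['Business group']}")
    # Rule 2: a user browsing on the Web client also receives a Web report.
    styles = []
    if "Web Client" in context["query_scene"]:
        styles.append("Web Report")
    styles.append(annotations["Result Format"])
    completed["Data Result Style"] = " + ".join(styles)
    return completed


context = {
    "user_group": "Lenovo:ECS:E2E:Analyst",
    "query_scene": "Web Client + KPI Report",
}
annotations = {"Business group": "App", "Result Format": "Email"}
completed = complete_business_logic(annotations, context)
```

The input annotations are left untouched; the completed fields are added to a copy so that the original annotation result remains available to later steps.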

In step S223, a structured query data expression statement is generated according to the segmentation and semantic annotation result and the service logic completion result.

Still taking the above data query request as an example, the obtained partial structured query data expression statement is as follows:

Enterprise_ID:002,035

Department_ID:0021,0358

Product_ID:156,386

Analysis_Obj:user

Analysis_Method:Aggregation

Result Format:Email+Web Report

Result Schema:Briefing

The above is only part of the structured query data expression statement converted from the above data query request; the full statement contains many further data-query-related expressions, which are not listed one by one here. By using technologies such as natural language processing and a business knowledge ontology base, the user's natural-language query statement, posed from a business perspective, is converted into a structured query data expression statement, which makes it convenient for business personnel to understand and operate on data.
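A downstream component that consumes this statement would first need to read it back into fields. The sketch below parses the partial statement shown above, assuming one `key:value` pair per line and treating both `,` and `+` as value separators; the disclosure does not define the statement's grammar, so these parsing rules are assumptions.

```python
from typing import Dict, List

STATEMENT = """\
Enterprise_ID:002,035
Department_ID:0021,0358
Product_ID:156,386
Analysis_Obj:user
Analysis_Method:Aggregation
Result Format:Email+Web Report
Result Schema:Briefing
"""


def parse_statement(text: str) -> Dict[str, List[str]]:
    """Split each `key:value` line; values may be separated by ',' or '+'."""
    parsed: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between entries
        key, _, value = line.partition(":")
        parts = value.replace("+", ",").split(",")
        parsed[key.strip()] = [part.strip() for part in parts if part.strip()]
    return parsed


query = parse_statement(STATEMENT)
```

Every field is returned as a list, so single-valued keys such as `Analysis_Obj` and multi-valued keys such as `Enterprise_ID` can be handled uniformly.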

In step S23, a data computation instruction tree including a data computation instruction set is generated according to the structured query data expression statement, where the data computation instruction tree includes data dependencies and computation process dependencies.

A data calculation instruction set is generated according to the structured query data expression statement, and is represented as a tree-shaped execution flow according to the data dependency relationship and the calculation process dependency relationship, which yields the data calculation instruction tree.

FIG. 4 is a flow diagram of generating a data computation instruction tree from a structured query data expression statement in an exemplary embodiment. As shown in FIG. 4, generating a data computation instruction tree including a data computation instruction set according to the structured query data expression statement includes:

in step S231, according to the structured query data expression statement and the enterprise data knowledge graph, a computation task node where the data to be queried is located is determined, and a data dependency relationship and a computation process dependency relationship are determined, where the enterprise data knowledge graph includes metadata information of enterprise data.

The enterprise data knowledge graph holds the metadata information of data across the enterprise. Metadata, also called intermediary data or relay data, is data that describes data: it mainly describes data attributes and supports functions such as indicating storage locations, recording historical data, searching resources, and recording files. Metadata acts as an electronic catalog; to build the catalog, the contents or features of the data must be described and collected, which in turn assists data retrieval.

The metadata information of the enterprise big data is stored and reported using a predefined standardized data structure. The enterprise establishes a unified data structure standard, and each department and data team designs and manages the data structure and metadata information of its own business according to that standard.

As shown in Table 2, the enterprise data knowledge graph optionally comprises a business logic layer, an analysis system layer and a data index layer; the business logic layer comprises the relation between the business group to which the data belongs and the business metadata; the analysis system layer comprises an analysis method and analysis system definition, text description and service explanation of the analysis system, and service associated information of the analysis system; the data index layer comprises definition of data indexes, text description of the data indexes, calculation specifications of the data indexes, storage paths, calculation time and historical spans of the data indexes. The enterprise data knowledge graph can provide a query interface for the server to query the metadata information to determine the computing task node where the data to be queried is located, and the data dependency relationship and the computing process dependency relationship among the data.

TABLE 2 Enterprise data knowledge graph Structure

In step S232, a data calculation instruction set is determined according to the calculation task node, the data dependency relationship, and the calculation process dependency relationship where the data to be queried is located, and a data calculation instruction tree including the data calculation instruction set is generated.

The calculation required by the user query is decomposed by a calculation process analysis engine, and the data calculation instruction set is organized into a tree-shaped execution flow according to the data dependency relationship and the calculation process dependency relationship. Each non-leaf node represents a summary process, each leaf node represents a reading process of original data, and the root node represents the final data analysis result. For example, the root node of the data calculation instruction tree is the server, the calculation task nodes one level below the server are data centers, the calculation task nodes one level below a data center are databases, and the calculation task nodes one level below a database are data tables. The server therefore allocates the data calculation instructions to the corresponding data centers according to the data center where the data to be queried is located. If the data to be queried within one data center is located in different databases, the corresponding data calculation instructions are allocated to the corresponding databases; if the data to be queried within one database is located in different data tables, the data in those data tables is acquired according to the data calculation instructions.
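The tree-shaped execution flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the class `InstructionNode`, the helper `leaf`, the node names, the row counts, and the use of `sum` as the default summary operation are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class InstructionNode:
    """One node of a (hypothetical) data calculation instruction tree."""
    name: str
    read: Optional[Callable[[], float]] = None        # leaf: reads original data
    summarize: Callable[[List[float]], float] = sum   # non-leaf: combines child results
    children: List["InstructionNode"] = field(default_factory=list)

    def execute(self) -> float:
        # Leaves read raw data; inner nodes summarize their children, so
        # evaluation proceeds bottom-up and the root holds the final result.
        if not self.children:
            if self.read is None:
                raise ValueError(f"leaf node {self.name!r} has no read step")
            return self.read()
        return self.summarize([child.execute() for child in self.children])


def leaf(name: str, value: float) -> InstructionNode:
    """Build a leaf whose read step returns a fixed, made-up value."""
    return InstructionNode(name, read=lambda: value)


# server -> data centers -> databases -> data tables (all names are illustrative)
root = InstructionNode("server", children=[
    InstructionNode("dc1", children=[
        InstructionNode("dc1/db1", children=[
            leaf("dc1/db1/table_a", 120),
            leaf("dc1/db1/table_b", 80),
        ]),
    ]),
    InstructionNode("dc2", children=[
        InstructionNode("dc2/db1", children=[leaf("dc2/db1/table_c", 50)]),
    ]),
])

total = root.execute()
```

The recursion here runs in one process; in the described system each subtree would instead be handed to the calculation task node that owns the data.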

The physical mapping between the user's data query request and the enterprise data can be obtained by querying the enterprise data knowledge graph: for example, which product lines a mobile App covers, which statistical indexes user liveness corresponds to, and in which data centers, databases and data tables the target data is stored. A data computing instruction set is generated from this analysis. The global data and computing management module manages the execution of the data computing instruction set, generates the data computing instruction tree, and controls the computing task nodes to execute step by step from bottom to top according to the tree.
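A lookup of this kind might be sketched as below. The three top-level keys mirror the business logic, analysis system and data index layers; everything else (the `Indicator` class, the `locate` function, the indicator names, and the `data_center/database/table` path layout) is a hypothetical stand-in, since the patent does not specify a storage format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indicator:
    name: str
    storage_path: str  # assumed layout: "data_center/database/table"


# Toy knowledge graph with the three layers; all entries are made up.
GRAPH = {
    "business_logic": {"mobile App": ["product_156", "product_386"]},
    "analysis_system": {
        "user liveness": ["daily_active_users", "monthly_active_users"],
    },
    "data_index": {
        "daily_active_users": Indicator("daily_active_users", "dc_a/app_db/dau"),
        "monthly_active_users": Indicator("monthly_active_users", "dc_b/app_db/mau"),
    },
}


def locate(graph, analysis_system):
    """Return {data_center: {database: [tables]}} for one analysis system."""
    placement = {}
    for name in graph["analysis_system"].get(analysis_system, []):
        data_center, database, table = graph["data_index"][name].storage_path.split("/")
        placement.setdefault(data_center, {}).setdefault(database, []).append(table)
    return placement


placement = locate(GRAPH, "user liveness")
```

The nested result has the same data center, database, data table shape as the levels of the instruction tree, which is why such a lookup is a natural input to tree construction.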

By using technologies such as computer language compilation and distributed computing, the structured query data expression statement is converted into a data computing instruction set that can be distributed and executed and that integrates data query and summarization; the instruction set is optimized and organized according to priority and step dependencies.
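One way to organize instructions by priority and step dependency is a topological ordering in which the ready instructions of each round are sorted by priority. The sketch below uses Python's standard `graphlib.TopologicalSorter`; the instruction names, the priority values, and the function `schedule` are hypothetical, and the patent does not state which ordering algorithm is used.

```python
from graphlib import TopologicalSorter


def schedule(dependencies, priority):
    """Group instructions into rounds; each round only depends on earlier rounds.

    dependencies maps an instruction to the set of instructions it waits for.
    Within a round, higher-priority instructions come first.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are cyclic
    rounds = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready(), key=lambda n: (-priority.get(n, 0), n))
        rounds.append(ready)
        sorter.done(*ready)
    return rounds


# Illustrative instruction set: two reads feed one summary, which feeds the report.
DEPENDENCIES = {
    "read_dc1": set(),
    "read_dc2": set(),
    "summarize": {"read_dc1", "read_dc2"},
    "report": {"summarize"},
}
PRIORITY = {"read_dc2": 2, "read_dc1": 1}

rounds = schedule(DEPENDENCIES, PRIORITY)
```

Instructions in the same round have no dependencies on one another, so they are the ones that could be dispatched to different calculation task nodes in parallel.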

In step S24, the data computation instruction in the data computation instruction tree is assigned to the corresponding computation task node for execution according to the data computation instruction tree.

The specific content of this step is the same as that of step S13 in the above exemplary embodiment, and is not described here again.

In step S25, the execution results of the computation task nodes are obtained, and the execution results are summarized and computed according to the data computation instruction tree, so as to obtain a data query result.

The specific content of this step is the same as that of step S14 in the above exemplary embodiment, and is not described here again.

In step S26, the data query result is returned to the user end.

The specific content of this step is the same as that of step S15 in the above exemplary embodiment, and is not described here again.

According to the big data query method provided by this exemplary embodiment, the data query request is converted into a structured query data expression statement, and a data computation instruction tree including a data computation instruction set is generated from that statement. Because the instruction tree includes the data dependency relationship and the computation process dependency relationship, the corresponding data can be obtained in a distributed manner by each computation task node according to the tree, so that accurate data across the whole enterprise can be obtained.

On the basis of the technical scheme, the method can also optionally comprise the following steps:

The metadata information of the enterprise data can be monitored and collected through the enterprise data knowledge graph module, and the collected metadata information is sorted and stored in the enterprise data knowledge graph.

Fig. 5 is a flowchart of collecting metadata information of enterprise data and sorting it into an enterprise data knowledge graph in an exemplary embodiment. As shown in Fig. 5, collecting metadata information of enterprise data, sorting the collected metadata information according to the enterprise data knowledge graph, and storing the sorted metadata information into the enterprise data knowledge graph includes:

in step S501, metadata information of enterprise data is monitored and collected.

In step S502, the metadata information is cleaned and aligned according to the metadata information standard of the enterprise.

According to the metadata information standard of the enterprise, and in combination with text analysis technology, the collected metadata information of the enterprise's global data is cleaned and aligned. For example, equivalent index terms such as 'user activity, user activity and activated user' are normalized, and terms such as 'Mobile division, MBD, Mobile Business decision' that refer to the same business group are merged.
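The merging of equivalent terms can be pictured as an alias table that maps every known spelling to one canonical name. This is a deliberately simple sketch: the patent combines the enterprise standard with text analysis, whereas the code below only does case-insensitive exact matching. The function names and the choice of "MBD" and "user activity" as canonical names are assumptions.

```python
def build_alias_index(groups):
    """Map every alias (and the canonical name itself) to its canonical name."""
    index = {}
    for canonical, aliases in groups.items():
        for alias in (canonical, *aliases):
            index[alias.strip().casefold()] = canonical
    return index


def normalize(term, index):
    """Return the canonical name for a term, or the term unchanged if unknown."""
    return index.get(term.strip().casefold(), term)


# Alias groups built from the terms quoted in the text above.
ALIAS_GROUPS = {
    "MBD": ["Mobile division", "Mobile Business decision"],
    "user activity": ["activated user"],
}

index = build_alias_index(ALIAS_GROUPS)
merged = [normalize(t, index) for t in ["mobile division", "MBD", "Activated User"]]
```

Unknown terms pass through unchanged, so they remain visible for later manual review instead of being silently dropped.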

In step S503, business logic is extracted from the cleaned and aligned metadata information and stored in the business logic layer of the enterprise data knowledge graph.

For the cleaned and aligned metadata information, business logic such as the relation between the business group to which the data belongs and the business metadata is automatically extracted, and is automatically summarized and arranged according to its hierarchical relation. For example, the hierarchical business architecture "XX group → mobile business department → sales business" can be extracted from "group mobile business department sales KPI index".

In step S504, an analysis system in the business logic and a classification corresponding to the analysis system are determined according to the analysis system definition, and the analysis system and the classification corresponding to the analysis system are stored in an analysis system layer in an enterprise data knowledge graph.

According to the definition of a data analysis system (such as sales KPI, user liveness, or advertisement marketing effect analysis), a machine learning method is first used to try to incorporate a newly found analysis system into the existing classification of analysis systems automatically. Data that cannot be classified automatically is clustered by similarity (such as index similarity or text description similarity), then manually labeled and summarized, and finally imported into an existing classification of the enterprise data knowledge graph or placed in a newly created classification branch.
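As a rough illustration of index similarity, the sketch below compares the set of indicators of a newly found analysis system with the indicator sets of existing categories using Jaccard similarity, and leaves the item for manual labeling when no category is similar enough. Jaccard similarity, the 0.5 threshold, the indicator names, and the function names are all assumptions swapped in for the unspecified machine learning and clustering methods.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(indicators, categories, threshold=0.5):
    """Return the most similar category, or None if manual labeling is needed."""
    best_name, best_score = None, 0.0
    for name, known in categories.items():
        score = jaccard(indicators, known)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# Existing categories with made-up indicator sets.
CATEGORIES = {
    "sales KPI": {"revenue", "orders", "conversion_rate"},
    "user liveness": {"dau", "mau", "session_length"},
}

auto = classify({"dau", "mau", "retention"}, CATEGORIES)   # shares 2 of 4 indicators
manual = classify({"ad_clicks", "ad_spend"}, CATEGORIES)   # shares none
```

Returning `None` rather than the nearest category keeps the manual labeling step explicit, matching the fallback described above.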

In step S505, the data indicators under the classification corresponding to the analysis system are extracted and aligned to unify the name, the textual description, and the calculation specification of the data indicators, and the name, the textual description, and the calculation specification of the data indicators are stored in the data indicator layer of the enterprise data knowledge graph.

By monitoring and collecting the metadata information of enterprise data, sorting it and storing it in the enterprise data knowledge graph, and providing a query interface, the data to be queried can be located, and the data dependency relationship and the calculation process dependency relationship can be determined, simply by querying the knowledge graph.

FIG. 6 is a flow diagram illustrating a big data query method, according to an example embodiment, which may include, as shown in FIG. 6:

in step S61, a data query request in the natural language mode at the user end is received.

In step S62, a data calculation instruction tree including a data calculation instruction set is generated according to the data query request.

The specific content of this step is the same as that of step S12 in the above exemplary embodiment, and is not described here again.

In step S63, the data computation instruction in the data computation instruction tree is assigned to the corresponding computation task node for execution according to the data computation instruction tree.

According to the ownership and storage distribution of the data required by each computing task, and taking into account the utilization of global data computing resources, the data analysis computing tasks are distributed to computing task nodes such as the data center where the data is located or an enterprise or department data platform. The local data and computing management modules deployed on the computing task nodes start and manage the data computation within their respective jurisdictions and report the computing state in real time. After a computing task is completed, if its result needs to be summarized with the results of other computing nodes, a data-proximity summary principle and a computing and storage capacity advantage principle are applied: the computing task node (such as a database, data platform or data center) that holds the largest proportion of the data and has the most sufficient computing and storage resources is selected, in that order, to perform the summary computation.
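The two selection principles can be read as a lexicographic choice: prefer the node holding the largest share of the data, and break ties by available resources. The sketch below encodes that reading; the `Candidate` fields, the single `free_capacity` score, and the example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    data_bytes: int        # amount of the data to be summarized held locally
    free_capacity: float   # assumed 0..1 score for spare compute and storage


def pick_summary_node(candidates):
    """Pick the node with the largest data share, then the most free capacity."""
    if not candidates:
        raise ValueError("no candidate nodes")
    total = sum(c.data_bytes for c in candidates)

    def rank(c):
        share = c.data_bytes / total if total else 0.0
        return (share, c.free_capacity)

    return max(candidates, key=rank)


nodes = [
    Candidate("database_a", data_bytes=600, free_capacity=0.2),
    Candidate("platform_b", data_bytes=600, free_capacity=0.7),
    Candidate("datacenter_c", data_bytes=100, free_capacity=0.9),
]
chosen = pick_summary_node(nodes)
```

Summarizing where most of the data already lives keeps the amount of data moved between nodes small, which is the point of the data-proximity principle.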

In step S64, acquiring corresponding data according to the data calculation instruction by a local data calculation management module deployed in the calculation task node; or, summarizing and calculating the data acquired by other calculation task nodes by a local data calculation management module deployed in the calculation task nodes according to the data calculation instruction.

The local data calculation management module deployed in the calculation task node is responsible for managing data and calculation of the calculation task node and supports data management and calculation management of a cross-data platform in the data center. The compute task node may be an independent data center, business partition, or department partition. In a complete computing process, the local data and computation management module may be assigned a subtree of computation instructions.

The local data and calculation management module automatically checks the completeness and availability of the data required by the calculation, monitors the task state during the calculation, updates the global data and calculation management module in the server in real time, and starts a retry strategy if the calculation fails.
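The monitor, report and retry behaviour might look like the following. This is a minimal sketch: the function `run_with_retry`, the state names, the fixed attempt limit, and the `report` callback standing in for updates to the global module are all assumptions, as the patent does not describe the retry policy.

```python
def run_with_retry(task, report, max_attempts=3):
    """Run task(), reporting each state change; retry on failure up to a limit."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        report("running", attempt)
        try:
            result = task()
        except Exception as exc:  # any failure of the calculation triggers a retry
            last_error = exc
            report("failed", attempt)
        else:
            report("succeeded", attempt)
            return result
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error


# Demo: a task that fails twice before succeeding.
events = []
calls = {"count": 0}


def flaky_task():
    calls["count"] += 1
    if calls["count"] < 3:
        raise IOError("source data not ready")
    return 42


result = run_with_retry(flaky_task, lambda state, attempt: events.append((state, attempt)))
```

A production policy would usually add a delay between attempts; it is omitted here to keep the sketch short.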

The local data and calculation management module may undertake one or more summary calculation tasks in the calculation process. It receives data calculated by itself or by other local data and calculation management modules, performs the higher-level integration and summary, and sends the result to the calculation task node one level up, until the root node of the task is reached. If the data to be summarized have different dimensions, they are processed with methods such as standard normalization (mean 0, variance 1) or one-way normalization (one of the data sets is used as the scaling standard), so that the dimensions of the data to be summarized are normalized.
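The two normalization methods can be sketched with the standard `statistics` module. Standard normalization is the usual z-score, $z = (x - \mu)/\sigma$. For one-way normalization the patent gives no formula, so the sketch assumes one plausible reading: rescale a series so that its mean matches the mean of the reference series. Both function names are hypothetical.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Standard normalization: shift and scale to mean 0 and variance 1."""
    mean = fmean(values)
    spread = pstdev(values)  # population standard deviation
    if spread == 0:
        raise ValueError("cannot standardize a constant series")
    return [(v - mean) / spread for v in values]


def scale_to_reference(values, reference):
    """One-way normalization (assumed reading): match the reference series' mean."""
    own_mean = fmean(values)
    if own_mean == 0:
        raise ValueError("cannot rescale a series whose mean is zero")
    factor = fmean(reference) / own_mean
    return [v * factor for v in values]


z = standardize([2.0, 4.0, 6.0])
rescaled = scale_to_reference([1, 2, 3], reference=[10, 20, 30])
```

After `standardize`, two series measured in different units can be compared on a common scale; `scale_to_reference` instead keeps the reference series untouched and only adjusts the other one.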

For data with different densities, methods such as mean interpolation filling (filling data according to the mean), sparse alignment (aligning to the sparser data), and distribution-based interpolation filling (estimating and filling data according to the data distribution) are used to process and align the data. For example, if data A is to be compared with data B, but A has hourly granularity and B has daily granularity, A and B can be aligned in length using the methods above. Data from the same source that were produced by different summary methods are aligned by heuristic calculation: for example, if A is an average over N items and B is a total, C = A × N is computed first, and then C is compared with B.
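The alignment steps in this paragraph can be illustrated with three small helpers. They are a sketch under stated assumptions: sparse alignment is taken to mean summing the hourly values of each day, a missing value is represented by `None`, and all function names are hypothetical.

```python
from typing import List, Optional


def hourly_to_daily(hourly: List[float], hours_per_day: int = 24) -> List[float]:
    """Sparse alignment: collapse hourly values into one total per day."""
    return [sum(hourly[i:i + hours_per_day]) for i in range(0, len(hourly), hours_per_day)]


def fill_with_mean(values: List[Optional[float]]) -> List[float]:
    """Mean interpolation filling: replace missing entries with the mean of the rest."""
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("no known values to compute a mean from")
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def total_from_average(average: float, count: int) -> float:
    """Heuristic alignment: turn an average A over N items into a total C = A * N."""
    return average * count


daily = hourly_to_daily([1.0] * 48)           # two days of hourly data
filled = fill_with_mean([1.0, None, 3.0])     # one gap in the middle
comparable = total_from_average(2.5, 4)       # now comparable with a total B
```

Whether hourly values should be summed or averaged per day depends on the indicator (counts are summed, rates are averaged); the sketch shows only the summed case.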

In step S65, the execution results of the computation task nodes are obtained, and the execution results are summarized and computed according to the data computation instruction tree, so as to obtain a data query result.

In step S66, the data query result is returned to the user end.

In the big data query method provided in this exemplary embodiment, the local data calculation management module deployed in each calculation task node is responsible for acquiring, calculating and managing the data in that node, or for performing summary calculation on data from other calculation task nodes. Distributed calculation across the calculation task nodes is thereby realized, and enterprise data is acquired across the whole enterprise.

Fig. 7 is a block diagram illustrating a structure of a big data query apparatus according to an exemplary embodiment. Referring to fig. 7, the apparatus includes a query request receiving module 71, an instruction tree generating module 72, a global data calculation management module 73, a result summarizing module 74, and a data result presenting module 75.

The query request receiving module 71 is configured to receive a data query request in a natural language mode from a user end;

the instruction tree generating module 72 is configured to generate a data computation instruction tree including a data computation instruction set according to the data query request;

the global data computation management module 73 is configured to allocate the data computation instruction in the data computation instruction tree to the corresponding computation task node for execution according to the data computation instruction tree;

the result summarizing module 74 is configured to obtain the execution result of the computation task node, and perform summarizing computation on the execution result according to the data computation instruction tree to obtain a data query result;

the data result presentation module 75 is configured to return the data query result to the user side.

Optionally, the instruction tree generating module includes:

Optionally, the query analysis unit is specifically configured to:

Optionally, the device further comprises

Optionally, the enterprise data knowledge-graph module includes:

Optionally, the apparatus further comprises:

Optionally, the local data calculation management module includes:

Optionally, the summary calculating unit includes:

Optionally, the data result presenting module is specifically configured to:

and sending the data display result to the user side.

Optionally, the computing task node includes a data center and a database.

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Fig. 8 is a block diagram illustrating a configuration of a server according to an example embodiment. Referring to FIG. 8, server 800 includes a processing component 822, which further includes one or more processors and memory resources, represented by memory 832, for storing instructions, such as applications, that are executable by processing component 822. The application programs stored in memory 832 may include one or more modules that each correspond to a set of instructions. Further, the processing component 822 is configured to execute instructions to perform the above-described methods.

The server 800 may also include a power component 826 configured to perform power management of the server 800, a wired or wireless network interface 850 configured to connect the server 800 to a network, and an input/output (I/O) interface 858. The server 800 may operate based on an operating system stored in memory 832, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.

In an exemplary embodiment, a non-transitory computer readable storage medium is also provided that includes instructions, such as the memory 832 including instructions, that are executable by the processing component 822 of the server 800 to perform the above-described method. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

The present application also provides a computer program, which when executed by a processor implements the big data query method described above.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims

1. A big data query method is characterized by comprising the following steps:

receiving a data query request of a natural language mode of a user side;

returning the data query result to the user side;

wherein, according to the data query request, generating a data computation instruction tree including a data computation instruction set includes:

generating a data calculation instruction tree comprising a data calculation instruction set according to the structured query data expression statement, wherein the data calculation instruction tree comprises a data dependency relationship and a calculation process dependency relationship;

wherein the generating a data computation instruction tree including a set of data computation instructions according to the structured query data expression statement comprises:

determining a data calculation instruction set according to the calculation task node, the data dependency relationship and the calculation process dependency relationship where the data to be inquired is located, and generating a data calculation instruction tree comprising the data calculation instruction set;

the enterprise data knowledge graph comprises a business logic layer, an analysis system layer and a data index layer;

2. The method of claim 1, wherein converting the data query request into a structured query data expression statement comprises:

3. The method of claim 1, further comprising:

4. The method of claim 3, wherein the collecting metadata information of the enterprise data, sorting the collected metadata information according to the enterprise data knowledge graph, and storing the sorted metadata information into the enterprise data knowledge graph comprises:

monitoring and collecting metadata information of enterprise data;

5. The method according to claim 1, wherein after allocating the data computation instruction in the data computation instruction tree to the corresponding computation task node for execution according to the data computation instruction tree, further comprising:

6. The method according to claim 5, wherein the performing summary computation on the data acquired by other computing task nodes by a local data computing management module deployed in the computing task nodes according to the data computing instruction comprises:

7. The method of claim 6, wherein the data gap filling of the data to be summarized comprises:

8. The method of claim 1, wherein returning the data query result to the user side comprises:

and sending the data display result to the user side.

9. The method of claim 1, wherein the compute task nodes comprise a data center and a database.

10. A big data query device, comprising:

the data result display module is configured to return the data query result to the user side;

wherein the instruction tree generation module comprises:

a computation process analysis engine configured to generate a data computation instruction tree including a set of data computation instructions from the structured query data expression statement, the data computation instruction tree including data dependencies and computation process dependencies;

wherein the computational process analysis engine is specifically configured to:

11. The apparatus according to claim 10, wherein the query analysis unit is specifically configured to:

12. The apparatus of claim 10, further comprising

13. The apparatus of claim 12, wherein the enterprise data knowledge-graph module comprises:

14. The apparatus of claim 10, further comprising:

15. The apparatus of claim 14, wherein the local data computation management module comprises:

16. The apparatus of claim 15, wherein the summary computing unit comprises:

17. The apparatus of claim 10, wherein the data result presentation module is specifically configured to:

and sending the data display result to the user side.

18. The apparatus of claim 10, wherein the compute task node comprises a data center and a database.

19. A server, comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to perform the big data query method of any of claims 1-9.

20. A non-transitory computer readable storage medium, wherein instructions in the storage medium, when executed by a processor of a mobile terminal, enable the mobile terminal to perform a big data query method, the method comprising the steps of any of claims 1-9.