CN118035303A

CN118035303A - Data query method, device and storage medium

Info

Publication number: CN118035303A
Application number: CN202410122092.4A
Authority: CN
Inventors: 毕博; 侯敬仁; 豆丹丹; 吴昊
Original assignee: Beijing Mafengwo Network Technology Co ltd
Current assignee: Beijing Mafengwo Network Technology Co ltd
Priority date: 2024-01-29
Filing date: 2024-01-29
Publication date: 2024-05-14

Abstract

The disclosure relates to a data query method, a data query device and a storage medium. The data query method comprises the following steps: acquiring user query information, generating a matching model based on the user query information, and screening out an optimal matching model; and responding to the acquired optimal matching model, and selecting a corresponding metadata table based on a metadata management platform to execute query operation. By the data query method, query information input by a user is acquired, a matching model is generated according to the query information, the optimal matching model is screened out to execute query operation, query efficiency is improved, and user requirements are met.

Description

Data query method, device and storage medium

Technical Field

The present invention relates generally to the field of data processing, and in particular, to a data query method, apparatus, and storage medium.

Background

Currently, when making large-scale decisions, enterprises generally query related business data by utilizing a unified data query service, and make decisions according to the queried business data. In the prior art, the query of related data by utilizing the unified data query service has partial defects: the metadata cannot be uniformly managed and the query efficiency is poor. How to uniformly manage metadata and improve query efficiency is a current urgent problem to be solved.

Disclosure of Invention

In order to solve the above problems in the prior art, the present disclosure provides a data query method, a device and a storage medium.

According to a first aspect of an embodiment of the present disclosure, there is provided a data query method, including: acquiring user query information, generating a matching model based on the user query information, and screening out an optimal matching model; and responding to the acquired optimal matching model, and selecting a corresponding metadata table based on a metadata management platform to execute query operation.

In one embodiment, the generating a matching model based on the user query information includes: converting the format of the user query information based on a preset conversion rule to obtain second user query information, wherein the preset conversion rule comprises converting the Chinese name of the user query information into an English name; selecting a corresponding metadata table from the metadata management platform based on the acquired second user query information, and generating at least one matching model according to the attribute of the metadata table; the matching model includes at least one of the following attributes: table name, engine type, the hierarchy to which the table belongs, table availability index, and table availability dimension.

In one embodiment, the metadata management platform is obtained by: acquiring data tables, wherein the data tables are respectively stored on different query engines; acquiring metadata information of the data table, establishing metadata mapping, and generating the metadata table, wherein the metadata information comprises: index metadata, dimension metadata, table field metadata, and table metadata; the metadata map includes: index and physical table field mapping, dimension and physical table field mapping, and metadata and table field mapping of tables; and establishing a metadata management platform based on the metadata table.

In one embodiment, the screening out the best matching model includes: in response to obtaining the matching model, scoring the matching model based on preset evaluation rules, the preset evaluation rules comprising: data source rules, dimension data rules, full-matching rules, index hit rules, and model level rules; and screening out the matching model with the highest scoring evaluation as the optimal matching model.

In one embodiment, the selecting, based on the metadata management platform, a corresponding metadata table to perform a query operation includes: analyzing the optimal matching model to obtain a metadata table forming the optimal matching model; acquiring a data table of the component metadata table based on the metadata management platform; and in response to the data table being acquired, generating a query statement based on the attribute field of the data table, and executing a query operation.

According to a second aspect of embodiments of the present disclosure, there is provided a data query apparatus, including: the acquisition unit is used for acquiring user query information, generating a matching model based on the user query information and screening out an optimal matching model; and the processing unit is used for responding to the acquired optimal matching model, selecting a corresponding metadata table based on the metadata management platform and executing query operation.

In one embodiment, the obtaining unit generates the matching model based on the user query information in the following manner: converting the format of the user query information based on a preset conversion rule to obtain second user query information, wherein the preset conversion rule comprises converting the Chinese name of the user query information into an English name; selecting a corresponding metadata table from the metadata management platform based on the acquired second user query information, and generating at least one matching model according to the attribute of the metadata table; the matching model includes at least one of the following attributes: table name, engine type, the hierarchy to which the table belongs, table availability index, and table availability dimension.

In one embodiment, the processing unit obtains the metadata management platform in the following manner: acquiring data tables, wherein the data tables are respectively stored on different query engines; acquiring metadata information of the data table, establishing metadata mapping, and generating the metadata table, wherein the metadata information comprises: index metadata, dimension metadata, table field metadata, and table metadata; the metadata map includes: index and physical table field mapping, dimension and physical table field mapping, and metadata and table field mapping of tables; and establishing a metadata management platform based on the metadata table.

In one embodiment, the processing unit screens out the best matching model by: in response to obtaining the matching model, scoring the matching model based on preset evaluation rules, the preset evaluation rules comprising: data source rules, dimension data rules, full-matching rules, index hit rules, and model level rules; and screening out the matching model with the highest scoring evaluation as the optimal matching model.

In one embodiment, the processing unit performs the query operation by selecting a corresponding metadata table based on the metadata management platform in the following manner: analyzing the optimal matching model to obtain a metadata table forming the optimal matching model; acquiring a data table of the component metadata table based on the metadata management platform; and in response to the data table being acquired, generating a query statement based on the attribute field of the data table, and executing a query operation.

According to a third aspect of embodiments of the present disclosure, there is provided an electronic device, comprising: a memory for storing instructions; and the processor is used for calling the instructions stored in the memory to execute the data query method according to the first aspect or any implementation manner of the first aspect.

According to a fourth aspect of embodiments of the present disclosure, there is provided a computer readable storage medium having stored therein instructions which, when executed by a processor, perform the data query method according to the first aspect or any implementation manner of the first aspect.

The technical scheme provided by the embodiment of the disclosure can comprise the following beneficial effects: and acquiring query information input by a user, generating a matching model according to the query information, screening out an optimal matching model, executing query operation, reducing the amount of consumed resources required by query, improving query efficiency and meeting user requirements.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The above, as well as additional purposes, features, and advantages of embodiments of the present invention will become apparent in the following detailed written description and claims upon reference to the accompanying drawings. Several embodiments of the present invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:

FIG. 1 is a flow chart illustrating a method of data querying according to an exemplary embodiment.

FIG. 2 is a flowchart illustrating a method of generating a matching model, according to an example embodiment.

FIG. 3 is a flowchart illustrating a method of acquiring a metadata management platform according to an example embodiment.

FIG. 4 is a schematic diagram illustrating a metadata table management based on a metadata management platform according to an example embodiment.

FIG. 5 is a flowchart illustrating a method of screening a best match model according to an exemplary embodiment.

FIG. 6 is a flowchart illustrating a method for performing a query operation based on a metadata management platform selecting a corresponding metadata table in accordance with an exemplary embodiment.

FIG. 7 is a schematic diagram illustrating a data query method according to an example embodiment.

FIG. 8 is a schematic diagram illustrating a data query based on a metadata management platform, according to an example embodiment.

Fig. 9 is a block diagram of a data querying device, according to an example embodiment.

FIG. 10 is a block diagram illustrating an apparatus for data querying, according to an example embodiment.

In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.

Detailed Description

The principles and spirit of the present invention will be described below with reference to several exemplary embodiments. It should be understood that these embodiments are presented merely to enable those skilled in the art to better understand and practice the invention and are not intended to limit the scope of the invention in any way.

In general, when an enterprise needs to specify decisions at a large scale, various queries are required from the data warehouse on the enterprise's own data to verify whether the decision to be made is reasonable. In querying enterprise data, there are generally the following drawbacks that result in poor query efficiency: the physical tables for recording data are stored on different engines, so that unified management cannot be realized; the optimal query path cannot be selected intelligently, so that the data query efficiency is poor.

In view of the above, the data query method provided by the present disclosure can save the amount of resources required for query and improve the query efficiency.

FIG. 1 is a flow chart of a method of querying data, as shown in FIG. 1, according to an exemplary embodiment, including the following steps.

In step S11, user query information is acquired, a matching model is generated based on the user query information, and an optimal matching model is screened out.

In the embodiment of the disclosure, a user wants to query for certain topic information, such as the click number of the website of the month, and after the user query information is acquired, a matching model related to the query information is generated and the optimal matching model is screened out.

The reason the present disclosure uses a matching model is that the data source of the unified online analytical processing (Online Analytical Processing, OLAP) data query service is structured on top of multiple OLAP query engines, such as Hive and StarRocks, for different index query requirements. Meanwhile, in the process of calculating and integrating the data, one index data may have multiple calculation paths, namely, one index data may be processed by different physical tables, which results in multiple selectable physical tables when the data service queries the service. Thus, there are a variety of query paths for data query services that may result from: one index has multiple selectable physical tables in the same OLAP engine, namely, the physical tables of index data on different OLAP engines can provide correct data, and multiple selectable engines are used for service inquiry; or an index with multiple engines. The performance of multiple query paths of the same index in actual query can vary greatly.

In one embodiment, for a user's query topic, the query operation may be accomplished using data stored on the A-table on the X-query engine, or may be accomplished using data stored on the B-table on the Y-query engine. Because the different query engines have different query operation executing efficiencies, a matching model is generated according to the related information of the data table/physical table stored by the query engine, and the optimal query mode is screened out, so that the query task efficiency is highest.

In step S12, in response to the acquisition of the optimal matching model, a query operation is performed based on the metadata management platform selecting the corresponding metadata table.

In the embodiment of the disclosure, after the optimal matching model is obtained, a query operation is performed according to information of a metadata table managed on a metadata management platform.

The metadata management platform is used for unified management by integrating metadata information stored in physical tables of different query engines, so that subsequent query operation execution is facilitated.

FIG. 2 is a flowchart illustrating a method of generating a matching model, as shown in FIG. 2, including the following steps, according to an exemplary embodiment.

In step S21, in response to obtaining the user query information, the format of the user query information is converted based on the preset conversion rule, so as to obtain second user query information.

The preset conversion rule comprises the step of converting the Chinese name of the user query information into an English name.

In the embodiment of the disclosure, the user submits corresponding query parameters including filtering conditions, sorting modes, limit and the like in the query system based on the actual demands of the user, wherein the query parameters use Chinese names of index dimensions, but attribute fields of a physical table of the computer are established according to English names, so that the Chinese names of the index dimensions are required to be converted into English names corresponding to the query index dimensions through a metadata management platform.

In one embodiment, a user submits a query request, wherein the query parameters include a query dimension, a query index, and a query condition, wherein the query dimension includes a date, a destination name, an application name, the query index includes a payment order amount, a creation order amount, a commodity transaction total, and the query condition includes an application name ios having a date ranging from 1 day 4 months to 10 days 4 months. And according to the acquired metadata information, converting the user request parameters by utilizing a metadata management platform, namely converting the query parameters into English name sets of indexes and dimensions. For example, the query dimension translates to dt, mdd_name, and app_name, and the query index translates to parent_order_cnt, create_order_cnt, and gmv.

In step S22, based on the obtained second user query information, a corresponding metadata table is selected from the metadata management platform, and a number of matching models greater than at least one are generated according to the attributes of the metadata table.

Wherein the matching model includes at least one of the following attributes: table name, engine type, the hierarchy to which the table belongs, table availability index, and table availability dimension.

In the embodiment of the disclosure, second user query information is obtained, information of a metadata table related to the query task is queried from a metadata management platform, the information comprises a table name, an engine type, a table belonging level, a table availability index and a table availability dimension, and a matching model is generated by utilizing the information of the metadata table.

FIG. 3 is a flowchart illustrating a method of acquiring a metadata management platform, as shown in FIG. 3, according to an exemplary embodiment, including the following steps.

In step S31, a data table is acquired, the data tables being stored on different query engines, respectively.

In step S32, metadata information of the data table is acquired, and metadata mapping is established to generate the metadata table.

Wherein the metadata information includes: index metadata, dimension metadata, table field metadata, and table metadata; the metadata map includes: index and physical table field mapping, dimension and physical table field mapping, and metadata and table field mapping of tables.

In the embodiment of the disclosure, metadata information of data tables from different query engines is acquired, a metadata mapping relation is established, and the metadata information of a certain data table is collected on one metadata table, so that operation and management are facilitated.

In the embodiment of the disclosure, metadata needs to be uniformly managed in a systematic manner. Wherein basic metadata information needs to be acquired. The base metadata includes index metadata, dimension metadata, table field metadata, and metadata of the table.

The index metadata is shown in the following table 1, and the attribute fields in the table 1 include an index english name, an index chinese name, and a calculation mode, where the calculation mode includes: count, sum, max and count distinct. For example, when the index english name is parameter_order_cnt, the corresponding index chinese name is payment order amount, and the calculation mode is sum. When the index English name is create_order_cnt, the corresponding index Chinese name is create order quantity, and the adopted calculation mode is sum. When the English name of the index is gmv, the Chinese name of the corresponding index is commodity transaction total, and the adopted calculation mode is sum.

English name of index	Index Chinese name	sum
			payment_order_cnt	Payment order quantity	sum
create_order_cnt	Creating order quantity	sum
			gmv	Commodity transaction total	sum

The dimension metadata is shown in table 2 below, and the attribute fields in table 2 include a dimension english name and a dimension chinese name. Illustratively, when the dimension english name is dt, the corresponding dimension chinese name is date. When the dimension English name is mdd_name, the corresponding dimension Chinese name is destination name. When the dimension english name is app_name, the corresponding dimension chinese name is application name. When the dimension English name is product_id, the corresponding dimension Chinese name is product id. When the English name of the dimension is second_ catagory _id, the Chinese name of the corresponding dimension is second class id.

Dimension English name	Dimension Chinese name
		dt	Date of day
mdd_name	Destination name
		app_name	Application name
product_id	Product id
		second_catagory_id	Secondary class id

Table field metadata as shown in table 3 below, the attribute fields in table 3 include field names, field types, and field notes. Illustratively, when the field name is dt, the field type is String, which indicates that the current field type is a String type. The field is annotated as date. When the field name is mdd_name, the field type is String, and the field annotation is destination name. When the field name is product_id, the field type is String, and the field annotation is product id. When the field name is parent_order_cnt, the field type is Long, which means that the current field type is Long type. The field notes the amount of the payment order. When the field name is create_order_cnt, the field type is Long, and the field annotation is create order volume.

When the field name is gmv, the field type is Long, and the field annotation is commodity transaction total.

Field name	Field type	Field annotation
			dt	String	Date of day
mdd_name	String	Destination name
			product_id	String	Product id
payment_order_cnt	Long	Payment order quantity
			create_order_cnt	Long	Creating order quantity
gmv	Long	Commodity transaction total

Still another example, as shown in Table 4, when the field name is dt, the field type is String, and the field is annotated with the date. When the field name is mdd_name, the field type is String, and the field annotation is destination name. When the field name is app_name, the field type is String, and the field annotation is application name. When the field name is product_id, the field type is String, and the field annotation is product id. When the field name is parent_order_cnt, the field type is Long, and the field annotation is payment order amount. When the field name is create_order_cnt, the field type is Long, and the field annotation is create order volume. When the field name is gmv, the field type is Long, and the field annotation is commodity transaction total.

Field name	Field type	Field annotation
			dt	String	Date of day
mdd_name	String	Destination name
			app_name	String	Application name
product_id	String	Product id
			payment_order_cnt	Long	Payment order quantity
create_order_cnt	Long	Creating order quantity
			gmv	Long	Commodity transaction total

Table metadata is as follows table 5, table 5 including table name, engine type and access address. Illustratively, when the table is named face-a, the engine type is StarRoccks and the access address is XXX. When the table name is face-b, the engine type is Hive, and the access address is YYY.

Table name	Engine type	Access address
			fact-a	StarRoccks	XXX
fact-b	Hive	YYY

In the embodiment of the disclosure, after the index metadata, the dimension metadata, the table field metadata and the table metadata are obtained, a mapping relation needs to be established for the metadata, and the mapping relation between the metadata comprises index and physical table field mapping, dimension and physical table field mapping and table metadata and table field mapping.

After the metadata establishes the mapping relationship, as shown in the following table 6, the attribute fields in table 6 include an index/dimension english name, an index/dimension, a table field, and a table name. Illustratively, when the index/dimension English name is dt, the index/dimension is dimension, the table field is dt, and the table name is fact-a. When the English name of the index/dimension is mdd_name, the index/dimension is dimension, the table field is mdd_name, and the table name is face-a. When the English name of the index/dimension is product_id, the index/dimension is dimension, the table field is product_id, and the table name is face-a. When the English name of the index/dimension is parent_order_cnt, the index/dimension is the index, the table field is parent_order_cnt, and the table name is face-a. When the English name of the index/dimension is create_order_cnt, the index/dimension is the index, the table field is create_order_cnt, and the table name is face-a.

Still another example, as shown in Table 7 below, when the index/dimension is named dt, the index/dimension is dimension, the table field is dt, and the table name is face-b. When the English name of the index/dimension is mdd_name, the index/dimension is dimension, the table field is mdd_name, and the table name is face-b. When the index/dimension English name is app_name, the index/dimension is dimension, the table field is app_name, and the table name is face-b. When the English name of the index/dimension is product_id, the index/dimension is dimension, the table field is product_id, and the table name is face-b. When the English name of the index/dimension is parent_order_cnt, the index/dimension is dimension, the table field is parent_order_cnt, and the table name is face-b. When the English name of the index/dimension is create_order_cnt, the index/dimension is dimension, the table field is create_order_cnt, and the table name is face-b.

In the embodiment of the disclosure, the index is a numerical value that can quantify how many objects are, for example: DAU, retention, etc. are all indicators. Dimension is a constraint on query metrics such as: years, months, days, countries, regions, etc. are dimensions. Metadata is data about data. The index data processing and query process can involve metadata information such as physical tables, table fields, engines, indexes and the like, for example, XX index data processing is calculated based on the XX field aggregation of the XX table on the XX engine.

In step S33, a metadata management platform is built based on the metadata table.

In the embodiment of the disclosure, all metadata tables are summarized, and a metadata management platform is established to perform visual management on the metadata tables. The metadata is defined as unified and universal data objects, the difference of different OLAP engines is shielded, the metadata objects and the mapping relation thereof are managed in a systematic mode, and the data query efficiency is improved.

FIG. 4 is a schematic diagram illustrating a metadata table management based on a metadata management platform according to an example embodiment. As shown in FIG. 4, an exemplary metadata table may have a view name of XXX_YYY_ZZZ_view, a view type of hive, a state of in-use, a creation time of MMMM/YY/DD/hh: mm: ss, an on-line time of PPPP/LL/OO/YY: uu: ii, a latest update time of TTTTTT/RR/GG/ff: ee: ww, a security level of ordinary, and may provide information of responsible person, service attribution, attribution department, creator, and latest updated person. By displaying the related information, a developer can know information of various aspects of metadata, such as source, modification, confidentiality and the like. The view can also display specific field information for a developer, for example, when a field name is platform, the type is string, the description information is null, the data classification (primary) is business data, the data classification (secondary) is data platform basic information, the data classification is common, and the business type is dimension id. When the field name is device_id, the type is string, the description information is null, the data classification (primary) is null, the data classification (secondary) is null, the data classification is null, and the service type is please select. When the field name is app_code, the type is string, the description information is null, the data classification (first level) is business data, the data classification (second level) is traffic base information, the data classification is common, and the business type is dimension id. When the field name is device_type, the type is string, the description information is null, the data classification (primary) is personal data, the data classification (secondary) is traffic base information, the data classification is common, and the service type is dimension id. When the field name is page_name, the type is string, the description information is null, the data classification (first level) is business data, the data classification (second level) is traffic base information, the data classification is common, and the business type is dimension id. When the field name is pos_module, the type is string, the description information is null, the data classification (first level) is business data, the data classification (second level) is traffic base information, the data classification is common, and the business type is dimension id. When the field name is module_name, the type is string, the description information is null, the data classification (first level) is business data, the data classification (second level) is traffic base information, the data classification is common, and the business type is dimension id. When the field name is item_type, the type is string, the description information is null, the data classification (first level) is business data, the data classification (second level) is traffic base information, the data classification is common, and the business type is dimension id. When the field name is item_type, the type is string, the description information is null, the data classification (first level) is business data, the data classification (second level) is traffic base information, the data classification is common, and the business type is dimension id. When the field name is item_id, the type is string, the description information is null, the data classification (first level) is business data, the data classification (second level) is traffic base information, the data classification is common, and the business type is dimension id. When the field name is show_device, the type is string, the description information is null, the data classification (first level) is null, the data classification (second level) is null, the data classification is null, and the service type is an index. When the field name is click_device, the type is string, the description information is null, the data classification (primary) is null, the data classification (secondary) is null, the data classification is null, and the service type is an index. When the field name is show_cnt, the type is bigint, the description information is null, the data classification (first level) is null, the data classification (second level) is null, the data classification is null, and the service type is an index. When the field name is click_cnt, the type is bigint, the description information is null, the data classification (primary) is null, the data classification (secondary) is null, the data classification is null, and the service type is an index. The view shows the specific field attribute information to a developer, and the developer can perform corresponding operation on the data table according to own requirements.

FIG. 5 is a flowchart illustrating a method of screening a best match model, as shown in FIG. 5, according to an exemplary embodiment, including the following steps.

In step S51, in response to the acquisition of the matching model, the matching model is scored based on a preset evaluation rule.

The preset evaluation rule comprises the following steps: data source rules, dimension data rules, full-match rules, index hit rules, and model level rules.

According to the embodiment of the disclosure, a matching model is defined in an abstract mode for evaluating the excellent requirement of the performance of the data query task, and the matching model can also be called as a Cube model, wherein the Cube model is used for converting a physical table field into an index and a dimension associated with the physical table field as an available index and an available dimension through a metadata management system from the view angle of a physical table. Metadata information of the table is lifted by the metadata management system, such as the hierarchy of the table and the engine type of the table, and the information is unified as input information of rule matching and is injected into a defined unified Cube matching model. Wherein the Cube model definition includes at least one of: table name, engine type, the hierarchy to which the table belongs, table availability index, and table availability dimension.

In one embodiment, the attribute of the Cube model constructed according to one or more physical tables with metadata mapping relation can be named as fact-a, engine type StarRocks, the hierarchy of the table is DWS, and the available indexes are parent_order_cnt, create_order_cnt and gmv, and the available dimensions are as follows: dt, product_id, and mdd_name.

In another embodiment, the attribute of the Cube model constructed according to one or more physical tables with metadata mapping relationships established may be that the data source engine is Hive, where the table belongs to a hierarchy: DWD, available index names: parent_order_cnt, create_order_cnt, and gmv, available dimension names dt, mdd_name, app_name, and product_id.

In the embodiment of the disclosure, different cube models are created according to different data tables. At this time, the Cube model may be scored according to a preset rule. The preset rule is a rule that a user performs weight scoring on information such as each Cube hit index, dimension conditions, a data source engine and the like, and general Cube scoring is precipitated through actual production experience. Meanwhile, each rule has a corresponding weight, and the final scoring condition of the Cube can be calculated integrally based on the scoring of each Cube rule and the weight of the current rule.

In the embodiment of the disclosure, the rule of Cube scoring includes: data source rules, dimension data rules, full-match rules, index hit rules, and model level rules.

In the embodiment of the disclosure, the purpose of establishing the data source rule is to actually score different engines, and the query efficiency for different query parameters is different due to the applicable scene and characteristics of the engines. Therefore, the scoring strategies of the query efficiency of different engines are summarized according to practical use experience. For example, starRocks typically plays a role in accelerating the query of data in OLAP scenarios, which is typically superior to Hive engines in query speed.

In the embodiment of the disclosure, the purpose of establishing the dimension data rule is to obtain the number of Cube available dimension sets and user query dimension sets which are combined, namely the number which can be matched with the user query dimension. The greater the number of dimensional matches, the higher the score that represents it.

In the embodiment of the disclosure, the purpose of establishing the full matching rule is to obtain that the available dimensions of the Cube include dimensions of the user query, that is, a dimension set of the user query is a subset of the Cube available dimension set; the available index of the Cube comprises indexes queried by the user, namely, the index set queried by the user is a subset of the Cube available index set. When the two conditions are satisfied at the same time, the full matching rule is satisfied.

In the embodiment of the disclosure, the purpose of establishing the index hit rule is to obtain the number of Cube available index sets and user query index sets after the Cube available index sets and the user query index sets are combined, namely the number which can be matched with the user query index sets. The greater the number of index matches, the higher the score that represents them.

In the disclosed embodiment, the purpose of building model level rules is to score based on the data warehouse level to which the physical table belongs. It is understood that in large data domain modeling, the data processing links of the data warehouse follow the hierarchical concepts of the industry, including the operations data layer (Operational Data Store, ODS), the detail data layer (Data Warehouse Detail, DWD), the summary data layer (Data Warehouse Summary, DWS) and the application data layer (Application Data Store, ADS). The high-level model is based on the abstraction aggregation of the low-level model, and the index of the same caliber can be provided by the high-level model and the low-level model at the same time, and the difference is that the high-level model has high query efficiency and small dimension quantity. The underlying model supports more dimensions but queries are inefficient. Where the high-level model may provide the user query dimension, the high-level model scores higher.

The method and the device can shield the difference of different OLAP engines by defining the unified Cube matching model, are used for multi-rule matching scoring, abstract define a plurality of matching rules, are suitable for any OLAP engine, dynamically match the optimal query engine and the physical table thereof through the query characteristics of users, and improve the query efficiency.

In step S52, the matching model with the highest scoring evaluation is selected as the optimal matching model.

In the embodiment of the disclosure, if a plurality of Cube models are acquired for the same query body, the Cube models are scored by using a preset evaluation rule, and the Cube model with the highest score is selected as the optimal matching model.

In one embodiment, two Cube are generated based on different physical tables, and the scoring results of the two Cube are as follows by combining the base information and the user query information of the Cube according to different rules: taking the data source rule as a judgment basis, and enabling Cube a to be more than Cube b; taking the dimension hit rule as a judgment basis, and Cube a < Cube b; taking the index hit rule as a judgment basis, wherein Cube a=cube b; taking the full matching rule as a judgment basis, and Cube a < Cube b; and taking the hierarchical rule of the model as a judgment basis, wherein Cube a > Cube b. The Cube a is higher in rule scoring of the layer level of the model because the layer level of the fact-a is DWS and is higher than the layer level of the fact-b in Cube b. Since different evaluation rules have different scoring weights, e.g., the scoring weight of a full-match rule is higher, the scoring weights of a dimension hit rule and an index hit rule are lower. Finally, summarizing scoring in different rules, and matching to obtain a Cube b as an optimal Cube model.

FIG. 6 is a flowchart illustrating a query operation performed based on a metadata management platform selecting a corresponding metadata table, according to an exemplary embodiment, as shown in FIG. 6, including the following steps.

In step S61, the optimal matching model is parsed, and a metadata table constituting the optimal matching model is acquired.

In the embodiment of the disclosure, after the optimal matching model is obtained, the optimal matching model is analyzed to obtain a metadata table forming the optimal matching model.

In step S62, a data table constituting a metadata table is acquired based on the metadata management platform.

In the embodiment of the disclosure, a metadata table forming an optimal matching model is obtained, related information of the metadata table is queried by using a metadata management platform, and a data table forming the metadata table is obtained.

In step S63, in response to the acquisition of the data table, a query statement is generated based on the attribute field of the data table, and a query operation is performed.

In the embodiment of the disclosure, a data table forming a metadata table is acquired, a query statement related to a query body is generated by using an attribute field of the data table, and a query operation is performed.

In one embodiment, the optimal Cube model is finally matched by scoring the Cube. And defining SQL condition keywords such as select, where, limit according to the user query request, and defining the used table fields or calculation expressions based on the table fields of the condition keywords according to the user query parameter conversion result and the Cube b model. Splicing the condition key word parts to form an integral SQL text, wherein the step of dynamically generating the SQL is shown in SQL example 1:

select

dt,

mdd_name,

app_name,

sum(payment_order_cnt)as payment_order_cnt

sum(create_order_cnt)as create_order_cnt

sum(gmv)as gmv,

from fact_d

where

dt between'20190401'and'20190410'

and app_name＝'ios'

group by dt,mdd_name,app_name

and inquiring the V table and the C table metadata and the access address of the table according to the table name and the engine information in the optimal Cube, and executing the dynamic generation SQL on the corresponding Hive engine.

FIG. 7 is a schematic diagram of a data query method according to an exemplary embodiment, and as shown in FIG. 7, the data query method includes the following steps: in step one, obtaining data tables respectively stored on different query engines, and obtaining metadata information includes: physical tables, fields, engines, metrics, and dimensions, mapping these metadata to corresponding metadata includes: index and physical table field mapping, dimension and physical table field mapping and table metadata and table field mapping, so that metadata information of the same table can be subjected to unified metadata management in one view, and a subsequent developer can conveniently inquire and operate to acquire data in a data table. In the second step, the user query information is acquired, the query index required by the user query task is defined, and rule input (query characteristics) is provided for the index/dimension conversion format related to the user query task, so that the metadata management platform can query the related metadata table according to the index. In the third step, metadata information of a metadata table related to the query task is acquired based on a metadata platform, including: and constructing a Cube model according to the metadata information, so that the Cube model is convenient to match with a subsequent user query task. . In the fourth step, based on the query information of the user and the created Cube model, the Cube model is scored by using a plurality of preset scoring rules, and finally, the scoring of different rules is integrated for summarization, so that the optimal Cube model is matched. In the fifth step, after the best Cube model is matched, a metadata table forming the best Cube model is obtained through a metadata query management platform, metadata information in the metadata table, such as physical table field information corresponding to indexes/dimensions, is obtained, and the field information is spliced to generate dynamic sql. The metadata query management platform is generated by summarizing the physical tables of different query engines, so that data query on the physical tables is facilitated. The attributes of the physical tables are combined into the Cube model and scored, the optimal Cube model is selected, and data are acquired from the physical tables stored by the designated query engine according to the information of the metadata query management platform, so that the data query efficiency is improved.

Based on an application example of the present disclosure, metadata is uniformly managed in a systematic manner in step one. Metadata information used in the data query service scene mainly comprises basic metadata object information and mapping relations among metadata objects. Wherein the base metadata information includes: index metadata including an index English name, an index Chinese name, and a calculation mode (count\sum\max\count_ distinct), wherein the index metadata is shown in Table 1; dimension metadata, including dimension attribute English name and dimension attribute Chinese name, wherein the dimension metadata is shown in Table 2; table field metadata including field name, field type, field annotation, step one table field metadata as shown in tables 3 and 4; the metadata of the table, including table name, engine type (e.g., hive, starRoccks), access address, step one table is shown in table 5. The mapping relation between the metadata comprises the following steps: index and physical table field mapping, dimension and physical table field mapping, and metadata and table field mapping of tables. In the first step, the mapping relation between the table metadata is shown in table 6 and table 7. The above metadata information of different types and the mapping relation between the metadata information are uniformly defined into data objects, and are persisted into a metadata management platform storage, and different metadata objects and the mapping relation thereof are managed and organized in a systematic way, and the system is uniformly called a metadata management platform.

In the second step, the user is inquired about the conversion of information, indexes and dimensions. A user can want a query system to submit query parameters according to actual query requirements, wherein the query parameters comprise filtering conditions, ordering modes, limit and the like, the query parameters use Chinese names of index dimensions, and the Chinese names are required to be converted into English names corresponding to query index dimensions through a metadata management platform. The user submits a query, step two user query examples are as follows: query dimension: date, destination name, application name; inquiring indexes, namely paying order quantity, creating order quantity and commodity transaction total; query conditions: date ranges from 4 months 1 day to 4 months 10 days, application name = ios. The metadata management platform converts the user request parameters, and converts the query parameters into English name sets of indexes and dimensions according to metadata information in the step-index metadata examples shown in table 1 and the step-dimension metadata examples shown in table 2. Step two, the user inquires the parameter conversion result example as follows: query dimension: dt, mdd_name, app_name; query indexes are parent_order_cnt, create_order_cnt, gmv.

In the third step, a unified Cube matching model is defined. The method comprises the steps of abstracting and defining a Cube model, wherein the Cube model is characterized in that a physical table field is converted into an index and a dimension associated with the physical table field as an available index and an available dimension through a metadata management system from the view angle of a physical table, metadata information of the table is lifted through the metadata management system, such as the level of the table and the engine type of the table, and the information is unified as rule matching input information and is injected into a defined unified Cube matching model. Cube model definition includes: table names, engine types, the hierarchy to which the table belongs, table availability indicators (indicator english name set) and table availability dimensions (dimension english name set). The step three Cube model data Cube a is generated based on the step one index metadata example shown in table 1, the step one dimension metadata example shown in table 2, the step one table field metadata example shown in table 3, the step one table metadata example shown in table 5 and the mapping relation example between the step two table metadata examples as follows: table name: face-a; engine type: starRocks; the table belongs to the hierarchy: DWS; the available indexes are as follows: the parent_order_cnt, the create_order_cnt, and gmv; available dimensions: dt, product_id, mdd_name. The step three fact-b table Cube model data Cube b is generated based on the step one index metadata example shown in table 1, the step one dimension metadata example shown in table 2, the step one table field metadata example shown in table 3, the step one table metadata example shown in table 5 and the step two table metadata mapping relation example as follows: a data source engine, hive; the table belongs to the hierarchy: DWD; the available index names are parent_order_cnt, create_order_cnt and gmv; dimension names dt, mdd_name, app_name, product_id may be used.

In the fourth step, the Cube is scored based on rules, and the optimal Cube is selected. And (3) carrying out weight scoring on information such as each Cube hit index, dimension conditions, a data source engine and the like according to rules, precipitating a general Cube scoring rule through actual production experience, and simultaneously, carrying out integral calculation on the final scoring condition of the Cube based on the scoring of each Cube rule and the weight of the current rule when each rule has corresponding weight. The general rule for Cube scoring is precipitated by practical production experience as follows: (1) And the data source rule is used for carrying out actual scoring on different engines, and the data source rule is based on the applicable scene and characteristics of the engines. Generally StarRocks plays a role in accelerating data query in an OLAP scene, and the query speed is generally superior to that of the Hive engine, so that the scoring strategies of the query efficiency of different engines, such as StarRocks > Hive, are summarized according to practical use experience. (2) And D, dimension data rules, namely, the number of the Cube available dimension sets and the user query dimension sets which are combined is the number which can be matched with the user query dimension. The greater the number of dimensional matches, the higher the score representing them. (3) The available dimensions of the Cube include dimensions of the user query, i.e., the set of dimensions of the user query is a subset of the Cube available dimension set. The Cube available index contains index of user query, i.e. index set of user query is subset of Cube available index set. And simultaneously, the two conditions are satisfied, namely the full matching is satisfied. (4) And (3) hit rules of indexes, namely the number of the index sets which are obtained by combining the Cube available index sets and the user query index sets, namely the number which can be matched with the user query index. The greater the number of index matches, the higher the score representing them. (5) Model level rules, in big data domain modeling, data processing links of data warehouse follow the hierarchical concept of industry, including operations data layer (Operational Data Store, ODS), detail data layer (Data Warehouse Detail, DWD), summary data layer (Data Warehouse Summary, DWS) and application data layer (Application Data Store, ADS). The high-level model is based on the abstraction aggregation of the low-level model, and the index of the same caliber can be provided by the high-level model and the low-level model at the same time, and the difference is that the high-level model has high query efficiency and small dimension quantity. The underlying model supports more dimensions but queries are inefficient. Where the high-level model may provide the user query dimension, the high-level model scores higher. The example of the Cube scoring process by the fourth rule is as follows, and each rule above performs scoring calculation on the three-step Cube model data Cube a example and the three-step Cube model data Cube b example based on the two-step user query parameter conversion result example 2. The scoring results of the two Cube by combining the base information of Cube and the query information of user by different rules are as follows: the method comprises the steps of data source rule Cube a > Cube b, dimension hit rule Cube a < Cube b, index hit rule Cube a=cube b, full match rule Cube a < Cube b, and model belonged level rule Cube a > Cube b (the table level of face-a in Cube a is DWS and higher than the table level of face-b in Cube b, so Cube a is scored higher). Finally, summarizing scoring in different rules, and matching to obtain a Cube b as an optimal Cube model.

In step five, SQL is dynamically generated. And finally matching the Cube scoring process example with Cube b according to the rule, defining SQL conditional keywords such as select, where, limit and the like according to the user query request example, defining used table fields or calculation expressions based on the table fields of the conditional keyword parts according to the user query parameter conversion result example and the Cube model data Cube b example, and splicing the conditional keyword parts to form an integral SQL text. An example of dynamically generating SQL is shown in SQL example 1. According to the table name in Cube b and the metadata of the first table and the access address of the table in the step one shown in the engine information lookup table 5, the dynamically generated SQL example is executed on the corresponding Hive engine.

The schematic diagram of data query based on the metadata management platform is shown in fig. 8, and a developer needs to manage the index, enter an index list option for index dimension management to edit the index, and edit basic information of the index: the service line option is based, the index root option is GMV, the home department option is App platform-transaction center, the Chinese name option is GMV, the English name option is m_basic_i_ GMV, the alias option is used for renaming the index, no special case can be left out, the topic domain option is order, the calculation mode is sum, the description option is based GMV, and the method comprises the following steps: travel (with benefit), hotel (with benefit &withoutreceiver), traffic all business (with change sign), insurance, besides, time period options can be edited. By modifying the basic information of the index, the requirement of a developer on the customization of the index can be met. In addition, the developer can also modify the calculation logic of the index, when the selection option of the library table is mfw _dwd.face_samples_ticket_poi_order_increment, the state is that the audit is passed, the field is gmv, and the calculation mode is sum. When the library table selection option is mfw _dwd.face_all_order_snap_view, the state is that the audit is passed, the field is gmv, and the calculation mode is sum. When the library table selection option is mfw _dwd.face_all_order_snap_tag_view, the state is that the audit is passed, the field is gmv, and the calculation mode is sum. By editing the calculation logic of the index, a developer can control the selection, the fields and the calculation mode of the library table of the index.

Based on the same conception, the embodiment of the disclosure also provides a data query device. It can be appreciated that, in order to implement the above-mentioned functions, the data query device provided in the embodiments of the present disclosure includes corresponding hardware structures and/or software modules that perform the respective functions. The disclosed embodiments may be implemented in hardware or a combination of hardware and computer software, in combination with the various example elements and algorithm steps disclosed in the embodiments of the disclosure. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Those skilled in the art may implement the described functionality using different approaches for each particular application, but such implementation is not to be considered as beyond the scope of the embodiments of the present disclosure.

Fig. 9 is a block diagram of a data querying device, according to an example embodiment. Referring to fig. 9, the apparatus 100 includes an acquisition unit 101 and a processing unit 102.

An obtaining unit 101, configured to obtain user query information, generate a matching model based on the user query information, and screen out an optimal matching model;

and the processing unit 102 is used for responding to the acquired optimal matching model and selecting a corresponding metadata table to execute query operation based on the metadata management platform.

In the embodiment of the present disclosure, the obtaining unit 101 is configured to convert a format of user query information based on a preset conversion rule to obtain second user query information in response to obtaining the user query information, where the preset conversion rule includes converting a chinese name of the user query information into an english name; selecting a corresponding metadata table from the metadata management platform based on the acquired second user query information, and generating at least one matching model according to the attribute of the metadata table; the matching model includes at least one of the following attributes: table name, engine type, the hierarchy to which the table belongs, table availability index, and table availability dimension.

In the disclosed embodiment, the processing unit 102 is configured to obtain data tables, where the data tables are stored on different query engines, respectively; metadata information of the data table is acquired, metadata mapping is established, the metadata table is generated, and the metadata information comprises: index metadata, dimension metadata, table field metadata, and table metadata; the metadata map includes: index and physical table field mapping, dimension and physical table field mapping, and metadata and table field mapping of tables; based on the metadata table, a metadata management platform is established.

In the embodiment of the disclosure, the processing unit 102 is configured to score the matching model based on a preset evaluation rule, where the preset evaluation rule includes: data source rules, dimension data rules, full-matching rules, index hit rules, and model level rules; and screening out the matching model with the highest scoring evaluation as the optimal matching model.

In the embodiment of the disclosure, the processing unit 102 is configured to parse the optimal matching model, and acquire a metadata table forming the optimal matching model; acquiring a data table forming a metadata table based on the metadata management platform; in response to obtaining the data table, a query statement is generated based on the attribute field of the data table, and a query operation is performed.

The specific manner in which the various modules perform the operations in the apparatus of the above embodiments have been described in detail in connection with the embodiments of the method, and will not be described in detail herein.

It should be noted that, although the terms "first", "second", etc. are used herein to describe various modules, steps, data, etc. of the embodiments of the present invention, the terms "first", "second", etc. are merely used to distinguish between the various modules, steps, data, etc. and do not denote a particular order or importance. Indeed, the expressions "first", "second", etc. may be used entirely interchangeably.

As shown in fig. 10, one embodiment of the present invention provides an electronic device 300. The electronic device 300 comprises, among other things, a memory 301, a processor 302, and an Input/Output (I/O) interface 303. Wherein the memory 301 is used for storing instructions. A processor 302, configured to invoke the instructions stored in the memory 301 to execute the data query method according to the embodiment of the present invention. Wherein the processor 302 is coupled to the memory 301, the I/O interface 303, respectively, such as via a bus system and/or other form of connection mechanism (not shown). The memory 301 may be used to store programs and data, including programs of the data query method related to the embodiment of the present invention, and the processor 302 performs various functional applications of the electronic device 300 and data processing by running the programs stored in the memory 301.

The processor 302 in embodiments of the present invention may be implemented in at least one of a digital signal processor (DIGITAL SIGNAL Processing, DSP), field-Programmable gate array (Field-Programmable GATE ARRAY, FPGA), programmable logic array (Programmable Logic Array, PLA), and the processor 302 may be one or a combination of several of a central Processing unit (Central Processing Unit, CPU) or other forms of Processing units having data Processing and/or instruction execution capabilities.

The memory 301 in embodiments of the present invention may comprise one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (Random Access Memory, RAM) and/or cache memory (cache), and the like. The nonvolatile Memory may include, for example, read-Only Memory (ROM), flash Memory (Flash Memory), hard disk (HARD DISK DRIVE, HDD), solid state disk (Solid-state disk-STATE DRIVE, SSD), or the like.

In the embodiment of the present invention, the I/O interface 303 may be used to receive input instructions (e.g., numeric or character information, and generate key signal inputs related to user settings and function control of the electronic device 300, etc.), and may also output various information (e.g., images or sounds, etc.) to the outside. The I/O interface 303 may include one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a mouse, joystick, trackball, microphone, speaker, touch panel, etc., in embodiments of the invention.

It will be appreciated that although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous.

The methods and apparatus of embodiments of the present invention may be implemented using standard programming techniques with various method steps being performed using rule-based logic or other logic. It should also be noted that the words "apparatus" and "module" as used herein and in the claims are intended to include implementations using one or more lines of software code and/or hardware implementations and/or equipment for receiving inputs.

Any of the steps, operations, or procedures described herein may be performed or implemented using one or more hardware or software modules alone or in combination with other devices. In one embodiment, the software modules are implemented using a computer program product comprising a computer readable medium containing computer program code capable of being executed by a computer processor for performing any or all of the described steps, operations, or programs.

The foregoing description of the implementations of the invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from practice of the invention. The embodiments were chosen and described in order to explain the principles of the invention and its practical application to enable one skilled in the art to utilize the invention in various embodiments and with various modifications as are suited to the particular use contemplated.

Claims

1. A method of querying data, comprising:

Acquiring user query information, generating a matching model based on the user query information, and screening out an optimal matching model;

And responding to the acquired optimal matching model, and selecting a corresponding metadata table based on a metadata management platform to execute query operation.

2. The method of claim 1, wherein generating a matching model based on the user query information comprises:

Converting the format of the user query information based on a preset conversion rule to obtain second user query information, wherein the preset conversion rule comprises converting the Chinese name of the user query information into an English name;

Selecting a corresponding metadata table from the metadata management platform based on the acquired second user query information, and generating at least one matching model according to the attribute of the metadata table; the matching model includes at least one of the following attributes: table name, engine type, the hierarchy to which the table belongs, table availability index, and table availability dimension.

3. The method according to any of claims 1 to 2, wherein the metadata management platform is obtained by:

acquiring data tables, wherein the data tables are respectively stored on different query engines;

Acquiring metadata information of the data table, establishing metadata mapping, and generating the metadata table, wherein the metadata information comprises: index metadata, dimension metadata, table field metadata, and table metadata; the metadata map includes: index and physical table field mapping, dimension and physical table field mapping, and metadata and table field mapping of tables;

and establishing a metadata management platform based on the metadata table.

4. The method according to any one of claims 1 to 2, wherein said screening out the best matching model comprises:

In response to obtaining the matching model, scoring the matching model based on preset evaluation rules, the preset evaluation rules comprising: data source rules, dimension data rules, full-matching rules, index hit rules, and model level rules;

And screening out the matching model with the highest scoring evaluation as the optimal matching model.

5. The method according to any one of claims 1 to 4, wherein selecting a corresponding metadata table based on the metadata management platform to perform a query operation comprises:

Analyzing the optimal matching model to obtain a metadata table forming the optimal matching model;

acquiring a data table of the component metadata table based on the metadata management platform;

and in response to the data table being acquired, generating a query statement based on the attribute field of the data table, and executing a query operation.

6. A data query device, comprising:

The acquisition unit is used for acquiring user query information, generating a matching model based on the user query information and screening out an optimal matching model;

and the processing unit is used for responding to the acquired optimal matching model, selecting a corresponding metadata table based on the metadata management platform and executing query operation.

7. The apparatus of claim 6, wherein the obtaining unit generates the matching model based on the user query information by:

8. The apparatus according to any of the claims 6 to 7, wherein the processing unit obtains the metadata management platform by:

and establishing a metadata management platform based on the metadata table.

9. The apparatus according to any one of claims 6 to 7, wherein the processing unit screens out the best matching model by:

10. The apparatus according to any one of claims 6 to 9, wherein the processing unit performs the query operation by selecting the corresponding metadata table based on the metadata management platform in the following manner:

11. An electronic device, wherein the electronic device comprises:

a memory for storing instructions; and

A processor for invoking instructions stored in the memory to perform a data query method as claimed in any one of claims 1 to 5.

12. A computer readable storage medium having stored therein instructions which, when executed by a processor, perform the data query method of any of claims 1 to 5.