CN115329011A - Data model construction method, data query method, data model construction device and data query device, and storage medium - Google Patents

Data model construction method, data query method, data model construction device and data query device, and storage medium Download PDF

Info

Publication number
CN115329011A
CN115329011A CN202110504969.2A CN202110504969A CN115329011A CN 115329011 A CN115329011 A CN 115329011A CN 202110504969 A CN202110504969 A CN 202110504969A CN 115329011 A CN115329011 A CN 115329011A
Authority
CN
China
Prior art keywords
dimension
query
data
dimensions
index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110504969.2A
Other languages
Chinese (zh)
Inventor
王锐
杨阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd filed Critical Beijing Sankuai Online Technology Co Ltd
Priority to CN202110504969.2A priority Critical patent/CN115329011A/en
Publication of CN115329011A publication Critical patent/CN115329011A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/283Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying

Abstract

The application discloses a data model construction method, a data query method, a data model construction device and a data query device, and relates to the technical field of data processing. The method comprises the following steps: acquiring at least two dimensions of field information and a dimension table in a dimension model, wherein the field information indicates that a mapping relation between the at least two dimensions exists; constructing a full-scale dimension relation graph between at least two dimensions based on the field information, wherein the field information also indicates the association relation between each dimension of the at least two dimensions and the fact table; and mounting the fact table to at least one dimension in the full-scale dimension relation graph based on the field information, and generating a directed acyclic connected graph taking the fact table as a center as a data model for determining a path from the query index to the fact table based on the query index and the dimension thereof. The method can construct the data model on the basis of the existing construction results of the data warehouse, so that the path of the query index in the data warehouse is determined through the data model.

Description

Data model construction method, data query device and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method for constructing a data model, a method and an apparatus for querying data, and a storage medium.
Background
The data warehouse is a strategic set that provides all types of data support for all levels of decision making processes of an enterprise.
In the face of business requirements such as business operation, business analysis and data products, a data user needs to extract required data from a data warehouse. Generally, the knowledge of the data warehouse known by the data user is not sufficient to support the data user to directly extract the required data from the data warehouse, so that the data user can provide a lot of scattered data reading requirements for the data developer, and the data developer provides the required data for the data developer. Furthermore, in order to facilitate the data user to extract the required data from the data warehouse, the data developer additionally constructs a subject broad table on the basis of the existing data table according to the business requirements.
However, the additionally constructed theme broad table cannot reuse the existing construction results of the data warehouse, and resource waste is caused.
Disclosure of Invention
The embodiment of the application provides a data model construction method, a data query device and a storage medium, a directed acyclic connected graph can be constructed on the basis of the existing construction results of a data warehouse to serve as a data model, the path of a query index in the data warehouse is determined through the data model, and the existing construction results of the data warehouse are fully utilized. The technical scheme is as follows:
according to an aspect of the present application, there is provided a method of constructing a data model, the method including:
obtaining model information of a dimension model, wherein the dimension model is constructed based on a dimension table and a fact table, the model information comprises field information and at least two dimensions of the dimension table, and the field information indicates a mapping relation between the at least two dimensions;
constructing a full dimension relationship graph between the at least two dimensions based on the field information, wherein the field information further indicates an association relationship between each dimension of the at least two dimensions and the fact table;
and mounting the fact table to at least one dimension in the full-scale dimension relation graph based on the field information, and generating a directed acyclic connected graph taking the fact table as a center as a data model, wherein the data model is used for determining a path from the query index to the fact table based on the query index and the dimension of the query index.
According to another aspect of the present application, there is provided a method for querying data, which is applied to a data system including a data model constructed by the method according to the above aspect, the method including:
receiving a data query request, wherein the data query request comprises a query index;
calling the data model to calculate a path to a fact table corresponding to the query index based on the query index and the dimensionality of the query index;
generating a query statement of the query indicator based on the path;
and executing the query statement to acquire the value of the query index from the data system.
According to another aspect of the present application, there is provided an apparatus for constructing a data model, the apparatus including:
the system comprises a first obtaining module, a second obtaining module and a third obtaining module, wherein the first obtaining module is used for obtaining model information of a dimension model, the dimension model is built based on a dimension table and a fact table, the model information comprises field information and at least two dimensions of the dimension table, and the field information indicates a mapping relation between the at least two dimensions;
a construction module, configured to construct a full-scale dimension relationship graph between the at least two dimensions based on the field information, where the field information further indicates an association relationship between each of the at least two dimensions and the fact table;
the first generation module is used for mounting the fact table to at least one dimension in the full-scale dimension relation graph based on the field information, generating a directed acyclic connected graph with the fact table as the center as a data model, and the data model is used for determining a path from the query index to the fact table based on the query index and the dimension of the query index.
According to another aspect of the present application, there is provided an apparatus for data query, applying the data model constructed by the method according to the above aspect, the apparatus comprising:
the receiving module is used for receiving a data query request, and the data query request comprises query indexes;
the calculation module is used for calling the data model to calculate a path reaching a fact table corresponding to the query index based on the query index and the dimensionality of the query index;
a second generation module, configured to generate a query statement of the query indicator based on the path;
and the second acquisition module is used for executing the query statement and acquiring the value of the query index from the data system.
According to another aspect of the present application, there is provided a server including:
a memory, a processor coupled to the memory;
a processor configured to load and execute executable instructions stored in the memory to implement the method of constructing a data model according to the above aspect and its alternative embodiments, or the method of querying data according to the above another aspect and its alternative embodiments.
According to another aspect of the present application, there is provided a computer-readable storage medium having at least one instruction, at least one program, a set of codes, or a set of instructions stored therein, which is loaded and executed by a processor to implement the method for constructing a data model according to the above one aspect and its optional embodiments, or the method for querying data according to the above another aspect and its optional embodiments.
According to another aspect of the present application, a computer program product is provided that includes computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the method for constructing the data model according to the above aspect and the optional embodiments thereof, or the method for querying the data according to the above another aspect and the optional embodiments thereof.
The beneficial effects that technical scheme that this application embodiment brought include at least:
the method comprises the steps of obtaining model information of a dimension model from a data system, constructing a directed acyclic connected graph corresponding to a dimension table and a fact table based on the model information to connect the dimension and the dimension, and the dimension and the fact table, using the directed acyclic connected graph as a data model, determining a path from a query index to the fact table through the data model when index query is carried out, further extracting a value of the query index from a data warehouse according to the path, fully utilizing existing construction results of the data warehouse, providing a more convenient and faster index data extraction environment for a user, and avoiding the problem of resource waste caused by the fact that an additionally constructed subject wide table cannot reuse the existing construction results of the data warehouse; secondly, the method can also realize cross-topic data query, thereby avoiding the problem that an additionally constructed topic broad table can only realize data query aiming at a specified topic; furthermore, the method releases a large amount of human resources which are originally spent on the establishment of the subject wide table because the subject wide table does not need to be established.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
FIG. 1 is a block diagram of a data system provided in an exemplary embodiment of the present application;
FIG. 2 is a flow chart of a method of constructing a data model provided in an exemplary embodiment of the present application;
FIG. 3 is a schematic diagram of a data model provided by an exemplary embodiment of the present application;
FIG. 4 is a schematic illustration of a process for building a data model provided by an exemplary embodiment of the present application;
FIG. 5 is a flow chart of a method of data querying provided by an exemplary embodiment of the present application;
FIG. 6 is a diagram illustrating a process for data querying provided by an exemplary embodiment of the present application;
FIG. 7 is a block diagram of an apparatus for constructing a data model provided in an exemplary embodiment of the present application;
FIG. 8 is a block diagram of an apparatus for data querying as provided by an exemplary embodiment of the present application;
fig. 9 is a schematic structural diagram of a server according to an exemplary embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, the following detailed description of the embodiments of the present application will be made with reference to the accompanying drawings.
The explanations for words involved in this application are as follows:
metadata, also called intermediate data and relay data, is data describing data, mainly describing data attribute (property) information, used to support functions such as indicating storage location, history data, data search, file record, etc. metadata is an electronic catalog, and in order to achieve the purpose of compiling the catalog, the content or features of data must be described and collected, so as to achieve the purpose of assisting data retrieval.
The index is a unit and a method for measuring the development degree of a transaction, is usually obtained through aggregation statistics such as addition and averaging, and is an index under certain conditions, such as favorable rating, increment and the like.
Dimension is some characteristic of the phenomenon of the transaction, for example, gender, region, time are dimensions.
Dimension tables, also known as dimension tables, contain properties of fact records in a fact data table, some properties providing descriptive information, some properties specifying how to aggregate the fact data table data to provide useful information to an analyst, and the dimension tables contain a hierarchy of properties that help aggregate the data.
The fact tables, i.e., the fact data tables, are mainly characterized by containing numerical data (facts) and the numerical information is summarized to provide data on units as history, each fact data table contains a multipart index containing the primary key of the relevance table as a foreign key, and the dimension table contains the characteristics of the fact record.
A wide table refers to a data table in which indexes, dimensions, and attributes are associated together.
An atomic index refers to a set of concepts that express atomic quantization attributes of business entities and are not separable, such as an order volume, a user volume, a visit volume (PV), a single visitor (UV), and the like.
The calculation index is a calculation index set which is established on the atomic index and formed through a certain operation rule, such as average user transaction amount, asset liability ratio and the like.
The derived index is an index generated by combining an atomic index or a calculation index with a dimension member, a statistical attribute, a management attribute and the like, such as a completion value, a plan value, a cumulative value, a homography, a ring ratio, an occupation ratio and the like of a transaction amount.
Fig. 1 shows a schematic structural diagram of a data system 100 according to an embodiment of the present application, where the data system 100 includes an application layer 101, a service layer 102, middleware 103, and a storage layer 104.
The application layer 101 provides five functions of data extraction requirement management, task execution, data security control, operation and maintenance management and application access; the data extraction requirement management is management of data extraction requirements, the task execution is execution management of a task set in the application layer 101, the data security control is management of data security in the data system 100, the operation and maintenance management is management of maintaining operation of the data system 100, and the application access is management of accessing the application program to the data system 100.
The service layer 102 includes three modules, a metadata processing 1021, a data query processing 1022, and an assistance module 1023. The metadata processing 1021 comprises three functions of metadata synchronization, metadata construction and metadata storage, and the metadata processing 1021 can realize the construction method of the data model provided in the embodiment of the application, wherein the metadata synchronization comprises synchronization of a physical table, indexes, dimensions, mapping relations and resource confidentiality levels; the metadata construction comprises two aspects of an extension index and an extension dimension, wherein the extension index comprises the extension of an atom index, a derivative index and a calculation index, and the extension dimension comprises the extension of a broad-form model, a star model and a snowflake model; the metadata store includes storage of a dimension path graph (including a directed acyclic connectivity graph) and a logical width table.
Data query processing 1022 includes protocol parsing, logic construction, model optimization, and physical construction. The protocol analysis provides functions of statistical indexes, screening conditions, summarizing dimensions and query attributes, and provides indexes and dimensions for logic construction; the logic construction provides functions of available index calculation, available dimension calculation, wide table model screening and user permission marking, so that a candidate logic wide table is constructed, and a candidate logic wide table is provided for model optimization; the model optimization comprises two aspects of wide table screening and dimension table screening, wherein wide table screening conditions comprise less dimension fields, high hierarchy level and less fact tables, dimension table screening conditions comprise less associated fact table hierarchy and high associated coverage, and a logic wide table meeting the screening conditions is screened from a candidate logic wide table; the physical construction is based on a logic wide table to realize the mapping from the wide table to a fact table, from indexes to fields, from dimensionality to a dimension table, from dimensionality to fields and from the fact table to the dimension table, and the merging of the fact tables, and finally generates an inquiry statement; and extracting the value of the query index from the data warehouse by using the query statement.
The assistance module 1023 provides assistance services for task management, operation and maintenance management, rights management, task scheduling, and query throttling to assist in the processing of metadata and data queries.
The middleware 103 includes an offline query engine, a task scheduling system, and an authority management and control service, where the task scheduling system schedules tasks, such as scheduling data query tasks, the authority management and control service determines whether the data query authority is provided, and the offline query engine executes the query tasks scheduled by the task scheduling system to implement data query.
The storage layer 104 comprises a data warehouse tool Hive, a relational database management system mySQL and a cloud service; hive is used for data extraction, transformation and loading, and is a mechanism for storing, querying and analyzing stored large-scale data; mySQL is used for storing data in different tables and supporting quick query of the data; the cloud service is used for storing data and a cloud service end, and supporting data storage and query.
For data governance of data system 100, data governance application support 200 may be employed, data governance application support 200 comprising model management, authority management, process control and interface services for data system 100, wherein model management further comprises logical models, physical models and relational mapping. Data governance application support 200 may implement governance over metadata in data system 100, such as updating metadata in data system 100.
Illustratively, the data system 100 is carried by a server cluster, which may include at least one of a server, a plurality of servers, a cloud computing platform, and a virtualization center. Those skilled in the art will appreciate that the number of the server clusters may be only one, or several tens, several hundreds, or more, and the number and type of the server clusters are not limited in the embodiment of the present application.
Fig. 2 is a flowchart illustrating a method for building a data model, which is applied to a server of the data system shown in fig. 1, according to an exemplary embodiment of the present application, and the method includes:
step 301, obtaining model information of the dimension model, where the model information includes at least two dimensions of the field information and the dimension table.
The server acquires model information of the dimension model from a data system, the dimension model is constructed based on a dimension table and a fact table, and the field information indicates a mapping relation between at least two dimensions and also indicates an association relation between each of the at least two dimensions and the fact table. Illustratively, the field information includes at least one of field information of the dimension table and field information of the fact table.
Illustratively, the model information includes n field information, and the n field information includes at least one field information for uniquely identifying a dimension; for example, in 5 field information, the field information 1 uniquely identifies the dimension 1, and the four field information, i.e., the field information 2, the field information 3, the field information 4, and the field information 5, are all used for uniquely identifying the dimension 2.
Illustratively, the server starts to execute step 301 to construct data at the time of data system initialization; alternatively, the server begins performing step 301 to reconstruct the data model in response to metadata updates in the data system.
Step 302, constructing a full-scale dimension relation graph between at least two dimensions based on the field information.
And the server constructs a full-scale dimension relational graph between at least two dimensions according to the mapping relation indicated by the field information, wherein the dimensions in the full-scale dimension relational graph are connected in a directed mode.
The dimensions in the full-scale dimension relation graph are divided into a dimension primary key and a dimension member. Optionally, each dimension primary key uniquely identifies one main body, and dimension members having a direct or indirect mapping relationship with the dimension primary key are used for describing the main bodies of the dimension primary key identification together; for example, the dimension key "merchant third-level city" uniquely identifies that the subject is "merchant city", and the dimension members of the dimension key include "merchant first-level city" and "merchant second-level city", which are both dimension members extending around the subject "merchant city".
The dimension primary key is relative to the dimension members, and the dimension members of one main body can also be used as the dimension primary keys of other main bodies; for example, the dimension key "daily merchant" is used as a dimension key under another subject, and directional connection exists between the dimension member "merchant cell" and two dimension members, namely "merchant third-level city" and "merchant cell type".
Illustratively, the directed connections in the full-scale dimension relationship graph include directed connections pointed to dimension members by the dimension primary key; the directional connections may also include directional connections where one dimension member points to another dimension member.
Optionally, for the construction of the full-scale dimension relation graph, the server determines a dimension primary key from at least two dimensions; determining dimension members of the dimension primary keys based on the field information and at least two dimensions, and constructing a mapping relation between the dimension primary keys and the dimension members to form the full-scale dimension relation graph; wherein, the at least two dimensions with mapping relation are the dimensions described for the same subject.
Optionally, the dimension primary key and the dimension member can be in a direct mapping relationship or an indirect mapping relationship. Illustratively, the dimension members comprise a first dimension member and a second dimension member, and for the construction of the mapping relationship between the dimensions, the server constructs a first mapping relationship between the primary dimension key and the first dimension member and a second mapping relationship between the first dimension member and the second dimension member, wherein the first mapping relationship and the second mapping relationship together describe an indirect mapping relationship between the second dimension member and the primary dimension key.
Optionally, for the extension of the dimensions, the server traverses the dimension model, and performs dimension extension on each of at least two dimensions to obtain an extended dimension; dimension members are determined from the at least two dimensions and the extension dimension based on the field information. Illustratively, the server may traverse the dimension model, performing dimension expansion across the dimension tables based on the incidence relation between the dimensions.
Alternatively, for the determination of the dimension primary key, the server may determine a dimension of the fact table associated with the dimension table in at least two dimensions as the dimension primary key. Illustratively, a fact table is associated with at least one dimension table, which is associated with the fact table by a dimension that the server determines as a primary key to the dimension in building the data model.
Step 303, based on the field information, mounting the fact table to at least one dimension in the full-scale dimension relation graph, and generating a directed acyclic connected graph with the fact table as the center as a data model, wherein the data model is used for determining a path from the query index to the fact table based on the query index and the dimension of the query index.
And the server mounts the fact table to at least one dimension in the full-scale dimension relation graph based on the mapping relation between the fact table and the dimension indicated by the field information, and generates a directed acyclic connected graph taking the fact table as the center, so as to obtain the data model.
Optionally, the server mounts the fact table onto at least one primary dimension key in the full-scale dimension relational graph based on the field information, and generates a directed acyclic connected graph with the fact as the center as a data model.
Illustratively, as shown in fig. 3, the upper diagram is a schematic structural diagram of a constructed full-scale dimension relationship diagram, in which a dimension primary key and a dimension member are included; the lower diagram is a structural schematic diagram of a constructed directed acyclic connected graph, and a fact table 11 is mounted on four-dimensional primary keys of 'Japanese merchant', 'merchant cell', 'day' and 'user third-level city'.
Optionally, after the fact table is mounted to at least one dimension in the full-scale dimension relational graph based on the field information, the server deletes the dimension in the directed acyclic connected graph which is not in mapping relation with the fact table, and generates the directed acyclic connected graph with the fact table as the center as the data model. For example, as in fig. 3, the dimension primary key and the dimension member circled by the dashed box in the upper diagram have no mapping relation with the fact table, and thus can be deleted, and the lower diagram shows the directed acyclic connected graph with the dimension circled by the dashed box deleted.
For example, after the server generates the directed acyclic connectivity graph, at least one of an index supported by the data model, a calculation method of the index, at least two dimensions, and a mapping method between the dimensions may be stored in the memory in the form of a wide table with the data model, and finally, the wide table corresponding to the fact table one to one is formed. Exemplarily, as shown in fig. 4, the formation of the wide table is schematically illustrated, metadata stored in a structured manner exists in the data system, as shown in a metadata structured store 21, the server synchronizes the metadata structured store into a snowflake model store 22, and maps the field f1 to the dimension to obtain the dimension "date" and "merchant"; and mapping the field f1 to the index to obtain the index of 'order number' and 'transaction amount'. After the mapping from the field to the dimension is finished, mapping from the dimension to the dimension table is also carried out, and dimension tables d1 and d2 are obtained; and so on, continuing the mapping of the field to the dimension, the field to the index and the dimension to the dimension table. And after the mapping is completed, metadata construction is performed, the dimension expansion and the index expansion are performed and then spliced to generate a wide table 23, and the logic wide table 23 comprises the mapping relation between the dimensions and a fact table.
In the construction process of the data model, the server traverses the dimension model to perform index expansion. Illustratively, after traversing the dimensional model, the server performs index expansion based on at least one of the atomic index and the calculation index to generate an expanded index. The extended index may include at least one of a derived index and a calculated index. That is, a calculation index may be formed by a certain operation rule in the presence of a part of atomic indexes, and/or a derivative index may be formed by combining a part of atomic indexes or a calculation index with a dimension member, a statistical attribute, a management attribute, and the like, so that after the server traverses the dimension model, the calculation index and the derivative index are expanded, and the expanded index is associated with the dimension corresponding to the expanded index. The expansion index may be used to generate the wide table.
In summary, in the method for constructing a data model provided in this embodiment, model information of a dimension model is obtained from a data system, a directed acyclic connected graph corresponding to a dimension table and a fact table is constructed based on the model information to connect the dimension and the dimension, and the dimension and the fact table, the directed acyclic connected graph is used as the data model, when index query is performed, a path from a query index to the fact table can be determined through the data model, and then a value of the query index is extracted from a data warehouse according to the path, existing construction results of the data warehouse are fully utilized, a more convenient and faster index data extraction environment is provided for a user, and the problem of resource waste caused by the fact that an additionally constructed subject wide table cannot reuse the existing construction results of data is avoided; secondly, the method can also realize cross-topic data query, thereby avoiding the problem that an additionally constructed topic broad table can only realize data query aiming at a specified topic; furthermore, the method releases a large amount of human resources which are originally spent on the establishment of the subject wide table because the subject wide table does not need to be established.
Fig. 5 is a flowchart illustrating a method for querying data according to an exemplary embodiment of the present application, where the method is applied to a server in a data system shown in fig. 1, where the data system includes a data model constructed according to the method provided in the embodiment shown in fig. 2, and the method includes:
step 401, a data query request is received, where the data query request includes a query indicator.
The server receives a data query request sent by the terminal, the data query request carries query indexes, the server extracts the query indexes from the data query request, and the dimensionality of the query indexes is determined; or the data query request carries the query index and the dimensionality of the query index, and the server extracts the query index and the dimensionality of the query index from the data query request.
Step 402, based on the query index and the dimension of the query index, calling a data model to calculate a path to a fact table corresponding to the query index.
Illustratively, after analyzing the data query request by the server to obtain the query index and the dimensionality of the query index, screening a wide table combination which stores the query index and the dimensionality of the query index together; the calling data model calculates the path with the least dimension associated with the arrival fact table based on the wide table combination.
Optionally, for the screening of the wide table combination, the server may screen at least two candidate wide table combinations based on the query index and the dimension of the query index; and determining a wide table combination meeting optimization conditions from the at least two candidate wide table combinations, wherein the optimization conditions are used for screening the wide table combination with the performance of quickly realizing index query.
Optionally, the optimization condition includes at least one of:
the number of rows occupied by the information of the wide table combination is the least, or the number of rows occupied by the information of the wide table is less than the row number threshold;
the use frequency of the wide table is the highest, or the use frequency of the wide table combination is greater than the use frequency threshold value; illustratively, in index query, the data system records the frequency of use of broad table combinations;
the dimension table combination corresponding to the wide table combination is least associated with the fact table.
Step 403, generating a query statement of the query index based on the path.
For example, as shown in fig. 6, after obtaining the query index, the server performs logic construction to obtain a wide table combination 1 and a wide table combination 2; then, optimizing a logic model, namely screening the wide table combination by adopting an optimization condition to determine a wide table combination 1 with less line number occupied by the information of the wide table combination and high use frequency; then, physical model optimization is carried out, namely, a path with the least correlation dimension (or correlation fact table) in the mapping paths corresponding to the wide table combination is calculated, namely, the path from the fact table to a merchant, to d2 and to the honeycomb type is obtained; and finally, carrying out physical construction to generate the query statement corresponding to the path. Illustratively, a wide table combination refers to a combination of at least two wide tables, for example, wide table 1 and wide table 2 are combined into wide table combination 12.
Illustratively, the Query statement may comprise a Structured Query Language (SQL) statement. Illustratively, in the path, left outer join is used between the fact table and the dimension table, and union all is used between the fact tables to generate the final SQL statement.
Step 404, executing the query statement, and obtaining the value of the query index from the data system.
Illustratively, the server executes an SQL statement to extract the value of the query indicator from the data system.
In summary, the data query method provided by this embodiment applies the data model constructed by the method provided by the embodiment shown in fig. 2, where the data model includes mapping relationships between dimensions and between the dimensions and the fact table, and when index query is performed, a path from a query index to the fact table can be determined through the data model, and then a value of the query index is extracted from the data warehouse according to the path, so that existing construction results of the data warehouse are fully utilized, a more convenient and faster index data extraction environment is provided for a user, and the problem of resource waste caused by the fact that an additionally constructed subject wide table cannot reuse the existing construction results of the data warehouse is avoided; secondly, the method can also realize cross-topic data query, thereby avoiding the problem that an additionally constructed topic broad table can only realize data query aiming at a specified topic; furthermore, the method releases a great deal of human resources which are originally spent on the theme wide table construction because the theme wide table does not need to be constructed.
Fig. 7 is a block diagram of an apparatus for constructing a data model, which is applied to a server and implemented by software, hardware or a combination of the two, according to an exemplary embodiment of the present application, and includes:
a first obtaining module 512, configured to obtain model information of a dimension model, where the dimension model is constructed based on a dimension table and a fact table, the model information includes field information and at least two dimensions of the dimension table, and the field information indicates that a mapping relationship between the at least two dimensions exists;
a construction module 514, configured to construct a full dimension relationship graph between the at least two dimensions based on the field information, where the field information further indicates an association relationship between each of the at least two dimensions and the fact table;
and a first generating module 516, configured to mount the fact table to at least one dimension in the full-scale dimension relational graph based on the field information, and generate a directed acyclic connected graph centered on the fact table as a data model, where the data model is used to determine a path from the query indicator to the fact table based on the query indicator and the dimension of the query indicator.
In some embodiments, a build module 514 is configured to:
determining a dimension primary key from at least two dimensions;
determining dimension members of the dimension primary keys based on the field information and at least two dimensions, and constructing a mapping relation between the dimension primary keys and the dimension members to form a full-scale dimension relation graph;
wherein, the at least two dimensions with mapping relation are the dimensions described for the same subject.
In some embodiments, a build module 514 to:
performing dimension expansion on each dimension of at least two dimensions to obtain an expanded dimension;
dimension members are determined from the at least two dimensions and the extended dimension based on the field information.
In some embodiments, the dimension members include a first dimension member and a second dimension member; a build module 514 for:
and constructing a first mapping relation between the dimension primary key and the first dimension member and a second mapping relation between the first dimension member and the second dimension member, wherein the first mapping relation and the second mapping relation jointly describe an indirect mapping relation between the second dimension member and the dimension primary key.
In some embodiments, a build module 514 to:
and determining the dimension of the dimension table association fact table in at least two dimensions as a dimension primary key.
In some embodiments, the apparatus further comprises a storage module 518;
the storage module 518 is configured to store at least one of the index supported by the data model, a calculation method of the index, at least two dimensions, and a mapping method between the dimensions, and the data model in a form of a wide table into a memory.
In some embodiments, the first generating module 516 is configured to:
before generating a directed acyclic connected graph taking a fact table as a center as a data model, deleting dimensions which are not in mapping relation with the fact table in the directed acyclic connected graph.
In summary, in the data model building apparatus provided in this embodiment, model information of a dimension model is obtained from a data system, a directed acyclic connected graph corresponding to a dimension table and a fact table is built based on the model information to connect the dimension and the dimension, and the dimension and the fact table, the directed acyclic connected graph is used as a data model, when index query is performed, a path from a query index to the fact table may be determined through the data model, and then a value of the query index is extracted from a data warehouse according to the path, existing construction results of the data warehouse are fully utilized, a more convenient and faster index data extraction environment is provided for a user, and a problem of resource waste caused by that an additionally constructed theme wide table cannot be used to already construct data results is avoided; secondly, the device can also realize cross-topic data query, so that the problem that an additionally constructed topic broad table only can realize data query aiming at a specified topic is avoided; furthermore, the device releases a large amount of human resources which are originally spent on the theme wide table construction because the theme wide table is not required to be constructed.
Fig. 8 is a block diagram of an apparatus for data query provided in an exemplary embodiment of the present application, the apparatus being applied to a server, the apparatus being implemented as part of or all of the server through software, hardware or a combination of the two, the apparatus applying a data model constructed by the method of the embodiment shown in fig. 2, the apparatus including:
a receiving module 522, configured to receive a data query request, where the data query request includes a query indicator;
a calculating module 524, configured to invoke the data model to calculate a path to a fact table corresponding to the query indicator based on the query indicator and the dimension of the query indicator;
a second generating module 526, configured to generate a query statement of the query indicator based on the path;
the second obtaining module 528 is configured to execute the query statement to obtain the value of the query indicator from the data system.
In some embodiments, a calculation module 524 to:
screening a wide table combination which is stored for the query index and the dimensionality of the query index;
the calling data model calculates the path with the least dimension associated with the arrival fact table based on the wide table combination.
In some embodiments, a calculation module 524 to:
screening at least two candidate broad table combinations based on the query index and the dimensionality of the query index;
and determining the wide table combination meeting the optimization conditions from the at least two candidate wide table combinations, wherein the optimization conditions are used for screening the wide table combination with the performance of quickly and accurately realizing index query.
In some embodiments, the optimization conditions include at least one of:
the number of lines occupied by the information of the wide table combination is the minimum, or the number of lines occupied by the information of the wide table combination is smaller than a line number threshold value;
the use frequency of the wide table combination is the highest, or the use frequency of the wide table combination is greater than the use frequency threshold;
the dimension table combination corresponding to the wide table combination is least associated with the fact table.
In summary, the data query apparatus provided in this embodiment applies the data model constructed by the method provided in the embodiment shown in fig. 2, where the data model includes mapping relationships between dimensions and between the dimensions and the fact table, and when index query is performed, a path from a query index to the fact table may be determined through the data model, and then a value of the query index is extracted from the data warehouse according to the path, so that existing construction results of the data warehouse are fully utilized, a more convenient and faster index data extraction environment is provided for a user, and a problem of resource waste caused by that an additionally constructed subject wide table cannot reuse existing construction results of the data warehouse is avoided; secondly, the device can also realize data query of cross-topic, thereby avoiding the problem that an additionally constructed topic broad table can only realize data query aiming at a specified topic; furthermore, the device releases a large amount of human resources which are originally spent on the establishment of the theme broad table because the theme broad table is not required to be established.
Fig. 9 illustrates a schematic structural diagram of a server according to an embodiment of the present application. The server is used for implementing the method for constructing the data model or the method for querying the data provided in the above embodiments. Specifically, the method comprises the following steps:
the server 600 includes a CPU (Central Processing Unit) 601, a system Memory 604 including a RAM (Random Access Memory) 602 and a ROM (Read-Only Memory) 603, and a system bus 605 connecting the system Memory 604 and the Central Processing Unit 601. The server 600 also includes a basic I/O (Input/Output) system 606, which facilitates the transfer of information between devices within the computer, and a mass storage device 607, which stores an operating system 613, application programs 614, and other program modules 615.
The basic input/output system 606 includes a display 608 for displaying information and an input device 609 such as a mouse, keyboard, etc. for a user to input information. Wherein the display 608 and the input device 609 are connected to the central processing unit 601 via an input output controller 610 connected to the system bus 605. The basic input/output system 606 may also include an input/output controller 610 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, input/output controller 610 also provides output to a display screen, a printer, or other type of output device.
The mass storage device 607 is connected to the central processing unit 601 through a mass storage controller (not shown) connected to the system bus 605. The mass storage device 607 and its associated computer-readable media provide non-volatile storage for the server 600. That is, the mass storage device 607 may include a computer readable medium (not shown) such as a hard disk or CD-ROM (Compact disk Read-Only Memory) drive.
Without loss of generality, the computer-readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash Memory (Flash Memory) or other solid state Memory technology, CD-ROM, DVD (Digital Versatile disk), or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will appreciate that the computer storage media is not limited to the foregoing. The system memory 604 and mass storage device 607 described above may be collectively referred to as memory.
The server 600 may also operate as a remote computer connected to a network through a network, such as the internet, according to various embodiments of the present application. That is, the server 600 may be connected to the network 612 through the network interface unit 611 connected to the system bus 605, or may be connected to other types of networks or remote computer systems (not shown) using the network interface unit 611.
The above-mentioned serial numbers of the embodiments of the present application are merely for description, and do not represent the advantages and disadvantages of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is intended only to illustrate the alternative embodiments of the present application, and should not be construed as limiting the present application, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (15)

1. A method of constructing a data model, the method comprising:
obtaining model information of a dimension model, wherein the dimension model is constructed based on a dimension table and a fact table, the model information comprises field information and at least two dimensions of the dimension table, and the field information indicates a mapping relation between the at least two dimensions;
constructing a full dimension relationship graph between the at least two dimensions based on the field information, wherein the field information further indicates an association relationship between each dimension of the at least two dimensions and the fact table;
and mounting the fact table to at least one dimension in the full-scale dimension relation graph based on the field information, and generating a directed acyclic connected graph taking the fact table as a center as a data model, wherein the data model is used for determining a path from the query index to the fact table based on the query index and the dimension of the query index.
2. The method of claim 1, wherein the constructing a full-scale dimension relationship graph between the at least two dimensions based on the field information comprises:
determining a dimension primary key from the at least two dimensions;
determining dimension members of the dimension primary keys based on the field information and the at least two dimensions, and constructing a mapping relation between the dimension primary keys and the dimension members to form the full-scale dimension relation graph;
wherein the at least two dimensions with the mapping relation are dimensions described for the same subject.
3. The method of claim 2, wherein determining the dimension member of the dimension primary key based on the field information and the at least two dimensions comprises:
performing dimension expansion on each dimension of the at least two dimensions to obtain an expanded dimension;
determining the dimension members from the at least two dimensions and the extended dimension based on the field information.
4. The method of claim 2, wherein the dimension members comprise a first dimension member and a second dimension member;
the constructing the mapping relationship between the dimension primary key and the dimension member comprises:
and constructing a first mapping relation between the dimension primary key and the first dimension member and a second mapping relation between the first dimension member and the second dimension member, wherein the first mapping relation and the second mapping relation jointly describe an indirect mapping relation between the second dimension member and the dimension primary key.
5. The method of claim 2, wherein determining the primary key of a dimension from the at least two dimensions comprises:
and determining the dimension of the fact table associated with the dimension table in the at least two dimensions as the dimension primary key.
6. The method of any of claims 1 to 3, wherein after generating the directed acyclic connectivity graph centered around the fact table as a data model, the method comprises:
and storing at least one of indexes supported by the data model, index calculation modes, at least two dimensions and inter-dimension mapping modes and the data model into a memory in a form of a wide table.
7. The method of any of claims 1 to 3, wherein prior to generating the directed acyclic connectivity graph centered around the fact table as the data model, comprising:
and deleting the dimension which has no mapping relation with the fact table in the directed acyclic connected graph.
8. A method for data query, applied to a data system including a data model constructed by the method of claim 1, the method comprising:
receiving a data query request, wherein the data query request comprises a query index;
calling the data model to calculate a path to a fact table corresponding to the query index based on the query index and the dimensionality of the query index;
generating a query statement of the query indicator based on the path;
and executing the query statement to acquire the value of the query index from the data system.
9. The method of claim 8, wherein the invoking the data model to calculate a path to a fact table corresponding to the query indicator based on the query indicator and a dimension of the query indicator comprises:
screening a wide table combination which is stored for the query index and the dimensionality of the query index;
invoking the data model to calculate the path to the fact table with the least dimension based on the wide table combination.
10. The method of claim 9, wherein the screening broad table combinations stored in common for the query metrics and the dimensions of the query metrics comprises:
screening at least two candidate broad table combinations based on the query index and the dimensionality of the query index;
and determining the wide table combination meeting the optimization conditions from the at least two candidate wide table combinations, wherein the optimization conditions are used for screening the wide table combinations with the performance of quickly and accurately realizing index query.
11. The method of claim 10, wherein the optimization condition comprises at least one of:
the number of lines occupied by the information of the wide table combination is the least, or the number of lines occupied by the information of the wide table combination is less than a line number threshold value;
the use frequency of the wide table combination is the highest, or the use frequency of the wide table combination is greater than a use frequency threshold value;
and the dimension table combination corresponding to the wide table combination is least associated with the fact table.
12. An apparatus for constructing a data model, the apparatus comprising:
the system comprises a first obtaining module, a second obtaining module and a third obtaining module, wherein the first obtaining module is used for obtaining model information of a dimension model, the dimension model is built based on a dimension table and a fact table, the model information comprises field information and at least two dimensions of the dimension table, and the field information indicates a mapping relation between the at least two dimensions;
a construction module, configured to construct a full-scale dimension relationship graph between the at least two dimensions based on the field information, where the field information further indicates an association relationship between each of the at least two dimensions and the fact table;
the first generation module is used for mounting the fact table to at least one dimension in the full-scale dimension relation graph based on the field information, generating a directed acyclic connected graph with the fact table as the center as a data model, and the data model is used for determining a path from the query index to the fact table based on the query index and the dimension of the query index.
13. An apparatus for data query, wherein the data model is constructed by applying the method of claim 1, the apparatus comprising:
the receiving module is used for receiving a data query request, and the data query request comprises query indexes;
the calculation module is used for calling the data model to calculate a path reaching a fact table corresponding to the query index based on the query index and the dimensionality of the query index;
a second generation module, configured to generate a query statement of the query indicator based on the path;
and the second acquisition module is used for executing the query statement and acquiring the value of the query index from the data system.
14. A server, characterized in that the server comprises:
a memory, a processor coupled to the memory;
the processor is configured to load and execute the executable instructions stored in the memory to implement the method of constructing a data model according to any one of claims 1 to 7 or the method of querying data according to any one of claims 8 to 11.
15. A computer readable storage medium having stored therein at least one instruction, at least one program, set of codes, or set of instructions; the at least one instruction, the at least one program, the set of codes or the set of instructions being loaded and executed by a processor to implement a method of constructing a data model according to any one of claims 1 to 7 or a method of querying data according to any one of claims 8 to 11.
CN202110504969.2A 2021-05-10 2021-05-10 Data model construction method, data query method, data model construction device and data query device, and storage medium Pending CN115329011A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110504969.2A CN115329011A (en) 2021-05-10 2021-05-10 Data model construction method, data query method, data model construction device and data query device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110504969.2A CN115329011A (en) 2021-05-10 2021-05-10 Data model construction method, data query method, data model construction device and data query device, and storage medium

Publications (1)

Publication Number Publication Date
CN115329011A true CN115329011A (en) 2022-11-11

Family

ID=83912902

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110504969.2A Pending CN115329011A (en) 2021-05-10 2021-05-10 Data model construction method, data query method, data model construction device and data query device, and storage medium

Country Status (1)

Country Link
CN (1) CN115329011A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117573698A (en) * 2024-01-15 2024-02-20 广州思迈特软件有限公司 Data query method and storage medium
CN117709804A (en) * 2024-02-05 2024-03-15 杭州研趣信息技术有限公司 Index calculation method, device, equipment and medium based on block matrix

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117573698A (en) * 2024-01-15 2024-02-20 广州思迈特软件有限公司 Data query method and storage medium
CN117573698B (en) * 2024-01-15 2024-04-05 广州思迈特软件有限公司 Data query method and storage medium
CN117709804A (en) * 2024-02-05 2024-03-15 杭州研趣信息技术有限公司 Index calculation method, device, equipment and medium based on block matrix

Similar Documents

Publication Publication Date Title
US10725981B1 (en) Analyzing big data
US7886028B2 (en) Method and system for system migration
US9361320B1 (en) Modeling big data
US10394805B2 (en) Database management for mobile devices
US10956400B2 (en) Query processing using primary data versioning and secondary data
CN104205039A (en) Interest-driven business intelligence systems and methods of data analysis using interest-driven data pipelines
JP2016100005A (en) Reconcile method, processor and storage medium
US20230018975A1 (en) Monolith database to distributed database transformation
CN115329011A (en) Data model construction method, data query method, data model construction device and data query device, and storage medium
CN114036130A (en) Metadata analysis processing method and device
Gui et al. IFC-based partial data model retrieval for distributed collaborative design
CN114880405A (en) Data lake-based data processing method and system
CN114297173A (en) Knowledge graph construction method and system for large-scale mass data
CN115640300A (en) Big data management method, system, electronic equipment and storage medium
US20100161682A1 (en) Metadata model repository
CN114329096A (en) Method and system for processing native map database
US8832653B2 (en) Centralized, object-level change tracking
CN111708895B (en) Knowledge graph system construction method and device
US8548980B2 (en) Accelerating queries based on exact knowledge of specific rows satisfying local conditions
Arputhamary et al. A review on big data integration
Gonzalez-Aparicio et al. Evaluation of ACE properties of traditional SQL and NoSQL big data systems
Mishra et al. Challenges in big data application: a review
El Beggar et al. Towards an MDA-oriented UML profiles for data warehouses design and development
CN113568892A (en) Method and equipment for carrying out data query on data source based on memory calculation
Planting Developing a data repository for the Climate Adaptive City Enschede

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination