CN109388637B - Data warehouse information processing method, device, system and medium - Google Patents

Data warehouse information processing method, device, system and medium

Info

Publication number
CN109388637B
CN109388637B CN201811111998.7A CN201811111998A CN109388637B CN 109388637 B CN109388637 B CN 109388637B CN 201811111998 A CN201811111998 A CN 201811111998A CN 109388637 B CN109388637 B CN 109388637B
Authority
CN
China
Prior art keywords
historical
query statement
tables
query
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811111998.7A
Other languages
Chinese (zh)
Other versions
CN109388637A (en)
Inventor
范叶亮
卢周
钱勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Technology Holding Co Ltd
Original Assignee
JD Digital Technology Holdings Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by JD Digital Technology Holdings Co Ltd
Priority to CN201811111998.7A
Publication of CN109388637A
Application granted
Publication of CN109388637B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a data warehouse information processing method, where the data warehouse includes a plurality of history tables stored in association. The method includes: obtaining at least one historical query statement, the historical query statement being used to query relevant data of a plurality of the history tables stored in association; determining the plurality of history tables corresponding to the at least one historical query statement; and generating a target table based on a specific history table among the plurality of history tables, the target table including relevant data in the specific history table.

Description

Data warehouse information processing method, device, system and medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data warehouse information processing method, a data warehouse information processing apparatus, a data warehouse information processing system, and a computer-readable storage medium.
Background
As the internet enters the era of big data, data collection and storage have become important research directions. For example, data from various business scenarios is stored in a data warehouse, and relevant data usually has to be queried from the large amount of stored data and analyzed together to support business decisions. Data in a data warehouse is usually stored in tables, with different tables storing different data, for example user data, commodity data, and merchant data. When sales data needs to be queried, the relevant data has to be obtained from multiple tables, which makes the query process cumbersome. In the data warehouse field, business personnel therefore usually construct a wide table that contains the relevant data of multiple tables, so that the required data can be queried conveniently. How to optimize the construction of the wide table has thus become a problem that urgently needs to be solved.
In the process of implementing the concept of the present disclosure, the inventors found at least the following problem in the prior art: the existing wide table construction process depends excessively on the experience of business experts, so constructing a wide table requires a large amount of manual work, and this reliance on manual work makes the construction process insufficiently objective.
Disclosure of Invention
In view of the above, the present disclosure provides a data warehouse information processing method, where the data warehouse includes a plurality of history tables stored in association. The method includes: obtaining at least one historical query statement, the historical query statement being used to query relevant data of a plurality of the history tables stored in association; determining the plurality of history tables corresponding to the at least one historical query statement; and generating a target table based on a specific history table among the plurality of history tables, the target table including the relevant data in the specific history table.
According to an embodiment of the present disclosure, the obtaining at least one historical query statement includes: obtaining historical operating data of the data warehouse involved in querying relevant data of a plurality of historical tables in the history tables stored in an associated manner through the historical query statement, and determining the at least one historical query statement based on the operating data.
According to an embodiment of the present disclosure, the determining the plurality of history tables corresponding to the at least one history query statement includes: analyzing the at least one historical query statement to obtain associated information of the at least one historical query statement, wherein the associated information comprises associated fields and associated conditions, and determining the plurality of historical tables corresponding to the at least one historical query statement based on the associated information.
According to an embodiment of the present disclosure, the method further includes: obtaining, from a plurality of initial historical query statements, a query statement that satisfies a first preset condition as the at least one historical query statement.
According to an embodiment of the present disclosure, the obtaining, from a plurality of initial historical query statements, a query statement that satisfies a first preset condition as the at least one historical query statement includes: clustering the plurality of initial historical query statements to obtain at least one query statement group, wherein the similarity between the historical query statements in each query statement group satisfies a first preset threshold; and determining, from the at least one query statement group, a query statement group that satisfies the first preset condition as a target query statement group, the target query statement group including the at least one historical query statement.
According to an embodiment of the present disclosure, the clustering the plurality of initial historical query statements to obtain at least one query statement group includes: processing the plurality of initial historical query statements to obtain a vector corresponding to each initial historical query statement; and clustering the vectors to obtain the at least one query statement group, wherein each query statement group includes the vectors of the query statements in that group.
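The vectorize-then-cluster step can be sketched as follows. This is only an illustrative sketch: the bag-of-tokens vectors, the cosine similarity measure, the greedy grouping, and the names `vectorize`, `cosine`, and `cluster` are assumptions made for the example, and the sample statements are hypothetical. The disclosure does not specify these choices here.

```python
import math
import re
from collections import Counter


def vectorize(sql):
    """Turn a query statement into a bag-of-tokens count vector (assumed encoding)."""
    return Counter(re.findall(r"[A-Za-z_][\w.]*", sql.upper()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def cluster(statements, threshold):
    """Greedy grouping: a statement joins the first group whose first member is similar enough."""
    groups = []  # each group is a list of (statement, vector) pairs
    for sql in statements:
        vec = vectorize(sql)
        for group in groups:
            if cosine(vec, group[0][1]) >= threshold:
                group.append((sql, vec))
                break
        else:
            groups.append([(sql, vec)])
    return [[sql for sql, _ in group] for group in groups]


# Hypothetical initial historical query statements.
history = [
    "SELECT A, B FROM ODS.FOO JOIN ODS.BAR ON FOO.A = BAR.X",
    "SELECT A, Z FROM ODS.FOO JOIN ODS.BAR ON FOO.A = BAR.X",
    "SELECT NAME FROM ODS.USERS WHERE ID = 1",
]
groups = cluster(history, threshold=0.8)
# Assumed "first preset condition": take the largest group as the target group.
target_group = max(groups, key=len)
```

With these inputs the first two statements fall into one group and the third forms its own group. A production system would more likely use a dedicated embedding and clustering library.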
According to an embodiment of the present disclosure, the specific history table is a history table that satisfies a second preset threshold among the plurality of history tables.
According to an embodiment of the present disclosure, the method further includes: storing the target table when the target table satisfies a second preset condition.
According to an embodiment of the present disclosure, the storing the target table when the target table satisfies a second preset condition includes: obtaining the similarity between the target table and other history tables in the data warehouse, and storing the target table when the similarity satisfies a third preset threshold.
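One way to picture the similarity check is the sketch below. It rests on two assumptions that the text above does not state: similarity is measured as the Jaccard overlap of column names, and the target table is stored only when it is not too similar to any existing table. The function names, table names, and the 0.75 threshold are hypothetical.

```python
def column_similarity(cols_a, cols_b):
    """Jaccard similarity between two sets of column names (assumed measure)."""
    a, b = set(cols_a), set(cols_b)
    return len(a & b) / len(a | b) if a | b else 0.0


def should_store(target_cols, existing_tables, max_similarity):
    """Assumed rule: store only if the target is dissimilar to every existing table."""
    return all(
        column_similarity(target_cols, cols) < max_similarity
        for cols in existing_tables.values()
    )


# Hypothetical tables already in the data warehouse.
existing = {
    "sales_wide": ["order_id", "amount", "user_name", "goods_name"],
    "users": ["user_id", "user_name"],
}
near_duplicate = ["order_id", "amount", "user_name", "goods_name", "merchant_name"]
new_subject = ["merchant_id", "merchant_name", "city"]

store_near_duplicate = should_store(near_duplicate, existing, max_similarity=0.75)
store_new_subject = should_store(new_subject, existing, max_similarity=0.75)
```

Here `near_duplicate` overlaps `sales_wide` with a Jaccard similarity of 0.8 and is rejected, while `new_subject` shares no columns with any existing table and is stored.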
Another aspect of the present disclosure provides a data warehouse information processing apparatus, where the data warehouse includes a plurality of history tables stored in association. The apparatus includes a first obtaining module, a determining module, and a generating module. The first obtaining module obtains at least one historical query statement, the historical query statement being used to query relevant data of a plurality of the history tables stored in association. The determining module determines the plurality of history tables corresponding to the at least one historical query statement. The generating module generates a target table based on a specific history table among the plurality of history tables, the target table including the relevant data in the specific history table.
According to an embodiment of the present disclosure, the obtaining at least one historical query statement includes: obtaining historical operating data of the data warehouse involved in querying relevant data of a plurality of historical tables in the history tables stored in an associated manner through the historical query statement, and determining the at least one historical query statement based on the operating data.
According to an embodiment of the present disclosure, the determining the plurality of history tables corresponding to the at least one history query statement includes: analyzing the at least one historical query statement to obtain associated information of the at least one historical query statement, wherein the associated information comprises associated fields and associated conditions, and determining the plurality of historical tables corresponding to the at least one historical query statement based on the associated information.
According to an embodiment of the present disclosure, the apparatus further includes a second obtaining module, configured to obtain, from a plurality of initial historical query statements, a query statement that satisfies a first preset condition as the at least one historical query statement.
According to an embodiment of the present disclosure, the obtaining, from a plurality of initial historical query statements, a query statement that satisfies a first preset condition as the at least one historical query statement includes: clustering the plurality of initial historical query statements to obtain at least one query statement group, wherein the similarity between the historical query statements in each query statement group satisfies a first preset threshold; and determining, from the at least one query statement group, a query statement group that satisfies the first preset condition as a target query statement group, the target query statement group including the at least one historical query statement.
According to an embodiment of the present disclosure, the clustering the plurality of initial historical query statements to obtain at least one query statement group includes: processing the plurality of initial historical query statements to obtain a vector corresponding to each initial historical query statement; and clustering the vectors to obtain the at least one query statement group, wherein each query statement group includes the vectors of the query statements in that group.
According to an embodiment of the present disclosure, the specific history table is a history table that satisfies a second preset threshold among the plurality of history tables.
According to an embodiment of the present disclosure, the apparatus further includes a storage module, configured to store the target table when the target table satisfies a second preset condition.
According to an embodiment of the present disclosure, the storing the target table when the target table satisfies a second preset condition includes: obtaining the similarity between the target table and other history tables in the data warehouse, and storing the target table when the similarity satisfies a third preset threshold.
Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions for implementing the method as described above when executed.
Another aspect of the disclosure provides a computer program comprising computer executable instructions for implementing the method as described above when executed.
According to the embodiments of the present disclosure, the problem in the prior art that the wide table construction process depends excessively on the experience of business experts, and therefore requires a large amount of manual work and is not objective enough, can be solved. The construction of wide tables in a data warehouse can thereby be optimized, for example achieving the technical effect of automatic wide table construction.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates a system architecture of a data warehouse information processing method and processing system according to an embodiment of the present disclosure;
FIGS. 2A-2C schematically illustrate application scenarios of a data warehouse information processing method and processing system according to embodiments of the present disclosure;
FIG. 3A schematically illustrates a flow diagram of a data warehouse information processing method, in accordance with an embodiment of the present disclosure;
FIG. 3B schematically illustrates a historical operational data diagram of a data warehouse, according to an embodiment of the present disclosure;
FIG. 3C schematically illustrates a visualization diagram of an abstract syntax tree, in accordance with an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow diagram of a data warehouse information processing method, in accordance with another embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow diagram of a data warehouse information processing method, in accordance with yet another embodiment of the present disclosure;
FIG. 6 schematically illustrates a data warehouse wide table building flow diagram according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a data warehouse wide table candidate template generation and review flow diagram, in accordance with an embodiment of the present disclosure;
FIG. 8 schematically illustrates a block diagram of a data warehouse information processing apparatus according to an embodiment of the present disclosure;
FIG. 9 schematically illustrates a block diagram of a data warehouse information processing apparatus, in accordance with another embodiment of the present disclosure;
FIG. 10 schematically illustrates a block diagram of a data warehouse information processing apparatus according to yet another embodiment of the present disclosure; and
FIG. 11 schematically illustrates a block diagram of a computer system suitable for data warehouse information processing, in accordance with an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B, and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B, and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). Where a convention analogous to "at least one of A, B, or C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B, or C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase "A or B" should be understood to include the possibilities of "A" or "B", or "A and B".
The embodiment of the present disclosure provides a data warehouse information processing method, where the data warehouse includes a plurality of history tables stored in association. The method includes: obtaining at least one historical query statement, the historical query statement being used to query relevant data of a plurality of the history tables stored in association; determining the plurality of history tables corresponding to the at least one historical query statement; and generating a target table based on a specific history table among the plurality of history tables, the target table including the relevant data in the specific history table.
Fig. 1 schematically shows a system architecture of a data warehouse information processing method and a data warehouse information processing system according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and perform other processing on the received data such as the user request, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that the data warehouse information processing method provided by the embodiment of the present disclosure may be generally executed by the server 105. Accordingly, the data warehouse information processing apparatus provided by the embodiment of the present disclosure may be generally disposed in the server 105. The data warehouse information processing method provided by the embodiment of the present disclosure may also be executed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the data warehouse information processing apparatus provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
For example, the historical query statement and the history table of the embodiment of the present disclosure may be stored in the terminal devices 101, 102, 103. The terminal devices 101, 102, 103 may transmit the query statement and the history table to the server 105, which creates the target table based on them, or the terminal devices 101, 102, 103 may create the target table directly based on the query statement and the history table. In addition, the query statement and the history table may be stored directly in the server 105, in which case the server 105 creates the target table directly based on them.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2A to 2C schematically show application scenarios of the data warehouse information processing method and the data warehouse information processing system according to the embodiment of the present disclosure. It should be noted that fig. 2A to 2C are only examples of scenarios in which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, but do not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in FIGS. 2A to 2C, the application scenario 200 may include, for example, multiple table schemas in a data warehouse, such as a star schema 210 and a snowflake schema 220, as well as a wide table 230.
According to embodiments of the present disclosure, the star schema 210 and the snowflake schema 220 may be, for example, table schemas for storing data in a data warehouse, each including multiple associated tables. As shown in FIG. 2A, the star schema 210 includes, for example, a table 211, a table 212, and a table 213. As shown in FIG. 2B, the snowflake schema 220 includes, for example, a table 221, a table 222, a table 223, a table 224, a table 225, and a table 226.
For example, for sales data, the multiple tables respectively store user information, commodity information, merchant information, and the like.
In the embodiment of the present disclosure, the related tables in the data warehouse are generally queried through query statements, and analysis and mining are performed by acquiring data in the related tables, so as to provide decisions for various services.
Because the process of querying a plurality of relevant tables in the data warehouse through a query statement to obtain relevant information is complex, the information stored in the plurality of tables and the association relationship between the plurality of tables need to be known to better query the required data. For example, when the sales data needs to be queried, the user information table, the commodity information table, the merchant information table, and the like need to be queried in association.
In order to improve the convenience of service use, it is generally required to establish a wide table, for example, a wide table about sales data, the wide table including data information of a plurality of tables, for example, data information including a user information table, a commodity information table, and a merchant information table, for facilitating service inquiry use.
Embodiments of the present disclosure obtain query statements that query multiple tables and determine the related tables from those statements. For example, query statements that query the tables 221, 222, 223, 224, 225, 226 are obtained, the tables related to the query statements are determined to be the tables 221, 222, 223, 224, 225, 226, and the wide table 230 is created based on these tables, as shown in FIG. 2C. The wide table 230 includes, for example, the data information of these tables.
By determining multiple tables in the data warehouse from query statements and creating a wide table of the data warehouse based on those tables, the embodiments of the present disclosure automate the wide table construction process.
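The idea of materializing a wide table from several associated tables can be illustrated with a small in-memory example. This is a sketch only: the SQLite engine, the table names (`users`, `goods`, `orders`, `sales_wide`), the columns, and the sample rows are all hypothetical and are not taken from the disclosure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical dimension tables and one fact table.
CREATE TABLE users  (user_id  INTEGER PRIMARY KEY, user_name  TEXT);
CREATE TABLE goods  (goods_id INTEGER PRIMARY KEY, goods_name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER,
                     goods_id INTEGER, amount REAL);
INSERT INTO users  VALUES (1, 'alice'), (2, 'bob');
INSERT INTO goods  VALUES (10, 'book'), (11, 'pen');
INSERT INTO orders VALUES (100, 1, 10, 25.0), (101, 2, 11, 3.5);

-- The wide table stores the joined columns of the related tables,
-- so later queries no longer need to repeat the joins.
CREATE TABLE sales_wide AS
SELECT o.order_id, o.amount, u.user_name, g.goods_name
FROM orders o
JOIN users u ON o.user_id = u.user_id
JOIN goods g ON o.goods_id = g.goods_id;
""")

rows = conn.execute(
    "SELECT order_id, user_name, goods_name, amount "
    "FROM sales_wide ORDER BY order_id"
).fetchall()
```

After the wide table exists, a sales query reads a single table instead of joining three.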
Fig. 3A schematically illustrates a flow chart of a data warehouse information processing method according to an embodiment of the present disclosure.
As shown in fig. 3A, the method includes operations S310 to S330.
In the embodiment of the present disclosure, the main function of the data warehouse is to take the large amount of data that business systems accumulate through online transaction processing (OLTP), analyze and organize it systematically using the data storage architecture specific to data warehouse theory, and apply analysis methods such as online analytical processing (OLAP) and data mining, so as to support systems such as a decision support system (DSS). The data warehouse helps decision makers quickly and effectively extract valuable information from large amounts of data, which facilitates decision making, enables quick responses to changes in the external environment, and helps build business intelligence solutions.
In the embodiment of the present disclosure, when a data warehouse is built, dimensional design maps relationships onto a set of relational tables. Dimensional design generally takes one of two forms: the star schema and the snowflake schema. The star schema can be pictured as a simple star: a central table contains the fact data, and multiple tables are arranged radially around it, connected to it by primary keys and foreign keys.
According to an embodiment of the present disclosure, the data warehouse includes a plurality of history tables associated with storage, and it is understood that the tables described in the embodiment of the present disclosure include a data table for storing data in the data warehouse, where the data table may include a plurality of data columns (or data fields), and the data in each data column is data of a different field type. Specifically, the plurality of history tables may be, for example, fact tables, dimension tables, or already created wide tables in a data warehouse, and the like.
In operation S310, at least one historical query statement for querying related data of a plurality of historical tables among the associatively stored historical tables is obtained.
According to the embodiment of the present disclosure, the history query statement may be, for example, a query statement used by a relevant business person to query data in a data warehouse, and the query statement may be used to query relevant data of a plurality of history tables from the history tables in the associated storage in the data warehouse. The query statement may be an SQL query statement.
For example, obtaining at least one historical query statement includes: obtaining historical operating data of the data warehouse generated when relevant data of a plurality of the history tables stored in association was queried through historical query statements, and determining the at least one historical query statement based on the historical operating data.
In the embodiment of the present disclosure, querying relevant data of multiple history tables through historical query statements generates historical operating data of the data warehouse. The historical operating data may be, for example, the raw operation logs produced when relevant data in the data warehouse is queried, involving the basic warehouse tables, basic mart tables, and user-defined tables in the data warehouse.
FIG. 3B schematically shows a historical operational data diagram of a data warehouse, according to an embodiment of the disclosure.
The raw operation data of the data warehouse may be, for example, a raw operation log of the data warehouse shown in fig. 3B, which includes historical query statements for querying relevant data in the data warehouse.
In the embodiment of the present disclosure, the at least one historical query statement is determined from the operating data; for example, clean, complete, and ordered historical query statements and relevant important system operation information (for example, warning and error information) are extracted from the raw operation log of the data warehouse.
For example, the relatively disordered operation log may be cleaned with custom regular expressions. Cleaning the raw operation log in FIG. 3B, for example, gives the result shown in Table 1. The cleaning result includes the historical query statements, for example their SQL content.
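A minimal sketch of regular-expression-based log cleaning is given below. The log layout (`<timestamp> <LEVEL> <message>`, with SQL messages prefixed by `SQL:`), the function name `clean_log`, and the sample lines are assumptions made for illustration. The actual log format is the one shown in FIG. 3B, which is not reproduced in the text.

```python
import re

# Hypothetical log layout (an assumption, not the format of FIG. 3B):
# "<YYYY-MM-DD HH:MM:SS> <LEVEL> <message>", SQL messages start with "SQL:".
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>INFO|WARN|ERROR) "
    r"(?P<msg>.*)$"
)


def clean_log(lines):
    """Split raw log lines into SQL statements and warning/error messages."""
    statements, issues = [], []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # drop lines that do not fit the assumed layout
        msg = match.group("msg")
        if msg.startswith("SQL:"):
            # Collapse runs of whitespace so each statement sits on one line.
            sql = re.sub(r"\s+", " ", msg[len("SQL:"):]).strip()
            statements.append({"time": match.group("ts"), "sql": sql})
        elif match.group("level") in ("WARN", "ERROR"):
            issues.append({
                "time": match.group("ts"),
                "level": match.group("level"),
                "message": msg,
            })
    return statements, issues


# Hypothetical raw log lines.
raw_log = [
    "2018-09-01 10:00:00 INFO SQL: SELECT A,  B FROM ODS.FOO",
    "garbage line without a timestamp",
    "2018-09-01 10:00:05 WARN slow query detected",
    "2018-09-01 10:00:09 INFO session closed",
]
statements, issues = clean_log(raw_log)
```

The function keeps the SQL content together with its timestamp, retains warning and error messages separately, and discards everything else.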
TABLE 1
[Table 1 image: cleaning result of the raw operation log, including the SQL content of the historical query statements]
In operation S320, a plurality of history tables corresponding to at least one history query statement are determined.
In the embodiment of the present disclosure, the plurality of history tables may be, for example, tables involved in a history query statement, that is, when related data in the data warehouse needs to be queried, data of the plurality of tables in the data warehouse may be queried through the history query statement, where the plurality of tables are history tables corresponding to the history query statement.
Determining a plurality of history tables corresponding to at least one history query statement comprises: analyzing the at least one historical query statement to obtain the associated information of the at least one historical query statement, wherein the associated information comprises associated fields and associated conditions.
In the embodiment of the present disclosure, the historical query statement may be analyzed to obtain related information of the historical query statement, for example, the SQL analyzer may be used to analyze the historical query statement to obtain related information of the historical query statement.
The SQL parser is a tool that uses metadata information and the data query SQL to analyze relationships in the query SQL, such as primary/foreign key relationships between fields, table associations and the like.
For example, the historical query statement is shown in table 2, and the result of analyzing the historical query statement by using the SQL parser is shown in table 3.
The association information may include, for example, an association field and an association condition in the query statement, where the association field may include, for example, an aggregation field, an ordering field, a condition field, a query field, and the like, and the association condition may include, for example, table association, field association.
TABLE 2
SELECT
A,B,Z,COUNT(1)AS CT
FROM ODS.FOO FOO
INNER JOIN ODS.BAR BAR
ON FOO.A=BAR.X AND FOO.B=BAR.Y
GROUP BY A,B,Z
ORDER BY A,B,Z DESC;
TABLE 3
Figure BDA0001807808390000121
According to the embodiment of the disclosure, a plurality of historical tables corresponding to at least one historical query statement are determined based on the association information.
For example, a plurality of history tables corresponding to the query statement may be determined through the association information in table 3, and the plurality of history tables are stored in a data warehouse, for example, the determined plurality of history tables are shown in table 4 and table 5.
The specific workflow of the SQL parser is briefly introduced below.
Fig. 3C schematically illustrates a visualization diagram of an abstract syntax tree according to an embodiment of the present disclosure.
For ease of illustration, an example is made herein for a relatively simple historical query statement, as shown in Table 6.
TABLE 4
Field    Field type    Field comment
A        STRING        Column A
B        STRING        Column B
C        STRING        Column C
D        STRING        Column D
TABLE 5
Field    Field type    Field comment
X        STRING        Column X
Y        STRING        Column Y
Z        STRING        Column Z
TABLE 6
SELECT A,Z
FROM ODS.FOO FOO
INNER JOIN ODS.BAR BAR
ON FOO.B=BAR.Y
WHERE BAR.Z LIKE 'LEO';
The original query SQL (historical query statements) is first parsed into an SQL abstract syntax tree, with the visualization effect as shown in fig. 3C.
According to the constructed SQL abstract syntax tree, the associated information in it is extracted, such as table associations (Join Clauses), field associations (Join Clause Conditions), aggregation fields (Group By), sorting fields (Order By), condition fields (Where Clauses), query fields (Query Columns) and the like.
In the embodiment of the present disclosure, the association information (for example, as shown in table 3) analyzed by the SQL parser may be indexed and warehoused in the query engine for subsequent query invocation.
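The kind of association information extracted above can be illustrated with a small sketch on a query modeled on Table 6. It is deliberately simplified: it uses regular expressions rather than a real abstract syntax tree, and the function name `extract_association_info` and the keys of the returned dictionary are assumptions made for the example.

```python
import re

QUERY = """
SELECT A, Z
FROM ODS.FOO FOO
INNER JOIN ODS.BAR BAR
ON FOO.B = BAR.Y
WHERE BAR.Z LIKE 'LEO';
"""


def extract_association_info(sql):
    """Pull tables, join conditions and condition fields out of a simple query."""
    text = " ".join(sql.split()).rstrip(";")
    # Tables appear after FROM or JOIN as "<schema.table> <alias>".
    tables = re.findall(r"\b(?:FROM|JOIN)\s+([\w.]+)\s+(\w+)", text, re.IGNORECASE)
    # Field associations are "<alias.field> = <alias.field>" pairs.
    joins = re.findall(r"(\w+\.\w+)\s*=\s*(\w+\.\w+)", text)
    # Condition fields are qualified names that follow WHERE.
    where = re.search(r"\bWHERE\b(.*)$", text, re.IGNORECASE)
    condition_fields = re.findall(r"\b(\w+\.\w+)\b", where.group(1)) if where else []
    return {
        "tables": [name for name, _alias in tables],
        "aliases": {alias: name for name, alias in tables},
        "field_associations": joins,
        "condition_fields": condition_fields,
    }


info = extract_association_info(QUERY)
```

A dictionary of this shape could then be indexed by table or field name for later lookup; a production parser would walk the syntax tree instead, since regular expressions do not handle nested queries or unqualified field names.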
In operation S330, a target table is generated based on a specific history table of the plurality of history tables, the target table including related data in the specific history table.
According to an embodiment of the present disclosure, the specific history table is, for example, all or a part of the plurality of history tables. For example, the specific history table is a history table, among the plurality of history tables, that satisfies a second preset threshold. The second preset threshold may be, for example, a threshold on how often a history table occurs; that is, the specific history table may be a history table that is involved in the historical query statements a large number of times. A high number of occurrences indicates that the specific history table is queried often, and therefore that business personnel have a large demand for it.
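A minimal sketch of this selection is given below, assuming the parser has already produced the list of tables referenced by each historical query statement. The sample data, the threshold value and the function name `select_specific_tables` are hypothetical.

```python
from collections import Counter

# Hypothetical parser output: the tables referenced by each historical query.
tables_per_query = [
    ["ODS.FOO", "ODS.BAR"],
    ["ODS.FOO", "ODS.BAR"],
    ["ODS.FOO", "ODS.BAZ"],
]
SECOND_PRESET_THRESHOLD = 1  # assumed value; the text does not fix one


def select_specific_tables(tables_per_query, threshold):
    """Return tables referenced by more than `threshold` historical queries."""
    counts = Counter(
        table for tables in tables_per_query for table in set(tables)
    )
    return sorted(table for table, n in counts.items() if n > threshold)


specific_tables = select_specific_tables(tables_per_query, SECOND_PRESET_THRESHOLD)
```

Each table is counted once per query (via `set`), so a statement that mentions the same table several times does not inflate its frequency.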
In the embodiment of the disclosure, a target table of the data warehouse, for example a wide table of the data warehouse, is created from the specific history table. The target table includes the relevant data in the specific history table and is convenient for a user to use; in other words, querying data from the wide table is more convenient for the user, so that valuable information can be queried and analyzed from the data warehouse effectively and quickly to support decisions.
According to the embodiment of the disclosure, a plurality of historical tables are determined based on the historical query statement, and a target table is constructed based on the plurality of historical tables, where the target table includes relevant data of the plurality of historical tables and is, for example, a wide table of the data warehouse. The scheme of the embodiment of the disclosure can thereby achieve the technical effect of optimizing the construction process of the wide table in the data warehouse, for example, constructing the wide table automatically.
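One way to picture the construction of a wide table is to compose a statement that joins the specific history tables on their field associations. The sketch below only builds the statement text; the target name `DM.FOO_BAR_WIDE`, the function `build_wide_table_sql` and the choice of an inner join are assumptions, and the column lists follow Tables 4 and 5.

```python
def build_wide_table_sql(target, tables, field_associations):
    """Compose a CREATE TABLE ... AS SELECT statement for a candidate wide table.

    tables: list of (table_name, alias, columns); the first entry is the base table.
    field_associations: list of (left_field, right_field), alias-qualified.
    """
    select_list = ",\n  ".join(
        f"{alias}.{column}" for _name, alias, columns in tables for column in columns
    )
    base_name, base_alias, _ = tables[0]
    lines = [
        f"CREATE TABLE {target} AS",
        f"SELECT\n  {select_list}",
        f"FROM {base_name} {base_alias}",
    ]
    for name, alias, _columns in tables[1:]:
        # Use the associations whose right-hand field belongs to this table.
        conditions = [
            f"{left} = {right}"
            for left, right in field_associations
            if right.startswith(alias + ".")
        ]
        lines.append(f"INNER JOIN {name} {alias}\n  ON " + " AND ".join(conditions))
    return "\n".join(lines) + ";"


tables = [
    ("ODS.FOO", "FOO", ["A", "B", "C", "D"]),
    ("ODS.BAR", "BAR", ["X", "Y", "Z"]),
]
associations = [("FOO.A", "BAR.X"), ("FOO.B", "BAR.Y")]
wide_sql = build_wide_table_sql("DM.FOO_BAR_WIDE", tables, associations)
```

The sketch assumes every joined table has at least one association; a fuller version would need to reject or handle tables without one.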
Fig. 4 schematically illustrates a flow chart of a data warehouse information processing method according to another embodiment of the present disclosure.
As shown in fig. 4, the method includes operations S310 to S330 and operation S410. Operations S310 to S330 are the same as or similar to the operations described above with reference to fig. 3A, and are not described again here.
In operation S410, a query statement satisfying a first preset condition is acquired from a plurality of initial historical query statements as at least one historical query statement.
According to an embodiment of the present disclosure, the plurality of initial historical query statements may be, for example, query statements that have been used over a long period for querying related data in the data warehouse. The first preset condition may be, for example, high similarity; that is, query statements with high similarity are obtained from the plurality of initial historical query statements as the at least one historical query statement.
According to the embodiment of the disclosure, acquiring a query statement satisfying a first preset condition from a plurality of initial historical query statements as at least one historical query statement includes: clustering the plurality of initial historical query sentences to obtain at least one query sentence group, wherein the similarity between the historical query sentences in each query sentence group meets a first preset threshold value.
In the embodiment of the present disclosure, for example, clusters with high similarity among a plurality of historical query statements may be found offline through a clustering method, where a cluster with high similarity includes, for example, at least one historical query statement. Such a cluster may be used to check whether redundant wide tables with excessively high similarity exist among the current data warehouse and mart wide tables. For the existing data warehouse and mart wide tables and all temporary query statements customized by users, a real-valued query index is established from the vectorized real-valued vector and a unique query ID, and is stored in a real-valued vector query engine for queries in subsequent processes.
In the embodiment of the present disclosure, before clustering the plurality of initial historical query statements, the plurality of initial historical query statements may also be preprocessed.
Data preprocessing is performed on the initial historical query statements. For example, the SQL parser may process the initial historical query statements to obtain association information, and a semantic mapping is constructed based on the association information. Semantic mapping can be understood as follows: for the same concept (for example, commodity ID), the field name in Table 4 is A and the field name in Table 5 is B; according to the association information, a unique identifier is determined to replace the field names used in the different tables in the SQL. In addition, the preprocessing includes work such as normalizing the SQL syntax and converting case, so that code segments with consistent semantics also have similar content.
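The preprocessing can be sketched as two small steps: building a semantic mapping from the field associations, and normalizing each statement with it. The union-find grouping, the identifier scheme (`F0`, `F1`, ...) and both function names are assumptions chosen for the illustration.

```python
import re


def build_semantic_mapping(field_associations):
    """Give every group of associated fields one shared identifier (union-find)."""
    parent = {}

    def find(field):
        parent.setdefault(field, field)
        while parent[field] != field:
            parent[field] = parent[parent[field]]  # path halving
            field = parent[field]
        return field

    for left, right in field_associations:
        parent[find(right)] = find(left)
    roots = sorted({find(field) for field in parent})
    ids = {root: f"F{index}" for index, root in enumerate(roots)}
    return {field: ids[find(field)] for field in parent}


def normalize_sql(sql, mapping):
    """Upper-case the statement, collapse whitespace and substitute mapped fields."""
    # Note: this simple version also upper-cases string literals.
    text = " ".join(sql.upper().split()).rstrip(";")
    return re.sub(r"\b\w+\.\w+\b", lambda m: mapping.get(m.group(0), m.group(0)), text)


mapping = build_semantic_mapping([("FOO.A", "BAR.X"), ("FOO.B", "BAR.Y")])
normalized = normalize_sql("select foo.a from ods.foo foo\n where bar.x = 'k';", mapping)
```

After this step, two statements that refer to the same concept through different field names share the same token, which is what makes their later vector representations comparable.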
In an implementation of the present disclosure, clustering a plurality of initial historical query statements to obtain at least one query statement group includes:
and processing the plurality of initial historical query sentences to obtain vectors corresponding to the plurality of initial historical query sentences.
For example, before clustering a plurality of initial historical query statements, the query statements need to be vectorized, and the vectorized query statements are then clustered. For example, the initial historical query statement is converted into a real-valued vector using a natural language vectorization method such as Word2Vec, Sentence2Vec, or Document2Vec. One example of the conversion is shown in Table 7.
TABLE 7
Figure BDA0001807808390000161
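To make the idea of a real-valued vector concrete, the sketch below uses a plain token-count (bag-of-tokens) representation. This is a simpler stand-in, not the embedding methods named above, and the function names `tokenize` and `vectorize` are assumptions.

```python
import re
from collections import Counter


def tokenize(sql):
    """Split a statement into upper-case word tokens."""
    return re.findall(r"[A-Z_][A-Z0-9_.]*", sql.upper())


def vectorize(statements):
    """Map each statement to a token-count vector over a shared vocabulary."""
    token_lists = [tokenize(sql) for sql in statements]
    vocabulary = sorted({token for tokens in token_lists for token in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append([float(counts[token]) for token in vocabulary])
    return vocabulary, vectors


vocabulary, vectors = vectorize(["SELECT A FROM FOO", "SELECT A, B FROM FOO"])
```

A learned embedding would capture more context than token counts, but the interface is the same: each query statement becomes one fixed-length vector.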
And clustering vectors corresponding to the plurality of initial historical query sentences to obtain at least one query sentence group, wherein the at least one query sentence group comprises the vectors corresponding to the corresponding query sentences.
For example, the vectors corresponding to the initial historical query statements are clustered to obtain at least one query statement group, each query statement group includes vectors of a plurality of query statements, and the query statements in each query statement group have a certain similarity, which may be a preset similarity set according to a requirement, for example.
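The grouping by similarity can be sketched with a greedy, threshold-based procedure on cosine similarity. The clustering algorithm is not specified in the text, so this particular procedure, the threshold value and the function names are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster(vectors, threshold):
    """Greedy grouping: a vector joins the first group whose seed is similar enough."""
    groups = []  # each group is a list of indices into `vectors`
    for index, vector in enumerate(vectors):
        for group in groups:
            if cosine(vectors[group[0]], vector) >= threshold:
                group.append(index)
                break
        else:
            groups.append([index])
    return groups


vectors = [[1.0, 1.0, 0.0], [1.0, 0.9, 0.0], [0.0, 0.0, 1.0]]
FIRST_PRESET_THRESHOLD = 0.9  # assumed value
groups = cluster(vectors, FIRST_PRESET_THRESHOLD)
```

Note that this greedy variant only guarantees similarity to the first member (the seed) of each group; a stricter pairwise guarantee would need a different algorithm, such as complete-linkage clustering.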
And determining a query statement group meeting a first preset condition from the at least one query statement group as a target query statement group, wherein the target query statement group comprises at least one historical query statement.
In the embodiment of the present disclosure, the first preset condition may be, for example, a preset quantity value, where the query statements in the query statement group have corresponding quantity values, and when the quantity values satisfy the preset quantity value, the query statement group may be regarded as the target query statement group. The set of target query statements includes, for example, a plurality of historical query statements.
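A small sketch of selecting the target query statement group by size follows. The representation of groups as index lists, the preset quantity value and the function name `select_target_groups` are assumptions.

```python
def select_target_groups(groups, statements, preset_quantity):
    """Keep the groups holding at least `preset_quantity` statements."""
    return [
        [statements[index] for index in group]
        for group in groups
        if len(group) >= preset_quantity
    ]


statements = [
    "SELECT A FROM ODS.FOO",
    "SELECT A, B FROM ODS.FOO",
    "SELECT Z FROM ODS.BAR",
]
groups = [[0, 1], [2]]  # hypothetical clustering result (indices into statements)
target_groups = select_target_groups(groups, statements, preset_quantity=2)
```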
Fig. 5 schematically illustrates a flow chart of a data warehouse information processing method according to yet another embodiment of the present disclosure.
As shown in fig. 5, the method includes operations S310 to S330, operation S410, and operation S510. Operations S310 to S330 are the same as or similar to the operations described above with reference to fig. 3A, and operation S410 is the same as or similar to the operations described above with reference to fig. 4, and are not repeated herein.
In operation S510, in the case where the target table satisfies a second preset condition, the target table is stored. Examples include: and acquiring the similarity between the target table and other historical tables in the data warehouse, and storing the target table under the condition that the similarity meets a third preset threshold.
According to the embodiment of the present disclosure, the second preset condition may be, for example, that the target table meets a preset similarity. For example, the similarity between the target table and other historical tables in the data warehouse is determined, and the target table is stored when the similarity meets a third preset threshold, where the third preset threshold may be, for example, a specific value. This avoids data redundancy caused by highly similar tables in the data warehouse.
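The storage check can be sketched as below. Two points are assumptions rather than statements of the text: table similarity is measured here as the Jaccard overlap of column names, and "meets the third preset threshold" is read as "does not exceed it", in line with the stated goal of avoiding redundancy. The table names are hypothetical.

```python
def jaccard(columns_a, columns_b):
    """Share of column names that two tables have in common."""
    a, b = set(columns_a), set(columns_b)
    return len(a & b) / len(a | b) if a | b else 0.0


def should_store(target_columns, existing_tables, third_preset_threshold):
    """Store the target table only if no existing table is too similar to it."""
    highest = max(
        (jaccard(target_columns, columns) for columns in existing_tables.values()),
        default=0.0,
    )
    return highest <= third_preset_threshold


existing_tables = {"DM.W1": ["A", "B", "Z"], "DM.W2": ["P", "Q"]}
store_new = should_store(["A", "B", "Z", "CT"], existing_tables, 0.8)
store_duplicate = should_store(["A", "B", "Z"], existing_tables, 0.8)
```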
FIG. 6 schematically illustrates a data warehouse wide table building flow diagram according to an embodiment of the disclosure.
As shown in fig. 6, the embodiment of the present disclosure discloses an automatic construction method of a data warehouse wide table based on information extraction, and the whole construction method includes operations S610 to S650.
In operation S610, relevant unstructured data, such as query SQL and query logs, is collected for the basic warehouse tables, basic mart tables, and user-defined tables in the warehouse layer, the mart layer, and the user layer, which are obtained by summarizing the data of the integration layer.
In operation S620, the customized SQL parser is used to parse the primary/foreign key relationships, association queries, aliases, and other relationships between different tables from the query SQL and the query log, and a query engine for the related data is established.
In operation S630, a user-defined SQL vectorization method (SQL2Vec) is used to mine similar queries from the query SQL, the query log and the constructed related data index, and a historical query SQL real-valued vector query engine and a query SQL similarity clustering result are established.
In operation S640, the constructed related data engine, the historical query SQL real-valued vector query engine, and the query SQL similarity clustering result are used to count and summarize information such as data fields and data tables with high co-occurrence frequency, so as to generate a candidate template of the new data warehouse wide table.
In operation S650, a final new data warehouse wide table is obtained and solidified according to the candidate templates of the new data warehouse wide table and the service expert advice.
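The counting of co-occurring fields in operation S640 can be sketched as follows, assuming the fields used by each query in a similar-query cluster are already known. The pair-based counting, the `min_count` parameter and the function name `candidate_template` are assumptions.

```python
from collections import Counter
from itertools import combinations


def candidate_template(fields_per_query, min_count):
    """Collect fields from pairs that co-occur in at least `min_count` queries."""
    pair_counts = Counter()
    for fields in fields_per_query:
        pair_counts.update(combinations(sorted(set(fields)), 2))
    frequent = [pair for pair, n in pair_counts.items() if n >= min_count]
    return sorted({field for pair in frequent for field in pair})


# Hypothetical fields used by three queries of one cluster.
fields_per_query = [
    ["FOO.A", "FOO.B", "BAR.Z"],
    ["FOO.A", "BAR.Z"],
    ["FOO.A", "FOO.C"],
]
template_fields = candidate_template(fields_per_query, min_count=2)
```

The resulting field list is only a candidate; in the flow described above it would still be reviewed by business experts before a wide table is solidified.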
FIG. 7 schematically illustrates a data warehouse wide table candidate template generation and review flow diagram according to an embodiment of the disclosure.
As shown in fig. 7, in the embodiment of the present disclosure, the last process of the automated data warehouse wide table construction scheme is the generation of candidate templates of the data warehouse wide table, and the auditing by the business experts and the solidification of the final new data warehouse wide table. The flow includes operations S710 to S790.
In operation S710, for a new user-defined query SQL, its vectorization result is obtained through the preprocessing in the above flow.
In operation S720, the vectorized result is added to the historical query SQL real-valued vector query engine.
In operation S730, the query SQL similarity clustering result is updated.
In operation S740, the updated historical query SQL real-valued vector data and the query SQL similarity clustering result are periodically transferred to the trigger, and the trigger determines whether to generate a new data warehouse wide table template according to a defined rule. In the trigger, the core trigger rule may be understood as follows: when a large number of new queries are similar enough to one another to be aggregated into a cluster and, at the same time, their similarity to all queries in the existing database tables is less than a certain value, information such as data tables and data fields with high co-occurrence frequency is extracted from the new queries aggregated into the cluster.
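The core trigger rule can be expressed as a short predicate. In this sketch the queries are represented by vectors, similarity is cosine similarity, and the two limits `min_cluster_size` and `max_similarity` are hypothetical parameters standing in for "a large number" and "a certain value".

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def trigger(new_cluster, existing_vectors, min_cluster_size, max_similarity):
    """Return True when a large new cluster is dissimilar to all existing queries."""
    if len(new_cluster) < min_cluster_size:
        return False
    return all(
        cosine(new, old) < max_similarity
        for new in new_cluster
        for old in existing_vectors
    )


new_cluster = [[1.0, 0.0, 1.0], [1.0, 0.0, 0.9]]
fires = trigger(new_cluster, [[0.0, 1.0, 0.0]], min_cluster_size=2, max_similarity=0.5)
blocked = trigger(new_cluster, [[1.0, 0.0, 1.0]], min_cluster_size=2, max_similarity=0.5)
```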
In operation S750, a template of a new data warehouse wide table is generated based on the extracted information such as the data table and the data field having the higher co-occurrence frequency.
In operation S760, after a new data warehouse wide table template is generated, an audit process is triggered, and an expert in the data warehouse performs audit and correction.
In operation S770, the corrected wide table template is finally solidified into a new data warehouse wide table.
The relevant information of the new data warehouse wide table is updated to the historical query SQL real-valued vector query engine in operation S780.
Relevant information of the new data warehouse wide table is updated to the query SQL similarity clustering result in operation S790.
Fig. 8 schematically shows a block diagram of a data warehouse information processing apparatus according to an embodiment of the present disclosure.
As shown in fig. 8, the data warehouse information processing apparatus 800 includes a first acquisition module 810, a determination module 820, and a generation module 830.
The first obtaining module 810 can obtain at least one historical query statement, the historical query statement being used for querying relevant data of a plurality of historical tables among the history tables stored in association.
According to an embodiment of the present disclosure, obtaining at least one historical query statement includes: obtaining historical operating data of a data warehouse involved in querying related data of a plurality of historical tables in the history tables stored in a correlation mode through historical query statements, and determining at least one historical query statement based on the operating data.
According to an embodiment of the present disclosure, the first obtaining module 810 may perform, for example, the operation S310 described above with reference to fig. 3A, which is not described herein again.
The determination module 820 may determine a plurality of historical tables corresponding to at least one historical query statement.
According to the embodiment of the disclosure, determining a plurality of history tables corresponding to at least one history query statement comprises: analyzing the at least one historical query statement to obtain associated information of the at least one historical query statement, wherein the associated information comprises associated fields and associated conditions, and determining a plurality of historical tables corresponding to the at least one historical query statement based on the associated information.
According to an embodiment of the present disclosure, the determining module 820 may perform, for example, the operation S320 described above with reference to fig. 3A, which is not described herein again.
The generation module 830 can generate a target table based on a particular history table of the plurality of history tables, the target table including relevant data in the particular history table.
According to an embodiment of the present disclosure, the specific history table is a history table that satisfies a second preset threshold value among the plurality of history tables.
According to the embodiment of the present disclosure, the generating module 830 may perform the operation S330 described above with reference to fig. 3A, for example, and is not described herein again.
Fig. 9 schematically shows a block diagram of a data warehouse information processing apparatus according to another embodiment of the present disclosure.
As shown in fig. 9, the data warehouse information processing apparatus 900 includes a first acquisition module 810, a determination module 820, a generation module 830, and a second acquisition module 910. The first obtaining module 810, the determining module 820 and the generating module 830 are the same as or similar to the modules described above with reference to fig. 8, and are not described herein again.
The second obtaining module 910 may obtain, as at least one historical query statement, a query statement satisfying a first preset condition from among the plurality of initial historical query statements.
According to the embodiment of the disclosure, acquiring a query statement satisfying a first preset condition from a plurality of initial historical query statements as at least one historical query statement includes: clustering a plurality of initial historical query sentences to obtain at least one query sentence group, wherein the similarity between the historical query sentences in each query sentence group meets a first preset threshold, determining the query sentence group meeting a first preset condition from the at least one query sentence group as a target query sentence group, and the target query sentence group comprises at least one historical query sentence.
According to the embodiment of the present disclosure, clustering a plurality of initial historical query statements to obtain at least one query statement group includes: processing the initial historical query sentences to obtain vectors corresponding to the initial historical query sentences, and clustering the vectors corresponding to the initial historical query sentences to obtain at least one query sentence group, wherein the at least one query sentence group comprises the vectors corresponding to the corresponding query sentences.
According to the embodiment of the present disclosure, the second obtaining module 910 may, for example, perform operation S410 described above with reference to fig. 4, which is not described herein again.
Fig. 10 schematically shows a block diagram of a data warehouse information processing apparatus according to still another embodiment of the present disclosure.
As shown in fig. 10, the data warehouse information processing apparatus 1000 includes a first acquisition module 810, a determination module 820, a generation module 830, a second acquisition module 910, and a storage module 1010. The first obtaining module 810, the determining module 820 and the generating module 830 are the same as or similar to the modules described above with reference to fig. 8, and are not described herein again. The second obtaining module 910 is the same as or similar to the module described above with reference to fig. 9, and is not described herein again.
The storage module 1010 may store the target table in a case where the target table satisfies a second preset condition.
According to the embodiment of the present disclosure, in the case that the target table satisfies the second preset condition, storing the target table includes: and acquiring the similarity between the target table and other historical tables in the data warehouse, and storing the target table under the condition that the similarity meets a third preset threshold.
According to the embodiment of the present disclosure, the storage module 1010 may perform, for example, the operation S510 described above with reference to fig. 5, which is not described herein again.
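The division into modules can be pictured as a small composition of callables. This is only a structural sketch: the class name `DataWarehouseProcessor`, the type signatures and the order in which the two obtaining modules are chained are assumptions, and the lambdas are placeholders rather than real implementations.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DataWarehouseProcessor:
    """Chains the modules of fig. 10; each module is a plain callable here."""

    first_obtain: Callable[[], List[str]]            # module 810
    second_obtain: Callable[[List[str]], List[str]]  # module 910
    determine: Callable[[List[str]], List[str]]      # module 820
    generate: Callable[[List[str]], str]             # module 830
    store: Callable[[str], bool]                     # module 1010

    def run(self) -> bool:
        # Assumed order: obtain statements, filter them, find tables, build, store.
        statements = self.second_obtain(self.first_obtain())
        tables = self.determine(statements)
        return self.store(self.generate(tables))


processor = DataWarehouseProcessor(
    first_obtain=lambda: ["SELECT A FROM ODS.FOO", "SELECT 1"],
    second_obtain=lambda statements: [s for s in statements if "FROM" in s],
    determine=lambda statements: sorted(
        {s.split("FROM ")[1].split()[0] for s in statements}
    ),
    generate=lambda tables: "WIDE(" + ",".join(tables) + ")",
    store=lambda target: target.startswith("WIDE("),
)
stored = processor.run()
```

Because each module is passed in as a callable, modules can be merged, split or replaced without changing the surrounding flow.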
Any number of modules, sub-modules, units, sub-units, or at least part of the functionality of any number thereof according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in any other reasonable manner of hardware or firmware by integrating or packaging a circuit, or in any one of or a suitable combination of software, hardware, and firmware implementations. Alternatively, one or more of the modules, sub-modules, units, sub-units according to embodiments of the disclosure may be at least partially implemented as a computer program module, which when executed may perform the corresponding functions.
For example, any plurality of the first obtaining module 810, the determining module 820, the generating module 830, the second obtaining module 910, and the storing module 1010 may be combined and implemented in one module, or any one of the modules may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the first obtaining module 810, the determining module 820, the generating module 830, the second obtaining module 910, and the storing module 1010 may be at least partially implemented as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented by any one of three implementations of software, hardware, and firmware, or an appropriate combination of any several of them. Alternatively, at least one of the first obtaining module 810, the determining module 820, the generating module 830, the second obtaining module 910 and the storing module 1010 may be at least partially implemented as a computer program module, which when executed may perform a corresponding function.
FIG. 11 schematically illustrates a block diagram of a computer system suitable for data warehouse information processing, in accordance with an embodiment of the present disclosure. The computer system illustrated in FIG. 11 is only one example and should not impose any limitations on the scope of use or functionality of embodiments of the disclosure.
As shown in fig. 11, a computer system 1100 according to an embodiment of the present disclosure includes a processor 1101, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1102 or a program loaded from a storage section 1108 into a Random Access Memory (RAM) 1103. The processor 1101 may comprise, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 1101 may also include on-board memory for caching purposes. The processor 1101 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to the embodiments of the present disclosure.
In the RAM 1103, various programs and data necessary for the operation of the system 1100 are stored. The processor 1101, the ROM 1102, and the RAM 1103 are connected to each other by a bus 1104. The processor 1101 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 1102 and/or the RAM 1103. It is noted that the programs may also be stored in one or more memories other than the ROM 1102 and RAM 1103. The processor 1101 may also perform various operations of the method flows according to the embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the system 1100 may also include an input/output (I/O) interface 1105, which is also connected to the bus 1104. The system 1100 may also include one or more of the following components connected to the I/O interface 1105: an input portion 1106 including a keyboard, mouse, and the like; an output portion 1107 including a signal output unit such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage section 1108 including a hard disk and the like; and a communication section 1109 including a network interface card such as a LAN card, a modem, or the like. The communication section 1109 performs communication processing via a network such as the internet. A drive 1110 is also connected to the I/O interface 1105 as necessary. A removable medium 1111 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 1110 as necessary, so that a computer program read out therefrom is installed in the storage section 1108 as necessary.
According to embodiments of the present disclosure, method flows according to embodiments of the present disclosure may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 1109 and/or installed from the removable medium 1111. The computer program, when executed by the processor 1101, performs the above-described functions defined in the system of the embodiment of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example and without limitation: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 1102 and/or the RAM 1103 and/or one or more memories other than the ROM 1102 and the RAM 1103 described above.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that various combinations and/or associations of the features recited in the various embodiments and/or claims of the present disclosure can be made, even if such combinations or associations are not expressly recited in the present disclosure. In particular, various combinations and/or associations of the features recited in the various embodiments and/or claims of the present disclosure may be made without departing from the spirit or teaching of the present disclosure. All such combinations and/or associations are within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these embodiments are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (14)

1. A data warehouse information processing method, wherein the data warehouse comprises a plurality of history tables stored in association, the method comprising:
obtaining at least one historical query statement, wherein the historical query statement is used for querying relevant data of a plurality of history tables among the history tables stored in association;
determining a plurality of history tables corresponding to the at least one historical query statement;
generating a target table based on a specific history table among the plurality of history tables, wherein the target table comprises relevant data in the specific history table and serves as a wide table of the data warehouse, and wherein, among the plurality of history tables, a history table whose number of times of being involved in the historical query statement is greater than a second preset threshold is taken as the specific history table;
and acquiring a similarity between the target table and other history tables in the data warehouse, and storing the target table when the similarity is smaller than a third preset threshold.
2. The method of claim 1, wherein the obtaining at least one historical query statement comprises:
acquiring historical operating data of the data warehouse, wherein the historical operating data is data involved when the historical query statement queries relevant data of a plurality of history tables among the history tables stored in association;
determining the at least one historical query statement based on the historical operating data.
3. The method of claim 1, wherein the determining a plurality of historical tables to which the at least one historical query statement corresponds comprises:
analyzing the at least one historical query statement to obtain association information of the at least one historical query statement, wherein the association information comprises associated fields and association conditions;
determining the plurality of historical tables corresponding to the at least one historical query statement based on the association information.
4. The method of claim 1, further comprising:
acquiring, from a plurality of initial historical query statements, a query statement meeting a first preset condition as the at least one historical query statement.
5. The method according to claim 4, wherein the obtaining a query statement satisfying a first preset condition from a plurality of initial historical query statements as the at least one historical query statement comprises:
clustering the plurality of initial historical query statements to obtain at least one query statement group, wherein the similarity between the historical query statements in each query statement group meets a first preset threshold;
determining a query statement group satisfying the first preset condition from the at least one query statement group as a target query statement group, the target query statement group including the at least one historical query statement.
6. The method of claim 5, wherein said clustering said plurality of initial historical query statements into at least one query statement group comprises:
processing the initial historical query statements to obtain vectors corresponding to the initial historical query statements;
and clustering the vectors corresponding to the plurality of initial historical query statements to obtain the at least one query statement group, wherein the at least one query statement group comprises the vectors corresponding to the respective query statements.
7. A data warehouse information processing apparatus, the data warehouse including a plurality of associatively stored history tables therein, the apparatus comprising:
the first acquisition module is used for acquiring at least one historical query statement, and the historical query statement is used for querying related data of a plurality of historical tables in the historical tables which are stored in an associated manner;
the determining module is used for determining a plurality of historical tables corresponding to the at least one historical query statement;
a generation module, configured to generate a target table based on a specific history table among the plurality of history tables, wherein the target table comprises relevant data in the specific history table and serves as a wide table of the data warehouse, and wherein, among the plurality of history tables, a history table whose number of times of being involved in the historical query statement is greater than a second preset threshold is taken as the specific history table;
and the storage module is used for acquiring the similarity between the target table and other historical tables in the data warehouse, and storing the target table under the condition that the similarity between the target table and other historical tables in the data warehouse is smaller than a third preset threshold.
8. The apparatus of claim 7, wherein the obtaining at least one historical query statement comprises:
acquiring historical operating data of the data warehouse, wherein the historical operating data is data involved when the historical query statement queries relevant data of a plurality of history tables among the history tables stored in association;
determining the at least one historical query statement based on the historical operating data.
9. The apparatus of claim 7, wherein the determining a plurality of historical tables to which the at least one historical query statement corresponds comprises:
analyzing the at least one historical query statement to obtain association information of the at least one historical query statement, wherein the association information comprises associated fields and association conditions;
determining the plurality of historical tables corresponding to the at least one historical query statement based on the association information.
10. The apparatus of claim 7, further comprising:
a second acquisition module, configured to acquire, from a plurality of initial historical query statements, a query statement meeting a first preset condition as the at least one historical query statement.
11. The apparatus of claim 10, wherein the obtaining, as the at least one historical query statement, a query statement satisfying a first preset condition from a plurality of initial historical query statements comprises:
clustering the plurality of initial historical query statements to obtain at least one query statement group, wherein the similarity between the historical query statements in each query statement group meets a first preset threshold;
determining a query statement group satisfying the first preset condition from the at least one query statement group as a target query statement group, the target query statement group including the at least one historical query statement.
12. The apparatus of claim 11, wherein the clustering the plurality of initial historical query statements into at least one query statement group comprises:
processing the initial historical query statements to obtain vectors corresponding to the initial historical query statements;
and clustering the vectors corresponding to the plurality of initial historical query statements to obtain the at least one query statement group, wherein the at least one query statement group comprises the vectors corresponding to the respective query statements.
13. A data warehouse information processing system, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-6.
14. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1 to 6.
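
By way of illustration only, the steps recited in claim 1 might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the regular expression standing in for a SQL parser, the reading of "number of times" as the number of query statements that reference a table, the Jaccard overlap of column sets used as the similarity, and all function, variable, and table names are hypothetical and do not come from the disclosure.

```python
import re
from collections import Counter

# Assumption: a regex over FROM/JOIN clauses stands in for a real SQL parser.
TABLE_RE = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def tables_in(query):
    """Return the set of table names referenced by one query statement."""
    return {name.lower() for name in TABLE_RE.findall(query)}


def specific_tables(queries, second_threshold):
    """Tables referenced by more than `second_threshold` query statements."""
    counts = Counter()
    for query in queries:
        counts.update(tables_in(query))  # each table counts once per query
    return {table for table, n in counts.items() if n > second_threshold}


def jaccard(a, b):
    """Assumed similarity measure: overlap of two column sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_target_table(queries, schemas, second_threshold, third_threshold):
    """Return the wide table's column set, or None if it is not stored."""
    chosen = specific_tables(queries, second_threshold)
    if not chosen:
        return None
    # The target (wide) table carries the columns of every specific table.
    target = set()
    for table in chosen:
        target |= schemas.get(table, set())
    # Store only if the target is dissimilar to every other existing table.
    others = [cols for name, cols in schemas.items() if name not in chosen]
    if all(jaccard(target, cols) < third_threshold for cols in others):
        return target
    return None


# Hypothetical historical query statements and table schemas.
queries = [
    "SELECT o.id, u.name FROM orders o JOIN users u ON o.user_id = u.id",
    "SELECT u.name, o.amount FROM users u JOIN orders o ON u.id = o.user_id",
    "SELECT * FROM orders o JOIN items i ON o.id = i.order_id",
]
schemas = {
    "orders": {"id", "user_id", "amount"},
    "users": {"id", "name"},
    "items": {"order_id", "sku"},
}

wide = build_target_table(queries, schemas, second_threshold=1, third_threshold=0.8)
print(sorted(wide))  # ['amount', 'id', 'name', 'user_id']
```

In this sketch, `orders` and `users` are each referenced by more than one query statement and so become the specific history tables, while `items` is referenced only once and is left out. If the warehouse already held a table with the same four columns, the similarity check would return `None` and the wide table would not be stored again.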

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811111998.7A CN109388637B (en) 2018-09-21 2018-09-21 Data warehouse information processing method, device, system and medium


Publications (2)

Publication Number Publication Date
CN109388637A CN109388637A (en) 2019-02-26
CN109388637B CN109388637B (en) 2020-09-01

Family

ID=65417630

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811111998.7A Active CN109388637B (en) 2018-09-21 2018-09-21 Data warehouse information processing method, device, system and medium

Country Status (1)

Country Link
CN (1) CN109388637B (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111694891B (en) * 2019-03-12 2021-01-12 马上消费金融股份有限公司 Data table processing method and device
CN110275920B (en) * 2019-06-27 2021-08-03 中国石油集团东方地球物理勘探有限责任公司 Data query method and device, electronic equipment and computer readable storage medium
CN110781203A (en) * 2019-09-09 2020-02-11 国网电子商务有限公司 Method and device for determining data width table
CN112540978A (en) * 2019-09-23 2021-03-23 北京顺源开华科技有限公司 Wide table generation method and device and electronic equipment
CN110674117A (en) * 2019-09-26 2020-01-10 京东数字科技控股有限公司 Data modeling method and device, computer readable medium and electronic equipment
CN110837507B (en) * 2019-11-08 2022-10-14 土巴兔集团股份有限公司 Dynamic processing method, equipment and storage medium of data table
CN110895533B (en) * 2019-11-29 2023-01-17 北京锐安科技有限公司 A form mapping method, device, computer equipment and storage medium
CN111198918B (en) * 2020-01-17 2022-10-04 国网福建省电力有限公司 Data processing system based on big data platform and link optimization method
CN111399843B (en) * 2020-03-11 2023-08-01 中国邮政储蓄银行股份有限公司 Method, system and electronic equipment for mapping SQL running information to SQL file
CN111694813A (en) * 2020-05-08 2020-09-22 北京明略软件系统有限公司 Data source management method and device
CN111858601B (en) * 2020-07-23 2024-11-29 中国平安财产保险股份有限公司 Tree structure data query method, device, equipment and storage medium
CN111984631A (en) * 2020-09-02 2020-11-24 深圳壹账通智能科技有限公司 Production data migration method, device, computer equipment and storage medium
CN113535817B (en) * 2021-07-13 2024-05-14 浙江网商银行股份有限公司 Feature broad table generation and service processing model training method and device
CN113901074A (en) * 2021-09-26 2022-01-07 广州虎牙科技有限公司 Abnormality determination method and apparatus for ad hoc query, electronic device, and medium
CN114168595B (en) * 2021-12-09 2024-08-27 中国建设银行股份有限公司 Data analysis method and device
CN114238286B (en) * 2022-02-28 2022-08-05 连连(杭州)信息技术有限公司 Data warehouse data processing method and device, electronic equipment and storage medium
CN116049216A (en) * 2022-12-29 2023-05-02 杭州海康威视数字技术股份有限公司 An information query method, device and storage medium
CN116719827B (en) * 2023-06-16 2026-02-03 深圳市跨越新科技有限公司 Wide-table updating method, device, equipment and computer readable storage medium
CN118964363B (en) * 2024-10-17 2025-03-21 宁波紫湾科技有限公司 A data comprehensive analysis method, system, electronic device and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102479223A (en) * 2010-11-25 2012-05-30 中国移动通信集团浙江有限公司 Data query method and system
CN102542009A (en) * 2011-12-14 2012-07-04 中兴通讯股份有限公司 Data querying method and device
CN103064853A (en) * 2011-10-20 2013-04-24 北京百度网讯科技有限公司 Search suggestion generation method, device and system
CN106951552A (en) * 2017-03-27 2017-07-14 重庆邮电大学 A kind of user behavior data processing method based on Hadoop


Also Published As

Publication number Publication date
CN109388637A (en) 2019-02-26

Similar Documents

Publication Publication Date Title
CN109388637B (en) Data warehouse information processing method, device, system and medium
US12056120B2 (en) Deriving metrics from queries
CN111971666B (en) Dimensional context propagation technology for optimizing SQL query plans
US11698918B2 (en) System and method for content-based data visualization using a universal knowledge graph
US9465831B2 (en) System and method for optimizing storage of multi-dimensional data in data storage
US20120246154A1 (en) Aggregating search results based on associating data instances with knowledge base entities
CN114461603A (en) Multi-source heterogeneous data fusion method and device
US20230086966A1 (en) Search systems and methods utilizing search based user clustering
US10628421B2 (en) Managing a single database management system
CN108572963A (en) Information acquisition method and device
US11803865B2 (en) Graph based processing of multidimensional hierarchical data
CN112000773B (en) Search engine technology-based data association relation mining method and application
CN114358636B (en) Indicator configuration method, data acquisition method, device, equipment and medium
CN103154996A (en) Providing information management
CN111159213A (en) Data query method, device, system and storage medium
CN117407414A (en) Method, device, equipment and medium for processing structured query statement
CN117472940A (en) Data blood relationship construction method and device, electronic equipment and storage medium
CN119646022A (en) Log query method, device, equipment, medium and program product
US20240220876A1 (en) Artificial intelligence (ai) based data product provisioning
US12124480B2 (en) Simplified schema generation for data ingestion
US10877998B2 (en) Highly atomized segmented and interrogatable data systems (HASIDS)
US20090300000A1 (en) Method and System For Improved Search Relevance In Business Intelligence systems through Networked Ranking
Monica et al. Survey on big data by coordinating mapreduce to integrate variety of data
CN114116784A (en) Database request evaluation method and device, readable storage medium and electronic equipment
US12505089B1 (en) Systems and methods for hydrating and maintaining data integrity of a data lake

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 221, 2nd floor, Block C, 18 Kechuang 11th Street, Beijing Daxing District, Beijing

Applicant after: Jingdong Digital Technology Holding Co., Ltd.

Address before: Room 221, 2nd floor, Block C, 18 Kechuang 11th Street, Daxing Economic and Technological Development Zone, Beijing, 100176

Applicant before: Beijing Jingdong Financial Technology Holding Co., Ltd.

GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Patentee after: Jingdong Technology Holding Co.,Ltd.

Address before: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Patentee before: JINGDONG DIGITAL TECHNOLOGY HOLDINGS Co.,Ltd.
