Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). Where a convention analogous to "at least one of A, B or C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B or C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase "A or B" should be understood to include the possibilities of "A" or "B", or "A and B".
An embodiment of the present disclosure provides a data warehouse information processing method, where the data warehouse includes a plurality of history tables stored in association with one another. The method includes: obtaining at least one historical query statement, where the historical query statement is used for querying related data of a plurality of history tables among the history tables stored in association; determining the plurality of history tables corresponding to the at least one historical query statement; and generating a target table based on a specific history table among the plurality of history tables, where the target table includes the related data in the specific history table.
Fig. 1 schematically shows a system architecture of a data warehouse information processing method and a data warehouse information processing system according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and perform other processing on the received data such as the user request, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that the data warehouse information processing method provided by the embodiment of the present disclosure may be generally executed by the server 105. Accordingly, the data warehouse information processing apparatus provided by the embodiment of the present disclosure may be generally disposed in the server 105. The data warehouse information processing method provided by the embodiment of the present disclosure may also be executed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the data warehouse information processing apparatus provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
For example, the historical query statement and the history tables of the embodiment of the present disclosure may be stored in the terminal devices 101, 102, 103. The terminal devices 101, 102, 103 may transmit the query statement and the history tables to the server 105, and the server 105 creates the target table based on them; alternatively, the terminal devices 101, 102, 103 may create the target table directly based on the query statement and the history tables. In addition, the query statement and the history tables may also be stored directly in the server 105, in which case the server 105 creates the target table directly based on the query statement and the history tables.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2A to 2C schematically show application scenarios of the data warehouse information processing method and the data warehouse information processing system according to the embodiment of the present disclosure. It should be noted that fig. 2A to 2C are only examples of scenarios in which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, but do not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 2A to 2C, the application scenario 200 may include, for example, a plurality of table schemas in a data warehouse, such as a star schema 210 and a snowflake schema 220, as well as a wide table 230.
According to embodiments of the present disclosure, the star schema 210 and the snowflake schema 220 may be, for example, table schemas for storing data in a data warehouse, each table schema including, for example, a plurality of associated tables. As shown in fig. 2A, the star schema 210 includes, for example, a table 211, a table 212, a table 213, and the like. As shown in fig. 2B, the snowflake schema 220 includes, for example, a table 221, a table 222, a table 223, a table 224, a table 225, a table 226, and the like.
For example, for sales data, the plurality of tables respectively store user information, commodity information, merchant information, and the like.
In the embodiment of the present disclosure, the related tables in the data warehouse are generally queried through query statements, and the data obtained from the related tables is analyzed and mined to support decisions for various services.
Querying a plurality of related tables in the data warehouse through a query statement is a complex process: to query the required data effectively, one needs to know both the information stored in each table and the association relationships between the tables. For example, when sales data needs to be queried, the user information table, the commodity information table, the merchant information table, and the like need to be queried in association.
To make the data more convenient for services to use, a wide table is generally established, for example, a wide table for sales data. The wide table includes data of a plurality of tables, for example, data of the user information table, the commodity information table, and the merchant information table, so that services can query it directly.
The embodiment of the present disclosure obtains query statements that query a plurality of tables and determines the related tables from those query statements. For example, query statements that query the tables 221, 222, 223, 224, 225, 226 are obtained, the tables related to the query statements are determined to be the tables 221, 222, 223, 224, 225, 226, and the related wide table 230 is created based on these tables. As shown in fig. 2C, the wide table 230 includes, for example, data of the plurality of tables.
The embodiment of the present disclosure thus determines a plurality of tables in the data warehouse from the query statements and creates a wide table of the data warehouse based on those tables, thereby automating the construction of the wide table.
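As a non-limiting illustration of what a wide table for sales data might look like, the following minimal sketch uses Python's built-in sqlite3 module. All table names, column names, and rows (user_info, commodity_info, merchant_info, sales, sales_wide) are hypothetical and are not taken from the figures; the sketch only shows dimension attributes being joined into a single table.

```python
import sqlite3

# In-memory database; every table, column, and row below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_info (user_id TEXT PRIMARY KEY, user_name TEXT);
CREATE TABLE commodity_info (commodity_id TEXT PRIMARY KEY, commodity_name TEXT);
CREATE TABLE merchant_info (merchant_id TEXT PRIMARY KEY, merchant_name TEXT);
CREATE TABLE sales (order_id TEXT, user_id TEXT, commodity_id TEXT,
                    merchant_id TEXT, amount REAL);

INSERT INTO user_info VALUES ('u1', 'Alice');
INSERT INTO commodity_info VALUES ('c1', 'Kettle');
INSERT INTO merchant_info VALUES ('m1', 'HomeStore');
INSERT INTO sales VALUES ('o1', 'u1', 'c1', 'm1', 25.0);

-- Wide table: one row per sale, with the dimension attributes joined in.
CREATE TABLE sales_wide AS
SELECT s.order_id, u.user_name, c.commodity_name, m.merchant_name, s.amount
FROM sales s
JOIN user_info u ON s.user_id = u.user_id
JOIN commodity_info c ON s.commodity_id = c.commodity_id
JOIN merchant_info m ON s.merchant_id = m.merchant_id;
""")

# A service can now read everything it needs from a single table.
wide_rows = conn.execute("SELECT * FROM sales_wide").fetchall()
print(wide_rows)
```

With such a table in place, a query for sales data no longer needs to know how the underlying tables are associated.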
Fig. 3A schematically illustrates a flow chart of a data warehouse information processing method according to an embodiment of the present disclosure.
As shown in fig. 3A, the method includes operations S310 to S330.
In the embodiment of the present disclosure, the main function of the data warehouse is to take the large amount of data that business systems accumulate through online transaction processing (OLTP), organize it systematically using the data storage architecture specific to data warehouse theory, and analyze it with various methods such as online analytical processing (OLAP) and data mining, so as to serve systems such as a decision support system. The data warehouse can help a decision maker quickly and effectively extract valuable information from a large amount of data, so as to support decision making, respond quickly to changes in the external environment, and help build business intelligence solutions.
In the embodiment of the present disclosure, in the process of building a data warehouse, dimensional design maps relationships to a set of relational tables, and generally takes one of two forms: the star schema and the snowflake schema. The star schema can be described as a simple star: a central table contains the fact data, and a plurality of tables are distributed radially around the central table and are connected to it by primary keys and foreign keys.
According to an embodiment of the present disclosure, the data warehouse includes a plurality of history tables stored in association. It is understood that the tables described in the embodiment of the present disclosure include data tables for storing data in the data warehouse, where a data table may include a plurality of data columns (or data fields), and different data columns may store data of different field types. Specifically, the plurality of history tables may be, for example, fact tables, dimension tables, or already created wide tables in the data warehouse, and the like.
In operation S310, at least one historical query statement for querying related data of a plurality of historical tables among the associatively stored historical tables is obtained.
According to the embodiment of the present disclosure, the historical query statement may be, for example, a query statement used by business personnel to query data in the data warehouse, and may be used to query related data of a plurality of history tables among the history tables stored in association in the data warehouse. The query statement may be an SQL query statement.
For example, obtaining at least one historical query statement includes: obtaining historical operating data of the data warehouse, the historical operating data being generated when related data of a plurality of history tables among the history tables stored in association is queried through historical query statements; and determining the at least one historical query statement based on the historical operating data.
In the embodiment of the present disclosure, querying the related data of a plurality of history tables through historical query statements generates historical operating data of the data warehouse. The historical operating data may be, for example, the raw operation logs that involve the basic warehouse tables, the basic mart tables, and the user-defined tables in the data warehouse when related data in the data warehouse is queried.
FIG. 3B schematically shows a historical operational data diagram of a data warehouse, according to an embodiment of the disclosure.
The raw operation data of the data warehouse may be, for example, a raw operation log of the data warehouse shown in fig. 3B, which includes historical query statements for querying relevant data in the data warehouse.
In the embodiment of the present disclosure, at least one historical query statement is determined from the operating data. For example, clean, complete, and ordered historical query statements and important related system operation information (including, for example, related warning and error information) are extracted from the raw operation log of the data warehouse.
For example, the relatively disordered operation log may be given a simple cleaning using customized regular expressions. For example, the raw operation log in fig. 3B is cleaned, and the cleaning result is shown in Table 1. The cleaning result includes the historical query statements, for example, their SQL content.
TABLE 1
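By way of illustration only, the regular-expression cleaning described above might be sketched as follows. The log layout in RAW_LOG is a hypothetical stand-in, since the actual format of the log in fig. 3B is not reproduced in the text; the field names (user, query_id, sql) are likewise assumptions.

```python
import re

# Hypothetical raw operation log; the real format of fig. 3B is not known here.
RAW_LOG = """\
2020-01-01 10:00:00 INFO user=u1 query_id=q1 sql=SELECT A,   B FROM ODS.FOO;
2020-01-01 10:00:01 WARN slow query detected
2020-01-01 10:00:02 INFO user=u2 query_id=q2 sql=select x from ods.bar where y = 1;
"""

# One pattern for lines that carry a query, one for warning and error lines.
SQL_RE = re.compile(
    r"^(?P<ts>\S+ \S+) INFO user=(?P<user>\S+) query_id=(?P<qid>\S+) sql=(?P<sql>.+)$"
)
ALERT_RE = re.compile(r"^(?P<ts>\S+ \S+) (?P<level>WARN|ERROR) (?P<msg>.+)$")


def clean_log(raw):
    """Split a raw log into ordered query records and warning/error records."""
    queries, alerts = [], []
    for line in raw.splitlines():
        line = line.strip()
        match = SQL_RE.match(line)
        if match:
            # Collapse repeated whitespace so each statement is on one clean line.
            sql = " ".join(match.group("sql").split())
            queries.append(
                {"query_id": match.group("qid"), "user": match.group("user"), "sql": sql}
            )
            continue
        match = ALERT_RE.match(line)
        if match:
            alerts.append((match.group("level"), match.group("msg")))
    return queries, alerts


queries, alerts = clean_log(RAW_LOG)
for record in queries:
    print(record["query_id"], record["sql"])
```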
In operation S320, a plurality of history tables corresponding to at least one history query statement are determined.
In the embodiment of the present disclosure, the plurality of history tables may be, for example, tables involved in a history query statement, that is, when related data in the data warehouse needs to be queried, data of the plurality of tables in the data warehouse may be queried through the history query statement, where the plurality of tables are history tables corresponding to the history query statement.
Determining a plurality of history tables corresponding to at least one historical query statement includes: analyzing the at least one historical query statement to obtain association information of the at least one historical query statement, where the association information includes association fields and association conditions.
In the embodiment of the present disclosure, the historical query statement may be analyzed to obtain its association information; for example, an SQL parser may be used to analyze the historical query statement.
The SQL parser is a tool that uses metadata information and the query SQL to analyze relationships in the query SQL, such as primary key and foreign key relationships between fields and associations between tables.
For example, the historical query statement is shown in table 2, and the result of analyzing the historical query statement by using the SQL parser is shown in table 3.
The association information may include, for example, an association field and an association condition in the query statement, where the association field may include, for example, an aggregation field, an ordering field, a condition field, a query field, and the like, and the association condition may include, for example, table association, field association.
TABLE 2
| SELECT A, B, Z, COUNT(1) AS CT |
| FROM ODS.FOO FOO |
| INNER JOIN ODS.BAR BAR |
| ON FOO.A = BAR.X AND FOO.B = BAR.Y |
| GROUP BY A, B, Z |
| ORDER BY A, B, Z DESC; |
TABLE 3
According to the embodiment of the disclosure, a plurality of historical tables corresponding to at least one historical query statement are determined based on the association information.
For example, a plurality of history tables corresponding to the query statement may be determined through the association information in table 3, and the plurality of history tables are stored in a data warehouse, for example, the determined plurality of history tables are shown in table 4 and table 5.
The specific workflow of the SQL parser is briefly introduced below.
Fig. 3C schematically illustrates a visualization diagram of an abstract syntax tree according to an embodiment of the present disclosure.
For ease of illustration, a relatively simple historical query statement, shown in Table 6, is used as an example.
TABLE 4
| Field | Field type | Field comment |
| A | STRING | Column A |
| B | STRING | Column B |
| C | STRING | Column C |
| D | STRING | Column D |
TABLE 5
| Field | Field type | Field comment |
| X | STRING | Column X |
| Y | STRING | Column Y |
| Z | STRING | Column Z |
TABLE 6
| SELECT A, Z |
| FROM ODS.FOO FOO |
| INNER JOIN ODS.BAR BAR |
| ON FOO.B = BAR.Y |
| WHERE BAR.Z LIKE 'LEO'; |
The original query SQL (the historical query statement) is first parsed into an SQL abstract syntax tree, which is visualized in fig. 3C.
The association information is then extracted from the constructed SQL abstract syntax tree, for example, table associations (Join Clauses), field associations (Join Clause Conditions), aggregation fields (Group By), sorting fields (Order By), condition fields (Where Clauses), query fields (Query Columns), and the like.
In the embodiment of the present disclosure, the association information obtained by the SQL parser (for example, as shown in Table 3) may be indexed and stored in a query engine for subsequent queries.
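The kind of association information described above can be illustrated with a deliberately simplified sketch. It does not build an abstract syntax tree; it uses regular expressions as a stand-in for the SQL parser and only handles a single SELECT with one join, such as the statement of Table 2. The function name extract_association_info and the keys of the returned dictionary are hypothetical.

```python
import re


def _clause(text, pattern):
    """Return the first captured group of pattern in text, or an empty string."""
    match = re.search(pattern, text, flags=re.IGNORECASE)
    return match.group(1).strip() if match else ""


def _split(text, separator):
    """Split text on a regex separator and drop empty pieces."""
    parts = re.split(separator, text, flags=re.IGNORECASE)
    return [p.strip() for p in parts if p.strip()]


def extract_association_info(sql):
    """Regex-based stand-in for an SQL parser (single SELECT, single join only)."""
    text = " ".join(sql.strip().rstrip(";").split())
    # A clause ends where the next clause keyword starts, or at the end of the text.
    stop = r"(?=\s+(?:WHERE|GROUP\s+BY|ORDER\s+BY)\b|$)"
    return {
        "tables": re.findall(r"\b(?:FROM|JOIN)\s+([\w.]+)", text, flags=re.IGNORECASE),
        "join_conditions": _split(_clause(text, r"\bON\s+(.+?)" + stop), r"\s+AND\s+"),
        "where": _split(_clause(text, r"\bWHERE\s+(.+?)" + stop), r"\s+AND\s+"),
        "group_by": _split(_clause(text, r"\bGROUP\s+BY\s+(.+?)" + stop), r","),
        "order_by": _split(_clause(text, r"\bORDER\s+BY\s+(.+)$"), r","),
        "query_fields": _split(_clause(text, r"\bSELECT\s+(.+?)\s+FROM\b"), r","),
    }


TABLE_2_SQL = """
SELECT A, B, Z, COUNT(1) AS CT
FROM ODS.FOO FOO
INNER JOIN ODS.BAR BAR
ON FOO.A = BAR.X AND FOO.B = BAR.Y
GROUP BY A, B, Z
ORDER BY A, B, Z DESC;
"""

info = extract_association_info(TABLE_2_SQL)
for key, value in info.items():
    print(key, value)
```

A production parser would work on the syntax tree instead, which handles nested queries, multiple joins, and commas inside function calls that this sketch does not.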
In operation S330, a target table is generated based on a specific history table of the plurality of history tables, the target table including related data in the specific history table.
According to an embodiment of the present disclosure, the specific history table is, for example, all or a part of the plurality of history tables. For example, the specific history table is a history table, among the plurality of history tables, that satisfies a second preset threshold. The second preset threshold may be, for example, a threshold on how frequently a history table occurs; that is, the specific history table may be a history table that is involved in the historical query statements a large number of times. A high number of occurrences indicates that the specific history table is queried often, and therefore that business personnel have a large demand for it.
In the embodiment of the present disclosure, a target table of the data warehouse, for example, a wide table of the data warehouse, is created from the specific history table. The target table includes the related data in the specific history table and is convenient for users: querying data from the wide table is easier, so that users can effectively and quickly query and analyze valuable information from the data warehouse and make decisions.
According to the embodiment of the present disclosure, a plurality of history tables are determined based on the historical query statements, and a target table is constructed based on the plurality of history tables, where the target table includes related data of the plurality of history tables and is, for example, a wide table of the data warehouse. The scheme of the embodiment of the present disclosure can thereby optimize the construction of wide tables in the data warehouse, for example, by constructing wide tables automatically.
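As one possible reading of the frequency-based selection, the sketch below counts in how many historical query statements each table appears and keeps the tables whose count reaches a threshold. The input list, the function name select_specific_tables, and the parameter min_count (standing in for the second preset threshold) are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical input: the tables referenced by each historical query statement.
tables_per_query = [
    ["ODS.FOO", "ODS.BAR"],
    ["ODS.FOO", "ODS.BAR"],
    ["ODS.FOO", "ODS.BAZ"],
    ["ODS.QUX"],
]


def select_specific_tables(tables_per_query, min_count):
    """Keep tables referenced by at least min_count query statements."""
    # set() ensures a table is counted once per statement, even if it is joined twice.
    counts = Counter(t for tables in tables_per_query for t in set(tables))
    return sorted(t for t, n in counts.items() if n >= min_count)


specific_tables = select_specific_tables(tables_per_query, min_count=2)
print(specific_tables)
```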
Fig. 4 schematically illustrates a flow chart of a data warehouse information processing method according to another embodiment of the present disclosure.
As shown in fig. 4, the method includes operations S310 to S330 and operation S410. Operations S310 to S330 are the same as or similar to the operations described above with reference to fig. 3A, and are not described again here.
In operation S410, a query statement satisfying a first preset condition is acquired from a plurality of initial historical query statements as at least one historical query statement.
According to the embodiment of the present disclosure, the plurality of initial historical query statements may be, for example, query statements that have been used over a long period to query related data in the data warehouse. The first preset condition may be, for example, that the query statements have high similarity; that is, query statements with high similarity are acquired from the plurality of initial historical query statements as the at least one historical query statement.
According to the embodiment of the present disclosure, acquiring a query statement satisfying a first preset condition from a plurality of initial historical query statements as at least one historical query statement includes: clustering the plurality of initial historical query statements to obtain at least one query statement group, where the similarity between the historical query statements in each query statement group meets a first preset threshold.
In the embodiment of the present disclosure, for example, clusters with high similarity among a plurality of historical query statements may be analyzed offline through a clustering method, where a cluster with high similarity includes, for example, at least one historical query statement. The clusters with high similarity may also be used to check whether redundant wide tables with excessively high similarity exist among the current data warehouse and data mart wide tables. For the existing data warehouse and data mart wide tables and all user-defined ad hoc query statements, a real-valued query index is established from the vectorized real-valued vector and a unique query ID, and is stored in a real-valued vector query engine for queries in subsequent processes.
In the embodiment of the present disclosure, before clustering the plurality of initial historical query statements, the plurality of initial historical query statements may also be preprocessed.
Data preprocessing of the initial historical query statements may include, for example, using the SQL parser to obtain the association information of the initial historical query statements and constructing a semantic mapping based on that association information. Semantic mapping can be understood as follows: for the same concept (for example, a commodity ID), the field name in Table 4 may be A while the field name in Table 5 is X; based on the association information, a unique identifier is determined and used to replace these field names from different tables in the SQL. In addition, the preprocessing includes work such as normalizing SQL syntax and converting letter case, so that code segments with the same semantics also have similar content.
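The preprocessing described above might be sketched as follows. The mapping SEMANTIC_MAP and the identifiers K1 and K2 are hypothetical; they stand for unique identifiers that would be derived from join conditions such as FOO.A = BAR.X. The sketch upper-cases the whole statement, including string literals, which is a simplification.

```python
import re

# Hypothetical semantic mapping: fields joined to each other share one identifier.
SEMANTIC_MAP = {"FOO.A": "K1", "BAR.X": "K1", "FOO.B": "K2", "BAR.Y": "K2"}


def preprocess(sql, semantic_map):
    """Normalize case and whitespace, then replace field names by shared identifiers."""
    text = " ".join(sql.strip().rstrip(";").upper().split())
    # Replace longer names first so a name is never clobbered by one of its prefixes.
    for name in sorted(semantic_map, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(name)}\b", semantic_map[name], text)
    return text


q1 = "select foo.a, foo.b from ods.foo foo inner join ods.bar bar on foo.a = bar.x;"
q2 = "SELECT BAR.X,  FOO.B FROM ODS.FOO FOO INNER JOIN ODS.BAR BAR ON FOO.A = BAR.X"

normalized_1 = preprocess(q1, SEMANTIC_MAP)
normalized_2 = preprocess(q2, SEMANTIC_MAP)
print(normalized_1)
print(normalized_1 == normalized_2)
```

The two statements differ in case, spacing, and which side of the join they select from, yet they normalize to the same text, which is what makes a later similarity comparison meaningful.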
In an embodiment of the present disclosure, clustering a plurality of initial historical query statements to obtain at least one query statement group includes the following.
The plurality of initial historical query statements are processed to obtain vectors corresponding to the plurality of initial historical query statements.
For example, before the plurality of initial historical query statements are clustered, the query statements need to be vectorized, and the clustering is then performed on the vectorized query statements. For example, each initial historical query statement is converted into a real-valued vector using a natural language vectorization method such as Word2Vec, Sequence2Vec, or Document2Vec. One example of the conversion is shown in Table 7.
TABLE 7
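To make the idea of converting a query statement into a real-valued vector concrete, the following sketch uses a simple token-count (bag-of-tokens) representation. This is a swapped-in stand-in chosen so the example needs no external library; it is not Word2Vec or any of the embedding methods named above, and the function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(sql):
    """Split a statement into identifiers, numbers, and single punctuation marks."""
    return re.findall(r"[A-Za-z_][\w.]*|\d+|[^\s\w]", sql.upper())


def build_vocabulary(statements):
    """Sorted list of all distinct tokens; its order fixes the vector dimensions."""
    return sorted({tok for s in statements for tok in tokenize(s)})


def vectorize(sql, vocabulary):
    """Real-valued vector holding the count of each vocabulary token."""
    counts = Counter(tokenize(sql))
    return [float(counts[tok]) for tok in vocabulary]


statements = ["SELECT A FROM FOO", "SELECT A, B FROM FOO"]
vocab = build_vocabulary(statements)
vectors = [vectorize(s, vocab) for s in statements]
print(vocab)
print(vectors)
```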
The vectors corresponding to the plurality of initial historical query statements are clustered to obtain at least one query statement group, where each query statement group includes the vectors corresponding to its query statements.
For example, the vectors corresponding to the initial historical query statements are clustered to obtain at least one query statement group, each query statement group includes the vectors of a plurality of query statements, and the query statements in each query statement group have a certain similarity, which may be, for example, a preset similarity set as required.
A query statement group meeting the first preset condition is determined from the at least one query statement group as a target query statement group, where the target query statement group includes the at least one historical query statement.
In the embodiment of the present disclosure, the first preset condition may be, for example, a preset quantity value: each query statement group has a quantity value, namely the number of query statements it contains, and when that quantity value satisfies the preset quantity value, the query statement group may be regarded as the target query statement group. The target query statement group includes, for example, a plurality of historical query statements.
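The clustering and the selection of a target query statement group can be illustrated with the sketch below. The text does not name a clustering algorithm, so a greedy "leader" clustering on cosine similarity is assumed here for brevity; it only compares each vector with the first member of a group, which is a simplification. The vectors, the parameter first_threshold, and the minimum group size of 3 are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster(vectors, first_threshold):
    """Greedy leader clustering: join the first group whose leader is similar enough."""
    groups = []  # each group is a list of vector indices; group[0] is its leader
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine(vec, vectors[group[0]]) >= first_threshold:
                group.append(i)
                break
        else:
            groups.append([i])  # no similar leader found: start a new group
    return groups


# Hypothetical query vectors; indices 0, 1, and 3 point in nearly the same direction.
vectors = [[1.0, 1.0, 0.0], [1.0, 0.9, 0.0], [0.0, 0.1, 1.0], [1.0, 1.0, 0.1]]
groups = cluster(vectors, first_threshold=0.95)

# Assumed form of the first preset condition: the group holds at least 3 statements.
target_groups = [g for g in groups if len(g) >= 3]
print(groups)
print(target_groups)
```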
Fig. 5 schematically illustrates a flow chart of a data warehouse information processing method according to yet another embodiment of the present disclosure.
As shown in fig. 5, the method includes operations S310 to S330, operation S410, and operation S510. Operations S310 to S330 are the same as or similar to the operations described above with reference to fig. 3A, and operation S410 is the same as or similar to the operation described above with reference to fig. 4; they are not repeated here.
In operation S510, the target table is stored in the case where the target table satisfies a second preset condition. For example, the similarity between the target table and other history tables in the data warehouse is acquired, and the target table is stored in the case where the similarity meets a third preset threshold.
According to the embodiment of the present disclosure, the second preset condition may be, for example, that the target table meets a preset similarity requirement. For example, the similarity between the target table and other history tables in the data warehouse is determined, and the target table is stored when the similarity meets a third preset threshold, where the third preset threshold may be, for example, a specific value. This avoids the data redundancy that highly similar tables in the data warehouse would cause.
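One way to picture the storage check is sketched below. Two points are assumptions rather than statements of the text: table similarity is measured as the Jaccard similarity of column sets, and "meets the third preset threshold" is read as "the highest similarity to any existing table stays below the threshold". The table names and the threshold value 0.8 are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two column collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def should_store(target_columns, existing_tables, third_threshold):
    """Store the target table only if no existing table is too similar to it."""
    highest = max(
        (jaccard(target_columns, cols) for cols in existing_tables.values()),
        default=0.0,
    )
    return highest < third_threshold


# Hypothetical existing wide tables, described by their column names.
existing = {"wide_sales": ["A", "B", "X", "Y"], "wide_users": ["U", "V"]}

store_new = should_store(["A", "B", "Z"], existing, third_threshold=0.8)
store_dup = should_store(["A", "B", "X", "Y"], existing, third_threshold=0.8)
print(store_new, store_dup)
```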
FIG. 6 schematically illustrates a data warehouse wide table building flow diagram according to an embodiment of the disclosure.
As shown in fig. 6, the embodiment of the present disclosure discloses an automatic construction method of a data warehouse wide table based on information extraction, and the whole construction method includes operations S610 to S650.
In operation S610, related unstructured data, such as query SQL and query logs, is collected for the basic warehouse tables, basic mart tables, and user-defined tables in the warehouse layer, the mart layer, and the user layer, which are obtained by summarizing the data of the integration layer.
In operation S620, a customized SQL parser is used to parse related relationships between different tables, such as primary key and foreign key relationships, associated queries, and aliases, from the query SQL and the query logs, and a query engine for the related data is established.
In operation S630, a user-defined SQL vectorization method (SQL2Vec) is used to mine similar queries from the query SQL, the query logs, and the constructed related data index, and a historical query SQL real-valued vector query engine and a query SQL similarity clustering result are established.
In operation S640, the constructed related data engine, the historical query SQL real-valued vector query engine, and the query SQL similarity clustering result are used to count and summarize information such as data fields and data tables with high co-occurrence frequency, so as to generate a candidate template of the new data warehouse wide table.
In operation S650, the final new data warehouse wide table is obtained and solidified according to the candidate template of the new data warehouse wide table and the advice of business experts.
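The co-occurrence counting of operation S640 might, under simple assumptions, look like the sketch below: for the queries of one similarity cluster, pairs of fields that appear together are counted, and fields belonging to a frequent pair become columns of the candidate template. The input data, the function name candidate_template, and the parameter min_cooccurrence are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical parser output: the fields used by each query of one cluster.
cluster_fields = [
    ["FOO.A", "FOO.B", "BAR.Z"],
    ["FOO.A", "BAR.Z"],
    ["FOO.A", "FOO.B", "BAR.Z"],
    ["FOO.C"],
]


def candidate_template(cluster_fields, min_cooccurrence):
    """Collect fields that co-occur with another field in enough queries."""
    pair_counts = Counter()
    for fields in cluster_fields:
        # Sorting makes (a, b) and (b, a) count as the same pair.
        pair_counts.update(combinations(sorted(set(fields)), 2))
    selected = set()
    for (a, b), n in pair_counts.items():
        if n >= min_cooccurrence:
            selected.update((a, b))
    return sorted(selected)


template = candidate_template(cluster_fields, min_cooccurrence=2)
print(template)
```

The resulting list is only a candidate; in the flow of fig. 6 it is still subject to the advice of business experts in operation S650.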
FIG. 7 schematically illustrates a data warehouse wide table candidate template generation and review flow diagram according to an embodiment of the disclosure.
As shown in fig. 7, in the embodiment of the present disclosure, the last process of the automated data warehouse wide table construction scheme is the generation of candidate templates of the data warehouse wide table, and the auditing by the business experts and the solidification of the final new data warehouse wide table. The flow includes operations S710 to S790.
In operation S710, for the new query SQL for customization, a vectorization result thereof is obtained through the preprocessing in the above flow.
In operation S720, the vectorized result is added to the historical query SQL real-valued vector query engine.
In operation S730, the query SQL similarity clustering result is updated.
In operation S740, the updated historical query SQL real-valued vector data and the query SQL similarity clustering result are periodically transferred to the trigger, and the trigger determines whether to generate a new data warehouse wide-table template according to a defined rule. In the trigger, the core trigger rule may be understood as that when a large number of new queries have a high similarity degree to be aggregated into a cluster, and at the same time, the similarity degree with all queries in the existing database table is less than a certain value, then information such as data tables and data fields with high co-occurrence frequency is extracted from the new queries aggregated into a cluster.
In operation S750, a template of a new data warehouse wide table is generated based on the extracted information such as the data table and the data field having the higher co-occurrence frequency.
In operation S760, after a new data warehouse wide table template is generated, an audit process is triggered, and a data warehouse expert audits and corrects the template.
In operation S770, the corrected wide table template is finally solidified into a new data warehouse wide table.
In operation S780, the relevant information of the new data warehouse wide table is updated to the historical query SQL real-valued vector query engine.
In operation S790, the relevant information of the new data warehouse wide table is updated to the query SQL similarity clustering result.
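The trigger rule of operation S740 and the template generation of operation S750 may be sketched as follows. This is a minimal illustration only, not the implementation of the disclosure: the function names `should_trigger` and `candidate_template`, the use of cosine similarity, and all threshold values (`min_cluster_size`, `intra_sim`, `max_existing_sim`, `min_share`) are assumptions introduced for the example.

```python
from collections import Counter
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length real-valued vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def should_trigger(new_vectors, existing_vectors, min_cluster_size=3,
                   intra_sim=0.8, max_existing_sim=0.5):
    """Assumed trigger rule: enough new queries, all mutually similar,
    and all dissimilar to every existing query vector."""
    if len(new_vectors) < min_cluster_size:
        return False
    tight = all(cosine(u, v) >= intra_sim
                for u, v in combinations(new_vectors, 2))
    novel = all(cosine(u, e) < max_existing_sim
                for u in new_vectors for e in existing_vectors)
    return tight and novel


def candidate_template(cluster_fields, min_share=0.6):
    """Keep the 'table.field' names that occur in at least `min_share`
    of the queries of the cluster (high co-occurrence frequency)."""
    counts = Counter(f for fields in cluster_fields for f in fields)
    cutoff = min_share * len(cluster_fields)
    return sorted(f for f, c in counts.items() if c >= cutoff)


new_vectors = [[1, 0, 1], [1, 0, 0.9], [0.9, 0, 1]]
existing_vectors = [[0, 1, 0]]
if should_trigger(new_vectors, existing_vectors):
    fields = [{"orders.id", "orders.amount", "users.id"},
              {"orders.id", "users.id"},
              {"orders.id", "users.name"}]
    print(candidate_template(fields))
```

In this sketch the template is only a sorted list of field names; the audit and solidification of operations S760 and S770 remain manual steps and are not modeled.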
Fig. 8 schematically shows a block diagram of a data warehouse information processing apparatus according to an embodiment of the present disclosure.
As shown in FIG. 8, the data warehouse information processing apparatus 800 includes a first obtaining module 810, a determining module 820, and a generating module 830.
The first obtaining module 810 can obtain at least one historical query statement, the historical query statement being used to perform an associated query on relevant data of a plurality of history tables among the stored history tables.
According to an embodiment of the present disclosure, obtaining at least one historical query statement includes: obtaining historical operating data of a data warehouse, the historical operating data involving associated queries, made through historical query statements, on relevant data of a plurality of history tables among the stored history tables; and determining the at least one historical query statement based on the historical operating data.
According to an embodiment of the present disclosure, the first obtaining module 810 may perform, for example, the operation S310 described above with reference to fig. 3A, which is not described herein again.
The determination module 820 may determine a plurality of historical tables corresponding to at least one historical query statement.
According to the embodiment of the disclosure, determining a plurality of history tables corresponding to at least one history query statement comprises: analyzing the at least one historical query statement to obtain associated information of the at least one historical query statement, wherein the associated information comprises associated fields and associated conditions, and determining a plurality of historical tables corresponding to the at least one historical query statement based on the associated information.
According to an embodiment of the present disclosure, the determining module 820 may perform, for example, the operation S320 described above with reference to fig. 3A, which is not described herein again.
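The analysis performed by the determining module 820 may be illustrated with the following sketch, which extracts the tables, associated fields, and associated conditions from a query statement. It is a simplified example under stated assumptions: the function name `parse_association` is hypothetical, the regular expressions handle only plain `FROM`/`JOIN` clauses with optional aliases and equality conditions between qualified columns, and a production system would more likely rely on a full SQL parser.

```python
import re

# Keywords that may follow a table name and must not be read as an alias.
_KEYWORDS = "JOIN|ON|WHERE|INNER|LEFT|RIGHT|FULL|CROSS|GROUP|ORDER"
TABLE_RE = re.compile(
    rf"\b(?:FROM|JOIN)\s+(\w+)(?:\s+(?:AS\s+)?(?!(?:{_KEYWORDS})\b)(\w+))?",
    re.IGNORECASE,
)
# Equality between two qualified columns, e.g. "o.user_id = u.id".
COND_RE = re.compile(r"(\w+)\.(\w+)\s*=\s*(\w+)\.(\w+)")


def parse_association(sql):
    """Return the tables and the associated conditions of one query."""
    alias_to_table = {}
    for table, alias in TABLE_RE.findall(sql):
        alias_to_table[table] = table
        if alias:
            alias_to_table[alias] = table
    conditions = []
    for left_q, left_f, right_q, right_f in COND_RE.findall(sql):
        # Ignore matches whose qualifiers are not known tables or aliases.
        if left_q in alias_to_table and right_q in alias_to_table:
            conditions.append((f"{alias_to_table[left_q]}.{left_f}",
                               f"{alias_to_table[right_q]}.{right_f}"))
    return {"tables": sorted(set(alias_to_table.values())),
            "conditions": conditions}


sql = ("SELECT o.id, u.name FROM orders o JOIN users AS u "
       "ON o.user_id = u.id WHERE o.amount > 10")
print(parse_association(sql))
```

The associated fields are the two sides of each returned condition, and the history tables corresponding to the statement are the entries of `tables`.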
The generation module 830 can generate a target table based on a particular history table of the plurality of history tables, the target table including relevant data in the particular history table.
According to an embodiment of the present disclosure, the specific history table is a history table that satisfies a second preset threshold value among the plurality of history tables.
According to an embodiment of the present disclosure, the generating module 830 may perform, for example, the operation S330 described above with reference to fig. 3A, which is not described herein again.
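One possible way for the generating module 830 to select the specific history tables and assemble a target table is sketched below. The passage does not state what quantity the second preset threshold applies to; the sketch assumes it is the number of historical query statements that reference a table. The names `select_specific_tables` and `build_target_table`, and the convention of prefixing each column with its table name, are likewise assumptions for illustration.

```python
from collections import Counter


def select_specific_tables(tables_per_query, threshold):
    """Keep the tables referenced by at least `threshold` queries
    (assumed meaning of the second preset threshold)."""
    counts = Counter(t for tables in tables_per_query for t in set(tables))
    return sorted(t for t, c in counts.items() if c >= threshold)


def build_target_table(schemas, specific_tables):
    """Column list of the wide table: every column of every selected
    table, prefixed with the table name to avoid name clashes."""
    return [f"{t}_{col}" for t in specific_tables for col in schemas[t]]


tables_per_query = [["orders", "users"],
                    ["orders", "users", "items"],
                    ["orders"]]
schemas = {"orders": ["id", "amount"],
           "users": ["id", "name"],
           "items": ["id"]}
specific = select_specific_tables(tables_per_query, threshold=2)
print(build_target_table(schemas, specific))
```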
Fig. 9 schematically shows a block diagram of a data warehouse information processing apparatus according to another embodiment of the present disclosure.
As shown in FIG. 9, the data warehouse information processing apparatus 900 includes a first obtaining module 810, a determining module 820, a generating module 830, and a second obtaining module 910. The first obtaining module 810, the determining module 820 and the generating module 830 are the same as or similar to the modules described above with reference to FIG. 8, and are not described herein again.
The second obtaining module 910 may obtain, as at least one historical query statement, a query statement satisfying a first preset condition from among the plurality of initial historical query statements.
According to an embodiment of the present disclosure, obtaining a query statement satisfying a first preset condition from a plurality of initial historical query statements as at least one historical query statement includes: clustering the plurality of initial historical query statements to obtain at least one query statement group, wherein the similarity between the historical query statements in each query statement group meets a first preset threshold; and determining, from the at least one query statement group, a query statement group meeting the first preset condition as a target query statement group, wherein the target query statement group comprises the at least one historical query statement.
According to an embodiment of the present disclosure, clustering a plurality of initial historical query statements to obtain at least one query statement group includes: processing the plurality of initial historical query statements to obtain vectors corresponding to the plurality of initial historical query statements, and clustering the vectors to obtain the at least one query statement group, wherein each query statement group comprises the vectors corresponding to the query statements in that group.
According to the embodiment of the present disclosure, the second obtaining module 910 may, for example, perform operation S410 described above with reference to fig. 4, which is not described herein again.
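The vectorization and clustering performed by the second obtaining module 910 may be illustrated as follows. The vectorization method and clustering algorithm are not specified in this passage, so the sketch substitutes simple stand-ins: token-count vectors over a shared vocabulary, cosine similarity, and a greedy leader-style clustering in which a statement joins the first group whose first member is similar enough. The names `vectorize` and `cluster`, the threshold value, and the reading of the first preset condition as a minimum group size are all assumptions.

```python
import re
from collections import Counter


def vectorize(sql, vocabulary):
    """Token-count vector of one query over a fixed vocabulary."""
    tokens = Counter(re.findall(r"\w+", sql.lower()))
    return [tokens[word] for word in vocabulary]


def cosine(u, v):
    """Cosine similarity of two equal-length real-valued vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(vectors, threshold):
    """Greedy clustering: each vector joins the first group whose leader
    (first member) has similarity >= threshold, else starts a new group.
    Returns groups as lists of indices into `vectors`."""
    groups = []
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine(vectors[group[0]], vec) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


queries = ["select id from orders join users",
           "select name from orders join users",
           "select sku from items"]
vocabulary = sorted({w for q in queries for w in re.findall(r"\w+", q.lower())})
vectors = [vectorize(q, vocabulary) for q in queries]
groups = cluster(vectors, threshold=0.7)
# Assumed first preset condition: a target group has at least two members.
target_groups = [g for g in groups if len(g) >= 2]
print(groups, target_groups)
```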
Fig. 10 schematically shows a block diagram of a data warehouse information processing apparatus according to still another embodiment of the present disclosure.
As shown in FIG. 10, the data warehouse information processing apparatus 1000 includes a first obtaining module 810, a determining module 820, a generating module 830, a second obtaining module 910, and a storage module 1010. The first obtaining module 810, the determining module 820 and the generating module 830 are the same as or similar to the modules described above with reference to FIG. 8, and are not described herein again. The second obtaining module 910 is the same as or similar to the module described above with reference to FIG. 9, and is not described herein again.
The storage module 1010 may store the target table in a case where the target table satisfies a second preset condition.
According to an embodiment of the present disclosure, storing the target table in the case that the target table satisfies the second preset condition includes: acquiring the similarity between the target table and other history tables in the data warehouse, and storing the target table in the case that the similarity meets a third preset threshold.
According to the embodiment of the present disclosure, the storage module 1010 may perform, for example, the operation S510 described above with reference to fig. 5, which is not described herein again.
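The check performed by the storage module 1010 may be sketched as follows. The similarity measure is not specified in this passage; the sketch assumes Jaccard similarity over column names and interprets "meets a third preset threshold" as "does not exceed a maximum similarity", so that a target table nearly identical to an existing table is not stored again. The names `jaccard`, `store_if_novel`, and `max_similarity` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of column names."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a or b else 0.0


def store_if_novel(target_cols, warehouse, name, max_similarity=0.8):
    """Store the target table in `warehouse` (a dict of table name to
    column list) only if it is not too similar to any existing table."""
    highest = max((jaccard(target_cols, cols) for cols in warehouse.values()),
                  default=0.0)
    if highest <= max_similarity:
        warehouse[name] = list(target_cols)
        return True
    return False


warehouse = {"wide_a": ["orders_id", "orders_amount", "users_id"]}
stored = store_if_novel(
    ["orders_id", "orders_amount", "users_id", "users_name"],
    warehouse, "wide_b")
print(stored, sorted(warehouse))
```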
Any number of modules, sub-modules, units, sub-units, or at least part of the functionality of any number thereof according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in any other reasonable manner of hardware or firmware by integrating or packaging a circuit, or in any one of or a suitable combination of software, hardware, and firmware implementations. Alternatively, one or more of the modules, sub-modules, units, sub-units according to embodiments of the disclosure may be at least partially implemented as a computer program module, which when executed may perform the corresponding functions.
For example, any plurality of the first obtaining module 810, the determining module 820, the generating module 830, the second obtaining module 910, and the storing module 1010 may be combined and implemented in one module, or any one of the modules may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the first obtaining module 810, the determining module 820, the generating module 830, the second obtaining module 910, and the storing module 1010 may be at least partially implemented as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented by any one of three implementations of software, hardware, and firmware, or an appropriate combination of any several of them. Alternatively, at least one of the first obtaining module 810, the determining module 820, the generating module 830, the second obtaining module 910 and the storing module 1010 may be at least partially implemented as a computer program module, which when executed may perform a corresponding function.
FIG. 11 schematically illustrates a block diagram of a computer system suitable for data warehouse information processing, in accordance with an embodiment of the present disclosure. The computer system illustrated in FIG. 11 is only one example and should not impose any limitations on the scope of use or functionality of embodiments of the disclosure.
As shown in FIG. 11, a computer system 1100 according to an embodiment of the present disclosure includes a processor 1101, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1102 or a program loaded from a storage section 1108 into a Random Access Memory (RAM) 1103. The processor 1101 may comprise, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 1101 may also include on-board memory for caching purposes. The processor 1101 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to the embodiments of the present disclosure.
In the RAM 1103, various programs and data necessary for the operation of the system 1100 are stored. The processor 1101, the ROM 1102, and the RAM 1103 are connected to each other by a bus 1104. The processor 1101 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 1102 and/or the RAM 1103. It is noted that the programs may also be stored in one or more memories other than the ROM 1102 and the RAM 1103. The processor 1101 may also perform various operations of the method flows according to the embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the system 1100 may also include an input/output (I/O) interface 1105, which is also connected to the bus 1104. The system 1100 may also include one or more of the following components connected to the I/O interface 1105: an input section 1106 including a keyboard, a mouse, and the like; an output section 1107 including a signal output unit such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage section 1108 including a hard disk and the like; and a communication section 1109 including a network interface card such as a LAN card, a modem, or the like. The communication section 1109 performs communication processing via a network such as the Internet. A drive 1110 is also connected to the I/O interface 1105 as necessary. A removable medium 1111 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 1110 as necessary, so that a computer program read out therefrom is installed in the storage section 1108 as necessary.
According to embodiments of the present disclosure, method flows according to embodiments of the present disclosure may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 1109 and/or installed from the removable medium 1111. The computer program, when executed by the processor 1101, performs the above-described functions defined in the system of the embodiment of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example and without limitation: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 1102 and/or the RAM 1103 and/or one or more memories other than the ROM 1102 and the RAM 1103 described above.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure can be combined and/or associated in various ways, even if such combinations or associations are not expressly recited in the present disclosure. In particular, various combinations and/or associations of the features recited in the various embodiments and/or claims of the present disclosure may be made without departing from the spirit or teaching of the present disclosure. All such combinations and/or associations are within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.