CN113094444A - Data processing method, data processing apparatus, computer device, and medium - Google Patents

Data processing method, data processing apparatus, computer device, and medium

Publication number
CN113094444A
Authority
CN
China
Prior art keywords
aggregation
tables
screening
dimension
query request
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010024591.1A
Other languages
Chinese (zh)
Other versions
CN113094444B (en)
Inventor
杨文波
李铭浩
沈俊杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Wodong Tianjun Information Technology Co Ltd filed Critical Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202010024591.1A priority Critical patent/CN113094444B/en
Publication of CN113094444A publication Critical patent/CN113094444A/en
Application granted granted Critical
Publication of CN113094444B publication Critical patent/CN113094444B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/283Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2282Tablespace storage structures; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a data processing method, including: acquiring a data source, wherein the data source comprises at least one dimension item and at least one index item; aggregating the data source multiple times to obtain a plurality of aggregation tables; receiving a query request from a client, and screening, from the plurality of aggregation tables, an aggregation table that matches the query request; and sending the screened aggregation table that matches the query request to the client. The present disclosure also provides a data processing apparatus, a computer device, and a computer-readable storage medium.

Description

Data processing method, data processing apparatus, computer device, and medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data processing method, a data processing apparatus, a computer device, and a medium.
Background
In the related art, a user must specify a data source and an aggregation table when sending a query request to a data warehouse. As business requirements change, however, the dimensions that users analyze also change, so multiple aggregation tables end up being aggregated from a single data source. The data tables of an existing data warehouse can be regarded as materialized views or indexes in a database. Each time a user sends a query request, the user must select, from a large number of returned aggregation tables, a table that satisfies the query and offers the best query performance, which requires the user to understand the structure and model of the data tables clearly. This increases both the learning cost for the user and the maintenance cost for the server.
Disclosure of Invention
In view of the above, the present disclosure provides a data processing method, a data processing apparatus, a computer device, and a medium.
One aspect of the present disclosure provides a data processing method, including: acquiring a data source, wherein the data source comprises at least one dimension item and at least one index item; aggregating the data source multiple times to obtain a plurality of aggregation tables; receiving a query request from a client, and screening, from the plurality of aggregation tables, an aggregation table that matches the query request; and sending the screened aggregation table that matches the query request to the client.
According to an embodiment of the present disclosure, aggregating the data source multiple times to obtain a plurality of aggregation tables includes: for any one of the multiple aggregations, determining a dimension item for that aggregation from the at least one dimension item, determining an index item for that aggregation from the at least one index item, and determining an aggregation function for that aggregation; and aggregating the data source based on the dimension item, the index item, and the aggregation function for that aggregation to obtain an aggregation table for that aggregation.
According to an embodiment of the present disclosure, the data source undergoes a data update and a version update every predetermined period. The version of any one of the plurality of aggregation tables is the same as the version of the data source from which that aggregation table was aggregated.
According to an embodiment of the present disclosure, the query request includes a query version. Screening, from the plurality of aggregation tables, an aggregation table that matches the query request includes: screening the plurality of aggregation tables according to the query version to obtain the aggregation tables that pass a first screening.
According to an embodiment of the present disclosure, the query request further includes: a specified dimension item and a specified index item. The step of screening the aggregation tables meeting the query request from the multiple aggregation tables further includes: and screening the aggregation tables containing the specified dimension items and the specified index items from the aggregation tables subjected to the first screening to obtain aggregation tables subjected to the second screening.
According to the embodiment of the disclosure, the query request further includes a filter condition, and the filter condition includes the specified dimension item and a value of the specified dimension item. The step of screening the aggregation tables meeting the query request from the multiple aggregation tables further includes: for any aggregation table in the aggregation tables subjected to the second screening, searching the dimension items in the any aggregation table according to a preset sequence. And matching each searched dimension item with the specified dimension item, if the matching is successful, determining the weight of the dimension item, continuing to search the next dimension item, and if the matching is failed, ending the search. A score for any of the aggregated tables is then determined based on the determined weights of the dimension items. After determining the scores of the aggregation tables subjected to the second screening, the aggregation table having the highest score is screened as the aggregation table subjected to the third screening.
According to an embodiment of the present disclosure, the screening the aggregation tables meeting the query request from the multiple aggregation tables further includes: and screening the aggregation table with the minimum dimension item number and/or the minimum data number from the aggregation tables subjected to the third screening.
Another aspect of the present disclosure provides a data processing apparatus including an acquisition module, an aggregation module, a receiving module, a screening module, and a sending module. The acquisition module is configured to acquire a data source, wherein the data source comprises at least one dimension item and at least one index item. The aggregation module is configured to aggregate the data source multiple times to obtain a plurality of aggregation tables, wherein the dimension items of the aggregation tables differ and/or the index items of the aggregation tables differ. The receiving module is configured to receive a query request from a client. The screening module is configured to screen, from the plurality of aggregation tables, an aggregation table that matches the query request. The sending module is configured to send the aggregation table that matches the query request to the client.
Another aspect of the present disclosure provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method as described above when executing the program.
Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions for implementing the method as described above when executed.
Another aspect of the disclosure provides a computer program comprising computer executable instructions for implementing the method as described above when executed.
According to the embodiment of the disclosure, when a user needs to query aggregation tables related to certain dimensions and certain indexes, the aggregation table that best matches the query request is screened, based on the query request, from a plurality of aggregation tables obtained by preprocessing, and the screened aggregation table is then returned to the client. The server does not need to return a plurality of aggregation tables to the client as in the related art, so the user does not need to select from among returned aggregation tables. The user can reach the data of interest as quickly as possible without incurring a large learning and usage cost, which meets the user's requirement.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:
FIG. 1 schematically shows an exemplary system architecture to which a data processing method and a data processing apparatus may be applied, according to an embodiment of the present disclosure;
FIG. 2 schematically shows a flow chart of a data processing method according to an embodiment of the present disclosure;
FIG. 3 schematically shows a flow chart of a data processing method according to another embodiment of the present disclosure;
FIG. 4 schematically shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure; and
FIG. 5 schematically shows a block diagram of a computer device according to an embodiment of the disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, C together, etc.). Where a convention analogous to "at least one of A, B or C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B or C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, C together, etc.).
The embodiment of the disclosure provides a data processing method and a data processing device. The method includes a preparation process, a reception process, and a routing process. The preparation process may be divided into an acquisition process and an aggregation process. In the acquisition process, a data source is acquired. Wherein the data source comprises at least one dimension item and at least one index item. In the aggregation process, the acquired data sources are aggregated for multiple times respectively to obtain multiple aggregation tables. And the dimension items of the aggregation tables are different, and/or the index items of the aggregation tables are different. After the preparation is finished, the receiving process can be carried out. In the receiving process, a query request from a client is received. Then, a routing process is performed, and the routing process can be divided into a screening process and a sending process. In the screening process, the aggregation tables which accord with the query request are screened from the aggregation tables. And in the sending process, sending the aggregation table which is obtained by screening and accords with the query request to the corresponding client.
Fig. 1 schematically shows an exemplary system architecture 100 to which the data processing method and data processing apparatus may be applied, according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 1, the system architecture 100 may include a terminal device 101, a network 102, a primary server 103, and secondary servers 104, 105, and 106, where different servers in different levels or in the same level may provide different service-related data.
Network 102 is a medium used to provide a communication link between terminal device 101 and primary server 103. Network 102 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few. The primary server 103 may communicate with the secondary servers 104, 105, 106 via various wired or wireless communication links.
The terminal device 101 may be any of various electronic devices including, but not limited to, a smart phone, a personal computer, a tablet computer, a smart watch, and the like, without limitation.
A client application (hereinafter referred to as a "client") having various functions may be installed in the terminal apparatus 101. The functional support of each client in the terminal device 101 can be broken down into various tiers of servers. For example, one client in the terminal device 101 has an advertisement delivery function, the user sends an advertisement delivery request to the primary server 103 through the terminal device 101, and the primary server 103 delivers advertisements of different services through different secondary servers. For users who have delivered advertisements, they frequently check the delivery effect of the advertisements after the advertisements are delivered. The primary server 103 needs to collect data capable of representing the advertisement putting effect from each secondary server 104, 105, 106, and push the data to the terminal device 101 in the form of a report so as to be displayed to the user. In this example, the advertisement delivery service scenario is taken as an example, and in other examples, the method may be applied to various service scenarios, which is not limited herein.
It should be understood that the terminal devices, the network, the number of primary and secondary servers, and the number of tiers of servers in fig. 1 are merely illustrative. Any number may be provided according to practical requirements.
Currently, Online Analytical Processing (OLAP) technology is widely applied to data analysis and intelligent decision making, and the OLAP technology can be applied to the process of collecting data and forming a report.
OLAP technology mainly uses a data warehouse to store and query data. Although the architectural design and implementation details of data warehouses differ greatly across application scenarios, they all allow users to analyze multidimensional data from various angles. The most commonly used operation is Roll-Up, which aggregates data along a certain dimension according to a certain rule. Aggregating along a dimension means moving up a hierarchy, from a child dimension to a parent dimension. In engineering practice, in order to meet the requirements of different analysis dimensions, multiple Roll-Up tables (also referred to as "aggregation tables") are usually created based on the same data source; they have different table schemas and are distinguished by different names. When a user wants to analyze data from a new dimension and no existing Roll-Up table meets the requirement, a new Roll-Up table must be established based on that dimension, and a new request must be constructed with the new Roll-Up table name to acquire the data.
When a user sends a query request to a data warehouse, a data source and a Roll-Up table need to be specified. As business requirements change, however, the dimensions that users analyze also change, and multiple aggregation tables may be aggregated from one data source. The Roll-Up tables of an existing data warehouse can be regarded as materialized views or indexes in a database. Each time a user sends a query request, the user must select, from a large number of returned Roll-Up tables, a table that satisfies the query and offers the best query performance, so the user must clearly understand the schema and the data model of the Roll-Up tables. This increases the learning cost for the user and also the maintenance cost for the server.
When issuing a query request, a user does not care how many Roll-Up tables exist on the specified data source or what the schema of each one is; the user cares only about the data for the dimensions and indexes of interest that satisfy the filter conditions. A data warehouse should therefore have a strategy for automatically routing a request to the optimal Roll-Up table, so as to reduce the user's learning and usage cost.
According to the embodiment of the present disclosure, a data processing method for returning a data result that best meets the user requirement in response to a query request of a user is provided, and the method is exemplarily described below by way of an illustration. It should be noted that the sequence numbers of the respective operations in the following methods are merely used as representations of the operations for description, and should not be construed as representing the execution order of the respective operations. The method need not be performed in the exact order shown, unless explicitly stated.
Fig. 2 schematically shows a flow chart of a data processing method according to an embodiment of the present disclosure.
As shown in fig. 2, the method may include operations S201 to S205.
In operation S201, a data source is acquired.
Wherein the data source comprises at least one dimension (key) item and at least one indicator (value) item.
For example, in a business scenario of advertisement delivery, the data source is in a Table form, which may be referred to as a Base Table (Base Table), and the data source contains all dimension items and index items of the supported query. As shown in table 1.
TABLE 1
ID  Date  City      Clicks  Cost
1   2017  Beijing   100     10
1   2017  Shanghai  100     10
1   2018  Beijing   200     20
2   2017  Beijing   150     15
2   2017  Shanghai  200     20
2   2018  Beijing   200     20
In Table 1, the header shows three dimension items, ID (identification information), Date, and City, and two index items, Clicks (number of clicks) and Cost (amount spent). The body lists 6 pieces of data, each comprising a value of the dimension item "ID", a value of the dimension item "Date", a value of the dimension item "City", a value of the index item "Clicks", and a value of the index item "Cost". In this example, different dimension items may be used to represent different aspects of advertisement effect data, such as identification information of the user who places an advertisement, a delivery plan, a delivery unit, an advertisement creative, and the like; they may be divided according to actual business needs, which is not limited herein. An index item is a specific measure of the advertisement delivery effect, such as clicks, impressions, or spending, and may likewise be divided according to actual business needs, which is not limited herein. Table 1 is only an exemplary illustration. In other scenarios, both the dimension items and the index items in the data source may vary flexibly. For example, in a music recommendation scenario, the dimension items may be music type, musician, recommendation time, and the like, and the index items may be play count, like count, favorite count, and the like.
Then, in operation S202, the data sources are aggregated for a plurality of times to obtain a plurality of aggregation tables, respectively.
To facilitate user queries, the data source is aggregated into a plurality of aggregation tables according to different dimension items and/or index items. The data source contains all dimension items and index items of the supported queries, whereas an aggregation table obtained by aggregation generally contains only some of the dimension items and/or some of the index items. The aggregation tables differ from one another, and any two aggregation tables differ in at least one of: dimension items, index items, and aggregation functions.
Next, in operation S203, a query request from a client is received.
Among other things, a query request from a client may include one or more parameters that can describe a query target of a user.
Next, in operation S204, an aggregation table that matches the query request is obtained by filtering from the aggregation tables.
That is, in operation S204, an aggregation table that matches all of the parameters in the query request is screened from the plurality of aggregation tables and used as the query result for the query request.
Next, in operation S205, the filtered aggregation table that matches the query request is sent to the client.
As can be understood by those skilled in the art, when a user needs to query aggregation tables related to certain dimensions and certain indexes, the data processing method according to the embodiment of the present disclosure screens, based on the query request, the aggregation table that best matches the query request from a plurality of aggregation tables obtained by preprocessing, and then returns the screened aggregation table to the client. The server does not need to return a plurality of aggregation tables to the client as in the related art, so the user does not need to select from among returned aggregation tables. The user can reach the data of interest as quickly as possible without incurring a large learning and usage cost, which meets the user's requirement.
According to the embodiment of the present disclosure, taking a single aggregation process as an example, the above process of aggregating data sources respectively for multiple times to obtain multiple aggregation tables is exemplarily described. For any one of the multiple aggregations, first, a dimension item for the aggregation is determined from at least one dimension item contained in the data source, an index item for the aggregation is determined from at least one index item contained in the data source, and an aggregation function for the aggregation is also determined. Then, the data sources are aggregated based on the dimension item for the aggregation, the index item for the aggregation and the aggregation function for the aggregation, so as to obtain an aggregation table for the aggregation.
For example, when data is imported into the data source, the data may be sorted in ascending or descending order by the values of the dimension items to facilitate analysis. If there are several dimension items, they are used as sort keys one after another in a predefined order. In Table 1 above, the predefined order of the dimension items is "ID" → "Date" → "City". The data are first arranged in ascending order of the value of the dimension item ID; data with the same ID value are arranged in ascending order of the value of the dimension item Date; and data with the same Date value are arranged in ascending order of the value of the dimension item City. Each dimension item is sorted by a rule suited to the type of its values, for example, integers by magnitude and dates chronologically.
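The multi-key ordering described above can be illustrated with a minimal Python sketch. The `Row` type, the `sort_by_dimensions` helper, and the shuffled input are hypothetical names and data introduced only for illustration; the disclosure does not specify an implementation. The sketch assumes ascending order for every dimension item.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """One piece of data from Table 1 (hypothetical representation)."""
    id: int
    date: int
    city: str
    clicks: int
    cost: int


# Rows of Table 1, deliberately shuffled to show the effect of sorting.
rows = [
    Row(2, 2018, "Beijing", 200, 20),
    Row(1, 2017, "Shanghai", 100, 10),
    Row(2, 2017, "Beijing", 150, 15),
    Row(1, 2018, "Beijing", 200, 20),
    Row(1, 2017, "Beijing", 100, 10),
    Row(2, 2017, "Shanghai", 200, 20),
]


def sort_by_dimensions(rows, dimension_order=("id", "date", "city")):
    """Sort ascending, using the dimension items as successive sort keys."""
    return sorted(rows, key=lambda r: tuple(getattr(r, d) for d in dimension_order))


sorted_rows = sort_by_dimensions(rows)
for r in sorted_rows:
    print(r.id, r.date, r.city, r.clicks, r.cost)
```

Comparing tuples key by key reproduces the "ID" → "Date" → "City" precedence: a later dimension item only decides the order among rows whose earlier dimension items are equal.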
When the data source is aggregated once, the dimension items for the current aggregation operation are first determined to include a dimension item K1 and a dimension item K2, the index item for the current aggregation operation is determined to include an index item V1, and an aggregation function f for the current aggregation operation is determined. In the data source, if N pieces of data have the same value of the dimension item K1 and the same value of the dimension item K2, the values of the index item V1 of the N pieces of data are combined by the aggregation function f, so that the N pieces of data are aggregated into a single piece of data containing the dimension item K1, the dimension item K2, and the index item V1, where N is a positive integer greater than 1. Common aggregation functions may include summation (Sum), counting (Count), minimum (Min), maximum (Max), and the like. Illustratively, the data source is as shown in Table 2 and includes a dimension item "Date", a dimension item "City", and an index item "Cost".
TABLE 2
Date City Cost
2017 Beijing 10
2017 Tianjin 20
2018 Beijing 10
2018 Tianjin 20
The data source is aggregated for the first time, resulting in the aggregated table shown in table 3. The dimension item for the current aggregation is 'Date', the index item for the current aggregation is 'Cost', and the aggregation function for the current aggregation is a summation function.
TABLE 3
Date Cost
2017 30
2018 30
The data source was aggregated a second time, resulting in the aggregated table shown in table 4. The dimension item for the aggregation at this time is "City", the index item for the aggregation at this time is "Cost", and the aggregation function for the aggregation at this time is also a summation function.
TABLE 4
City Cost
Beijing 20
Tianjin 40
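As a sketch of how Tables 3 and 4 follow from Table 2, the following Python groups rows by the chosen dimension items and applies an aggregation function to the chosen index item. The function name `aggregate` and the list-of-dicts representation are assumptions made for illustration, not the implementation of the disclosure.

```python
from collections import defaultdict

# Table 2: the data source (base table).
BASE_TABLE = [
    {"Date": 2017, "City": "Beijing", "Cost": 10},
    {"Date": 2017, "City": "Tianjin", "Cost": 20},
    {"Date": 2018, "City": "Beijing", "Cost": 10},
    {"Date": 2018, "City": "Tianjin", "Cost": 20},
]


def aggregate(rows, dimensions, index_item, func=sum):
    """Group rows that share the same values of `dimensions`, then combine
    the `index_item` values of each group with the aggregation function."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        groups[key].append(row[index_item])
    return [
        {**dict(zip(dimensions, key)), index_item: func(values)}
        for key, values in groups.items()
    ]


table_3 = aggregate(BASE_TABLE, ["Date"], "Cost")  # Sum of Cost per Date
table_4 = aggregate(BASE_TABLE, ["City"], "Cost")  # Sum of Cost per City
print(table_3)
print(table_4)
```

Passing `len`, `min`, or `max` as `func` would correspond to the Count, Min, and Max aggregation functions mentioned above.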
In the embodiment of the disclosure, since the data monitored by the data source is generated continuously with the passage of time, the data source performs data update and version (version) update every predetermined period. For example, the base table of the formed data source is filled with data for the first time as the data source of version 1, the base table of the formed data source is filled with data for the second time as the data source of version 2, and so on. When the aggregation operation is performed, the aggregation table obtained by aggregation has the same version as the data source targeted by the aggregation operation. For example, if table 3 above is aggregated for a data source of version 1, the version number of table 3 is set to version 1.
According to embodiments of the present disclosure, a query request from a client may include a query version. The process of screening, from the plurality of aggregation tables, an aggregation table that matches the query request may include: screening the plurality of aggregation tables according to the query version to obtain the aggregation tables that pass a first screening. For example, if the latest version of the data source on the server side is version 3 and the user wishes to view the latest aggregated data, the query request includes a parameter representing "version 3". Suppose the server side comprises a first server, a second server, and a third server deployed in a distributed manner, where the first and second servers have been updated to version 3 while the third server, owing to delay between machines, has so far been updated only to version 2. When the query request containing the parameter representing "version 3" is distributed to the first or second server, that server screens out the aggregation tables whose version number is "version 3" based on the query request and returns the screened aggregation table to the user. When the same query request is distributed to the third server, the third server finds that it currently has no aggregation table whose version number is "version 3" and returns to the user a prompt that no query result was found. Screening precisely by query version avoids inconsistent aggregation results caused by asynchronous data updates among different machines. The first screening may be referred to as version-rule-based screening.
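The version-based first screening can be sketched as a simple exact-match filter. The `AggregationTable` record, its field names, and the two example servers are hypothetical and chosen only to mirror the three-server example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggregationTable:
    """Metadata of one aggregation table (hypothetical structure)."""
    name: str
    version: int
    dimensions: tuple
    index_items: tuple


def screen_by_version(tables, query_version):
    """First screening: keep only tables whose version equals the query version."""
    return [t for t in tables if t.version == query_version]


# A server already updated to version 3, and one lagging at version 2.
up_to_date_server = [
    AggregationTable("date_cost", 3, ("Date",), ("Cost",)),
    AggregationTable("city_cost", 3, ("City",), ("Cost",)),
]
lagging_server = [
    AggregationTable("date_cost", 2, ("Date",), ("Cost",)),
    AggregationTable("city_cost", 2, ("City",), ("Cost",)),
]

print(screen_by_version(up_to_date_server, 3))  # both tables pass
print(screen_by_version(lagging_server, 3) or "no query result found")
```

Because the match is exact rather than "latest available", a lagging server returns nothing instead of silently serving version 2 data, which is the consistency property the paragraph above describes.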
According to an embodiment of the present disclosure, the query request may further include a specified dimension item and a specified index item. One or more dimension items and one or more index items may be specified. The specified dimension items and the specified index items represent the dimension items and index items of interest to the user making the query. The process of screening the aggregation tables meeting the query request from the multiple aggregation tables may further include: screening, from the aggregation tables subjected to the first screening, the aggregation tables that contain the specified dimension items and the specified index items, to obtain the aggregation tables subjected to the second screening. The second screening may be referred to as screening based on a query item rule.
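The query item rule amounts to a subset test on column names. The sketch below assumes each aggregation table is described by a plain dictionary with hypothetical keys `dimensions` and `metrics`; neither the keys nor the function name come from the disclosure.

```python
def second_screening(tables, dims, metrics):
    """Keep tables that contain every specified dimension item and index item."""
    return [
        t for t in tables
        if set(dims) <= set(t["dimensions"]) and set(metrics) <= set(t["metrics"])
    ]


# Hypothetical table descriptors that passed the first screening.
tables = [
    {"name": "agg_full", "dimensions": ["ID", "Date", "City"], "metrics": ["Cost"]},
    {"name": "agg_by_id", "dimensions": ["ID"], "metrics": ["Cost"]},
]

kept = second_screening(tables, dims=["ID", "City"], metrics=["Cost"])
```

Here `agg_by_id` is excluded because it has no "City" dimension, even though it contains the requested index item.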
It can be understood that an aggregation table subjected to the second screening contains the dimension items and index items of interest to the user, but may also contain dimension items and index items that are not of interest to the user. If the table is not sorted in a suitable manner, looking up the dimension items and index items of interest is inefficient. For this reason, the aggregation tables subjected to the second screening may further be subjected to a third screening. According to an embodiment of the present disclosure, the query request may further include a filter condition, where the filter condition includes a specified dimension item and a value of the specified dimension item. There may be one or more such specified dimension items.
The process of screening the aggregation tables meeting the query request from the multiple aggregation tables may further include: for any aggregation table among the aggregation tables subjected to the second screening, searching the dimension items in that aggregation table in a predetermined order. Each dimension item found is matched against the specified dimension items; if the match succeeds, a weight is determined for the dimension item and the search continues with the next dimension item; if the match fails, the search ends. A score for the aggregation table is then determined based on the weights determined for its dimension items. After the scores of the aggregation tables subjected to the second screening have been determined, the aggregation table with the highest score is selected as the aggregation table subjected to the third screening. The higher the score of an aggregation table, the faster all the specified dimension items and their values can be found in that table when searching in the predetermined order. The third screening may be referred to as screening based on a filter-and-sort rule.
Further, according to an embodiment of the present disclosure, the process of screening the aggregation tables meeting the query request from the multiple aggregation tables may further include: selecting, from the aggregation tables subjected to the third screening, the aggregation table with the fewest dimension items and/or the fewest data entries. For example, if the dimension items are arranged in columns, the aggregation table with fewer dimension columns is selected, because such a table is aggregated to a higher degree, holds less data, and can be queried more efficiently. Similarly, if the number of rows in an aggregation table indicates the number of data entries, the aggregation table with fewer rows is selected, since it holds less data and can be queried more efficiently.
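One way to express this final selection is a lexicographic minimum over (number of dimension columns, number of rows). The ordering of the two criteria is an assumption taken from the order of operations S305 and S306 in fig. 3; the dictionary keys and table names below are hypothetical.

```python
def pick_smallest(tables):
    """Prefer fewer dimension columns; break remaining ties by fewer rows."""
    return min(tables, key=lambda t: (t["dimension_count"], t["row_count"]))


# Hypothetical candidates that tied in the third screening.
candidates = [
    {"name": "agg_wide", "dimension_count": 3, "row_count": 6},
    {"name": "agg_narrow", "dimension_count": 2, "row_count": 4},
    {"name": "agg_narrow_big", "dimension_count": 2, "row_count": 5},
]

smallest = pick_smallest(candidates)
```

With these inputs the two-column tables beat the three-column one, and the four-row table beats the five-row one.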
The above embodiments are illustrated below by a specific example with reference to fig. 3.
Fig. 3 schematically shows a flow chart of a data processing method according to another embodiment of the present disclosure. In the example shown in fig. 3, the data source is shown in table 1, in which different dimension items are distributed in columns, different index items are also distributed in columns, and different data entries are distributed in rows. Hereinafter, a column corresponding to a dimension item is referred to as a dimension column, and a column corresponding to an index item is referred to as an index column. A plurality of aggregation tables are obtained by aggregating the data source shown in table 1.
As shown in fig. 3, the method may include operations S301 to S307.
In operation S301, a query request from a client is received.
In operation S302, a first screening based on the version rule is performed on the plurality of aggregation tables according to the query version in the query request.
To support concurrent access to data, a data warehouse generally implements Multi-Version Concurrency Control (MVCC) so that users do not obtain inconsistent data when issuing requests. A user can therefore include the version number of the current request in the request, and the data processing method according to the present disclosure can select aggregation tables according to that version number: if the version number that an aggregation table currently provides for queries is not less than the version number of the request, the rule is satisfied and the next rule is evaluated; otherwise, the aggregation table is excluded.
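The version rule of operation S302, as stated in this paragraph, is a "not less than" comparison. A minimal sketch follows, with a hypothetical function name and a hypothetical mapping from table names to the version each table currently serves.

```python
def satisfies_version_rule(table_version: int, requested_version: int) -> bool:
    """True if the table's version is not less than the requested version."""
    return table_version >= requested_version


# Hypothetical table names mapped to the version each currently provides.
table_versions = {"agg_a": 3, "agg_b": 2, "agg_c": 4}

passed = sorted(
    name for name, version in table_versions.items()
    if satisfies_version_rule(version, requested_version=3)
)
```

Tables that pass go on to the query item rule; `agg_b`, which is still at version 2, is excluded.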
Then, in operation S303, a second screening based on the query item rule is performed on the aggregation tables subjected to the first screening, according to the specified dimension items and the specified index items in the query request.
Specifically, it is determined whether an aggregation table contains all of the specified dimension items and the specified index items; if so, the next rule is evaluated; if not, the aggregation table is excluded.
Next, in operation S304, a third screening based on the filter-and-sort rule is performed on the aggregation tables subjected to the second screening, according to the filter condition in the query request.
Multiple aggregation tables of the same data source have different dimension columns or index columns; the dimension columns are sorted and the index columns are aggregated. Similar to a B-tree index in a database, the columns of the query's filter condition are matched against the dimension columns of the aggregation table by leftmost-prefix matching, in the order of the dimension columns; the more columns are matched, the more efficient the query.
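Leftmost-prefix matching can be illustrated by counting how many leading dimension columns appear in the filter condition before the first miss. The function name and the two column orders below are illustrative assumptions only.

```python
def matched_prefix_length(dimension_columns, filter_columns):
    """Count the leading dimension columns that appear in the filter condition."""
    count = 0
    for column in dimension_columns:
        if column not in filter_columns:
            break  # leftmost-prefix matching stops at the first miss
        count += 1
    return count


filter_columns = {"ID", "Date"}

# Two hypothetical sort orders over the same three dimension columns.
long_prefix = matched_prefix_length(["ID", "Date", "City"], filter_columns)
short_prefix = matched_prefix_length(["ID", "City", "Date"], filter_columns)
```

In the second order the "Date" column cannot contribute, because the unmatched "City" column precedes it in the sort order.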
For example, the aggregation tables subjected to the second screening include two aggregation tables as shown in tables 5 and 6, respectively.
TABLE 5
ID  Date  City      Cost
1   2017  Beijing   10
1   2017  Shanghai  10
1   2018  Beijing   20
2   2017  Beijing   15
2   2017  Shanghai  20
2   2018  Beijing   20
TABLE 6
ID  City      Date  Cost
1   Beijing   2017  10
1   Beijing   2018  20
1   Shanghai  2017  10
2   Beijing   2017  15
2   Beijing   2018  20
2   Shanghai  2017  20
For example, the filter condition in the query request includes: "ID" = 1, "Date" = 2017. For the aggregation table shown in table 5, the columns are searched in order from left to right; for each column searched, it is determined whether the dimension item corresponding to the column matches any specified dimension item in the filter condition, and the weight of the column is determined according to the matching result. After the search is completed, the score of the aggregation table shown in table 5 is calculated using the weights of the respective columns. The aggregation table shown in table 6 is searched in the same way, from left to right, and its score is likewise calculated from the weights of the respective columns after the search is completed.
Illustratively, the above-described manner of calculating the score of each aggregation table is as follows:
for column in schema:    // traverse the table structure column by column, from left to right, and perform the following for each column
Do:
    if column in query_columns:
        weight = (weight << 1) + w(i)    // if the column is a dimension column contained in the filter condition, the current score equals the previous score shifted left by 1 bit, plus the weight of the column
    else:
        break    // if the column is not a dimension column contained in the filter condition, exit the loop
Done
Here, weight represents the final score of the aggregation table and has an initial value of 0, column represents each column in the aggregation table, and query_columns represents the dimension columns contained in the filter condition of the query request. w(i) represents the weight determined for the i-th column based on filter matching, where i is a positive integer. The value of w(i) is set according to the selectivity (or degree of matching): the higher the selectivity (or degree of matching), the higher the weight. For example, the weight of a complete match may be set to 5, the weight of an incomplete match may be set to 4, and so on, which is not limited herein.
For the aggregation table of table 5, searching from left to right, the first column matches "ID" in the filter condition, so w(1) = 5 and weight = 0 + 5 = 5. The search continues to the second column, which matches "Date" in the filter condition, so w(2) = 5 and weight = 10 + 5 = 15, where 10 is the previous score 5 shifted left by 1 bit. The search continues to the third column, which does not match the filter condition, so the search ends with a final score of 15. For the aggregation table of table 6, searching from left to right, the first column matches "ID" in the filter condition, so w(1) = 5 and weight = 0 + 5 = 5. The search continues to the second column, which does not match the filter condition, so the search ends with a final score of 5. Since the score of the aggregation table shown in table 5 is greater than that of the aggregation table shown in table 6, the aggregation table shown in table 5 is selected as the aggregation table meeting the query request.
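The score calculation and the worked example for tables 5 and 6 can be reproduced with a short Python sketch. The function name `table_score` and the `weight_of` callback are hypothetical; the constant weight of 5 per matched column follows the complete-match value used in the example above.

```python
def table_score(dimension_columns, filter_condition, weight_of=lambda column: 5):
    """Shift-and-add score over the leading columns that match the filter condition."""
    weight = 0
    for column in dimension_columns:
        if column not in filter_condition:
            break  # the first unmatched column ends the search
        weight = (weight << 1) + weight_of(column)
    return weight


filter_condition = {"ID": 1, "Date": 2017}

table_5_columns = ["ID", "Date", "City"]
table_6_columns = ["ID", "City", "Date"]

score_5 = table_score(table_5_columns, filter_condition)  # (5 << 1) + 5 = 15
score_6 = table_score(table_6_columns, filter_condition)  # stops after "ID": 5
```

The left shift makes each additional matched column roughly double the score, so a longer matched prefix always outranks a shorter one when the weights are equal.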
If a plurality of aggregation tables remain after the third screening, the method continues as shown in fig. 3.
Next, in operation S305, an aggregation table with fewer dimension columns is selected.
If a plurality of aggregation tables remain after operation S305, operation S306 is performed.
Next, in operation S306, the aggregation table with fewer rows is selected, where a smaller number of rows indicates a smaller number of data entries.
Next, in operation S307, the aggregation table obtained by the final filtering is sent to the client.
It can be understood that the data processing method according to the present disclosure provides an automatic routing policy for data warehouse queries. The policy comprises a plurality of rules and hides the plurality of aggregation tables from the user, reducing the user's learning and usage costs. Through the policy, the most efficient aggregation table that satisfies the query request can be found.
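Putting the rules together, the routing policy of operations S302 to S306 might be sketched as below. This is an illustrative composition under several assumptions: tables and requests are plain dictionaries with hypothetical keys, every matched column has weight 5, the version rule uses the "not less than" comparison of operation S302, and the table named `stale` is invented to exercise that rule.

```python
def prefix_score(dimensions, filter_columns, weight=5):
    """Shift-and-add score over the leading columns that match the filter."""
    score = 0
    for column in dimensions:
        if column not in filter_columns:
            break
        score = (score << 1) + weight
    return score


def route(tables, request):
    """Return the aggregation table chosen by rules S302 to S306, or None."""
    # S302, version rule: table version must not be less than the requested one.
    candidates = [t for t in tables if t["version"] >= request["version"]]

    # S303, query item rule: table must contain every requested column.
    candidates = [
        t for t in candidates
        if set(request["dimensions"]) <= set(t["dimensions"])
        and set(request["metrics"]) <= set(t["metrics"])
    ]
    if not candidates:
        return None

    # S304, filter-and-sort rule: keep the tables with the highest score.
    scores = [prefix_score(t["dimensions"], request["filter"]) for t in candidates]
    best = max(scores)
    candidates = [t for t, s in zip(candidates, scores) if s == best]

    # S305 and S306: fewer dimension columns first, then fewer rows.
    return min(candidates, key=lambda t: (len(t["dimensions"]), t["rows"]))


tables = [
    {"name": "table_5", "version": 3, "dimensions": ["ID", "Date", "City"],
     "metrics": ["Cost"], "rows": 6},
    {"name": "table_6", "version": 3, "dimensions": ["ID", "City", "Date"],
     "metrics": ["Cost"], "rows": 6},
    {"name": "stale", "version": 2, "dimensions": ["ID", "Date"],
     "metrics": ["Cost"], "rows": 4},
]
request = {"version": 3, "dimensions": ["ID", "Date"], "metrics": ["Cost"],
           "filter": {"ID": 1, "Date": 2017}}

chosen = route(tables, request)
```

The caller never names an aggregation table; it states only what it wants, which is the sense in which the policy hides the tables from the user.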
Fig. 4 schematically shows a block diagram of a data processing device according to an embodiment of the present disclosure.
As shown in fig. 4, the data processing apparatus 400 includes: an obtaining module 410, an aggregation module 420, a receiving module 430, a screening module 440, and a sending module 450.
The obtaining module 410 is used for obtaining a data source.
Wherein the data source comprises at least one dimension item and at least one index item.
The aggregation module 420 is configured to perform a plurality of aggregations on the data source to obtain a plurality of aggregation tables.
The receiving module 430 is used for receiving a query request from a client.
The screening module 440 is configured to screen aggregation tables that meet the query request from a plurality of aggregation tables.
The sending module 450 is configured to send the aggregation table meeting the query request to the client.
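As an illustration of what the aggregation module 420 might compute, the following sketch groups rows by a chosen set of dimension items and applies an aggregation function to one index item. The `aggregate` function and its parameters are hypothetical, and the input rows are borrowed from table 5 purely as sample data.

```python
from collections import defaultdict


def aggregate(rows, dims, metric, func=sum):
    """Group rows by the given dimension items and aggregate one index item."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[d] for d in dims)
        groups[key].append(row[metric])
    # Sorting the keys keeps the dimension columns of the result ordered.
    return [
        {**dict(zip(dims, key)), metric: func(values)}
        for key, values in sorted(groups.items())
    ]


rows = [
    {"ID": 1, "Date": 2017, "City": "Beijing", "Cost": 10},
    {"ID": 1, "Date": 2017, "City": "Shanghai", "Cost": 10},
    {"ID": 1, "Date": 2018, "City": "Beijing", "Cost": 20},
]

by_id_date = aggregate(rows, dims=["ID", "Date"], metric="Cost")
by_id_max = aggregate(rows, dims=["ID"], metric="Cost", func=max)
```

Calling `aggregate` with different dimension items, index items, or aggregation functions yields the different aggregation tables among which the screening module 440 later chooses.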
It should be noted that the implementation, solved technical problems, implemented functions, and achieved technical effects of each module/unit/subunit and the like in the apparatus part embodiment are respectively the same as or similar to the implementation, solved technical problems, implemented functions, and achieved technical effects of each corresponding step in the method part embodiment, and are not described herein again.
Any number of modules, sub-modules, units, sub-units, or at least part of the functionality of any number thereof according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in any other reasonable manner of hardware or firmware by integrating or packaging a circuit, or in any one of or a suitable combination of software, hardware, and firmware implementations. Alternatively, one or more of the modules, sub-modules, units, sub-units according to embodiments of the disclosure may be at least partially implemented as a computer program module, which when executed may perform the corresponding functions.
For example, any plurality of the obtaining module 410, the aggregating module 420, the receiving module 430, the screening module 440, and the sending module 450 may be combined and implemented in one module, or any one of the modules may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the obtaining module 410, the aggregating module 420, the receiving module 430, the screening module 440, and the sending module 450 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or may be implemented in any one of three implementations of software, hardware, and firmware, or in a suitable combination of any several of them. Alternatively, at least one of the obtaining module 410, the aggregating module 420, the receiving module 430, the screening module 440 and the sending module 450 may be at least partially implemented as a computer program module, which when executed may perform a corresponding function.
Fig. 5 schematically shows a block diagram of a computer device adapted to implement the above described method according to an embodiment of the present disclosure. The computer device shown in fig. 5 is only an example and should not bring any limitation to the function and scope of use of the embodiments of the present disclosure.
As shown in fig. 5, a computer device 500 according to an embodiment of the present disclosure includes a processor 501, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. The processor 501 may comprise, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 501 may also include onboard memory for caching purposes. Processor 501 may include a single processing unit or multiple processing units for performing different actions of a method flow according to embodiments of the disclosure.
In the RAM 503, various programs and data necessary for the operation of the device 500 are stored. The processor 501, the ROM 502, and the RAM 503 are connected to each other by a bus 504. The processor 501 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 502 and/or the RAM 503. Note that the programs may also be stored in one or more memories other than the ROM 502 and the RAM 503. The processor 501 may also perform various operations of method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the device 500 may also include an input/output (I/O) interface 505, which is also connected to the bus 504. The device 500 may also include one or more of the following components connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet. A drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as necessary, so that a computer program read out therefrom is installed into the storage section 508 as necessary.
According to embodiments of the present disclosure, method flows according to embodiments of the present disclosure may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. The computer program, when executed by the processor 501, performs the above-described functions defined in the system of the embodiments of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, a computer-readable storage medium may include ROM 502 and/or RAM 503 and/or one or more memories other than ROM 502 and RAM 503 described above.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure can be combined and/or incorporated in various ways, even if such combinations or incorporations are not expressly recited in the present disclosure. In particular, the features recited in the various embodiments and/or claims of the present disclosure may be combined and/or incorporated in various ways without departing from the spirit or teaching of the present disclosure. All such combinations and/or incorporations are within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (10)

1. A data processing method, comprising:
acquiring a data source, wherein the data source comprises at least one dimension item and at least one index item;
performing a plurality of aggregations on the data source to obtain a plurality of aggregation tables;
receiving a query request from a client;
screening, from the plurality of aggregation tables, an aggregation table meeting the query request; and
sending the aggregation table meeting the query request to the client.
2. The method of claim 1, wherein the performing a plurality of aggregations on the data source to obtain a plurality of aggregation tables comprises:
for any one of the plurality of aggregations,
determining a dimension item for the aggregation from the at least one dimension item;
determining an index item for the aggregation from the at least one index item;
determining an aggregation function for the aggregation; and
aggregating the data source based on the dimension item for the aggregation, the index item for the aggregation, and the aggregation function for the aggregation, to obtain an aggregation table for the aggregation.
3. The method of claim 1, wherein,
the data source performs a data update and a version update once every predetermined period, and the version of any one of the plurality of aggregation tables is the same as the version of the data source from which that aggregation table was aggregated.
4. The method of claim 3, wherein the query request includes a query version;
the screening of the aggregation tables meeting the query request from the plurality of aggregation tables comprises: screening the plurality of aggregation tables according to the query version to obtain the aggregation tables subjected to the first screening.
5. The method of claim 4, wherein the query request further comprises: a specified dimension item and a specified index item;
the screening of the aggregation tables meeting the query request from the plurality of aggregation tables further comprises: screening, from the aggregation tables subjected to the first screening, the aggregation tables containing the specified dimension item and the specified index item, to obtain the aggregation tables subjected to the second screening.
6. The method of claim 5, wherein the query request further comprises a filter condition, the filter condition comprising the specified dimension item and a value of the specified dimension item;
the screening of the aggregation tables meeting the query request from the plurality of aggregation tables further comprises:
for any one of the aggregation tables subjected to the second screening,
searching the dimension items in the aggregation table in a predetermined order;
matching each dimension item found against the specified dimension item; if the match succeeds, determining a weight of the dimension item and continuing to search the next dimension item; and if the match fails, ending the search;
determining a score of the aggregation table based on the determined weights of the dimension items; and
selecting the aggregation table with the highest score as the aggregation table subjected to the third screening.
7. The method of claim 6, wherein the screening of the aggregation tables meeting the query request from the plurality of aggregation tables further comprises:
selecting, from the aggregation tables subjected to the third screening, the aggregation table with the fewest dimension items and/or the fewest data entries.
8. A data processing apparatus, comprising:
an acquisition module configured to acquire a data source, wherein the data source comprises at least one dimension item and at least one index item;
an aggregation module configured to perform a plurality of aggregations on the data source to obtain a plurality of aggregation tables, wherein the plurality of aggregation tables differ in dimension items and/or index items;
a receiving module configured to receive a query request from a client;
a screening module configured to screen, from the plurality of aggregation tables, an aggregation table meeting the query request; and
a sending module configured to send the aggregation table meeting the query request to the client.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor when executing the program implementing:
the method of any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform:
the method of any one of claims 1 to 7.
CN202010024591.1A 2020-01-09 2020-01-09 Data processing method, data processing device, computer equipment and medium Active CN113094444B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010024591.1A CN113094444B (en) 2020-01-09 2020-01-09 Data processing method, data processing device, computer equipment and medium

Publications (2)

Publication Number Publication Date
CN113094444A true CN113094444A (en) 2021-07-09
CN113094444B CN113094444B (en) 2024-10-18

Family

ID=76663568

Country Status (1)

Country Link
CN (1) CN113094444B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant