CN113360494B - Wide-table data generation method, updating method and related device - Google Patents


Info

Publication number
CN113360494B
Authority
CN
China
Prior art keywords
data
dimension
wide
dimension data
tables
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010148063.7A
Other languages
Chinese (zh)
Other versions
CN113360494A (en)
Inventor
吴帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202010148063.7A
Publication of CN113360494A
Application granted
Publication of CN113360494B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/24554 Unary operations; Data partitioning operations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for generating wide table data, a method for updating wide table data, and related devices, and relates to the field of computer technology. One embodiment of the method comprises: obtaining a source table from the data tables whose dimensions are not dynamically updated, and obtaining a dimension data table from the data tables whose dimensions are dynamically updated; generating a corresponding summary table from the data of the source table according to a configured first correspondence between the source table and the summary table; and generating corresponding wide table data from the summary table and the dimension data of the dimension data table according to a configured second correspondence among the summary table, the dimension data of the dimension data table, and the wide table. When the generated wide table data is updated, this embodiment does not need to modify a data processing script or recompute all topic table data. It thereby avoids the heavy workload, high cost, and high risk of the existing approach, reduces repeated operations, greatly reduces the amount of data that must be recomputed, shortens the overall computation time, and reduces the waste of server resources.

Description

Wide-table data generation method, updating method and related device
Technical Field
The present invention relates to the field of computer technology, and in particular to a method for generating wide table data, a method for updating wide table data, and related devices.
Background
At present, data processing on big data platforms is based on generating wide table data according to certain dimensions and archiving it. When certain dimension data changes, the historical archived data must be updated. The common practice today is to modify the data processing script (i.e., the script that generates the wide table data), for example by changing the statistics time or the partitions, and then re-run it to update the historical data. Modifying the script and re-running the data is a heavy task with high cost and high risk; in particular, for statistical dimension data that changes frequently, the same operation must be repeated for every change. The historical data to be re-run may go back several years, so the re-run task takes a long time, and the task must be restarted every time the data processing script is modified. Each re-run computes over all the data of the topic tables (i.e., the data tables used to generate the wide table). Each topic table is huge, so frequently re-running historical data wastes server resources.
In the process of implementing the present invention, the inventor found that the prior art has at least the following problems:
In the existing schemes for generating and updating wide table data, whenever certain dimension data changes, the data processing script must be modified to update the wide table data already generated. The task is heavy, costly, and risky; all topic table data must be recomputed; there are too many repeated operations; the amount of recomputed data is huge; the overall computation time is long; and server resources are wasted.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method for generating wide table data, a method for updating wide table data, and related devices. When the generated wide table data is updated, the data processing script does not need to be modified and not all topic table data needs to be recomputed. This avoids the heavy workload, high cost, and high risk, reduces repeated operations, greatly reduces the amount of data to be recomputed, shortens the overall computation time, and reduces the waste of server resources.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a method of generating wide table data.
A method for generating wide table data comprises: obtaining a source table from those data tables whose dimensions are not dynamically updated, and obtaining a dimension data table from those data tables whose dimensions are dynamically updated; generating a corresponding summary table from the data of the source table according to a configured first correspondence between the source table and the summary table; and generating corresponding wide table data from the summary table and the dimension data of the dimension data table according to a configured second correspondence among the summary table, the dimension data of the dimension data table, and the wide table.
Optionally, the method further includes pre-configuring the first correspondence and the second correspondence, where: configuring the first correspondence includes configuring a summary table, each source table required for generating the summary table, the fields to be extracted from each source table, and the dimension data primary key to be extracted from each source table; and configuring the second correspondence includes configuring a wide table, each summary table required for generating the wide table, the fields to be extracted from each summary table, and the dimension data primary key corresponding to each summary table.
Optionally, the data of the source table grows dynamically, and the summary table comprises one or more partition tables. Generating the corresponding summary table from the data of the source table according to the configured first correspondence includes: periodically extracting data from the newly added data of each source table according to the configured fields to be extracted from that source table, where in each period one partition table of the summary table is computed from the data extracted from the newly added data.
Optionally, configuring the second correspondence further includes configuring dynamic partition information for each summary table. Generating the corresponding wide table data from the summary table and the dimension data of the dimension data table according to the configured second correspondence includes: determining, according to the configured dynamic partition information, the partition tables to be used for each summary table; and extracting data from those partition tables according to the configured fields to be extracted from each summary table, and summarizing the extracted data according to the dimension data in the dimension data table to generate the corresponding wide table data.
According to another aspect of the embodiment of the invention, a method for updating wide table data is provided.
A method for updating wide table data generated by the method for generating wide table data according to the embodiment of the present invention comprises: when the dimension data of the dimension data table is updated, updating the corresponding wide table data from the summary table and the updated dimension data of the dimension data table, according to the configured second correspondence among the summary table, the dimension data of the dimension data table, and the wide table.
Optionally, updating the corresponding wide table data from the summary table and the updated dimension data of the dimension data table according to the configured second correspondence includes: determining the dependency relationship between the summary tables and the wide tables according to the second correspondence, where the dependency is single-dependent if a wide table's data is generated from one summary table, and multi-dependent if a wide table's data is generated from a plurality of summary tables; for the single-dependent wide tables, performing in parallel the summarization of the summary table on which each wide table depends with the updated dimension data of the dimension data table, to obtain the updated data of each wide table; and for the multi-dependent wide tables, grouping the summary tables corresponding to each wide table by minimum computation granularity, de-duplicating all resulting groups, performing a first summarization on the summary tables of each de-duplicated group according to the updated dimension data of the dimension data table, storing each first summarization result in a buffer table corresponding to that group, obtaining for each wide table the buffer tables corresponding to its groups of summary tables, and performing in parallel a second summarization on the buffer tables corresponding to each wide table, to obtain the updated data of each wide table.
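The multi-dependent branch can be illustrated with a short Python sketch. It is only one possible reading under stated assumptions: the groups of summary tables are taken as given (how the "minimum computation granularity" is derived is not modelled here), and the table names `H2a` to `H2d`, the in-memory dictionaries, and the use of a thread pool are hypothetical choices made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical input: wide table -> groups of summary tables it depends on.
# The grouping itself is assumed to be already known.
wide_groups = {
    "s2": [("H2a",), ("H2b", "H2c")],
    "s3": [("H2b", "H2c"), ("H2d",)],
}
# Hypothetical summary tables keyed by the dimension data primary key.
summary_data = {"H2a": {1: 2}, "H2b": {1: 3, 2: 1}, "H2c": {2: 4}, "H2d": {1: 6}}
# Updated dimension data: primary key -> dimension value.
dim_lookup = {1: "A", 2: "B"}

def first_summary(group):
    """Summarize one group of summary tables with the updated dimension data."""
    out = {}
    for name in group:
        for key, value in summary_data[name].items():
            dim = dim_lookup[key]
            out[dim] = out.get(dim, 0) + value
    return out

def second_summary(groups, buffers):
    """Combine the buffer tables of one wide table's groups."""
    out = {}
    for group in groups:
        for dim, value in buffers[group].items():
            out[dim] = out.get(dim, 0) + value
    return out

# De-duplicate the groups so that a shared group is summarized only once.
unique_groups = {g for groups in wide_groups.values() for g in groups}
buffer_tables = {g: first_summary(g) for g in unique_groups}

# Second summarization, one task per wide table, run in parallel.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = {w: pool.submit(second_summary, gs, buffer_tables)
               for w, gs in wide_groups.items()}
    updated = {w: f.result() for w, f in futures.items()}
```

In this sketch the group `("H2b", "H2c")` is shared by both wide tables, so its first summarization runs once and both wide tables read the same buffer table.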
According to still another aspect of the embodiment of the present invention, there is provided a generating apparatus of wide table data.
A device for generating wide table data comprises: a data table extraction module, configured to obtain a source table from those data tables whose dimensions are not dynamically updated, and to obtain a dimension data table from those data tables whose dimensions are dynamically updated; a summary table generation module, configured to generate a corresponding summary table from the data of the source table according to a configured first correspondence between the source table and the summary table; and a wide table data generation module, configured to generate corresponding wide table data from the summary table and the dimension data of the dimension data table according to a configured second correspondence among the summary table, the dimension data of the dimension data table, and the wide table.
Optionally, the device further comprises a configuration module, configured to pre-configure the first correspondence and the second correspondence, where: configuring the first correspondence includes configuring a summary table, each source table required for generating the summary table, the fields to be extracted from each source table, and the dimension data primary key to be extracted from each source table; and configuring the second correspondence includes configuring a wide table, each summary table required for generating the wide table, the fields to be extracted from each summary table, and the dimension data primary key corresponding to each summary table.
Optionally, the data of the source table grows dynamically, and the summary table comprises one or more partition tables. The summary table generation module is further configured to: periodically extract data from the newly added data of each source table according to the configured fields to be extracted from that source table, where in each period one partition table of the summary table is computed from the data extracted from the newly added data.
Optionally, the configuration module is further configured so that configuring the second correspondence also includes configuring dynamic partition information for each summary table. The wide table data generation module is further configured to: determine, according to the configured dynamic partition information, the partition tables to be used for each summary table; and extract data from those partition tables according to the configured fields to be extracted from each summary table, and summarize the extracted data according to the dimension data in the dimension data table to generate the corresponding wide table data.
According to still another aspect of the embodiment of the present invention, there is provided a device for updating wide table data.
A device for updating wide table data generated by the device for generating wide table data according to the embodiment of the present invention includes a wide table data updating module configured to: when the dimension data of the dimension data table is updated, update the corresponding wide table data from the summary table and the updated dimension data of the dimension data table, according to the configured second correspondence among the summary table, the dimension data of the dimension data table, and the wide table.
Optionally, the wide table data updating module is further configured to: determine the dependency relationship between the summary tables and the wide tables according to the second correspondence, where the dependency is single-dependent if a wide table's data is generated from one summary table, and multi-dependent if a wide table's data is generated from a plurality of summary tables;
for the single-dependent wide tables, perform in parallel the summarization of the summary table on which each wide table depends with the updated dimension data of the dimension data table, to obtain the updated data of each wide table; and for the multi-dependent wide tables, group the summary tables corresponding to each wide table by minimum computation granularity, de-duplicate all resulting groups, perform a first summarization on the summary tables of each de-duplicated group according to the updated dimension data of the dimension data table, store each first summarization result in a buffer table corresponding to that group, obtain for each wide table the buffer tables corresponding to its groups of summary tables, and perform in parallel a second summarization on the buffer tables corresponding to each wide table, to obtain the updated data of each wide table.
According to yet another aspect of an embodiment of the present invention, an electronic device is provided.
An electronic device comprises: one or more processors; and a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for generating wide table data or the method for updating wide table data provided by the embodiments of the present invention.
According to yet another aspect of an embodiment of the present invention, a computer-readable medium is provided.
A computer-readable medium stores a computer program which, when executed by a processor, implements the method for generating wide table data or the method for updating wide table data provided by the embodiments of the present invention.
One embodiment of the above invention has the following advantages or benefits: a source table is obtained from the data tables whose dimensions are not dynamically updated, and a dimension data table is obtained from the data tables whose dimensions are dynamically updated; a corresponding summary table is generated from the data of the source table according to the configured first correspondence between the source table and the summary table; and corresponding wide table data is generated from the summary table and the dimension data of the dimension data table according to the configured second correspondence among the summary table, the dimension data of the dimension data table, and the wide table. When the dimension data is updated, the corresponding wide table data is updated from the summary table and the updated dimension data of the dimension data table. With the embodiment of the present invention, updating the wide table data requires neither modifying the data processing script nor recomputing all topic table data. This avoids the heavy workload, high cost, and high risk, reduces repeated operations, greatly reduces the amount of data to be recomputed, shortens the overall computation time, and reduces the waste of server resources.
Further effects of the above optional implementations are described below in connection with the specific embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of the main steps of a method for generating wide table data according to one embodiment of the present invention;
FIG. 2 is a schematic diagram of a logic architecture for wide table data generation according to one embodiment of the present invention;
FIG. 3 is a schematic diagram of the main steps of a method for updating wide table data according to one embodiment of the present invention;
FIGS. 4(a) and 4(b) are schematic diagrams of single dependency and multiple dependency, respectively, according to embodiments of the present invention;
FIG. 5 is a schematic diagram of a wide table data update flow according to one embodiment of the present invention;
FIG. 6 is a schematic diagram of multi-task parallel execution of a wide table data update according to one embodiment of the present invention;
FIG. 7 is a schematic diagram of the main modules of a device for generating wide table data according to one embodiment of the present invention;
FIG. 8 is a schematic diagram of the main modules of a device for updating wide table data according to one embodiment of the present invention;
FIG. 9 is an exemplary system architecture diagram to which embodiments of the present invention may be applied;
FIG. 10 is a schematic diagram of a computer system suitable for implementing an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram of the main steps of a method for generating wide table data according to an embodiment of the present invention.
As shown in Fig. 1, the method for generating wide table data according to an embodiment of the present invention mainly includes the following steps S101 to S103.
Step S101: obtain a source table from those data tables whose dimensions are not dynamically updated, and obtain a dimension data table from those data tables whose dimensions are dynamically updated.
Step S102: generate a corresponding summary table from the data of the source table according to the configured first correspondence between the source table and the summary table.
Step S103: generate corresponding wide table data from the summary table and the dimension data of the dimension data table according to the configured second correspondence among the summary table, the dimension data of the dimension data table, and the wide table.
In the embodiment of the present invention, a source table is a data table needed to generate a summary table. The data tables may also be called topic tables, and a source table may also be called a source topic table. Taking the e-commerce industry as an example, a data table holds records such as user click behavior, page PV (Page View), and UV (Unique Visitor), and its data volume may reach billions of records per day.
The dimension data table is obtained from the data tables whose dimensions are dynamically updated, and the dimension data in it is dynamically updated. For example, for a company's organizational structure, the corresponding dimension data is updated dynamically as departments, personnel, and so on change.
The summary table is a table calculated by summarizing data of the source table.
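Steps S101 to S103 can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the table names, the `dims_dynamic` flag, the field names, and the in-memory row format are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical catalogue: each data table records whether its dimensions
# are dynamically updated (an assumption made for this illustration).
tables = {
    "clicks": {"dims_dynamic": False,
               "rows": [{"emp_id": 1, "pv": 3}, {"emp_id": 2, "pv": 5},
                        {"emp_id": 1, "pv": 4}]},
    "org_dims": {"dims_dynamic": True,
                 "rows": [{"emp_id": 1, "dept": "A"}, {"emp_id": 2, "dept": "B"}]},
}

# S101: split the data tables into source tables and dimension data tables.
source_tables = {n: t for n, t in tables.items() if not t["dims_dynamic"]}
dimension_tables = {n: t for n, t in tables.items() if t["dims_dynamic"]}

# S102: summarize a source table by the dimension data primary key.
def build_summary(rows, key, measure):
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[measure]
    return dict(totals)

summary = build_summary(source_tables["clicks"]["rows"], "emp_id", "pv")

# S103: combine the summary table with the dimension data into wide table data.
def build_wide(summary_table, dim_rows, key, dim_field):
    lookup = {r[key]: r[dim_field] for r in dim_rows}
    wide = defaultdict(int)
    for k, value in summary_table.items():
        wide[lookup[k]] += value
    return dict(wide)

wide_table = build_wide(summary, dimension_tables["org_dims"]["rows"],
                        "emp_id", "dept")
```

The point of the split is that `summary` depends only on the source table, so a later change to `org_dims` requires re-running only the last step.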
The embodiment of the invention further comprises the step of pre-configuring the first corresponding relation and the second corresponding relation, wherein:
Configuring the first correspondence includes: configuring a summary table, each source table required for generating the summary table, the fields to be extracted from each source table, and the dimension data primary key. Preferably, it also includes configuring the associated fields among the source tables and the dimension data table in which the dimension data primary key is located.
Configuring the second correspondence includes: configuring a wide table, each summary table required for generating the wide table, the fields to be extracted from each summary table, and the dimension data primary key corresponding to each summary table. The dimension data primary key also serves as the associated field between the summary tables. Preferably, it also includes configuring the number of parallel tasks, which determines the concurrency of the tasks that compute the wide table data.
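One way to picture the two correspondences is as plain configuration records. The sketch below uses Python dataclasses; the class names, attribute names, and sample values (`user_id`, `dept_id`, and so on) are hypothetical and only mirror the items listed above.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SourceSpec:
    table: str          # source (topic) table name
    fields: List[str]   # fields to be extracted from this source table

@dataclass
class FirstCorrespondence:
    summary_table: str         # summary table to be generated
    sources: List[SourceSpec]  # source tables needed for the summary table
    join_fields: List[str]     # associated fields among the source tables
    dimension_table: str       # dimension data table holding the primary key
    dimension_key: str         # dimension data primary key

@dataclass
class SummarySpec:
    table: str          # summary table name
    fields: List[str]   # fields to be extracted from this summary table
    dimension_key: str  # dimension data primary key for this summary table

@dataclass
class SecondCorrespondence:
    wide_table: str               # wide table to be generated
    summaries: List[SummarySpec]  # summary tables needed for the wide table
    parallel_tasks: int = 1       # concurrency of wide table computation

# Hypothetical configuration for one summary table and one wide table.
first = FirstCorrespondence(
    summary_table="H1",
    sources=[SourceSpec("A", ["user_id", "pv"]), SourceSpec("B", ["user_id", "uv"])],
    join_fields=["user_id"],
    dimension_table="w1",
    dimension_key="dept_id",
)
second = SecondCorrespondence(
    wide_table="s1",
    summaries=[SummarySpec("H1", ["pv", "uv"], "dept_id")],
    parallel_tasks=4,
)
```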
In one embodiment, the data of the source table grows dynamically. Dynamic growth means that the dimensions stay unchanged while the data increases over time. For example, with commodity sales data, each period (for example, each month) adds a new batch of sales data.
The summary table may include one or more partition tables.
Generating the corresponding summary table from the data of the source table according to the configured first correspondence may specifically include: periodically extracting data from the newly added data of each source table according to the configured fields to be extracted from that source table, where in each period one partition table of the summary table is computed from the data extracted from the newly added data.
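A minimal Python sketch of this periodic, incremental step is shown below. The period labels, the `sku` and `sales` fields, and the dictionary-based partition layout are assumptions for illustration only.

```python
from collections import defaultdict

def build_partition(new_rows, fields, key):
    """Summarize one period's newly added rows into one partition table."""
    partition = defaultdict(lambda: defaultdict(int))
    for row in new_rows:
        for name in fields:
            partition[row[key]][name] += row[name]
    return {k: dict(v) for k, v in partition.items()}

# Summary table as a mapping: period label -> partition table.
summary_partitions = {}

# Hypothetical newly added source data, one batch per period.
new_data_by_period = {
    "2020-01": [{"sku": "x", "sales": 2}, {"sku": "x", "sales": 1}],
    "2020-02": [{"sku": "x", "sales": 5}, {"sku": "y", "sales": 7}],
}

# Each period reads only that period's new rows; older partitions stay untouched.
for period, rows in new_data_by_period.items():
    summary_partitions[period] = build_partition(rows, ["sales"], "sku")
```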
In one embodiment, configuring the second correspondence further includes configuring dynamic partition information for each summary table. The dynamic partition information of a summary table indicates the latest N partition tables of that summary table to be used. For example, a configuration of $4 means that the data of the latest 4 partition tables is summarized to generate the corresponding wide table data.
Generating the corresponding wide table data from the summary table and the dimension data of the dimension data table according to the configured second correspondence may specifically include: determining, according to the configured dynamic partition information, the partition tables to be used for each summary table; and extracting data from those partition tables according to the configured fields to be extracted from each summary table, and summarizing the extracted data according to the dimension data in the dimension data table to generate the corresponding wide table data.
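The partition selection and the final summarization might look like the following Python sketch. The `$N` string is parsed as described above; everything else (partition labels that sort chronologically, the `sales` field, the `category_of` lookup) is an assumption made for the example.

```python
from collections import defaultdict

def latest_partitions(partitions, dynamic_partition):
    """Resolve a '$N' setting to the N most recent partition tables.

    Assumes partition labels sort chronologically (e.g. 'YYYY-MM').
    """
    n = int(dynamic_partition.lstrip("$"))
    if n <= 0:
        raise ValueError("dynamic partition count must be positive")
    return [partitions[label] for label in sorted(partitions)[-n:]]

def build_wide(partitions, dynamic_partition, field, dim_lookup):
    """Summarize one field of the selected partitions by dimension value."""
    wide = defaultdict(int)
    for part in latest_partitions(partitions, dynamic_partition):
        for key, values in part.items():
            wide[dim_lookup[key]] += values[field]
    return dict(wide)

# Hypothetical summary table with three partition tables.
partitions = {
    "2020-01": {"x": {"sales": 3}},
    "2020-02": {"x": {"sales": 5}, "y": {"sales": 7}},
    "2020-03": {"y": {"sales": 1}},
}
# Hypothetical dimension data: primary key -> dimension value.
category_of = {"x": "toys", "y": "books"}

wide = build_wide(partitions, "$2", "sales", category_of)
```

With `$2`, only the `2020-02` and `2020-03` partitions are read, so the January data does not contribute to `wide`.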
With the method for generating wide table data according to the embodiment of the present invention, a later update of the wide table data requires neither modifying the data processing script nor recomputing all topic table data. This avoids the heavy workload, high cost, and high risk, reduces repeated operations, greatly reduces the amount of data to be recomputed, shortens the overall computation time, and reduces the waste of server resources.
The method for generating the wide table data according to the embodiment of the present invention is described in further detail below.
The amount of data in a data warehouse is often very large, and it is also updated or increased periodically, for example once per day. According to business requirements, the data of the topic tables needs to be processed and summarized into a wide data table. Taking the e-commerce industry as an example, a topic table holds records such as user click behavior and page PV and UV, and its data volume may reach billions of records per day.
The dimensions of some topic tables are not dynamically updated, although their data is periodically updated, while the dimensions of other topic tables are dynamically updated, such as a company's organizational structure or the classification of goods. When the dimensions of the topic tables that generate the wide table data do not change, the corresponding wide table also periodically produces a new batch of data as part of the topic table data grows, and these batches are usually distinguished by partition.
In the prior art, wide table data is generated by performing a summary calculation over every topic table involved. For example, when wide table 1 needs to be generated from topic tables A, B, C, and D, topic tables A to D are summarized and computed to obtain wide table 1. Assume that the dimensions of topic table D are dynamically updated. If topic table D is updated, newly computed wide tables are not affected, but the old, already archived wide tables must be updated. Under the prior-art scheme, everything must be recomputed according to the original processing logic, so the workload is very large and the time required is very long.
An embodiment of the present invention proposes a logic architecture for generating wide table data, and Fig. 2 is a schematic diagram of this logic architecture. As shown in Fig. 2, the data whose calculation dimensions are variable (dynamically updated) during wide table generation is specified through configuration management. During calculation, dimension data tables are generated from the information whose dimensions are variable (topic tables D and Z). Summary tables are generated from the data whose dimensions do not change, for example summary table H1 from topic tables A, B and C, and summary table H2 from topic tables X and Y. In other words, the topic table information with dynamically changing dimensions is extracted through configuration management, which optimizes wide table generation: an intermediate layer of dimension data tables and summary tables is generated, and the summary tables and dimension data tables in the intermediate layer are then calculated to obtain the wide table data. For example, in Fig. 2, wide table s1 is generated from summary table H1 (H1 may be one or more summary tables) and dimension data table w1, and wide tables s2 and s3 are generated from summary table H2 (H2 may be one or more summary tables) and dimension data table w2. With this logic architecture, when wide table data is updated later, recalculation is needed only in the intermediate layer, and the topic table data does not need to be processed again.
The primary key of each summary table is bound to the dimension data table information; for example, the primary key information of a summary table may include the primary key information of the corresponding dimension data table. The dimension data table keeps the latest version of the data. If the historical data of a wide table (that is, wide table data generated in the past) needs to be updated later, only the dimension data table and the historical summary tables (that is, summary tables generated in the past) need to be combined and summarized, so the amount of data to be calculated is reduced by several orders of magnitude compared with the prior art.
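As a non-limiting illustration of the intermediate layer described above, the following Python sketch recomputes a wide table from a historical summary table and the latest dimension data table, without touching any topic table. The table contents, the field names (dt, sku_id, sales, category) and the function build_wide_table are hypothetical and are not taken from the embodiment.

```python
from collections import defaultdict

# Hypothetical historical summary table: rows already aggregated from the topic tables.
# Each row carries the dimension primary key ("sku_id"), not the dimension attributes.
summary_h1 = [
    {"dt": "2020-01-01", "sku_id": 1, "sales": 10},
    {"dt": "2020-01-01", "sku_id": 2, "sales": 5},
    {"dt": "2020-01-02", "sku_id": 1, "sales": 7},
]


def build_wide_table(summary_rows, dimension_table, dim_key, dim_attr, measure):
    """Summarize summary rows by a dimension attribute looked up via the primary key."""
    totals = defaultdict(int)
    for row in summary_rows:
        attr = dimension_table[row[dim_key]][dim_attr]
        totals[(row["dt"], attr)] += row[measure]
    return dict(totals)


# Dimension data table before the update: both SKUs belong to one category.
dim_v1 = {1: {"category": "phones"}, 2: {"category": "phones"}}
wide_v1 = build_wide_table(summary_h1, dim_v1, "sku_id", "category", "sales")

# After the dimension update, only the dimension table changes; the same
# historical summary rows are summarized again to refresh the wide table.
dim_v2 = {1: {"category": "phones"}, 2: {"category": "tablets"}}
wide_v2 = build_wide_table(summary_h1, dim_v2, "sku_id", "category", "sales")
```

Because the summary rows store only the dimension primary key, the refresh reads the summary table and the dimension data table and nothing else.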
The correspondence between source tables and summary tables is configured through a configuration file table. A source table, i.e. a source topic table in Table 1, is a topic table needed for generating a summary table. Configuring the correspondence between source tables and a summary table specifically includes configuring each source table required for the summary table, the fields to be extracted from each source table, the associated fields among the source tables, the dimension data table, and the dimension data primary key. The configuration information in the configuration file table is shown, for example, in Table 1, where the target table refers to the summary table to be generated.
TABLE 1
Through the configuration file table, all of the topic tables, the required fields in each topic table, the fields that associate the topic tables, the name of the generated target table, the dimension data tables to be processed, and the primary keys of the dimension data tables are configured. The configuration is then combined with a preset table-creation statement template, the customized parts (that is, the statements not contained in the template) are filled into the template, and the corresponding summary tables are generated by periodic calculation.
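The template-filling step above can be sketched as follows. This is only an illustrative assumption of how one configuration row might be turned into a table-creation statement; the configuration keys, the table names and the SQL shape are hypothetical, since the contents of Table 1 are not reproduced here.

```python
from string import Template

# Hypothetical preset table-creation statement template.
CREATE_SUMMARY_TEMPLATE = Template(
    "CREATE TABLE IF NOT EXISTS $target AS\n"
    "SELECT $fields\n"
    "FROM $joins"
)


def render_summary_statement(config):
    """Fill the customized parts of the template from one configuration row."""
    sources = config["source_tables"]
    # Fields to be extracted from each source table.
    fields = ", ".join(
        f"{src['name']}.{field}" for src in sources for field in src["fields"]
    )
    # Associate every further source table with the first one on the join field.
    first = sources[0]["name"]
    key = config["join_field"]
    joins = first + "".join(
        f"\nJOIN {src['name']} ON {first}.{key} = {src['name']}.{key}"
        for src in sources[1:]
    )
    return CREATE_SUMMARY_TEMPLATE.substitute(
        target=config["target_table"], fields=fields, joins=joins
    )


# Hypothetical configuration row; dimension_table and dimension_key are kept
# in the row for the later binding to the dimension data table.
config_row = {
    "target_table": "summary_h1",
    "source_tables": [
        {"name": "topic_a", "fields": ["order_id", "sku_id"]},
        {"name": "topic_b", "fields": ["amount"]},
    ],
    "join_field": "order_id",
    "dimension_table": "dim_w1",
    "dimension_key": "sku_id",
}
sql = render_summary_statement(config_row)
```

Keeping the statement shape in a template and the table-specific parts in configuration means a new summary table can be added by adding a configuration row rather than by editing a processing script.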
Since each summary table and dimension data table is prepared periodically by the above method, the wide table is calculated periodically according to the configuration information and by monitoring for completion of the summary tables.
The configuration file table also configures the correspondence among the summary tables, the dimension data tables and the wide tables. Specifically, this includes configuring a wide table, all the summary tables required for generating the wide table, the fields to be extracted from each summary table, and the associated fields among the summary tables, where the associated fields among the summary tables are the dimension data primary keys corresponding to the summary tables. The configuration file table is also configured with periodic update information, dynamic partition information and the number of parallel tasks, where the number of parallel tasks determines the concurrency of the tasks that calculate the wide table data.
As shown in Table 2, the source summary table in Table 2 is a configured summary table, and the target table is the configured wide table. When calculating wide table data, it may be necessary to process the partition tables of a summary table that correspond to several periods. The dynamic partition information field in the configuration file table is therefore specified with a wildcard that determines how many partition tables are used in the calculation; for example, $4 indicates that the calculation task runs on the latest 4 partition tables. Finally, using the table-creation statement template and the pre-written data query module, the customized parts are filled into the template through calculation.
TABLE 2
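The wildcard-style dynamic partition field described above (for example $4) could be resolved as in the following minimal sketch. The function name latest_partitions and the assumption that partition names sort chronologically (for example ISO dates) are illustrative only and are not stated in the embodiment.

```python
import re


def latest_partitions(partition_spec, available_partitions):
    """Resolve a spec such as "$4" to the latest 4 partition tables."""
    match = re.fullmatch(r"\$(\d+)", partition_spec)
    if match is None:
        raise ValueError(f"unsupported dynamic partition spec: {partition_spec!r}")
    count = int(match.group(1))
    if count == 0:
        return []
    # Assumption: partition names sort chronologically, so the last ones are the latest.
    return sorted(available_partitions)[-count:]


partitions = ["2020-03-01", "2020-03-02", "2020-03-03", "2020-03-04", "2020-03-05"]
selected = latest_partitions("$4", partitions)
```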
Fig. 3 is a schematic diagram of main steps of a method for updating broad-table data according to an embodiment of the present invention.
As shown in fig. 3, the method for updating the wide table data according to an embodiment of the present invention includes steps S301 to S304. The step S301 to the step S303 are the same as the step S101 to the step S103, and are not described again.
Step S304: when the dimension data of the dimension data table is updated, the corresponding wide table data is updated from the summary table and the updated dimension data of the dimension data table, according to the configured second correspondence among the summary table, the dimension data of the dimension data table, and the wide table.
In one embodiment, updating the corresponding wide table data from the summary table and the updated dimension data of the dimension data table, according to the configured second correspondence, may specifically include: determining the dependency relationship between the summary tables and the wide tables according to the second correspondence, where the dependency is a single dependency if the data of one wide table is generated from one summary table, and a multi-dependency if the data of one wide table is generated from a plurality of summary tables; for each summary table on which a single-dependency wide table depends, performing a summary operation according to the updated dimension data of the dimension data table to obtain the updated wide table data; and, for the summary tables on which the multi-dependency wide tables depend, grouping the summary tables corresponding to each wide table according to the minimum calculation granularity, de-duplicating all the resulting groups, performing a first summary on the summary tables of each de-duplicated group according to the updated dimension data of the dimension data table, storing each first summary result in the cache table corresponding to its group, obtaining the cache tables that correspond to the groups of each wide table's summary tables, and performing a second summary on the cache tables corresponding to each wide table in parallel to obtain the updated data of each wide table.
The summary operation on two summary tables can be regarded as the minimum calculation granularity. For example, summary table 1 and summary table 2 may form one group used to generate certain wide table data (in this case the wide table data is not generated directly; the cache table is generated first).
In one embodiment, each summary table includes a plurality of partition tables, so each group includes at least two partition tables. For example, assuming the configured dynamic partition information indicates that the latest 2 partition tables are used, a summary operation between summary table 1 and summary table 2 uses the latest two partition tables of each, so a total of 4 partition tables form one group, and a summary operation on them generates the cache table corresponding to the group.
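The partition count in the example above can be checked with a short sketch. The names partitions_in_group and partitions_by_summary are hypothetical, partition names are assumed to sort chronologically, and latest_n is assumed to be at least 1.

```python
def partitions_in_group(group, partitions_by_summary, latest_n):
    """Collect the latest `latest_n` partition tables of every summary table in a group."""
    selected = []
    for summary in group:
        # Assumption: partition names sort chronologically (for example ISO dates).
        latest = sorted(partitions_by_summary[summary])[-latest_n:]
        selected.extend((summary, partition) for partition in latest)
    return selected


partitions_by_summary = {
    "summary_1": ["2020-03-01", "2020-03-02", "2020-03-03"],
    "summary_2": ["2020-03-01", "2020-03-02", "2020-03-03"],
}
# One group of two summary tables, each contributing its latest 2 partition tables.
group_inputs = partitions_in_group(
    ("summary_1", "summary_2"), partitions_by_summary, 2
)
```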
According to the method for updating wide table data of the embodiment of the invention, the data processing script does not need to be modified and not all of the topic table data needs to be recalculated. This overcomes the drawbacks of heavy tasks, high cost and high risk, reduces repeated operations, greatly reduces the amount of data that must be recalculated, shortens the overall calculation time, and reduces wasted server resources.
The method for updating the wide table data according to the embodiment of the present invention is described in further detail below.
When the dimension data of a configured dimension data table changes, the historical data of the wide table needs to be refreshed (i.e. updated). A rerun data script may be generated automatically to update the previously generated wide table data. In the embodiment of the invention, updating wide table data involves only the dimension data table and the summary tables of the wide table; the topic tables do not need to be recalculated, which greatly reduces repeated operations and the amount of data that must be recalculated.
If the backfill (that is, the refresh of historical data) covers a long period, then, to improve calculation efficiency and avoid an overly long execution time, the embodiment of the invention reads the configuration information in the configuration file table when updating the historical data of the wide table and determines the dependency relationship between the summary tables and the wide table, which is either a single dependency or a multi-dependency. A single dependency means that a wide table depends on one summary table, that is, the wide table data is generated by summarizing one summary table according to the dimension data of the dimension data table; for example, as shown in Fig. 4(a), wide table s1 is obtained by summarizing summary table w1 according to the dimension data w in the dimension data table. A multi-dependency means that one wide table depends on a plurality of summary tables, that is, the wide table data is generated by summarizing a plurality of summary tables according to the dimension data of the dimension data table; for example, as shown in Fig. 4(b), wide table s2 and wide table s3 are obtained by summarizing the plurality of summary tables w2 according to the dimension data w in the dimension data table.
For a single dependency, the summary table and the wide table of each dimension are in one-to-one correspondence, and this does not change as summary tables are added, so the calculation tasks of each period are processed simultaneously and each wide table's data is obtained through parallel calculation. For a multi-dependency, one wide table depends on multiple summary tables, and each summary table may be used by multiple wide tables. To avoid repeated calculation, an embodiment of the present invention may adopt an approach similar to the merge sort algorithm: the summary tables corresponding to each wide table are grouped according to the minimum calculation granularity; after all the resulting groups are de-duplicated, the summary tables of each de-duplicated group are summarized in parallel according to the latest dimension data; each summary result is stored in the cache table corresponding to its group; the cache tables corresponding to the groups of each wide table's summary tables are then obtained; and the cache tables corresponding to each wide table are summarized in parallel to obtain the updated data of each wide table.
According to the embodiment of the invention, rerun data scripts are generated automatically according to the task types corresponding to single dependency and multi-dependency, and multiple tasks are executed concurrently, which optimizes the backfill flow for historical wide table data and reduces task execution time.
The flow of updating wide table data according to one embodiment of the present invention is shown in Fig. 5. When updating wide table data, the configuration file information, i.e. the configuration information in the configuration file table, is read first, and the dependency relationship between the summary tables and the wide table is determined from it. If the dependency is a single dependency, the number of backfill tasks (i.e. the number of parallel tasks) is specified and the backfill is executed by multiple tasks in parallel. If the dependency is a multi-dependency, the number of backfill tasks is specified, the cache tables are generated, and the backfill is then executed by multiple tasks in parallel. Each task calculates the data of one wide table.
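The dispatch step of this flow can be sketched as follows, assuming the second correspondence has already been read into a mapping from each wide table to its summary tables. The mapping, the function names and the use of a thread pool are illustrative assumptions; the embodiment does not prescribe a particular execution mechanism.

```python
from concurrent.futures import ThreadPoolExecutor


def dependency_type(summary_tables):
    """A wide table built from one summary table is single-dependency, otherwise multi."""
    return "single" if len(summary_tables) == 1 else "multi"


def plan_backfill(wide_to_summaries):
    """Split the configured wide tables by dependency type."""
    plan = {"single": [], "multi": []}
    for wide, summaries in wide_to_summaries.items():
        plan[dependency_type(summaries)].append(wide)
    return plan


def run_backfill(wide_tables, recompute, parallel_tasks):
    """Run one task per wide table, with at most `parallel_tasks` running at once."""
    with ThreadPoolExecutor(max_workers=parallel_tasks) as pool:
        return dict(zip(wide_tables, pool.map(recompute, wide_tables)))


# Hypothetical second correspondence read from the configuration file table.
second_correspondence = {"s1": ["h1"], "s2": ["h2", "h3"], "s3": ["h2", "h3"]}
plan = plan_backfill(second_correspondence)

# Placeholder recompute function standing in for the generated rerun script.
results = run_backfill(plan["single"], lambda name: f"{name} refreshed", 2)
```

For the multi-dependency branch, the cache tables would be generated before run_backfill is called for the wide tables in plan["multi"].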
Fig. 6 is a schematic diagram of multitask parallel execution of a wide table data update according to one embodiment of the invention. As shown in Fig. 6, wide table 1 is generated from summary tables 1 to 4, wide table 2 from summary tables 2 to 5, wide table 3 from summary tables 3 to 6, and so on; only some of the summary tables and wide tables are shown in Fig. 6. If each wide table's data were generated directly from its summary tables, the amount of calculation would be N×M (where N is the total number of summary tables and M is the number of partitions required by each wide table). The embodiment of the invention optimizes the task execution flow and caches the intermediate result sets so that subsequent tasks can reuse them. Specifically, the summary tables corresponding to each wide table are grouped according to the minimum calculation granularity, that is, two summary tables per group. In Fig. 6, summary tables 1 and 2 are summarized to obtain cache table 1, summary tables 3 and 4 to obtain cache table 2, summary tables 5 and 6 to obtain cache table 3, and so on until every cache table has been generated, which completes the processing of the cache tables; only cache tables 1 to 7 are shown in Fig. 6. Cache table 1 and cache table 2 are then calculated to generate wide table 1, cache table 5 (obtained by summarizing summary tables 2 and 3) and cache table 6 (obtained by summarizing summary tables 4 and 5) are calculated to generate wide table 2, the other wide tables are calculated in the same way, and finally the processing of all wide tables is completed.
The calculation tasks can be executed in parallel both when generating the cache tables and when generating the wide tables; by trading space for time, repeated calculation is reduced and the overall calculation time is shortened.
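The grouping, de-duplication and two-stage summary described for Fig. 6 can be sketched as below. This is a simplified illustration under stated assumptions: each summary table is reduced to a single number standing in for its result under the updated dimension data, summation stands in for the summary operation, and the function names and the thread pool are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def pair_groups(summary_ids):
    """Split a wide table's summary tables into groups of two (minimum granularity)."""
    return [tuple(summary_ids[i:i + 2]) for i in range(0, len(summary_ids), 2)]


def refresh_wide_tables(wide_to_summaries, summary_values, parallel_tasks=4):
    groups_per_wide = {w: pair_groups(ids) for w, ids in wide_to_summaries.items()}
    # De-duplicate: a group shared by several wide tables is summarized only once.
    unique_groups = sorted({g for groups in groups_per_wide.values() for g in groups})
    wide_names = list(groups_per_wide)

    def first_summary(group):
        # First summary: combine the summary tables of one group into a cache table.
        return sum(summary_values[s] for s in group)

    with ThreadPoolExecutor(max_workers=parallel_tasks) as pool:
        cache = dict(zip(unique_groups, pool.map(first_summary, unique_groups)))

        def second_summary(wide):
            # Second summary: combine the cache tables that belong to one wide table.
            return sum(cache[g] for g in groups_per_wide[wide])

        wide = dict(zip(wide_names, pool.map(second_summary, wide_names)))
    return cache, wide


# Wide tables 1 to 3 depend on overlapping runs of summary tables, as in Fig. 6.
wide_to_summaries = {
    "wide_1": [1, 2, 3, 4],
    "wide_2": [2, 3, 4, 5],
    "wide_3": [3, 4, 5, 6],
}
summary_values = {1: 10, 2: 20, 3: 30, 4: 40, 5: 50, 6: 60}
cache_tables, wide_tables = refresh_wide_tables(wide_to_summaries, summary_values)
```

In this example the group (3, 4) is needed by both wide_1 and wide_3 but is summarized once, so five cache tables serve three wide tables instead of six separate pairwise summaries.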
Fig. 7 is a schematic diagram of main blocks of a wide table data generating apparatus according to an embodiment of the present invention.
The apparatus 700 for generating wide table data according to an embodiment of the present invention mainly includes: a data table extraction module 701, a summary table generation module 702, and a wide table data generation module 703.
The data table extraction module 701 is configured to obtain source tables from those data tables whose dimensions are not dynamically updated, and to obtain dimension data tables from those data tables whose dimensions are dynamically updated.
The summary table generation module 702 is configured to generate the corresponding summary table from the data of the source tables according to the configured first correspondence between source tables and summary tables.
The wide table data generation module 703 is configured to generate the corresponding wide table data from the summary table and the dimension data of the dimension data table, according to the configured second correspondence among the summary table, the dimension data of the dimension data table, and the wide table.
The apparatus 700 for generating wide table data may further include a configuration module for pre-configuring the first correspondence and the second correspondence. Configuring the first correspondence includes configuring a summary table, each source table required for generating the summary table, the fields to be extracted from each source table, and the dimension data primary key. Configuring the second correspondence includes configuring a wide table, each summary table required for generating the wide table, the fields to be extracted from each summary table, and the dimension data primary key corresponding to each summary table.
In one embodiment, the data of the source table may be dynamically increased, and the summary table includes one or more partition tables.
The summary table generation module 702 may be specifically configured to periodically extract data from the newly added data of each source table according to the configured fields to be extracted from each source table, where in each period one partition table of the summary table is calculated and generated from the data extracted from the newly added data.
The configuration module may be further configured so that configuring the second correspondence also includes configuring the dynamic partition information of each summary table.
The wide table data generation module 703 may be specifically configured to: determine the partition tables to be used for each summary table according to the configured dynamic partition information; extract data from those partition tables according to the configured fields to be extracted from each summary table; and summarize the data extracted from each partition table according to the dimension data in the dimension data table to generate the corresponding wide table data.
Fig. 8 is a schematic diagram of main modules of an apparatus for updating broad-table data according to an embodiment of the present invention.
The device 800 for updating the wide table data according to an embodiment of the present invention mainly includes a data table extraction module 801, a summary table generation module 802, a wide table data generation module 803, and a wide table data updating module 804.
The data table extraction module 801, the summary table generation module 802, and the wide table data generation module 803 have the same corresponding functions as the data table extraction module 701, the summary table generation module 702, and the wide table data generation module 703, and are not described herein.
The wide table data update module 804 is configured to: when the dimension data of the dimension data table is updated, update the corresponding wide table data from the summary table and the updated dimension data of the dimension data table, according to the configured second correspondence among the summary table, the dimension data of the dimension data table, and the wide table.
The wide table data update module 804 may be specifically configured to: determine the dependency relationship between the summary tables and the wide tables according to the second correspondence, where the dependency is a single dependency if the data of one wide table is generated from one summary table, and a multi-dependency if the data of one wide table is generated from a plurality of summary tables;
for each summary table on which a single-dependency wide table depends, perform a summary operation according to the updated dimension data of the dimension data table to obtain the updated wide table data;
and, for the summary tables on which the multi-dependency wide tables depend, group the summary tables corresponding to each wide table according to the minimum calculation granularity, de-duplicate all the resulting groups, perform a first summary on the summary tables of each de-duplicated group according to the updated dimension data of the dimension data table, store each first summary result in the cache table corresponding to its group, obtain the cache tables that correspond to the groups of each wide table's summary tables, and perform a second summary on the cache tables corresponding to each wide table in parallel to obtain the updated data of each wide table.
In addition, in the embodiments of the present invention, the specific implementation contents of the apparatus for generating the wide-table data and the apparatus for updating the wide-table data are described in detail in the foregoing methods for generating the wide-table data and updating the wide-table data, respectively, and therefore, the description thereof will not be repeated here.
Fig. 9 shows an exemplary system architecture 900 to which the wide-table data generating method and the wide-table data updating method or the wide-table data generating apparatus and the wide-table data updating apparatus of the embodiment of the present invention can be applied.
As shown in fig. 9, system architecture 900 may include terminal devices 901, 902, 903, a network 904, and a server 905. The network 904 is the medium used to provide communications links between the terminal devices 901, 902, 903 and the server 905. The network 904 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
A user may interact with the server 905 over the network 904 using the terminal devices 901, 902, 903 to receive or send messages, etc. Various communication client applications may be installed on the terminal devices 901, 902, 903, such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, and the like (by way of example only).
Terminal devices 901, 902, 903 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 905 may be a server that provides various services, such as a background management server (by way of example only) that provides support for shopping websites browsed by users using terminal devices 901, 902, 903. The background management server may analyze and otherwise process received data such as a product information query request, and feed the processing result (e.g., target push information or product information, by way of example only) back to the terminal device.
It should be noted that, the method for generating the wide table data and the method for updating the wide table data according to the embodiments of the present invention are generally executed by the server 905, and accordingly, the device for generating the wide table data and the device for updating the wide table data are generally disposed in the server 905.
It should be understood that the number of terminal devices, networks and servers in fig. 9 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 10, there is illustrated a schematic diagram of a computer system 1000 suitable for use in implementing a terminal device or server of an embodiment of the present application. The terminal device or server illustrated in fig. 10 is merely an example, and should not impose any limitation on the functionality and scope of use of the embodiments of the present application.
As shown in fig. 10, the computer system 1000 includes a Central Processing Unit (CPU) 1001, which can execute various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1002 or a program loaded from a storage section 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data required for the operation of the system 1000 are also stored. The CPU 1001, ROM 1002, and RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to bus 1004.
The following components are connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output portion 1007 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), etc., and a speaker, etc.; a storage portion 1008 including a hard disk or the like; and a communication section 1009 including a network interface card such as a LAN card, a modem, or the like. The communication section 1009 performs communication processing via a network such as the internet. The drive 1010 is also connected to the I/O interface 1005 as needed. A removable medium 1011, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is installed as needed in the drive 1010, so that a computer program read out therefrom is installed as needed in the storage section 1008.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 1009, and/or installed from the removable medium 1011. The above-described functions defined in the system of the present application are performed when the computer program is executed by a Central Processing Unit (CPU) 1001.
The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor, which may for example be described as: a processor comprising a data table extraction module, a summary table generation module and a wide table data generation module. In some cases the names of these modules do not limit the modules themselves; for example, the data table extraction module may also be described as "a module for obtaining a source table from the data tables whose dimensions are not dynamically updated, and obtaining a dimension data table from the data tables whose dimensions are dynamically updated".
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist alone without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform: obtaining a source table from the data tables whose dimensions are not dynamically updated, and obtaining a dimension data table from the data tables whose dimensions are dynamically updated; generating a corresponding summary table from the data of the source table according to a configured first correspondence between the source table and the summary table; and generating corresponding wide table data from the summary table and the dimension data of the dimension data table, according to a configured second correspondence among the summary table, the dimension data of the dimension data table, and the wide table.
Or to perform: obtaining a source table from the data tables whose dimensions are not dynamically updated, and obtaining a dimension data table from the data tables whose dimensions are dynamically updated; generating a corresponding summary table from the data of the source table according to a configured first correspondence between the source table and the summary table; generating corresponding wide table data from the summary table and the dimension data of the dimension data table, according to a configured second correspondence among the summary table, the dimension data of the dimension data table, and the wide table; and, when the dimension data of the dimension data table is updated, updating the corresponding wide table data from the summary table and the updated dimension data of the dimension data table, according to the configured second correspondence.
According to the technical scheme of the embodiment of the invention, the source table is obtained from the data tables whose dimensions are not dynamically updated, the dimension data table is obtained from the data tables whose dimensions are dynamically updated, the corresponding summary table is generated from the data of the source table, and the corresponding wide table data is generated from the summary table and the dimension data of the dimension data table. When the dimension data of the dimension data table is updated, the corresponding wide table data is updated from the summary table and the updated dimension data of the dimension data table. With the embodiment of the invention, the data processing script does not need to be modified when wide table data is updated and not all of the topic table data needs to be recalculated. This overcomes the drawbacks of heavy tasks, high cost and high risk, reduces repeated operations, greatly reduces the amount of data that must be recalculated, shortens the overall calculation time, and reduces wasted server resources.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method for generating wide-table data, comprising:
obtaining a source table from those data tables whose dimension is not dynamically updated, and obtaining a dimension data table from those data tables whose dimension is dynamically updated;
generating a corresponding summary table from the data of the source table, according to a configured first correspondence between the source table and the summary table;
generating corresponding wide-table data from the summary table and the dimension data of the dimension data table, according to a configured second correspondence among the summary table, the dimension data of the dimension data table, and the wide table;
wherein the method further comprises pre-configuring the second correspondence, and configuring the second correspondence includes: configuring a wide table, each summary table required for generating the wide table, the fields to be extracted from each summary table, and the dimension data primary key corresponding to each summary table.
2. The method of claim 1, further comprising pre-configuring the first correspondence, wherein:
configuring the first correspondence includes: configuring a summary table, each source table required for generating the summary table, the fields to be extracted from each source table, and the dimension data primary key.
3. The method of claim 2, wherein the data of the source table grows dynamically, and the summary table comprises one or more partition tables;
generating a corresponding summary table from the data of the source table, according to the configured first correspondence between the source table and the summary table, includes:
periodically extracting data from the newly added data of each source table according to the configured fields to be extracted from each source table, wherein in each period one partition table of the summary table is computed and generated from the data extracted from the newly added data.
4. The method of claim 2, wherein configuring the second correspondence further comprises configuring dynamic partition information for the summary tables;
generating corresponding wide-table data from the summary table and the dimension data of the dimension data table, according to the configured second correspondence among the summary table, the dimension data of the dimension data table, and the wide table, comprises:
determining the partition tables to be used from each summary table according to the configured dynamic partition information;
extracting data from the partition tables to be used from each summary table, according to the configured fields to be extracted from each summary table, and summarizing the data extracted from each partition table according to the dimension data in the dimension data table to generate the corresponding wide-table data.
5. A method of updating wide-table data generated by the method according to any one of claims 1 to 4, comprising:
when the dimension data of the dimension data table is updated, updating the corresponding wide-table data from the summary table and the updated dimension data of the dimension data table, according to the configured second correspondence among the summary table, the dimension data of the dimension data table, and the wide table.
6. The method of claim 5, wherein updating the corresponding wide-table data from the summary table and the updated dimension data of the dimension data table, according to the configured second correspondence among the summary table, the dimension data of the dimension data table, and the wide table, comprises:
determining a dependency relationship between the summary tables and the wide tables according to the second correspondence, wherein the dependency relationship is single-dependent if the data of a wide table is generated from one summary table, and multi-dependent if the data of a wide table is generated from a plurality of summary tables;
for the summary tables on which wide tables are single-dependent, performing the summarization operation on those summary tables in parallel according to the updated dimension data of the dimension data table, to obtain updated wide-table data;
for the summary tables on which wide tables are multi-dependent: grouping the summary tables corresponding to each wide table by minimum calculation granularity; de-duplicating all resulting groups; performing a first summarization on the summary tables of each de-duplicated group according to the updated dimension data of the dimension data table, and storing each first summarization result in a buffer table corresponding to that group; obtaining, for each wide table, the buffer tables corresponding to the groups of its summary tables; and performing a second summarization on the buffer tables corresponding to each wide table in parallel, to obtain the updated data of each wide table.
7. A wide-table data generating apparatus, comprising:
a data table extraction module, configured to obtain a source table from those data tables whose dimension is not dynamically updated, and to obtain a dimension data table from those data tables whose dimension is dynamically updated;
a summary table generation module, configured to generate a corresponding summary table from the data of the source table, according to a configured first correspondence between the source table and the summary table;
a wide-table data generation module, configured to generate corresponding wide-table data from the summary table and the dimension data of the dimension data table, according to a configured second correspondence among the summary table, the dimension data of the dimension data table, and the wide table;
wherein the apparatus further comprises a configuration module configured to pre-configure the second correspondence, and configuring the second correspondence includes: configuring a wide table, each summary table required for generating the wide table, the fields to be extracted from each summary table, and the dimension data primary key corresponding to each summary table.
8. An apparatus for updating wide-table data generated by the apparatus of claim 7, comprising a wide-table data updating module for:
when the dimension data of the dimension data table is updated, updating the corresponding wide-table data from the summary table and the updated dimension data of the dimension data table, according to the configured second correspondence among the summary table, the dimension data of the dimension data table, and the wide table.
9. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1 to 6.
10. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-6.
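The multi-dependent update of claim 6 (group, de-duplicate, first summarization into buffer tables, second summarization in parallel) might be sketched as below. This is one possible reading, not the claimed implementation: "minimum calculation granularity" is interpreted here, as an assumption, as the dimension primary key shared by a set of summary tables, and all table names, fields, and helper functions are hypothetical. Field names are assumed to be distinct across groups.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical summary tables, each tagged with its assumed granularity.
SUMMARIES = {
    "orders": {"granularity": "sku",
               "rows": [{"sku": "A", "amount": 17}, {"sku": "B", "amount": 5}]},
    "visits": {"granularity": "sku",
               "rows": [{"sku": "A", "clicks": 3}, {"sku": "B", "clicks": 4}]},
    "refunds": {"granularity": "shop",
                "rows": [{"shop": "S1", "refund": 2}]},
}

# Updated dimension data, one mapping per primary key (hypothetical).
DIMENSIONS = {"sku": {"A": "dept1", "B": "dept1"}, "shop": {"S1": "dept1"}}

# Multi-dependent wide tables and the summary tables they depend on.
WIDE_DEPENDENCIES = {
    "sales_wide": ["orders", "visits", "refunds"],
    "traffic_wide": ["orders", "visits"],
}


def groups_for(wide_name):
    """Group a wide table's summary tables by granularity."""
    by_granularity = defaultdict(list)
    for name in WIDE_DEPENDENCIES[wide_name]:
        by_granularity[SUMMARIES[name]["granularity"]].append(name)
    return [(g, tuple(sorted(names))) for g, names in by_granularity.items()]


def first_summary(group):
    """Summarize one group against the updated dimension data."""
    granularity, names = group
    dimension = DIMENSIONS[granularity]
    buffer = defaultdict(lambda: defaultdict(int))
    for name in names:
        for row in SUMMARIES[name]["rows"]:
            dim_value = dimension[row[granularity]]
            for field, value in row.items():
                if field != granularity:
                    buffer[dim_value][field] += value
    return {d: dict(v) for d, v in buffer.items()}


def second_summary(wide_name, buffers):
    """Merge the buffer tables of the groups one wide table depends on."""
    wide = defaultdict(dict)
    for group in groups_for(wide_name):
        for dim_value, fields in buffers[group].items():
            wide[dim_value].update(fields)
    return dict(wide)


# De-duplicate groups across wide tables so each is computed only once.
unique_groups = {g for w in WIDE_DEPENDENCIES for g in groups_for(w)}
buffer_tables = {g: first_summary(g) for g in unique_groups}

# Second summarization, run in parallel over the wide tables.
with ThreadPoolExecutor() as pool:
    futures = {w: pool.submit(second_summary, w, buffer_tables)
               for w in WIDE_DEPENDENCIES}
    wide_tables = {w: f.result() for w, f in futures.items()}
```

Here the group containing `orders` and `visits` is shared by both wide tables, so its first summarization runs once and its buffer table is reused, which is the saving that de-duplication is meant to provide in this sketch.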
CN202010148063.7A 2020-03-05 2020-03-05 Wide-table data generation method, updating method and related device Active CN113360494B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010148063.7A CN113360494B (en) 2020-03-05 2020-03-05 Wide-table data generation method, updating method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010148063.7A CN113360494B (en) 2020-03-05 2020-03-05 Wide-table data generation method, updating method and related device

Publications (2)

Publication Number Publication Date
CN113360494A CN113360494A (en) 2021-09-07
CN113360494B CN113360494B (en) 2024-04-05

Family

ID=77523784

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010148063.7A Active CN113360494B (en) 2020-03-05 2020-03-05 Wide-table data generation method, updating method and related device

Country Status (1)

Country Link
CN (1) CN113360494B (en)

Also Published As

Publication number Publication date
CN113360494A (en) 2021-09-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant