CN114691682A - Data table generation method and device - Google Patents

Publication number
CN114691682A
Authority
CN
China
Prior art keywords
data table
field
dimension
target
target data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210293153.4A
Other languages
Chinese (zh)
Inventor
杨瑞利
陈健璋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Zhenshi Information Technology Co Ltd
Original Assignee
Beijing Jingdong Zhenshi Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Zhenshi Information Technology Co Ltd
Priority to CN202210293153.4A
Publication of CN114691682A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Abstract

Embodiments of the disclosure provide a data table generation method and device. One embodiment of the method comprises: determining, according to a business topic, at least one business data table related to the business topic and row record information of a target data table; counting the occurrence frequency of each field in each business data table to obtain at least one target field whose statistical value meets a preset condition, and generating dimension tables by taking each target field as a dimension of the target data table; selecting dimension attributes in each dimension table; and determining the attribute value of each dimension based on the row record information of the target data table and the attributes of each dimension, and generating a target data table comprising at least the attribute value of each dimension. This makes data table generation more comprehensive and efficient.

Description

Data table generation method and device
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, in particular to data processing, and more specifically to a data table generation method and device.
Background
In a data warehouse, the detail data layer (DWD) serves as an isolation layer between the business layer and the data warehouse, and is mainly used to clean and normalize the data of the operational data store (ODS). The current design process for a real-time DWD data model comprises the following steps: (1) selecting a business topic; (2) confirming the granularity, which represents the level of business detail expressed by one record in the fact table; (3) confirming the dimensions, that is, selecting dimension information that clearly describes the context in which the business process takes place; (4) confirming the facts, that is, determining the metrics to be measured by the data model. However, when confirming the dimensions, the common practice is to cover all source data fields, which tends to produce redundant model fields and waste computing resources. Reducing this redundancy requires either collecting a large number of customer requirements or extracting high-frequency source data fields as model fields based on past business experience. Both approaches are costly, the collected information is rarely complete, and the risk of having to rework the model later increases.
Disclosure of Invention
The embodiment of the disclosure provides a data table generation method and device.
In a first aspect, an embodiment of the present disclosure provides a data table generation method, including: determining, according to a business topic, at least one business data table related to the business topic and row record information of a target data table; counting the occurrence frequency of each field in each business data table to obtain at least one target field whose statistical value meets a preset condition, and generating dimension tables by taking each target field as a dimension of the target data table; selecting dimension attributes in each dimension table; and determining the attribute value of each dimension based on the row record information of the target data table and the attributes of each dimension, and generating a target data table comprising at least the attribute value of each dimension.
In some embodiments, counting the occurrence frequency of each field in each business data table to obtain at least one target field whose statistical value meets a preset condition includes: counting, by using the term frequency-inverse document frequency (TF-IDF) technique, the occurrence frequency of each field of each business data table in the script library corresponding to that business data table, to obtain at least one target field whose statistical value meets the preset condition, wherein the statistical value is the product of the term frequency of the field and the inverse document frequency of the same field.
In some embodiments, the preset condition is that the statistical value is greater than a threshold and/or that the statistical value ranks ahead of a preset position once all statistical values are sorted. Counting the occurrence frequency of each field in the script library by using the TF-IDF technique to obtain at least one target field whose statistical value meets the preset condition then includes: dividing the fields in the script library into classes according to a plurality of preset conditions, each class of fields corresponding to one preset condition; counting the occurrence frequency of each class of fields by using the TF-IDF technique to obtain, for each class, at least one sub-target field whose statistical value meets the corresponding preset condition; and merging the sub-target fields of all classes to obtain the target fields.
In some embodiments, before counting the occurrence frequency of each field in the script library by using the TF-IDF technique to obtain at least one target field whose statistical value meets a preset condition, the method further includes: marking stop words among the fields in the script library, so that every field marked as a stop word is skipped in the subsequent statistics over the fields in the script library.
In some embodiments, the preset conditions are set based on the types of field information, and each preset condition corresponds to one type of field information.
In some embodiments, before selecting the dimension attributes in each dimension table, the method further includes: determining, for each dimension of the target data table, the associated fields related to that dimension; and taking the associated fields as dimensions of the target data table and merging them with the existing dimensions of the target data table to generate the final dimensions of the target data table.
In some embodiments, determining the row record information of the target data table includes: analyzing the business data tables, and selecting the information that represents the business topic at the finest level of detail as the row record information of the target data table.
In some embodiments, after determining the row record information of the target data table, the method further includes: taking the row record information of the target data table as the primary key of the target data table.
In some embodiments, the method further includes: displaying the target data table.
In a second aspect, an embodiment of the present disclosure provides a data table generation apparatus, including: a first determining unit configured to determine, according to a business topic, at least one business data table related to the business topic and row record information of a target data table; a statistical unit configured to count the occurrence frequency of each field in each business data table to obtain at least one target field whose statistical value meets a preset condition, and to generate dimension tables by taking each target field as a dimension of the target data table; a selecting unit configured to select dimension attributes in each dimension table; and a first generation unit configured to determine the attribute value of each dimension based on the row record information of the target data table and the attributes of each dimension, and to generate a target data table comprising at least the attribute value of each dimension.
In some embodiments, the statistical unit is further configured to count, by using the term frequency-inverse document frequency (TF-IDF) technique, the occurrence frequency of each field of each business data table in the script library corresponding to that business data table, to obtain at least one target field whose statistical value meets the preset condition, wherein the statistical value is the product of the term frequency of the field and the inverse document frequency of the same field.
In some embodiments, the apparatus further includes: a marking unit configured to mark stop words among the fields in the script library, so that every field marked as a stop word is skipped in the subsequent statistics over the fields in the script library.
In some embodiments, the preset condition in the statistical unit is that the statistical value is greater than a threshold and/or that the statistical value ranks ahead of a preset position once all statistical values are sorted. The statistical unit includes: a field dividing module configured to divide the fields in the script library into classes according to a plurality of preset conditions, each class of fields corresponding to one preset condition; a frequency counting module configured to count the occurrence frequency of each class of fields by using the TF-IDF technique to obtain, for each class, at least one sub-target field whose statistical value meets the corresponding preset condition; and a merging module configured to merge the sub-target fields of all classes to obtain the target fields.
In some embodiments, the preset conditions in the statistical unit are set based on the types of field information, and each preset condition in the statistical unit corresponds to one type of field information.
In some embodiments, the apparatus further includes: a second determining unit configured to determine, for each dimension of the target data table, the associated fields related to that dimension; and a second generation unit configured to take the associated fields as dimensions of the target data table and merge them with the existing dimensions of the target data table to generate the final dimensions of the target data table.
In some embodiments, the first determining unit is further configured to analyze the business data tables and select the information that represents the business topic at the finest level of detail as the row record information of the target data table.
In some embodiments, the apparatus further includes: a setting unit configured to take the row record information of the target data table as the primary key of the target data table.
In some embodiments, the apparatus further includes: a display unit configured to display the target data table.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; a storage device having one or more programs stored thereon, which when executed by one or more processors, cause the one or more processors to implement the method as described in any of the implementations of the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method as described in any implementation manner of the first aspect.
The data table generation method and device provided by the embodiments of the disclosure determine, according to a business topic, at least one business data table related to the business topic and row record information of a target data table; count the occurrence frequency of each field in each business data table to obtain at least one target field whose statistical value meets a preset condition; generate dimension tables by taking each target field as a dimension of the target data table; select dimension attributes in each dimension table; determine the attribute value of each dimension based on the row record information of the target data table and the attributes of each dimension; and generate a target data table comprising at least the attribute value of each dimension. This makes data table generation more comprehensive and efficient.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which some embodiments of the present disclosure may be applied;
FIG. 2 is a flow diagram for one embodiment of a data table generation method according to the present disclosure;
FIG. 3 is a schematic diagram of an application scenario of a data table generation method according to the present disclosure;
FIG. 4 is a flow diagram of yet another embodiment of a data table generation method according to the present disclosure;
FIG. 5 is a schematic block diagram illustrating one embodiment of a data table generation apparatus according to the present disclosure;
FIG. 6 is a schematic structural diagram of an electronic device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 of a data table generation method or data table generation apparatus to which embodiments of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to send transmission requests or to receive transmission data or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a shopping application, a pick-up application, a web browser application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices that have a display screen and support information browsing, including but not limited to smart phones, tablet computers, e-book readers, laptop computers, desktop computers, and the like. The terminal devices 101, 102, 103 may interact with a server via the network 104 to obtain information and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the electronic devices listed above. They may be implemented, for example, as multiple pieces of software or software modules that provide distributed services, or as a single piece of software or software module. This is not specifically limited here.
The server 105 may be an application server that provides various services, such as an application server that supports the transmission requests of the terminal devices 101, 102, 103. The application server may analyze and otherwise process received data such as the transmission request, and feed back the processing result (e.g., a target data table) to the terminal device.
The server 105 may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. This is not specifically limited here.
It should be noted that the operations performed by the server 105 may also be performed by other electronic devices.
It should be noted that the data table generating method provided by the embodiment of the present disclosure is generally executed by the server 105, and the corresponding data table generating device is generally disposed in the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a data table generation method according to the present disclosure is shown. The data table generation method comprises the following steps:
Step 201, determining at least one business data table associated with the business topic and the row record information of the target data table according to the business topic.
In this embodiment, when an execution body (for example, the server shown in fig. 1) receives a data table generation request or starts a data table generation operation, it may first select a business topic, that is, the specific content that the target data table is to reflect. It then investigates the data sources of the operational data store (ODS) layer according to the selected business topic, so as to determine at least one business data table associated with the business topic and the row record information of the target data table associated with the business topic, that is, to determine the granularity of the table. The business topic may be at least one of a business event, a state of a business event, and a business process comprising a plurality of business events, such as a medium and small waybill, a successful-transaction business process, an order-placement process, and the like. The number of business data tables depends on the business topic. Generally, all business data tables related to the business topic need to be determined so that the generated target data table is more comprehensive and effective. For example, if the business topic is a medium and small waybill, the determined business data tables of the waybill system comprise a waybill main table and a plurality of waybill extension tables. Whether all or only some of the business data tables related to the business topic are determined is not specifically limited here.
Granularity is the smallest unit represented by one row of data. It expresses the level of business detail conveyed by one record in a table (such as a fact table) and determines how far the dimensions of the data table can be extended. The granularity of a table must be determined before its dimensions and facts, and every dimension and fact of the table must be consistent with the defined granularity. Granularity is usually expressed by a business description; for example, the row record information of the target data table may be a waybill number, an order number, and the like. Specifying the granularity of the table in advance ensures that the meaning of a row in the fact table is unambiguous and that all facts are recorded at the same level of detail.
It should be noted that there cannot be multiple different granularities in the same fact table, and the granularity of all facts in a fact table needs to be consistent with the granularity stated in the table.
Step 202, counting the occurrence frequency of each field in each service data table to obtain at least one target field of which the statistical value meets the preset condition, and generating each dimension table by taking each target field as each dimension of the target data table.
In this embodiment, the execution body may count the occurrence frequency of each field in each business data table determined in step 201 to obtain a statistical value for each field, determine whether each statistical value meets a preset condition, take every field whose statistical value meets the preset condition as a target field to obtain at least one target field, and generate dimension tables by taking each target field as a dimension of the target data table. The target data table reflects the business topic. The preset condition is set in advance so that the selected dimension information clearly describes the context in which the business process takes place. A field that appears more than once among the target fields is used as only one dimension of the target data table, which avoids dimension redundancy in the target data table caused by field redundancy.
As an example, when the business topic is a medium and small waybill, the target fields obtained after counting may include a waybill number, a waybill state, a waybill type, a waybill identifier, a warehouse, a distribution center, a distribution mode, an address, a sorting site, a waybill weight, a waybill volume, and the like. The identified high-frequency fields are consolidated and integrated into the model as dimension information.
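The frequency-based selection of step 202 can be sketched as follows. This is a minimal illustration and not the patented implementation: it assumes that "occurrence frequency" means the number of business data tables in which a field appears and that the preset condition is a simple minimum count. The function name `select_target_fields`, the parameter `min_count`, and the sample table and field names are all hypothetical.

```python
from collections import Counter


def select_target_fields(tables, min_count):
    """Return the fields that occur in at least `min_count` tables.

    `tables` maps a (hypothetical) business data table name to its list of
    field names. Each field is counted once per table.
    """
    counts = Counter()
    for fields in tables.values():
        counts.update(set(fields))  # set(): count a field once per table
    # Counter keys are unique, so a repeated field yields a single dimension.
    return sorted(f for f, c in counts.items() if c >= min_count)


# Hypothetical waybill tables, loosely following the example in the text.
tables = {
    "waybill_main": ["waybill_no", "waybill_status", "warehouse", "remark"],
    "waybill_ext_1": ["waybill_no", "waybill_status", "weight"],
    "waybill_ext_2": ["waybill_no", "warehouse", "volume"],
}
target_fields = select_target_fields(tables, min_count=2)
print(target_fields)
```

With these sample tables, only the fields shared by at least two tables are kept as candidate dimensions; fields that occur in a single table are dropped.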
Step 203, selecting dimension attributes in each dimension table.
In this embodiment, the execution body may, according to the dimensions of the target data table determined in step 202 and a preset data selection principle, select the names of dimension members from the corresponding dimension tables as the dimension attributes. The dimension attributes may include some or all of the attributes related to the business topic, and the attributes of each dimension correspond to the row record information of the target data table. The attributes of each dimension can be regarded as the columns of the target data table. For a fact table, the attributes of each dimension are equivalent to the facts of the fact table. As an example, when the business topic is an order-placement business process, the selected dimension attributes include a product ID, a product price, and a purchase amount.
Step 204, determining the attribute value of each dimension based on the row record information of the target data table and the attribute of each dimension, and generating the target data table at least comprising the attribute value of each dimension.
In this embodiment, the execution body may query the database corresponding to the business topic based on the row record information of the target data table and the attributes of each dimension, determine the attribute value of each dimension, and generate a target data table comprising at least the attribute value of each dimension. Alternatively, the target data table may be generated by using the row record information of the target data table as the primary key and the attributes of each dimension as the columns, so that it contains the attribute value of each dimension.
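Step 204 can be pictured with the following minimal sketch, which keys the target table by the row record information and uses the selected dimension attributes as columns. The in-memory dictionary `source` stands in for the database query described above, and `build_target_table`, `lookup`, and all sample keys and values are hypothetical names chosen for illustration.

```python
def build_target_table(row_keys, dimension_attributes, lookup):
    """Build a target table keyed by row record information.

    `lookup(key, attribute)` is a placeholder for the database query that
    returns the attribute value of one dimension for one row key.
    """
    table = {}
    for key in row_keys:
        # One row per key; one column per selected dimension attribute.
        table[key] = {attr: lookup(key, attr) for attr in dimension_attributes}
    return table


# Hypothetical source data standing in for the business database.
source = {
    "WB001": {"waybill_status": "delivered", "warehouse": "BJ-01", "weight": 2.5},
    "WB002": {"waybill_status": "in_transit", "warehouse": "SH-02", "weight": 1.0},
}
target = build_target_table(
    row_keys=source.keys(),
    dimension_attributes=["waybill_status", "warehouse"],
    lookup=lambda key, attr: source[key].get(attr),
)
print(target)
```

Because the row key doubles as the primary key, every row of the sketch's output sits at the same granularity (one waybill per row), and columns that were not selected as dimension attributes, such as `weight`, are left out.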
With continued reference to fig. 3, fig. 3 is a schematic diagram 300 of an application scenario of the data table generation method according to the present embodiment. The data table generating method of the present embodiment operates in the electronic device 301. Firstly, the electronic device 301 determines at least one service data table related to a service theme and row record information 302 of a target data table according to the service theme, then the electronic device 301 counts occurrence frequency of each field in each service data table to obtain at least one target field of which a statistical value meets a preset condition, each target field is used as each dimension of the target data table to generate each dimension table 303, then the electronic device 301 selects a dimension attribute 304 in each dimension table, and finally the electronic device 301 determines an attribute value of each dimension based on the row record information of the target data table and the attribute of each dimension to generate the target data table 305 at least comprising the attribute value of each dimension.
The data table generating method provided by the embodiment of the disclosure determines at least one service data table related to a service theme and row record information of a target data table according to the service theme, counts occurrence frequency of each field in each service data table to obtain at least one target field of which a statistical value meets a preset condition, generates each dimension table by taking each target field as each dimension of the target data table, selects a dimension attribute in each dimension table, determines an attribute value of each dimension based on the row record information of the target data table and the attribute of each dimension, and generates the target data table at least comprising the attribute value of each dimension. A more comprehensive and efficient data table generation method is realized.
With further reference to FIG. 4, a flow diagram of yet another embodiment of a data table generation method is shown. The flow 400 of the data table generating method includes the following steps:
Step 401, determining at least one business data table associated with the business topic and row record information of the target data table according to the business topic.
Step 402, counting the occurrence frequency of each field in the script library corresponding to each business data table by using a word frequency-inverse document frequency technology to obtain at least one target field of which the statistical value meets a preset condition, and generating each dimension table by taking each target field as each dimension of the target data table.
In this embodiment, the execution body uses the term frequency-inverse document frequency (TF-IDF) technique to count the occurrence frequency of each field in the script library corresponding to each business data table, obtains at least one target field whose statistical value meets a preset condition, and generates dimension tables by taking each target field as a dimension of the target data table. The statistical value is the product of the term frequency (TF) of a field and the inverse document frequency (IDF) of the same field. The preset condition may be that the statistical value is greater than a threshold and/or that the statistical value ranks ahead of a preset position once all statistical values are sorted. The script library may include a historical query script library, a real-time task script library, and the like.
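A minimal sketch of the TF-IDF statistic and of the two forms of the preset condition is given below. The text does not fix the exact formulas, so the following choices are assumptions: each script in the library is treated as one document, TF is the share of all tokens in the library that are the given field, and IDF uses the smoothed form log((1 + N) / (1 + df)) + 1. The names `tfidf_scores`, `select_fields`, `threshold`, and `top_n` are hypothetical.

```python
import math
from collections import Counter


def tfidf_scores(scripts):
    """Compute one TF-IDF statistic per field over a script library.

    `scripts` is a list of token lists, one list per script (document).
    """
    n_docs = len(scripts)
    term_counts = Counter()  # total occurrences of each field
    doc_counts = Counter()   # number of scripts containing each field
    total_tokens = 0
    for tokens in scripts:
        term_counts.update(tokens)
        doc_counts.update(set(tokens))
        total_tokens += len(tokens)
    scores = {}
    for field, count in term_counts.items():
        tf = count / total_tokens
        idf = math.log((1 + n_docs) / (1 + doc_counts[field])) + 1  # smoothed
        scores[field] = tf * idf  # statistic = TF * IDF
    return scores


def select_fields(scores, threshold=None, top_n=None):
    """Keep fields above `threshold` and/or within the first `top_n` ranks."""
    ranked = sorted(scores, key=lambda f: (-scores[f], f))
    if threshold is not None:
        ranked = [f for f in ranked if scores[f] > threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return ranked


# Hypothetical tokenized scripts.
scripts = [
    ["waybill_no", "warehouse", "waybill_no"],
    ["waybill_no", "weight"],
    ["warehouse", "waybill_no"],
]
scores = tfidf_scores(scripts)
print(select_fields(scores, top_n=2))
```

Passing both `threshold` and `top_n` applies the two conditions together; passing only one of them corresponds to the "or" case of the preset condition.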
Specifically, the statistical process for the target fields includes: dividing the fields in the script library into classes according to a plurality of preset conditions, each class of fields corresponding to one preset condition; counting the occurrence frequency of each class of fields by using the TF-IDF technique to obtain, for each class, at least one sub-target field whose statistical value meets the corresponding preset condition; and merging the sub-target fields of all classes to obtain the target fields.
In some optional implementations of this embodiment, the preset conditions are set based on the types of field information, and each preset condition corresponds to one type of field information. The types of field information may include four classes: basic information, logistics information, person information, and other. Setting a different preset condition for each type of field information gives each class its own standard for high and low frequency. For example, basic information fields occur more frequently, so a higher frequency screening standard is set for them, which makes the statistics of target fields more targeted.
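The divide, screen, and merge procedure described above can be sketched as follows. For brevity the sketch assumes that the per-class statistical values have already been computed and that each preset condition is a per-class threshold. The function `select_by_category`, the class labels, and all numbers are hypothetical.

```python
def select_by_category(scores, category_of, thresholds):
    """Screen fields class by class and merge the surviving sub-target fields.

    scores:      field -> statistical value (assumed computed within its class)
    category_of: field -> class of field information
    thresholds:  class -> threshold acting as that class's preset condition
    """
    # 1. Divide the fields into classes.
    grouped = {}
    for field, score in scores.items():
        grouped.setdefault(category_of[field], {})[field] = score
    # 2. Apply each class's own preset condition to get sub-target fields.
    targets = set()
    for category, group in grouped.items():
        limit = thresholds[category]
        targets.update(f for f, s in group.items() if s > limit)
    # 3. Merge the sub-target fields of all classes.
    return sorted(targets)


# Hypothetical statistics, classes and thresholds.
scores = {
    "waybill_no": 0.57, "waybill_status": 0.30,
    "warehouse": 0.20, "sorting_site": 0.05,
    "courier_name": 0.08,
}
category_of = {
    "waybill_no": "basic", "waybill_status": "basic",
    "warehouse": "logistics", "sorting_site": "logistics",
    "courier_name": "person",
}
thresholds = {"basic": 0.4, "logistics": 0.1, "person": 0.05}
merged = select_by_category(scores, category_of, thresholds)
print(merged)
```

In this example the basic information class has the highest threshold, so `waybill_status` is screened out even though its value exceeds that of `warehouse`, which passes the lower logistics threshold.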
In some optional implementations of this embodiment, before counting the occurrence frequency of each field of each business data table in the script library using the TF-IDF technique to obtain at least one target field whose statistical value satisfies the preset condition, the method further includes: marking stop words among the fields in the script library, so that each field marked as a stop word is omitted from the subsequent statistics on the fields in the script library. In Chinese text mining, stop words are words that occur very frequently but do not help in finding results and must therefore be filtered out, such as the very common words 'yes' and 'in'. Here, query keywords such as SELECT, WHERE, and FROM in query scripts may be set as stop words, and scripting language keywords such as new, class, public, void, and return in the real-time task script library may be set as stop words.
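Stop-word marking can be sketched as a filter applied while tokenizing each script. The keyword sets below contain only the examples named in the text; the function name `tokenize_without_stop_words` and the identifier regular expression are assumptions made for illustration.

```python
import re

# Stop-word sets built from the examples given in the text.
SQL_STOP_WORDS = {"select", "where", "from"}
CODE_STOP_WORDS = {"new", "class", "public", "void", "return"}


def tokenize_without_stop_words(script, stop_words):
    """Split a script into identifier tokens and drop those marked as stop words."""
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", script)
    # Compare case-insensitively so SELECT and select are both filtered.
    return [t for t in tokens if t.lower() not in stop_words]
```

Only the surviving tokens would then be passed to the frequency statistics, so language keywords cannot crowd out real field names.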
Step 403, selecting dimension attributes in each dimension table.
In some optional implementations of this embodiment, before selecting the dimension attributes in each dimension table, the method further includes: determining, for each dimension of the target data table, the associated fields related to that dimension; and taking the associated fields as dimensions of the target data table and merging them with the existing dimensions of the target data table to generate the final dimensions of the target data table. This further refines the dimension construction of the original data table. For example, dimensions such as buyer and seller star level, shop name, and category level are all associated with the fact table, which makes the fact table more complete and improves the efficiency of filtering, querying, and aggregating over it.
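A minimal Python sketch of merging associated fields into the existing dimensions is given below. The mapping from a dimension to its associated fields is assumed to be supplied by the caller; the text does not specify how associations are discovered, and the name `merge_associated_dimensions` is hypothetical.

```python
def merge_associated_dimensions(dimensions, associations):
    """Return the final dimension list: existing dimensions plus associated fields.

    dimensions:   ordered list of existing dimensions of the target data table
    associations: {dimension: [associated field, ...]} (assumed to be given)
    """
    final = list(dimensions)
    for dim in dimensions:
        for assoc in associations.get(dim, []):
            # Skip fields that are already dimensions to avoid duplicates.
            if assoc not in final:
                final.append(assoc)
    return final
```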
Step 404, determining an attribute value of each dimension based on the row record information of the target data table and the attribute of each dimension, and generating the target data table at least comprising the attribute value of each dimension.
In some optional implementations of this embodiment, determining the row record information of the target data table includes: analyzing the business data table and selecting the information that represents the business theme at the finest level of detail as the row record information of the target data table. Selecting the finest granularity gives the table the greatest flexibility in application.
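Step 404 can be illustrated by the Python sketch below, which fills in the selected dimension attributes for each row record at the finest granularity. The in-memory data layout (lists and dictionaries), the function name `build_target_table`, and the example field names are assumptions made for the example; an actual system would more likely express this as joins in a data warehouse.

```python
def build_target_table(row_records, dimension_tables, selected_attributes):
    """Attach selected dimension attribute values to every row record.

    row_records:         list of dicts at the finest granularity, holding dimension keys
    dimension_tables:    {dimension key field: {key value: {attribute: value}}}
    selected_attributes: {dimension key field: [attribute, ...]}
    """
    table = []
    for record in row_records:
        row = dict(record)  # copy so the input records are not modified
        for dim, attrs in selected_attributes.items():
            # Look up this record's entry in the dimension table, if any.
            dim_row = dimension_tables.get(dim, {}).get(record.get(dim), {})
            for attr in attrs:
                row[attr] = dim_row.get(attr)  # None when the lookup fails
        table.append(row)
    return table
```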
In some optional implementations of this embodiment, the method further includes: displaying the target data table so that the data table generation result can be further confirmed and applied.
It should be noted that the TF-IDF algorithm is a well-known technology widely studied and applied at present, and is not described herein again.
In this embodiment, the specific operations of steps 401, 403, and 404 are substantially the same as the operations of steps 201, 203, and 204 in the embodiment shown in fig. 2, and are not described again here.
As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the process 400 for generating a data table in this embodiment uses the TF-IDF technique to count the occurrence frequency of each field in the script library corresponding to each business data table, obtains at least one target field whose statistical value satisfies the preset condition, and generates the dimension tables by taking each target field as a dimension of the target data table. This avoids the field redundancy that arises when data table dimensions are confirmed in the prior art, saves computing resources, and addresses the high cost and incomplete field collection associated with gathering customer requirements. Using the TF-IDF algorithm to distinguish high-frequency from low-frequency use of each field in the data source allows model fields to be selected automatically, improves the coverage rate of the AP side of the model, and lowers both the cost of collecting customer requirements and the business experience threshold for developers. Setting different preset conditions lets the high-frequency fields of each class serve as dimensions of the table, so that the model covers as many frequently used fields as possible and the data model can be constructed more flexibly and in a more targeted way.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to limit the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure. Depending on the context, the word "if" as used herein may be interpreted as "at the time of", "when", or "in response to a determination".
With further reference to fig. 5, as an implementation of the method shown in fig. 2 to fig. 4, the present disclosure provides an embodiment of a data table generating apparatus, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be specifically applied to various electronic devices.
As shown in fig. 5, the data table generating apparatus 500 of the present embodiment includes: a first determination unit 501, a statistical unit 502, a selection unit 503, and a first generation unit 504. The first determining unit is configured to determine at least one business data table related to a business theme and row record information of a target data table according to the business theme; the statistical unit is configured to perform statistics on the occurrence frequency of each field in each service data table to obtain at least one target field of which the statistical value meets a preset condition, and generate each dimension table by taking each target field as the dimension of the target data table; the selecting unit is configured to select the dimension attributes in the dimension tables; and the first generation unit is configured to determine the attribute value of each dimension based on the row record information of the target data table and the attribute of each dimension, and generate the target data table at least comprising the attribute value of each dimension.
In this embodiment, for the specific processing of the first determining unit 501, the statistical unit 502, the selecting unit 503, and the first generating unit 504 of the data table generating apparatus 500 and the technical effects thereof, reference may be made to the related descriptions of step 201 to step 204 in the embodiment corresponding to fig. 2, which are not repeated here.
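The way the four units cooperate can be sketched as a simple pipeline in Python. This is a structural illustration only: the class name `DataTableGenerator`, the callable signatures, and the data passed between the units are assumptions, and the behavior of each unit is supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DataTableGenerator:
    """Hypothetical composition of the four units of apparatus 500."""

    determine: Callable  # first determining unit 501
    count: Callable      # statistical unit 502
    select: Callable     # selecting unit 503
    generate: Callable   # first generating unit 504

    def run(self, business_theme):
        # 501: business data tables and row record information for the theme.
        business_tables, row_records = self.determine(business_theme)
        # 502: target fields become dimensions, yielding the dimension tables.
        dimension_tables = self.count(business_tables)
        # 503: choose the dimension attributes in each dimension table.
        attributes = self.select(dimension_tables)
        # 504: produce the target data table with the attribute values.
        return self.generate(row_records, attributes)
```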
In some optional implementations of this embodiment, the statistical unit is further configured to use the TF-IDF technique to count the occurrence frequency of each field of each business data table in the script library corresponding to that business data table, so as to obtain at least one target field whose statistical value satisfies a preset condition, where the statistical value represents the product of the term frequency of a field and the inverse document frequency of that field.
In some optional implementations of this embodiment, the apparatus further includes: a marking unit configured to mark stop words among the fields in the script library, so that each field marked as a stop word is omitted from the subsequent statistics on the fields in the script library.
In some optional implementations of this embodiment, the preset condition in the statistical unit is that the statistical value is greater than a threshold and/or that the statistical value ranks before a preset sequence number after all statistical values are sorted. The statistical unit includes: a field dividing module configured to divide the fields in the script library according to each of a plurality of preset conditions to obtain the class of fields corresponding to each preset condition; a frequency counting module configured to count the occurrence frequency of each class of fields using the TF-IDF technique to obtain, for each class, at least one sub-target field whose statistical value satisfies the corresponding preset condition; and a merging module configured to merge the sub-target fields of all classes to obtain the target fields.
In some optional implementations of this embodiment, each preset condition in the statistical unit is set based on a type of field information, and each preset condition in the statistical unit corresponds to one type of field information.
In some optional implementations of this embodiment, the apparatus further includes: a second determining unit configured to determine, for each dimension of the target data table, the associated fields related to that dimension; and a second generating unit configured to take the associated fields as dimensions of the target data table and merge them with the existing dimensions of the target data table to generate the final dimensions of the target data table.
In some optional implementations of this embodiment, the first determining unit is further configured to analyze the business data table and select the information that represents the business theme at the finest level of detail as the row record information of the target data table.
In some optional implementations of this embodiment, the apparatus further includes: a setting unit configured to take the row record information of the target data table as the primary key of the target data table.
In some optional implementations of this embodiment, the apparatus further includes: a display unit configured to display the target data table.
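Using the row record information as the primary key, as the setting unit above does, implies that no two rows of the target data table share the same row record values. A small Python sketch of such a check follows; the function name `index_by_primary_key` and the choice to raise an error on duplicates are assumptions, not behavior stated in the text.

```python
def index_by_primary_key(rows, key_fields):
    """Index rows by their primary key and reject duplicate keys.

    rows:       list of dicts representing rows of the target data table
    key_fields: names of the row record fields that form the primary key
    """
    indexed = {}
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        if key in indexed:
            # A repeated key means the chosen granularity is not unique.
            raise ValueError(f"duplicate primary key: {key}")
        indexed[key] = row
    return indexed
```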
Referring now to FIG. 6, shown is a schematic diagram of an electronic device 600 suitable for use in implementing embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a fixed terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, the electronic device 600 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage device 608 into a Random Access Memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the electronic device 600. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication device 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various devices, it is to be understood that not all illustrated devices are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 6 may represent one device or may represent multiple devices as desired.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 609, or may be installed from the storage device 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of embodiments of the present disclosure.
It should be noted that the computer readable medium of the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: determining at least one business data table related to the business theme and row record information of a target data table according to the business theme; counting the occurrence frequency of each field in each business data table to obtain at least one target field of which the statistical value meets a preset condition, and generating each dimension table by taking each target field as each dimension of the target data table; selecting dimension attributes in each dimension table; and determining the attribute value of each dimension based on the row record information of the target data table and the attribute of each dimension, and generating the target data table at least comprising the attribute value of each dimension.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as: a processor including a first determining unit, a statistical unit, a selecting unit, and a first generating unit. In some cases the names of these units do not limit the units themselves; for example, the first determining unit may also be described as a "unit that determines, according to a business theme, at least one business data table related to the business theme and row record information of a target data table".
The foregoing description covers only preferred embodiments of the disclosure and the technical principles employed. Those skilled in the art will appreciate that the scope of the invention in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept described above, for example, technical solutions formed by replacing the above features with technical features of similar function disclosed in (but not limited to) the embodiments of the present disclosure.

Claims (12)

1. A data table generation method, comprising:
determining at least one business data table related to the business theme and row record information of a target data table according to the business theme;
counting the occurrence frequency of each field in each business data table to obtain at least one target field of which the statistical value meets a preset condition, and generating each dimension table by taking each target field as each dimension of the target data table;
selecting dimension attributes in each dimension table;
and determining the attribute value of each dimension based on the row record information of the target data table and the attribute of each dimension, and generating the target data table at least comprising the attribute value of each dimension.
2. The method according to claim 1, wherein the counting the occurrence frequency of each field in each business data table to obtain at least one target field whose statistical value satisfies a preset condition comprises:
counting, by using a term frequency-inverse document frequency technique, the occurrence frequency of each field in the script library corresponding to each business data table to obtain at least one target field whose statistical value satisfies the preset condition, wherein the statistical value represents the product of the term frequency of a field and the inverse document frequency of that field.
3. The method according to claim 2, wherein the preset condition is that the statistical value is greater than a threshold and/or that the statistical value ranks before a preset sequence number after all statistical values are sorted; and the counting, by using the term frequency-inverse document frequency technique, of the occurrence frequency of each field in the script library corresponding to each business data table to obtain at least one target field whose statistical value satisfies the preset condition comprises:
dividing the fields in the script library according to each of a plurality of preset conditions to obtain the class of fields corresponding to each preset condition;
counting the occurrence frequency of each class of fields by using the term frequency-inverse document frequency technique to obtain, for each class of fields, at least one sub-target field whose statistical value satisfies the corresponding preset condition; and
merging the sub-target fields of the classes of fields to obtain the target fields.
4. The method according to claim 3, wherein the respective preset conditions are set based on respective types of field information; each preset condition corresponds to a type of the field information.
5. The method according to claim 1, wherein before said selecting the dimension attributes in each dimension table, further comprising:
determining an associated field related to each dimension according to each dimension of the target data table;
and taking the associated fields as dimensions of the target data table, and combining the associated fields with the dimensions of the existing target data table to generate final dimensions of the target data table.
6. A data table generation apparatus, comprising:
a first determining unit configured to determine, according to a business theme, at least one business data table related to the business theme and row record information of a target data table;
a statistical unit configured to count the occurrence frequency of each field in each business data table to obtain at least one target field whose statistical value satisfies a preset condition, and to generate each dimension table by taking each target field as a dimension of the target data table;
a selecting unit configured to select dimension attributes in each dimension table; and
a first generating unit configured to determine an attribute value of each dimension based on the row record information of the target data table and the attribute of each dimension, and generate the target data table including at least the attribute value of each dimension.
7. The apparatus of claim 6, wherein the statistical unit is further configured to count, by using a term frequency-inverse document frequency technique, the occurrence frequency of each field in the script library corresponding to each business data table, so as to obtain at least one target field whose statistical value satisfies a preset condition, wherein the statistical value represents the product of the term frequency of a field and the inverse document frequency of that field.
8. The apparatus according to claim 7, wherein the preset condition in the statistical unit is that the statistical value is greater than a threshold and/or that the statistical value ranks before a preset sequence number after all statistical values are sorted; and the statistical unit comprises:
a field dividing module configured to divide the fields in the script library according to each of a plurality of preset conditions to obtain the class of fields corresponding to each preset condition;
a frequency counting module configured to count the occurrence frequency of each class of fields by using the term frequency-inverse document frequency technique to obtain, for each class of fields, at least one sub-target field whose statistical value satisfies the corresponding preset condition; and
a merging module configured to merge the sub-target fields of the classes of fields to obtain the target fields.
9. The apparatus of claim 8, wherein the respective preset conditions in the statistical unit are set based on respective types of field information; each preset condition in the statistical unit corresponds to one type of the field information.
10. The apparatus of claim 6, further comprising:
a second determining unit configured to determine, for each dimension of the target data table, an associated field related to that dimension; and
a second generating unit configured to take the associated fields as dimensions of the target data table, merge the associated fields with the existing dimensions of the target data table, and generate final dimensions of the target data table.
11. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.
12. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-5.
CN202210293153.4A 2022-03-23 2022-03-23 Data table generation method and device Pending CN114691682A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210293153.4A CN114691682A (en) 2022-03-23 2022-03-23 Data table generation method and device

Publications (1)

Publication Number Publication Date
CN114691682A true CN114691682A (en) 2022-07-01

Family

ID=82140087

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210293153.4A Pending CN114691682A (en) 2022-03-23 2022-03-23 Data table generation method and device

Country Status (1)

Country Link
CN (1) CN114691682A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination