CN107229718B - Method and device for processing report data - Google Patents

Info

Publication number
CN107229718B
Authority
CN
China
Prior art keywords
data
key
row
columns
report
Prior art date
Legal status
Active
Application number
CN201710398736.2A
Other languages
Chinese (zh)
Other versions
CN107229718A (en)
Inventor
李铭浩
朱锟炜
沈俊杰
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201710398736.2A
Publication of CN107229718A
Application granted
Publication of CN107229718B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G06F16/244 Grouping and aggregation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/24554 Unary operations; Data partitioning operations
    • G06F16/24556 Aggregation; Duplicate elimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention provide a method and a device for processing report data, which help achieve higher processing speed while supporting multiple data query modes. The method comprises: marking the type of each of a plurality of data columns selected from a data source, the types being key name column and key value column; saving a current report and a row index dictionary; and, for the key name columns of the current row of data acquired from the data source, calculating a corresponding value using a preset calculation mode, determining whether the corresponding value is an existing key name in the row index dictionary, and, if it is, merging the current row of data into the row of the current report whose serial number is the key value of that existing key name, using the aggregation function corresponding to each key value column of the current report.

Description

Method and device for processing report data
Technical Field
The invention relates to the technical field of computers and software, in particular to a method and a device for processing report data.
Background
In the digital age, more and more information is presented in the form of data reports, such as product information or advertising performance data. Taking advertising as an example, advertisers frequently check the performance of their advertisements after placement, and this performance is generally presented to them as reports. In big data query scenarios, a data structure for storing and computing reports must be designed to support latency-sensitive queries and to scale well, so that the original computation speed is maintained as data volume and total query volume grow. Related existing technologies include RDBMS views and the HBase storage structure.
An RDBMS view is a view in a relational database; at present, the business layer mainly uses MySQL. When a query is executed very frequently and its result is used as a subquery of another query, a view can serve as a temporary virtual table for that data. The data model of a view is the same as that of the underlying database, and the view uses the same storage structure.
Because a view uses the same data structure as the database, the limitations imposed by the database remain. When the result set is large and must be computed over, searched, or sorted, queries are still inefficient without an index, and index creation is highly scenario-dependent and inflexible.
HBase is a distributed column-oriented database that provides reliable storage and high-performance computation, and it handles operations on massive data more promptly than RDBMS databases. HBase uses a column storage model: each row has a Rowkey, and data is stored sorted by Rowkey, so that data can be located quickly during queries. The HBase storage model also supports adding columns dynamically, and an empty column value occupies no storage space, which makes it more flexible and avoids the RDBMS need to modify the storage structure. However, HBase has an obvious drawback in report system scenarios: its performance on conditional queries is poor, while report systems rely heavily on conditional queries.
At present, many systems obtain report data by querying the database directly with a single type of database statement, such as SQL, and the database is typically MySQL. The corresponding solutions therefore improve query speed mainly by optimizing SQL statements, tuning the MySQL configuration, upgrading hardware, and splitting databases and tables, and each solution must be designed for the scenario of each report.
The prior art mainly has the following defects:
1. Queries that use a single type of database statement depend entirely on the storage structure, so query performance is low, and millisecond-level response times cannot be achieved when tens of millions of records are processed.
2. If the database changes, all optimization schemes must be redesigned, and a storage format must be designed for each scenario, so compatibility and extensibility are poor.
3. Optimization can only target the database currently in use; if the database version is updated, the design must be reconciled with the new version, causing unnecessary work.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and a device for processing report data, which help achieve higher processing speed while supporting multiple data query modes. Other benefits and advantages of the invention, as well as non-conventional alternative implementations, are described in the detailed description.
To achieve the above object, according to an aspect of the embodiments of the present invention, a method for processing report data is provided.
The method for processing report data in the embodiment of the invention comprises the following steps: marking the type of each of a plurality of data columns selected from a data source, the types of data columns including a key name column and a key value column; saving a current report and a row index dictionary, wherein the current report comprises a plurality of key name columns and a plurality of key value columns, and each entry of the row index dictionary corresponds to one row of the current report, its key name being the value calculated from that row's key names according to a preset calculation mode and its key value being that row's serial number; and calculating a corresponding value, using the preset calculation mode, for the key name columns of the current row of data acquired from the data source, determining whether the corresponding value is an existing key name in the row index dictionary, and, if so, merging the current row of data with the row of the current report whose serial number is the key value corresponding to that existing key name, using the aggregation function corresponding to each key value column of the current report.
Optionally, if it is determined that the corresponding value is not an existing key name in the row index dictionary, a new row is created in the current report, where a plurality of key name columns and a plurality of key value columns of the new row are respectively a plurality of key name columns and a plurality of key value columns in the current row of data.
Optionally, in a case that it is determined that the corresponding value is not an existing key name in the row index dictionary, the method further includes: adding a corresponding key name-key value pair to the row index dictionary, where the key name is the corresponding value and the key value is the serial number of the new row.
Optionally, the method further comprises: saving a character dictionary, wherein the key names in the character dictionary are character strings and the key values are numerical values of a preset length, each uniquely corresponding to a character string; and, if the current row of data contains such a character string, replacing the character string with its corresponding key value in the character dictionary.
Optionally, the preset calculation mode is hash calculation.
According to another aspect of the embodiment of the invention, a device for processing report data is provided.
The device for processing report data in the embodiment of the invention comprises: a configuration module, configured to mark the type of each of a plurality of data columns selected from a data source, the types of data columns including a key name column and a key value column; a report storage module, configured to save a current report comprising a plurality of key name columns and a plurality of key value columns; a row index dictionary module, configured to save a row index dictionary, in which each entry corresponds to one row of the current report, its key name being the value calculated from that row's key names according to a preset calculation mode and its key value being that row's serial number; and a table filling module, configured to calculate a corresponding value, using the preset calculation mode, for the key name columns of the current row of data acquired from the data source, and, if the corresponding value is an existing key name in the row index dictionary, merge the current row of data with the row of the current report whose serial number is the key value corresponding to that existing key name, using the aggregation function corresponding to each key value column of the current report.
Optionally, the table filling module is further configured to establish a new row in the current report if the corresponding value is not an existing key name in the row index dictionary, where a plurality of key name columns and a plurality of key value columns of the new row are respectively a plurality of key name columns and a plurality of key value columns in the current row of data.
Optionally, the table filling module is further configured to, in a case that it is determined that the corresponding value is not an existing key name in the row index dictionary, add a corresponding key name-key value pair to the row index dictionary, where the key name is the corresponding value and the key value is the serial number of the new row.
Optionally, the device further includes a character dictionary module, configured to save a character dictionary in which the key names are character strings and the key values are numerical values of a predetermined length, each uniquely corresponding to a character string; and the table filling module is further configured to replace a character string with its corresponding key value in the character dictionary if the current row of data contains that character string.
According to still another aspect of an embodiment of the present invention, there is provided an electronic apparatus including: one or more processors; a storage device, configured to store one or more programs, which when executed by the one or more processors, cause the one or more processors to implement the method according to the embodiment of the present invention.
According to a further aspect of embodiments of the present invention, there is provided a computer readable medium, on which a computer program is stored, which when executed by a processor, implements a method according to embodiments of the present invention.
According to the technical solution of the embodiments of the invention, the report table comprises a plurality of key name columns and a plurality of key value columns, which makes it suitable for multi-index data queries, and it aggregates data automatically: part of the computation is completed while large amounts of data that require centralized processing are written into the storage structure, which saves considerable time. Because report query scenarios involve a large number of aggregation operations, this design improves the query speed of report data. When queried data is written into the report table, the invention uses dynamic index calculation: indexes need not be designed in advance but are created automatically from the data columns to be queried, which improves query efficiency. As its structure shows, the report table is not a typical relational storage model and imposes no mandatory constraints between data columns, yet it supports some relational logic operations, such as left join and full join operations between report tables. These operations rely on the row design of the report table: because the combination of key name columns in each row is unique, tables can be associated with one another, and report tables can take part in further complex logic operations. The technical solution of the invention therefore helps achieve higher processing speed while supporting multiple data query modes.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. In the drawings:
FIG. 1 is a schematic diagram of the structure of a report table according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating a basic structure of a row index dictionary according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the main steps of processing report data according to an embodiment of the present invention;
FIG. 4 is a schematic illustration of row aggregation according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a basic structure of an apparatus for processing report data according to an embodiment of the present invention;
FIG. 6 illustrates an exemplary system architecture to which a method or apparatus for processing report data according to embodiments of the present invention may be applied;
FIG. 7 is a block diagram of a computer system suitable for use with a terminal device implementing an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the embodiments of the invention, considering the variety of data query conditions and contents, a new data table structure is provided: data to be queried is acquired from a data source and saved in a current report having this new structure. For convenience of description, a current report with this structure is referred to as a report table. The report table contains a plurality of key name columns and a plurality of key value columns, i.e., its data columns fall into two categories. Fig. 1 is a schematic diagram of the structure of a report table according to an embodiment of the present invention. In Fig. 1, the report table 10 includes a plurality of key name columns and a plurality of key value columns, and all key value columns are located to the right of all key name columns. Within a key name column, different rows may hold the same or different key names. The numbers of key name columns and key value columns depend on the actual data query needs.
The rows of the report table are numbered sequentially, from row 1 to row N, and the row index dictionary maintains the mapping between row indexes and row numbers. For example, for row 1 of the report table, the row index dictionary contains an entry whose key value is 1 (the row number) and whose key name is the hash value of all key name columns in row 1: if there are three key name columns key1, key2, and key3, the hash value Hash(key1, key2, key3) is the key name corresponding to key value 1. The structure of the row index dictionary may be as shown in Fig. 2, which is a schematic diagram of the basic structure of the row index dictionary according to an embodiment of the present invention. In Fig. 2, each row is a key name-key value pair: on the left is the key name, which is the hash value of the key name columns of a report table row, and on the right is the key value, which is that row's number in the report table. Fig. 2 uses a hash value as an example; other preset calculation methods may be used as long as they yield a unique value.
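As a minimal Python sketch of the row index dictionary described above (not the patented implementation), the preset calculation mode is assumed here to be SHA-256 over a JSON encoding of the key name columns; the helper name `row_index_key` and the sample rows are hypothetical. A cryptographic hash makes collisions extremely unlikely rather than impossible, so a system that depends strictly on the uniqueness requirement might also compare the stored key names on lookup.

```python
import hashlib
import json


def row_index_key(key_names):
    """Hypothetical stand-in for Hash(key1, key2, key3).

    Assumption: SHA-256 over the JSON-encoded key name values. JSON keeps
    ("a,b", "c") distinct from ("a", "b,c"), unlike naive concatenation.
    """
    encoded = json.dumps(list(key_names), ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


# Hypothetical report rows: three key name columns, then two key value columns.
report_rows = [
    ("2017-05", "Beverages", "Cola", 120, 360.0),
    ("2017-05", "Snacks", "Chips", 80, 240.0),
]
key_name_count = 3

# Row index dictionary: hash of the key name columns -> row number (1-based).
row_index = {
    row_index_key(row[:key_name_count]): number
    for number, row in enumerate(report_rows, start=1)
}

print(row_index[row_index_key(("2017-05", "Snacks", "Chips"))])  # 2
```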
Once some data has been queried and written into the report table, each newly queried row of data either requires a new row in the report table or must be merged with an existing row. This is described below with reference to the flow shown in Fig. 3, which is a schematic diagram of the main steps of processing report data according to an embodiment of the present invention.
Step S31: configure the data source. A data query usually has query conditions or query dimensions, and these determine the structure of the report table. For example, to look up the sales volume, sales amount, and current inventory of each item in each category over the past year of a supermarket, the following columns are needed: time | commodity category | commodity name | sales volume | sales amount | current inventory. The first three columns are dimensions and become key name columns of the report table; the last three columns are metrics and become key value columns. Depending on the query requirements, all or only some of the columns in the data source may be selected, and in this step each selected data column is marked as one of the two types: key name column or key value column.
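Step S31 can be sketched as a column configuration in which each selected column is marked as a key name column or a key value column, and each key value column carries the aggregation function used later when rows are merged. The class and function names are hypothetical, and keeping the latest figure for current inventory is an assumption; the text does not specify that column's aggregation function.

```python
import operator
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class ColumnType(Enum):
    KEY_NAME = "key name"    # dimension, e.g. time or commodity category
    KEY_VALUE = "key value"  # metric, merged by an aggregation function


@dataclass(frozen=True)
class ColumnSpec:
    name: str
    kind: ColumnType
    # (existing value, incoming value) -> merged value; key value columns only.
    aggregate: Optional[Callable[[float, float], float]] = None


def mark_columns(specs: List[ColumnSpec]) -> List[ColumnSpec]:
    """Validate the marking and place key name columns before key value columns."""
    for spec in specs:
        if spec.kind is ColumnType.KEY_VALUE and spec.aggregate is None:
            raise ValueError(f"key value column {spec.name!r} needs an aggregation function")
    key_names = [s for s in specs if s.kind is ColumnType.KEY_NAME]
    key_values = [s for s in specs if s.kind is ColumnType.KEY_VALUE]
    return key_names + key_values


# Hypothetical configuration for the supermarket query in Step S31.
supermarket = mark_columns([
    ColumnSpec("time", ColumnType.KEY_NAME),
    ColumnSpec("commodity category", ColumnType.KEY_NAME),
    ColumnSpec("commodity name", ColumnType.KEY_NAME),
    ColumnSpec("sales volume", ColumnType.KEY_VALUE, operator.add),
    ColumnSpec("sales amount", ColumnType.KEY_VALUE, operator.add),
    # Assumption: merging keeps the most recent inventory figure.
    ColumnSpec("current inventory", ColumnType.KEY_VALUE, lambda old, new: new),
])

print([s.name for s in supermarket if s.kind is ColumnType.KEY_NAME])
# ['time', 'commodity category', 'commodity name']
```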
Step S32: save the current report and the row index dictionary. Both are initially empty. The first row of data written becomes row 1 of the report, and the first key name-key value pair is created in the row index dictionary.
Step S33: acquire a row of data from the data source. Because the data source has been configured, the row contains a plurality of key name columns and a plurality of key value columns.
Step S34: calculate a corresponding value, using the preset calculation method, from the key name columns of the row acquired in Step S33.
Step S35: determine whether the corresponding value calculated in Step S34 is an existing key name in the row index dictionary. If it is, data with the same index has already been acquired and the newly acquired row should be merged with it, so the flow proceeds to Step S36; otherwise, the flow proceeds to Step S37 and a new row is created in the current report.
Step S36: merge the data. In the embodiments of the invention, each key value column of the report table has a corresponding aggregation function, whose arguments are the two key values to be merged and whose return value is the merged key value. Fig. 4 is a schematic illustration of row aggregation according to an embodiment of the present invention. As shown in Fig. 4, row 41 is an existing row in the current report and row 42 is a row of data acquired from the data source; because all of their key name columns are identical, rows 41 and 42 are merged according to the steps above. In this example, the aggregation function of key value column 43 adds its two arguments, and the aggregation function of key value column 44 multiplies them. The new row 412 obtained after aggregation keeps the row number of row 41 (row numbers are not shown in the figure).
The design idea of row aggregation, then, is to aggregate automatically based on the values of the key name columns, under the following condition: if the values of all key name columns of two rows are identical, the two rows are combined into one, and each key value column is aggregated with its own aggregation function. This design reduces storage space because no redundant rows are stored, and it improves computational efficiency because aggregation is performed when data is written into the current report.
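The merge in Step S36 can be sketched as a function that checks the aggregation condition (identical key name columns) and applies each key value column's own aggregation function. The function name and the sample values are hypothetical; only the add/multiply pairing mirrors the Fig. 4 example.

```python
import operator
from typing import Callable, Sequence, Tuple


def merge_rows(existing: Sequence, incoming: Sequence, key_name_count: int,
               aggregates: Sequence[Callable]) -> Tuple:
    """Merge two rows under the aggregation condition of Step S36."""
    # Rows are merged only when every key name column matches exactly.
    if tuple(existing[:key_name_count]) != tuple(incoming[:key_name_count]):
        raise ValueError("key name columns differ; rows cannot be merged")
    # Each key value column uses its own function: f(old, new) -> merged value.
    merged = tuple(
        aggregate(old, new)
        for aggregate, old, new in zip(
            aggregates, existing[key_name_count:], incoming[key_name_count:]
        )
    )
    # The merged row keeps the existing row's key names (and its row number).
    return tuple(existing[:key_name_count]) + merged


# Column 43 aggregates by addition and column 44 by multiplication, as in Fig. 4;
# the values themselves are made up for illustration.
row_41 = ("2017-05", "Beverages", 3, 2)
row_42 = ("2017-05", "Beverages", 4, 5)
print(merge_rows(row_41, row_42, 2, [operator.add, operator.mul]))
# ('2017-05', 'Beverages', 7, 10)
```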
Step S37: a new row is created in the current report. The plurality of key name columns and the plurality of key value columns of the new row are the plurality of key name columns and the plurality of key value columns, respectively, in the one row of data acquired in step S33. In addition, the row index dictionary may be updated, and a key name-key value pair may be added, the key name being the corresponding value calculated in step S34, and the key value being the serial number of the new row.
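Putting Steps S32 to S37 together, the following Python sketch keeps the current report as a list of rows and the row index dictionary as a `dict`. The class name `ReportTable`, the SHA-256 calculation over JSON-encoded key names, and the sample rows are assumptions for illustration, not details taken from the patent.

```python
import hashlib
import json
import operator
from typing import Callable, Dict, List, Sequence


class ReportTable:
    """Hypothetical report table plus row index dictionary (Steps S32-S37)."""

    def __init__(self, key_name_count: int, aggregates: Sequence[Callable]):
        self.key_name_count = key_name_count
        self.aggregates = list(aggregates)    # one function per key value column
        self.rows: List[list] = []            # row number N is stored at index N-1
        self.row_index: Dict[str, int] = {}   # hash of key names -> row number

    def _index_key(self, row: Sequence) -> str:
        # Step S34: assumed preset calculation (SHA-256 over the key names).
        names = list(row[: self.key_name_count])
        encoded = json.dumps(names, ensure_ascii=False).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()

    def fill(self, row: Sequence) -> int:
        """Write one data-source row and return the report row number it landed in."""
        key = self._index_key(row)
        number = self.row_index.get(key)      # Step S35
        if number is not None:                # Step S36: merge into the existing row
            target = self.rows[number - 1]
            for offset, aggregate in enumerate(self.aggregates):
                col = self.key_name_count + offset
                target[col] = aggregate(target[col], row[col])
            return number
        self.rows.append(list(row))           # Step S37: create a new row ...
        number = len(self.rows)
        self.row_index[key] = number          # ... and register it in the dictionary
        return number


# Hypothetical source rows: (time, category, name, sales volume, sales amount).
table = ReportTable(3, [operator.add, operator.add])
for source_row in [
    ("2017-05", "Beverages", "Cola", 10, 30.0),
    ("2017-05", "Snacks", "Chips", 5, 15.0),
    ("2017-05", "Beverages", "Cola", 2, 6.0),
]:
    table.fill(source_row)

print(table.rows)
# [['2017-05', 'Beverages', 'Cola', 12, 36.0], ['2017-05', 'Snacks', 'Chips', 5, 15.0]]
```

Aggregating at write time means the report never holds two rows with the same key names, which is the property the join operations described below rely on.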
The above is the basic flow for acquiring data from the data source and storing it in the current report. To make every row of the report table the same length, a character dictionary may be established in which each key name is a character string and each key value is a numerical value of a predetermined length uniquely corresponding to that string, for example, a hash value of the string. When data is written into the report table, character strings from the data source are replaced by their key values in the character dictionary, so that every row occupies the same amount of computer memory, which saves memory and improves memory allocation efficiency.
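A character dictionary and a fixed-width row layout can be sketched as follows. Instead of the hash value mentioned above, this sketch assigns sequential integer codes, a swapped-in technique chosen because sequential codes are unique by construction, whereas distinct strings can in principle share a hash value. The class name, the `struct` layout, and the sample rows are hypothetical.

```python
import struct
from typing import Dict, List, Sequence


class CharacterDictionary:
    """Maps each string to a fixed-length integer code and back.

    Assumption: codes are assigned sequentially on first sight, so two
    different strings can never share a code (a hash could collide).
    """

    def __init__(self) -> None:
        self.codes: Dict[str, int] = {}
        self.strings: List[str] = []

    def encode(self, text: str) -> int:
        code = self.codes.get(text)
        if code is None:
            code = len(self.strings)
            self.codes[text] = code
            self.strings.append(text)
        return code

    def decode(self, code: int) -> str:
        return self.strings[code]


# Hypothetical layout: three string columns stored as unsigned 64-bit codes,
# then sales volume (signed 64-bit int) and sales amount (64-bit float).
ROW_FORMAT = struct.Struct("<3Qqd")


def pack_row(row: Sequence, chars: CharacterDictionary) -> bytes:
    """Replace the string columns with their codes and pack the row."""
    codes = [chars.encode(value) for value in row[:3]]
    return ROW_FORMAT.pack(*codes, *row[3:])


chars = CharacterDictionary()
a = pack_row(("2017-05", "Drinks", "Tea", 1, 2.5), chars)
b = pack_row(("2017-05", "Household cleaning supplies", "Laundry detergent, 2 kg", 3, 40.0), chars)
print(len(a), len(b))  # 40 40: strings of any length occupy the same space
```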
The report table structure of the embodiments of the invention also supports custom table operations; for example, when data is acquired from multiple data sources, join operations between multiple tables can be performed. A left join of two tables can be performed with the following steps:
Step 1: of the existing tables, table one and table two, table one is the main table, and table two is joined to table one.
Step 2: check whether all key name columns of table two are present in table one.
Step 3: check whether all key value columns of table two are present in table one.
Step 4: if both checks in Steps 2 and 3 pass, continue with the following steps; otherwise, output a prompt message.
Step 5: traverse each row of table one and take the values of its key name columns that match the key name columns of table two.
Step 6: concatenate the key name column values taken in Step 5 and calculate their hash value.
Step 7: use the hash value from Step 6 as a key name to look up the row index dictionary of table two, and obtain the corresponding row of table two.
Step 8: fill the corresponding key value columns of the table two row obtained in Step 7 into table one.
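The left join steps can be sketched in Python as follows. This is an illustration under stated assumptions: the `Table` class, the `index_key` helper (SHA-256 over JSON-encoded key names, standing in for the hash of Step 6), raising `ValueError` as the prompt of Step 4, and leaving unmatched rows of table one unchanged are all assumptions rather than details from the text.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


def index_key(values: Iterable) -> str:
    """Assumed hash of the concatenated key name values (Steps 6 and 7)."""
    encoded = json.dumps(list(values), ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


@dataclass
class Table:
    key_names: List[str]
    key_values: List[str]
    rows: List[Dict[str, object]] = field(default_factory=list)
    row_index: Dict[str, int] = field(default_factory=dict)  # hash -> row number

    def add(self, row: Dict[str, object]) -> None:
        """Append a row whose key name combination is assumed to be new."""
        self.rows.append(dict(row))
        self.row_index[index_key(row[c] for c in self.key_names)] = len(self.rows)


def left_join(table_one: Table, table_two: Table) -> Table:
    # Steps 2-4: table two's columns must already exist in table one.
    if not (set(table_two.key_names) <= set(table_one.key_names)
            and set(table_two.key_values) <= set(table_one.key_values)):
        raise ValueError("table two has columns that table one lacks")
    for row in table_one.rows:  # Step 5
        # Step 6: hash table one's values for table two's key name columns.
        key = index_key(row[c] for c in table_two.key_names)
        number = table_two.row_index.get(key)  # Step 7
        if number is not None:  # Step 8: copy the key value columns over.
            match = table_two.rows[number - 1]
            for column in table_two.key_values:
                row[column] = match[column]
    return table_one


one = Table(["time", "category", "name"], ["sales volume", "inventory"])
one.add({"time": "2017-05", "category": "Beverages", "name": "Cola",
         "sales volume": 12, "inventory": None})
one.add({"time": "2017-05", "category": "Snacks", "name": "Chips",
         "sales volume": 5, "inventory": None})
two = Table(["category", "name"], ["inventory"])
two.add({"category": "Beverages", "name": "Cola", "inventory": 40})

left_join(one, two)
print([row["inventory"] for row in one.rows])  # [40, None]
```

The lookup in Step 7 works only because the hash is computed over table two's key name columns in table two's own column order, both when table two is filled and when table one is traversed.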
Fig. 5 is a schematic diagram of the basic structure of an apparatus for processing report data according to an embodiment of the present invention. The apparatus may be provided in a computer as software. As shown in Fig. 5, the apparatus 50 for processing report data mainly includes a configuration module, a report storage module, a row index dictionary module, and a table filling module.
The configuration module is configured to mark the type of each of a plurality of data columns selected from a data source, the types of data columns including a key name column and a key value column. The report storage module is configured to save a current report comprising a plurality of key name columns and a plurality of key value columns. The row index dictionary module is configured to save a row index dictionary, in which each entry corresponds to one row of the current report, its key name being the value calculated from that row's key names according to a preset calculation mode and its key value being that row's serial number. The table filling module is configured to calculate a corresponding value, using the preset calculation mode, for the key name columns of the current row of data acquired from the data source, determine whether the corresponding value is an existing key name in the row index dictionary, and, if so, merge the current row of data with the row of the current report whose serial number is the key value corresponding to that existing key name, using the aggregation function corresponding to each key value column of the current report.
The table filling module may also be configured to create a new row in the current report if the corresponding value is determined not to be an existing key name in the row index dictionary, the key name columns and key value columns of the new row being, respectively, the key name columns and key value columns of the current row of data.
In that case, the table filling module may further be configured to add a corresponding key name-key value pair to the row index dictionary, where the key name is the corresponding value and the key value is the serial number of the new row.
The apparatus 50 for processing report data may further include a character dictionary module for saving a character dictionary, in which key names are character strings and key values are numerical values of a predetermined length uniquely corresponding to the character strings; the table filling module may then be further configured to replace a character string with its corresponding key value in the character dictionary if the current row of data contains that character string.
FIG. 6 illustrates an exemplary system architecture 600 to which the method or apparatus for processing report data according to embodiments of the present invention may be applied.
As shown in fig. 6, the system architecture 600 may include terminal devices 601, 602, 603, a network 604, and a server 605. The network 604 provides a medium for communication links between the terminal devices 601, 602, 603 and the server 605. The network 604 may include various types of connections, such as wired or wireless communication links or fiber optic cables.
A user may use the terminal devices 601, 602, 603 to interact with the server 605 via the network 604 to receive or send messages or the like. The terminal devices 601, 602, 603 may have installed thereon various communication client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 601, 602, 603 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 605 may be a server that provides various services, for example a background management server that supports shopping websites browsed by users on the terminal devices 601, 602, and 603. The background management server may analyze and otherwise process received data, such as a product information query request, and feed a processing result (for example, targeted push information or product information) back to the terminal device.
It should be noted that the method for processing report data provided by the embodiment of the present invention may be executed by a server or a terminal device, and accordingly, the apparatus for processing report data may be disposed in the server or the terminal device.
It should be understood that the number of terminal devices, networks, and servers in fig. 6 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 7, FIG. 7 is a block diagram of a computer system 70 suitable for implementing a terminal device according to an embodiment of the present invention. The terminal device shown in FIG. 7 is only an example and does not limit the functions or the scope of use of the embodiments of the present invention.
As shown in FIG. 7, the computer system 70 includes a Central Processing Unit (CPU) that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) or a program loaded from a storage section into a Random Access Memory (RAM). The RAM also stores the various programs and data needed for the operation of the computer system 70. The CPU, ROM, and RAM are connected to one another via a bus, to which an input/output (I/O) interface is also connected.
The following components are connected to the I/O interface: an input section including, for example, a keyboard and a mouse; an output section including, for example, a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD); a storage section including, for example, a hard disk; and a communication section including a network interface card such as a LAN card or a modem. The communication section performs communication processing via a network such as the Internet. A drive is also connected to the I/O interface as needed. A removable medium, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive as needed, so that a computer program read from it can be installed into the storage section.
In particular, the processes described above may be implemented as computer software programs, according to the disclosed embodiments of the invention. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program containing program code for performing the methods illustrated by the embodiments of the present disclosure. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section, and/or installed from a removable medium. The computer program performs the above-described functions defined in the system of the present invention when executed by a Central Processing Unit (CPU).
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor; for example, a processor may be described as including a configuration module, a report saving module, a row index dictionary module, and a table filling module. The names of these modules do not, in some cases, limit the modules themselves; for example, the report saving module may also be described as a "module for saving a current report".
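The configuration module's job of marking column types can be pictured as a small configuration object that splits each source record into its key names and key values. This is an illustrative sketch only: the `ColumnConfig` name, the dictionary-shaped source record, and the pairing of each key value column with its aggregation function are assumptions, not details given in the source. The returned pair is the shape a table filling step would consume.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Mapping, Tuple

AggFn = Callable[[float, float], float]


@dataclass
class ColumnConfig:
    """Hypothetical result of marking column types on a data source."""

    key_name_columns: List[str]  # columns marked as key name columns
    key_value_columns: Dict[str, AggFn]  # key value columns -> aggregation function

    def split(self, record: Mapping[str, Any]) -> Tuple[List[str], List[float]]:
        """Project one source record onto the marked columns; unmarked columns are ignored."""
        names = [str(record[c]) for c in self.key_name_columns]
        values = [float(record[c]) for c in self.key_value_columns]  # dicts keep insertion order
        return names, values

    def aggregators(self) -> List[AggFn]:
        """Aggregation functions in the same order as the key values returned by split()."""
        return list(self.key_value_columns.values())


if __name__ == "__main__":
    config = ColumnConfig(
        key_name_columns=["date", "city"],
        key_value_columns={"orders": lambda a, b: a + b, "peak_price": max},
    )
    record = {"date": "2017-05-31", "city": "Beijing", "orders": 10, "peak_price": 3.5, "note": "x"}
    print(config.split(record))  # (['2017-05-31', 'Beijing'], [10.0, 3.5])
```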
As another aspect, the present invention also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments or may exist separately without being incorporated into that apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform the methods described above.
According to the technical solution of the embodiments of the present invention, the report table contains multiple key name columns and multiple key value columns, which makes it suitable for multi-index data queries, and it aggregates data automatically: part of the computation is completed while large volumes of data that require centralized processing are being written into the storage structure. Because report queries involve a large amount of aggregation, this design saves considerable time and speeds up queries on report data. When queried data is written into the report table, the embodiments use a dynamic index: the index does not need to be designed in advance but is created automatically from the data columns to be queried, which improves query efficiency. As the structure of the report table shows, it is not a typical relational storage model: its data columns carry no mandatory constraints, yet it supports part of the logical computation available on relational data. These computations depend on the row design of the report table. Because the key name columns of each row are unique (no two rows share the same combination of key names), association operations between tables can be performed, so the report tables can take part in further, more complex logical operations.
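The point about association operations between tables can be illustrated with a minimal sketch. For illustration only, it assumes that a report is held as a Python dict mapping a tuple of key names to its list of key values, which encodes the uniqueness of each row's key names directly. `join_reports` is a hypothetical helper, not part of the described scheme. Because each key name combination appears at most once on each side, an inner join needs only one dictionary lookup per row and can never multiply rows.

```python
from typing import Dict, List, Tuple

KeyNames = Tuple[str, ...]
ReportTable = Dict[KeyNames, List[float]]  # unique key names -> key values


def join_reports(left: ReportTable, right: ReportTable) -> ReportTable:
    """Inner join on the key name columns; key values of the right report are appended."""
    return {k: left[k] + right[k] for k in left if k in right}


if __name__ == "__main__":
    sales = {("2017-05-31", "Beijing"): [15.0], ("2017-05-31", "Shanghai"): [4.0]}
    refunds = {("2017-05-31", "Beijing"): [2.0]}
    joined = join_reports(sales, refunds)
    print(joined)  # {('2017-05-31', 'Beijing'): [15.0, 2.0]}
    # A follow-up calculation on the joined rows, e.g. net sales per key:
    net = {k: v[0] - v[1] for k, v in joined.items()}
    print(net)  # {('2017-05-31', 'Beijing'): 13.0}
```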
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (12)

1. A method for processing report data, comprising:
respectively marking the types of a plurality of data columns selected from a data source, wherein the types of the data columns comprise a key name column and a key value column;
storing a current report and a row index dictionary, wherein the current report comprises a plurality of key name columns and a plurality of key value columns, and, for each row of the current report, the row index dictionary holds an entry whose key name is the numerical value obtained by applying a predetermined calculation to the plurality of key names contained in that row and whose key value is the sequence number of that row;
and calculating, with the predetermined calculation, a corresponding value for the plurality of key name columns of the current row of data obtained from the data source, and judging whether the corresponding value is an existing key name in the row index dictionary; if it is, merging the current row of data into the row of the current report whose sequence number is the key value corresponding to that existing key name, using the aggregation function corresponding to each key value column in the current report.
2. The method of claim 1, wherein, if the corresponding value is not an existing key name in the row index dictionary, a new row is created in the current report, and the key name columns and key value columns of the new row are, respectively, the key name columns and key value columns of the current row of data.
3. The method of claim 2, wherein, when it is determined that the corresponding value is not an existing key name in the row index dictionary, the method further comprises: adding a corresponding entry to the row index dictionary, wherein the key name is the corresponding value and the key value is the sequence number of the new row.
4. The method for processing report data according to claim 1, 2 or 3, further comprising:
storing a character dictionary, wherein the key names in the character dictionary are character strings and the key values are numerical values of a predetermined length, each uniquely corresponding to one character string;
and, when the current row of data contains the character string, replacing the character string with its corresponding key value in the character dictionary.
5. The method for processing report data according to claim 1, 2 or 3, wherein the predetermined calculation is a hash calculation.
6. An apparatus for processing report data, comprising:
a configuration module, configured to mark respectively the types of a plurality of data columns selected from a data source, wherein the types of the data columns comprise a key name column and a key value column;
a report storage module, configured to store a current report, wherein the current report comprises a plurality of key name columns and a plurality of key value columns;
a row index dictionary module, configured to store a row index dictionary which holds, for each row of the current report, an entry whose key name is the numerical value obtained by applying a predetermined calculation to the plurality of key names contained in that row and whose key value is the sequence number of that row;
and a table filling module, configured to calculate, with the predetermined calculation, a corresponding value for the plurality of key name columns of the current row of data obtained from the data source and, if the corresponding value is an existing key name in the row index dictionary, to merge the current row of data into the row of the current report whose sequence number is the key value corresponding to that existing key name, using the aggregation function corresponding to each key value column in the current report.
7. The apparatus for processing report data according to claim 6, wherein the table filling module is further configured to establish a new row in the current report if the corresponding value is not an existing key name in the row index dictionary, and the multiple key name columns and the multiple key value columns of the new row are the multiple key name columns and the multiple key value columns in the current row of data, respectively.
8. The apparatus for processing report data according to claim 7, wherein the table filling module is further configured, when it is determined that the corresponding value is not an existing key name in the row index dictionary, to add a corresponding entry to the row index dictionary, wherein the key name is the corresponding value and the key value is the sequence number of the new row.
9. The apparatus for processing report data according to claim 6, 7 or 8, further comprising:
a character dictionary module, configured to store a character dictionary, wherein the key names in the character dictionary are character strings and the key values are numerical values of a predetermined length, each uniquely corresponding to one character string;
wherein the table filling module is further configured, when the current row of data contains the character string, to replace the character string with its corresponding key value in the character dictionary.
10. The apparatus for processing report data according to claim 6, 7 or 8, wherein the predetermined calculation is a hash calculation.
11. An electronic device, comprising:
one or more processors; and
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.
12. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-5.
CN201710398736.2A 2017-05-31 2017-05-31 Method and device for processing report data Active CN107229718B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710398736.2A CN107229718B (en) 2017-05-31 2017-05-31 Method and device for processing report data

Publications (2)

Publication Number Publication Date
CN107229718A CN107229718A (en) 2017-10-03
CN107229718B true CN107229718B (en) 2020-06-05

Family

ID=59934042

Country Status (1)

Country Link
CN (1) CN107229718B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109947861B (en) * 2017-11-09 2021-06-29 北京京东尚科信息技术有限公司 Method, apparatus and computer readable medium for data warehouse to generate target table
CN110019162B (en) * 2017-12-04 2021-07-06 北京京东尚科信息技术有限公司 Method and device for realizing attribute normalization
US10762294B2 (en) * 2017-12-29 2020-09-01 Dassault Systèmes Americas Corp. Universally unique resources with no dictionary management
CN110555070B (en) * 2018-06-01 2021-10-22 百度在线网络技术(北京)有限公司 Method and apparatus for outputting information
CN109359141B (en) * 2018-08-07 2022-02-22 创新先进技术有限公司 Visual report data display method and device
CN109376157B (en) * 2018-11-21 2021-03-30 北京像素软件科技股份有限公司 Data integration method and device
CN110427599A (en) * 2019-06-06 2019-11-08 北京辰森世纪科技股份有限公司 The statistical method and device of report subtotal, storage medium, electronic device
CN110428153A (en) * 2019-07-19 2019-11-08 中国建设银行股份有限公司 Message polymerization and device
CN111046074B (en) * 2019-12-13 2023-09-01 北京百度网讯科技有限公司 Streaming data processing method, device, equipment and medium
CN112015738A (en) * 2020-08-28 2020-12-01 支付宝(杭州)信息技术有限公司 Method and device for realizing linked list processing of multiple data detail lists
CN118035503B (en) * 2024-04-11 2024-06-28 福建时代星云科技有限公司 Method for storing key value pair database

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663116A (en) * 2012-04-11 2012-09-12 中国人民大学 Multi-dimensional OLAP (On Line Analytical Processing) inquiry processing method facing column storage data warehouse
CN102955843A (en) * 2012-09-20 2013-03-06 北大方正集团有限公司 Method for realizing multi-key finding of key value database
CN106407349A (en) * 2016-09-06 2017-02-15 北京三快在线科技有限公司 Product recommendation method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140317093A1 (en) * 2013-04-22 2014-10-23 Salesforce.Com, Inc. Facilitating dynamic creation of multi-column index tables and management of customer queries in an on-demand services environment

Also Published As

Publication number Publication date
CN107229718A (en) 2017-10-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant