CN106844541B - Online analysis processing method and device - Google Patents

Info

Publication number
CN106844541B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611259329.5A
Other languages
Chinese (zh)
Other versions
CN106844541A (en)
Inventor
汤奇峰
罗青山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zamplus Advertising Shanghai Co ltd
Original Assignee
Zamplus Advertising Shanghai Co ltd
Application filed by Zamplus Advertising Shanghai Co ltd
Priority to CN201611259329.5A
Publication of CN106844541A
Application granted
Publication of CN106844541B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An online analysis processing method and device. The method includes the following steps: receiving a query instruction, wherein the query instruction comprises a condition to be queried; determining, based on the condition to be queried, a column storage sub-library associated with the condition to be queried in a preset column storage library, wherein the preset column storage library comprises at least one column storage sub-library, at least one record is stored in the column storage sub-library in a column manner, each record comprises an identifier and data corresponding to the identifier, and the data in different column storage sub-libraries have different attributes; and searching the column storage sub-library associated with the condition to be queried to obtain the records meeting the condition to be queried. The technical scheme provided by the invention can better address the disk IO bottleneck caused by large data volumes, effectively improve the processing speed and efficiency of online analysis processing, and facilitate large-scale adoption of online analysis processing.

Description

Online analysis processing method and device
Technical Field
The invention relates to the field of big data application, in particular to an online analysis processing method and device.
Background
At present, in the field of data warehouse systems, Online Analytical Processing (OLAP) is the main means of performing complex analysis operations on mass data, so as to provide data support for the decisions of decision makers and senior managers. With computer-based online analysis processing, complex queries over large data volumes can be carried out quickly and flexibly according to the requirements of analysts, and the query results are presented to decision makers in an intuitive and understandable form, so that users can conveniently and accurately grasp the operating condition of the enterprise, understand the needs of its customers, and formulate correct plans.
Professional analysts and management decision makers in an enterprise typically need to examine business metrics from different perspectives when analyzing business operations data. For example, when analyzing sales data, a user may consider time periods, product categories, distribution channels, geographic distribution, customer clusters, and other factors together. Although these analysis angles can be reflected in reports, each analysis angle produces one report, and each different combination of analysis angles produces yet another report. This increases the workload of the people who prepare the reports, and the reports often cannot keep pace with the thinking of decision makers.
In order to cope with the diversified demands of users, schemes for processing mass data based on online analysis processing have been developed. When online analysis processing is carried out, the computer can directly imitate the multi-angle thinking of the user and construct a multi-dimensional data model for the user in advance, where a dimension refers to an analysis angle configured by the user. Still taking the analysis of sales data as an example, time periods, product categories, distribution channels, geographic distribution, and customer clusters may each be taken as one dimension. After the multi-dimensional data model is built, a user can quickly acquire data from each analysis angle and can dynamically and flexibly switch among the angles or perform multi-angle comprehensive analysis. In general, online analysis processing differs fundamentally from older management information systems in both design concept and actual implementation.
However, in practical applications of existing online analysis processing schemes, once the amount of stored data reaches a certain level (for example, the TB level), a disk input/output (IO) bottleneck may occur. One existing solution is to increase the number of disks, for example by arranging 12, 24, or even more disks to share the IO load; another is to enlarge the memory so that as much data as possible can be stored or cached in memory. However, both solutions drastically increase cost, and the first may also lead to a higher failure rate, which hinders large-scale adoption of online analysis processing.
Disclosure of Invention
The technical problem solved by the invention is that existing online analysis processing schemes are prone to disk IO bottlenecks and high failure rates when processing mass data.
In order to solve the above technical problem, an embodiment of the present invention provides an online analysis processing method, including the following steps: receiving a query instruction, wherein the query instruction comprises a condition to be queried; determining a column storage sub-library associated with the condition to be queried in a preset column storage library based on the condition to be queried, wherein the preset column storage library comprises at least one column storage sub-library, at least one record is stored in the column storage sub-library in a column manner, each record comprises an identifier and data corresponding to the identifier, and the data in different column storage sub-libraries have different attributes; and searching the column storage sub-library associated with the condition to be queried to obtain the record meeting the condition to be queried.
Optionally, the query instruction includes a plurality of conditions to be queried and an algebraic calculation formula based on the plurality of conditions to be queried.
Optionally, searching and obtaining the record meeting the condition to be queried from the column storage sub-library associated with the condition to be queried includes: for each condition to be queried, searching and obtaining records meeting the condition to be queried from the column storage sub-library corresponding to the condition to be queried; and, according to the algebraic calculation formula, carrying out algebraic calculation on the records obtained by the respective queries of the plurality of conditions to be queried, so as to obtain an algebraic calculation result.
Optionally, the data in the column storage sub-library is converted from the original data based on the index dictionary, so that the data has a preset length.
Optionally, the column storage sub-library includes a plurality of sub-regions determined by dividing the data range of the data.
Optionally, searching and obtaining the record meeting the condition to be queried from the column storage sub-library associated with the condition to be queried includes: comparing the relation between the condition to be queried and the data range to determine a sub-region which accords with the condition to be queried; and searching and obtaining the data from the sub-area which meets the condition to be inquired.
Optionally, the column storage library is stored on a flash memory card.
An embodiment of the present invention further provides an online analysis processing apparatus, including: a receiving module, used for receiving a query instruction, the query instruction comprising a condition to be queried; a determining module, used for determining, based on the condition to be queried, a column storage sub-library associated with the condition to be queried in a preset column storage library, wherein the preset column storage library comprises at least one column storage sub-library, at least one record is stored in the column storage sub-library in a column manner, each record comprises an identifier and data corresponding to the identifier, and the data in different column storage sub-libraries have different attributes; and a searching module, used for searching the column storage sub-library associated with the condition to be queried to obtain the records meeting the condition to be queried.
Optionally, the query instruction includes a plurality of conditions to be queried and an algebraic calculation formula based on the plurality of conditions to be queried.
Optionally, the searching module includes: a first searching sub-module, used for searching, for each condition to be queried, the column storage sub-library corresponding to that condition to obtain records meeting the condition; and a computation sub-module, used for carrying out algebraic calculation on the records obtained by the respective queries of the plurality of conditions to be queried according to the algebraic calculation formula, so as to obtain an algebraic calculation result.
Optionally, the online analysis processing apparatus further includes a conversion module, where the data in the column storage sub-library is converted from the original data by the conversion module based on the index dictionary, so that the data has a preset length.
Optionally, the column storage sub-library includes a plurality of sub-regions determined by dividing the data range of the data.
Optionally, the searching module includes: a comparison sub-module, used for comparing the relationship between the condition to be queried and the data range to determine a sub-region that meets the condition to be queried; and a second searching sub-module, used for searching and obtaining the data from the sub-region that meets the condition to be queried.
Optionally, the column storage library is stored on a flash memory card.
Compared with the prior art, the technical scheme of the embodiment of the invention has the following beneficial effects:
after receiving a query instruction, determining a column storage sub-library associated with the condition to be queried in a preset column storage library based on the condition to be queried included in the query instruction, so as to search and obtain records meeting the condition to be queried in the column storage sub-library, wherein the records included in the column storage sub-library are stored in columns. Compared with the existing online analysis processing scheme, the technical scheme of the embodiment of the invention stores the data with the same attribute in the same column storage sub-library, and because the preset column storage library comprises at least one column storage sub-library, when the data with two or more different attributes exist, the data with different attributes can be respectively stored on the basis of different column storage sub-libraries, and when the online analysis processing is carried out on the mass data, one column storage sub-library in the stored mass data can be read in a targeted manner according to the attribute of the condition to be inquired, so that the disk IO pressure during the online analysis processing is greatly relieved, the occurrence of the disk IO bottleneck is favorably avoided, and the failure rate is reduced. Further, each record includes an identifier and data corresponding to the identifier, so that a user can associate records in different column storage sub-libraries based on the identifier. For example, the same identifier may correspond to a plurality of data in a plurality of column store sub-libraries, respectively, which may describe the same thing from different dimensions (i.e., attributes).
Further, the query instruction may include a plurality of conditions to be queried and an algebraic calculation formula based on the plurality of conditions to be queried, and for each condition to be queried, a record that is in accordance with the query instruction can be searched and obtained from a column storage sub-library corresponding to the condition to be queried based on the technical solution of the embodiment of the present invention, and algebraic calculation is performed on records obtained by respective queries of the plurality of conditions to be queried according to the algebraic calculation formula, so as to obtain an accurate algebraic calculation result, and a finally obtained query result is in accordance with a user expectation.
Drawings
FIG. 1 is a flow chart of an online analytical processing method according to a first embodiment of the present invention;
FIG. 2 is a flow chart of an online analytical processing method according to a second embodiment of the present invention;
FIG. 3 is a diagram illustrating an application scenario of a column store sub-library in a second embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an online analysis processing apparatus according to a third embodiment of the present invention.
Detailed Description
As those skilled in the art will understand, and as described in the Background, existing online analysis processing schemes cannot effectively solve the disk IO bottleneck at low cost and with a low failure rate when performing complex analysis operations on mass data.
The inventors have found that this problem arises because existing online analysis processing schemes store the data to be analyzed by rows.
In order to solve the technical problem, according to the technical scheme of the embodiment of the present invention, after a query instruction is received, a column storage sub-library associated with a condition to be queried is determined in a preset column storage library based on the condition to be queried included in the query instruction, so as to search for and obtain records meeting the condition to be queried in the column storage sub-library, wherein the records included in the column storage sub-library are stored in a column. Compared with the existing online analysis processing scheme, the technical scheme of the embodiment of the invention stores the data with the same attribute in the same column storage sub-library, and because the preset column storage library comprises at least one column storage sub-library, when the data with two or more different attributes exist, the data with different attributes can be respectively stored on the basis of different column storage sub-libraries, and when the online analysis processing is carried out on the mass data, one column storage sub-library in the stored mass data can be read in a targeted manner according to the attribute of the condition to be inquired, so that the disk IO pressure during the online analysis processing is greatly relieved, the occurrence of the disk IO bottleneck is favorably avoided, and the failure rate is reduced. Further, each record includes an identifier and data corresponding to the identifier, so that a user can associate records in different column storage sub-libraries based on the identifier. For example, the same identifier may correspond to a plurality of data in a plurality of column store sub-libraries, respectively, which may describe the same thing from different dimensions (i.e., attributes).
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
Fig. 1 is a flowchart of an online analysis processing method according to a first embodiment of the present invention. Specifically, in this embodiment, step S101 is executed first: a query instruction is received, where the query instruction includes a condition to be queried. More specifically, the condition to be queried indicates an attribute of the data that the user wishes to process through online analysis. For example, if the user needs to perform online analysis processing on the geographic location distribution and business operations of ten thousand merchants based on the technical solution of the embodiment of the present invention, the condition to be queried may include a geographic range, so that the distribution of those merchants within the geographic range can be obtained through analysis.
Then, step S102 is executed: a column storage sub-library associated with the condition to be queried is determined in a preset column storage library based on the condition to be queried, where the preset column storage library includes at least one column storage sub-library, at least one record is stored in the column storage sub-library in a column manner, each record includes an identifier and data corresponding to the identifier, and the data in different column storage sub-libraries have different attributes. In particular, the attribute describes the type of the data. For example, name, age, gender, merchant address, and so on may each serve as the attribute of a column storage sub-library. The records in the same column storage sub-library are stored along the column direction (i.e., the vertical direction).
In a preferred example, the condition to be queried is associated with the column storage sub-library, that is, the condition to be queried matches with an attribute of data stored in the column storage sub-library. For example, when the condition to be queried is the geographic range, it may be determined that a column storage sub-library having a geographic location (e.g., an address) as an attribute is associated with the condition to be queried.
Finally, step S103 is executed: records meeting the condition to be queried are searched for and obtained from the column storage sub-library associated with the condition to be queried. Specifically, the condition to be queried may be an exact value or a range of values. For example, if the condition to be queried is to find records with an age of 30 in the preset column storage library, this step searches the column storage sub-library whose attribute is age for records whose data is 30 and returns them as the query result; as another example, if the condition to be queried is to find records with an age of 20 to 40 in the preset column storage library, this step searches the column storage sub-library whose attribute is age for records whose data falls between 20 and 40 and returns them as the query result.
Further, the preset column storage library may be stored in the processing terminal that executes the technical solution of the embodiment of the present invention, in an external storage device coupled to the processing terminal, or in the cloud. Preferably, the column storage library is stored on a flash memory card (Flash card) or on other flash memory. The advantage is that a flash memory card uses a PCIe slot and has a wider bandwidth, so it offers better read and write speeds than conventional storage media. Those skilled in the art understand that the technical solution of the embodiment of the present invention implements its external interface using a protocol supporting the SQL-2003 standard, which facilitates interactive queries and lowers the barrier to use.
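Steps S101 to S103 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `ColumnSubLibrary`, the function `query`, the dictionary layout, and the sample records are all hypothetical, and a condition to be queried is modelled simply as an attribute name plus a predicate.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnSubLibrary:
    """Hypothetical column storage sub-library: every value of one attribute."""
    attribute: str
    # Each record is an (identifier, data) pair; records of one attribute sit together.
    records: list = field(default_factory=list)

    def find(self, predicate):
        """Return the records whose data satisfies the predicate."""
        return [(ident, value) for ident, value in self.records if predicate(value)]


def query(column_library, attribute, predicate):
    """S101: receive the condition as (attribute, predicate)."""
    # S102: select the sub-library whose attribute matches the condition.
    sub_library = column_library[attribute]
    # S103: scan only that sub-library; other attributes are never read.
    return sub_library.find(predicate)


# Assumed sample data: the preset column storage library keyed by attribute.
column_library = {
    "age": ColumnSubLibrary("age", [(1, 34), (2, 30), (3, 52)]),
    "gender": ColumnSubLibrary("gender", [(1, "female"), (2, "male"), (3, "female")]),
}

exact = query(column_library, "age", lambda v: v == 30)          # exact-value condition
in_range = query(column_library, "age", lambda v: 20 <= v <= 40)  # range condition
```

In this sketch an exact value and a range are handled by the same scan; only the predicate changes.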
Further, the identifier has uniqueness, records with the same identifier in different column storage sub-libraries all belong to the object to be analyzed pointed by the identifier, but the data attributes stored in different column storage sub-libraries are different. For example, in the conventional online analysis processing scheme, all records are stored in rows, one row of records corresponds to one object to be analyzed, and for each row of records, the records include data of the corresponding object to be analyzed in all dimensions. In the technical solution of the embodiment of the present invention, since the data of the object to be analyzed in different dimensions are respectively stored in different column storage sub-libraries, the identifier may correspond to the object to be analyzed to represent the relevance of the data in the different column storage sub-libraries.
Further, the data in the column storage sub-library is converted from the original data based on an index dictionary so that the data has a preset length. Those skilled in the art understand that original data with excessively long character strings can be encoded in advance based on the index dictionary, so that the data stored in the column storage sub-library all have the same or similar length. This facilitates management of the column storage sub-library and prevents the online analysis processing speed of the embodiment of the invention from being affected by the column storage sub-library occupying excessive storage space. The index dictionary may be stored in the same location as the preset column storage library or in a different location; those skilled in the art may vary this according to actual needs without affecting the technical content of the present invention.
For example, when constructing or updating the column storage sub-library, if the character string of the original data to be added exceeds the permitted length, the character string may be encoded to assign an index identifier that uniquely corresponds to the original data. The index identifier is stored as data in the column storage sub-library, and the correspondence between the index identifier and the original data is recorded in the index dictionary for subsequent lookup.
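The index-dictionary conversion described above can be sketched as a simple dictionary encoding. The names `IndexDictionary`, `to_stored_value` and `MAX_LENGTH`, the threshold of 8 characters, the use of consecutive integers as index identifiers, and the sample address are all assumptions made for illustration; the source does not specify the encoding.

```python
class IndexDictionary:
    """Hypothetical index dictionary: original string <-> short index identifier."""

    def __init__(self):
        self._to_index = {}     # original data -> index identifier
        self._to_original = []  # index identifier -> original data

    def encode(self, original):
        # Assign a new unique index identifier the first time a string is seen.
        if original not in self._to_index:
            self._to_index[original] = len(self._to_original)
            self._to_original.append(original)
        return self._to_index[original]

    def decode(self, index_identifier):
        # Look the original data back up from its index identifier.
        return self._to_original[index_identifier]


MAX_LENGTH = 8  # assumed length limit for data stored directly


def to_stored_value(original, dictionary):
    """Return what would be written to the column storage sub-library."""
    if len(original) > MAX_LENGTH:
        # Too long: store the index identifier and record the mapping.
        return ("idx", dictionary.encode(original))
    return ("raw", original)


dictionary = IndexDictionary()
long_address = "No. 100 Example Road, Shanghai"  # hypothetical original data
first = to_stored_value(long_address, dictionary)
short = to_stored_value("Shanghai", dictionary)
again = to_stored_value(long_address, dictionary)
```

The `"idx"`/`"raw"` tag is only there so the sketch can tell encoded values from unencoded ones; a real layout might instead encode every value of a column.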
Those skilled in the art understand that, in the technical solution of the embodiment of the present invention, the identifier and the index identifier have different meanings, where the identifier is used to refer to the object to be analyzed, and the index identifier is used to refer to the original data. In a typical application scenario, one object to be analyzed preferably corresponds to one identifier, and one identifier may respectively correspond to data in the plurality of column storage sub-libraries, and if two or more data in the plurality of column storage sub-libraries are obtained in advance based on the conversion of the index dictionary, the one identifier may also correspond to a plurality of index identifiers.
In a variation of this embodiment, the query instruction includes a plurality of conditions to be queried and an algebraic calculation formula based on the plurality of conditions to be queried. Further, step S103 may be replaced by: for each condition to be queried, searching and obtaining records meeting the condition to be queried from the column storage sub-library corresponding to the condition to be queried; and, according to the algebraic calculation formula, carrying out algebraic calculation on the records obtained by the respective queries of the plurality of conditions to be queried, so as to obtain an algebraic calculation result.
Further, the multiple conditions to be queried may have the same attribute, or may have different attributes; or, some of the multiple conditions to be queried have the same attribute, and the rest have different attributes.
For example, if a user wants to perform online analysis processing on population data for people aged between 30 and 40, of female gender, and with the surname Wang, the algebraic calculation formula can be written as ("age: 30-40") and ("gender: female") and (("name: Wang") or ("name: Wang")); that is, the plurality of conditions to be queried are combined with logical operators such as and, or, and not to obtain the algebraic calculation formula.
For another example, when performing online analysis processing based on the algebraic calculation formula, this variation preferably extracts the plurality of conditions to be queried from the formula and determines the column storage sub-library associated with each condition, so as to search for records meeting that condition. For example, based on the condition "age: 30-40", the column storage sub-library with the attribute of age can be determined, and records whose data falls between 30 and 40 are looked up in it; based on the condition "gender: female", the column storage sub-library with the attribute of gender can be determined, and records whose data is female are obtained from it; and based on the condition "name: Wang", the column storage sub-library with the attribute of name can be determined, and records whose data begins with Wang are obtained from it.
For another example, for the records obtained from each column storage sub-library, this variation preferably performs algebraic calculation on the data in the records having the same identifier, based on the algebraic calculation formula, to obtain the algebraic calculation result. For example, suppose a lookup in the column storage sub-library with the attribute of age yields the record "age: 34", a lookup in the column storage sub-library with the attribute of gender yields the record "gender: female", and a lookup in the column storage sub-library with the attribute of name yields the record "name: Wang Yi". If the three records have the same identifier (for example, all 1), they can be determined to describe the same object to be analyzed, and algebraic calculation can be performed on the data "34", "female" and "Wang Yi" based on the algebraic calculation formula to obtain the algebraic calculation result, indicating that this variation has found, based on the query instruction, a query result with the name Wang Yi, female gender, and age 34.
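One way to picture the algebraic calculation over several conditions is as set algebra on identifiers: each condition yields the set of identifiers whose data matches, and the logical operators map to set operations (and to intersection, or to union, not to difference). This mapping is an assumption made for illustration, as are the helper `matching_ids` and the sample records; the source does not state how the calculation is carried out.

```python
def matching_ids(sub_library, predicate):
    """Identifiers of the records in one sub-library whose data satisfies the predicate."""
    return {ident for ident, value in sub_library if predicate(value)}


# Hypothetical column storage sub-libraries as lists of (identifier, data) records.
age = [(1, 34), (2, 45), (3, 31)]
gender = [(1, "female"), (2, "female"), (3, "male")]
name = [(1, "Wang Yi"), (2, "Wang Er"), (3, "Li San")]

# Evaluate each condition against its own sub-library only.
ids_age = matching_ids(age, lambda v: 30 <= v <= 40)
ids_gender = matching_ids(gender, lambda v: v == "female")
ids_name = matching_ids(name, lambda v: v.startswith("Wang"))

# ("age: 30-40") and ("gender: female") and ("name: Wang...")
result_ids = ids_age & ids_gender & ids_name

# Assemble the matching objects by joining the sub-libraries on the identifier.
lookup = {"age": dict(age), "gender": dict(gender), "name": dict(name)}
results = [
    {"id": ident, **{attr: table[ident] for attr, table in lookup.items()}}
    for ident in sorted(result_ids)
]
```

An "or" between two conditions would use `|` in place of `&`, and a "not" would subtract a condition's identifier set from the set of all identifiers.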
In this way, with the scheme of the first embodiment, the data with the same attribute is stored in the same column storage sub-library, and since the preset column storage library includes at least one column storage sub-library, when there is data with two or more different attributes, the data with different attributes may be stored based on different column storage sub-libraries. Further, each record includes an identifier and data corresponding to the identifier, so that a user can associate records in different column storage sub-libraries based on the identifier. For example, the same identifier may correspond to a plurality of data in a plurality of column store sub-libraries, respectively, which may describe the same thing from different dimensions (i.e., attributes).
It is understood by those skilled in the art that in the existing online analysis processing scheme, the records to be processed are still stored in rows, for example, one record may include data of one object to be analyzed in all dimensions (i.e., attributes), and when a user wishes to perform online analysis processing on data satisfying one attribute in all the stored records, all the stored records need to be traversed to obtain data of all the objects to be analyzed in the attribute.
In the technical solution of the embodiment of the present invention, preferably, all records are respectively stored into different column storage sub-libraries according to different attributes, and when a user wishes to perform online analysis processing on data of a certain attribute, only the column storage sub-library corresponding to the attribute needs to be traversed, so that the workload is greatly reduced, and the response speed to the user requirements is accelerated; on the other hand, if the user needs to perform online analysis processing on data of multiple attributes of the object to be analyzed at the same time and needs to transversely know the data of the same object to be analyzed on the multiple attributes, based on the technical scheme of the embodiment of the present invention, because the identifier of the same object to be analyzed has uniqueness, the data of the object to be analyzed on different attributes can be associated based on the identifier, so that the description of the same object to be analyzed from different dimensions by using multiple data is realized, and the diversified requirements of the user are met.
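The contrast between the two layouts can be illustrated with a rough count of how many stored values a single-attribute query has to touch. The sample rows, the helper names, and the "values touched" measure are assumptions for illustration only; real IO cost depends on block sizes and the storage medium.

```python
# Hypothetical row-oriented records: one row holds every attribute of one object.
rows = [
    {"id": 1, "name": "A", "age": 34, "gender": "female", "address": "X"},
    {"id": 2, "name": "B", "age": 45, "gender": "male", "address": "Y"},
    {"id": 3, "name": "C", "age": 31, "gender": "female", "address": "Z"},
]

# Column-oriented layout: one sub-library per attribute, each record (identifier, data).
columns = {}
for row in rows:
    for attr, value in row.items():
        if attr != "id":
            columns.setdefault(attr, []).append((row["id"], value))


def values_touched_row_scan(rows):
    """A row scan reads every attribute of every row, even for a one-attribute query."""
    return sum(len(row) - 1 for row in rows)


def values_touched_column_scan(columns, attribute):
    """A column scan reads only the sub-library of the queried attribute."""
    return len(columns[attribute])


def describe(columns, ident):
    """Re-associate one object's data across sub-libraries via its identifier."""
    return {attr: value for attr, recs in columns.items() for i, value in recs if i == ident}


row_cost = values_touched_row_scan(rows)
column_cost = values_touched_column_scan(columns, "age")
person_2 = describe(columns, 2)
```

With three objects and four attributes, the row scan touches all twelve values while the column scan touches three; `describe` shows that the identifier still allows a cross-attribute view of one object.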
Fig. 2 is a flowchart of an online analysis processing method according to a second embodiment of the present invention. Specifically, in this embodiment, step S201 is executed first: a query instruction is received, where the query instruction includes a condition to be queried. For further details, a person skilled in the art may refer to step S101 in the embodiment shown in fig. 1, which is not repeated here.
Then, step S202 is executed: a column storage sub-library associated with the condition to be queried is determined in a preset column storage library based on the condition to be queried, where the preset column storage library includes at least one column storage sub-library, at least one record is stored in the column storage sub-library in columns, each record includes an identifier and data corresponding to the identifier, and data in different column storage sub-libraries have different attributes. For further details, a person skilled in the art may refer to step S102 in the embodiment shown in fig. 1, which is not repeated here. Preferably, the column storage sub-library includes a plurality of sub-regions determined by dividing the data range of the data. In a preferred example in conjunction with fig. 3, the preset column storage library includes a column storage sub-library 1 and a column storage sub-library 2, where the column storage sub-library 1 includes sub-regions 11 to 14, the column storage sub-library 2 includes sub-regions 21 to 24, and the data range of the data stored in each sub-region is as shown in fig. 3.
Step S203 is executed next: the condition to be queried is compared with the data ranges to determine a sub-region meeting the condition to be queried. In a preferred example in conjunction with fig. 3, if the condition to be queried matches the attribute of the data stored in the column storage sub-library 1, this step preferably determines in which sub-region of the column storage sub-library 1 the condition to be queried falls. For example, if the condition to be queried is to look up data with a value of 105, it may be determined that the sub-region 2 in the column storage sub-library 1 is the sub-region that meets the condition to be queried.
Finally, step S204 is executed: the data is searched for and obtained from the sub-region that meets the condition to be queried. In a preferred example with reference to fig. 3, if the condition to be queried is still to look up data with a value of 105, then after the sub-region 2 in the column storage sub-library 1 has been determined in step S203 to be the sub-region that meets the condition to be queried, this step preferably searches the sub-region 2 for records whose data is 105, so as to obtain the query result of this embodiment.
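Steps S203 and S204 can be sketched as a range comparison followed by a scan of the selected sub-region only. The following Python sketch is a hedged illustration: the `SubRegion` class, the region names and the data ranges are invented here, since the ranges of fig. 3 are not reproduced in the text.

```python
from dataclasses import dataclass, field


@dataclass
class SubRegion:
    name: str
    low: int    # inclusive minimum of the data range
    high: int   # inclusive maximum of the data range
    records: list = field(default_factory=list)  # (identifier, value) pairs


def find_sub_regions(sub_regions, value):
    """S203: keep only the sub-regions whose data range contains the value."""
    return [region for region in sub_regions if region.low <= value <= region.high]


def lookup(sub_regions, value):
    """S204: scan only the selected sub-regions for records with that value."""
    hits = []
    for region in find_sub_regions(sub_regions, value):
        hits.extend(rec for rec in region.records if rec[1] == value)
    return hits


# Hypothetical contents of one column storage sub-library.
sub_library_1 = [
    SubRegion("A", 0, 99, [("0.0.1", 42)]),
    SubRegion("B", 100, 199, [("0.0.2", 105), ("0.0.3", 150), ("0.0.4", 105)]),
    SubRegion("C", 200, 299, [("0.0.5", 250)]),
]

selected = [region.name for region in find_sub_regions(sub_library_1, 105)]
result = lookup(sub_library_1, 105)
```

Only the records of region "B" are inspected for the value 105; the other sub-regions are excluded by comparing two bounds each, which is where the saving in traversal comes from.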
Furthermore, based on the technical scheme of the embodiment of the invention, online analysis processing can be realized in a proxy mode; that is, the capacity of a single machine node is scaled out in parallel by means of distributed storage and distributed computation, so that TB-level data can be managed and processed by online analysis.
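The text does not describe how the proxy mode is built. One possible reading, sketched below purely as an assumption, is a scatter-gather arrangement in which a proxy forwards each query to several storage nodes and merges the partial results. The `Node` and `Proxy` classes are hypothetical, and threads stand in for remote machines.

```python
from concurrent.futures import ThreadPoolExecutor


class Node:
    """Hypothetical storage node holding one shard of (identifier, value) records."""

    def __init__(self, records):
        self.records = records

    def query(self, predicate):
        return [rec for rec in self.records if predicate(rec[1])]


class Proxy:
    """Hypothetical proxy: fans a query out to every node and merges the results."""

    def __init__(self, nodes):
        self.nodes = nodes  # assumed to be a non-empty list

    def query(self, predicate):
        with ThreadPoolExecutor(max_workers=len(self.nodes)) as pool:
            partials = list(pool.map(lambda node: node.query(predicate), self.nodes))
        merged = []
        for part in partials:
            merged.extend(part)
        return merged


proxy = Proxy([
    Node([("0.0.1", 23), ("0.0.2", 31)]),
    Node([("0.0.3", 23), ("0.0.4", 47)]),
])
young = proxy.query(lambda age: age < 30)
```

Adding capacity in this sketch means adding another `Node` to the list; the caller still talks to the single `Proxy` object.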
Further, compared with the existing Batch (BAT) operation mode, the technical scheme of the embodiment of the invention can realize online analysis processing through simpler array operations, and can make effective use of multi-core CPUs by adopting a parallel processing engine.
Further, a bytecode engine is built into the processing terminal executing the embodiment of the present invention to support script implementation and Just-In-Time (JIT) compilation of the logic in the embodiment of the present invention, so as to ensure the extensibility and high performance of the processing terminal.
By adopting the scheme of the second embodiment, the data structure is further optimized by subdividing the column storage sub-library into sub-regions, so that not all data stored in the column storage sub-library needs to be traversed when the embodiment is executed. This reduces disk IO overhead while consuming as little CPU performance as possible, relieves the disk IO bottleneck of online analysis processing, and improves the online analysis processing speed.
Those skilled in the art understand that step S203 and step S204 in this embodiment may be a specific implementation of step S103 in the embodiment shown in fig. 1: on the basis of determining the column storage sub-library associated with the condition to be queried, the sub-region of that column storage sub-library closest to the condition to be queried is determined, and the records meeting the condition to be queried are finally obtained by searching that sub-region.
Fig. 3 shows a typical application scenario. With reference to fig. 2 and fig. 3, the attribute of the column storage sub-library 2 is the birth date, and the minimum and maximum values of the sub-regions 21, 22, 23 and 24 are shown in fig. 3. In this application scenario, a record to be stored in the column storage sub-library 2 is first compared with the data range of each sub-region to determine which sub-region it should be stored in. For example, if the record to be stored is 0.0.1:1993-05-06 (i.e., the identifier is 0.0.1 and the birth date is 1993-05-06), the comparison determines that the record should be stored in the sub-region 21 of the column storage sub-library 2.
Further, in this application scenario, when the query instruction is received, if the condition to be queried is to search for records with a birth date in September 2010, it may be determined, with reference to the technical solution in the embodiment shown in fig. 2, that the data meeting the condition to be queried needs to be searched for in the sub-region 24 of the column storage sub-library 2.
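Routing a record to its sub-region on storage can be sketched as follows. The record format `identifier:date` follows the example in the text; the region names and date ranges are hypothetical placeholders, because the actual boundaries are only given in fig. 3.

```python
from datetime import date

# Hypothetical (minimum, maximum) birth-date range per sub-region, both inclusive.
sub_regions = {
    "region_a": (date(1990, 1, 1), date(1999, 12, 31)),
    "region_b": (date(2000, 1, 1), date(2004, 12, 31)),
    "region_c": (date(2005, 1, 1), date(2009, 12, 31)),
    "region_d": (date(2010, 1, 1), date(2014, 12, 31)),
}
storage = {name: [] for name in sub_regions}


def route(record):
    """Parse 'identifier:YYYY-MM-DD' and store it in the sub-region whose range fits."""
    ident, raw_date = record.split(":", 1)
    birth = date.fromisoformat(raw_date)
    for name, (low, high) in sub_regions.items():
        if low <= birth <= high:
            storage[name].append((ident, birth))
            return name
    raise ValueError(f"no sub-region covers {birth}")


target = route("0.0.1:1993-05-06")
```

With these assumed ranges the example record lands in the first sub-region, mirroring the way the text places it in the sub-region whose minimum and maximum bracket 1993-05-06.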
Further, when the query instruction includes a plurality of conditions to be queried, for each condition to be queried, the step S203 and the step S204 may be executed in parallel to obtain data meeting the plurality of conditions to be queried, respectively.
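A minimal sketch of running steps S203 and S204 in parallel for several conditions is given below, using a thread pool from the Python standard library. The attributes, ranges and records are invented for illustration, and each condition is assumed to be a closed value range on one attribute.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical column storage library: attribute -> list of (low, high, records).
store = {
    "age": [
        (0, 29, [("0.0.1", 23), ("0.0.3", 25)]),
        (30, 59, [("0.0.2", 31)]),
    ],
    "income": [
        (0, 4999, [("0.0.1", 3000)]),
        (5000, 9999, [("0.0.2", 8000), ("0.0.3", 6500)]),
    ],
}


def run_condition(condition):
    """Evaluate one condition (attribute, low, high) against its sub-library."""
    attr, low, high = condition
    hits = []
    for region_low, region_high, records in store[attr]:
        # S203: skip sub-regions whose range does not overlap the condition.
        if region_low <= high and low <= region_high:
            # S204: scan only the overlapping sub-region.
            hits.extend(rec for rec in records if low <= rec[1] <= high)
    return attr, hits


conditions = [("age", 20, 29), ("income", 6000, 9000)]
with ThreadPoolExecutor() as pool:
    results = dict(pool.map(run_condition, conditions))
```

Because each condition touches a different column storage sub-library, the per-condition lookups share no mutable state in this sketch and can run side by side.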
Further, if the column storage sub-library needs to be expanded during the execution of the embodiment of the present invention, and the data of a newly added record falls outside the existing data range of the column storage sub-library, the data range of an existing sub-region in the column storage sub-library may be extended upwards or downwards to store the newly added record. As a variation, the newly added record may also be stored by adding a new sub-region to the column storage sub-library.
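Both expansion strategies can be sketched in a few lines of Python. The function name `store_record`, the dictionary layout and the `add_region` flag are assumptions for illustration; the sub-regions are assumed to be a non-empty list sorted by range.

```python
def store_record(regions, ident, value, add_region=False):
    """Store a record, growing the sub-library if the value is out of range.

    `regions` is a non-empty list of dicts with 'low', 'high' and 'records',
    sorted by range. With add_region=False the nearest edge sub-region is
    extended; with add_region=True a new sub-region is created instead.
    """
    for region in regions:
        if region["low"] <= value <= region["high"]:
            region["records"].append((ident, value))
            return region

    below = value < regions[0]["low"]
    above = value > regions[-1]["high"]
    if not (below or above):
        raise ValueError("value falls in a gap between sub-regions")

    if add_region:
        new_region = {"low": value, "high": value, "records": [(ident, value)]}
        regions.insert(0 if below else len(regions), new_region)
        return new_region

    edge = regions[0] if below else regions[-1]
    if below:
        edge["low"] = value    # extend the range downwards
    else:
        edge["high"] = value   # extend the range upwards
    edge["records"].append((ident, value))
    return edge


def fresh_regions():
    return [
        {"low": 0, "high": 99, "records": []},
        {"low": 100, "high": 199, "records": []},
    ]


expanded = fresh_regions()
store_record(expanded, "0.0.9", 230)

added = fresh_regions()
store_record(added, "0.0.9", 230, add_region=True)
```

Extending an edge sub-region keeps the number of sub-regions fixed but widens one range, whereas adding a sub-region keeps the existing ranges tight at the cost of one more range to compare during lookup.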
It is understood that the specific embodiments of the sub-regions 11, 12, 13, 14 and 22 can refer to the above description, and are not repeated herein.
Fig. 4 is a schematic structural diagram of an online analysis processing apparatus according to a third embodiment of the present invention. Those skilled in the art will understand that the online analysis processing device 4 of the present embodiment is used for implementing the method solutions in the embodiments shown in fig. 1 to fig. 3. Specifically, in this embodiment, the online analysis processing apparatus 4 includes a receiving module 42, configured to receive a query instruction, where the query instruction includes a condition to be queried; a determining module 43, configured to determine, based on the condition to be queried, a column storage sub-library associated with the condition to be queried in a preset column storage library, where the preset column storage library includes at least one column storage sub-library, at least one record is stored in the column storage sub-library in columns, each record includes an identifier and data corresponding to the identifier, and data in different column storage sub-libraries have different attributes; and a searching module 44, configured to search the column storage sub-library associated with the condition to be queried to obtain a record meeting the condition to be queried.
In a preferred application scenario, the query instruction includes a plurality of conditions to be queried and an algebraic calculation formula based on the plurality of conditions to be queried. Preferably, the searching module 44 includes a first searching submodule 441, configured to, for each condition to be queried, search for and obtain the records meeting the condition to be queried from the column storage sub-library corresponding to that condition; and a calculating submodule 442, configured to perform algebraic calculation, according to the algebraic calculation formula, on the records obtained by the respective queries of the plurality of conditions to be queried, so as to obtain an algebraic calculation result.
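The cooperation of the first searching submodule and the calculating submodule can be sketched as follows, under two assumptions that the text does not spell out here: the algebraic calculation formula is represented as a Python callable, and only identifiers returned by every condition are combined. The attribute names `clicks` and `impressions` are hypothetical.

```python
def algebraic_result(results, formula):
    """Apply `formula` to the data of records that share the same identifier.

    `results` maps an attribute name to the (identifier, value) records found
    for one condition; `formula` takes the attribute values as keyword arguments.
    """
    if not results:
        return {}
    per_condition = {attr: dict(records) for attr, records in results.items()}
    common = set.intersection(*(set(values) for values in per_condition.values()))
    return {
        ident: formula(**{attr: values[ident] for attr, values in per_condition.items()})
        for ident in sorted(common)
    }


# Hypothetical per-condition query results.
results = {
    "clicks": [("0.0.1", 30), ("0.0.2", 12)],
    "impressions": [("0.0.1", 600), ("0.0.2", 400), ("0.0.3", 50)],
}
ratio = algebraic_result(results, lambda clicks, impressions: clicks / impressions)
```

Identifier 0.0.3 is dropped in this sketch because it has no `clicks` record, so the formula cannot be evaluated for it.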
Further, the online analysis processing device 4 further includes a conversion module 41, where the data in the column storage sub-library is converted from the original data by the conversion module based on the index dictionary, so that the data has a preset length.
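One common way to give data a preset length with an index dictionary is dictionary encoding, in which each distinct raw value is replaced by a fixed-width integer code. The sketch below illustrates that general technique as an assumption; the class `IndexDictionary` and the 4-byte code width are illustrative choices and may differ from the conversion module of the patent.

```python
import struct

CODE_FORMAT = "<I"  # assumed preset length: 4-byte little-endian unsigned integer


class IndexDictionary:
    """Maps variable-length raw values to fixed-length codes and back."""

    def __init__(self):
        self._code_of = {}   # raw value -> integer code
        self._value_of = []  # integer code -> raw value

    def encode(self, value):
        code = self._code_of.get(value)
        if code is None:
            code = len(self._value_of)
            self._code_of[value] = code
            self._value_of.append(value)
        return struct.pack(CODE_FORMAT, code)

    def decode(self, blob):
        (code,) = struct.unpack(CODE_FORMAT, blob)
        return self._value_of[code]


dictionary = IndexDictionary()
encoded = [dictionary.encode(city) for city in ["Shanghai", "Beijing", "Shanghai"]]
decoded = [dictionary.decode(blob) for blob in encoded]
```

Every encoded value occupies the same number of bytes regardless of the length of the raw string, so a stored column can be addressed by simple offset arithmetic.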
In another preferred application scenario, the column storage sub-library comprises a plurality of sub-regions determined by dividing the data range of the data. Preferably, the searching module 44 includes a comparison submodule 443, configured to compare the condition to be queried with the data ranges, so as to determine a sub-region meeting the condition to be queried; and a second searching submodule 444, configured to search for and obtain the data from the sub-region that meets the condition to be queried.
Preferably, the column store is stored on a flash memory card.
In a variation of this embodiment, when the query instruction includes a plurality of conditions to be queried and an algebraic calculation formula based on the plurality of conditions to be queried, for each condition to be queried, the comparing sub-module 443 may be invoked to determine a sub-region that meets the condition to be queried, then the second searching sub-module 444 is invoked to search for the data from the sub-region that meets the condition to be queried, and finally, according to the algebraic calculation formula, the records obtained by querying each of the plurality of conditions to be queried are algebraically calculated to obtain an algebraic calculation result.
For more details of the operation principle and the operation mode of the online analysis processing apparatus 4, reference may be made to the description in fig. 1 to 3, which is not repeated here.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may include ROM, RAM, magnetic disks, optical disks, and the like.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (14)

1. An on-line analysis processing method, characterized by comprising the steps of:
receiving a query instruction, wherein the query instruction comprises a condition to be queried;
determining a column storage sub-library associated with the condition to be queried in a preset column storage library based on the condition to be queried, wherein the preset column storage library comprises at least one column storage sub-library, at least one record is stored in the column storage sub-library in a column manner, each record comprises an identifier and data corresponding to the identifier, the data in different column storage sub-libraries have different attributes, the identifier corresponds to the object to be analyzed and has uniqueness, and the records with the same identifier in different column storage sub-libraries all belong to the object to be analyzed pointed by the identifier;
searching and obtaining records meeting the condition to be inquired from the column storage sub-library associated with the condition to be inquired;
and for the plurality of records obtained by searching, carrying out algebraic calculation on the data in the records with the same identification based on an algebraic calculation formula to obtain an algebraic calculation result.
2. The on-line analysis processing method according to claim 1, wherein the query instruction comprises a plurality of conditions to be queried and an algebraic calculation formula based on the plurality of conditions to be queried.
3. The on-line analysis processing method according to claim 2, wherein searching and obtaining records meeting the condition to be queried from the column storage sub-library associated with the condition to be queried comprises: for each condition to be queried, searching and obtaining records meeting the condition to be queried from the column storage sub-library corresponding to the condition to be queried;
and according to the algebraic calculation formula, carrying out algebraic calculation on the records obtained by respective inquiry of the conditions to be inquired so as to obtain an algebraic calculation result.
4. The on-line analysis processing method according to claim 1, wherein the data in the column storage sub-library is converted from the original data based on an index dictionary, so that the data has a preset length.
5. The online analytical processing method of claim 1, wherein the column store sub-library comprises a plurality of sub-regions determined by data range division of the data.
6. The on-line analysis processing method according to claim 5, wherein searching and obtaining records meeting the condition to be queried from the column storage sub-library associated with the condition to be queried comprises: comparing the relation between the condition to be queried and the data range to determine a sub-region which accords with the condition to be queried;
and searching and obtaining the data from the sub-area which meets the condition to be inquired.
7. The on-line analysis processing method of any of claims 1 to 6, wherein the column store is stored on a flash memory card.
8. An online analytical processing device, comprising:
the receiving module is used for receiving a query instruction, and the query instruction comprises a condition to be queried;
a determining module, configured to determine, based on the condition to be queried, a column storage sub-library associated with the condition to be queried in a preset column storage library, where the preset column storage library includes at least one column storage sub-library, at least one record is stored in the column storage sub-library in a column manner, each record includes an identifier and data corresponding to the identifier, and data in different column storage sub-libraries have different attributes, where the identifier corresponds to an object to be analyzed and has uniqueness, and records with the same identifier in different column storage sub-libraries all belong to the object to be analyzed to which the identifier points;
the searching module is used for searching and obtaining the records meeting the conditions to be inquired from the column storage sub-library associated with the conditions to be inquired;
and for the plurality of records obtained by searching, carrying out algebraic calculation on the data in the records with the same identification based on an algebraic calculation formula to obtain an algebraic calculation result.
9. The on-line analysis processing device of claim 8, wherein the query instruction comprises a plurality of conditions to be queried and an algebraic calculation formula based on the plurality of conditions to be queried.
10. The online analytical processing device of claim 9 wherein the lookup module comprises:
the first searching sub-module is used for searching and obtaining, for each condition to be queried, records meeting the condition to be queried from the column storage sub-library corresponding to the condition to be queried;
and the computation submodule is used for carrying out algebraic computation on the records obtained by respective query of the conditions to be queried according to the algebraic computation formula so as to obtain an algebraic computation result.
11. The on-line analysis processing device according to claim 8, further comprising a conversion module, wherein the data in the column storage sub-library is converted from the raw data by the conversion module based on the index dictionary, so that the data has a preset length.
12. The online analytical processing device of claim 8, wherein the column store sub-library comprises a plurality of sub-regions determined by data range partitioning of the data.
13. The online analytical processing device of claim 12 wherein the lookup module comprises:
the comparison sub-module is used for comparing the relation between the condition to be queried and the data range to determine a sub-region which accords with the condition to be queried;
and the second searching submodule is used for searching and obtaining the data from the sub-area which accords with the condition to be inquired.
14. The on-line analytical processing device of any one of claims 8 to 13 wherein the column store is stored on a flash memory card.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611259329.5A CN106844541B (en) 2016-12-30 2016-12-30 Online analysis processing method and device

Publications (2)

Publication Number Publication Date
CN106844541A CN106844541A (en) 2017-06-13
CN106844541B CN106844541B (en) 2020-05-29

Family

ID=59115008

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611259329.5A Active CN106844541B (en) 2016-12-30 2016-12-30 Online analysis processing method and device

Country Status (1)

Country Link
CN (1) CN106844541B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109241195B (en) * 2017-07-03 2022-03-18 北京国双科技有限公司 Ranking calculation method and device
CN107577436B (en) * 2017-09-18 2020-07-07 杭州时趣信息技术有限公司 Data storage method and device
CN107729500B (en) * 2017-10-20 2021-01-05 锐捷网络股份有限公司 Data processing method and device for online analysis processing and background equipment
CN109471874A (en) * 2018-10-30 2019-03-15 华为技术有限公司 Data analysis method, device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101727465A (en) * 2008-11-03 2010-06-09 中国移动通信集团公司 Methods for establishing and inquiring index of distributed column storage database, device and system thereof
CN102663116A (en) * 2012-04-11 2012-09-12 中国人民大学 Multi-dimensional OLAP (On Line Analytical Processing) inquiry processing method facing column storage data warehouse
CN103678556A (en) * 2013-12-06 2014-03-26 华为技术有限公司 Method for processing column-oriented database and processing equipment
CN104424258A (en) * 2013-08-28 2015-03-18 腾讯科技(深圳)有限公司 Multidimensional data query method and system, query server and column storage server
CN104424287A (en) * 2013-08-30 2015-03-18 深圳市腾讯计算机系统有限公司 Query method and query device for data

Also Published As

Publication number Publication date
CN106844541A (en) 2017-06-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant