CN106844541A - Online analytical processing method and device - Google Patents

Online analytical processing method and device

Info

Publication number
CN106844541A
Authority
CN
China
Prior art keywords
queried
condition
column
data
sub-library
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611259329.5A
Other languages
Chinese (zh)
Other versions
CN106844541B (en)
Inventor
汤奇峰
罗青山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZAMPLUS ADVERTISING (SHANGHAI) CO Ltd
Original Assignee
ZAMPLUS ADVERTISING (SHANGHAI) CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZAMPLUS ADVERTISING (SHANGHAI) CO Ltd
Priority to CN201611259329.5A
Publication of CN106844541A
Application granted
Publication of CN106844541B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2455: Query execution
    • G06F16/24553: Query execution of query operations
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2457: Query processing with adaptation to user needs
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28: Databases characterised by their database models, e.g. relational or object models
    • G06F16/283: Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An online analytical processing method and device. The method comprises the following steps: receiving a query instruction that includes a query condition; determining, based on the query condition, the column storage sub-library associated with the query condition in a preset column store, where the preset column store includes at least one column storage sub-library, each column storage sub-library stores at least one record by column, each record includes an identifier and the data corresponding to that identifier, and the data in different column storage sub-libraries have different attributes; and looking up, in the column storage sub-library associated with the query condition, the records that satisfy the query condition. The technical solution provided by the invention can better resolve the disk I/O bottleneck caused by large data volumes, effectively improves the processing speed and efficiency of online analytical processing, and facilitates the large-scale adoption and application of the online analytical processing model.

Description

Online analytical processing method and device
Technical field
The present invention relates to the field of big data applications, and in particular to an online analytical processing method and device.
Background
At present, the data warehouse field relies on online analytical processing (On-Line Analytical Processing, OLAP) to perform complex analysis operations on massive data, thereby providing data support for the decisions of decision makers and senior managers. When performing online analytical processing, a computer can quickly and flexibly carry out complex queries over large volumes of data according to the analyst's requirements and present the query results to decision makers in an intuitive, easy-to-understand form, so that users can accurately grasp the operating state of the enterprise, understand the needs of the target objects, and then formulate the right plan.
Professional analysts and management decision makers in an enterprise usually need to examine the metrics of the business from different angles when analyzing business data. For example, when analyzing sales data, a user may weigh many factors together, such as time period, product category, distribution channel, geographic distribution, and customer group. Although these analysis angles can be reflected in reports, each analysis angle generates one report, and each combination of angles generates yet another. This undoubtedly increases the workload of the report writers, who often struggle to keep pace with the thinking of the decision makers.
To meet users' diverse needs, schemes that process massive data based on online analytical processing emerged. When performing online analytical processing, the computer can directly mirror the user's multi-angle way of thinking and build a multidimensional data model for the user in advance, where a "dimension" is an analysis angle configured by the user. Taking the analysis of sales data again as an example, time period, product category, distribution channel, geographic distribution, and customer group can each serve as one dimension. Once the multidimensional data model has been built, the user can quickly obtain data from each analysis angle, and can also switch dynamically and flexibly between angles or perform a combined multi-angle analysis. Overall, online analytical processing differs fundamentally from older management information systems in both design concept and actual implementation.
However, when existing online analytical processing schemes are applied in practice, a disk input/output (Input/Output, I/O) bottleneck is likely to occur once the stored data volume reaches a certain level (for example, the TB level). One existing solution is to increase the number of disks, for example by deploying 12, 24, or even more disks to share the I/O load; another is to increase memory so that as much data as possible can be stored or cached in memory. Both solutions, however, cause costs to rise sharply, and the first may also lead to a higher failure rate, which hinders the large-scale adoption and application of the online analytical processing model.
Summary of the invention
The technical problem solved by the present invention is that existing online analytical processing schemes are prone to disk I/O bottlenecks and high failure rates when processing massive data.
To solve the above technical problem, an embodiment of the present invention provides an online analytical processing method comprising the following steps: receiving a query instruction, the query instruction including a query condition; determining, based on the query condition, a column storage sub-library associated with the query condition in a preset column store, wherein the preset column store includes at least one column storage sub-library, at least one record is stored by column in the column storage sub-library, each record includes an identifier and the data corresponding to the identifier, and the data in different column storage sub-libraries have different attributes; and looking up, in the column storage sub-library associated with the query condition, the records that satisfy the query condition.
Optionally, the query instruction includes a plurality of query conditions and an algebraic expression based on the plurality of query conditions.
Optionally, looking up the records that satisfy the query condition in the column storage sub-library associated with the query condition includes: for each query condition, looking up the records that satisfy that query condition in the column storage sub-library corresponding to it; and performing an algebraic calculation, according to the algebraic expression, on the records obtained for each of the plurality of query conditions, to obtain an algebraic calculation result.
Optionally, the data in the column storage sub-library are converted from original data based on an index dictionary, so that the data have a preset length.
Optionally, the column storage sub-library includes a plurality of sub-regions divided according to the data range of the data.
Optionally, looking up the records that satisfy the query condition in the column storage sub-library associated with the query condition includes: comparing the query condition with the data ranges to determine the sub-region that satisfies the query condition; and looking up the data in the sub-region that satisfies the query condition.
Optionally, the column store is stored on a flash card.
An embodiment of the present invention also provides an online analytical processing device, including: a receiving module for receiving a query instruction, the query instruction including a query condition; a determining module for determining, based on the query condition, a column storage sub-library associated with the query condition in a preset column store, wherein the preset column store includes at least one column storage sub-library, at least one record is stored by column in the column storage sub-library, each record includes an identifier and the data corresponding to the identifier, and the data in different column storage sub-libraries have different attributes; and a lookup module for looking up, in the column storage sub-library associated with the query condition, the records that satisfy the query condition.
Optionally, the query instruction includes a plurality of query conditions and an algebraic expression based on the plurality of query conditions.
Optionally, the lookup module includes: a first lookup submodule for looking up, for each query condition, the records that satisfy that query condition in the column storage sub-library corresponding to it; and a calculation submodule for performing an algebraic calculation, according to the algebraic expression, on the records obtained for each of the plurality of query conditions, to obtain an algebraic calculation result.
Optionally, the online analytical processing device also includes a conversion module, and the data in the column storage sub-library are converted from original data by the conversion module based on an index dictionary, so that the data have a preset length.
Optionally, the column storage sub-library includes a plurality of sub-regions divided according to the data range of the data.
Optionally, the lookup module includes: a comparison submodule for comparing the query condition with the data ranges to determine the sub-region that satisfies the query condition; and a second lookup submodule for looking up the data in the sub-region that satisfies the query condition.
Optionally, the column store is stored on a flash card.
Compared with the prior art, the technical solution of the embodiment of the present invention has the following beneficial effects:
After a query instruction is received, the column storage sub-library associated with the query condition included in the query instruction is determined in the preset column store, and the records that satisfy the query condition are looked up in that column storage sub-library, where the records in the column storage sub-library are stored by column. Compared with existing online analytical processing schemes, the technical solution of the embodiment of the present invention stores data with the same attribute in the same column storage sub-library. Because the preset column store includes at least one column storage sub-library, when data with two or more different attributes exist, the data with different attributes can be stored in different column storage sub-libraries. When online analytical processing is performed on massive data, only certain column storage sub-libraries need to be read, selected according to the attribute of the query condition. This greatly relieves disk I/O pressure during online analytical processing, helps avoid disk I/O bottlenecks, and reduces the failure rate. Further, each record includes an identifier and the data corresponding to that identifier, so that the user can match up records in different column storage sub-libraries by identifier. For example, the same identifier can correspond to one piece of data in each of several column storage sub-libraries, and these pieces of data can describe the same thing from different dimensions (i.e., attributes).
Further, the query instruction can include a plurality of query conditions and an algebraic expression based on the plurality of query conditions. For each query condition, the technical solution of the embodiment of the present invention can look up the matching records in the column storage sub-library corresponding to that query condition and then, according to the algebraic expression, perform an algebraic calculation on the records obtained for each query condition. This yields an accurate algebraic calculation result, so that the final query result meets the user's expectations.
Brief description of the drawings
Fig. 1 is a flow chart of an online analytical processing method according to a first embodiment of the present invention;
Fig. 2 is a flow chart of an online analytical processing method according to a second embodiment of the present invention;
Fig. 3 is a schematic diagram of an application scenario of the column storage sub-libraries in the second embodiment of the present invention;
Fig. 4 is a structural diagram of an online analytical processing device according to a third embodiment of the present invention.
Detailed description
Those skilled in the art will understand that, as stated in the background, existing online analytical processing schemes cannot effectively solve the disk I/O bottleneck at low cost and with a low failure rate when performing complex analysis operations on massive data.
Through research, the inventors found that the above problem arises because existing online analytical processing schemes still store the data to be analyzed by row.
To solve the above technical problem, the technical solution of the embodiment of the present invention, after receiving a query instruction, determines in the preset column store the column storage sub-library associated with the query condition included in the query instruction, and looks up the records that satisfy the query condition in that column storage sub-library, where the records in the column storage sub-library are stored by column. Compared with existing online analytical processing schemes, the technical solution of the embodiment of the present invention stores data with the same attribute in the same column storage sub-library. Because the preset column store includes at least one column storage sub-library, when data with two or more different attributes exist, the data with different attributes can be stored in different column storage sub-libraries. When online analytical processing is performed on massive data, only certain column storage sub-libraries need to be read, selected according to the attribute of the query condition. This greatly relieves disk I/O pressure during online analytical processing, helps avoid disk I/O bottlenecks, and reduces the failure rate. Further, each record includes an identifier and the data corresponding to that identifier, so that the user can match up records in different column storage sub-libraries by identifier. For example, the same identifier can correspond to one piece of data in each of several column storage sub-libraries, and these pieces of data can describe the same thing from different dimensions (i.e., attributes).
To make the above objects, features, and beneficial effects of the present invention more apparent and easier to understand, specific embodiments of the invention are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flow chart of an online analytical processing method according to a first embodiment of the present invention. Specifically, in this embodiment, step S101 is performed first: a query instruction is received, the query instruction including a query condition. More specifically, the query condition represents the attribute of the data on which the user wishes to perform online analytical processing. For example, if the user needs to perform online analytical processing on the location distribution, business conditions, and so on of 10,000 merchants based on the technical solution of the embodiment of the present invention, the query condition in this step may include a geographic range, so that the distribution of merchants within that geographic range among the 10,000 merchants can be obtained through analysis.
Step S102 is then performed: based on the query condition, the column storage sub-library associated with the query condition is determined in the preset column store, where the preset column store includes at least one column storage sub-library, at least one record is stored by column in the column storage sub-library, each record includes an identifier and the data corresponding to the identifier, and the data in different column storage sub-libraries have different attributes. Specifically, the attribute can be used to describe the type of the data. For example, name, age, sex, merchant address, and so on can each serve as the attribute of a column storage sub-library. The multiple records in the same column storage sub-library are stored along the column direction (that is, the vertical direction).
In a preferred example, the query condition being associated with a column storage sub-library means that the query condition matches the attribute of the data stored in that column storage sub-library. For example, when the query condition is the geographic range, the column storage sub-library whose attribute is geographic location (for example, address) can be determined to be associated with the query condition.
Finally, step S103 is performed: the records that satisfy the query condition are looked up in the column storage sub-library associated with the query condition. Specifically, the query condition can be an exact value or a range of values. For example, the query condition may be to find the records in the preset column store whose age is 30; this step then looks in the column storage sub-library whose attribute is age for the records whose data is 30 and returns them as the query result. As another example, the query condition may be to find the records in the preset column store whose age is between 20 and 40; this step then looks in the column storage sub-library whose attribute is age for the records whose data falls between 20 and 40 and returns them as the query result.
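Steps S101 to S103 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the patented implementation: the in-memory dictionary layout, the name `lookup`, and the sample data are all hypothetical, and a predicate function stands in for the query condition.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical layout: the column store maps each attribute to one column
# storage sub-library, which is a list of (identifier, data) records.
ColumnStore = Dict[str, List[Tuple[int, object]]]

column_store: ColumnStore = {
    "age": [(1, 34), (2, 52), (3, 30), (4, 19)],
    "sex": [(1, "F"), (2, "M"), (3, "F"), (4, "M")],
}


def lookup(store: ColumnStore, attribute: str,
           predicate: Callable[[object], bool]) -> List[Tuple[int, object]]:
    """Return the records of one sub-library that satisfy the condition."""
    # S102: pick the sub-library whose attribute matches the query condition.
    sub_library = store[attribute]
    # S103: scan only that sub-library; other attributes are never read.
    return [(rid, value) for rid, value in sub_library if predicate(value)]


# S101: the query condition arrives as an attribute plus a test on its data.
exact = lookup(column_store, "age", lambda v: v == 30)
in_range = lookup(column_store, "age", lambda v: 20 <= v <= 40)
```

In this sketch both queries read only the "age" sub-library, which is the property the embodiment relies on to reduce disk I/O.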
Further, the preset column store may be stored on the processing terminal that executes the technical solution of the embodiment of the present invention, on an external storage device coupled to the processing terminal, or in the cloud. Preferably, the column store may be stored on a flash card (Flash card) or on flash memory. The advantage is that, compared with existing conventional storage media, a flash card uses a PCIe slot and offers higher bandwidth, which improves disk read/write speed. Those skilled in the art will understand that the technical solution of the embodiment of the present invention implements its external interface using a protocol that supports the SQL-2003 standard, which facilitates interactive querying and lowers the barrier to use; of course, those skilled in the art may also derive further embodiments according to actual needs, which are not described here.
Further, the identifier is unique: records with the same identifier in different column storage sub-libraries all belong to the object to be analyzed that the identifier points to, but the attributes of the data stored in the different column storage sub-libraries differ. For example, in existing online analytical processing schemes all records are stored by row, one row corresponds to one object to be analyzed, and each row contains the data of that object in all dimensions. In the technical solution of the embodiment of the present invention, by contrast, because the data of an object to be analyzed in different dimensions are stored in different column storage sub-libraries, the identifier can correspond to the object to be analyzed and thereby indicate the association among the data in the different column storage sub-libraries.
Further, the data in the column storage sub-library are converted from original data based on an index dictionary, so that the data have a preset length. Those skilled in the art will understand that original data consisting of long character strings can be encoded in advance based on the index dictionary, so that the data stored in the column storage sub-library all have the same or similar length. This makes the column storage sub-library easier to manage and prevents it from occupying excessive storage space and slowing down the online analytical processing of the embodiment of the present invention. Preferably, the index dictionary can be stored in the same location as the preset column store or in a different location; those skilled in the art may derive further embodiments according to actual needs without affecting the technical content of the invention.
For example, when the column storage sub-library is built or updated, if the string length of original data to be added to the column storage sub-library exceeds the preset length, the string of the original data can be encoded so as to assign the original data a unique index identifier. The index identifier is then stored in the column storage sub-library as the data, and the correspondence between the index identifier and the original data is recorded in the index dictionary for subsequent lookup.
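The encoding step described above can be sketched as a simple dictionary encoding. This is a hedged illustration only: the class name `IndexDictionary`, the `max_length` threshold, the use of consecutive integers as index identifiers, and the sample addresses are all assumptions, and the sketch presumes the original data are strings so that an integer in the column can only be an index identifier.

```python
from typing import Dict, List, Tuple, Union


class IndexDictionary:
    """Maps over-long original strings to short index identifiers."""

    def __init__(self, max_length: int) -> None:
        self.max_length = max_length           # hypothetical preset length
        self._to_index: Dict[str, int] = {}    # original data -> index identifier
        self._to_raw: List[str] = []           # index identifier -> original data

    def encode(self, raw: str) -> Union[str, int]:
        # Strings within the preset length are stored unchanged.
        if len(raw) <= self.max_length:
            return raw
        # Longer strings get an index identifier; the correspondence is
        # recorded so the original data can be recovered later.
        if raw not in self._to_index:
            self._to_index[raw] = len(self._to_raw)
            self._to_raw.append(raw)
        return self._to_index[raw]

    def decode(self, stored: Union[str, int]) -> str:
        return self._to_raw[stored] if isinstance(stored, int) else stored


dictionary = IndexDictionary(max_length=8)
long_address = "No. 100 Example Road, Pudong District"
address_column: List[Tuple[int, Union[str, int]]] = [
    (rid, dictionary.encode(raw))
    for rid, raw in [(1, "Shanghai"), (2, long_address), (3, long_address)]
]
```

Here records 2 and 3 store the same short index identifier instead of the full address, so the stored data stay within a bounded length while `decode` restores the original string when a result is presented.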
Those skilled in the art will understand that, in the technical solution of the embodiment of the present invention, the identifier and the index identifier have different meanings: the identifier refers to the object to be analyzed, whereas the index identifier refers to the original data. In a typical application scenario, one object to be analyzed preferably corresponds to one identifier, and that identifier can correspond to data in each of multiple column storage sub-libraries; moreover, if two or more of those pieces of data were converted in advance based on the index dictionary, the one identifier can also correspond to multiple index identifiers.
In a variation of this embodiment, the query instruction includes a plurality of query conditions and an algebraic expression based on the plurality of query conditions. Further, step S103 may be replaced with: "for each query condition, looking up the records that satisfy that query condition in the column storage sub-library corresponding to it; and performing an algebraic calculation, according to the algebraic expression, on the records obtained for each of the plurality of query conditions, to obtain an algebraic calculation result".
Further, the plurality of query conditions may all have the same attribute or may have different attributes; alternatively, some of the query conditions have the same attribute while the rest have different attributes.
For example, if the user wishes to perform online analytical processing on the data of people who are between 30 and 40 years old, female, and surnamed Wang, the algebraic expression can be written as ("age: 30-40") and ("sex: female") and (("name: Wang*") or ("name: Wang**")); that is, the plurality of query conditions are combined using logical operators such as AND, OR, and NOT to obtain the algebraic expression.
As another example, when online analytical processing is performed based on the algebraic expression, this variation preferably extracts the query conditions from the algebraic expression and determines the column storage sub-library associated with each query condition, so as to look up the records that satisfy that query condition. For example, based on the query condition "age: 30-40", the column storage sub-library whose attribute is age can be determined and the records whose data falls between 30 and 40 obtained from it; based on the query condition "sex: female", the column storage sub-library whose attribute is sex can be determined and the records whose data is female obtained from it; and based on the query condition on "name", the column storage sub-library whose attribute is name can be determined and the records whose data begins with Wang obtained from it.
As a further example, for the records obtained from each column storage sub-library, this variation preferably performs the algebraic calculation, based on the algebraic expression, on the data in records that share the same identifier, to obtain the algebraic calculation result. For example, a record "age: 34" is obtained from the column storage sub-library whose attribute is age, a record "sex: female" from the column storage sub-library whose attribute is sex, and a record "name: Wang Yi" from the column storage sub-library whose attribute is name, and the three records contain the same identifier (for example, all are 1). The three records can then be determined to describe the same object to be analyzed, and the algebraic calculation can be performed on the data "34", "female", and "Wang Yi" based on the algebraic expression. The resulting algebraic calculation result indicates that, based on the query instruction, this variation found one query result: a person named Wang Yi, female, aged 34.
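The per-condition lookup followed by an algebraic combination can be sketched with set operations on identifiers. This is a minimal sketch under assumptions: the source does not specify how the algebraic calculation is carried out, so set intersection and union here stand in for the logical AND and OR; the names `matching_ids` and `describe` and all sample records are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical column store: one sub-library per attribute, keyed by identifier.
column_store: Dict[str, List[Tuple[int, object]]] = {
    "age": [(1, 34), (2, 35), (3, 52)],
    "sex": [(1, "female"), (2, "male"), (3, "female")],
    "name": [(1, "王一"), (2, "王小二"), (3, "李三")],  # illustrative names
}


def matching_ids(attribute: str, predicate: Callable[[object], bool]) -> Set[int]:
    """Identifiers of the records in one sub-library that satisfy a condition."""
    return {rid for rid, value in column_store[attribute] if predicate(value)}


# One lookup per query condition, each against its own sub-library.
age_ids = matching_ids("age", lambda v: 30 <= v <= 40)
sex_ids = matching_ids("sex", lambda v: v == "female")
wang_1 = matching_ids("name", lambda v: v.startswith("王") and len(v) == 2)
wang_2 = matching_ids("name", lambda v: v.startswith("王") and len(v) == 3)

# age AND sex AND (name pattern 1 OR name pattern 2), evaluated on identifiers.
result_ids = age_ids & sex_ids & (wang_1 | wang_2)


def describe(rid: int) -> Dict[str, object]:
    """Reassemble one object from every sub-library via its identifier."""
    return {attr: dict(records)[rid] for attr, records in column_store.items()}


result = [describe(rid) for rid in sorted(result_ids)]
```

Combining identifiers rather than whole records keeps each lookup confined to a single sub-library, and the shared identifier is what lets the final result be reassembled across attributes.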
In summary, with the scheme of the first embodiment, data with the same attribute are stored in the same column storage sub-library. Because the preset column store includes at least one column storage sub-library, when data with two or more different attributes exist, the data with different attributes can be stored in different column storage sub-libraries. Further, each record includes an identifier and the data corresponding to that identifier, so that the user can match up records in different column storage sub-libraries by identifier. For example, the same identifier can correspond to one piece of data in each of several column storage sub-libraries, and these pieces of data can describe the same thing from different dimensions (i.e., attributes).
Those skilled in the art will understand that, in existing online analytical processing schemes, the records to be processed are still stored by row; for example, one record can contain the data of one object to be analyzed in all dimensions (i.e., attributes). When the user wishes to perform online analytical processing on the data of a certain attribute across the stored historical records, all stored historical records must be traversed to obtain the data of every object to be analyzed on that attribute. In practice, such a scheme produces a very large data processing load. When the stored records reach the TB level, a disk I/O bottleneck easily occurs, which severely degrades data processing speed and hinders a quick response to user requests.
In the technical solution of the embodiment of the present invention, by contrast, the historical records are preferably stored in different column storage sub-libraries according to their attributes. When the user wishes to perform online analytical processing on the data of a certain attribute, only the column storage sub-library corresponding to that attribute needs to be traversed, which considerably reduces the workload and speeds up the response to user requests. On the other hand, if the user needs to perform online analytical processing on the data of multiple attributes of the objects to be analyzed at the same time and needs a cross-attribute view of the data of the same object, then, because the identifier of each object to be analyzed is unique, the data of that object on different attributes can be associated by the identifier. The same object to be analyzed can thus be described from different dimensions using multiple pieces of data, which meets the user's diverse needs.
Fig. 2 is a flowchart of an online analytical processing method according to the second embodiment of the present invention. Specifically, in this embodiment, step S201 is performed first: a query instruction is received, the query instruction including a query condition. For details, those skilled in the art may refer to step S101 in the embodiment shown in Fig. 1 above, which is not repeated here.
Step S202 is then performed: based on the query condition, the column storage sub-library associated with the query condition is determined in the preset column store. The preset column store includes at least one column storage sub-library; at least one record is stored by column in each column storage sub-library; each record includes an identifier and the data corresponding to that identifier; and the data in different column storage sub-libraries have different attributes. For details, those skilled in the art may refer to step S102 in the embodiment shown in Fig. 1 above, which is not repeated here. Preferably, the column storage sub-library includes multiple subregions determined by dividing the data range of the data. In a preferred example with reference to Fig. 3, the preset column store includes column storage sub-library 1 and column storage sub-library 2, where column storage sub-library 1 includes subregions 11 to 14 and column storage sub-library 2 includes subregions 21 to 24. The data range of the data stored in each subregion is shown in Fig. 3.
Next, step S203 is performed: the query condition is compared with the data ranges to determine the subregion that matches the query condition. In a preferred example with reference to Fig. 3, the query condition matches the attribute of the data stored in column storage sub-library 1, so this step preferably checks which subregion of column storage sub-library 1 has a data range that the query condition falls into. For example, if the query condition is to find the data whose value is 105, subregion 2 in column storage sub-library 1 can be determined to be the subregion that matches the query condition.
Finally, step S204 is performed: the data is looked up in the subregion that matches the query condition. In a preferred example with reference to Fig. 3, the query condition is again to find the data whose value is 105. After step S203 has determined that subregion 2 in column storage sub-library 1 matches the query condition, this step preferably searches that subregion for the records whose data is 105 and returns them as the query result of this embodiment.
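Steps S203 and S204 can be sketched as a range check followed by a scan restricted to one subregion. The following Python fragment is only an illustration under stated assumptions: the ranges and record values are invented (the actual ranges are given in Fig. 3, which is not reproduced here), and the names `Subregion`, `select_subregion`, and `lookup` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Subregion:
    low: int    # inclusive lower bound of the data range
    high: int   # inclusive upper bound of the data range
    records: Dict[str, int] = field(default_factory=dict)  # identifier -> data


# Hypothetical sub-library with three subregions; ranges are assumptions.
sub_library: List[Subregion] = [
    Subregion(0, 99, {"0.0.1": 42}),
    Subregion(100, 199, {"0.0.2": 105, "0.0.3": 150, "0.0.4": 105}),
    Subregion(200, 299, {"0.0.5": 250}),
]


def select_subregion(regions: List[Subregion], value: int) -> Optional[Subregion]:
    """Step S203: compare the query value with each data range."""
    for region in regions:
        if region.low <= value <= region.high:
            return region
    return None


def lookup(regions: List[Subregion], value: int) -> List[str]:
    """Step S204: scan only the subregion selected in step S203."""
    region = select_subregion(regions, value)
    if region is None:
        return []
    return sorted(rid for rid, data in region.records.items() if data == value)


hits = lookup(sub_library, 105)
```

Only the records of the second subregion are examined for the value 105; the other subregions are skipped on the basis of their ranges alone.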
Further, the technical solution of the embodiment of the present invention can implement online analytical processing in proxy form. That is, by using distributed storage and distributed computing, the capacity of single machine nodes is expanded in parallel, enabling management and online analytical processing of TB-level data.
Further, compared with the existing batch processing (batch, abbreviated BAT) operation mode, the technical solution of the embodiment of the present invention can implement online analytical processing through simpler array operations, and uses a parallel processing engine to make effective use of multi-core CPUs.
Further, a bytecode engine is built into the processing terminal that executes the embodiment of the present invention. It supports script implementation of the logic of the embodiment and just-in-time (JIT) compilation, thereby ensuring the scalability and high performance of that processing terminal.
In summary, with the scheme of the second embodiment, the column storage sub-library is further refined and the data structure is better optimized, so that executing this embodiment does not require traversing all of the data stored in the column storage sub-library. Disk I/O overhead is reduced as far as possible while little CPU performance is consumed, which better relieves the disk I/O bottleneck during online analytical processing and improves its speed.
Those skilled in the art will understand that, compared with the embodiment shown in Fig. 1 above, steps S203 and S204 of this embodiment may be a specific implementation of step S103 of that embodiment. After the associated column storage sub-library has been determined based on the query condition, the subregion of that sub-library closest to the query condition is determined, and the records that satisfy the query condition are finally obtained by searching that subregion.
Fig. 3 shows a typical application scenario. With reference to Fig. 2 and Fig. 3, the attribute of column storage sub-library 2 is date of birth, and the minimum and maximum values of subregions 21, 22, 23, and 24 are as shown in Fig. 3. In this scenario, a record to be stored in column storage sub-library 2 can first be compared with the data range of each subregion to determine which subregion it should be stored in. For example, if the record to be stored is 0.0.1:1993-05-06 (identifier 0.0.1, date of birth 1993-05-06), the comparison determines that the record should be stored in subregion 21 of column storage sub-library 2.
Further, in this scenario, when the query instruction is received and the query condition is to find the data of persons born in September 2010, the technical solution of the embodiment shown in Fig. 2 determines that the data satisfying the query condition needs to be looked up in subregion 24 of column storage sub-library 2.
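The date-of-birth scenario can be sketched in Python as follows. The subregion numbers 21 to 24 follow the text, but the date ranges are assumptions chosen only so that 1993-05-06 falls in subregion 21 and September 2010 falls in subregion 24; the real boundaries are those of Fig. 3. The function names `store` and `query` and the extra sample records are hypothetical.

```python
from datetime import date
from typing import Dict, List, Tuple

# Subregion number -> (minimum date, maximum date). Assumed ranges.
RANGES: Dict[int, Tuple[date, date]] = {
    21: (date(1990, 1, 1), date(1994, 12, 31)),
    22: (date(1995, 1, 1), date(1999, 12, 31)),
    23: (date(2000, 1, 1), date(2004, 12, 31)),
    24: (date(2005, 1, 1), date(2014, 12, 31)),
}
storage: Dict[int, Dict[str, date]] = {number: {} for number in RANGES}


def store(record_id: str, born: date) -> int:
    """Compare the record with each data range and store it in the matching subregion."""
    for number, (low, high) in RANGES.items():
        if low <= born <= high:
            storage[number][record_id] = born
            return number
    raise ValueError(f"{born} is outside every subregion range")


def query(start: date, end: date) -> Tuple[List[int], List[str]]:
    """Return the subregions whose range overlaps [start, end] and the matching identifiers."""
    touched = [n for n, (low, high) in RANGES.items() if low <= end and start <= high]
    ids = sorted(
        rid
        for n in touched
        for rid, born in storage[n].items()
        if start <= born <= end
    )
    return touched, ids


placed = store("0.0.1", date(1993, 5, 6))
store("0.0.2", date(2010, 9, 15))
store("0.0.3", date(2010, 10, 1))
touched, ids = query(date(2010, 9, 1), date(2010, 9, 30))
```

Under these assumed ranges the September 2010 query reads subregion 24 only, and the record born in October 2010 is filtered out inside that subregion.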
Further, when the query instruction includes multiple query conditions, steps S203 and S204 can be performed in parallel for each query condition, to obtain the data satisfying each of the query conditions.
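One way to picture the parallel handling of several query conditions is a thread pool in which each worker processes one condition. The sketch below is an assumption-laden illustration rather than the patent's engine: the data, the attribute names, and `run_condition` are hypothetical, and the range-based subregion selection is omitted so that each worker simply scans its sub-library.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical per-attribute stores: attribute -> {identifier: data}.
column_store: Dict[str, Dict[str, int]] = {
    "age": {"0.0.1": 23, "0.0.2": 35, "0.0.3": 41},
    "score": {"0.0.1": 90, "0.0.2": 70, "0.0.3": 85},
}

# A query condition is modeled as (attribute, predicate on the data).
Condition = Tuple[str, Callable[[int], bool]]


def run_condition(condition: Condition) -> List[str]:
    """Stand-in for steps S203 and S204 applied to a single query condition."""
    attribute, predicate = condition
    sub_library = column_store[attribute]
    return sorted(rid for rid, data in sub_library.items() if predicate(data))


conditions: List[Condition] = [
    ("age", lambda v: v >= 30),
    ("score", lambda v: v > 80),
]

# Each condition is evaluated by its own worker; map() keeps the input order.
with ThreadPoolExecutor(max_workers=len(conditions)) as pool:
    results = list(pool.map(run_condition, conditions))
```

Because the conditions address different sub-libraries and only read them, the workers do not need to coordinate with one another in this sketch.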
Further, if the column storage sub-library needs to be expanded while the embodiment of the present invention is being executed, and the data of a newly added record falls outside the existing data range of the column storage sub-library, the data range of an existing subregion in the column storage sub-library can be extended upward or downward to store the new record. As a variation, the new record can also be stored by adding a subregion to the column storage sub-library.
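Both expansion options (widening an existing subregion, or adding a new one) can be sketched as follows. This is a minimal Python illustration under the assumption that subregions are kept sorted and contiguous; the dictionary layout and the names `make_region` and `store_record` are hypothetical and not part of the patent.

```python
from typing import Dict, List


def make_region(low: int, high: int) -> Dict:
    """A subregion with an inclusive data range and its records (identifier -> data)."""
    return {"low": low, "high": high, "records": {}}


def store_record(regions: List[Dict], record_id: str, value: int,
                 add_region: bool = False) -> None:
    """Store a record, expanding the sub-library when the value is out of range.

    Assumes `regions` is sorted by range and has no gaps between subregions.
    """
    for region in regions:
        if region["low"] <= value <= region["high"]:
            region["records"][record_id] = value
            return
    if add_region:
        # Variation: add a new subregion that covers the new value.
        new_region = make_region(value, value)
        new_region["records"][record_id] = value
        regions.append(new_region)
        regions.sort(key=lambda r: r["low"])
        return
    if value < regions[0]["low"]:
        target = regions[0]
        target["low"] = value       # extend the lowest range downward
    elif value > regions[-1]["high"]:
        target = regions[-1]
        target["high"] = value      # extend the highest range upward
    else:
        raise ValueError("value falls in a gap between subregions")
    target["records"][record_id] = value


extended = [make_region(0, 99), make_region(100, 199)]
store_record(extended, "0.0.9", 250)

appended = [make_region(0, 99), make_region(100, 199)]
store_record(appended, "0.0.9", 250, add_region=True)
```

Extending a range keeps the number of subregions fixed but lets one of them grow, whereas adding a subregion keeps the existing ranges unchanged; which is preferable depends on how evenly the data is distributed.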
It can be understood that the specific implementation of subregion 11, subregion 12, subregion 13, subregion 14, and subregion 22 can follow the description above and is not repeated here.
Fig. 4 is a schematic structural diagram of an online analytical processing device according to the third embodiment of the present invention. Those skilled in the art will understand that the online analytical processing device 4 of this embodiment is used to implement the method technical solutions of the embodiments shown in Fig. 1 to Fig. 3 above. Specifically, in this embodiment, the online analytical processing device 4 includes: a receiving module 42, configured to receive a query instruction, the query instruction including a query condition; a determining module 43, configured to determine, based on the query condition, the column storage sub-library associated with the query condition in a preset column store, where the preset column store includes at least one column storage sub-library, at least one record is stored by column in each column storage sub-library, each record includes an identifier and the data corresponding to that identifier, and the data in different column storage sub-libraries have different attributes; and a lookup module 44, configured to search the column storage sub-library associated with the query condition for the records that satisfy the query condition.
In a preferred application scenario, the query instruction includes multiple query conditions and an algebraic expression based on those query conditions. Preferably, the lookup module 44 includes: a first lookup submodule 441, configured to search, for each query condition, the column storage sub-library corresponding to that query condition for the records that satisfy it; and a calculation submodule 442, configured to perform, according to the algebraic expression, an algebraic operation on the records obtained for each of the query conditions, to obtain an algebraic operation result.
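The text does not spell out the form of the algebraic expression. One plausible reading, used here purely as an assumption, is set algebra over the identifiers returned for each query condition. The Python sketch below follows that reading; `first_lookup` and `calculate` loosely mirror submodules 441 and 442, and all data and names are hypothetical.

```python
from typing import Callable, Dict, List, Set

# Hypothetical per-attribute stores: attribute -> {identifier: data}.
column_store: Dict[str, Dict[str, object]] = {
    "age": {"0.0.1": 23, "0.0.2": 35, "0.0.3": 41},
    "city": {"0.0.1": "Shanghai", "0.0.2": "Shanghai", "0.0.3": "Beijing"},
}


def first_lookup(attribute: str, predicate: Callable[[object], bool]) -> Set[str]:
    """Per-condition lookup: identifiers whose data satisfies one query condition."""
    return {rid for rid, data in column_store[attribute].items() if predicate(data)}


def calculate(results: List[Set[str]], expression: Callable[..., Set[str]]) -> Set[str]:
    """Apply the expression (assumed to be set algebra) to the per-condition results."""
    return expression(*results)


per_condition = [
    first_lookup("age", lambda v: v >= 30),
    first_lookup("city", lambda v: v == "Shanghai"),
]
both = calculate(per_condition, lambda a, b: a & b)     # condition 1 AND condition 2
either = calculate(per_condition, lambda a, b: a | b)   # condition 1 OR condition 2
```

Keeping the per-condition lookups separate from the combining step means each lookup reads a single sub-library, and only the small identifier sets are combined afterwards.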
Further, the online analytical processing device 4 also includes a conversion module 41. The data in the column storage sub-library is obtained by the conversion module converting the original data based on an index dictionary, so that the data has a preset length.
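Converting original data to a preset length through an index dictionary resembles what is commonly called dictionary encoding. The following Python sketch illustrates that general technique under stated assumptions: the 4-byte width, the class name `IndexDictionary`, and the sample values are hypothetical, and the patent does not specify how its index dictionary is built.

```python
from typing import Dict, List


class IndexDictionary:
    """Maps each distinct original value to a fixed-length code (assumed 4 bytes)."""

    def __init__(self, width: int = 4) -> None:
        self.width = width
        self.codes: Dict[str, int] = {}   # original value -> integer code
        self.values: List[str] = []       # integer code -> original value

    def encode(self, raw: str) -> bytes:
        """Return the fixed-length code for `raw`, assigning a new code if needed."""
        code = self.codes.get(raw)
        if code is None:
            code = len(self.values)
            self.codes[raw] = code
            self.values.append(raw)
        return code.to_bytes(self.width, "big")

    def decode(self, stored: bytes) -> str:
        """Recover the original value from its fixed-length code."""
        return self.values[int.from_bytes(stored, "big")]


dictionary = IndexDictionary()
column = [dictionary.encode(city) for city in ["Shanghai", "Beijing", "Shanghai"]]
```

Every stored entry then occupies the same number of bytes regardless of the length of the original value, which makes positions within a sub-library easy to compute.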
In another preferred application scenario, the column storage sub-library includes multiple subregions determined by dividing the data range of the data. Preferably, the lookup module 44 includes: a comparison submodule 443, configured to compare the query condition with the data ranges to determine the subregion that matches the query condition; and a second lookup submodule 444, configured to look up the data in the subregion that matches the query condition.
Preferably, the column store is stored on a flash memory card.
In a variation of this embodiment, when the query instruction includes multiple query conditions and an algebraic expression based on those query conditions, the comparison submodule 443 can be called for each query condition to determine the subregion that matches it, and the second lookup submodule 444 can then be called to look up the data in that subregion. Finally, according to the algebraic expression, an algebraic operation is performed on the records obtained for each of the query conditions, to obtain an algebraic operation result.
For more on the working principle and operation of the online analytical processing device 4, refer to the related descriptions of Fig. 1 to Fig. 3, which are not repeated here.
Those of ordinary skill in the art will understand that all or some of the steps in the methods of the above embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, and the storage medium may include ROM, RAM, a magnetic disk, an optical disc, and the like.
Although the present invention is disclosed as above, it is not limited thereto. Any person skilled in the art can make various changes or modifications without departing from the spirit and scope of the present invention, and the protection scope of the present invention shall therefore be as defined by the claims.

Claims (14)

1. An online analytical processing method, characterized by comprising the following steps:
receiving a query instruction, the query instruction including a query condition;
determining, based on the query condition, a column storage sub-library associated with the query condition in a preset column store, wherein the preset column store includes at least one column storage sub-library, at least one record is stored by column in the column storage sub-library, each record includes an identifier and data corresponding to the identifier, and the data in different column storage sub-libraries have different attributes;
searching the column storage sub-library associated with the query condition to obtain a record that satisfies the query condition.
2. The online analytical processing method according to claim 1, characterized in that the query instruction includes multiple query conditions and an algebraic expression based on the multiple query conditions.
3. The online analytical processing method according to claim 2, characterized in that searching the column storage sub-library associated with the query condition to obtain a record that satisfies the query condition comprises:
for each query condition, searching the column storage sub-library corresponding to the query condition to obtain a record that satisfies the query condition;
performing, according to the algebraic expression, an algebraic operation on the records obtained for each of the multiple query conditions, to obtain an algebraic operation result.
4. The online analytical processing method according to claim 1, characterized in that the data in the column storage sub-library is converted from original data based on an index dictionary, so that the data has a preset length.
5. The online analytical processing method according to claim 1, characterized in that the column storage sub-library includes multiple subregions determined by dividing the data range of the data.
6. The online analytical processing method according to claim 5, characterized in that searching the column storage sub-library associated with the query condition to obtain a record that satisfies the query condition comprises:
comparing the query condition with the data ranges to determine the subregion that matches the query condition;
looking up the data in the subregion that matches the query condition.
7. The online analytical processing method according to any one of claims 1 to 6, characterized in that the column store is stored on a flash memory card.
8. An online analytical processing device, characterized by comprising:
a receiving module, configured to receive a query instruction, the query instruction including a query condition;
a determining module, configured to determine, based on the query condition, a column storage sub-library associated with the query condition in a preset column store, wherein the preset column store includes at least one column storage sub-library, at least one record is stored by column in the column storage sub-library, each record includes an identifier and data corresponding to the identifier, and the data in different column storage sub-libraries have different attributes;
a lookup module, configured to search the column storage sub-library associated with the query condition to obtain a record that satisfies the query condition.
9. The online analytical processing device according to claim 8, characterized in that the query instruction includes multiple query conditions and an algebraic expression based on the multiple query conditions.
10. The online analytical processing device according to claim 9, characterized in that the lookup module comprises:
a first lookup submodule, configured to search, for each query condition, the column storage sub-library corresponding to the query condition to obtain a record that satisfies the query condition;
a calculation submodule, configured to perform, according to the algebraic expression, an algebraic operation on the records obtained for each of the multiple query conditions, to obtain an algebraic operation result.
11. The online analytical processing device according to claim 8, characterized by further comprising a conversion module, wherein the data in the column storage sub-library is converted from original data by the conversion module based on an index dictionary, so that the data has a preset length.
12. The online analytical processing device according to claim 8, characterized in that the column storage sub-library includes multiple subregions determined by dividing the data range of the data.
13. The online analytical processing device according to claim 12, characterized in that the lookup module comprises:
a comparison submodule, configured to compare the query condition with the data ranges to determine the subregion that matches the query condition;
a second lookup submodule, configured to look up the data in the subregion that matches the query condition.
14. The online analytical processing device according to any one of claims 8 to 13, characterized in that the column store is stored on a flash memory card.
CN201611259329.5A 2016-12-30 2016-12-30 Online analysis processing method and device Active CN106844541B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611259329.5A CN106844541B (en) 2016-12-30 2016-12-30 Online analysis processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611259329.5A CN106844541B (en) 2016-12-30 2016-12-30 Online analysis processing method and device

Publications (2)

Publication Number Publication Date
CN106844541A true CN106844541A (en) 2017-06-13
CN106844541B CN106844541B (en) 2020-05-29

Family

ID=59115008

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611259329.5A Active CN106844541B (en) 2016-12-30 2016-12-30 Online analysis processing method and device

Country Status (1)

Country Link
CN (1) CN106844541B (en)

Also Published As

Publication number Publication date
CN106844541B (en) 2020-05-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant