CN106844541A - Online analytical processing method and device - Google Patents
Online analytical processing method and device
- Publication number: CN106844541A
- Application number: CN201611259329.5A
- Authority: CN (China)
- Prior art keywords: to be queried, condition, column, data, sub-library
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24553—Query execution of query operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
An online analytical processing method and device. The method comprises the following steps: receiving a query instruction, the query instruction including a query condition; determining, based on the query condition, a column storage sub-library associated with the query condition in a preset column store, where the preset column store includes at least one column storage sub-library, at least one record is stored by column in each column storage sub-library, each record includes an identifier and the data corresponding to the identifier, and the data in different column storage sub-libraries have different attributes; and looking up, in the column storage sub-library associated with the query condition, the records that satisfy the query condition. The technical solution provided by the invention can better address the disk I/O bottleneck caused by large data volumes, effectively improves the processing speed and efficiency of online analytical processing, and favors large-scale promotion and application of the online analytical processing model.
Description
Technical field
The present invention relates to the field of big data applications, and more particularly to an online analytical processing method and device.
Background
At present, in the data warehouse field, online analytical processing (On-Line Analytical Processing, OLAP for short) is relied on to perform complex analysis operations on massive data, thereby providing data support for the decisions of decision makers and senior managers. When performing online analytical processing, a computer can, according to the analysts' requirements, quickly and flexibly carry out complex query processing over large data volumes and present the query results to decision makers in an intuitive and easily understood form, so that users can accurately grasp the operating state of the enterprise, understand the needs of the target objects, and then formulate correct plans.
For professional analysts and management decision makers in an enterprise, analyzing business operating data usually requires examining the business's metrics from different angles. For example, when analyzing sales data, a user may weigh many factors together, such as time period, product category, distribution channel, geographical distribution, and customer group. Although these analysis angles can be reflected through reports, each analysis angle generates a report, and each combination of analysis angles generates yet another report. This undoubtedly increases the workload of the report makers, who often struggle to keep pace with the thinking of the management decision makers.
To meet users' diverse needs, schemes that process massive data based on online analytical processing emerged. When performing online analytical processing, the computer can directly mimic the user's multi-angle way of thinking and build a multidimensional data model for the user in advance, where a "dimension" is an analysis angle configured by the user. Taking the analysis of sales data again as an example, time period, product category, distribution channel, geographical distribution, and customer group can each serve as one dimension. Once the multidimensional data model has been built, the user can quickly obtain data from each analysis angle and can also switch dynamically and flexibly between angles or perform a combined multi-angle analysis. In general, online analytical processing differs fundamentally from older management information systems in both design philosophy and actual implementation.
However, when an existing online analytical processing scheme is applied in practice and the stored data volume reaches a certain level (for example, the TB level), a disk input/output (I/O) bottleneck is very likely to occur. One existing solution is to increase the number of disks, for example arranging 12, 24, or even more disks to share the I/O load; another is to add memory so that as much data as possible can be stored or cached in memory. However, both approaches sharply increase cost, and the first may also cause a higher fault rate, which hinders large-scale promotion and application of the online analytical processing model.
Summary of the invention
The technical problem solved by the present invention is that existing online analytical processing schemes are prone to disk I/O bottlenecks and high fault rates when processing massive data.
To solve the above technical problem, an embodiment of the present invention provides an online analytical processing method comprising the following steps: receiving a query instruction, the query instruction including a query condition; determining, based on the query condition, a column storage sub-library associated with the query condition in a preset column store, where the preset column store includes at least one column storage sub-library, at least one record is stored by column in each column storage sub-library, each record includes an identifier and the data corresponding to the identifier, and the data in different column storage sub-libraries have different attributes; and looking up, in the column storage sub-library associated with the query condition, the records that satisfy the query condition.
Optionally, the query instruction includes multiple query conditions and an algebraic expression based on the multiple query conditions.
Optionally, looking up the records that satisfy the query condition in the column storage sub-library associated with the query condition includes: for each query condition, looking up the records that satisfy that query condition in the column storage sub-library corresponding to it; and, according to the algebraic expression, performing an algebraic calculation on the records obtained for each of the multiple query conditions to obtain an algebraic calculation result.
Optionally, the data in the column storage sub-library are converted from raw data based on an index dictionary, so that the data have a preset length.
Optionally, the column storage sub-library includes multiple subregions divided according to the data range of the data.
Optionally, looking up the records that satisfy the query condition in the column storage sub-library associated with the query condition includes: comparing the query condition with the data ranges to determine the subregion that matches the query condition; and looking up the data in the subregion that matches the query condition.
Optionally, the column store is stored on a flash card.
An embodiment of the present invention also provides an online analytical processing device, including: a receiving module for receiving a query instruction, the query instruction including a query condition; a determining module for determining, based on the query condition, a column storage sub-library associated with the query condition in a preset column store, where the preset column store includes at least one column storage sub-library, at least one record is stored by column in each column storage sub-library, each record includes an identifier and the data corresponding to the identifier, and the data in different column storage sub-libraries have different attributes; and a lookup module for looking up, in the column storage sub-library associated with the query condition, the records that satisfy the query condition.
Optionally, the query instruction includes multiple query conditions and an algebraic expression based on the multiple query conditions.
Optionally, the lookup module includes: a first lookup submodule for looking up, for each query condition, the records that satisfy that query condition in the column storage sub-library corresponding to it; and a calculation submodule for performing, according to the algebraic expression, an algebraic calculation on the records obtained for each of the multiple query conditions to obtain an algebraic calculation result.
Optionally, the online analytical processing device also includes a conversion module, and the data in the column storage sub-library are converted from raw data by the conversion module based on an index dictionary, so that the data have a preset length.
Optionally, the column storage sub-library includes multiple subregions divided according to the data range of the data.
Optionally, the lookup module includes: a comparison submodule for comparing the query condition with the data ranges to determine the subregion that matches the query condition; and a second lookup submodule for looking up the data in the subregion that matches the query condition.
Optionally, the column store is stored on a flash card.
Compared with the prior art, the technical solution of the embodiments of the present invention has the following beneficial effects:
After a query instruction is received, a column storage sub-library associated with the query condition included in the query instruction is determined in the preset column store, and the records that satisfy the query condition are looked up in that sub-library, where the records in the column storage sub-library are stored by column. Compared with existing online analytical processing schemes, the technical solution of the embodiments stores data with the same attribute in the same column storage sub-library. Because the preset column store includes at least one column storage sub-library, when data with two or more different attributes exist, the data with different attributes can be stored in different column storage sub-libraries. When online analytical processing is performed on massive data, only the particular column storage sub-libraries that match the attribute of the query condition need to be read from the stored data, which greatly relieves disk I/O pressure during online analytical processing, helps avoid disk I/O bottlenecks, and reduces the fault rate. Further, each record includes an identifier and the data corresponding to the identifier, so that the user can match up records in different column storage sub-libraries by identifier. For example, the same identifier can correspond to multiple data items across multiple column storage sub-libraries, and these data items can describe the same thing from different dimensions (that is, attributes).
Further, the query instruction can include multiple query conditions and an algebraic expression based on them. For each query condition, the technical solution of the embodiments can look up the matching records in the column storage sub-library corresponding to that condition and, according to the algebraic expression, perform an algebraic calculation on the records obtained for each condition, yielding an accurate algebraic calculation result so that the final query result meets the user's expectations.
Brief description of the drawings
Fig. 1 is a flow chart of an online analytical processing method according to the first embodiment of the present invention;
Fig. 2 is a flow chart of an online analytical processing method according to the second embodiment of the present invention;
Fig. 3 is a schematic diagram of an application scenario of the column storage sub-libraries in the second embodiment of the present invention;
Fig. 4 is a structural diagram of an online analytical processing device according to the third embodiment of the present invention.
Detailed description
Those skilled in the art will understand that, as stated in the background, existing online analytical processing schemes cannot effectively solve the disk I/O bottleneck at low cost and low fault rate when performing complex analysis operations on massive data.
Through research, the inventors found that the above problem arises because existing online analytical processing schemes still store the data to be analyzed by row.
To solve the above technical problem, the technical solution of the embodiments of the present invention, after receiving a query instruction, determines in the preset column store a column storage sub-library associated with the query condition included in the query instruction, and looks up the records that satisfy the query condition in that sub-library, where the records in the column storage sub-library are stored by column. Compared with existing online analytical processing schemes, the technical solution of the embodiments stores data with the same attribute in the same column storage sub-library. Because the preset column store includes at least one column storage sub-library, when data with two or more different attributes exist, the data with different attributes can be stored in different column storage sub-libraries. When online analytical processing is performed on massive data, only the particular column storage sub-libraries that match the attribute of the query condition need to be read from the stored data, which greatly relieves disk I/O pressure during online analytical processing, helps avoid disk I/O bottlenecks, and reduces the fault rate. Further, each record includes an identifier and the data corresponding to the identifier, so that the user can match up records in different column storage sub-libraries by identifier. For example, the same identifier can correspond to multiple data items across multiple column storage sub-libraries, and these data items can describe the same thing from different dimensions (that is, attributes).
To make the above objects, features, and beneficial effects of the present invention easier to understand, specific embodiments of the invention are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flow chart of an online analytical processing method according to the first embodiment of the present invention. Specifically, in this embodiment, step S101 is performed first: a query instruction is received, the query instruction including a query condition. More specifically, the query condition represents the attribute of the data on which the user wishes to perform online analytical processing. For example, if the user needs to analyze the geographical distribution, business operations, and so on of 10,000 merchants based on the technical solution of the embodiments, the query condition in this step can include a geographic range, so that the distribution of those of the 10,000 merchants located within that geographic range can be obtained.
Step S102 is then performed: based on the query condition, a column storage sub-library associated with the query condition is determined in the preset column store. The preset column store includes at least one column storage sub-library, at least one record is stored by column in each column storage sub-library, each record includes an identifier and the data corresponding to the identifier, and the data in different column storage sub-libraries have different attributes. Specifically, an attribute describes the type of the data. For example, name, age, sex, and merchant address can each serve as the attribute of a column storage sub-library. The multiple records in one column storage sub-library are stored along the column direction (that is, vertically).
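The layout described in step S102 can be illustrated with a minimal Python sketch. It assumes an in-memory dictionary standing in for the preset column store; the function name build_column_store, the field name "id", and the sample people are hypothetical and do not come from the patent.

```python
from collections import defaultdict

def build_column_store(rows):
    """Split row-oriented records into one sub-library per attribute.

    rows: iterable of dicts, each holding an "id" plus attribute values.
    Returns {attribute: [(identifier, data), ...]}.
    """
    store = defaultdict(list)
    for row in rows:
        ident = row["id"]
        for attribute, value in row.items():
            if attribute != "id":
                # Each record keeps the identifier next to its data.
                store[attribute].append((ident, value))
    return dict(store)

# Hypothetical sample data for illustration only.
rows = [
    {"id": 1, "name": "Wang Yi", "age": 34, "sex": "female"},
    {"id": 2, "name": "Li Er", "age": 52, "sex": "male"},
]
column_store = build_column_store(rows)
```

Each key of column_store plays the role of one column storage sub-library, and the identifier kept in every record is what later allows records from different sub-libraries to be matched up.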
In a preferred example, a query condition being associated with a column storage sub-library means that the query condition matches the attribute of the data stored in that sub-library. For example, when the query condition is the geographic range, the column storage sub-library whose attribute is geographical location (for example, address) can be determined to be associated with the query condition.
Finally, step S103 is performed: the records that satisfy the query condition are looked up in the column storage sub-library associated with the query condition. Specifically, the query condition can be an exact value or a range of values. For example, the query condition can be to find the records in the preset column store whose age is 30; this step then goes to the column storage sub-library whose attribute is age and returns the records whose data is 30 as the query result. As another example, the query condition can be to find the records in the preset column store whose age is between 20 and 40; this step then goes to the column storage sub-library whose attribute is age and returns the records whose data falls between 20 and 40 as the query result.
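A minimal sketch of steps S102 and S103 follows, again assuming an in-memory dictionary as the column store. The function find_records and its parameters are hypothetical names chosen for illustration; the sketch treats a range condition as inclusive at both ends, which the text does not specify.

```python
def find_records(column_store, attribute, low, high=None):
    """Return records whose data equals `low`, or lies in [low, high]."""
    if high is None:
        high = low  # exact-value condition
    # S102: pick the sub-library whose attribute matches the condition.
    sub_library = column_store[attribute]
    # S103: scan only that sub-library; other attributes are never read.
    return [(ident, value) for ident, value in sub_library
            if low <= value <= high]

# Hypothetical sample data for illustration only.
column_store = {
    "age": [(1, 34), (2, 52), (3, 30)],
    "sex": [(1, "female"), (2, "male"), (3, "female")],
}
exact = find_records(column_store, "age", 30)         # age is 30
in_range = find_records(column_store, "age", 20, 40)  # age from 20 to 40
```

The "sex" sub-library is present in the store but is not touched by either query, which is the I/O saving the embodiment describes.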
Further, the preset column store can be stored on the processing terminal that executes the technical solution of the embodiments, on an external storage device coupled to the processing terminal, or in the cloud. Preferably, the column store is stored on a flash card or on other flash memory. The advantage is that, compared with conventional storage media, a flash card uses a PCIe slot and offers higher bandwidth, which improves disk read and write speed. Those skilled in the art will understand that the technical solution of the embodiments implements its external interface with a protocol that supports the SQL-2003 standard, which facilitates interactive queries and lowers the barrier to use. Of course, those skilled in the art can derive further embodiments according to actual needs, which are not described here.
Further, the identifier is unique. Records with the same identifier in different column storage sub-libraries all belong to the object to be analyzed that the identifier points to, but the data attributes stored in the different sub-libraries differ. For example, in existing online analytical processing schemes, all records are stored by row, one row corresponds to one object to be analyzed, and each row contains that object's data in all dimensions. In the technical solution of the embodiments, because an object's data in different dimensions are stored in different column storage sub-libraries, the identifier corresponds to the object to be analyzed and expresses the association among its data across the sub-libraries.
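The role of the identifier can be sketched as follows, assuming the same in-memory dictionary layout. The function assemble and the sample data are hypothetical.

```python
def assemble(column_store, ident):
    """Collect the data stored under `ident` in every sub-library."""
    view = {}
    for attribute, sub_library in column_store.items():
        for record_id, value in sub_library:
            if record_id == ident:
                view[attribute] = value
                break  # identifiers are assumed unique within a sub-library
    return view

# Hypothetical sample data for illustration only.
column_store = {
    "name": [(1, "Wang Yi"), (2, "Li Er")],
    "age": [(1, 34), (2, 52)],
    "address": [(2, "Example Road")],  # object 1 has no address record
}
merchant = assemble(column_store, 1)
```

Here the identifier 1 ties together one record from the name sub-library and one from the age sub-library, so the object is described from two dimensions even though the data live in separate sub-libraries.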
Further, the data in the column storage sub-library are converted from raw data based on an index dictionary, so that the data have a preset length. Those skilled in the art will understand that raw data consisting of a long character string can be code-converted in advance based on the index dictionary, so that all data stored in the column storage sub-library have the same or similar length. This simplifies management of the sub-library and prevents it from occupying excessive storage space and slowing down the online analytical processing of the embodiments. The index dictionary can be stored in the same location as the preset column store or in a different location; those skilled in the art can derive further embodiments according to actual needs without affecting the technical content of the invention.
For example, when building or updating a column storage sub-library, if the string length of raw data to be added to the sub-library exceeds the prescribed length, the string can be encoded so that the raw data is assigned a unique index identifier. The index identifier is stored in the column storage sub-library as the data, and the correspondence between the index identifier and the raw data is recorded in the index dictionary for later lookup.
Those skilled in the art will understand that, in the technical solution of the embodiments, the identifier and the index identifier have different meanings: the identifier refers to the object to be analyzed, whereas the index identifier refers to the raw data. In a typical application scenario, one object to be analyzed preferably corresponds to one identifier, and one identifier can correspond to data in multiple column storage sub-libraries. If two or more of the data items in those sub-libraries were converted in advance based on the index dictionary, one identifier can also correspond to multiple index identifiers.
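The index dictionary can be sketched as a simple dictionary encoder. This is an illustration under stated assumptions, not the patent's encoding: the class name IndexDictionary, the choice of integers as index identifiers, and the sample address are all hypothetical.

```python
class IndexDictionary:
    """Maps over-long raw strings to unique integer index identifiers."""

    def __init__(self, max_length):
        self.max_length = max_length
        self._index_of = {}  # raw string -> index identifier
        self._raw_of = []    # index identifier -> raw string

    def encode(self, raw):
        """Return the value to store in the column storage sub-library."""
        if len(raw) <= self.max_length:
            return raw  # short enough, stored unchanged
        if raw not in self._index_of:
            self._index_of[raw] = len(self._raw_of)
            self._raw_of.append(raw)
        return self._index_of[raw]

    def decode(self, stored):
        """Recover the raw data from a stored value."""
        return self._raw_of[stored] if isinstance(stored, int) else stored

dictionary = IndexDictionary(max_length=8)
address = "No. 1 Example Road, Example District"  # hypothetical raw data
stored = dictionary.encode(address)
```

Encoding the same long string twice yields the same index identifier, so the sub-library holds a short fixed-size value while the dictionary keeps the correspondence needed for later lookup.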
In a variation of this embodiment, the query instruction includes multiple query conditions and an algebraic expression based on them. Step S103 can then be replaced by: "for each query condition, look up the records that satisfy that query condition in the column storage sub-library corresponding to it; according to the algebraic expression, perform an algebraic calculation on the records obtained for each of the multiple query conditions to obtain an algebraic calculation result".
Further, the multiple query conditions can all have the same attribute or can have different attributes; alternatively, some of them can share an attribute while the rest have different attributes.
For example, if the user wishes to perform online analytical processing on the data of people aged between 30 and 40, female, and surnamed Wang, the algebraic expression can be written as ("age: 30-40") and ("sex: female") and (("name: Wang*") or ("name: Wang**")). That is, the multiple query conditions are combined with logical operators such as AND, OR, and NOT to obtain the algebraic expression.
As another example, when online analytical processing is performed based on the algebraic expression, this variation preferably extracts the multiple query conditions from the expression and determines the column storage sub-library associated with each one, so as to look up the records that satisfy each condition. For example, the query condition "age: 30-40" leads to the sub-library whose attribute is age, from which the records whose data falls between 30 and 40 are obtained; the query condition "sex: female" leads to the sub-library whose attribute is sex, from which the records whose data is female are obtained; and the query condition on "name" leads to the sub-library whose attribute is name, from which the records whose data begins with Wang are obtained.
As another example, for the records obtained from each column storage sub-library, this variation preferably performs the algebraic calculation, based on the algebraic expression, on the data in records that share the same identifier, to obtain the algebraic calculation result. For example, suppose the record "age: 34" is obtained from the sub-library whose attribute is age, the record "sex: female" from the sub-library whose attribute is sex, and the record "name: Wang Yi" from the sub-library whose attribute is name, and all three records carry the same identifier (for example, 1). The three records then describe the same object to be analyzed, so the algebraic calculation can be performed on the data "34", "female", and "Wang Yi" based on the algebraic expression. The result indicates that, for this query instruction, this variation found one person named Wang Yi, female, aged 34.
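One way to picture this variation is to evaluate each query condition against its own sub-library, keep the identifiers that match, and then combine the identifier sets with set operations that mirror the logical operators. The sketch below assumes that approach; the patent does not prescribe it. The helper matching_ids and the sample people are hypothetical, and the two name patterns from the example are simplified to a single "starts with Wang" test.

```python
def matching_ids(column_store, attribute, predicate):
    """Identifiers of the records in one sub-library that satisfy a condition."""
    return {ident for ident, value in column_store[attribute]
            if predicate(value)}

# Hypothetical sample data for illustration only.
column_store = {
    "age": [(1, 34), (2, 52), (3, 38)],
    "sex": [(1, "female"), (2, "female"), (3, "male")],
    "name": [(1, "Wang Yi"), (2, "Wang Er"), (3, "Li San")],
}

# One lookup per condition, each touching a single sub-library.
age_ids = matching_ids(column_store, "age", lambda v: 30 <= v <= 40)
sex_ids = matching_ids(column_store, "sex", lambda v: v == "female")
name_ids = matching_ids(column_store, "name", lambda v: v.startswith("Wang"))

# "and" maps to set intersection; "or" would map to union (|).
result_ids = age_ids & sex_ids & name_ids

# Records sharing an identifier are joined into one result row.
result = [
    {attribute: dict(sub_library)[ident]
     for attribute, sub_library in column_store.items()}
    for ident in sorted(result_ids)
]
```

Only identifier 1 satisfies all three conditions, so the result contains a single row for Wang Yi, female, aged 34, matching the worked example in the text.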
In summary, with the scheme of the first embodiment, data with the same attribute are stored in the same column storage sub-library. Because the preset column store includes at least one column storage sub-library, when data with two or more different attributes exist, the data with different attributes can be stored in different column storage sub-libraries. Further, each record includes an identifier and the data corresponding to the identifier, so that the user can match up records in different column storage sub-libraries by identifier. For example, the same identifier can correspond to multiple data items across multiple column storage sub-libraries, and these data items can describe the same thing from different dimensions (that is, attributes).
Those skilled in the art will understand that, in existing online analytical processing schemes, the records to be processed are still stored by row; for example, one record contains an object's data in all dimensions (that is, attributes). When the user wishes to perform online analytical processing on the data of one attribute, all stored records must be traversed to obtain every object's data on that attribute. In practice this produces a very large processing load; when the stored records reach the TB level, it easily triggers a disk I/O bottleneck, seriously slows data processing, and hinders quick response to user requests.
In the technical solution of the embodiments, by contrast, records are preferably stored in different column storage sub-libraries according to their attributes. When the user wishes to perform online analytical processing on the data of one attribute, only the column storage sub-library corresponding to that attribute needs to be traversed, which greatly reduces the workload and speeds up the response to user requests. On the other hand, if the user needs to perform online analytical processing on the data of several attributes of the objects at once and needs a horizontal view of the same object's data across those attributes, then, because each object's identifier is unique, the object's data on the different attributes can be associated by identifier. The same object to be analyzed is thus described from different dimensions by multiple data items, meeting users' diverse needs.
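The difference in workload can be made concrete with a toy count of how many stored values each layout touches for a single-attribute query. This is a deliberately simplified model that ignores disk block sizes, caching, and compression; the attribute names and the figure of 1,000 objects are hypothetical.

```python
def values_read_row_store(rows):
    """Row storage: every full record is read to reach a single attribute."""
    return sum(len(row) for row in rows)

def values_read_column_store(column_store, attribute):
    """Column storage: only the sub-library for `attribute` is read.

    Each record there holds two values: the identifier and the data.
    """
    return 2 * len(column_store[attribute])

# Hypothetical data set: 1,000 objects with five attributes each.
attributes = ["name", "age", "sex", "address", "category"]
rows = [{"id": i, **{a: 0 for a in attributes}} for i in range(1000)]
column_store = {a: [(row["id"], row[a]) for row in rows] for a in attributes}

row_cost = values_read_row_store(rows)                       # 1000 * 6
column_cost = values_read_column_store(column_store, "age")  # 1000 * 2
```

Under this model the row layout touches 6,000 values and the column layout 2,000 for the same query, and the gap widens as the number of attributes per object grows.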
Fig. 2 is a kind of flow chart of on-line analytical processing method of the second embodiment of the present invention.Specifically, in this implementation
In example, step S201 is first carried out, receives query statement, the query statement includes condition to be checked.More specifically, ability
Field technique personnel may be referred to step S101 described in above-mentioned embodiment illustrated in fig. 1, will not be described here.
Step S202 is then performed: based on the query condition, the column storage sub-library associated with the query condition is determined in a preset column store. The preset column store includes at least one column storage sub-library; at least one record is stored by column in each column storage sub-library; each record includes an identifier and the data corresponding to that identifier; and the data in different column storage sub-libraries have different attributes. For details, those skilled in the art may refer to step S102 in the embodiment shown in Fig. 1 above, which is not repeated here. Preferably, the column storage sub-library includes multiple subregions determined by dividing the data range of the data. In a preferred example described with reference to Fig. 3, the preset column store includes column storage sub-library 1 and column storage sub-library 2, where column storage sub-library 1 includes subregions 11 to 14, column storage sub-library 2 includes subregions 21 to 24, and the data range of the data stored in each subregion is as shown in Fig. 3.
Step S203 is performed next: the query condition is compared with the data ranges to determine the subregion that satisfies the query condition. In a preferred example described with reference to Fig. 3, the query condition matches the attribute of the data stored in column storage sub-library 1, so this step preferably determines which subregion of column storage sub-library 1 has a data range that the query condition falls within. For example, if the query condition is to find the data whose value is 105, subregion 2 of column storage sub-library 1 can be determined to be the subregion that satisfies the query condition.
Finally, step S204 is performed: the data are looked up in the subregion that satisfies the query condition. In a preferred example described with reference to Fig. 3, the query condition is again to find the data whose value is 105. After step S203 has determined that subregion 2 of column storage sub-library 1 satisfies the query condition, this step preferably searches subregion 2 for the records whose data equal 105, and these records form the query result of this embodiment.
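Steps S203 and S204 can be sketched as a range comparison followed by a scan of the surviving subregions only. The `Subregion` class, its field names, and the numeric ranges below are assumptions for illustration; the actual ranges are those drawn in Fig. 3.

```python
from dataclasses import dataclass, field

@dataclass
class Subregion:
    low: int    # smallest value this subregion may hold (inclusive)
    high: int   # largest value this subregion may hold (inclusive)
    records: dict = field(default_factory=dict)  # identifier -> value

def find_subregions(subregions, target):
    """S203: keep only the subregions whose data range contains the target value."""
    return [s for s in subregions if s.low <= target <= s.high]

def lookup(subregions, target):
    """S204: scan only the matching subregions for records equal to the target."""
    hits = {}
    for sub in find_subregions(subregions, target):
        hits.update({rid: v for rid, v in sub.records.items() if v == target})
    return hits

# Assumed ranges and contents, not taken from Fig. 3.
sub_library = [
    Subregion(0, 99, {"0.0.1": 42}),
    Subregion(100, 199, {"0.0.2": 105, "0.0.3": 150}),
    Subregion(200, 299, {"0.0.4": 250}),
]
result = lookup(sub_library, 105)
```

Only one of the three subregions is scanned for the value 105, which is the source of the I/O saving the embodiment describes.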
Further, the technical solution of this embodiment of the present invention can implement online analytical processing in proxy form: by using distributed storage and distributed computing, the capacity of a single node is scaled out in parallel, enabling the management and online analytical processing of terabyte-scale data.
Further, compared with the existing batch (BAT) operation mode, the technical solution of this embodiment of the present invention can implement online analytical processing through simpler array operations, and uses a parallel processing engine to make effective use of multi-core CPUs.
Further, a bytecode engine is built into the processing terminal that executes this embodiment of the present invention, to support script implementation and just-in-time (JIT) compilation of the logic in this embodiment, thereby ensuring the extensibility and high performance of the processing terminal.
In summary, the second embodiment further refines the column storage sub-library and thereby better optimizes the data structure, so that executing this embodiment does not require traversing all the data stored in the column storage sub-library. Disk I/O overhead is reduced as far as possible while consuming little CPU, which better relieves the disk I/O bottleneck during online analytical processing and increases its speed.
Those skilled in the art will understand that, compared with the embodiment shown in Fig. 1 above, steps S203 and S204 of this embodiment may be a specific implementation of step S103 of that embodiment: once the associated column storage sub-library has been determined based on the query condition, the subregion within that sub-library that most closely matches the query condition is determined, and the records satisfying the query condition are finally obtained by searching that subregion.
Fig. 3 shows a typical application scenario. With reference to Fig. 2 and Fig. 3, the attribute of column storage sub-library 2 is date of birth, and the minimum and maximum values of subregions 21, 22, 23 and 24 are as shown in Fig. 3. In this application scenario, a record to be stored in column storage sub-library 2 may first be compared with the data range of each subregion at storage time, to determine which subregion it should be stored in. For example, if the record to be stored is 0.0.1:1993-05-06 (identifier 0.0.1, date of birth 1993-05-06), the comparison determines that it should be stored in subregion 21 of column storage sub-library 2.
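The storage-time comparison can be sketched as follows. The date boundaries are assumptions chosen only to be consistent with the examples in the text (1993-05-06 falls in subregion 21); the real boundaries are those shown in Fig. 3, and `store_record` is a hypothetical helper name.

```python
from datetime import date

# Assumed date-of-birth ranges (inclusive) for subregions 21 to 24.
# The actual boundaries are those drawn in Fig. 3.
SUBREGION_RANGES = {
    21: (date(1990, 1, 1), date(1994, 12, 31)),
    22: (date(1995, 1, 1), date(1999, 12, 31)),
    23: (date(2000, 1, 1), date(2004, 12, 31)),
    24: (date(2005, 1, 1), date(2010, 12, 31)),
}
subregions = {sub_id: {} for sub_id in SUBREGION_RANGES}

def store_record(record_id, birth_date):
    """Compare the value with each subregion's range and store it where it fits."""
    for sub_id, (low, high) in SUBREGION_RANGES.items():
        if low <= birth_date <= high:
            subregions[sub_id][record_id] = birth_date
            return sub_id
    raise ValueError(f"{birth_date} is outside every subregion range")

chosen = store_record("0.0.1", date(1993, 5, 6))
```

Routing at write time is what later allows a query to skip every subregion whose range cannot contain a match.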
Further, in this application scenario, when the query instruction is received, if the query condition is to find the data of those born in September 2010, then, following the technical solution of the embodiment shown in Fig. 2, it can be determined that the data satisfying the query condition need to be looked up in subregion 24 of column storage sub-library 2.
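When the query condition is a range rather than a single value, as with a birth month, the comparison becomes an interval-overlap test. The sketch below uses assumed subregion boundaries, chosen only so that September 2010 falls in subregion 24 as the text states; the real boundaries are in Fig. 3.

```python
from datetime import date

def overlaps(low, high, query_low, query_high):
    """Two closed intervals overlap when each one starts before the other ends."""
    return low <= query_high and query_low <= high

# Assumed boundaries for subregions 21 to 24, not taken from Fig. 3.
ranges = {
    21: (date(1990, 1, 1), date(1994, 12, 31)),
    22: (date(1995, 1, 1), date(1999, 12, 31)),
    23: (date(2000, 1, 1), date(2004, 12, 31)),
    24: (date(2005, 1, 1), date(2010, 12, 31)),
}

# Query condition: born in September 2010.
query = (date(2010, 9, 1), date(2010, 9, 30))
candidates = [sid for sid, (lo, hi) in ranges.items() if overlaps(lo, hi, *query)]
```

A query range that straddles a boundary would return more than one candidate subregion, each of which would then be scanned.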
Further, when the query instruction includes multiple query conditions, steps S203 and S204 may be performed in parallel for each query condition, to obtain the data satisfying each of the multiple query conditions.
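One possible way to run steps S203 and S204 in parallel per condition is a thread pool, sketched below with Python's standard `concurrent.futures`. The patent does not name a concurrency mechanism, so the thread pool, the tuple layout `(low, high, records)`, and the sample data are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def search(sub_library, target):
    """Prune subregions by range (S203), then scan the survivors (S204)."""
    hits = set()
    for low, high, records in sub_library:
        if low <= target <= high:
            hits |= {rid for rid, value in records.items() if value == target}
    return hits

# Hypothetical store: attribute -> list of (low, high, {identifier: value}).
store = {
    "age": [(0, 49, {"a": 23, "b": 31}), (50, 99, {"c": 64})],
    "score": [(0, 59, {"a": 40}), (60, 100, {"b": 88, "c": 88})],
}
conditions = [("age", 23), ("score", 88)]

# Each condition is searched independently, so the searches can run concurrently.
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(search, store[attr], value) for attr, value in conditions]
    results = [f.result() for f in futures]
```

The results are collected in the same order as the conditions, which keeps them easy to combine afterwards.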
Further, if the column storage sub-library needs to be expanded while this embodiment of the present invention is being executed, and the data of a newly added record fall outside the existing data range of the column storage sub-library, the data range of an existing subregion in the sub-library may be extended upward or downward to store the new record. As a variation, the new record may instead be stored by adding a subregion to the column storage sub-library.
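Both expansion options can be sketched in one helper. The list-of-dicts layout, the `add_new` flag, and the choice to give a new subregion a single-value range are assumptions for illustration; the text leaves these details open.

```python
def place(subregions, record_id, value, add_new=False):
    """Store a record, growing the sub-library when the value is out of range.

    `subregions` is a list of dicts sorted by "low"; each dict has the keys
    "low", "high" (inclusive bounds) and "records" (identifier -> value).
    """
    for sub in subregions:
        if sub["low"] <= value <= sub["high"]:
            sub["records"][record_id] = value
            return sub
    if add_new:
        # Variation: add a new subregion instead of widening an existing one.
        sub = {"low": value, "high": value, "records": {record_id: value}}
        subregions.append(sub)
        subregions.sort(key=lambda s: s["low"])
        return sub
    first, last = subregions[0], subregions[-1]
    if value < first["low"]:
        first["low"] = value      # extend the lowest subregion downward
        target = first
    elif value > last["high"]:
        last["high"] = value      # extend the highest subregion upward
        target = last
    else:
        raise ValueError("value falls in a gap between existing subregions")
    target["records"][record_id] = value
    return target

subs = [
    {"low": 0, "high": 99, "records": {}},
    {"low": 100, "high": 199, "records": {}},
]
place(subs, "x", 250)                # widens the last subregion to 100..250
place(subs, "y", -5)                 # widens the first subregion to -5..99
place(subs, "z", 300, add_new=True)  # appends a third subregion 300..300
```

Widening keeps the number of subregions fixed but makes ranges coarser, while adding a subregion keeps ranges tight at the cost of more range comparisons per query.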
It can be understood that subregions 11, 12, 13, 14 and 22 may be implemented as described above, which is not repeated here.
Fig. 4 is a schematic structural diagram of an online analytical processing device according to a third embodiment of the present invention. Those skilled in the art will understand that the online analytical processing device 4 of this embodiment is used to implement the method described in the embodiments shown in Fig. 1 to Fig. 3 above. Specifically, in this embodiment, the online analytical processing device 4 includes: a receiving module 42, configured to receive a query instruction, the query instruction including a query condition; a determining module 43, configured to determine, in a preset column store and based on the query condition, the column storage sub-library associated with the query condition, where the preset column store includes at least one column storage sub-library, at least one record is stored by column in each column storage sub-library, each record includes an identifier and the data corresponding to that identifier, and the data in different column storage sub-libraries have different attributes; and a lookup module 44, configured to look up, in the column storage sub-library associated with the query condition, the records that satisfy the query condition.
In a preferred application scenario, the query instruction includes multiple query conditions and an algebraic operation formula based on the multiple query conditions. Preferably, the lookup module 44 includes: a first lookup submodule 441, configured to look up, for each query condition, the records satisfying that query condition in the column storage sub-library corresponding to it; and a calculation submodule 442, configured to perform, according to the algebraic operation formula, an algebraic operation on the records obtained for each of the multiple query conditions, to obtain an algebraic operation result.
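This passage does not say which algebraic operations the formula may contain. As one possible reading, the sketch below treats each condition's result as a set of record identifiers and the formula as a nested expression of set operations; the operator names `and`, `or`, `minus` and the tuple encoding are assumptions, not the patent's notation.

```python
import operator

# Assumed operator vocabulary: set intersection, union and difference.
OPS = {"and": operator.and_, "or": operator.or_, "minus": operator.sub}

def evaluate(expr, results):
    """Evaluate a formula that is either a condition name or (op, left, right)."""
    if isinstance(expr, str):
        return results[expr]
    op, left, right = expr
    return OPS[op](evaluate(left, results), evaluate(right, results))

# Identifiers returned by the per-condition lookups (hypothetical data).
results = {
    "c1": {"0.0.1", "0.0.2"},
    "c2": {"0.0.2", "0.0.3"},
    "c3": {"0.0.2"},
}

# (c1 OR c2) MINUS c3
combined = evaluate(("minus", ("or", "c1", "c2"), "c3"), results)
```

Because every sub-library keys its records by the same unique identifier, results from different attributes can be combined directly at the identifier level.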
Further, the online analytical processing device 4 also includes a conversion module 41. The data in the column storage sub-library are converted from raw data by the conversion module 41 based on an index dictionary, so that the data have a preset length.
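A common way to give variable-length raw values a preset length is dictionary encoding, sketched below. The class name, the 4-byte code width, and the use of Python's `struct` module are assumptions for illustration; the patent only states that an index dictionary yields data of a preset length.

```python
import struct

class IndexDictionary:
    """Maps variable-length raw values to fixed-width integer codes."""

    def __init__(self):
        self._code_of = {}    # raw value -> integer code
        self._value_of = []   # integer code -> raw value

    def encode(self, raw):
        """Return a 4-byte code for the raw value, assigning a new code if needed."""
        if raw not in self._code_of:
            self._code_of[raw] = len(self._value_of)
            self._value_of.append(raw)
        # "<I" packs an unsigned 32-bit little-endian integer: always 4 bytes.
        return struct.pack("<I", self._code_of[raw])

    def decode(self, packed):
        """Recover the raw value from its 4-byte code."""
        (code,) = struct.unpack("<I", packed)
        return self._value_of[code]

dictionary = IndexDictionary()
short_code = dictionary.encode("Shanghai")
long_code = dictionary.encode("Ulaanbaatar")
```

Fixed-width codes let a column be addressed by offset and compared as integers, regardless of how long the original values were.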
In another preferred application scenario, the column storage sub-library includes multiple subregions determined by dividing the data range of the data. Preferably, the lookup module 44 includes: a comparison submodule 443, configured to compare the query condition with the data ranges to determine the subregion that satisfies the query condition; and a second lookup submodule 444, configured to look up the data in the subregion that satisfies the query condition.
Preferably, the column store is stored on a flash memory card.
In a variation of this embodiment, when the query instruction includes multiple query conditions and an algebraic operation formula based on them, then, for each query condition, the comparison submodule 443 may be called to determine the subregion that satisfies that query condition, and the second lookup submodule 444 may then be called to look up the data in that subregion. Finally, according to the algebraic operation formula, an algebraic operation is performed on the records obtained for each of the multiple query conditions, to obtain an algebraic operation result.
For more on the working principle and operation of the online analytical processing device 4, refer to the related descriptions of Fig. 1 to Fig. 3, which are not repeated here.
One of ordinary skill in the art will appreciate that all or some of the steps in the methods of the above embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may include ROM, RAM, a magnetic disk, an optical disc, and the like.
Although the present invention is disclosed as above, it is not limited thereto. Any person skilled in the art may make various changes and modifications without departing from the spirit and scope of the present invention, and the scope of protection of the present invention shall therefore be that defined by the claims.
Claims (14)
1. An online analytical processing method, characterized by comprising the following steps:
receiving a query instruction, the query instruction including a query condition;
determining, in a preset column store and based on the query condition, the column storage sub-library associated with the query condition, wherein the preset column store includes at least one column storage sub-library, at least one record is stored by column in each column storage sub-library, each record includes an identifier and the data corresponding to the identifier, and the data in different column storage sub-libraries have different attributes;
looking up, in the column storage sub-library associated with the query condition, the records that satisfy the query condition.
2. The online analytical processing method according to claim 1, characterized in that the query instruction includes multiple query conditions and an algebraic operation formula based on the multiple query conditions.
3. The online analytical processing method according to claim 2, characterized in that looking up, in the column storage sub-library associated with the query condition, the records that satisfy the query condition comprises:
for each query condition, looking up the records that satisfy the query condition in the column storage sub-library corresponding to that query condition;
performing, according to the algebraic operation formula, an algebraic operation on the records obtained for each of the multiple query conditions, to obtain an algebraic operation result.
4. The online analytical processing method according to claim 1, characterized in that the data in the column storage sub-library are converted from raw data based on an index dictionary, so that the data have a preset length.
5. The online analytical processing method according to claim 1, characterized in that the column storage sub-library includes multiple subregions determined by dividing the data range of the data.
6. The online analytical processing method according to claim 5, characterized in that looking up, in the column storage sub-library associated with the query condition, the records that satisfy the query condition comprises:
comparing the query condition with the data ranges to determine the subregion that satisfies the query condition;
looking up the data in the subregion that satisfies the query condition.
7. The online analytical processing method according to any one of claims 1 to 6, characterized in that the column store is stored on a flash memory card.
8. An online analytical processing device, characterized by comprising:
a receiving module, configured to receive a query instruction, the query instruction including a query condition;
a determining module, configured to determine, in a preset column store and based on the query condition, the column storage sub-library associated with the query condition, wherein the preset column store includes at least one column storage sub-library, at least one record is stored by column in each column storage sub-library, each record includes an identifier and the data corresponding to the identifier, and the data in different column storage sub-libraries have different attributes;
a lookup module, configured to look up, in the column storage sub-library associated with the query condition, the records that satisfy the query condition.
9. The online analytical processing device according to claim 8, characterized in that the query instruction includes multiple query conditions and an algebraic operation formula based on the multiple query conditions.
10. The online analytical processing device according to claim 9, characterized in that the lookup module includes:
a first lookup submodule, configured to look up, for each query condition, the records that satisfy the query condition in the column storage sub-library corresponding to that query condition;
a calculation submodule, configured to perform, according to the algebraic operation formula, an algebraic operation on the records obtained for each of the multiple query conditions, to obtain an algebraic operation result.
11. The online analytical processing device according to claim 8, characterized by further comprising a conversion module, wherein the data in the column storage sub-library are converted from raw data by the conversion module based on an index dictionary, so that the data have a preset length.
12. The online analytical processing device according to claim 8, characterized in that the column storage sub-library includes multiple subregions determined by dividing the data range of the data.
13. The online analytical processing device according to claim 12, characterized in that the lookup module includes:
a comparison submodule, configured to compare the query condition with the data ranges to determine the subregion that satisfies the query condition;
a second lookup submodule, configured to look up the data in the subregion that satisfies the query condition.
14. The online analytical processing device according to any one of claims 8 to 13, characterized in that the column store is stored on a flash memory card.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611259329.5A CN106844541B (en) | 2016-12-30 | 2016-12-30 | Online analysis processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106844541A (en) | 2017-06-13 |
CN106844541B (en) | 2020-05-29 |
Family
ID=59115008
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611259329.5A Active CN106844541B (en) | 2016-12-30 | 2016-12-30 | Online analysis processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106844541B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101727465A (en) * | 2008-11-03 | 2010-06-09 | 中国移动通信集团公司 | Methods for establishing and inquiring index of distributed column storage database, device and system thereof |
CN102663116A (en) * | 2012-04-11 | 2012-09-12 | 中国人民大学 | Multi-dimensional OLAP (On Line Analytical Processing) inquiry processing method facing column storage data warehouse |
CN103678556A (en) * | 2013-12-06 | 2014-03-26 | 华为技术有限公司 | Method for processing column-oriented database and processing equipment |
CN104424258A (en) * | 2013-08-28 | 2015-03-18 | 腾讯科技(深圳)有限公司 | Multidimensional data query method and system, query server and column storage server |
CN104424287A (en) * | 2013-08-30 | 2015-03-18 | 深圳市腾讯计算机系统有限公司 | Query method and query device for data |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109241195A (en) * | 2017-07-03 | 2019-01-18 | 北京国双科技有限公司 | The calculation method and device of ranking |
CN109241195B (en) * | 2017-07-03 | 2022-03-18 | 北京国双科技有限公司 | Ranking calculation method and device |
CN107577436A (en) * | 2017-09-18 | 2018-01-12 | 杭州时趣信息技术有限公司 | A kind of date storage method and device |
CN107577436B (en) * | 2017-09-18 | 2020-07-07 | 杭州时趣信息技术有限公司 | Data storage method and device |
CN107729500A (en) * | 2017-10-20 | 2018-02-23 | 锐捷网络股份有限公司 | A kind of data processing method of on-line analytical processing, device and background devices |
CN109471874A (en) * | 2018-10-30 | 2019-03-15 | 华为技术有限公司 | Data analysis method, device and storage medium |
WO2020088262A1 (en) * | 2018-10-30 | 2020-05-07 | 华为技术有限公司 | Data analysis method and device, and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN106844541B (en) | 2020-05-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103177055B (en) | It is stored as row storage and row stores the hybrid database table of the two | |
CN103177056B (en) | It is stored as row storage and row stores the hybrid database table of the two | |
CN103177058B (en) | It is stored as row storage and row stores the hybrid database table of the two | |
US9892150B2 (en) | Unified data management for database systems | |
CN104866434B (en) | Towards data-storage system and data storage, the call method applied more | |
CN108536692B (en) | Execution plan generation method and device and database server | |
CN108897761A (en) | A kind of clustering storage method and device | |
CN106844541A (en) | A kind of on-line analytical processing method and device | |
CN105488231A (en) | Self-adaption table dimension division based big data processing method | |
CN102479217B (en) | Method and device for realizing computation balance in distributed data warehouse | |
CN109086413A (en) | For searching for the method, equipment and readable storage medium storing program for executing of block chain data | |
CN105975587A (en) | Method for organizing and accessing memory database index with high performance | |
CN106326475A (en) | High-efficiency static hash table implement method and system | |
Zhang et al. | A method to predict the performance and storage of executing contract for ethereum consortium-blockchain | |
CN106294815B (en) | A kind of clustering method and device of URL | |
EP3314464A2 (en) | Storage and retrieval of data from a bit vector search index | |
WO2016209932A1 (en) | Matching documents using a bit vector search index | |
CN106951503A (en) | Information providing method, device, equipment and storage medium | |
US20220358178A1 (en) | Data query method, electronic device, and storage medium | |
US20110179013A1 (en) | Search Log Online Analytic Processing | |
CN104391908A (en) | Locality sensitive hashing based indexing method for multiple keywords on graphs | |
EP3314465A1 (en) | Match fix-up to remove matching documents | |
WO2018200348A1 (en) | Systems and methods for accelerating exploratory statistical analysis | |
WO2016209952A1 (en) | Reducing matching documents for a search query | |
CN104484392A (en) | Method and device for generating database query statement |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||