CN112434027A - Indexing method and device for multi-dimensional data, computer equipment and storage medium - Google Patents

Indexing method and device for multi-dimensional data, computer equipment and storage medium

Info

Publication number
CN112434027A
Authority
CN
China
Prior art keywords
information
dimension
index
dimension member
index information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011184974.1A
Other languages
Chinese (zh)
Inventor
高巍峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kingdee Software China Co Ltd
Original Assignee
Kingdee Software China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kingdee Software China Co Ltd
Priority to CN202011184974.1A
Publication of CN112434027A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2264 Multidimensional index structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a method and a device for indexing multi-dimensional data, computer equipment and a storage medium. The method comprises the following steps: acquiring information to be queried, the information to be queried comprising dimension information; identifying dimension member information corresponding to the dimension information, the dimension member information carrying a dimension member identifier; acquiring index information corresponding to the dimension member identifier; performing a superposition operation on the index information to obtain index information of a specified dimension combination corresponding to the information to be queried; traversing the positions with an identification bit in the index information of the specified dimension combination to obtain the storage address of the specified dimension combination corresponding to the information to be queried; and accessing the storage address to obtain detailed data information corresponding to the information to be queried. With this method, multi-dimensional data can be filtered quickly, the complexity and repetitiveness of the index are reduced, and index positioning efficiency is effectively improved, thereby improving the query efficiency of multi-dimensional data.

Description

Indexing method and device for multi-dimensional data, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for indexing multidimensional data, a computer device, and a storage medium.
Background
With the development of computer technology and the arrival of the 5G era, the Internet has brought great convenience to modern life, and more and more enterprises use Internet platforms to carry out statistics, analysis and calculation on large amounts of business data online. In some large-scale systems that involve group users, such as those of banks, the volume of data carried reaches hundreds of millions of records, and how to locate valid data quickly and accurately has become an important problem in these industries. In the traditional data retrieval mode, data filtering and access are realized by traversing and querying data under different dimensions in a hierarchical tree structure or an index Map.
However, in the current data retrieval mode, when a tree grouping structure or an index Map is traversed and queried, if the order of the query dimensions differs from the predefined order of the dimension combination, a layer-by-layer traversal query must be performed on every tree branch or on multiple layers of Maps, which easily leads to low query efficiency. Even when a relational database is used for data retrieval, cross-association among multiple data records occurs, and poor associated-query performance easily results, which again leads to low query efficiency.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a method, an apparatus, a computer device and a storage medium for indexing multidimensional data, which can improve efficiency of querying multidimensional data.
A method of indexing multidimensional data, the method comprising:
acquiring information to be inquired; the information to be inquired comprises dimension information;
identifying dimension member information corresponding to the dimension information, wherein the dimension member information carries a dimension member identifier;
acquiring index information corresponding to the dimension member identification;
performing superposition operation on the index information to obtain index information of a designated dimension combination corresponding to the information to be inquired;
traversing and querying the position with the identification bit in the index information of the specified dimension combination to obtain the storage address of the specified dimension combination corresponding to the information to be queried;
and accessing the storage address to obtain detailed data information corresponding to the information to be inquired.
In one embodiment, the identified dimension member information corresponding to the dimension information includes at least two pieces of dimension member information; the acquiring of the index information corresponding to the dimension member identifier includes:
acquiring first-level index information corresponding to the dimension information; the first-level index information is used for storing dimension member information corresponding to each dimension information and second-level index information corresponding to each dimension member information;
and acquiring secondary index information corresponding to the dimension member identification in the primary index information.
In one embodiment, the obtaining of the secondary index information corresponding to the dimension member identifier in the primary index information includes:
acquiring secondary index position information corresponding to the dimension member identification in the primary index information;
and acquiring secondary index information corresponding to the dimension member identification according to the secondary index position information.
In one embodiment, the superimposing operation on the index information includes:
acquiring a query condition corresponding to the information to be queried;
and performing an And operation or an Or operation on the detail record information according to the query condition.
An apparatus for indexing multidimensional data, the apparatus comprising:
the acquisition module is used for acquiring information to be inquired; the information to be inquired comprises dimension information;
the identification module is used for identifying dimension member information corresponding to the dimension information, and the dimension member information carries a dimension member identifier;
the acquisition module is further used for acquiring index information corresponding to the dimension member identification;
the operation module is used for performing superposition operation on the index information to obtain index information of a specified dimension combination corresponding to the information to be inquired;
the query module is used for traversing and querying the positions with identification bits in the index information of the specified dimension combination to obtain a storage address of the specified dimension combination corresponding to the information to be queried;
and the access module is used for accessing the storage address to obtain detailed data information corresponding to the information to be inquired.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring information to be inquired; the information to be inquired comprises dimension information;
identifying dimension member information corresponding to the dimension information, wherein the dimension member information carries a dimension member identifier;
acquiring index information corresponding to the dimension member identification;
performing superposition operation on the index information to obtain index information of a designated dimension combination corresponding to the information to be inquired;
traversing and querying the position with the identification bit in the index information of the specified dimension combination to obtain the storage address of the specified dimension combination corresponding to the information to be queried;
and accessing the storage address to obtain detailed data information corresponding to the information to be inquired.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring information to be inquired; the information to be inquired comprises dimension information;
identifying dimension member information corresponding to the dimension information, wherein the dimension member information carries a dimension member identifier;
acquiring index information corresponding to the dimension member identification;
performing superposition operation on the index information to obtain index information of a designated dimension combination corresponding to the information to be inquired;
traversing and querying the position with the identification bit in the index information of the specified dimension combination to obtain the storage address of the specified dimension combination corresponding to the information to be queried;
and accessing the storage address to obtain detailed data information corresponding to the information to be inquired.
A method of index generation for multidimensional data, the method comprising:
acquiring information to be stored; the information to be stored comprises dimension information;
identifying dimension member information corresponding to the dimension information;
acquiring a storage condition corresponding to the information to be stored;
generating index information corresponding to the dimension member information according to the storage condition and the identification bit corresponding to the dimension member information;
and storing the detailed data information corresponding to the dimension member information into the storage address recorded in the index information.
In one embodiment, the generating, according to the storage condition and the identification bit corresponding to the dimension member information, index information corresponding to the dimension member information includes:
when the information to be stored is continuous data, utilizing an array to create index information corresponding to the dimension member information according to the storage condition and the identification bit corresponding to the dimension member information;
and when the information to be stored is discontinuous data, creating index information corresponding to the dimension member information by using an index Map according to the storage condition and the identification bit corresponding to the dimension member information.
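The choice between an array and an index Map can be illustrated with a minimal Python sketch. It rests on one reading of the text that the source does not confirm: "continuous data" is taken to mean that the dimension member identifiers are dense, consecutive integers starting at 0, so a list indexed by identifier suffices, while "discontinuous data" is taken to mean sparse or arbitrary identifiers, which need a dict. The function name `generate_index`, the flag `member_ids_are_contiguous`, and the use of a Python integer as the BitSet are all hypothetical.

```python
def generate_index(records, member_ids_are_contiguous):
    """Build per-member bitsets and store the detail values.

    records: list of (member_id, value) pairs (hypothetical input shape).
    Returns (index, store), where store[address] is the detail value and
    bit `address` of index[member_id] is the identification bit.
    """
    store = []
    if member_ids_are_contiguous:
        # Assumption: identifiers are 0..n-1, so a plain array is enough.
        size = max((m for m, _ in records), default=-1) + 1
        index = [0] * size
    else:
        # Sparse or arbitrary identifiers: fall back to a map.
        index = {}
    for member_id, value in records:
        address = len(store)          # next free slot in the continuous store
        store.append(value)
        if member_ids_are_contiguous:
            index[member_id] |= 1 << address
        else:
            index[member_id] = index.get(member_id, 0) | (1 << address)
    return index, store

dense_index, dense_store = generate_index([(0, 10.0), (1, 20.0), (0, 30.0)], True)
sparse_index, sparse_store = generate_index([(1001, 1.0), (77, 2.0), (1001, 3.0)], False)
```

In both cases the storage address recorded in the index is simply the position of the value in `store`; only the container that maps a member identifier to its bitset differs.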
An apparatus for index generation of multidimensional data, the apparatus comprising:
the acquisition module is used for acquiring information to be stored; the information to be stored comprises dimension information;
the identification module is used for identifying dimension member information corresponding to the dimension information;
the acquisition module is also used for acquiring the storage condition corresponding to the information to be stored;
the generating module is used for generating index information corresponding to the dimension member information according to the storage condition and the identification bit corresponding to the dimension member information;
and the storage module is used for storing the detailed data information corresponding to the dimension member information into the storage address recorded in the index information.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring information to be stored; the information to be stored comprises dimension information;
identifying dimension member information corresponding to the dimension information;
acquiring a storage condition corresponding to the information to be stored;
generating index information corresponding to the dimension member information according to the storage condition and the identification bit corresponding to the dimension member information;
and storing the detailed data information corresponding to the dimension member information into the storage address recorded in the index information.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring information to be stored; the information to be stored comprises dimension information;
identifying dimension member information corresponding to the dimension information;
acquiring a storage condition corresponding to the information to be stored;
generating index information corresponding to the dimension member information according to the storage condition and the identification bit corresponding to the dimension member information;
and storing the detailed data information corresponding to the dimension member information into the storage address recorded in the index information.
According to the above indexing method and device for multi-dimensional data, computer equipment and storage medium, when multi-dimensional data needs to be retrieved or queried, the information to be queried is obtained, and the information to be queried comprises dimension information. In contrast to the traditional data query mode, dimension member information corresponding to the dimension information is identified, the dimension member information carrying a dimension member identifier; index information corresponding to the dimension member identifier is obtained; a superposition operation is performed on the index information to obtain index information of the specified dimension combination corresponding to the information to be queried; the positions with an identification bit in the index information of the specified dimension combination are traversed to obtain the storage address of the specified dimension combination corresponding to the information to be queried; and the storage address is accessed to obtain the detailed data information corresponding to the information to be queried. In this way, superposition calculation on the index information recorded by the BitSet filters the multi-dimensional data quickly, reduces the complexity and repetitiveness of the index, and effectively improves index positioning efficiency, thereby improving the query efficiency of multi-dimensional data.
Drawings
FIG. 1 is a diagram of an application environment for a method for indexing multidimensional data, in one embodiment;
FIG. 2 is a flow diagram that illustrates a method for indexing multidimensional data, according to one embodiment;
FIG. 3A is a flowchart illustrating a step of obtaining actual stored values in a continuous space according to the BitSet index information in one embodiment;
FIG. 3B is a flowchart illustrating the step of obtaining index information corresponding to the dimension member identifier in one embodiment;
FIG. 4 is a flowchart illustrating a step of performing a superposition operation on index information according to an embodiment;
FIG. 5 is a flowchart illustrating a method for generating an index of multidimensional data according to another embodiment;
FIG. 6 is a block diagram of an apparatus for indexing multidimensional data, according to one embodiment;
FIG. 7 is a block diagram of an apparatus for index generation of multidimensional data in one embodiment;
FIG. 8 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The multi-dimensional data indexing method provided by the application can be applied to the application environment shown in fig. 1. Wherein the terminal 102 communicates with the server 104 via a network. The server 104 obtains information to be queried sent by the terminal 102, wherein the information to be queried includes dimension information. The server 104 identifies dimension member information corresponding to the dimension information, and the dimension member information carries a dimension member identifier. The server 104 acquires the index information corresponding to the dimension member identification, and the server 104 performs superposition operation on the index information to obtain the index information of the specified dimension combination corresponding to the information to be queried. The server 104 searches the position with the identification bit in the index information of the specified dimension combination in a traversing manner to obtain the storage address of the specified dimension combination corresponding to the information to be searched. The server 104 accesses the storage address to obtain detailed data information corresponding to the information to be queried, and returns the queried detailed data information to the terminal 102. The terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices, and the server 104 may be implemented by an independent server or a server cluster formed by a plurality of servers.
In an embodiment, as shown in fig. 2, a method for indexing multidimensional data is provided, and this embodiment is illustrated by applying the method to a server, it is to be understood that the method may also be applied to a terminal, and may also be applied to a system including a terminal and a server, and is implemented by interaction between the terminal and the server. In this embodiment, the method includes the steps of:
Step 202, obtaining information to be queried, wherein the information to be queried comprises dimension information.
An enterprise can use a unified business management information platform to manage all resources and information on its internal and external supply and demand chains, and this integration can eliminate the information gaps and information islands caused by the division of departments inside the enterprise. For example, an Enterprise Resource Planning (ERP) system can be used to manage the whole supply chain effectively. An ERP system is an enterprise information management system that is mainly oriented to the manufacturing industry and performs integrated management of material resources, fund resources and information resources. In an ERP system, statistics, analysis and calculation on various data are generally required. These operations usually include filtering, grouping and re-aggregation of data, and all data participating in the calculation carries multiple dimension attributes, so grouping and calculating data by different dimensions is the essential purpose of multi-dimensional data operations. Specifically, a user can log in to the business information system of a specific scenario by entering a user name and password in a mobile phone application or a browser web page, and can initiate a query request for specific business data through an app (application) client or a web client, that is, a web browser. The server can simultaneously obtain data query requests sent by a plurality of different terminals; each data query request includes information to be queried, and the information to be queried includes dimension information.
The dimension information in the information to be queried refers to data with different dimension attributes contained in the information to be queried, and the dimension information of the data may be divided according to different attributes, for example, the first dimension information may be organization information to which the data belongs, and the second dimension information may be subject information to which the data belongs. Dimension refers to a general term for a data attribute.
Step 204, identifying dimension member information corresponding to the dimension information, wherein the dimension member information carries a dimension member identifier.
After the server acquires the information to be queried, the server can identify the dimension member information corresponding to the dimension information in the information to be queried, where the dimension member information carries a dimension member identifier. A dimension member is a detail record corresponding to one dimension, and the dimension member identifier uniquely identifies a dimension member. Each dimension member has its own independent BitSet record. A BitSet is a data structure stored in contiguous bits, in which one bit represents the fact record corresponding to the index information. A fact record, that is, an actual value, is the data value information of a data record after the dimension information is removed, for example an amount value or a quantity value. The principle of the BitSet is that a 32-bit or 64-bit word, which would otherwise represent only a single integer value, is instead used to mark 32 or 64 integer sequence numbers. Because the sequence numbers of the records increase continuously and monotonically without repetition, one bit position in the BitSet never corresponds to more than one record. Therefore, when the server uses the BitSet to construct an index, the server can mark the corresponding sequence number for the actual value record corresponding to each bit position in the BitSet. Each fact record uses only one dimension member value for each dimension, so the actual lengths of the BitSet records of different dimension members may vary.
For example, the server identifies that the dimension member information corresponding to the 10th bit is member A-1 of dimension A, and that the dimension member information corresponding to the 20th bit is member A-2 of dimension A, where A-1 and A-2 are the corresponding dimension member identifiers.
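The per-member BitSet described above can be illustrated with a minimal Python sketch. It assumes a Python integer stands in for the BitSet and that a record's sequence number is its position in a list; the sample records, the dimension names and the helper `build_member_bitsets` are hypothetical and not taken from the application.

```python
# Hypothetical fact records: (organization member, subject member, amount).
# The sequence number of a record is its position in this list.
records = [
    ("A-1", "B-1", 100.0),
    ("A-2", "B-1", 250.0),
    ("A-1", "B-2", 75.0),
    ("A-2", "B-2", 30.0),
]

def build_member_bitsets(records, dim_position):
    """Return {member identifier: bitset}; bit i is 1 when record i uses that member."""
    bitsets = {}
    for seq, record in enumerate(records):
        member = record[dim_position]
        bitsets[member] = bitsets.get(member, 0) | (1 << seq)
    return bitsets

org_index = build_member_bitsets(records, 0)       # dimension "organization"
subject_index = build_member_bitsets(records, 1)   # dimension "subject"
```

Because each record uses exactly one member per dimension, the bitsets within one dimension never share a set bit, and their lengths differ: here the bitset of A-1 ends at bit 2 while that of A-2 ends at bit 3.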
Step 206, obtaining the index information corresponding to the dimension member identification.
After the server identifies the dimension member information corresponding to the dimension information, the server can acquire the index information corresponding to the dimension member identifier carried in the dimension member information. The index information is an index created for the representation of the document data. To improve retrieval efficiency, the index can be established according to certain rules, and establishing the index information can include establishing index data tables with various structures. For example, data is stored uniformly in a continuous space and the BitSet is used to establish the corresponding index information. After the server identifies the dimension member information corresponding to the dimension information, the server can query the primary index information to obtain the BitSet index information corresponding to the dimension member identifier.
Step 208, performing a superposition operation on the index information to obtain index information of the specified dimension combination corresponding to the information to be queried.
After the server obtains the BitSet index information corresponding to each dimension member identifier, the server may perform a superposition operation on the obtained BitSet index information of the multiple dimensions according to the query condition, so as to obtain the index information of the specified dimension combination corresponding to the information to be queried. The superposition operation refers to performing an And operation or an Or operation on the BitSet index information corresponding to multiple dimensions. For example, suppose the information to be queried sent by a user terminal needs to be retrieved by 3 dimensions, and the query condition to be satisfied is: dimension 1 = A_1 or A_2; dimension 2 = B_1 or B_2; dimension 3 = C_1 or C_2. The BitSet of dimension member A_1 contains all record positions that reference A_1, so the server superposes the BitSet of A_1 and the BitSet of A_2 with an Or operation to obtain a BitSet result marking the record positions that reference A_1 or A_2. By analogy, the server performs the corresponding Or superposition for the conditions on dimension 2 and dimension 3 to obtain their BitSet results. The server then performs an And operation on the BitSet results of dimension 1, dimension 2 and dimension 3 to obtain the index information of the specified dimension combination, that is, the BitSet index information that satisfies all three dimension conditions simultaneously.
Step 210, traversing the positions with an identification bit in the index information of the specified dimension combination to obtain the storage address of the specified dimension combination corresponding to the information to be queried.
After the server performs the superposition operation on the index information to obtain the index information of the specified dimension combination corresponding to the information to be queried, the server can traverse the positions with an identification bit in that index information to obtain the storage address of the specified dimension combination corresponding to the information to be queried. The identification bit works as follows: in a BitSet record, one bit represents the corresponding fact record, that is, the actual value. When a bit position in the BitSet has a corresponding fact record stored, its identification bit is set to 1; when it has no corresponding fact record stored, its identification bit is set to 0.
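The three-dimension example above can be sketched in Python as follows. This is a minimal illustration, assuming Python integers stand in for BitSets; the index contents, the extra members A_3, B_3 and C_3, and the function name `combine` are made up for the example.

```python
# Hypothetical per-member bitsets over six fact records (bit i = record i).
index = {
    "dim1": {"A_1": 0b000011, "A_2": 0b001100, "A_3": 0b110000},
    "dim2": {"B_1": 0b010101, "B_2": 0b000010, "B_3": 0b101000},
    "dim3": {"C_1": 0b000110, "C_2": 0b010000, "C_3": 0b101001},
}

def combine(index, conditions):
    """Or the member bitsets within each dimension, then And across dimensions."""
    result = None
    for dimension, members in conditions.items():
        dim_bits = 0
        for member in members:
            dim_bits |= index[dimension].get(member, 0)   # Or within a dimension
        result = dim_bits if result is None else result & dim_bits  # And across
    return 0 if result is None else result

combined = combine(index, {
    "dim1": ["A_1", "A_2"],
    "dim2": ["B_1", "B_2"],
    "dim3": ["C_1", "C_2"],
})
```

With this sample data the Or results are `0b001111`, `0b010111` and `0b010110`, and their And is `0b000110`, so only records 1 and 2 satisfy all three conditions. The order in which the dimensions are listed does not affect the result, which is the property the tree and Map approaches in the Background lack.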
Step 212, accessing the storage address to obtain detailed data information corresponding to the information to be queried.
After the server traverses and queries the position with the identification bit in the index information of the specified dimension combination to obtain the storage address of the specified dimension combination corresponding to the information to be queried, the server can obtain the detailed data information corresponding to the information to be queried by accessing each storage address. The detailed data information refers to real data information, that is, data value information of the data record excluding the dimension information. The position with the identification bit is the position corresponding to the identification bit 1, that is, the identification bit 1 indicates that the position stores the corresponding actual data value.
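Steps 210 and 212 can be sketched together in Python. The sketch assumes that the storage address of a fact record is simply its bit position, used as an offset into a contiguous list of values; the names `iter_set_bits` and `fact_values` and the sample data are hypothetical.

```python
def iter_set_bits(bits):
    """Yield the positions whose identification bit is 1, lowest position first."""
    while bits:
        lowest = bits & -bits              # isolate the lowest set bit
        yield lowest.bit_length() - 1      # its position is the storage address
        bits ^= lowest                     # clear it and continue

# Hypothetical continuous store of fact values (amounts), one per record.
fact_values = [100.0, 250.0, 75.0, 30.0, 10.0, 5.0]

combined = 0b000110                        # result of a superposition operation
addresses = list(iter_set_bits(combined))
details = [fact_values[address] for address in addresses]
```

Only positions with identification bit 1 are visited, so the cost of this step depends on the number of matching records rather than on the total number of records in the store.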
In this embodiment, when multi-dimensional data needs to be retrieved or queried, the information to be queried is obtained, where the information to be queried includes dimension information. In contrast to the traditional data query mode, the server identifies the dimension member information corresponding to the dimension information, the dimension member information carrying a dimension member identifier; the server acquires the index information corresponding to the dimension member identifier and performs a superposition operation on it to obtain the index information of the specified dimension combination corresponding to the information to be queried; the server traverses the positions with an identification bit in the index information of the specified dimension combination to obtain the storage address of the specified dimension combination corresponding to the information to be queried; and the detailed data information corresponding to the information to be queried is obtained by accessing the storage address. In this way, superposition calculation on the index information recorded by the BitSet filters the multi-dimensional data quickly, reduces the complexity and repetitiveness of the index, and effectively improves index positioning efficiency, thereby improving the query efficiency of multi-dimensional data.
In one embodiment, the step of acquiring the index information corresponding to the dimension member identifier includes:
acquiring first-level index information corresponding to the dimension information; the first-level index information is used for storing dimension member information corresponding to each dimension information and second-level index information corresponding to each dimension member information; and
acquiring secondary index information corresponding to the dimension member identification in the primary index information.
When the dimension member information that the server identifies for the dimension information includes at least two dimension members, the server may obtain the index information corresponding to each dimension member identifier. Fig. 3A is a schematic flow chart of the step of acquiring the actual stored values in the continuous space through the BitSet index information. The server can obtain first-level index information corresponding to the dimension information; the first-level index information stores the dimension member information corresponding to each dimension and the second-level index information corresponding to each dimension member. The server may then obtain the secondary index information corresponding to the dimension member identifier from the primary index information. When the number of dimension members is small, for example when there is only one dimension member, the server can use a single-level index, namely the first-level index, directly. In that case the first-level index information itself provides the dimension grouping information and the storage address information in the index information of the dimension member, so the server can read the storage address information from the BitSet information of the dimension member in the first-level index and obtain the detailed data information corresponding to the information to be queried by accessing that storage address. 
When there are many dimension members, that is, when the information to be queried contains multiple pieces of dimension attribute information, the server can use the dimension grouping information stored in the primary index, together with the secondary index position information of each dimension member, to locate entries quickly. The secondary index stores the storage address information of the BitSet corresponding to each dimension member value. The server can obtain from the primary index the BitSet records of the dimension members that need to participate in filtering. This lookup is optional: when the same BitSet record is accessed repeatedly, the server can access it directly through its position address without retrieving the primary index again. After performing a bitwise superposition operation on the BitSet records obtained for the multiple dimensions, the server obtains the BitSet index information of the records that satisfy the specified dimension combination. In practice, if many BitSets are to be superposed, the server can check after each superposition whether the result is 0, so as to avoid subsequent useless superposition calculations. Thus, by using the two-level index together with the BitSet information of the index records to filter the data records, the server quickly locates the fact record values of the original data stored in the continuous space, which reduces the complexity and repetition of the index and improves index positioning efficiency.
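A minimal sketch of the two-level lookup with the early zero check is given below. The layout is an assumption made for illustration: `primary_index` maps each dimension to the slot of each member, and `secondary_index` is a list of bitsets addressed by slot. These names and the sample data are hypothetical.

```python
# First level: dimension -> {member id -> slot in the second level}.
primary_index = {
    "product": {"P1": 0, "P2": 1},
    "region": {"East": 2, "West": 3},
    "year": {"2019": 4, "2020": 5},
}

# Second level: slot -> bitset of the record positions referencing the member.
secondary_index = [
    0b0011,  # P1: records 0, 1
    0b1100,  # P2: records 2, 3
    0b0101,  # East: records 0, 2
    0b1010,  # West: records 1, 3
    0b1111,  # 2019: records 0..3
    0b0000,  # 2020: no records
]


def lookup(dimension, member):
    """Resolve a member to its bitset through both index levels."""
    slot = primary_index[dimension][member]
    return secondary_index[slot]


def intersect(conditions):
    """And the bitsets of all conditions, stopping once the result is 0.

    Returns the combined bitset and the number of bitsets actually read.
    """
    combined = None
    used = 0
    for dimension, member in conditions:
        bits = lookup(dimension, member)
        used += 1
        combined = bits if combined is None else combined & bits
        if combined == 0:
            break  # no record can match, skip the remaining bitsets
    return combined, used


hit = intersect([("product", "P1"), ("region", "East")])
miss = intersect([("product", "P1"), ("product", "P2"), ("year", "2019")])
```

In the second call the intersection is already empty after two bitsets, so the third is never fetched. The zero check is only meaningful for And superposition, since an Or can turn an empty result non-empty again.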
In one embodiment, as shown in fig. 3B, the step of obtaining the secondary index information corresponding to the dimension member identifier in the primary index information includes:
Step 302, acquiring secondary index position information corresponding to the dimension member identification in the primary index information.
Step 304, acquiring secondary index information corresponding to the dimension member identification according to the secondary index position information.
After the server obtains the first-level index information corresponding to the dimension information, the server may obtain second-level index information corresponding to the dimension member identifier in the first-level index information. Specifically, the server may obtain secondary index position information corresponding to the dimension member identifier in the primary index information. And the server acquires the secondary index information corresponding to the dimension member identification according to the secondary index position information. For example, the server may find the position of the secondary index entry for storing the dimension member in the dimension according to the ID of each dimension member. By using the two-stage index, the positioning query complexity of the multi-dimensional data always keeps consistent under the condition that the dimension members are fixed and is basically irrelevant to the number of the fact data records, so that the problems that the tree structure is continuously deepened along with the increase of the number of dimensions in the traditional data retrieval mode or the index Map retrieval efficiency is low due to too many dimension combinations can be solved. In addition, in the conventional Map indexing method, because the index Map is generally a binary tree implementation manner, during an update operation, the hierarchy and branches of the tree are adjusted, and at this time, the Map needs to be locked, or comparison and merging are subsequently performed, so that under the requirement of concurrent resource access, as the number of contenders increases, the waiting time is longer, and the query efficiency is easily reduced linearly. 
The scheme of the application can also support concurrent read and write operations on data in MVCC (Multi-Version Concurrency Control) mode. Because records are only appended, a write operation does not require a mutual exclusion lock against query and read operations: when a record is added, modified, or deleted, the original record is not changed; instead a new record is added, the version number of the corresponding record is incremented, and the updated record is obtained through the latest version number. Since the position of a new fact record is newly added in the Bitset, the original information is not affected, and when an And or Or operation is executed on a Bitset, the operation is performed on a copy. Therefore, even when retrieval is carried out in multi-version concurrent mode, update operations do not cause data inconsistency.
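The append-only behaviour can be illustrated with a small sketch. The class `VersionedStore` and its methods are hypothetical, and the sketch makes simplifying assumptions: it is single-threaded, it does not model deletion, and it does not show how superseded versions are excluded from query results, which the description does not specify.

```python
class VersionedStore:
    """Append-only store: an update adds a new record and never edits an old one."""

    def __init__(self):
        self.records = []      # (key, version, value) in append order
        self.member_bits = {}  # dimension member id -> bitset (int)

    def write(self, key, member, value):
        """Append a new version of `key` and mark its position for `member`."""
        version = 1 + max((v for k, v, _ in self.records if k == key), default=0)
        position = len(self.records)
        self.records.append((key, version, value))
        # Setting a new high bit leaves every earlier bit unchanged.
        self.member_bits[member] = self.member_bits.get(member, 0) | (1 << position)
        return version

    def snapshot(self, member):
        """Return the member's bitset; Python ints are immutable, so this is a copy."""
        return self.member_bits.get(member, 0)

    def latest(self, key):
        """Return the value carrying the highest version number for `key`."""
        matches = [(v, val) for k, v, val in self.records if k == key]
        return max(matches)[1] if matches else None


store = VersionedStore()
store.write("order-1", "region=East", 10)
before = store.snapshot("region=East")
store.write("order-1", "region=East", 15)  # update = new record, version 2
after = store.snapshot("region=East")
```

A reader that took `before` keeps working on a value that later writes cannot change, which is the property the copy-based And/Or operations rely on.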
In one embodiment, as shown in fig. 4, the step of performing the superposition operation on the index information includes:
Step 402, obtaining a query condition corresponding to the information to be queried.
Step 404, performing an And operation or an Or operation on the detail record information according to the query condition.
After the server acquires the BitSet index information corresponding to each dimension member identifier, the server may perform a superposition operation on the BitSet index information acquired for the multiple dimensions to obtain the index information of the specified dimension combination corresponding to the information to be queried. Specifically, the server may obtain the query condition corresponding to the information to be queried, and perform an And operation or an Or operation on the detail record information according to the query condition. Superposing the values in the Bitsets involves two cases. In an And operation, the output value is 1 only when the values corresponding to both dimension members are 1; otherwise the output value is 0. In an Or operation, the output value is 1 when the value corresponding to either dimension member is 1; the output value is 0 only when both are 0. The server performs the corresponding Bitset superposition operations according to the number of dimensions that the retrieval condition requires to be matched. For example, suppose the fact records corresponding to the information to be queried sent by a user terminal need to be retrieved according to 2 dimensions, with the query conditions: dimension 1 = A_1 and A_2; dimension 2 = B_1 or B_2. The server performs an And superposition of the Bitset corresponding to A_1, which contains all record positions referencing A_1, with the Bitset of A_2, and thereby obtains the Bitset result of the record positions referencing both A_1 and A_2. Likewise, the server performs an Or superposition of the Bitset corresponding to B_1, which contains all record positions referencing B_1, with the Bitset of B_2, and thereby obtains the Bitset result of the record positions referencing B_1 or B_2. 
Further, the server performs an And operation on the Bitset result of the And superposition for dimension 1 and the Bitset result of the Or superposition for dimension 2 to obtain the index information of the specified dimension combination for the two dimensions, that is, the Bitset result of the record positions that satisfy both dimension conditions at the same time. Locating fact records thus reduces to superposing Bitset values. Because the Bitset operations only determine the record positions that satisfy all conditions at once, they are independent of order and can be computed in parallel. Positioning is therefore much faster and more efficient than traversal with per-record condition checks, which enables rapid filtering of multidimensional data and effectively improves multidimensional data query efficiency.
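The two-dimension example can be worked through with Python integers as bitsets. The bit patterns below are invented purely for illustration (the application gives no concrete values), and a record is allowed to be marked in both A_1 and A_2 only so that the And step yields a non-empty result.

```python
# Hypothetical bitsets; bit i set means record i references the member.
A_1 = 0b01101101
A_2 = 0b00110110
B_1 = 0b00000101
B_2 = 0b00100000

dim1 = A_1 & A_2        # And: positions marked in both A_1 and A_2
dim2 = B_1 | B_2        # Or: positions marked in B_1 or B_2
combined = dim1 & dim2  # positions satisfying both dimension conditions

# Traverse the set bits to get the record positions to read.
positions = [i for i in range(combined.bit_length()) if (combined >> i) & 1]
```

Since And is commutative and associative, `dim2 & dim1` gives the same bitset, which is why the per-dimension results can be computed independently and combined in any order.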
In one embodiment, as shown in fig. 5, a method for generating an index of multidimensional data is provided, which is described by taking the method as an example applied to the server in fig. 1, and includes the following steps:
Step 502, acquiring information to be stored, where the information to be stored includes dimension information.
Step 504, identifying dimension member information corresponding to the dimension information.
Step 506, obtaining a storage condition corresponding to the information to be stored.
Step 508, generating index information corresponding to the dimension member information according to the storage condition and the identification bit corresponding to the dimension member information.
Step 510, storing the detailed data information corresponding to the dimension member information into the storage address recorded in the index information.
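The generation steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout (`region`, `year`, `amount`), the function `store`, and the containers `fact_values` and `member_index` are hypothetical, and the "storage condition" of step 506 is not modeled.

```python
fact_values = []   # continuous store of detail values
member_index = {}  # (dimension, member) -> bitset (int)


def store(record, dimensions=("region", "year")):
    """Append the record's value and set its bit for every dimension member."""
    position = len(fact_values)  # next free slot in the continuous space
    fact_values.append(record["amount"])
    for dim in dimensions:
        key = (dim, record[dim])
        member_index[key] = member_index.get(key, 0) | (1 << position)
    return position


store({"region": "East", "year": 2020, "amount": 100.0})
store({"region": "West", "year": 2020, "amount": 250.0})
store({"region": "East", "year": 2019, "amount": 75.0})
```

After the three calls, each dimension member that actually occurs has exactly one bitset, and members that never occur take no space.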
In each enterprise information management system, statistics, analysis, and calculation of various data are generally required. These operations typically include a series of operations for filtering, grouping, re-aggregating computations on the data, etc. In order to improve the retrieval efficiency, the index may be established according to a certain rule, and the establishment of the index information may include establishing index data tables with various different structures, for example, uniformly storing data in a continuous space, and establishing index detail information corresponding to each dimension member by using BitSet. Specifically, the server may obtain information to be stored, where the information to be stored includes dimension information. And the server identifies the dimension member information corresponding to the dimension information and acquires the storage condition corresponding to the information to be stored. The server can generate index information corresponding to the dimension member information according to the storage condition and the identification bit corresponding to the dimension member information, and the server stores detailed data information corresponding to the dimension member information into a storage address recorded in the index information. The index information is index detail information constructed by using BitSet. One BitSet record corresponds to each dimension member. The length of each BitSet corresponds to the size of the space in which the value is actually stored, but the actual length of the BitSet depends on the location of the last marker. Each bit in the BitSet identifies whether the record at the location corresponding to the actual stored value contains the corresponding dimension member attribute. 
Each record is represented in the BitSet by a single bit. Because record sequence numbers are continuous, monotonically increasing, and never repeated, one position in the BitSet never corresponds to more than one record, and the server can number the marked positions in the BitSet in the same order as the actual records. Each dimension member has its own independent Bitset, and each dimension in a fact record uses only one dimension member value, so the actual lengths of the Bitset records of different dimension members may differ. Because fact records are stored in continuous order and the position information stored in a Bitset is the sequence number of each record, the actual length of each Bitset depends on the maximum value stored in it, which is the largest sequence number of a fact record referencing that dimension member. The Bitset of each dimension member contains only the positions of the fact records that reference that member. For example, if the 10th record contains member A-1 of dimension A and the 100th record contains member A-2 of dimension A, the Bitset corresponding to A-1 only needs to be marked up to position 10, while that of A-2 is marked at position 100. The Bitset corresponding to dimension member A-1 is thus much shorter than the Bitset corresponding to dimension member A-2; the length of a Bitset depends on the positions of the stored records that reference the dimension member. 
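The A-1 / A-2 length example can be checked with a short sketch. The helpers `mark` and `words_needed` are hypothetical, and the sketch assumes that record number n maps to bit index n and that a bitset is stored in 64-bit words.

```python
def mark(bits, position):
    """Return the bitset with the bit at `position` set."""
    return bits | (1 << position)


def words_needed(bits, word_size=64):
    """Number of fixed-size words needed to hold the highest set bit."""
    return (bits.bit_length() + word_size - 1) // word_size


a1 = mark(0, 10)   # member A-1 is referenced by record 10
a2 = mark(0, 100)  # member A-2 is referenced by record 100
```

Under these assumptions the bitset for A-1 fits in one word while the one for A-2 needs two, even though each marks a single record: the length follows the highest marked position, not the number of marks.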
A traditional index Map stores the position number of each record directly. An integer occupies 32 or 64 bits in a computer, so one 32-bit or 64-bit value can store only one position. When the index is constructed with Bitsets as in the application, one 64-bit integer value in a Bitset can identify 64 positions, and a continuous space can be identified by concatenating several integer values. The number of indexes actually stored equals the number of dimension members actually used, so no space is wasted on unused dimension members, and the actual storage length of each BitSet record depends only on the index position of its last set bit. The actual storage space of most BitSet records is therefore smaller than the length of the space storing the actual values, which saves a large amount of storage. Meanwhile, the values of the original data are stored in a continuous space, and index information is constructed over the dimensions, the dimension member values, and the BitSet records, giving one BitSet record per dimension member that records the address information of the actually stored records. The actual length of the BitSet of each member is thus bounded by the length of the actually stored space and is independent of the number of dimensions, so the index space does not grow rapidly as dimensions are added. As a result, while data query efficiency is preserved, the available memory is used effectively to group the data, and records can be located, accessed, and processed quickly.
In one embodiment, the step of generating index information corresponding to the dimension member information according to the storage condition and the identification bit corresponding to the dimension member information includes:
when the information to be stored is continuous data, utilizing the array to create index information corresponding to the dimension member information according to the storage condition and the identification bit corresponding to the dimension member information; and
when the information to be stored is discontinuous data, creating index information corresponding to the dimension member information by using the index Map according to the storage condition and the identification bit corresponding to the dimension member information.
After the server identifies the dimension member information corresponding to the dimension information and acquires the storage condition corresponding to the information to be stored, the server can generate the index information corresponding to the dimension member information according to the storage condition and the identification bit corresponding to the dimension member information. Specifically, when the server detects that the information to be stored is continuous data, the server may create index information corresponding to the dimension member information by using the array according to the storage condition and the identification bit corresponding to the dimension member information. When the server detects that the information to be stored is discontinuous data, the server can create index information corresponding to the dimension member information by using the index Map according to the storage condition and the identification bit corresponding to the dimension member information. The secondary index may use Map or array depending on the actual application. That is, Map is used when the number of dimension members is large and the probability of occurrence is sparse, and array is used when the number of dimension members is small or the probability of occurrence is dense. For example, the information to be stored that needs to be stored is: [1, 10, 20, 30, 40, 50, 60, 70, 80, 90] these 10 numbers, it is necessary to use 10 numbers of 32 bits or 64 bits in a conventional manner. In the present application, when the server constructs an index using Bitset, the server can represent [0-90] data in this continuous space using only 2 64-bit values. A bit in a BitSet record corresponds to a value, i.e. the way a bit represents a number. 
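The ten-number example can be reproduced with a short sketch that packs the values into 64-bit words, one bit per value. The function names `pack` and `unpack` are hypothetical; the word size of 64 follows the text.

```python
WORD_BITS = 64


def pack(values):
    """Set one bit per value in a list of 64-bit words."""
    words = [0] * (max(values) // WORD_BITS + 1)
    for v in values:
        words[v // WORD_BITS] |= 1 << (v % WORD_BITS)
    return words


def unpack(words):
    """Recover the stored values, in ascending order, from the words."""
    return [
        w * WORD_BITS + b
        for w, word in enumerate(words)
        for b in range(WORD_BITS)
        if (word >> b) & 1
    ]


values = [1, 10, 20, 30, 40, 50, 60, 70, 80, 90]
words = pack(values)
```

The ten values occupy two 64-bit words instead of ten separate integers, because the largest value, 90, falls in the second word (bits 64 to 127).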
When the server detects that the information to be stored is continuous data, the server can use an array to create the BitSet record corresponding to each piece of dimension member information according to the storage condition and the identification bit corresponding to the dimension member information. In the key-value pairs stored in the secondary index, the Key is the ID of the dimension member and the Value is the Bitset storage address corresponding to that dimension member. The Value may hold the Bitset array itself rather than an actual address, so that under a large number of consecutive accesses the server can access the Bitset array directly, which reduces repeated retrieval steps and further improves access speed.
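The choice between an array and a Map for the secondary index can be sketched as below. The density rule and the `dense_threshold` value are assumptions introduced for illustration; the application only says that an array suits few or densely occurring members and a Map suits many, sparsely occurring ones. Member IDs are assumed to be non-negative integers.

```python
def build_secondary_index(member_bitsets, dense_threshold=0.5):
    """Pick a list (array) for dense member IDs, a dict (Map) for sparse ones.

    member_bitsets maps an integer member ID to its bitset.
    """
    if not member_bitsets:
        return {}
    span = max(member_bitsets) + 1
    if len(member_bitsets) / span >= dense_threshold:
        table = [0] * span  # slot i holds the bitset of member ID i
        for member_id, bits in member_bitsets.items():
            table[member_id] = bits
        return table
    return dict(member_bitsets)


def get_bitset(index, member_id):
    """Fetch a member's bitset from either representation (0 if absent)."""
    if isinstance(index, list):
        return index[member_id] if 0 <= member_id < len(index) else 0
    return index.get(member_id, 0)


dense = build_secondary_index({0: 0b101, 1: 0b010, 2: 0b100})
sparse = build_secondary_index({3: 0b1, 9000: 0b10})
```

Both representations store the bitset itself as the value, so a lookup returns the bitset directly without a further address resolution step.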
It should be understood that although the various steps in the flow charts of fig. 1-5 are shown in order as indicated by the arrows, the steps are not necessarily performed in order as indicated by the arrows. The steps are not performed in the exact order shown and described, and may be performed in other orders, unless explicitly stated otherwise. Moreover, at least some of the steps in fig. 1-5 may include multiple steps or multiple stages, which are not necessarily performed at the same time, but may be performed at different times, which are not necessarily performed in sequence, but may be performed in turn or alternately with other steps or at least some of the other steps.
In one embodiment, as shown in fig. 6, there is provided an indexing apparatus for multidimensional data, including: an obtaining module 602, an identifying module 604, an operation module 606, a query module 608, and an accessing module 610, wherein:
the obtaining module 602 is configured to obtain information to be queried, where the information to be queried includes dimension information.
The identifying module 604 is configured to identify dimension member information corresponding to the dimension information, where the dimension member information carries a dimension member identifier.
The obtaining module 602 is further configured to obtain index information corresponding to the dimension member identifier.
And the operation module 606 is configured to perform superposition operation on the index information to obtain index information of a specified dimension combination corresponding to the information to be queried.
The query module 608 is configured to traverse and query the positions with identification bits in the index information of the specified dimension combination to obtain the storage address of the specified dimension combination corresponding to the information to be queried.
And the accessing module 610 is configured to access the storage address to obtain detailed data information corresponding to the information to be queried.
In one embodiment, the obtaining module is further configured to obtain primary index information corresponding to the dimension information, where the primary index information is used to store dimension member information corresponding to each dimension information and secondary index information corresponding to each dimension member information; and acquiring secondary index information corresponding to the dimension member identification in the primary index information.
In one embodiment, the obtaining module is further configured to obtain secondary index position information corresponding to the dimension member identifier in the primary index information; and acquiring secondary index information corresponding to the dimension member identification according to the secondary index position information.
In one embodiment, the obtaining module is further configured to obtain a query condition corresponding to the information to be queried. The operation module is further configured to perform an And operation or an Or operation on the detail record information according to the query condition.
In one embodiment, as shown in fig. 7, there is provided an index generating apparatus for multidimensional data, including: an obtaining module 702, an identifying module 704, a generating module 706, and a storing module 708, wherein:
an obtaining module 702 is configured to obtain information to be stored, where the information to be stored includes dimension information.
And the identifying module 704 is configured to identify dimension member information corresponding to the dimension information.
The obtaining module 702 is further configured to obtain a storage condition corresponding to the information to be stored.
A generating module 706, configured to generate index information corresponding to the dimension member information according to the storage condition and the identification bit corresponding to the dimension member information.
The storage module 708 is configured to store the detailed data information corresponding to the dimension member information into the storage address recorded in the index information.
In one embodiment, the apparatus further comprises a creating module.
The creating module is used for creating index information corresponding to the dimension member information by using the array according to the storage condition and the identification bit corresponding to the dimension member information when the information to be stored is continuous data; and when the information to be stored is discontinuous data, creating index information corresponding to the dimension member information by using the index Map according to the storage condition and the identification bit corresponding to the dimension member information.
For specific limitations of the indexing device for multidimensional data, reference may be made to the above limitations on the indexing method for multidimensional data, which are not described herein again. The modules in the indexing device for multidimensional data can be wholly or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 8. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing index data of multi-dimensional data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of indexing multidimensional data.
Those skilled in the art will appreciate that the architecture shown in fig. 8 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the steps of the above-described method embodiments being implemented when the computer program is executed by the processor.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments express only several embodiments of the present application, and although their description is relatively specific and detailed, it is not to be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A method of indexing multidimensional data, the method comprising:
acquiring information to be inquired; the information to be inquired comprises dimension information;
identifying dimension member information corresponding to the dimension information, wherein the dimension member information carries a dimension member identifier;
acquiring index information corresponding to the dimension member identification;
performing superposition operation on the index information to obtain index information of a designated dimension combination corresponding to the information to be inquired;
traversing and querying the position with the identification bit in the index information of the specified dimension combination to obtain the storage address of the specified dimension combination corresponding to the information to be queried;
and accessing the storage address to obtain detailed data information corresponding to the information to be inquired.
2. The method according to claim 1, wherein the dimension member information identified for the dimension information comprises at least two pieces of dimension member information; the acquiring of the index information corresponding to the dimension member identifier includes:
acquiring first-level index information corresponding to the dimension information; the first-level index information is used for storing dimension member information corresponding to each dimension information and second-level index information corresponding to each dimension member information;
and acquiring secondary index information corresponding to the dimension member identification in the primary index information.
3. The method according to claim 2, wherein the obtaining of the secondary index information corresponding to the dimension member identifier in the primary index information comprises:
acquiring secondary index position information corresponding to the dimension member identification in the primary index information;
and acquiring secondary index information corresponding to the dimension member identification according to the secondary index position information.
4. The method of claim 1, wherein the superimposing the index information comprises:
acquiring a query condition corresponding to the information to be queried;
and performing an And operation or an Or operation on the detail record information according to the query condition.
5. A method of index generation for multidimensional data, the method comprising:
acquiring information to be stored; the information to be stored comprises dimension information;
identifying dimension member information corresponding to the dimension information;
acquiring a storage condition corresponding to the information to be stored;
generating index information corresponding to the dimension member information according to the storage condition and the identification bit corresponding to the dimension member information;
and storing the detailed data information corresponding to the dimension member information into the storage address recorded in the index information.
6. The method of claim 5, wherein the generating index information corresponding to the dimension member information according to the storage condition and an identification bit corresponding to the dimension member information comprises:
when the information to be stored is continuous data, utilizing an array to create index information corresponding to the dimension member information according to the storage condition and the identification bit corresponding to the dimension member information;
and when the information to be stored is discontinuous data, creating index information corresponding to the dimension member information by using a Map according to the storage condition and the identification bit corresponding to the dimension member information.
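Claims 5 and 6 can be illustrated with a small sketch that picks an array-backed index for continuous data and a Map-backed index otherwise. It rests on several assumptions that the claims do not state: "continuous" is read as "the storage addresses form an unbroken run", the identification bit is 1, and the names `ArrayIndex`, `MapIndex`, `is_continuous` and `generate_index` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ArrayIndex:
    """Dense form: bits[i] is the identification bit for address base + i."""
    base: int
    bits: list

    def addresses(self):
        return [self.base + i for i, bit in enumerate(self.bits) if bit]


@dataclass
class MapIndex:
    """Sparse form: only addresses that carry an identification bit are kept."""
    bits: dict

    def addresses(self):
        return sorted(self.bits)


def is_continuous(addresses):
    """True when the addresses form an unbroken run such as 4, 5, 6."""
    ordered = sorted(addresses)
    return all(b - a == 1 for a, b in zip(ordered, ordered[1:]))


def generate_index(to_store):
    """to_store maps a storage address to (dimension member, detail value)."""
    addresses = sorted(to_store)
    members = {member for member, _ in to_store.values()}
    dense = is_continuous(addresses)  # assumed reading of the storage condition
    index = {}
    for member in members:
        if dense:
            bits = [1 if to_store[a][0] == member else 0 for a in addresses]
            index[member] = ArrayIndex(addresses[0], bits)
        else:
            index[member] = MapIndex(
                {a: 1 for a in addresses if to_store[a][0] == member}
            )
    # Store each detail value at the address recorded in the index.
    storage = {}
    for member_index in index.values():
        for address in member_index.addresses():
            storage[address] = to_store[address][1]
    return index, storage


dense_index, dense_storage = generate_index(
    {0: ("east", 10.0), 1: ("west", 20.0), 2: ("east", 30.0)}
)
sparse_index, sparse_storage = generate_index(
    {3: ("east", 1.0), 90: ("west", 2.0), 4000: ("east", 3.0)}
)
```

The array form costs one slot per address in the run, which is cheap when the run is unbroken; the Map form avoids allocating slots for the gaps between widely scattered addresses.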
7. An apparatus for indexing multidimensional data, the apparatus comprising:
the acquisition module is used for acquiring information to be queried; the information to be queried comprises dimension information;
the identification module is used for identifying dimension member information corresponding to the dimension information, and the dimension member information carries a dimension member identifier;
the acquisition module is further used for acquiring index information corresponding to the dimension member identifier;
the operation module is used for performing a superposition operation on the index information to obtain index information of a specified dimension combination corresponding to the information to be queried;
the query module is used for traversing and querying the positions with identification bits in the index information of the specified dimension combination to obtain a storage address of the specified dimension combination corresponding to the information to be queried;
and the access module is used for accessing the storage address to obtain detailed data information corresponding to the information to be queried.
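The module chain of claim 7 can be sketched as one class whose methods stand in for the modules. This is a hedged illustration: the class name `QueryDevice`, its method names, the use of a `(dimension, member identifier)` tuple as the index key, the AND-only superposition, and the list that maps bit positions to storage addresses are all assumptions made for the example.

```python
from functools import reduce


class QueryDevice:
    """Hypothetical stand-in for the apparatus; each method mirrors one module."""

    def __init__(self, member_index, addresses, storage):
        self.member_index = member_index  # (dimension, member id) -> bitmap
        self.addresses = addresses        # bit position -> storage address
        self.storage = storage            # storage address -> detailed data

    def identify(self, query):
        """Identification module: dimension info -> dimension member identifiers."""
        return list(query.items())

    def acquire_indexes(self, members):
        """Acquisition module: fetch the index (bitmap) for each member."""
        return [self.member_index.get(key, 0) for key in members]

    def superimpose(self, bitmaps):
        """Operation module: AND the bitmaps into one dimension-combination index."""
        return reduce(lambda a, b: a & b, bitmaps) if bitmaps else 0

    def locate(self, combined):
        """Query module: traverse set bits and map them to storage addresses."""
        found, position = [], 0
        while combined:
            if combined & 1:
                found.append(self.addresses[position])
            combined >>= 1
            position += 1
        return found

    def run(self, query):
        """Access module: read the detailed data at each located address."""
        members = self.identify(query)
        combined = self.superimpose(self.acquire_indexes(members))
        return [self.storage[address] for address in self.locate(combined)]


device = QueryDevice(
    member_index={("region", "east"): 0b0101, ("year", "2020"): 0b0110},
    addresses=[100, 108, 116, 124],
    storage={100: "r0", 108: "r1", 116: "r2", 124: "r3"},
)
result = device.run({"region": "east", "year": "2020"})  # ["r2"]
```

Only the addresses whose bits survive the superposition are read, so the detailed data of non-matching records is never touched.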
8. An apparatus for generating an index of multidimensional data, the apparatus comprising:
the acquisition module is used for acquiring information to be stored; the information to be stored comprises dimension information;
the identification module is used for identifying dimension member information corresponding to the dimension information;
the acquisition module is also used for acquiring the storage condition corresponding to the information to be stored;
the generating module is used for generating index information corresponding to the dimension member information according to the storage condition and the identification bit corresponding to the dimension member information;
and the storage module is used for storing the detailed data information corresponding to the dimension member information into the storage address recorded in the index information.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 6.

Priority Applications (1)

Application Number Publication Number Priority Date Filing Date Title
CN202011184974.1A CN112434027A (en) 2020-10-30 2020-10-30 Indexing method and device for multi-dimensional data, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112434027A (en) 2021-03-02

Family

ID=74696520

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011184974.1A Pending CN112434027A (en) 2020-10-30 2020-10-30 Indexing method and device for multi-dimensional data, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112434027A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105824876A (en) * 2016-03-01 2016-08-03 乐视网信息技术(北京)股份有限公司 Data querying method and device
CN109857754A (en) * 2018-11-29 2019-06-07 华迪计算机集团有限公司 A kind of information text searching method and system based on information access rights in domain
CN110955665A (en) * 2019-12-03 2020-04-03 支付宝(杭州)信息技术有限公司 Cache query method and device and electronic equipment
CN111611225A (en) * 2020-05-15 2020-09-01 腾讯科技(深圳)有限公司 Data storage management method, query method, device, electronic equipment and medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113268487A (en) * 2021-06-16 2021-08-17 中移(杭州)信息技术有限公司 Data statistical method, device and computer readable storage medium
CN113343043A (en) * 2021-06-29 2021-09-03 北京奇艺世纪科技有限公司 Index construction method, index retrieval method, corresponding device, terminal and medium
CN113343043B (en) * 2021-06-29 2023-06-23 北京奇艺世纪科技有限公司 Index construction method, index retrieval method, and corresponding device, terminal and medium
CN113946585A (en) * 2021-10-28 2022-01-18 苏州贝塔智能制造有限公司 Clothing piece data index construction method, clothing piece data index search method and clothing piece sorting method
CN113946585B (en) * 2021-10-28 2022-08-26 苏州贝塔智能制造有限公司 Clothing piece data index construction method, clothing piece data index search method and clothing piece sorting method
CN114547380A (en) * 2022-01-25 2022-05-27 北京元年科技股份有限公司 Data traversal query method and device, electronic equipment and readable storage medium
CN114547380B (en) * 2022-01-25 2022-11-15 北京元年科技股份有限公司 Data traversal query method and device, electronic equipment and readable storage medium

Similar Documents

Publication Publication Date Title
CN112434027A (en) Indexing method and device for multi-dimensional data, computer equipment and storage medium
US10725981B1 (en) Analyzing big data
US9361320B1 (en) Modeling big data
CA2562281C (en) Partial query caching
CN112287182B (en) Graph data storage and processing method and device and computer storage medium
CN112363979B (en) Distributed index method and system based on graph database
US20170255709A1 (en) Atomic updating of graph database index structures
CN104123288A (en) Method and device for inquiring data
US20170255708A1 (en) Index structures for graph databases
CN107203640B (en) Method and system for establishing physical model through database operation record
CN107016047A (en) Document query, document storing method and device
CN113127848A (en) Storage method of permission system data and related equipment
CN114691721A (en) Graph data query method and device, electronic equipment and storage medium
CN115552390A (en) Server-free data lake indexing subsystem and application programming interface
CN105550332A (en) Dual-layer index structure based origin graph query method
CN113918605A (en) Data query method, device, equipment and computer storage medium
CN102193988A (en) Method and system for retrieving node data in graphic database
US8548980B2 (en) Accelerating queries based on exact knowledge of specific rows satisfying local conditions
CN115858471A (en) Service data change recording method, device, computer equipment and medium
JP7373663B2 (en) Universal data index for rapid data exploration
CN114238334A (en) Heterogeneous data encoding method and device, heterogeneous data decoding method and device, computer equipment and storage medium
CN114461606A (en) Data storage method and device, computer equipment and storage medium
CN114356945A (en) Data processing method, data processing device, computer equipment and storage medium
CN112416966A (en) Ad hoc query method, apparatus, computer device and storage medium
CN117540056B (en) Method, device, computer equipment and storage medium for data query

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination