CN107704527A - Data storage method, device and storage medium - Google Patents

Publication number
CN107704527A
Authority
CN
China
Prior art keywords
bitmap
reduce
map
partition
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710841916.3A
Other languages
Chinese (zh)
Other versions
CN107704527B (en)
Inventor
钟超强
毕杰山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201710841916.3A (CN107704527B)
Publication of CN107704527A
Priority to PCT/CN2018/087377 (WO2019052209A1)
Application granted
Publication of CN107704527B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2237 Vectors, bitmaps or matrices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating

Abstract

This application discloses a data storage method, device, and storage medium, and belongs to the field of information processing technologies. The method includes: when at least one data record is obtained, determining, by using a preset map/reduce model and based on the at least one data record, the bitmap of each label value in each bitmap index partition included in a bitmap index, so that the at least one data record is stored into the corresponding bitmap index partitions. Because the bitmap index includes at least one bitmap and each bitmap corresponds to one label value, the identifiers of the carriers (supporting bodies) that have a given label value can be looked up through the bitmap index, which improves the efficiency of label-value-based data queries. In addition, the preset map/reduce model allows the bitmaps of the label values in the bitmap index partitions to be determined in parallel, which improves the efficiency of data storage.

Description

Data storage method, device and storage medium
Technical field
This application relates to the field of information processing technologies, and in particular to a data storage method, device, and storage medium.
Background
The Hadoop database (Hadoop Database, HBase) is distributed, highly reliable, high-performance, and based on key-value storage. As a result, a growing number of enterprises and users build data tables on HBase.
Typically, a data table contains multiple rows of data records, and each row records the identifier of a carrier (supporting body) and the label value of each label that the carrier has. For example, user A has two label values, "female" for gender and "engineer" for occupation, so the row for user A in the data table contains user A's identifier, the label value "female", and the label value "engineer". In other words, the data table records the correspondence between a carrier's identifier and its label values.
With this storage layout, querying the data table by carrier identifier is efficient. Querying by a single label value or by a combination of label values is not: in the related art, a column value filter has to be used to check the label values of each carrier row by row. Because a data table typically runs to many thousands of rows, label-value-based data queries in the related art are inefficient.
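The contrast between the two query directions can be sketched as follows. This is a minimal illustration only: a plain Python dict stands in for the data table, and the names `data_table`, `labels_of`, and `carriers_with` are hypothetical rather than part of HBase or of the patent.

```python
# Hypothetical in-memory stand-in for the data table: carrier id -> label values.
data_table = {
    "userA": {"female", "engineer"},
    "userB": {"male", "engineer"},
    "userC": {"female", "teacher"},
}

def labels_of(carrier_id):
    """Query by carrier identifier: a single keyed lookup."""
    return data_table[carrier_id]

def carriers_with(label_value):
    """Query by label value: every row has to be inspected."""
    return sorted(cid for cid, labels in data_table.items() if label_value in labels)

print(labels_of("userA"))
print(carriers_with("engineer"))
```

The second function touches every row, which is the cost that the bitmap index described in this application is meant to avoid.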
Summary
To solve the problem in the related art that label-value-based data queries are inefficient, this application provides a data storage method, device, and storage medium. The technical solutions are as follows:
According to a first aspect, a data storage method is provided. The method includes:
obtaining at least one data record, where each data record includes one carrier identifier and at least one label value;
performing first-type classification on the at least one data record, based on the carrier identifier included in each data record and according to the partition information of N first reduce partitions included in a preset map/reduce model, to obtain at least one first mapping set, where each first mapping set corresponds to one first reduce partition;
where the N first reduce partitions are determined according to the partition information of N bitmap index partitions included in a bitmap index, N is a positive integer, each bitmap index partition corresponds to one first reduce partition, each bitmap index partition includes at least one bitmap, each bitmap corresponds to one label value, each bitmap includes at least one bitmap bit, and each bitmap bit records whether the carrier corresponding to one carrier identifier has the label value corresponding to the current bitmap;
performing, in parallel, first-type reduce processing on the at least one first mapping set by using the first reduce partition corresponding to each first mapping set, to obtain the bitmaps of the label values in each bitmap index partition; and
storing the obtained bitmaps of the label values in each bitmap index partition into the corresponding bitmap index partition.
In the embodiments of the present invention, when at least one data record is obtained, it can be stored into the bitmap index based on the preset map/reduce model, so that once the data is stored, the identifiers of the carriers that have a given label value can be looked up through the bitmap index. In addition, the preset map/reduce model allows the bitmaps of the label values in each bitmap index partition to be determined in parallel, which improves the efficiency of storing the at least one data record into the N bitmap index partitions.
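The four steps of the first aspect (map, classify by reduce partition, reduce in parallel, store) can be sketched end to end as below. This is an illustration under stated assumptions, not the patented implementation: carrier identifiers are assumed to be integers, the bitmap bit of a carrier is assumed to be its offset within its partition, a thread pool stands in for parallel reduce tasks, and `SPLIT_POINTS`, `INDEX_TABLE`, and all function names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Assumed layout: N = 2 bitmap index partitions, split at carrier id 100.
# Partition 0 covers ids < 100, partition 1 covers ids >= 100.
SPLIT_POINTS = [100]
INDEX_TABLE = "bitmap_index"  # hypothetical bitmap index table identifier

def map_record(record):
    """First-type map: tag the record with the bitmap index table identifier."""
    carrier_id, label_values = record
    return (INDEX_TABLE, carrier_id, label_values)

def classify(mapped):
    """First-type classification: group mapping results by reduce partition."""
    sets = defaultdict(list)
    for result in mapped:
        sets[bisect_right(SPLIT_POINTS, result[1])].append(result)
    return sets

def reduce_partition(item):
    """First-type reduce: build one bitmap (a Python int) per label value."""
    partition, results = item
    base = 0 if partition == 0 else SPLIT_POINTS[partition - 1]
    bitmaps = defaultdict(int)
    for _, carrier_id, label_values in sorted(results, key=lambda r: r[1]):
        for label in label_values:
            # Assumption: the bitmap bit is the carrier's offset in the partition.
            bitmaps[label] |= 1 << (carrier_id - base)
    return partition, dict(bitmaps)

def store_records(records):
    """Return partition number -> {label value -> bitmap}."""
    mapping_sets = classify(map(map_record, records))
    with ThreadPoolExecutor() as pool:  # one reduce task per mapping set
        return dict(pool.map(reduce_partition, mapping_sets.items()))

index = store_records([(3, ["female"]), (5, ["female", "engineer"]), (101, ["engineer"])])
print(index)
```

Each reduce task only touches the bitmaps of its own partition, which is why the tasks can run independently of one another.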
Optionally, the partition information of each first reduce partition consists of a bitmap index table identifier and the carrier identifiers within a preset interval range;
the performing first-type classification on the at least one data record according to the partition information of the N first reduce partitions included in the preset map/reduce model, to obtain at least one first mapping set, includes:
performing, in parallel, first-type map processing on the at least one data record by using the preset map/reduce model, to obtain at least one first mapping result, where each first mapping result includes the bitmap index table identifier, a carrier identifier, and at least one label value; and
classifying the at least one first mapping result according to the partition information of the N first reduce partitions, to obtain the at least one first mapping set.
That is, before the first-type classification is performed on the at least one data record according to the partition information of the N first reduce partitions included in the preset map/reduce model, first-type map processing is performed on each data record in parallel by using the preset map/reduce model, so that the resulting first mapping results can then be classified.
Optionally, the performing, in parallel, first-type reduce processing on the at least one first mapping set by using the first reduce partition corresponding to each first mapping set, to obtain the bitmaps of the label values in each bitmap index partition, includes:
for each first mapping set, determining the first reduce partition corresponding to the first mapping set;
sorting, by the first reduce partition, the first mapping results in the first mapping set according to the carrier identifier in each first mapping result; and
for each sorted first mapping result, in sorted order, obtaining, from the bitmap index partition corresponding to the first reduce partition, the bitmap of each of the at least one label value included in the first mapping result, and updating the bitmap of the label value according to the bitmap bit of the carrier identifier.
Each reduce partition processes the data that belongs to it in a certain order. Therefore, for each first mapping set, the corresponding first reduce partition may first sort the first mapping results in the set and then process them one by one in sorted order.
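The sort-then-update behaviour of a single first reduce partition can be sketched as follows. The function and parameter names are hypothetical, bitmaps are modelled as Python integers, and the bitmap bit of every carrier identifier is assumed to be known already.

```python
def reduce_mapping_set(mapping_set, stored_bitmaps, bit_of):
    """Update the stored bitmaps in place from one first mapping set.

    mapping_set    -- list of (carrier_id, label_values) mapping results
    stored_bitmaps -- label value -> bitmap (int) already in the index partition
    bit_of         -- carrier_id -> bitmap bit (assumed to be allocated already)
    """
    for carrier_id, label_values in sorted(mapping_set):  # order by carrier id
        for label in label_values:
            bitmap = stored_bitmaps.get(label, 0)          # fetch the existing bitmap
            stored_bitmaps[label] = bitmap | (1 << bit_of[carrier_id])
    return stored_bitmaps

partition_bitmaps = {"engineer": 0b001}  # the carrier at bit 0 is already an engineer
bits = {"u1": 0, "u2": 1, "u3": 2}
reduce_mapping_set([("u3", ["engineer"]), ("u2", ["female"])], partition_bitmaps, bits)
print(partition_bitmaps)
```

Because the update is a bitwise OR, processing the same mapping result twice leaves the bitmap unchanged.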
Optionally, before the updating the bitmap of the label value according to the bitmap bit of the carrier identifier, the method further includes:
when the first mapping result also includes the bitmap bit of the carrier identifier, performing the operation of updating the bitmap of the label value according to the bitmap bit of the carrier identifier; or
when the first mapping result does not include the bitmap bit of the carrier identifier, obtaining the bitmap bit of the carrier identifier, and then performing the operation of updating the bitmap of the label value according to the bitmap bit of the carrier identifier.
Updating the bitmap of a label value requires first determining the bitmap bit of the carrier identifier. The system may or may not have allocated a bitmap bit to the carrier identifier in advance, so a first mapping result may or may not include the bitmap bit of the carrier identifier. When it does not, the bitmap bit of the carrier identifier must be obtained before the bitmap of the label value is updated.
Optionally, after the obtaining the bitmap bit of the carrier identifier, the method further includes:
storing the correspondence between the bitmap bit of the carrier identifier and the carrier identifier.
That is, when the first mapping result does not include the bitmap bit of the carrier identifier, after the bitmap bit is obtained, the correspondence between the bitmap bit and the carrier identifier may also be stored, so that the bitmap bit can later be looked up by carrier identifier, or the carrier identifier looked up by bitmap bit.
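One way to obtain a bitmap bit for a carrier identifier and keep the correspondence queryable in both directions is sketched below. The class name `BitAllocator` and the "next free bit" allocation policy are assumptions for illustration; the text does not specify how bits are chosen.

```python
class BitAllocator:
    """Assigns bitmap bits to carrier ids and keeps both lookup directions."""

    def __init__(self):
        self.bit_by_carrier = {}   # carrier id -> bitmap bit
        self.carrier_by_bit = []   # bitmap bit -> carrier id

    def bit_for(self, carrier_id):
        bit = self.bit_by_carrier.get(carrier_id)
        if bit is None:
            # No bit configured yet: take the next free one (assumed policy)
            # and store the correspondence in both directions.
            bit = len(self.carrier_by_bit)
            self.bit_by_carrier[carrier_id] = bit
            self.carrier_by_bit.append(carrier_id)
        return bit

allocator = BitAllocator()
first = allocator.bit_for("u1")
second = allocator.bit_for("u2")
repeat = allocator.bit_for("u1")
print(first, second, repeat, allocator.carrier_by_bit[second])
```

Storing the reverse direction is what lets a query turn the set bits of a label value's bitmap back into carrier identifiers.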
Optionally, before the performing first-type classification on the at least one data record based on the carrier identifier included in each data record and according to the partition information of the N first reduce partitions included in the preset map/reduce model, the method further includes:
determining the partition information of the bitmap index, where the partition information of the bitmap index describes the set of carrier identifiers corresponding to each bitmap index partition in the bitmap index; and
determining the N first reduce partitions in the preset map/reduce model according to the partition information of the bitmap index.
Because the first-type classification is performed according to the partition information of the N first reduce partitions included in the preset map/reduce model, that partition information may be determined from the partition information of the bitmap index before the first-type classification is performed on the at least one data record.
Optionally, after the obtaining at least one data record, the method further includes:
performing second-type classification on the at least one data record, based on the carrier identifier included in each data record and according to the partition information of M second reduce partitions included in the preset map/reduce model, to obtain at least one second mapping set, where each second mapping set corresponds to one second reduce partition;
where the M second reduce partitions are determined according to the partition information of M data partitions included in a data table, M is a positive integer, each data partition corresponds to one second reduce partition, and each data partition records the correspondence between carrier identifiers and label values;
performing, in parallel, second-type reduce processing on the at least one second mapping set by using the second reduce partition corresponding to each second mapping set, to obtain the data in each data partition; and
storing the obtained data of each data partition into the corresponding data partition.
That is, in the embodiments of the present invention, when at least one data record is obtained, it can also be stored into the data table based on the preset map/reduce model, so that the at least one data record is stored into both the bitmap index and the data table. In addition, the preset map/reduce model allows the data in each data partition to be determined in parallel, which improves the efficiency of storing the at least one data record into the M data partitions.
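The second-type steps follow the same map, classify, and reduce shape, except that the output of each reduce partition is a set of data table rows rather than bitmaps. The sketch below assumes integer carrier identifiers and M = 4 data partitions; `DATA_SPLITS`, `DATA_TABLE`, and the function names are hypothetical, and the reduce step runs sequentially here for brevity even though the text describes it as parallel.

```python
from bisect import bisect_right
from collections import defaultdict

DATA_SPLITS = [50, 100, 150]   # assumed boundaries: M = 4 data partitions
DATA_TABLE = "data_table"      # hypothetical data table identifier

def second_map(record):
    """Second-type map: tag the record with the data table identifier."""
    carrier_id, label_values = record
    return (DATA_TABLE, carrier_id, tuple(label_values))

def second_classify(mapped):
    """Second-type classification: group mapping results by data partition."""
    sets = defaultdict(list)
    for result in mapped:
        sets[bisect_right(DATA_SPLITS, result[1])].append(result)
    return sets

def second_reduce(results):
    """Produce the rows of one data partition: carrier id -> label values."""
    return {carrier_id: labels for _, carrier_id, labels in sorted(results)}

records = [(120, ["engineer"]), (3, ["female"]), (101, ["male", "teacher"])]
partitions = {
    number: second_reduce(results)
    for number, results in second_classify(map(second_map, records)).items()
}
print(partitions)
```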
Optionally, the partition information of each second reduce partition consists of a data table identifier and the carrier identifiers within a preset interval range;
the performing second-type classification on the at least one data record according to the partition information of the M second reduce partitions included in the preset map/reduce model includes:
performing, in parallel, second-type map processing on the at least one data record by using the preset map/reduce model, to obtain at least one second mapping result, where each second mapping result includes the data table identifier, a carrier identifier, and at least one label value; and
classifying the at least one second mapping result according to the partition information of the M second reduce partitions, to obtain the at least one second mapping set.
That is, before the second-type classification is performed on the at least one data record according to the partition information of the M second reduce partitions included in the preset map/reduce model, second-type map processing is performed on each data record in parallel by using the preset map/reduce model, so that the resulting second mapping results can then be classified.
Optionally, N is less than or equal to M, N is greater than or equal to 2, each of the M data partitions belongs to exactly one bitmap index partition, and each of the N bitmap index partitions includes at least one data partition.
To improve the efficiency of querying data from both the data table and the bitmap index partitions, the M data partitions of the data table and the N bitmap index partitions of the bitmap index may satisfy the above condition. That is, the range of each data partition in the data table may be set somewhat smaller, and the range of each bitmap index partition in the bitmap index somewhat larger.
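The condition on N and M can be checked mechanically. The sketch below assumes that both kinds of partition are described by half-open `(low, high)` intervals of integer carrier identifiers; `check_layout` is a hypothetical helper, not something defined by the patent.

```python
def check_layout(data_parts, index_parts):
    """data_parts / index_parts: lists of half-open (low, high) id intervals."""
    m, n = len(data_parts), len(index_parts)
    if not 2 <= n <= m:
        return False
    owners = []
    for low, high in data_parts:
        containing = [
            i for i, (index_low, index_high) in enumerate(index_parts)
            if index_low <= low and high <= index_high
        ]
        if len(containing) != 1:   # must belong to exactly one index partition
            return False
        owners.append(containing[0])
    # Every bitmap index partition must hold at least one data partition.
    return set(owners) == set(range(n))

data = [(0, 50), (50, 100), (100, 150), (150, 200)]
print(check_layout(data, [(0, 100), (100, 200)]))   # two data partitions per index partition
print(check_layout(data, [(0, 75), (75, 200)]))     # (50, 100) straddles a boundary
```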
According to a second aspect, a data storage device is provided. The data storage device has the function of implementing the behavior of the data storage method in the first aspect. The data storage device includes at least one module, and the at least one module is configured to implement the data storage method provided in the first aspect.
According to a third aspect, another data storage device is provided. Its structure includes a processor and a memory. The memory is used to store a program that supports the data storage device in performing the data storage method provided in the first aspect, and to store the data involved in implementing that method. The processor is configured to execute the program stored in the memory. The data storage device may further include a communication bus, which is used to establish a connection between the processor and the memory.
According to a fourth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores instructions that, when run on a computer, cause the computer to perform the data storage method according to the first aspect.
According to a fifth aspect, a computer program product including instructions is provided. When the instructions are run on a computer, the computer performs the data storage method according to the first aspect.
The technical solutions provided in this application bring the following beneficial effects:
In this application, when at least one data record is obtained, the bitmap of each label value in each bitmap index partition included in the bitmap index can be determined by using the preset map/reduce model and based on the at least one data record, so that the at least one data record is stored into the corresponding bitmap index partitions. Because the bitmap index includes at least one bitmap and each bitmap corresponds to one label value, the identifiers of the carriers that have a given label value can be looked up through the bitmap index, which improves the efficiency of label-value-based data queries. In addition, the preset map/reduce model allows the bitmaps of the label values in each bitmap index partition to be determined in parallel, which improves the efficiency of data storage.
Brief description of the drawings
Fig. 1 is a schematic diagram of the bitmaps of label values according to an embodiment of the present invention;
Fig. 2 is a flowchart of a data storage method according to an embodiment of the present invention;
Fig. 3 is a flowchart of another data storage method according to an embodiment of the present invention;
Fig. 4 is a flowchart of another data storage method according to an embodiment of the present invention;
Fig. 5A is a block diagram of a data storage device according to an embodiment of the present invention;
Fig. 5B is a block diagram of a first classification module according to an embodiment of the present invention;
Fig. 5C is a block diagram of another data storage device according to an embodiment of the present invention;
Fig. 5D is a block diagram of a second classification module according to an embodiment of the present invention;
Fig. 6 is a block diagram of another data storage device according to an embodiment of the present invention.
Detailed description
For ease of understanding, the terms used in the embodiments of the present invention are briefly introduced first.
Label: a way of organizing content that characterizes a certain feature of data and thereby helps people describe and classify content. Common labels include gender, education, occupation, and color. Optionally, labels are defined manually.
In one possible implementation, labels are of two kinds: enumerated labels and Boolean labels. An enumerated label has multiple enumerated values. For example, education includes training, undergraduate, postgraduate, doctorate, and so on, and gender includes male or female. A Boolean label only indicates whether the label is possessed, for example, whether the person owns a home, uses drugs, or has a criminal record.
When a label is an enumerated label, its label value is the specific value of the label. Taking education as an example, when the education is undergraduate, the label value is "undergraduate"; when it is postgraduate, the label value is "postgraduate". When a label is a Boolean label, the label value is the label itself. For example, when a user owns a home, the label value is "owns a home"; when a user has no criminal record, the label value is "no criminal record".
Carrier (supporting body): the object that labels describe. Optionally, a carrier can be a person, a car, a telephone number, a virtual user account, and so on. A carrier can have one label or multiple labels. For example, when the carrier is a person, the labels that describe the person can include gender, education, whether the person owns a home, whether the person has a criminal record, and so on. When the carrier is a car, the labels that describe the car can include color, whether it has a violation record, and so on.
Data table: data records established in a database with the carrier as the index. Each data record in the data table records the identifier of one carrier, all the label values that the carrier has, and the correspondence between the carrier's identifier and those label values.
Bitmap index: a secondary index established in a database with the label values in the data table as the index. Optionally, the bitmap index records label values and bitmaps, as well as the one-to-one correspondence between them. Each bitmap bit in a bitmap corresponds to the identifier of one carrier, and different bitmap bits in a bitmap correspond to the identifiers of different carriers; that is, the bitmap bits in a bitmap are in one-to-one correspondence with the identifiers of all carriers. Each bitmap bit records whether the carrier corresponding to a carrier identifier has the label value corresponding to the current bitmap (the bitmap that contains the bit). For example, if a bitmap bit in the bitmap of a label value is 1, the carrier corresponding to that bit has the label value; if the bit is 0, the carrier does not. The same bitmap bit in different bitmaps corresponds to the identifier of the same carrier.
Take virtual user accounts as the carriers, and assume there are 8 virtual user accounts: user1, user2, ..., user8. The set of accounts with the label value "online shopping expert" is user1, user4, and user8, and the set with the label value "forum activist" is user1, user2, and user8. The bitmap bits allocated to the 8 virtual user accounts in a bitmap are 1, 2, 3, ..., 8 in order, as shown in Fig. 1. The bitmap for "online shopping expert" is "10010001", and the bitmap for "forum activist" is "11000001". In the bitmap "10010001", the first "1" indicates that the account at bitmap bit 1 is an online shopping expert, the second "1" indicates that the account at bitmap bit 4 is also an online shopping expert, and the third "1" indicates the same for the account at bitmap bit 8. The bitmap "11000001" for "forum activist" is read in the same way. As Fig. 1 shows, user1 and user8 have both label values, "online shopping expert" and "forum activist".
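The 8-account example can be reproduced in a few lines. In this sketch the bitmaps are rendered as strings so that they match the text directly; the helper name `bitmap_for` is hypothetical, and "online shopping expert" is the label value that the machine translation also renders as "net purchase intelligent".

```python
users = [f"user{i}" for i in range(1, 9)]            # bitmap bits 1..8, in order

def bitmap_for(members):
    """Render a bitmap string: '1' where the user has the label value."""
    return "".join("1" if user in members else "0" for user in users)

shopping = bitmap_for({"user1", "user4", "user8"})    # "online shopping expert"
forum = bitmap_for({"user1", "user2", "user8"})       # "forum activist"

# A position-wise AND finds the users that have both label values.
both = [user for user, a, b in zip(users, shopping, forum) if a == b == "1"]
print(shopping, forum, both)   # 10010001 11000001 ['user1', 'user8']
```

A combined query over two label values therefore reduces to one pass over two bitmaps, with no need to read the rows of the data table.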
Next, the application scenario of the embodiments of the present invention is introduced. In practice, a client usually queries data through a server. For example, when the client sends the server a label query request for a carrier, the server looks up, in a pre-stored data table and by the carrier's identifier, the at least one label value corresponding to that identifier, and determines the labels corresponding to those label values as the labels that the carrier has. Such queries are efficient. As another example, when the client sends the server a query request for a label value, the server checks the label values of each carrier in the data table one by one to determine which carriers have the label value. Such queries are inefficient. It follows that how the server stores the correspondence between carriers and label values affects the efficiency of the client's subsequent queries through the server. The embodiments of the present invention apply to the scenario of how a server stores data.
That is, the data storage method provided in the embodiments of the present invention is applied to a server. The server may be one server or multiple servers; optionally, multiple servers provide database services to terminals as a server cluster. In one possible implementation, a database is deployed on the server. The database may be HBase, MongoDB (Mongo Database), a Distributed Relational Database Service (DRDS), VoltDB (Volt Database), or the ScaleBase distributed database.
It should be noted that the data storage method provided in the embodiments of the present invention mainly consists of two parts: storing the at least one data record into the bitmap index, and storing the at least one data record into the data table. For ease of later description, the bitmap index and the data table provided in the embodiments of the present invention are introduced here first.
The data table records the correspondence between carrier identifiers and label values. The bitmap index includes at least one bitmap, each bitmap corresponds to one label value, each bitmap includes at least one bitmap bit, and each bitmap bit records whether the carrier corresponding to one carrier identifier has the label value corresponding to the current bitmap.
Further, to improve the efficiency of querying data from the data table and the bitmap index, the data table may be divided into M data partitions, which are stored in a distributed manner across different data stores. Likewise, the bitmap index may be divided into N bitmap index partitions, which are also stored in a distributed manner across different data stores. That is, the data table includes M data partitions, and the bitmap index includes N bitmap index partitions.
It is worth noting that, it is to inquire about the carrying for the ease of subsequently being identified according to supporting body to store data in tables of data Label value corresponding to body mark, therefore, can be suitably by tables of data in order to improve the efficiency that data are inquired about from tables of data First scope of each data partition sets a little bit smaller.And for the ease of subsequently according to mark when storing data in bitmap index Supporting body mark corresponding to the lookup of label value, because each bitmap index subregion includes all label values in tag definition table, because This, can be suitably by each bitmap index in bitmap index in order to improve the efficiency that data are inquired about from bitmap index subregion Second scope of subregion set it is big a bit.That is, for each bitmap index subregion in N number of bitmap index subregion, the position Index of the picture subregion includes the data at least one data partition.
That is, in embodiments of the present invention, M and N can meet following relation, and N is less than or equal to M, and N is more than or equal to 2, Each data partition in M data partition belongs to unique bitmap index subregion, each bitmap in N number of bitmap index subregion Index partition includes at least one data partition.
The optional mode of data partition is divided, data partition can be divided by specifying the quantity of data partition, or The subregion section of each data partition can directly be defined.In embodiments of the present invention, directly to define each data partition Subregion illustrates exemplified by section., will be every in order to subsequently be easy to illustrate when directly defining the subregion section of each data partition The collection of supporting body mark corresponding to individual data partition is collectively referred to as the first scope, namely supporting body mark corresponding to each data partition The set of knowledge is identical.Each data partition is used to store data record of the supporting body mark positioned at the subregion section, and often It is not present and occurs simultaneously between individual subregion section, avoids same data record from being stored in two different data partitions.
For example, the following partition intervals are set for the data table: data partition 1: [, a1), data partition 2: [a1, a2), data partition 3: [a2, a3), ..., data partition 9: [a8, a9). Data partition 1 stores the data records whose carrier identifiers fall within [, a1), data partition 2 stores those within [a1, a2), data partition 3 stores those within [a2, a3), ..., and data partition 9 stores those within [a8, a9). The partition intervals [, a1), [a1, a2), [a2, a3), ..., and [a8, a9) are pairwise disjoint.
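Locating the data partition for a carrier identifier under such half-open intervals can be sketched as a binary search over the interval bounds. This is a minimal illustration only: the concrete bounds "b", "c", "d" stand in for a1, a2, a3 and, like the function name, are assumptions not taken from the embodiment.

```python
from bisect import bisect_right

# Hypothetical upper bounds standing in for a1, a2, a3 of the intervals
# [, a1), [a1, a2), [a2, a3); compared in lexicographic order.
UPPER_BOUNDS = ["b", "c", "d"]

def find_data_partition(carrier_id, upper_bounds=UPPER_BOUNDS):
    """Return the 1-based number of the partition whose interval holds carrier_id."""
    idx = bisect_right(upper_bounds, carrier_id)
    if idx == len(upper_bounds):
        # The last interval is half-open, so its upper bound is excluded.
        raise ValueError(f"{carrier_id!r} lies beyond the last partition interval")
    return idx + 1
```

Because the intervals are disjoint and sorted, every carrier identifier maps to at most one partition, which is the property that keeps a record from being stored twice.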
In addition, each data partition of the data table can split or expand automatically. For example, as time passes, the data in a data partition keeps growing; when the data volume of that data partition reaches a split threshold, the server can split it into two data partitions, so that new data can still be written even though the storage space of the original data partition would otherwise have filled up.
The bitmap index partitions can be divided in a manner similar to the data partitions. For example, when data partitions are divided by defining the partition interval of each data partition, bitmap index partitions are likewise divided by defining the partition interval of each bitmap index partition. That is, the partition interval of each bitmap index partition is set in advance by the user, and the server divides the bitmap index into bitmap index partitions according to these preset intervals. The carrier identifier range of every bitmap index partition also has the same extent. For ease of description, the range of carrier identifiers corresponding to each bitmap index partition is referred to as the second range.
For example, the non-overlapping partition intervals [b0, c0), [c0, d0), [d0, e0), [e0, f0), and [f0, j0) are preset. According to these partition intervals, the bitmap index can be divided into bitmap index partition 1, bitmap index partition 2, bitmap index partition 3, bitmap index partition 4, and bitmap index partition 5. Bitmap index partition 1 stores the bitmaps of the label values for carrier identifiers in [b0, c0), bitmap index partition 2 stores those for carrier identifiers in [c0, d0), bitmap index partition 3 stores those for carrier identifiers in [d0, e0), bitmap index partition 4 stores those for carrier identifiers in [e0, f0), and bitmap index partition 5 stores those for carrier identifiers in [f0, j0).
It should be noted that each bitmap index partition stores the bitmaps of label values for only part of the carrier identifiers. For the bitmap index as a whole, the bitmap of each label value is therefore the combination of the partial bitmaps of that label value in the individual bitmap index partitions. For ease of description, the partial bitmap of a label value within one bitmap index partition is referred to as the sub-bitmap of that label value; the bitmap of each label value is thus the combination of its sub-bitmaps across all bitmap index partitions.
That is, the same bitmap bit in different sub-bitmaps of one bitmap index partition corresponds to the same carrier identifier, whereas the same bitmap bit in sub-bitmaps of different bitmap index partitions corresponds to different carrier identifiers.
Because a sub-bitmap is established in each bitmap index partition for every label value in the tag definition table, the number of sub-bitmaps in each bitmap index partition equals the total number of label values in the tag definition table. For example, if the tag definition table defines 10 label values, then after a sub-bitmap has been established for each of them, every bitmap index partition contains 10 sub-bitmaps.
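The relationship between sub-bitmaps and the bitmap of a label value can be sketched with plain Python lists. The label values, partition contents, and bit assignments below are hypothetical and chosen only for illustration; the embodiment does not prescribe this in-memory layout.

```python
# Hypothetical label values and partition layout, for illustration only.
LABEL_VALUES = ["sex:Man", "sex:Female", "educational background:Undergraduate course"]
PARTITION_CARRIERS = {1: ["a01", "a02"], 2: ["b01", "b02", "b03"]}

def init_bitmap_index(label_values, partition_carriers):
    """Create one all-zero sub-bitmap per label value in every partition."""
    return {
        pid: {lv: [0] * len(carriers) for lv in label_values}
        for pid, carriers in partition_carriers.items()
    }

def full_bitmap(index, label_value):
    """Combine the sub-bitmaps of one label value across partitions, in partition order."""
    bits = []
    for pid in sorted(index):
        bits.extend(index[pid][label_value])
    return bits

index = init_bitmap_index(LABEL_VALUES, PARTITION_CARRIERS)
index[1]["sex:Man"][0] = 1  # assumed: a01 occupies bit 0 of partition 1
index[2]["sex:Man"][2] = 1  # assumed: b03 occupies bit 2 of partition 2
```

Bit 0 means a01 in partition 1 but b01 in partition 2, while within one partition bit 0 refers to the same carrier identifier in every sub-bitmap.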
Optionally, because expanding or splitting a bitmap index partition would require all bitmap index partitions in the bitmap index to be rebuilt, which is costly, the server in this embodiment sets the bitmap index partitions so that they do not split or expand.
The following two embodiments respectively describe the detailed processes of storing at least one data record in the bitmap index and in the data table.
Fig. 2 is a flowchart of a data storage method provided in an embodiment of the present invention, applied to the scenario in which at least one data record is stored in the bitmap index. As shown in Fig. 2, the data storage method includes the following steps:
Step 201: Obtain at least one data record, where each data record includes a carrier identifier and at least one label value.
Specifically, at least one piece of source data is obtained, where each piece of source data includes a carrier identifier and at least one label. For each piece of source data, the label value of each of the at least one label is determined according to the preset tag definition table, yielding at least one label value.
The at least one piece of source data may be data stored in the Hadoop Distributed File System (HDFS). That is, when a client needs to store data, it sends the data to the server; the server first stores the data in HDFS and afterwards performs data storage based on the source data in HDFS. It should be noted that the server may obtain the at least one piece of source data from HDFS via a default path or via a preset path, which is not specifically limited in the embodiments of the present invention.
In addition, the tag definition table may be information that the server obtains and stores in advance. Optionally, the tag definition table may be stored as an independent file, for example as an Extensible Markup Language (XML) file, or in a third-party distributed storage system such as ZooKeeper.
The preset tag definition table records multiple preset label values. Optionally, the label values of a label are set according to historical data or are defined manually.
Table 1 shows one possible tag definition table. Table 1 may of course include more or fewer labels, which is not limited here.
Table 1
Label | Label value | Label configuration information
Sex | Man, female | Memory-resident
Educational background | Training, undergraduate course, postgraduate, doctor | Not memory-resident
Occupation | Student, teacher, individual, enterprise staff | Memory-resident
Net purchase madman | Net purchase madman | Memory-resident
Drug addict | Drug addict | Not memory-resident
Optionally, as shown in Table 1, the tag definition table may further include label configuration information, which indicates whether a label value needs to be memory-resident. The bitmap corresponding to a label value that needs to be memory-resident also needs to be memory-resident, and the bitmap corresponding to a label value that does not need to be memory-resident does not need to be memory-resident either.
In Table 1, label values that need to be memory-resident carry the mark "memory-resident", and label values that do not need to be memory-resident carry the mark "not memory-resident". It should be understood that it is also possible to mark only the label values that need to be memory-resident and to leave the others unmarked; marking them "not memory-resident" in Table 1 is merely one example.
Optionally, the tag definition table may further include the life cycle of each label value, that is, the period during which the label value is valid; outside its life cycle the label value is invalid.
Optionally, the server may also assign a tag number to each label value in Table 1. When the mapping between label values and bitmaps is stored, the tag number can stand in for the label value, and storing tag numbers takes less storage space than storing label values. In addition, the label value can be looked up from its tag number, and the tag number can be looked up from its label value.
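The two-way lookup between label values and tag numbers can be sketched with a pair of dictionaries. The labels and values below come from Table 1; the sequential numbering scheme and the function name are assumptions made for this sketch, since the embodiment does not specify how numbers are assigned.

```python
# Two labels from Table 1; numbering them sequentially is an assumption.
TAG_DEFINITION = {
    "Sex": ["Man", "Female"],
    "Occupation": ["Student", "Teacher", "Individual", "Enterprise staff"],
}

def assign_tag_numbers(tag_definition):
    """Build label value -> tag number and tag number -> label value maps."""
    number_of = {}
    value_of = {}
    next_number = 1
    for label, values in tag_definition.items():
        for value in values:
            label_value = f"{label}:{value}"
            number_of[label_value] = next_number
            value_of[next_number] = label_value
            next_number += 1
    return number_of, value_of

number_of, value_of = assign_tag_numbers(TAG_DEFINITION)
```

Storing the small integer in place of the full string is what saves space; the second dictionary restores the label value when a query result has to be presented.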
For example, Table 2 shows one format of source data provided in an embodiment of the present invention. Each row of Table 2 represents one piece of source data; each piece of source data has a unique carrier identifier and further includes at least one label corresponding to that carrier identifier.
Table 2
Carrier identifier | Labels
a01 | Sex:Man, educational background:Undergraduate course
a02 | Sex:Female, educational background:Training, occupation:Individual, net purchase madman
b01 | Sex:Man, educational background:Training, occupation:Enterprise staff
b02 | Sex:Female, educational background:Undergraduate course, occupation:Student
c01 | Sex:Man, educational background:Postgraduate
c02 | Sex:Female, educational background:Training, occupation:Enterprise staff, net purchase madman
d01 | Sex:Man, educational background:Postgraduate, occupation:Enterprise staff, net purchase madman
d02 | Sex:Female, educational background:Undergraduate course, occupation:Student
d03 | Sex:Man, educational background:Training, occupation:Enterprise staff
e01 | Sex:Man, educational background:Postgraduate, occupation:Individual, drug addict
e02 | Sex:Female, educational background:Training, occupation:Individual
e03 | Sex:Man, educational background:Undergraduate course, occupation:Student
f01 | Sex:Man, educational background:Postgraduate, occupation:Teacher
f02 | Sex:Female, educational background:Training, occupation:Enterprise staff, net purchase madman
f03 | Sex:Man, educational background:Undergraduate course, occupation:Student
f04 | Sex:Female, educational background:Postgraduate, occupation:Individual, drug addict
For the source data shown in Table 2, Table 3 shows the data records that the server determines from each piece of source data according to Table 1, where the content of each [] in Table 3 represents one label value. Each row of Table 3 represents one data record, and each data record includes a carrier identifier and at least one label value.
Table 3
In embodiments of the present invention, after the server obtains the at least one data record, it can store the at least one data record in the bitmap index by means of a preset map/reduce model. For ease of the later description, the preset map/reduce model is explained first.
The preset map/reduce model is a parallel computing model that mainly comprises two computation processes, the map process and the reduce process. The map process classifies data records according to the type of data to be stored, and the reduce process stores each data record in the corresponding file according to the reduce partition corresponding to that data record.
The preset map/reduce model includes multiple reduce partitions. Each reduce partition corresponds to a data interval and processes the data belonging to that interval, and different reduce partitions work in parallel. Precisely because different reduce partitions work in parallel, the preset map/reduce model makes it possible to determine the bitmaps of the label values in the individual bitmap index partitions in parallel.
In addition, the map process maps every data record in parallel, so the preset map/reduce model can process batches of data in parallel, which likewise improves processing efficiency.
It is worth noting that, in embodiments of the present invention, the at least one data record may be stored in the data table in addition to the bitmap index; that is, the data records need to be stored in both the data table and the bitmap index. To distinguish the data table from the bitmap index, a data table identifier and a bitmap index table identifier are introduced here: the data table identifier uniquely identifies the data table, and the bitmap index table identifier uniquely identifies the bitmap index.
Because data records need to be stored in the corresponding data partitions of the data table and in the corresponding bitmap index partitions of the bitmap index, the reduce partitions of the preset map/reduce model can be set to the combination of the data partitions of the data table and the bitmap index partitions of the bitmap index. The preset map/reduce model can then store the data records directly in the corresponding data partitions and bitmap index partitions. For ease of description, the N reduce partitions in one-to-one correspondence with the N bitmap index partitions of the bitmap index are referred to as first reduce partitions, and the M reduce partitions in one-to-one correspondence with the M data partitions of the data table are referred to as second reduce partitions.
Correspondingly, the map process of the preset map/reduce model includes two different kinds of map processing: the map processing performed when storing the at least one data record in the bitmap index is referred to as first-class map processing, and the map processing performed when storing the at least one data record in the data table is referred to as second-class map processing.
Similarly, the reduce process of the preset map/reduce model includes two different kinds of reduce processing: the reduce processing performed when storing the at least one data record in the bitmap index is referred to as first-class reduce processing, and the reduce processing performed when storing the at least one data record in the data table is referred to as second-class reduce processing.
It follows that, once the at least one data record has been obtained, in order to store it in the corresponding bitmap index partitions by means of the preset map/reduce model, the N first reduce partitions of the preset map/reduce model must first be determined. Specifically, this can be implemented by the following step 202.
Step 202: Determine the N first reduce partitions in the preset map/reduce model.
Specifically, the partition information of the bitmap index is determined; this partition information describes the set of carrier identifiers corresponding to each bitmap index partition in the bitmap index. According to the partition information of the bitmap index, the N first reduce partitions in the preset map/reduce model are determined, each first reduce partition corresponding to one bitmap index partition. That is, the N first reduce partitions are determined from the partition information of the N bitmap index partitions included in the bitmap index.
It is worth noting that each partition interval in the data table represents a set of carrier identifiers, and each partition interval in the bitmap index also represents a set of carrier identifiers. Therefore, if the partition intervals of the data table were used directly as the partition intervals of the M second reduce partitions in the preset map/reduce model, and the partition intervals of the bitmap index directly as the partition intervals of the N first reduce partitions, the N first reduce partitions and the M second reduce partitions could intersect.
Therefore, to avoid intersections between different reduce partitions, the bitmap index table identifier that identifies the bitmap index is added to the partition intervals of the bitmap index, and the partition intervals with the bitmap index table identifier added are used as the partition intervals of the N first reduce partitions in the preset map/reduce model. That is, the partition information of each first reduce partition consists of the bitmap index table identifier and the carrier identifiers of a preset interval range.
For example, the following partition intervals are set for the bitmap index in advance:
[b0, c0), [c0, d0), [d0, e0), [e0, f0), and [f0, j0).
In this case, the following first reduce partitions can be set for the preset map/reduce model:
[Bb0, Bc0), [Bc0, Bd0), [Bd0, Be0), [Be0, Bf0), and [Bf0, Bj0).
Here, B is the identifier of the bitmap index, that is, the bitmap index table identifier. The first reduce partitions [Bb0, Bc0), [Bc0, Bd0), [Bd0, Be0), [Be0, Bf0), and [Bf0, Bj0) are thus the reduce partitions in one-to-one correspondence with the bitmap index partitions.
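The effect of prefixing the partition intervals with a table identifier can be sketched as follows. The bitmap index intervals and the identifier B come from the example above; the data partition intervals, the data table identifier "D", and the helper names are assumptions introduced only to show that prefixed intervals of the two tables cannot intersect.

```python
def prefix_intervals(table_id, intervals):
    """Prefix both bounds of each half-open interval [lo, hi) with a table identifier."""
    return [(table_id + lo, table_id + hi) for lo, hi in intervals]

def overlaps(a, b):
    """True if the half-open intervals a and b share at least one key."""
    return a[0] < b[1] and b[0] < a[1]

# Bitmap index intervals from the example in the text.
BITMAP_INDEX_INTERVALS = [("b0", "c0"), ("c0", "d0"), ("d0", "e0"),
                          ("e0", "f0"), ("f0", "j0")]
# Hypothetical data partition intervals; the text only names them a1, a2, ...
DATA_INTERVALS = [("b0", "b5"), ("b5", "c0")]

first_reduce_partitions = prefix_intervals("B", BITMAP_INDEX_INTERVALS)
# "D" as the data table identifier is an assumption made for this sketch.
second_reduce_partitions = prefix_intervals("D", DATA_INTERVALS)
```

Without the prefixes, [b0, b5) and [b0, c0) overlap; with them, every key of one table sorts entirely before or after every key of the other, so one total order over keys can serve both sets of reduce partitions.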
It is worth noting that, in the preset map/reduce model, different first reduce partitions can process the data belonging to their respective partition intervals in parallel. The at least one data record therefore needs to be classified according to the N first reduce partitions, so that each first reduce partition processes the data that belongs to it.
That is, based on the carrier identifier included in each data record and according to the partition information of the N first reduce partitions of the preset map/reduce model, first-class classification is performed on the at least one data record to obtain at least one first mapping set. Each first mapping set corresponds to one first reduce partition, so that the first reduce partition processes the data in its first mapping set. Specifically, this can be implemented by the following steps 203 and 204.
Step 203: Perform first-class map processing on the at least one data record in parallel by means of the preset map/reduce model to obtain at least one first mapping result, where each first mapping result includes the bitmap index table identifier, a carrier identifier, and at least one label value.
As can be seen from step 202, the partition intervals of the N first reduce partitions in the preset map/reduce model are not actually the partition intervals of the bitmap index partitions in the bitmap index. First-class map processing therefore mainly consists of adding the bitmap index table identifier to each data record, so that the first reduce partition corresponding to each data record can be determined later.
That is, for each data record, the bitmap index table identifier is added to the data record to obtain a first mapping result.
It should be noted that the preset map/reduce model adds the bitmap index table identifier to the data records in parallel, that is, to all data records at the same time. The time the preset map/reduce model takes to add the bitmap index table identifier to one data record is therefore the same as the time it takes to add it to n data records, which improves the efficiency of adding the bitmap index table identifier to the at least one data record.
In addition, a first mapping result can be recorded in key-value form. Specifically, Table 4 shows one format of a first mapping result provided in an embodiment of the present invention. As shown in Table 4, the bitmap index table identifier and the carrier identifier in the first mapping result together form the key, and the at least one label value in the first mapping result forms the value of that key.
Table 4
Mapping result | Key | Value
First mapping result | Bitmap index table identifier + carrier identifier | Label value list
When first mapping results are recorded in key-value form, remark information can also be added to the value of each first mapping result. The remark information includes the generation time of each of the at least one label value, or the internal identifier (ID) of each label value. When the remark information includes the internal ID of each label value, the internal ID can be used in place of the label value to reduce the amount of data transmitted.
For example, let B be the bitmap index table identifier. For the first data record in Table 3, "a01->{sex:Man, educational background:Undergraduate course}", the carrier identifier is a01 and the data record includes the two label values "sex:Man" and "educational background:Undergraduate course". The preset map/reduce model performs first-class map processing on this data record and obtains a first mapping result that includes the bitmap index table identifier B, the carrier identifier a01, and the two label values "sex:Man" and "educational background:Undergraduate course". Recording this first mapping result in the format shown in Table 4 yields the first mapping result shown in Table 5, that is, a record whose key is Ba01 and whose value is {sex:Man, educational background:Undergraduate course}.
Table 5
Key | Value
Ba01 | {sex:Man, educational background:Undergraduate course}
For the first mapping result shown in Table 5, if the system has configured the internal ID 1 for the label value "sex:Man", the internal ID 2 for "sex:Female", the internal ID 3 for "educational background:Undergraduate course", and the internal ID 4 for "educational background:Training", then the value corresponding to the key Ba01 in Table 5 can also be recorded as {1, 3}.
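First-class map processing of a single data record can be sketched as a function that builds the key-value pair described above. The identifier B and the internal IDs 1 to 4 are taken from the example; the function name, the tuple representation of a data record, and the `use_internal_ids` flag are assumptions made for this sketch.

```python
BITMAP_INDEX_TABLE_ID = "B"

# Internal IDs as configured in the example in the text.
INTERNAL_ID = {
    "sex:Man": 1,
    "sex:Female": 2,
    "educational background:Undergraduate course": 3,
    "educational background:Training": 4,
}

def first_class_map(record, use_internal_ids=False):
    """Turn (carrier_id, label_values) into a (key, value) first mapping result."""
    carrier_id, label_values = record
    # Key: bitmap index table identifier followed by the carrier identifier.
    key = BITMAP_INDEX_TABLE_ID + carrier_id
    if use_internal_ids:
        value = [INTERNAL_ID[lv] for lv in label_values]
    else:
        value = list(label_values)
    return key, value

record = ("a01", ["sex:Man", "educational background:Undergraduate course"])
```

Each call depends only on its own record, which is why a map/reduce framework can run the map step on all records in parallel.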
It should be noted that, for a given carrier identifier, the system may or may not have configured a corresponding bitmap bit. If it has, the first mapping result also includes the bitmap bit of the carrier identifier; if it has not, the first mapping result does not include it.
When the first mapping result includes the bitmap bit of the carrier identifier and is still recorded in key-value form, the format shown in Table 6 or Table 7 is obtained. As shown in Table 6, the bitmap index table identifier, the carrier identifier, and the bitmap bit of the carrier identifier together form the key, and the value is still the at least one label value in the first mapping result.
As shown in Table 7, the bitmap index table identifier and the carrier identifier can alternatively form the key together, with the at least one label value and the bitmap bit of the carrier identifier together forming the value of that key.
For example, for the first data record in Table 3, "a01->{sex:Man, educational background:Undergraduate course}", if the system has configured a bitmap bit for the carrier identifier a01 and that bitmap bit is 5, recording the first mapping result in the format shown in Table 6 yields the first mapping result shown in Table 8, whose key is (Ba01, 5) and whose value is {sex:Man, educational background:Undergraduate course}.
Table 6
Table 7
Table 8
Key | Value
(Ba01, 5) | {sex:Man, educational background:Undergraduate course}
After the preset map/reduce model has performed the first-class map processing on the at least one data record, at least one first mapping result is obtained; that is, each data record yields a first mapping result in the format shown in Table 4 or Table 6. First-class classification then needs to be performed on the at least one first mapping result in the following step 204.
Step 204: Classify the at least one first mapping result according to the partition information of the N first reduce partitions to obtain at least one first mapping set, each first mapping set corresponding to one first reduce partition.
In the preset map/reduce model, different reduce partitions can process the data belonging to their respective partition intervals in parallel. The at least one first mapping result therefore needs to be sorted into the corresponding first reduce partitions.
For each of the at least one first mapping result, the partition interval to which the first mapping result belongs is looked up among the partition intervals of the N first reduce partitions according to the carrier identifier and the bitmap index table identifier in the first mapping result, thereby classifying the at least one first mapping result. Classification yields at least one first mapping set, and each first mapping set includes at least one first mapping result.
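The classification in step 204 can be sketched as a range lookup on the key of each first mapping result. The partition intervals are those of the example in step 202; the keys in the usage note, the function name, and the choice to raise an error for keys outside every interval are assumptions made for this sketch.

```python
from bisect import bisect_right
from collections import defaultdict

# First reduce partitions from the example in step 202, as half-open intervals.
FIRST_REDUCE_PARTITIONS = [("Bb0", "Bc0"), ("Bc0", "Bd0"), ("Bd0", "Be0"),
                           ("Be0", "Bf0"), ("Bf0", "Bj0")]

def classify(mapping_results, partitions=FIRST_REDUCE_PARTITIONS):
    """Group (key, value) results into first mapping sets keyed by partition number."""
    lower_bounds = [lo for lo, _ in partitions]
    mapping_sets = defaultdict(list)
    for key, value in mapping_results:
        idx = bisect_right(lower_bounds, key) - 1
        if idx < 0 or key >= partitions[idx][1]:
            raise ValueError(f"key {key!r} belongs to no first reduce partition")
        mapping_sets[idx + 1].append((key, value))
    return dict(mapping_sets)
```

For instance, hypothetical keys Bb01 and Bb02 both fall in [Bb0, Bc0) and end up in the first mapping set of partition 1, while Bc02 goes to partition 2. Partitions that receive no result simply have no first mapping set.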
Step 205: The first reduce partitions corresponding to the at least one first mapping set perform first-class reduce processing on the at least one first mapping set in parallel to obtain the bitmaps of the label values in each bitmap index partition.
Because the first reduce partitions corresponding to the at least one first mapping set perform first-class reduce processing in parallel, the following explains the first-class reduce processing of a single first mapping set. Specifically, first-class reduce processing is divided into the following two processes:
(1) For each first mapping set, determine the first reduce partition corresponding to the first mapping set, and have that first reduce partition sort the first mapping results in the first mapping set according to the carrier identifier in each first mapping result.
When the server performs the first-class reduce processing by means of the preset map/reduce model, a first mapping result already exists for each data record, and the first mapping set to which that first mapping result belongs has been determined in step 204. Because the data records corresponding to the first mapping results of one first mapping set will be stored in the same bitmap index partition, the server first has the first reduce partition corresponding to the first mapping set sort the first mapping results belonging to that set, and then stores the data of the first mapping set in the corresponding bitmap index partition in the sorted order.
The first mapping results in a first mapping set are usually sorted by a default sorting method, namely in ascending or descending lexicographic order of the carrier identifiers, which is not specifically limited in the embodiments of the present invention.
For example, if a first mapping set includes three first mapping results whose carrier identifiers are a01, a02, and a03, the three first mapping results can be sorted in the order a01, a02, a03.
(2) For each first mapping result after sorting, in the sorted order, obtain from the bitmap index partition corresponding to the first reduce partition the bitmap of each of the at least one label value that the first mapping result includes, and update the bitmap of the label value according to the bitmap bit of the carrier identifier, so as to obtain the bitmap of each label value in the bitmap index partition corresponding to the first reduce partition.
As can be seen from step 203, the first mapping result of a data record may or may not include the bitmap bit of the carrier identifier. Updating the bitmap of each of the at least one label value in the first mapping result according to the bitmap bit of the carrier identifier can therefore be implemented in the following two ways:
In the first way, when the first mapping result also includes the bitmap bit of the carrier identifier, the bitmap of the corresponding label value is updated according to that bitmap bit.
In the second way, when the first mapping result does not include the bitmap bit of the carrier identifier, the bitmap bit of the carrier identifier is obtained first, and the bitmap of the corresponding label value is then updated according to that bitmap bit.
In either way, updating the bitmap of each of the at least one label value in the first mapping result requires first determining the bitmap bit of the carrier identifier. Once it has been determined, for the bitmap of each of the at least one label value in the first mapping result, the value of that bitmap at the bitmap bit of the carrier identifier is updated.
In embodiments of the present invention, the bitmap of a label value can be stored in the manner shown in Fig. 1, that is, the value of the bitmap at each bitmap bit is 0 or 1. In this case, updating the value of the bitmap of the label value at the bitmap bit of the carrier identifier means setting that value to 1.
It should be noted that the bitmap of each label value is determined by setting the value of the bitmap at the bitmap bit of the carrier identifier to 1. In embodiments of the present invention, the value of the bitmap of each label value at every bitmap bit is therefore initialized in advance, that is, set to 0. Afterwards, for a given bitmap index partition, when the first of the first mapping results is processed, the bitmap bit of the carrier identifier it includes is determined, and for each of the at least one label value that this first mapping result includes, the value of the sub-bitmap of the label value at that bitmap bit is changed to 1. In other words, the sub-bitmaps of the at least one label value are updated, and thereby the bitmap index partition is updated. The other label values in the bitmap index partition are left untouched, that is, the values of their sub-bitmaps at the bitmap bit of the carrier identifier remain 0, indicating that the carrier identifier does not have those label values.
After the first of the first mapping results has been processed, the second one is processed in essentially the same way. The difference is that the bitmap index partition is now updated starting from its state after the update for the preceding first mapping result. That is, when the sub-bitmaps of the at least one label value in the second first mapping result are updated, the sub-bitmaps of the at least one label value of the preceding first mapping result already have the value 1 at the bitmap bit of the carrier identifier of that preceding result.
That is, in the process of successively determining the bitmap of each of the at least one label value in the first mapping results, each first mapping result continues to update the bitmaps of the label values on top of the updates that the previous first mapping result made to the bitmap index partition.
For example, Table 9 shows an initialized bitmap index provided by an embodiment of the present invention. As shown in Table 9, the bitmap index includes multiple bitmap index subregions, each bitmap index subregion includes the sub-bitmaps of all label values, and the initial value of each sub-bitmap at every bitmap bit is 0.
Table 9
For the first data record "a01->{ sex:Man, educational background:Undergraduate course }" through the ninth data record "d03->{ sex:Man, educational background:Training, occupation:Enterprise staff }" in Table 3, the first mapping results of the 9 data records are those shown in Table 5, and step 204 above classifies the first mapping results of the 9 data records into the first stipulations subregion corresponding to bitmap index subregion 1. The 9 first mapping results are then sorted according to their supporting body identifiers a01, a02, b01, b02, c01, c02, d01, d02 and d03, giving, in order, the first mapping result of the first data record, the first mapping result of the second data record, ..., and the first mapping result of the ninth data record.
When the bitmap bits that the system has configured for the supporting body identifiers a01, a02, b01, b02, c01, c02, d01, d02 and d03 are determined to be 1, 2, 3, 4, 5, 6, 7, 8 and 9 respectively, consider the first mapping result whose supporting body identifier is a01. This first mapping result includes two label values, "sex:Man" and "educational background:Undergraduate course". As shown in Table 9, these two label values correspond to the sub-bitmap of the first label value and the sub-bitmap of the fourth label value in bitmap index subregion 1. The values of these two sub-bitmaps at bitmap bit 1 are therefore updated to 1, giving bitmap index subregion 1 as shown in Table 10.
Table 10
For the first mapping result whose supporting body identifier is a02, the first mapping result includes four label values: "sex:Female", "educational background:Training", "occupation:Individual" and "net purchase intelligent". As shown in Table 9, the sub-bitmaps corresponding to these four label values in bitmap index subregion 1 are those of the second, third, sixth and penultimate label values. On the basis of Table 10, the values of these four sub-bitmaps at bitmap bit 2 are updated to 1, giving bitmap index subregion 1 as shown in Table 11.
Table 11
And so on, until the 9 first mapping results corresponding to the 9 data records have all been processed. The 9 data records are thereby stored in bitmap index subregion 1, that is, the bitmap of each label value in bitmap index subregion 1 is obtained.
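The update sequence in this example can be sketched in Python as follows. This is a minimal illustration rather than the embodiment's implementation: the label order in `LABELS`, the helper names `new_subregion` and `apply_first_mapping_result`, and the in-memory list representation are assumptions made here, and only the first two data records are processed.

```python
# Label values named in the example; their order here is illustrative only.
LABELS = [
    "sex:Man",
    "sex:Female",
    "educational background:Training",
    "educational background:Undergraduate course",
    "occupation:Individual",
    "net purchase intelligent",
]

def new_subregion(labels, num_bits):
    """Initialize every sub-bitmap with all bitmap bits set to 0."""
    return {label: [0] * num_bits for label in labels}

def apply_first_mapping_result(subregion, bitmap_bit, label_values):
    """Set the given 1-based bitmap bit to 1 in the sub-bitmap of each label value."""
    for label in label_values:
        subregion[label][bitmap_bit - 1] = 1
    # Sub-bitmaps of all other label values are left untouched, so the bit stays 0.

# Bitmap bits configured for the supporting body identifiers, as in the example.
BIT_OF = {"a01": 1, "a02": 2}

first_mapping_results = [
    ("a02", ["sex:Female", "educational background:Training",
             "occupation:Individual", "net purchase intelligent"]),
    ("a01", ["sex:Man", "educational background:Undergraduate course"]),
]

subregion = new_subregion(LABELS, num_bits=9)
# Process the results in order of supporting body identifier, each on top of the last.
for carrier_id, labels in sorted(first_mapping_results, key=lambda r: r[0]):
    apply_first_mapping_result(subregion, BIT_OF[carrier_id], labels)
```

After the loop, `subregion["sex:Man"]` has only bitmap bit 1 set and `subregion["sex:Female"]` has only bitmap bit 2 set, which corresponds to the state described for Table 11.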
Optionally, the bitmap of a label value may also be represented as an array, in which case the array of the label value lists the bitmap bits whose value is 1 in the bitmap of that label value. For example, if the label value "drug addict" corresponds to the bitmap "[0000000001000000....]", the bitmap may also be represented as the array [10]; if the label value "net purchase intelligent" corresponds to the bitmap "[0100011000000100....]", the bitmap may also be represented as the array [2,6,7,14]. Representing the bitmap of a label value as an array can save storage space, particularly when few bitmap bits are set to 1.
In this case, updating the value of the bitmap of the label value at the bitmap bit of the supporting body identifier means adding that bitmap bit to the array of the label value. For example, if, for a given label value, the bitmap bit of the supporting body identifier in the sub-bitmap is 3 and the initial sub-bitmap of the label value is [1,7], then the updated sub-bitmap of the label value is [1,3,7].
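The array representation and its update can be sketched as follows. The function names `to_array` and `set_bit` are hypothetical, and keeping the array sorted with `bisect` is a design choice assumed here, not something the embodiment prescribes.

```python
import bisect

def to_array(bits):
    """Convert a 0/1 bitmap into the list of 1-based bitmap bits whose value is 1."""
    return [position for position, bit in enumerate(bits, start=1) if bit == 1]

def set_bit(array, bitmap_bit):
    """Add a bitmap bit to the array form, keeping it sorted and free of duplicates."""
    i = bisect.bisect_left(array, bitmap_bit)
    if i == len(array) or array[i] != bitmap_bit:
        array.insert(i, bitmap_bit)
    return array

# The two bitmaps quoted in the text, truncated to their first 16 bitmap bits.
drug_addict = to_array([int(c) for c in "0000000001000000"])
net_purchase = to_array([int(c) for c in "0100011000000100"])

# The update example from the text: bitmap bit 3 added to [1,7].
updated = set_bit([1, 7], 3)
```

Here `drug_addict` is `[10]`, `net_purchase` is `[2, 6, 7, 14]` and `updated` is `[1, 3, 7]`, matching the values given in the text.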
In addition, when the first mapping result does not include the bitmap bit of the supporting body identifier, this indicates that, before the data record corresponding to the first mapping result was mapped, the system had not yet configured a bitmap bit for the supporting body identifier in that data record. In this case, after the bitmap bit of the supporting body identifier is obtained, the correspondence between the bitmap bit and the supporting body identifier may also be stored.
Specifically, according to the supporting body identifier and its bitmap bit, a first key-value pair is determined that indicates the mapping from the supporting body identifier to the bitmap bit, where the key is the supporting body identifier and the value is its bitmap bit. A second key-value pair is determined that indicates the mapping from the bitmap bit to the supporting body identifier, where the key is the bitmap bit and the value is the supporting body identifier. The first key-value pair and the second key-value pair are then stored.
That is, in embodiments of the present invention, the correspondence between a supporting body identifier and its bitmap bit is stored as a bidirectional mapping, so that the bitmap bit can later be looked up from the supporting body identifier, or the supporting body identifier from the bitmap bit.
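A minimal sketch of this bidirectional storage is shown below. The class name `BitmapBitDirectory`, the use of two in-memory dictionaries, and the allocation rule (next unused 1-based position) are assumptions for illustration; the text does not specify how a new bitmap bit is chosen or where the pairs are persisted.

```python
class BitmapBitDirectory:
    """Keeps both key-value directions for supporting body identifiers and bitmap bits."""

    def __init__(self):
        self.bit_by_carrier = {}   # first key-value pair: identifier -> bitmap bit
        self.carrier_by_bit = {}   # second key-value pair: bitmap bit -> identifier

    def assign(self, carrier_id):
        """Return the bitmap bit of carrier_id, configuring one if none exists yet."""
        if carrier_id in self.bit_by_carrier:
            return self.bit_by_carrier[carrier_id]
        bitmap_bit = len(self.bit_by_carrier) + 1  # assumed allocation rule
        self.bit_by_carrier[carrier_id] = bitmap_bit
        self.carrier_by_bit[bitmap_bit] = carrier_id
        return bitmap_bit

directory = BitmapBitDirectory()
bit_a01 = directory.assign("a01")
bit_a02 = directory.assign("a02")
bit_a01_again = directory.assign("a01")  # already configured, so the same bit is returned
```

Looking up `directory.carrier_by_bit[bit_a02]` returns `"a02"`, which is the reverse lookup the second key-value pair is intended to support.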
Step 206: Store the obtained bitmaps of the label values in each bitmap index subregion into the corresponding bitmap index subregion.
As can be seen from step 205, within a first mapping set, the at least one bitmap determined for each first mapping result is determined on the basis of the at least one bitmap determined for the preceding first mapping result. Therefore, for any first mapping result, once its at least one bitmap is determined, the at least one bitmap needs to be stored into the bitmap index subregion corresponding to that first mapping result, so that the next first mapping result can continue the update from the updated bitmap index subregion.
Therefore, after the first stipulations subregion corresponding to a first mapping set has performed the first-class stipulations processing on all first mapping results belonging to that set, the bitmap of each label value in the bitmap index subregion corresponding to that first stipulations subregion is obtained, and these bitmaps can be stored directly into the bitmap index subregion.
In embodiments of the present invention, when at least one data record is obtained, the preset mapping/stipulations model can determine, based on the at least one data record, the bitmaps of the label values in each bitmap index subregion included in the bitmap index, so that the at least one data record is stored into the corresponding bitmap index subregions. Because the bitmap index includes at least one bitmap and each bitmap corresponds to one label value, the supporting body identifiers having a given label value can be looked up through the bitmap index, which improves the efficiency of querying data by label value. In addition, the preset mapping/stipulations model can determine the bitmaps of the label values in the bitmap index subregions in parallel, which improves the efficiency of storing data.
Fig. 3 is a flowchart of a data storage method provided by an embodiment of the present invention, applied to the scenario of storing at least one data record into a data table. As shown in Fig. 3, the data storage method includes the following steps:
Step 301: Obtain at least one data record, where each data record includes a supporting body identifier and at least one label value.
The implementation of step 301 is essentially the same as that of step 201 in Fig. 2 and is not described in detail again here.
It should be noted that, as can be seen from step 201 shown in Fig. 2, the preset mapping/stipulations model includes M second stipulations subregions, which are in one-to-one correspondence with the M data partitions included in the data table. Storing the at least one data record into the data table requires the second-class mapping processing and the second-class stipulations processing of the preset mapping/stipulations model.
That is, when at least one data record is obtained, in order to store the at least one data record into the corresponding data partitions through the preset mapping/stipulations model, the M second stipulations subregions included in the preset mapping/stipulations model need to be determined first. Specifically, this can be done through step 302 below.
Step 302: Determine the M second stipulations subregions in the preset mapping/stipulations model.
Specifically, the partition information of the data table is determined, where the partition information of the data table describes the set of supporting body identifiers corresponding to each data partition in the data table. According to the partition information of the data table, the M second stipulations subregions in the preset mapping/stipulations model are determined, with each second stipulations subregion corresponding to one data partition.
As can be seen from step 202 shown in Fig. 2, to avoid overlap between different stipulations subregions, a data table identifier, which identifies the data table, is added to the partition intervals of the data table, and the partition intervals with the data table identifier added are determined as the M second stipulations subregions in the preset mapping/stipulations model. That is, the partition information of each second stipulations subregion consists of the data table identifier and supporting body identifiers within a preset interval range.
For example, the following partition intervals are set for the data table in advance:
[,a1), [a1,a2), [a2,a3), …, [a8,a9).
In this case, the following second stipulations subregions can be set for the preset mapping/stipulations model:
[,Aa1), [Aa1,Aa2), [Aa2,Aa3), …, [Aa8,Aa9).
Here, A is the identifier used to identify the data table, that is, the data table identifier. In other words, the second stipulations subregions [,Aa1), [Aa1,Aa2), [Aa2,Aa3), …, [Aa8,Aa9) are in one-to-one correspondence with the data partitions.
Similarly, because different second stipulations subregions can process in parallel the data that falls within their own partition intervals, the at least one data record needs to be classified according to the M second stipulations subregions, so that each second stipulations subregion processes the data belonging to it.
That is, based on the supporting body identifier included in each data record and according to the partition information of the M second stipulations subregions included in the preset mapping/stipulations model, second-class classification is performed on the at least one data record to obtain at least one second mapping set, with each second mapping set corresponding to one second stipulations subregion, so that each second stipulations subregion processes the data in its corresponding second mapping set. Specifically, this can be done through steps 303 to 304 below.
Step 303: Perform the second-class mapping processing on the at least one data record in parallel through the preset mapping/stipulations model to obtain at least one second mapping result, where each second mapping result includes the data table identifier, a supporting body identifier and at least one label value.
As can be seen from step 302, the partition intervals of the M second stipulations subregions in the preset mapping/stipulations model are not actually the partition intervals of the data partitions in the data table. Therefore, the second-class mapping processing mainly adds the data table identifier to each data record, so that the second stipulations subregion corresponding to each data record can subsequently be determined.
That is, for each data record, the preset mapping/stipulations model adds the data table identifier to the data record to obtain a second mapping result.
It should be noted that the preset mapping/stipulations model adds the data table identifier to the data records in parallel, that is, to all data records at the same time. Therefore, the time the preset mapping/stipulations model takes to add the data table identifier to one data record is the same as the time it takes to add it to n data records, which improves the efficiency of adding the data table identifier to the at least one data record.
In addition, the second mapping result may be recorded in key-value form. Specifically, Table 12 shows a format of the second mapping result provided by an embodiment of the present invention. As shown in Table 12, the data table identifier and the supporting body identifier together are set as the key, and the at least one label value in the second mapping result is set as the value of that key.
Table 12
Mapping result | Key | Value
Second mapping result | Data table identifier + supporting body identifier | Label value list
Similarly, when the second mapping result is recorded in key-value form, remark information may also be added to the value of each second mapping result. This remark information may be the same as the remark information in the first mapping result in step 203 of Fig. 2.
For example, let A be the data table identifier. For the first data record "a01->{ sex:Man, educational background:Undergraduate course }" in Table 3, the supporting body identifier in the data record is a01, and the data record includes two label values, "sex:Man" and "educational background:Undergraduate course". The preset mapping/stipulations model performs the second-class mapping processing on the data record to obtain a second mapping result, which includes the data table identifier A, the supporting body identifier a01 and the two label values "sex:Man" and "educational background:Undergraduate course". The second mapping result is recorded in the format shown in Table 12 above, giving the second mapping result shown in Table 13, that is, a record whose key is Aa01 and whose value is { sex:Man, educational background:Undergraduate course }.
Table 13
Key | Value
Aa01 | { sex:Man, educational background:Undergraduate course }
For the second mapping result shown in Table 13, when the internal ID that the system configures for the label value "sex:Man" is 1, for "sex:Female" is 2, for "educational background:Undergraduate course" is 3, and for "educational background:Training" is 4, the value corresponding to the key Aa01 in Table 13 above may also be recorded as { 1,3 }.
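The second-class mapping of a single data record can be sketched as follows. The function name `second_class_map`, the tuple layout of a data record, and the `INTERNAL_ID` dictionary are assumptions made for illustration; the identifier values are those used in the example.

```python
TABLE_ID = "A"  # data table identifier from the example

# Internal IDs configured for the label values, as listed in the example.
INTERNAL_ID = {
    "sex:Man": 1,
    "sex:Female": 2,
    "educational background:Undergraduate course": 3,
    "educational background:Training": 4,
}

def second_class_map(record, table_id=TABLE_ID, internal_ids=None):
    """Build a key-value second mapping result from (identifier, label values)."""
    carrier_id, label_values = record
    key = table_id + carrier_id  # data table identifier + supporting body identifier
    if internal_ids is None:
        value = list(label_values)
    else:
        # Optional compact form: replace each label value with its internal ID.
        value = [internal_ids[label] for label in label_values]
    return key, value

record = ("a01", ["sex:Man", "educational background:Undergraduate course"])
plain = second_class_map(record)
compact = second_class_map(record, internal_ids=INTERNAL_ID)
```

With these inputs, `plain` has the key `"Aa01"` and the two label values as its value, as in Table 13, and `compact` carries `[1, 3]` as its value.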
After the preset mapping/stipulations model performs the above second-class mapping processing on the at least one data record, at least one second mapping result is obtained, that is, each data record yields a second mapping result in the format shown in Table 12 above. Afterwards, second-class classification needs to be performed on the at least one second mapping result through step 304 below.
Step 304: Classify the at least one second mapping result according to the partition information of the M second stipulations subregions to obtain at least one second mapping set, where each second mapping set corresponds to one second stipulations subregion.
In the preset mapping/stipulations model, different stipulations subregions can process in parallel the data that falls within their own partition intervals. Therefore, the at least one second mapping result needs to be sorted into the corresponding second stipulations subregions.
For each second mapping result among the at least one second mapping result, the partition interval to which the second mapping result belongs is looked up among the partition intervals of the M second stipulations subregions according to the supporting body identifier and the data table identifier in the second mapping result, thereby classifying the at least one second mapping result. After classification, at least one second mapping set is obtained, and each second mapping set includes at least one second mapping result.
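The interval lookup and grouping can be sketched as follows. The split points `"b"`, `"c"` and `"d"` are hypothetical stand-ins for a1, a2, ... in the text, and the names `subregion_index` and `classify` are assumptions; the sketch relies on plain string comparison of keys and on half-open intervals as written in the example.

```python
import bisect
from collections import defaultdict

TABLE_ID = "A"
# Hypothetical split points of the data table partitions.
SPLIT_POINTS = ["b", "c", "d"]
# Boundaries of the second stipulations subregions: split points prefixed
# with the data table identifier, giving [,Ab), [Ab,Ac), [Ac,Ad).
BOUNDARIES = [TABLE_ID + point for point in SPLIT_POINTS]

def subregion_index(key):
    """Return the index of the half-open interval containing key.

    Returns None when the key is at or beyond the last listed boundary.
    """
    i = bisect.bisect_right(BOUNDARIES, key)
    return i if i < len(BOUNDARIES) else None

def classify(second_mapping_results):
    """Group second mapping results into second mapping sets, one per subregion."""
    mapping_sets = defaultdict(list)
    for key, value in second_mapping_results:
        mapping_sets[subregion_index(key)].append((key, value))
    return dict(mapping_sets)

results = [
    ("Aa01", ["sex:Man"]),
    ("Ab01", ["sex:Female"]),
    ("Aa02", ["sex:Female"]),
    ("Ac02", ["sex:Man"]),
]
mapping_sets = classify(results)
```

In this sketch the keys `Aa01` and `Aa02` land in the same second mapping set, while `Ab01` and `Ac02` each form their own, so three second stipulations subregions would have work to do.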
Step 305: Perform the second-class stipulations processing on the at least one second mapping set in parallel through the second stipulations subregions respectively corresponding to the at least one second mapping set, to obtain the data in each data partition.
Because the second stipulations subregions respectively corresponding to the at least one second mapping set process the second mapping sets in parallel, the following explains the second-class stipulations processing using one second mapping set as an example. Specifically, like step 205 in Fig. 2, the second-class stipulations processing is divided into the following two processes:
(1) For each second mapping set, determine the second stipulations subregion corresponding to the second mapping set, and sort the second mapping results in the second mapping set through that second stipulations subregion according to the supporting body identifier in each second mapping result.
The second mapping results in the second mapping set can be sorted through the second stipulations subregion in the same way that the first mapping results in a first mapping set are sorted through the first stipulations subregion in step 205 of Fig. 2, which is not described in detail again here.
(2) For each sorted second mapping result, generate, in the sorted order, at least one record of the second mapping result, where each record includes one supporting body identifier and one label value, to obtain the data in the data partition corresponding to the second mapping set.
After the second mapping results in the second mapping set are sorted, the second stipulations subregion in the preset mapping/stipulations model that corresponds to the second mapping set processes each second mapping result in turn according to the sorted order.
When the second mapping result is in the key-value format shown in Table 12 in step 303, generating the at least one record of the second mapping result means deleting the data table identifier from the key of the second mapping result to obtain data whose key is the supporting body identifier and whose value is the at least one label value, and then converting the obtained data into at least one record, where each record includes the supporting body identifier and one label value.
The at least one record may also be output in key-value form, that is, for each record, the supporting body identifier is used as the key and one label value is used as the value.
For example, for the second mapping result in Table 13, step 305 yields the two records shown in Table 14. As shown in Table 14, the key of the first record is a01 and its value is { sex:Man }, and the key of the second record is a01 and its value is { educational background:Undergraduate course }.
Table 14
Key | Value
a01 | Sex:Man
a01 | Educational background:Undergraduate course
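The two processes of the second-class stipulations processing, sorting and then expanding each second mapping result into one record per label value, can be sketched as follows. The function name `second_class_reduce` and the tuple representation of records are assumptions for illustration.

```python
TABLE_ID = "A"  # data table identifier from the example

def second_class_reduce(mapping_set, table_id=TABLE_ID):
    """Turn one second mapping set into (supporting body identifier, label value) records."""
    records = []
    # Process (1): sort by key, which orders results by supporting body identifier
    # because every key in the set shares the same data table identifier prefix.
    for key, label_values in sorted(mapping_set, key=lambda result: result[0]):
        # Process (2): delete the data table identifier from the key ...
        carrier_id = key[len(table_id):]
        # ... and emit one record per label value.
        for label in label_values:
            records.append((carrier_id, label))
    return records

mapping_set = [
    ("Aa01", ["sex:Man", "educational background:Undergraduate course"]),
]
records = second_class_reduce(mapping_set)
```

For the second mapping result of Table 13, `records` holds the two entries listed in Table 14, both keyed by `a01`.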
Step 306: Store the obtained data of each data partition into the corresponding data partition.
For each second mapping set, once the data in the corresponding data partition is obtained through step 305, the data can be stored directly into that data partition. Because different second mapping sets go through step 305 in parallel, in embodiments of the present invention the preset mapping/stipulations model allows data records belonging to different data partitions to be stored into their corresponding data partitions in parallel, which improves the efficiency of storing data.
In embodiments of the present invention, when at least one data record is obtained, the preset mapping/stipulations model can determine, based on the at least one data record, the data in each data partition included in the data table, so that the at least one data record is stored into the corresponding data partitions and the labels that a given supporting body has can subsequently be queried from the data table. In addition, the preset mapping/stipulations model can determine the data in the data partitions in parallel, which improves the efficiency of storing data.
It should be noted that, because different stipulations subregions in the preset mapping/stipulations model process data in parallel, in embodiments of the present invention the N first stipulations subregions and the M second stipulations subregions included in the preset mapping/stipulations model make it possible to build the data table and the bitmap index at the same time from the at least one data record. The following embodiment describes this in detail.
Referring to Fig. 4, an embodiment of the present invention provides a data storage method, applied to the scenario of storing at least one data record into the data table and the bitmap index at the same time. As shown in Fig. 4, the method includes the following steps:
Step 401: Obtain at least one data record, where each data record includes a supporting body identifier and at least one label value.
The implementation of step 401 is essentially the same as that of step 201 in Fig. 2 and is not described in detail again here.
After the at least one data record is obtained, the bitmap index and the data table are built at the same time through steps 402 to 406 below.
Step 402: Determine the N first stipulations subregions and the M second stipulations subregions in the preset mapping/stipulations model.
For the implementation of step 402, refer to the implementations of step 202 in Fig. 2 and step 302 in Fig. 3.
That is, in order to build the bitmap index and the data table at the same time, when the at least one data record is obtained, the N first stipulations subregions in one-to-one correspondence with the N bitmap index subregions and the M second stipulations subregions in one-to-one correspondence with the M data partitions can be obtained at the same time.
Step 403: Perform the first-class mapping processing on the at least one data record in parallel through the preset mapping/stipulations model to obtain at least one first mapping result, where each first mapping result includes the bitmap index table identifier, a supporting body identifier and at least one label value; at the same time, perform the second-class mapping processing on the at least one data record in parallel through the preset mapping/stipulations model to obtain at least one second mapping result, where each second mapping result includes the data table identifier, a supporting body identifier and at least one label value.
For the implementation of step 403, refer to the implementations of step 203 in Fig. 2 and step 303 in Fig. 3.
That is, in embodiments of the present invention, the first-class mapping processing in step 203 of Fig. 2 and the second-class mapping processing in step 303 of Fig. 3 can be performed in parallel, so that the first mapping result and the second mapping result of each data record are obtained at the same time.
Step 404: Classify the at least one first mapping result according to the partition information of the N first stipulations subregions to obtain at least one first mapping set, where each first mapping set corresponds to one first stipulations subregion; at the same time, classify the at least one second mapping result according to the partition information of the M second stipulations subregions to obtain at least one second mapping set, where each second mapping set corresponds to one second stipulations subregion.
For the implementation of step 404, refer to the implementations of step 204 in Fig. 2 and step 304 in Fig. 3.
That is, the first-class classification in step 204 of Fig. 2 and the second-class classification in step 304 of Fig. 3 can be performed in parallel, so that the at least one first mapping result and the at least one second mapping result are classified at the same time.
Step 405: Perform the first-class stipulations processing on the at least one first mapping set in parallel through the first stipulations subregions respectively corresponding to the at least one first mapping set, to obtain the bitmaps of the label values in each bitmap index subregion; at the same time, perform the second-class stipulations processing on the at least one second mapping set in parallel through the second stipulations subregions respectively corresponding to the at least one second mapping set, to obtain the data in each data partition.
For the implementation of step 405, refer to the implementations of step 205 in Fig. 2 and step 305 in Fig. 3.
That is, in embodiments of the present invention, whether among the N first stipulations subregions, among the M second stipulations subregions, or between a first stipulations subregion and a second stipulations subregion, each stipulations subregion processes the data belonging to it in parallel with the others. In other words, the stipulations subregions process data independently of one another, so that the data in each bitmap index subregion and the data in each data partition are determined at the same time.
Step 406: Store the obtained bitmaps of the label values in each bitmap index subregion into the corresponding bitmap index subregion; at the same time, store the obtained data of each data partition into the corresponding data partition.
For the implementation of step 406, refer to the implementations of step 206 in Fig. 2 and step 306 in Fig. 3.
Because different stipulations subregions are independent of one another, the at least one data record can be stored into the bitmap index and the data table at the same time.
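The independence of the two kinds of processing can be illustrated with a small sketch that runs a bitmap-index build and a data-table build concurrently over the same records. Using Python's `concurrent.futures.ThreadPoolExecutor` in place of the preset mapping/stipulations model is an assumption made purely for illustration, as are the function names, the array-form bitmaps and the `BIT_OF` assignment.

```python
from concurrent.futures import ThreadPoolExecutor

RECORDS = [
    ("a02", ["sex:Female", "educational background:Training"]),
    ("a01", ["sex:Man", "educational background:Undergraduate course"]),
]
BIT_OF = {"a01": 1, "a02": 2}  # assumed bitmap bits for the supporting body identifiers

def build_bitmaps(records):
    """First-class processing: label value -> list of bitmap bits set to 1."""
    bitmaps = {}
    for carrier_id, labels in sorted(records, key=lambda r: r[0]):
        for label in labels:
            bitmaps.setdefault(label, []).append(BIT_OF[carrier_id])
    return bitmaps

def build_rows(records):
    """Second-class processing: one (identifier, label value) row per label value."""
    rows = []
    for carrier_id, labels in sorted(records, key=lambda r: r[0]):
        rows.extend((carrier_id, label) for label in labels)
    return rows

# Neither task reads the other's output, so both can be submitted at once.
with ThreadPoolExecutor(max_workers=2) as pool:
    bitmap_future = pool.submit(build_bitmaps, RECORDS)
    rows_future = pool.submit(build_rows, RECORDS)
    bitmaps = bitmap_future.result()
    rows = rows_future.result()
```

Because the two functions share only the read-only input, no coordination between them is needed, which mirrors the independence between first and second stipulations subregions described above.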
In embodiments of the present invention, when at least one data record is obtained, the preset mapping/stipulations model can store, based on the at least one data record, the at least one data record into the corresponding bitmap index subregions and data partitions at the same time, so that the supporting body identifiers corresponding to a given label value can subsequently be queried from the bitmap index subregions, or the labels that a given supporting body has can be queried from the data partitions, which improves the efficiency of storing data.
In addition to the data storage method described above, embodiments of the present invention also provide a data storage device. Referring to Fig. 5A, the data storage device 500 includes an acquisition module 501, a first sort module 502, a first protocol module 503 and a first memory module 504.
The acquisition module 501 is configured to perform step 201 in Fig. 2 or step 301 in Fig. 3;
The first sort module 502 is configured to perform, based on the supporting body identifier included in each data record and according to the partition information of the N first stipulations subregions included in the preset mapping/stipulations model, first-class classification on the at least one data record to obtain at least one first mapping set, where each first mapping set corresponds to one first stipulations subregion;
The N first stipulations subregions are determined according to the partition information of the N bitmap index subregions included in the bitmap index, where N is a positive integer and each bitmap index subregion corresponds to one first stipulations subregion. Each bitmap index subregion includes at least one bitmap, each bitmap corresponds to one label value, and each bitmap includes at least one bitmap bit. Each bitmap bit records whether the supporting body corresponding to one supporting body identifier has the label value corresponding to the current bitmap;
The first protocol module 503 is configured to perform step 205 in Fig. 2;
The first memory module 504 is configured to perform step 206 in Fig. 2.
Optionally, the partition information of each first stipulations subregion consists of the bitmap index table identifier and supporting body identifiers within a preset interval range;
Referring to Fig. 5B, the first sort module 502 includes a first map unit 5021 and a first classification unit 5022:
The first map unit 5021 is configured to perform step 203 in Fig. 2;
The first classification unit 5022 is configured to perform step 204 in Fig. 2.
Alternatively, first protocol module 503 includes:
Determining unit, for for each first mapping set, determining the first stipulations corresponding to first mapping set Subregion;
Sequencing unit, for being identified according to the supporting body in each first mapping result in first mapping set, lead to The first stipulations subregion is crossed to be ranked up the first mapping result in first mapping set;
Updating block, for for each first mapping result after sequence, according to ranking results, being advised from described first Each label at least one label value that first mapping result includes about is obtained in bitmap index subregion corresponding to subregion The bitmap of value, and the bitmap bits identified according to the supporting body update the bitmap of the label value.
Alternatively, first protocol module 503 also includes:
First execution unit, when first mapping result also includes the bitmap bits of supporting body mark, perform according to institute State the operation of the bitmap of the bitmap bits renewal label value of supporting body mark;Or
Second execution unit, when first mapping result does not include the bitmap bits of supporting body mark, held described in acquisition The bitmap bits of signal of carrier, and perform the operation that the bitmap bits identified according to the supporting body update the bitmap of the label value.
Alternatively, second execution unit, is additionally operable to:
Store the corresponding relation between the bitmap bits and supporting body mark of the supporting body mark.
Optionally, the device 500 further includes:
A first determining module, configured to determine the partition information of the bitmap index, where the partition information of the bitmap index describes the set of carrier identifiers corresponding to each bitmap index partition in the bitmap index;
A second determining module, configured to determine the N first reduce partitions in the preset map/reduce model according to the partition information of the bitmap index.
Optionally, referring to Fig. 5C, the device 500 further includes a second classification module 505, a second reduce module 506 and a second storage module 507:
The second classification module 505 is configured to perform, based on the carrier identifier included in each data record and according to the partition information of M second reduce partitions included in the preset map/reduce model, second-type classification on the at least one data record to obtain at least one second mapping set, each second mapping set corresponding to one second reduce partition;
Wherein the M second reduce partitions are determined according to the partition information of M data partitions included in a data table, M is a positive integer, each data partition corresponds to one second reduce partition, and each data partition is used to record the correspondence between carrier identifiers and label values;
The second reduce module 506 is configured to perform step 305 in Fig. 3;
The second storage module 507 is configured to perform step 306 in Fig. 3.
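As a rough illustration of the second path, the sketch below reduces one second mapping set into the rows of a data partition, each row relating a carrier identifier to its label values. The row layout (a dictionary from carrier identifier to a sorted list of label values) and the function name `reduce_data_partition` are assumptions made for the example.

```python
from collections import defaultdict

def reduce_data_partition(mapping_set):
    """Merge the mapping results of one set into carrier id -> label values rows."""
    rows = defaultdict(set)
    for _, carrier_id, labels in mapping_set:
        rows[carrier_id].update(labels)
    # Emit rows in ascending carrier order with a stable label order.
    return {cid: sorted(labels) for cid, labels in sorted(rows.items())}

mapping_set = [
    ("data_table", 7, ("male",)),
    ("data_table", 3, ("student",)),
    ("data_table", 7, ("student",)),
]
rows = reduce_data_partition(mapping_set)
```

The data partition keeps the carrier-to-label direction, while the bitmap index keeps the label-to-carrier direction.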
Optionally, the partition information of each second reduce partition consists of a carrier data table identifier and the carrier identifiers within a preset interval range.
Referring to Fig. 5D, the second classification module 505 includes a second mapping unit 5051 and a second classification unit 5052:
The second mapping unit 5051 is configured to perform step 304 in Fig. 3;
The second classification unit 5052 is configured to perform step 305 in Fig. 3.
Optionally, N is less than or equal to M and N is greater than or equal to 2; each of the M data partitions belongs to a unique bitmap index partition, and each of the N bitmap index partitions includes at least one data partition.
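The relationship between the two kinds of partition can be checked mechanically. The sketch below assumes that both kinds of partition are described by half-open ranges of integer carrier identifiers, which is one reading of the "preset interval range" and not something the text spells out; `owning_index_partition` is a hypothetical helper.

```python
def owning_index_partition(data_range, index_ranges):
    """Return the single bitmap index partition that fully contains data_range."""
    lo, hi = data_range
    owners = [i for i, (a, b) in enumerate(index_ranges) if a <= lo and hi <= b]
    if len(owners) != 1:
        raise ValueError(f"data partition {data_range} has {len(owners)} owners")
    return owners[0]

index_ranges = [(0, 2000), (2000, 4000)]             # N = 2 bitmap index partitions
data_ranges = [(0, 1000), (1000, 2000), (2000, 4000)]  # M = 3 data partitions

owners = [owning_index_partition(r, index_ranges) for r in data_ranges]
n, m = len(index_ranges), len(data_ranges)
constraints_hold = 2 <= n <= m and set(owners) == set(range(n))
```

In this example two data partitions nest inside the first bitmap index partition and one inside the second, so every constraint in the paragraph above is met.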
In the embodiments of the present invention, when at least one data record is obtained, the preset map/reduce model can determine, based on the at least one data record, the bitmaps of the label values in each bitmap index partition of the bitmap index, so that the at least one data record is stored into the corresponding bitmap index partitions. Because the bitmap index includes at least one bitmap and each bitmap corresponds to one label value, the carrier identifiers that have a given label value can be looked up through the bitmap index based on that label value, which improves the efficiency of label-based data queries. In addition, the preset map/reduce model can determine the bitmaps of the label values in the bitmap index partitions in parallel, which improves the efficiency of data storage.
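The query benefit described here can be illustrated with a small sketch. It assumes bitmaps held as Python integers and a stored table `carrier_of_bit` from bitmap bit back to carrier identifier; both are illustrative choices, and the sample data is made up.

```python
def carriers_with_label(bitmap, carrier_of_bit):
    """List the carrier identifiers whose bitmap bit is set in the given bitmap."""
    result = []
    bit = 0
    while bitmap:
        if bitmap & 1:
            result.append(carrier_of_bit[bit])
        bitmap >>= 1
        bit += 1
    return result

# Hypothetical index contents: one bitmap per label value.
bitmaps = {"male": 0b1010, "student": 0b0110}
carrier_of_bit = {0: "u0", 1: "u1", 2: "u2", 3: "u3"}

males = carriers_with_label(bitmaps["male"], carrier_of_bit)
# Carriers having both label values: a single bitwise AND of two bitmaps.
male_students = carriers_with_label(bitmaps["male"] & bitmaps["student"], carrier_of_bit)
```

A lookup touches only the bitmap of the requested label value, and combining labels reduces to bitwise operations, instead of scanning every carrier's record.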
Fig. 6 shows a schematic diagram of another data storage device provided in an embodiment of the present invention. The data storage device 600 may be a computer device, and the computer device may be the server described above. The data storage device 600 includes at least one processor 601, a communication bus 602, a memory 603 and at least one communication interface 604.
The processor 601 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the program of the present solution.
The communication bus 602 may include a path for transferring information between the above components. The communication interface 604 uses any transceiver-type device to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
The memory 603 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these. The memory may exist independently and be connected to the processor through the bus. The memory may also be integrated with the processor.
The memory 603 is used to store the program code for executing the present solution, and execution is controlled by the processor 601. The processor 601 is used to execute the program code stored in the memory 603.
In a specific implementation, as one embodiment, the processor 601 may include one or more CPUs, such as CPU0 and CPU1 in Fig. 6.
In a specific implementation, as one embodiment, the data storage device 600 may include multiple processors, such as the processor 601 and the processor 608 in Fig. 6. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (such as computer program instructions).
In a specific implementation, as one embodiment, the data storage device 600 may further include an output device 605 and an input device 606. The output device 605 communicates with the processor 601 and can display information in a variety of ways. For example, the output device 605 may be a liquid crystal display (LCD), a light-emitting diode (LED) display device, a cathode ray tube (CRT) display device, or a projector. The input device 606 communicates with the processor 601 and can receive user input in a variety of ways. For example, the input device 606 may be a mouse, a keyboard, a touchscreen device or a sensing device.
The data storage device 600 described above may be a general-purpose computer device or a special-purpose computer device. In a specific implementation, the data storage device 600 may be a desktop computer, a portable computer, a network server, a personal digital assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, a communication device, an embedded device, or a device with a structure similar to that in Fig. 6. The embodiments of the present invention do not limit the type of the data storage device 600.
One or more software modules are stored in the memory of the data storage device. The data storage device can implement the software modules through the processor and the program code in the memory, thereby implementing the data storage method described in the above embodiments.
An embodiment of this application further provides a computer storage medium in which instructions are stored. When a data storage device (which may be a computer device, such as a server) executes the instructions, for example when a processor in the computer device executes the instructions, the data storage device implements the data storage method described in the above embodiments.
An embodiment of this application provides a computer program product that includes instructions. When a data storage device (which may be a computer device, such as a server) executes the instructions, the data storage device performs the data storage method of the above method embodiments.
The above are embodiments provided by this application and are not intended to limit this application. Any modification, equivalent replacement or improvement made within the spirit and principles of this application shall fall within the scope of protection of this application.

Claims (20)

1. A data storage method, characterized in that the method comprises:
Obtaining at least one data record, each data record including one carrier identifier and at least one label value;
Performing, based on the carrier identifier included in each data record and according to the partition information of N first reduce partitions included in a preset map/reduce model, first-type classification on the at least one data record to obtain at least one first mapping set, each first mapping set corresponding to one first reduce partition;
Wherein the N first reduce partitions are determined according to the partition information of N bitmap index partitions included in a bitmap index, N is a positive integer, each bitmap index partition corresponds to one first reduce partition, each bitmap index partition includes at least one bitmap, each bitmap corresponds to one label value, each bitmap includes at least one bitmap bit, and each bitmap bit records whether the carrier corresponding to one carrier identifier has the label value corresponding to the current bitmap;
Performing first-type reduce processing on the at least one first mapping set in parallel through the first reduce partitions respectively corresponding to the at least one first mapping set, to obtain the bitmaps of the label values in each bitmap index partition;
Storing the obtained bitmaps of the label values in each bitmap index partition into the corresponding bitmap index partition.
2. The method according to claim 1, characterized in that the partition information of each first reduce partition consists of a bitmap index table identifier and the carrier identifiers within a preset interval range;
The performing, according to the partition information of the N first reduce partitions included in the preset map/reduce model, first-type classification on the at least one data record to obtain at least one first mapping set comprises:
Performing first-type mapping processing on the at least one data record in parallel through the preset map/reduce model to obtain at least one first mapping result, each first mapping result including the bitmap index table identifier, a carrier identifier and at least one label value;
Classifying the at least one first mapping result according to the partition information of the N first reduce partitions to obtain at least one first mapping set.
3. The method according to claim 2, characterized in that the performing first-type reduce processing on the at least one first mapping set in parallel through the first reduce partitions respectively corresponding to the at least one first mapping set, to obtain the bitmaps of the label values in each bitmap index partition, comprises:
For each first mapping set, determining the first reduce partition corresponding to the first mapping set;
Sorting, by means of the first reduce partition, the first mapping results in the first mapping set according to the carrier identifier in each first mapping result in the first mapping set;
For each sorted first mapping result and in sorted order, obtaining from the bitmap index partition corresponding to the first reduce partition the bitmap of each label value among the at least one label value included in the first mapping result, and updating the bitmap of the label value according to the bitmap bit of the carrier identifier.
4. The method according to claim 3, characterized in that, before the updating the bitmap of the label value according to the bitmap bit of the carrier identifier, the method further comprises:
When the first mapping result also includes the bitmap bit of the carrier identifier, performing the operation of updating the bitmap of the label value according to the bitmap bit of the carrier identifier; or
When the first mapping result does not include the bitmap bit of the carrier identifier, obtaining the bitmap bit of the carrier identifier, and performing the operation of updating the bitmap of the label value according to the bitmap bit of the carrier identifier.
5. The method according to claim 4, characterized in that, after the obtaining the bitmap bit of the carrier identifier, the method further comprises:
Storing the correspondence between the bitmap bit of the carrier identifier and the carrier identifier.
6. The method according to any one of claims 1 to 5, characterized in that, before the performing, based on the carrier identifier included in each data record and according to the partition information of the N first reduce partitions included in the preset map/reduce model, first-type classification on the at least one data record, the method further comprises:
Determining the partition information of the bitmap index, where the partition information of the bitmap index describes the set of carrier identifiers corresponding to each bitmap index partition in the bitmap index;
Determining the N first reduce partitions in the preset map/reduce model according to the partition information of the bitmap index.
7. The method according to claim 1, characterized in that, after the obtaining at least one data record, the method further comprises:
Performing, based on the carrier identifier included in each data record and according to the partition information of M second reduce partitions included in the preset map/reduce model, second-type classification on the at least one data record to obtain at least one second mapping set, each second mapping set corresponding to one second reduce partition;
Wherein the M second reduce partitions are determined according to the partition information of M data partitions included in a data table, M is a positive integer, each data partition corresponds to one second reduce partition, and each data partition is used to record the correspondence between carrier identifiers and label values;
Performing second-type reduce processing on the at least one second mapping set in parallel through the second reduce partitions respectively corresponding to the at least one second mapping set, to obtain the data in each data partition;
Storing the obtained data of each data partition into the corresponding data partition.
8. The method according to claim 7, characterized in that the partition information of each second reduce partition consists of a carrier data table identifier and the carrier identifiers within a preset interval range;
The performing, according to the partition information of the M second reduce partitions included in the preset map/reduce model, second-type classification on the at least one data record comprises:
Performing second-type mapping processing on the at least one data record in parallel through the preset map/reduce model to obtain at least one second mapping result, each second mapping result including the data table identifier, a carrier identifier and at least one label value;
Classifying the at least one second mapping result according to the partition information of the M second reduce partitions to obtain at least one second mapping set.
9. The method according to any one of claims 1 to 8, characterized in that N is less than or equal to M and N is greater than or equal to 2, each of the M data partitions belongs to a unique bitmap index partition, and each of the N bitmap index partitions includes at least one data partition.
10. A data storage device, characterized in that the device comprises:
An obtaining module, configured to obtain at least one data record, each data record including one carrier identifier and at least one label value;
A first classification module, configured to perform, based on the carrier identifier included in each data record and according to the partition information of N first reduce partitions included in a preset map/reduce model, first-type classification on the at least one data record to obtain at least one first mapping set, each first mapping set corresponding to one first reduce partition;
Wherein the N first reduce partitions are determined according to the partition information of N bitmap index partitions included in a bitmap index, N is a positive integer, each bitmap index partition corresponds to one first reduce partition, each bitmap index partition includes at least one bitmap, each bitmap corresponds to one label value, each bitmap includes at least one bitmap bit, and each bitmap bit records whether the carrier corresponding to one carrier identifier has the label value corresponding to the current bitmap;
A first reduce module, configured to perform first-type reduce processing on the at least one first mapping set in parallel through the first reduce partitions respectively corresponding to the at least one first mapping set, to obtain the bitmaps of the label values in each bitmap index partition;
A first storage module, configured to store the obtained bitmaps of the label values in each bitmap index partition into the corresponding bitmap index partition.
11. The device according to claim 10, characterized in that the partition information of each first reduce partition consists of a bitmap index table identifier and the carrier identifiers within a preset interval range;
The first classification module includes:
A first mapping unit, configured to perform first-type mapping processing on the at least one data record in parallel through the preset map/reduce model to obtain at least one first mapping result, each first mapping result including the bitmap index table identifier, a carrier identifier and at least one label value;
A first classification unit, configured to classify the at least one first mapping result according to the partition information of the N first reduce partitions to obtain at least one first mapping set.
12. The device according to claim 11, characterized in that the first reduce module includes:
A determining unit, configured to determine, for each first mapping set, the first reduce partition corresponding to the first mapping set;
A sorting unit, configured to sort, by means of the first reduce partition, the first mapping results in the first mapping set according to the carrier identifier in each first mapping result in the first mapping set;
An updating unit, configured to, for each sorted first mapping result and in sorted order, obtain from the bitmap index partition corresponding to the first reduce partition the bitmap of each label value among the at least one label value included in the first mapping result, and update the bitmap of the label value according to the bitmap bit of the carrier identifier.
13. The device according to claim 12, characterized in that the first reduce module further includes:
A first execution unit, configured to, when the first mapping result also includes the bitmap bit of the carrier identifier, perform the operation of updating the bitmap of the label value according to the bitmap bit of the carrier identifier; or
A second execution unit, configured to, when the first mapping result does not include the bitmap bit of the carrier identifier, obtain the bitmap bit of the carrier identifier, and perform the operation of updating the bitmap of the label value according to the bitmap bit of the carrier identifier.
14. The device according to claim 13, characterized in that the second execution unit is further configured to:
Store the correspondence between the bitmap bit of the carrier identifier and the carrier identifier.
15. The device according to any one of claims 10 to 14, characterized in that the device further comprises:
A first determining module, configured to determine the partition information of the bitmap index, where the partition information of the bitmap index describes the set of carrier identifiers corresponding to each bitmap index partition in the bitmap index;
A second determining module, configured to determine the N first reduce partitions in the preset map/reduce model according to the partition information of the bitmap index.
16. The device according to claim 10, characterized in that the device further comprises:
A second classification module, configured to perform, based on the carrier identifier included in each data record and according to the partition information of M second reduce partitions included in the preset map/reduce model, second-type classification on the at least one data record to obtain at least one second mapping set, each second mapping set corresponding to one second reduce partition;
Wherein the M second reduce partitions are determined according to the partition information of M data partitions included in a data table, M is a positive integer, each data partition corresponds to one second reduce partition, and each data partition is used to record the correspondence between carrier identifiers and label values;
A second reduce module, configured to perform second-type reduce processing on the at least one second mapping set in parallel through the second reduce partitions respectively corresponding to the at least one second mapping set, to obtain the data in each data partition;
A second storage module, configured to store the obtained data of each data partition into the corresponding data partition.
17. The device according to claim 16, characterized in that the partition information of each second reduce partition consists of a carrier data table identifier and the carrier identifiers within a preset interval range;
The second classification module includes:
A second mapping unit, configured to perform second-type mapping processing on the at least one data record in parallel through the preset map/reduce model to obtain at least one second mapping result, each second mapping result including the data table identifier, a carrier identifier and at least one label value;
A second classification unit, configured to classify the at least one second mapping result according to the partition information of the M second reduce partitions to obtain at least one second mapping set.
18. The device according to any one of claims 10 to 17, characterized in that N is less than or equal to M and N is greater than or equal to 2, each of the M data partitions belongs to a unique bitmap index partition, and each of the N bitmap index partitions includes at least one data partition.
19. A data storage device, characterized in that the device comprises a memory and a processor, instructions are stored in the memory, and the processor, by executing the instructions stored in the memory, causes the data storage device to implement the data storage method according to any one of claims 1 to 9.
20. A computer-readable storage medium, characterized in that instructions are stored in the computer-readable storage medium, and a data storage device executes the instructions so that the data storage device implements the data storage method according to any one of claims 1 to 9.
CN201710841916.3A 2017-09-18 2017-09-18 Data storage method, device and storage medium Active CN107704527B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710841916.3A CN107704527B (en) 2017-09-18 2017-09-18 Data storage method, device and storage medium
PCT/CN2018/087377 WO2019052209A1 (en) 2017-09-18 2018-05-17 Data storage method and device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710841916.3A CN107704527B (en) 2017-09-18 2017-09-18 Data storage method, device and storage medium

Publications (2)

Publication Number Publication Date
CN107704527A true CN107704527A (en) 2018-02-16
CN107704527B CN107704527B (en) 2020-05-08

Family

ID=61172880

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710841916.3A Active CN107704527B (en) 2017-09-18 2017-09-18 Data storage method, device and storage medium

Country Status (2)

Country Link
CN (1) CN107704527B (en)
WO (1) WO2019052209A1 (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109471874A (en) * 2018-10-30 2019-03-15 华为技术有限公司 Data analysis method, device and storage medium
WO2019052209A1 (en) * 2017-09-18 2019-03-21 华为技术有限公司 Data storage method and device, and storage medium
CN109656948A (en) * 2018-11-15 2019-04-19 金蝶软件(中国)有限公司 Bitmap data processing method, device, computer equipment and storage medium
CN110209348A (en) * 2019-04-17 2019-09-06 腾讯科技(深圳)有限公司 Data storage method, device, electronic equipment and storage medium
CN110297836A (en) * 2019-07-11 2019-10-01 杭州云梯科技有限公司 User tag storage method and search method based on compress bitmap mode
CN111259005A (en) * 2020-01-08 2020-06-09 北京每日优鲜电子商务有限公司 Model calling method and device and computer storage medium
CN112084245A (en) * 2020-09-03 2020-12-15 深圳力维智联技术有限公司 Data management method, device and equipment based on micro-service architecture and storage medium
CN112307264A (en) * 2020-10-22 2021-02-02 深圳市欢太科技有限公司 Data query method and device, storage medium and electronic equipment
CN112328595A (en) * 2020-10-30 2021-02-05 上海钐昆网络科技有限公司 Data searching method, device, equipment and storage medium
CN112532748A (en) * 2020-12-24 2021-03-19 北京百度网讯科技有限公司 Message pushing method, device, equipment, medium and computer program product
CN112905587A (en) * 2019-12-04 2021-06-04 北京金山云网络技术有限公司 Database data management method and device and electronic equipment
CN113068045A (en) * 2021-03-17 2021-07-02 厦门雅基软件有限公司 Data storage method and device, electronic equipment and computer readable storage medium
CN113590856A (en) * 2021-08-09 2021-11-02 平安银行股份有限公司 Label query method and device, electronic equipment and readable storage medium
CN113722533A (en) * 2021-08-30 2021-11-30 康键信息技术(深圳)有限公司 Information pushing method and device, electronic equipment and readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102722531A (en) * 2012-05-17 2012-10-10 北京大学 Query method based on regional bitmap indexes in cloud environment
CN104156407A (en) * 2014-07-29 2014-11-19 华为技术有限公司 Index data storage method, index data storage device and storage equipment
CN104731872A (en) * 2015-03-05 2015-06-24 长沙新弘软件有限公司 Bitmap-based storage space management system and method thereof
US20160147807A1 (en) * 2014-01-27 2016-05-26 Umbel Corporation Systems and methods of generating and using a bitmap index
CN106970935A (en) * 2017-01-20 2017-07-21 朗坤智慧科技股份有限公司 Real-time data memory structure, method for writing data and method for reading data

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8874849B2 (en) * 2010-04-21 2014-10-28 Empire Technology Development Llc Sectored cache with a tag structure capable of tracking sectors of data stored for a particular cache way
CN106201338B (en) * 2016-06-28 2019-10-22 华为技术有限公司 Data storage method and device
CN107704527B (en) * 2017-09-18 2020-05-08 华为技术有限公司 Data storage method, device and storage medium

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019052209A1 (en) * 2017-09-18 2019-03-21 华为技术有限公司 Data storage method and device, and storage medium
CN109471874A (en) * 2018-10-30 2019-03-15 华为技术有限公司 Data analysis method, device and storage medium
CN109656948A (en) * 2018-11-15 2019-04-19 金蝶软件(中国)有限公司 Bitmap data processing method, device, computer equipment and storage medium
CN109656948B (en) * 2018-11-15 2021-01-22 金蝶软件(中国)有限公司 Bitmap data processing method and device, computer equipment and storage medium
CN110209348A (en) * 2019-04-17 2019-09-06 腾讯科技(深圳)有限公司 Data storage method, device, electronic equipment and storage medium
CN110209348B (en) * 2019-04-17 2021-08-17 腾讯科技(深圳)有限公司 Data storage method and device, electronic equipment and storage medium
CN110297836A (en) * 2019-07-11 2019-10-01 杭州云梯科技有限公司 User tag storage method and search method based on compress bitmap mode
CN112905587A (en) * 2019-12-04 2021-06-04 北京金山云网络技术有限公司 Database data management method and device and electronic equipment
CN111259005A (en) * 2020-01-08 2020-06-09 北京每日优鲜电子商务有限公司 Model calling method and device and computer storage medium
CN112084245A (en) * 2020-09-03 2020-12-15 深圳力维智联技术有限公司 Data management method, device and equipment based on micro-service architecture and storage medium
CN112084245B (en) * 2020-09-03 2024-03-12 深圳力维智联技术有限公司 Data management method, device, equipment and storage medium based on micro-service architecture
CN112307264A (en) * 2020-10-22 2021-02-02 深圳市欢太科技有限公司 Data query method and device, storage medium and electronic equipment
CN112328595A (en) * 2020-10-30 2021-02-05 上海钐昆网络科技有限公司 Data searching method, device, equipment and storage medium
CN112532748A (en) * 2020-12-24 2021-03-19 北京百度网讯科技有限公司 Message pushing method, device, equipment, medium and computer program product
CN113068045A (en) * 2021-03-17 2021-07-02 厦门雅基软件有限公司 Data storage method and device, electronic equipment and computer readable storage medium
CN113590856A (en) * 2021-08-09 2021-11-02 平安银行股份有限公司 Label query method and device, electronic equipment and readable storage medium
CN113590856B (en) * 2021-08-09 2023-05-23 平安银行股份有限公司 Label query method and device, electronic equipment and readable storage medium
CN113722533A (en) * 2021-08-30 2021-11-30 康键信息技术(深圳)有限公司 Information pushing method and device, electronic equipment and readable storage medium
CN113722533B (en) * 2021-08-30 2023-10-17 康键信息技术(深圳)有限公司 Information pushing method and device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN107704527B (en) 2020-05-08
WO2019052209A1 (en) 2019-03-21

Similar Documents

Publication Publication Date Title
CN107704527A (en) Date storage method, device and storage medium
CN108280365B (en) Data access authority management method, device, terminal device and storage medium
CN110168529A (en) Date storage method, device and storage medium
EP2565802B1 (en) Data masking setup
CN108228817A (en) Data processing method, device and system
CN107704625A (en) Fields match method and apparatus
CN106095842B (en) Online course searching method and device
CN110909986A (en) Suspected actual controller risk identification method and system based on knowledge graph
WO2014173946A9 (en) Database management system
US20150169656A1 (en) Distributed database system
CN111639077B (en) Data management method, device, electronic equipment and storage medium
CN110298189A (en) Data base authority management method and equipment
CN107016115A (en) Data export method, device, computer-readable recording medium and electronic equipment
CN114090760B (en) Data processing method of table question and answer, electronic equipment and readable storage medium
CN104408189B (en) Method and device for displaying keyword rankings
CN107273369A (en) Table data modification method and device
CN106599291A (en) Method and device for grouping data
CN110472758A (en) Medical appointment booking method, device, equipment and readable storage medium
CN111382336B (en) Data acquisition method and system
CN110516909B (en) Urban and rural resource management system based on big data analysis
CN106844420A (en) User grouping method and device based on social network and big data analysis
CN107609017A (en) Method and system for implementing intelligent search and consultation in the medical industry using user-defined hot words
CN110019229B (en) Database configuration system
US10726013B2 (en) Information processing device, information processing method, and recording medium
JP2010072876A (en) Rule creation program, rule creation method, and rule creation device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220211

Address after: 550025 Huawei cloud data center, jiaoxinggong Road, Qianzhong Avenue, Gui'an New District, Guiyang City, Guizhou Province

Patentee after: Huawei Cloud Computing Technology Co.,Ltd.

Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Shenzhen, Guangdong

Patentee before: HUAWEI TECHNOLOGIES Co.,Ltd.