WO2019024060A1 - Data storage method, apparatus, and storage medium - Google Patents
Data storage method, apparatus, and storage medium
- Publication number
- WO2019024060A1 (PCT/CN2017/095893)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- bitmap
- data
- bearer
- identifier
- bitmap index
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2237—Vectors, bitmaps or matrices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2264—Multidimensional index structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
- G06F16/278—Data partitioning, e.g. horizontal or vertical partitioning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
Definitions
- The present application relates to the field of information processing technologies, and in particular to a data storage method, apparatus, and storage medium.
- HBase (Hadoop Database) is distributed, highly reliable, high-performance, and based on KeyValue storage, so more and more enterprises and users use HBase to store data tables.
- The data table includes a plurality of rows of data records, and each row includes the identifier of a bearer and the tag value of each tag that the bearer has.
- For example, the row corresponding to user A in the data table includes the identifier of user A, the tag value "female", and the tag value "engineer". That is, in the related art, the correspondence between the identifier of a bearer and the tag values it has is recorded in the data table.
- The related art can only use a column value filter to check the tag values of each bearer row by row, by bearer identifier. Because a data table usually has tens of thousands of rows, querying data by tag value in the related scheme is inefficient.
- The embodiments of the present application provide a data storage method, apparatus, and storage medium.
- The technical solution is as follows:
- A data storage method is provided, comprising:
- acquiring at least one data record, each data record including a bearer identifier and at least one tag value.
- The at least one data record may be stored in a data table, where the data table records the correspondence between bearer identifiers and tag values.
- establishing a bitmap index corresponding to the at least one data record.
- The bitmap index includes at least one bitmap, each bitmap corresponds to one tag value, and each bitmap includes at least one bitmap bit. Each bitmap bit records whether the bearer corresponding to a bearer identifier has the tag value corresponding to the current bitmap. For example, if a bearer has the tag value corresponding to the current bitmap, the corresponding bitmap bit may be set to "1"; otherwise, the corresponding bitmap bit may be set to "0".
- The bitmap bit is the position in the bitmap that corresponds to the identifier of each bearer.
- Because a bitmap index corresponding to the at least one data record is established, the bitmap index includes the correspondence between each tag value and its bitmap, and each bitmap includes at least one bitmap bit that records whether the bearer corresponding to a bearer identifier has the tag value, a query by tag value can determine the matching data directly from the bitmap index, which improves data query efficiency.
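The structure described above can be illustrated with a short sketch. It is not the patent's implementation: it assumes Python integers serve as bitmaps, that bit positions are 0-based and assigned in record order, and the names `build_bitmap_index` and `bearers_with` are hypothetical.

```python
def build_bitmap_index(records):
    """Build one bitmap per tag value from (bearer_id, tag_values) records.

    Returns (bit_of, index): bit_of maps each bearer identifier to its bit
    position; index maps each tag value to an int used as a bitmap.
    """
    # Assumption: bit positions are handed out in the order records arrive.
    bit_of = {bearer_id: pos for pos, (bearer_id, _) in enumerate(records)}
    index = {}
    for bearer_id, tag_values in records:
        for tag_value in tag_values:
            # Set the bearer's bit to 1 in the bitmap of this tag value.
            index[tag_value] = index.get(tag_value, 0) | (1 << bit_of[bearer_id])
    return bit_of, index


def bearers_with(tag_value, bit_of, index):
    """Return the identifiers whose bit is 1 in the bitmap of tag_value."""
    bitmap = index.get(tag_value, 0)
    return [b for b, pos in bit_of.items() if (bitmap >> pos) & 1]
```

A query by tag value then reads a single bitmap instead of scanning every row of the data table.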
- A tag definition table may be preset, and the preset tag definition table includes a plurality of preset tag values. Normally, the tag definition table is preset and saved by the designer and includes all the tag values that may appear in the data table.
- The step of acquiring the at least one data record may include:
- generating the data table according to the source data to be stored and the preset tag definition table.
- The tag values in the source data can be determined according to each tag value defined in the tag definition table, thereby generating the data table.
- The tag definition table may further include tag configuration information, which indicates whether a bitmap is resident in memory.
- Each tag value may be configured with corresponding tag configuration information, which indicates whether the bitmap corresponding to that tag value is resident in memory.
- When the tag configuration information indicates memory residency, the corresponding bitmap can be loaded into memory. An implementation may include only tag configuration information indicating that memory residency is required, only tag configuration information indicating that it is not required, or both.
- The tag values whose bitmaps need to be resident in memory are hot-query tag values, that is, tag values whose query frequency is higher than a preset threshold. Keeping the bitmaps of these tag values resident in memory lets subsequent queries find the data for the tag value directly from the bitmap in memory, which improves data query efficiency.
- The bitmap index may be divided into multiple bitmap index partitions. Each bitmap index partition may include at least one sub-bitmap and corresponds to a set of bearer identifiers, and the sets of bearer identifiers corresponding to different bitmap index partitions do not intersect. Each sub-bitmap in a bitmap index partition corresponds to one tag value.
- Distributing the bitmap index across multiple bitmap index partitions avoids the limit on the amount of storable data that a single storage space would impose.
- Similarly to the bitmap index, the data table may be divided into a plurality of data partitions, and the data table is then stored across those data partitions. Optionally, the data table may be divided into M data partitions according to a first range of the bearer identifier. Each data partition includes at least one sub-data table, the range of bearer identifiers corresponding to each data partition is the first range, and the sets of bearer identifiers corresponding to different data partitions do not intersect.
- N bitmap index partitions can be determined according to the M data partitions. Unlike the third possible implementation, in this implementation the range of bearer identifiers corresponding to each bitmap index partition is greater than or equal to the first range, and the sets of bearer identifiers corresponding to different bitmap index partitions do not intersect.
- M and N are positive integers, N is less than or equal to M, and N is greater than or equal to 2.
- Here the bitmap index partitions are determined according to the data partitions; in an actual implementation, the bitmap index may also be partitioned in a preset partitioning manner, which is not limited here.
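One way to derive N bitmap index partitions from M data partitions is to merge adjacent data-partition ranges. The sketch below assumes half-open identifier ranges `[low, high)` compared as strings and an even grouping policy; the functions `merge_ranges` and `find_partition` and the sample ranges are hypothetical, not taken from the application.

```python
from bisect import bisect_right


def merge_ranges(data_ranges, n):
    """Group M adjacent data-partition ranges into N index-partition ranges."""
    m = len(data_ranges)
    if not 2 <= n <= m:
        raise ValueError("N must satisfy 2 <= N <= M")
    size, extra = divmod(m, n)
    merged, start = [], 0
    for i in range(n):
        # The first `extra` groups take one more data partition each.
        end = start + size + (1 if i < extra else 0)
        merged.append((data_ranges[start][0], data_ranges[end - 1][1]))
        start = end
    return merged


def find_partition(ranges, bearer_id):
    """Return the position of the range whose lower bound covers bearer_id."""
    lows = [low for low, _ in ranges]
    return bisect_right(lows, bearer_id) - 1


# Four data partitions (M = 4) merged into two bitmap index partitions (N = 2).
data_ranges = [("", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
index_ranges = merge_ranges(data_ranges, 2)
```

Because each merged range is a union of whole data-partition ranges, every index partition covers at least the first range and no two index partitions share an identifier.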
- The number of sub-bitmaps in each bitmap index partition equals the number of tag values set in the predefined tag definition table. For example, if the tag definition table has 10 tag values, each bitmap index partition also has 10 sub-bitmaps in total. For a tag value in the tag definition table, if none of the bearers whose identifiers belong to a bitmap index partition has that tag value, the content of the sub-bitmap corresponding to that tag value indicates that the tag value is not present.
- In a query, the corresponding data can be determined according to the bitmap index corresponding to the tag value, which improves data query efficiency.
- The same bitmap bit in different bitmaps of the bitmap index corresponds to the same bearer identifier. Within one bitmap index partition, the same bitmap bit of different sub-bitmaps corresponds to the same bearer identifier; across different bitmap index partitions, the same bitmap bit of different sub-bitmaps corresponds to different bearer identifiers.
- As a result, a data query based on the bitmap index or a bitmap index partition can quickly find the corresponding bearer identifier and then the corresponding data, which improves data query efficiency.
- When a user needs to query by tag value, the user may send a data query request; correspondingly, after the data query request is received, the data query is executed. That is, the above data storage method further includes the following:
- The target sub-bitmap corresponding to the target tag value can be queried directly from the previously established bitmap index, and then the tag values of the target bearer are obtained.
- This solves the problem in the related art that querying data with a column value filter is inefficient, and achieves the effect of improving data query efficiency.
- The server may also obtain new data.
- The bearer identifier and/or the tag value in the new correspondence included in the new data is new. That is, the new data may update the tag values of data that is already stored, or may be data that has not been stored before.
- The bitmap index can be updated according to the new correspondence.
- Because the bitmap index is updated, the required data can be queried from the latest bitmap index after a data query request is received, which improves data query efficiency.
- The step of updating the bitmap index according to the new correspondence may include:
- updating the tag value in the new correspondence, and/or updating the bitmap corresponding to the tag value in the new correspondence.
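The update step can be sketched as follows. This is an illustration under assumptions, not the claimed procedure: bitmaps are Python integers, a new bearer receives the next free 0-based bit, and the function name `apply_new_correspondence` is hypothetical.

```python
def apply_new_correspondence(index, bit_of, bearer_id, tag_values):
    """Update the bitmap index so that bearer_id has exactly tag_values.

    index maps tag value -> int bitmap; bit_of maps bearer id -> bit position.
    Both dictionaries are modified in place.
    """
    if bearer_id not in bit_of:
        # Previously unseen bearer: allocate the next free bit position.
        bit_of[bearer_id] = len(bit_of)
    mask = 1 << bit_of[bearer_id]
    for tag_value in tag_values:
        # Previously unseen tag value: start from an all-zero bitmap.
        index.setdefault(tag_value, 0)
    for tag_value in index:
        if tag_value in tag_values:
            index[tag_value] |= mask    # the bearer has this tag value
        else:
            index[tag_value] &= ~mask   # the bearer no longer has it
```

The same function covers both cases in the text: a stored bearer whose tag values change, and a bearer that has not been stored before.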
- A data storage device is provided, comprising a memory and a processor. Instructions are stored in the memory, and the data storage device (which may be a computer device, such as a server) executes the instructions, for example through its processor, so that the data storage device implements the data storage method described in the first aspect above.
- A data storage device is provided, comprising at least one unit for implementing the data storage method provided by the first aspect above.
- A computer-readable storage medium is provided, in which instructions are stored. When the data storage device (which may be a computer device, such as a server) executes the instructions, for example through a processor in the data storage device, the data storage device implements the data storage method provided by the first aspect.
- FIG. 1 is a schematic diagram of a bitmap index involved in various embodiments of the present application.
- FIG. 2 is a schematic diagram of an implementation environment involved in various embodiments of the present application.
- FIG. 3 is a flowchart of a method for generating a data table according to an embodiment of the present application.
- FIG. 4 is a flow chart of a method for establishing a bitmap index provided by an embodiment of the present application.
- FIG. 5 is a schematic diagram of storage of a bitmap index in an underlying database according to an embodiment of the present application.
- FIG. 6 and FIG. 7 are flowcharts of a method for querying data according to an embodiment of the present application.
- FIG. 8 and FIG. 9 are schematic diagrams of data storage devices provided by an embodiment of the present application.
- A tag is a way of organizing content that characterizes a certain feature of data, helping people describe and classify content.
- Common tags are gender, education, occupation, color, and so on.
- Tags are specified manually.
- Tags include enumerated tags and Boolean tags.
- An enumerated tag has multiple enumeration values. For example, education includes specialist, undergraduate, graduate student, and doctor, and gender includes male and female. A Boolean tag only indicates whether the bearer has the tag, such as whether the person owns a home, whether the person uses drugs, or whether the person has a criminal record.
- For an enumerated tag, the tag value is the specific value of the tag. Taking education as an example, when the education is undergraduate, the tag value is "undergraduate", and when the education is graduate, the tag value is "graduate student".
- For a Boolean tag, the tag value is the tag itself. For example, when a user owns a home, the tag value is "owns a home"; when a user does not have a criminal record, the corresponding tag value is "no criminal record".
- A bearer is the object that a tag describes.
- A bearer may be a person, a car, a phone number, a virtual user account, or the like.
- A bearer can have one tag or multiple tags.
- For example, a person's tags can include gender, education, whether the person owns a home, whether the person has a criminal record, and so on.
- A car's tags can include color, whether there is a violation record, and the like.
- Data table: data records created in the database for indexing.
- Each data record in the data table records the identifier of one bearer and all the tag values of that bearer, and thereby records the correspondence between the bearer's identifier and the bearer's tag values.
- Bitmap index: a secondary index established in the database that is indexed by the tag values in the data table.
- The bitmap index records tag values and bitmaps, and also records the one-to-one correspondence between tag values and bitmaps.
- Each bitmap bit in a bitmap corresponds to the identifier of one bearer, and different bitmap bits correspond to the identifiers of different bearers; that is, all bitmap bits in the bitmap are in one-to-one correspondence with the identifiers of all bearers.
- Each bitmap bit records whether the bearer corresponding to a bearer identifier has the tag value corresponding to the current bitmap (the bitmap in which the bitmap bit is located). For example, if a bitmap bit in the bitmap of a tag value is 1, the bearer corresponding to that bitmap bit has the tag value; if the bitmap bit is 0, the bearer does not have the tag value.
- The same bitmap bit in different bitmaps corresponds to the identifier of the same bearer.
- Take virtual user accounts as the bearers. Assume there are eight virtual user accounts in total: user1, user2, ..., user8.
- The set of accounts with the tag value "online shopper" is user1, user4, user8, and the set with the tag value "forum activist" is user1, user2, user8.
- The bitmap bits allocated to the eight virtual user accounts in the bitmap are 1, 2, 3, ..., 8, as shown in FIG. 1. For the tag value "online shopper", the corresponding bitmap is "10010001"; for the tag value "forum activist", the corresponding bitmap is "11000001". Take the bitmap "10010001" corresponding to "online shopper" as an example.
- The first "1" in the bitmap indicates that the virtual user account at bitmap bit 1 is an online shopper. Similarly, the second "1" indicates that the account at bitmap bit 4 is also an online shopper, and the third "1" indicates that the account at bitmap bit 8 is also an online shopper. The bitmap "11000001" corresponding to the tag value "forum activist" is read the same way. As FIG. 1 shows, user1 and user8 have both the "online shopper" and "forum activist" tag values.
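The eight-account example can be reproduced with bitmaps written as strings, where the leftmost character is bitmap bit 1. The helper names `to_bitmap`, `members`, and `bitmap_and` are hypothetical and chosen only for this illustration.

```python
USERS = [f"user{i}" for i in range(1, 9)]  # bitmap bits 1..8, left to right


def to_bitmap(accounts):
    """Write '1' at the bit of every account in the set, '0' elsewhere."""
    return "".join("1" if user in accounts else "0" for user in USERS)


def members(bitmap):
    """List the accounts whose bit is '1'."""
    return [user for user, bit in zip(USERS, bitmap) if bit == "1"]


def bitmap_and(left, right):
    """Bitwise AND of two equal-length bitmap strings."""
    return "".join("1" if a == b == "1" else "0" for a, b in zip(left, right))


online_shopper = to_bitmap({"user1", "user4", "user8"})   # "10010001"
forum_activist = to_bitmap({"user1", "user2", "user8"})   # "11000001"
both = members(bitmap_and(online_shopper, forum_activist))
```

Here `both` holds user1 and user8, the accounts that carry both tag values, found with a single AND over two bitmaps.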
- FIG. 2 shows a schematic diagram of an implementation environment of various embodiments of the present application.
- The implementation environment includes a terminal 210 and a server 220. Specifically:
- A client for requesting data storage or data query is installed in the terminal 210.
- The client can be a browser.
- The terminal 210 can be a device such as a mobile phone, a tablet, an e-reader, or a desktop computer.
- The terminal 210 can be connected to the server 220 through a wired or wireless network.
- The server 220 can be one server or multiple servers; optionally, multiple servers can provide database services to the terminal 210 as a server cluster.
- A database is set up in the server 220. The database may be a distributed database such as HBase, MongoDB (Mongo Database), a distributed relational database service (DRDS), VoltDB (Volt Database), or ScaleBase.
- FIG. 3 is a flowchart of a method for generating a data table provided by an embodiment of the present application.
- This embodiment is exemplified by the data storage method used in the server shown in FIG. 2.
- the data storage method includes steps 301 and 302.
- Step 301: A tag definition table is set in advance.
- The tag definition table can be information that the server obtains and stores in advance.
- The tag definition table may be stored as a separate file, such as an Extensible Markup Language (XML) file, or may be stored in a third-party distributed storage system, such as ZooKeeper.
- The tag definition table records a plurality of preset tag values.
- Optionally, the tag values that a tag contains are set based on historical data, or are defined manually.
- Table 1 shows a possible tag definition table. Table 1 may also include more or fewer tags, which is not limited here.
- Table 1
  Tag | Tag value | Tag configuration information
  Gender | Male, female | Resident in memory
  Education | Specialist, undergraduate, graduate student, doctor | Not resident in memory
- The tag definition table may further include tag configuration information, which indicates whether a tag value needs memory residency. The bitmap corresponding to a frequently used tag value needs to reside in memory, while bitmaps corresponding to tag values that do not need memory residency do not.
- In Table 1, the identifier "resident in memory" is set for tag values that need memory residency, and the identifier "not resident in memory" is set for tag values that do not. Alternatively, the identifier "resident in memory" may be set for tag values that need memory residency while no identifier is set for the others. Setting the identifier "not resident in memory" in Table 1 is merely an example.
- The database is stored in the storage medium of the server (such as a hard disk). The bitmap corresponding to a memory-resident tag value is loaded into memory when the database is first used, and the bitmap in memory is updated synchronously with the bitmap in the storage medium. Each query of that tag value can therefore use the bitmap in memory directly without accessing the bitmap in the storage medium, which improves query efficiency and saves query time.
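The memory-residency behavior can be sketched with a small store class. Everything here is an assumption made for illustration: a dictionary stands in for the storage medium, bitmaps are plain strings, and the class name `BitmapStore` and its counter `disk_reads` are hypothetical.

```python
class BitmapStore:
    """Keeps the bitmaps of memory-resident tag values in memory."""

    def __init__(self, disk, resident_tags):
        self._disk = disk                      # stand-in for the storage medium
        self._resident = set(resident_tags)    # from the tag configuration info
        # Load resident bitmaps once, when the store is first used.
        self._memory = {t: disk[t] for t in self._resident if t in disk}
        self.disk_reads = 0

    def get(self, tag_value):
        if tag_value in self._memory:
            return self._memory[tag_value]     # served from memory
        self.disk_reads += 1                   # non-resident: read the medium
        return self._disk[tag_value]

    def put(self, tag_value, bitmap):
        # Write through so memory and storage medium stay synchronized.
        self._disk[tag_value] = bitmap
        if tag_value in self._resident:
            self._memory[tag_value] = bitmap
```

Queries on a resident tag value never increment `disk_reads`, which is the saving the text describes for hot tag values.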
- The tag definition table may further include a lifetime for each tag value, which is the time period during which the tag value is valid; at any time outside the lifetime, the tag value is invalid.
- The server may also assign a tag number to each tag value in Table 1.
- A tag value can then be replaced by its tag number, and storing the tag number saves storage space compared with storing the tag value.
- The corresponding tag value can be looked up from the tag number, and the corresponding tag number can be looked up from the tag value.
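A two-way mapping between tag values and tag numbers might look like the sketch below. The numbering scheme (consecutive integers from 0 in assignment order) and the class name `TagNumbers` are assumptions; the application does not specify how numbers are chosen.

```python
class TagNumbers:
    """Two-way lookup between tag values and compact tag numbers."""

    def __init__(self):
        self._number_of = {}   # tag value -> tag number
        self._value_of = []    # tag number -> tag value

    def assign(self, tag_value):
        """Give tag_value the next number unless it already has one."""
        if tag_value not in self._number_of:
            self._number_of[tag_value] = len(self._value_of)
            self._value_of.append(tag_value)
        return self._number_of[tag_value]

    def number_of(self, tag_value):
        return self._number_of[tag_value]

    def value_of(self, number):
        return self._value_of[number]
```

Storing a small integer in each data record instead of a string such as "graduate student" is where the space saving comes from.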
- Step 302: The server generates a data table according to the tag definition table and the source data.
- The server first obtains the source data to be stored.
- For example, the server receives a data storage request sent by the terminal, where the data storage request carries the source data to be stored.
- Alternatively, the server may actively obtain the source data from the terminal, or obtain it from a database that stores the source data. This embodiment does not limit how the server obtains the source data.
- There may be one or more pieces of source data to be stored.
- Multiple pieces of source data may exist in the form of a table, in which each row represents one piece of source data and each piece of source data has the unique identifier of a bearer.
- The server can determine all the tag values that each piece of source data has according to the tag definition table. For example, for the source data shown in Table 2, Table 3 shows the tag values that the server determines for each piece of source data according to Table 1; the content in each [] in Table 3 represents one tag value.
- The data table includes a plurality of data records.
- Each data record includes the identifier of a bearer. For example, the first data record includes the identifier A01, and the ninth data record includes the identifier D03.
- Each data record further includes the tag values corresponding to the bearer's identifier. For example, the first data record includes the tag values corresponding to the identifier A01, "[gender: male] [education: undergraduate] [occupation: student]", and the ninth data record includes the tag values corresponding to the identifier D03, "[gender: male] [education: specialist] [occupation: corporate employee]".
- That is, each data record in the data table includes the correspondence between the identifier of a bearer and all tag values of that bearer; for example, the first data record includes the identifier A01 and its tag values.
- All the tag values corresponding to a bearer's identifier in the data table can therefore be queried by that identifier, which yields all the tag values of the bearer.
- The data table can be divided into multiple data partitions, which are stored in a distributed manner.
- Optionally, the data partitions are divided by specifying the number of data partitions, or the partitioning interval of each data partition is defined directly.
- Dividing Table 3 by the above partitioning intervals yields the data partitions shown in Table 4.
- Each data partition of the data table can be split or expanded automatically. For example, as time passes, a data partition accumulates more and more data (such as bearer identifiers or tag values). When the data volume of the data partition reaches the split threshold, the server can split it into two data partitions, so that new data is not written to a data partition whose storage space is already full.
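Automatic splitting can be sketched as below. The sketch makes two assumptions that the text does not state: data volume is measured as a record count, and the split point is the median bearer identifier. The function name `split_if_full` is hypothetical.

```python
def split_if_full(records, split_threshold):
    """Split a data partition in two once it reaches the split threshold.

    records is a list of (bearer_id, tag_values) tuples. The result is a list
    holding either the original partition or its two halves.
    """
    ordered = sorted(records, key=lambda record: record[0])
    if len(ordered) < split_threshold:
        return [ordered]
    middle = len(ordered) // 2
    # Each half covers a contiguous, non-overlapping range of identifiers.
    return [ordered[:middle], ordered[middle:]]
```

Splitting at an identifier boundary keeps the two resulting partitions disjoint, which matches the requirement that the identifier sets of different data partitions do not intersect.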
- Each data partition includes at least one sub-data table.
- For example, data partition 1 may have one sub-data table that includes the first data record (the identifier A01 and the tag values corresponding to A01) and the second data record (the identifier A02 and the tag values corresponding to A02). Data partition 1 may also have two sub-data tables, where one sub-data table records the first data record and the other records the second data record.
- The set of bearer identifiers corresponding to the sub-data tables of data partition 1 is the interval [, B).
- The set of bearer identifiers corresponding to the sub-data tables of data partition 2 is the interval [B, C).
- The server updates the tag values of the target bearer's identifier with the tag value to be updated; updating may add a tag value or delete a tag value.
- For example, the data record of the identifier A01 has three tag values: "[gender: male] [education: undergraduate] [occupation: student]".
- If the tag value "[online shopper]" is added to the data record of the identifier A01, then after the update the data record of A01 has four tag values: "[gender: male] [education: undergraduate] [occupation: student] [online shopper]".
- If the tag value "[occupation: student]" is deleted from the data record of the identifier A01, then after the update the data record of A01 has two tag values: "[gender: male] [education: undergraduate]".
- The server may add a data record to the data table, which includes adding the identifier of the target bearer and, at the same time, adding all the tag values of that identifier to the data table.
- For example, the data record of the identifier A03 is added to Table 4 (see Table 7). Specifically, in the bearer identifier column of data partition 1, the identifier A03 is recorded in the row after the identifier A02; correspondingly, the tag value column records the tag values corresponding to A03, "[gender: female] [education: specialist] [occupation: corporate employee] [online shopper]".
- The server may also add multiple data records to data partition 1 or to multiple data partitions; details are not repeated here.
- The server may also delete a data record from a data partition of the data table, which includes deleting the bearer's identifier and the tag values that the bearer has. For example, when the data record of the identifier A02 is deleted, the identifier A02 and its tag values "[gender: female] [education: specialist] [occupation: individual] [online shopper]" are deleted.
- The server may assign each bearer identifier a bitmap bit in the bitmap; that is, the bitmap bit is the position in the bitmap that corresponds to the bearer's identifier. For example, as shown in FIG. 8, the bitmap bit assigned to the bearer with identifier A01 is 1, the bitmap bit assigned to the bearer with identifier A02 is 2, and so on.
- FIG. 4 is a flowchart of a method for establishing a bitmap index provided by an embodiment of the present application.
- This embodiment is exemplified by the data storage method used in the server shown in FIG. 2.
- the data storage method includes steps 401 and 402.
- Step 401: The server acquires at least one data record.
- The at least one data record is stored in a data table.
- Each data record includes the identifier of a bearer and at least one tag value; each data record in the data table records the identifier of one bearer and all tag values of that bearer.
- Step 402: The server establishes a bitmap index corresponding to the at least one data record.
- the data table has at least one data record, and the server creates a bitmap for each data record; therefore, the bitmap index includes at least one bitmap, each data record corresponding to a bitmap of the bitmap index.
- Each bitmap corresponds to one tag value, and each bitmap includes at least one bitmap bit.
- Each bitmap bit records whether the bearer corresponding to a bearer identifier has the tag value corresponding to the current bitmap (the bitmap in which the bitmap bit is located), and the same bitmap bit in different bitmaps of the bitmap index corresponds to the identifier of the same bearer.
- The bitmap index shown in Table 9 is obtained.
- The tag value "drug addict" corresponds to the bitmap "[0000000001000000....]". The 10th bitmap bit in this bitmap is "1", which means that the bearer with the identifier E01 has the tag value "drug addict".
- The tag value "online shopper" corresponds to the bitmap "[0100011000000100....]". The 10th bitmap bit in this bitmap is "0", which means that the bearer with the identifier E01 does not have the tag value "online shopper".
- an array is used to represent bitmap bits that are "1" in the bitmap.
- the tag value "drug addict” corresponds to a bitmap "[0000000001000000....]”
- the bitmap can be represented as an array [10]”
- the tag value "online purchase” "People” corresponds to a bitmap "[0100011000000100....]”
- the bitmap can be represented as an array [2 6 7 14].
- Bitmap storage is still used in memory, so that the target bitmap can be determined by performing bitmap operations (such as AND operations, OR operations) on the bitmap according to the bitmap bits.
- Alternatively, an array may be used to represent whichever bit value occurs less often in the bitmap.
- For example, the tag value "drug addict" corresponds to the bitmap "[0000000001000000....]", which can be expressed as the array [10] of the bitmap bits containing "1"; the tag value "online shopper" corresponds to the bitmap "[0100011000000100....]", which can be expressed as the array [2 6 7 14] of the bitmap bits containing "1".
- In this case too, bitmap storage is still used in memory, so that during a query bitmap operations (for example, AND, OR, and NOT operations) can be performed on the target bitmaps to determine the target bitmap bits.
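The array representation can be illustrated with a short Python sketch that converts between a bitmap and the array of its "1" positions. The function names are hypothetical, and the bitmaps are truncated to the 16 bits shown in the text.

```python
def bitmap_to_array(bitmap):
    """Return the 1-based positions of the '1' bits in a bitmap string."""
    return [pos for pos, bit in enumerate(bitmap, start=1) if bit == "1"]


def array_to_bitmap(positions, length):
    """Rebuild a bitmap string of the given length from 1-based positions."""
    ones = set(positions)
    return "".join("1" if pos in ones else "0" for pos in range(1, length + 1))


drug_addict = "0000000001000000"
online_shopper = "0100011000000100"
print(bitmap_to_array(drug_addict))       # [10]
print(bitmap_to_array(online_shopper))    # [2, 6, 7, 14]
print(array_to_bitmap([2, 6, 7, 14], 16) == online_shopper)   # True
```

The array form is compact when few bits are set, while the expanded form keeps bitwise AND and OR cheap, which is consistent with storing arrays on disk and bitmaps in memory.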
- The structure shown in Table 9 is the storage structure of the bitmap index in memory; in the database, the bitmap index is stored in Base+Delta form.
- Base is the specific content of the bitmap index, and its actual storage structure is similar to the storage structure in memory; the index can be represented by an array, and vice versa.
- Referring to FIG. 5, Delta includes at least one KeyValue, and each KeyValue corresponds to a change operation that changes content in the bitmap.
- A change operation may update the existing content of a bit; for example, Delta includes "change the 10th bit in the bitmap to 1" and "change the 15th bit in the bitmap to 0". Alternatively, a change operation may add preset content at a bit; for example, Delta includes "add 1 to the 21st bit of the bitmap".
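The Base+Delta idea can be sketched as replaying a list of changes over a base bitmap. This is only an illustration: the (position, value) encoding of a KeyValue, the function name `apply_delta`, and the choice to pad with 0 when a change targets a bit beyond the end of the base are assumptions, not details given in the text.

```python
def apply_delta(base, delta):
    """Return a new bitmap with every (position, value) change applied.

    Positions are 1-based. A position beyond the end of `base` is treated
    as an "add" operation: the bitmap is padded with 0 up to that position
    (an assumption of this sketch).
    """
    bitmap = list(base)
    for position, value in delta:
        if position > len(bitmap):
            bitmap.extend([0] * (position - len(bitmap)))
        bitmap[position - 1] = value
    return bitmap


base = [0] * 20
base[14] = 1                            # the 15th bit is currently 1
delta = [(10, 1), (15, 0), (21, 1)]     # the three example changes from the text
merged = apply_delta(base, delta)
print(len(merged), merged[9], merged[14], merged[20])   # 21 1 0 1
```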
- the bitmap index can also be stored in the form of a file in the database, which is not limited.
- The tag values whose bitmaps need to reside in memory are determined according to the tag configuration information in the tag definition table (Table 1), and the bitmaps corresponding to those tag values are loaded into memory. The bitmap in memory and the bitmap in the storage medium are updated synchronously, so that each time the tag value is queried, the bitmap in memory can be used directly without accessing the bitmap in the storage medium, which improves query efficiency and saves query time.
- bitmap index may be divided into multiple bitmap index partitions.
- Dividing a bitmap index into multiple bitmap index partitions is similar to dividing a data table into data partitions.
- The manner of dividing bitmap index partitions may be similar to the manner of dividing data partitions.
- Table 10 shows the result of dividing the bitmap index into bitmap index partitions: bitmap index partition 1 is the interval [, E), bitmap index partition 2 is the interval [E, G), and so on.
- bitmap bits of the identifier of each bearer are unique; in different bitmap index partitions, the same bitmap bit corresponds to the identifier of different bearers.
- The interval corresponding to each bitmap index partition is preset by the user, and the server divides the bitmap index into partitions according to the preset intervals. For example, three non-overlapping intervals [, E), [E, G), and [G, ) are set in advance, and the bitmap index is divided according to these three intervals into bitmap index partition 1 and bitmap index partition 2 shown in Table 10, and bitmap index partition 3 (not shown).
- the server may divide the M data partitions into N bitmap index partitions; M and N are positive integers, N is less than or equal to M, and N is greater than or equal to 2.
- Each of the M data partitions belongs to a unique bitmap index partition, and each bitmap index partition in the N bitmap index partitions contains at least one data partition.
- After the bitmap index partitions are determined, the bitmap index partition to which the identifier of each bearer in the data table belongs is also determined.
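Routing an identifier to its bitmap index partition can be sketched with a binary search over the interval boundaries. This sketch assumes identifiers are compared as plain strings in lexicographic order and uses the three example intervals [, E), [E, G), and [G, ); the names `BOUNDARIES` and `partition_of` are hypothetical.

```python
from bisect import bisect_right

# Lower bounds of partitions 2 and 3; partition 1 has no lower bound.
BOUNDARIES = ["E", "G"]


def partition_of(bearer_id):
    """Return the 1-based number of the partition whose interval contains bearer_id."""
    return bisect_right(BOUNDARIES, bearer_id) + 1


print(partition_of("B02"))   # 1
print(partition_of("E01"))   # 2
print(partition_of("F04"))   # 2
```

Because the intervals are half-open and do not overlap, every identifier maps to exactly one partition.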
- Each bitmap index partition has a sub-bitmap index within the bitmap index. The sub-bitmap index includes all sub-bitmaps corresponding to the identifiers of all bearers of that bitmap index partition, and each sub-bitmap includes the bitmap bits corresponding to the identifiers of all those bearers; accordingly, each bitmap index partition includes at least one sub-bitmap.
- Table 11 shows the bitmap index after partitioning the bitmap index partition.
- the bitmap index partition 1 corresponds to a sub-bitmap index
- The sub-bitmap index includes a plurality of sub-bitmaps, for example the sub-bitmap "[101010101]" corresponding to the tag value "gender: male"; this sub-bitmap consists of the first nine bitmap bits (that is, the first to the ninth bitmap bit) of the bitmap "[1010101011011010....]" corresponding to "gender: male" in Table 9.
- a sub-bitmap corresponding to each tag value in the bitmap index partition 1 is obtained.
- For bitmap index partition 2 in Table 11, the correspondence between each bitmap bit in bitmap index partition 2 and the bearer identifiers differs from the correspondence between each bitmap bit and the bearer identifiers in Table 9. For example, the first bitmap bit of each sub-bitmap in bitmap index partition 2 shown in Table 11 corresponds to the bearer identifier E01, whereas in the bitmaps of the bitmap index shown in Table 9 it is the tenth bitmap bit that corresponds to the bearer identifier E01.
- Like the bitmaps in Table 9, the sub-bitmaps of the bitmap index partitions shown in Table 11 record the correspondence between a tag value and bitmap bits (each corresponding to the identifier of a bearer); the difference is that a bitmap in Table 9 records more bitmap bits than a sub-bitmap shown in Table 11.
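The relationship between a bitmap of Table 9 and the sub-bitmaps of Table 11 can be illustrated by cutting one bitmap into consecutive per-partition pieces. In this sketch the size of partition 1 (nine bits) comes from the text, while the size of partition 2 (seven bits) is only the remainder of the 16 bits shown and is an assumption; the function names are hypothetical.

```python
def split_bitmap(bitmap, partition_sizes):
    """Cut one global bitmap string into consecutive per-partition sub-bitmaps."""
    subs, start = [], 0
    for size in partition_sizes:
        subs.append(bitmap[start:start + size])
        start += size
    return subs


def local_position(global_position, partition_sizes):
    """Map a 1-based global bit to (partition number, 1-based local bit)."""
    offset = 0
    for number, size in enumerate(partition_sizes, start=1):
        if global_position <= offset + size:
            return number, global_position - offset
        offset += size
    raise ValueError("position outside the bitmap")


gender_male = "1010101011011010"
sizes = [9, 7]                             # 7 is an assumed size for partition 2
print(split_bitmap(gender_male, sizes))    # ['101010101', '1011010']
print(local_position(10, sizes))           # (2, 1): E01 is bit 1 of partition 2
```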
- The same bitmap bit in different bitmaps of the bitmap index corresponds to the identifier of the same bearer; the same bitmap bit of different sub-bitmaps within one bitmap index partition corresponds to the identifier of the same bearer, while the same bitmap bit of sub-bitmaps in different bitmap index partitions corresponds to the identifiers of different bearers.
- the set of identifiers of the bearers corresponding to the sub-bitmaps in each bitmap index partition are independent of each other, and there is no intersection.
- the set of the identifiers of the bearers corresponding to the bitmap index partition 1 is the interval [, E)
- the set of the identifiers of the bearers corresponding to the bitmap index partition 2 is the interval [E, G).
- interval [, E) has no intersection with interval [E, G).
- the identifier of each bearer belongs to a unique set (ie, a section), and therefore the bitmap bit corresponding to the identifier of the bearer also uniquely belongs to the bitmap index partition corresponding to the set.
- For example, the bearer identifier F04 belongs to the interval [E, G), and the interval [E, G) corresponds to bitmap index partition 2; the bitmap bit of the bearer identifier F04 is therefore in bitmap index partition 2, that is, all sub-bitmaps containing the bitmap bit of the bearer identifier F04 are in bitmap index partition 2.
- The set of bearer identifiers corresponding to the sub-bitmaps in each bitmap index partition is greater than or equal to the first range, where the first range is the set of bearer identifiers corresponding to the sub-data tables in a data partition.
- Each bitmap index partition contains at least one data partition, and each data partition of the data table belongs to a unique bitmap index partition; therefore, the first range of each data partition (that is, the set of bearer identifiers corresponding to the sub-data tables in the data partition) is less than or equal to the identifier set of the bitmap index partition to which the data partition belongs (that is, the set of bearer identifiers corresponding to the sub-bitmaps of that bitmap index partition).
- A sub-bitmap is created for each tag value in the tag definition table. Specifically, a sub-bitmap is established for a tag value in the sub-bitmap index regardless of whether any bearer in that sub-bitmap index has the tag value. Taking Table 11 as an example, none of the bearers in bitmap index partition 1 has the tag value "drug addict", yet the sub-bitmap "[000000000]" is still established for the tag value "drug addict" in the sub-bitmap index of bitmap index partition 1.
- When a bearer in bitmap index partition 1 later acquires the tag value "drug addict", the sub-bitmap "[000000000]" is updated directly, and there is no need to add a sub-bitmap for the tag value "drug addict".
- Because changing a bitmap bit of an existing sub-bitmap requires no additional storage resources compared with adding a sub-bitmap, bitmap index partition 1 is not split because of an increased amount of data.
- By contrast, the storage capacity of a bitmap index partition is limited: if the size of a bitmap index partition grows substantially because sub-bitmaps are added and reaches the preset storage capacity threshold, the bitmap index partition cannot store new data and needs to be split.
- In each bitmap index partition, a sub-bitmap is created for each tag value in the tag definition table.
- The number of sub-bitmaps in each bitmap index partition therefore equals the number of all tag values in the tag definition table. For example, if the tag definition table sets 10 tag values in total, then after a sub-bitmap has been created in each bitmap index partition for each tag value in the tag definition table, each bitmap index partition contains 10 sub-bitmaps.
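Pre-creating a sub-bitmap for every tag value can be sketched as follows. The tag definition table below is hypothetical (only three of its ten tag values appear in the text), and `create_partition` is an illustrative name.

```python
def create_partition(tag_values, bearer_count):
    """Create an all-zero sub-bitmap for every tag value in the tag definition table."""
    return {tag: [0] * bearer_count for tag in tag_values}


# Hypothetical tag definition table with 10 tag values.
tag_definition_table = ["drug addict", "online shopper", "gender: male"] + [
    f"tag {n}" for n in range(4, 11)
]
partition_1 = create_partition(tag_definition_table, bearer_count=9)
print(len(partition_1))              # 10
print(partition_1["drug addict"])    # [0, 0, 0, 0, 0, 0, 0, 0, 0]

# A bearer (bit 4 here, chosen arbitrarily) later acquires the tag value:
# only one bit changes, and the number of sub-bitmaps stays the same.
partition_1["drug addict"][4 - 1] = 1
print(len(partition_1))              # 10
```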
- The server sets the bitmap index partitions so that they cannot be split or expanded.
- the server can reconstruct the bitmap index partition.
- The server may use the MapReduce mechanism to read the data records stored in the data table in a distributed manner, obtain the updated bitmap index partitions, and reconstruct the sub-bitmap index of each bitmap index partition accordingly. Specifically, the server may re-allocate bitmap bits to the bearers of each bitmap index partition and generate the sub-bitmap corresponding to each tag value according to the allocated bitmap bits.
- The server stores new data, which may be a data record; the new data includes a new correspondence in which the identifier of the bearer and/or the tag value is new, and the server updates the bitmap index based on the new correspondence.
- the step of updating the bitmap index according to the new correspondence includes:
- First, the bitmap index partition to which the identifier of the bearer in the new correspondence belongs is determined.
- According to the identifier, the bitmap index partition to which the bearer in the new correspondence belongs can be determined.
- For example, it may be determined that the bitmap index partition to which the identifier of a bearer belongs is bitmap index partition 2; and if the identifier of the bearer in the new correspondence is B02, it can be determined that the bitmap index partition to which it belongs is bitmap index partition 1.
- Second, in the determined bitmap index partition, the tag value in the new correspondence is updated, and/or the bitmap corresponding to the tag value in the new correspondence is updated.
- the sub-bitmap corresponding to the updated tag value in the determined bitmap index partition is updated.
- If the bearer had the tag value before the update and no longer has it after the update, the server updates the bitmap bit corresponding to the identifier of the bearer, in the sub-bitmap of the tag value in the determined bitmap index partition, from "1" to "0".
- For example, if the identifier of the bearer corresponds to bitmap bit 3 in the sub-bitmap and the sub-bitmap before the update is 1011, the updated sub-bitmap is 1001.
- If the bearer did not have the tag value before the update and has it after the update, the server updates the bitmap bit corresponding to the identifier of the bearer, in the sub-bitmap of the tag value in the determined bitmap index partition, from "0" to "1".
- For example, if the identifier of the bearer corresponds to bitmap bit 5 in the sub-bitmap and the sub-bitmap before the update is 10110, the updated sub-bitmap is 10111.
- When the bitmap bits that are "1" in a sub-bitmap are represented by an array: if the bearer did not have the tag value before the update and has it after the update, the server adds the bitmap bit of the identifier of the bearer to the array of the tag value in the determined bitmap index partition. For example, if the bitmap bit of the identifier of the bearer in the sub-bitmap is 3 and the initial sub-bitmap corresponding to the updated tag value is (1, 7), the updated sub-bitmap is (1, 3, 7). If the bearer had the tag value before the update and does not have it after the update, the server deletes the bitmap bit corresponding to the identifier of the bearer from the array of the tag value in the determined bitmap index partition.
- the server can update the content in the sub-bitmap by a similar update method, and details are not described herein again.
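The update rules above can be sketched for both representations, the bit-string form and the array of "1" positions. The function names are hypothetical; positions are 1-based as in the examples.

```python
def set_bit(sub_bitmap, position, has_tag):
    """Return a copy of a bit-string sub-bitmap with one 1-based bit rewritten."""
    bits = list(sub_bitmap)
    bits[position - 1] = "1" if has_tag else "0"
    return "".join(bits)


def update_array(positions, position, has_tag):
    """Return the sorted array of '1' positions after one bearer's tag changes."""
    updated = set(positions)
    if has_tag:
        updated.add(position)
    else:
        updated.discard(position)
    return sorted(updated)


print(set_bit("1011", 3, False))        # 1001
print(set_bit("10110", 5, True))        # 10111
print(update_array([1, 7], 3, True))    # [1, 3, 7]
print(update_array([1, 3, 7], 3, False))  # [1, 7]
```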
- The server adds a sub-bitmap corresponding to the newly added tag value in the determined bitmap index partition.
- In the newly added sub-bitmap, the bitmap bits of the data records belonging to the bitmap index partition are consistent with their bitmap bits in the other sub-bitmaps of the bitmap index partition.
- A bitmap bit is allocated to the bearer in the determined bitmap index partition, and the sub-bitmaps are updated according to all the tag values that the bearer of the new data has.
- If the bearer of the new data has a tag value, information indicating that the tag value is present is added at the allocated bitmap bit of the corresponding sub-bitmap; for example, if the bitmap bit allocated to the bearer of the new data is 5 and the initial sub-bitmap corresponding to a tag value is 0101, the updated sub-bitmap is 01011. If the bearer of the new data does not have the tag value, information indicating that the tag value is absent is added at the allocated bitmap bit of the sub-bitmap; for example, still taking the allocated bitmap bit 5, if the initial sub-bitmap corresponding to a tag value is 0101, the updated sub-bitmap is 01010.
- When an array is used to represent the bitmap bits that are "1", for the sub-bitmap corresponding to each tag value in the determined bitmap index partition, the newly allocated bitmap bit is added to the sub-bitmap if the bearer of the new data has the tag value; for example, assuming that the bitmap bit allocated to the bearer of the new data is 5, that the bearer has a certain tag value, and that the initial sub-bitmap corresponding to the tag value is (3), the updated sub-bitmap is (3, 5).
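Allocating a bitmap bit to a new bearer can be sketched as appending one bit to every sub-bitmap of the partition. This sketch assumes the partition is a non-empty dictionary from tag value to a list of bits and that the new bit is always appended at the end; the tag names "tag X" and "tag Y" and the function name `add_bearer` are hypothetical.

```python
def add_bearer(partition, bearer_tags):
    """Append one bit to every sub-bitmap and return the allocated 1-based bit.

    The appended bit is 1 where the new bearer has the tag value, else 0,
    so all sub-bitmaps of the partition stay aligned.
    """
    for tag, sub_bitmap in partition.items():
        sub_bitmap.append(1 if tag in bearer_tags else 0)
    return len(next(iter(partition.values())))


partition = {"tag X": [0, 1, 0, 1], "tag Y": [0, 1, 0, 1]}
allocated = add_bearer(partition, bearer_tags={"tag X"})
print(allocated)             # 5
print(partition["tag X"])    # [0, 1, 0, 1, 1]
print(partition["tag Y"])    # [0, 1, 0, 1, 0]
```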
- When an array is used to represent the bitmap bits that are "0" in a bitmap, the server can update in a similar manner, and details are not described here again.
- When the bitmap index is not divided into bitmap index partitions, the server updates the bitmap index directly according to the new correspondence.
- The specific update method is similar to the above; the only difference is that when the bitmap index is divided into partitions, the bitmap index partition to which the identifier of the bearer in the new data belongs must be determined before updating, whereas here the update is performed directly without that determination. Details are not described here again.
- FIG. 6 is a flowchart of a data query method provided by an embodiment of the present application. This embodiment is described using the example in which the data query method is applied to the server shown in FIG. 2. As shown in FIG. 6, the data query method includes:
- Step 601 Receive a data query request, where the data query request carries at least one target tag value.
- When the terminal needs to query data with a certain tag value, the terminal may send a data query request to the server, and the server correspondingly receives the data query request.
- the data query request carries at least one target tag value that needs to be queried.
- For example, when the terminal needs to query the data that has the tag value A, the terminal may send a data query request carrying the tag value A to the server; when the terminal needs to query the data that does not have the tag value A, the terminal may send a data query request carrying "not A" to the server.
- When the data query request carries at least two target tag values, the relationship between the at least two target tag values may be "and", "or", or "not".
- the terminal may send a data query request carrying "A and B" to the server.
- the terminal may send a data query request carrying the “A or B” to the server.
- the terminal may send a data query request carrying “A not B” to the server.
- Step 602 Determine, according to the bitmap index, a target bitmap corresponding to each target tag value.
- The bitmap index includes at least one bitmap; each bitmap corresponds to a tag value and includes at least one bitmap bit; each bitmap bit records whether the bearer corresponding to a bearer identifier has the tag value corresponding to the current bitmap; the same bitmap bit in different bitmaps corresponds to the identifier of the same bearer; and the bitmap index is the index corresponding to the data table.
- the bitmap index is the index created in the foregoing embodiment. For details of the specific implementation, refer to the foregoing embodiment, and details are not described herein again.
- After receiving the data query request, the server extracts the target tag value from the data query request and then looks up the target bitmap of the extracted target tag value in the bitmap index.
- the server may query the target bitmap corresponding to each target tag value according to the bitmap index. For example, if the target tag value carried in the data query request includes “A and B”, the server may query the target bitmap 1 corresponding to the tag value A and the target bitmap 2 corresponding to the tag value B.
- The server may determine, according to the tag configuration information corresponding to the target tag value, whether the corresponding bitmap is resident in memory. If it is resident in memory, the server determines the target bitmap corresponding to the target tag value in memory; if it is not, the server loads the bitmap corresponding to the target tag value from the database into memory.
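The resident-memory check can be sketched as a small lookup wrapper. Everything here is illustrative: `BitmapCache`, `load_from_database`, and the in-memory `database` dictionary are hypothetical stand-ins, and synchronization between memory and the storage medium is not modeled.

```python
class BitmapCache:
    """Keep bitmaps of resident tag values in memory; load the others on demand."""

    def __init__(self, load_from_database, resident_tags):
        self._load = load_from_database
        # Bitmaps flagged as resident in the tag configuration are loaded once.
        self._memory = {tag: load_from_database(tag) for tag in resident_tags}

    def target_bitmap(self, tag):
        if tag in self._memory:
            return self._memory[tag]      # resident: no database access
        return self._load(tag)            # not resident: load for this query


database = {"A": [1, 0, 1], "B": [0, 1, 1]}   # stand-in for the stored bitmaps
loads = []                                     # records every database access


def load_from_database(tag):
    loads.append(tag)
    return list(database[tag])


cache = BitmapCache(load_from_database, resident_tags=["A"])
cache.target_bitmap("A")
cache.target_bitmap("B")
print(loads)   # ['A', 'B']: "A" was loaded once at start-up, not at query time
```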
- Step 603 Determine an identifier of the target bearer corresponding to the target bitmap bit included in the at least one target bitmap.
- The server determines, according to the target bitmap, the bitmap bits corresponding to the data to be queried; because bitmap bits and bearer identifiers are in one-to-one correspondence, the identifiers of the bearers corresponding to those bitmap bits can be determined according to that correspondence.
- the identifier of the queried bearer is the identifier of the target bearer.
- For example, the server may determine from the target bitmap that the target bitmap bits are 2, 3, and 8, and further determine that the bearer identifier corresponding to bitmap bit 2 is identifier 2, the bearer identifier corresponding to bitmap bit 3 is identifier 3, and the bearer identifier corresponding to bitmap bit 8 is identifier 8.
- When the query targets data that does not have the target tag value, the server may determine the data corresponding to the bitmap bits in the target bitmap that indicate that the target tag value is absent. This is similar to the determination steps above and is not described here again.
- Alternatively, the server may perform a NOT operation on the target bitmap and then determine the identifiers of the bearers corresponding to the bitmap bits that are "1".
- When at least two target tag values are in an "and" relationship, the server obtains at least two target bitmaps in step 602. When each target bitmap is a binary sequence, the server may perform an AND operation on the at least two target bitmaps, determine the bitmap bits whose value is "1" after the AND operation, and then query the identifiers of the bearers corresponding to the determined bitmap bits.
- the server can determine the identifier of the bearer corresponding to the bitmap bits 2 and 8.
- For target tag values in an "or" relationship, the server may AND the j target bitmaps corresponding to the j first target tag values, AND the k target bitmaps corresponding to the k second target tag values, OR the two AND results, and determine the identifiers of the bearers corresponding to the bitmap bits that are "1" after the OR operation.
- j and k are integers greater than or equal to 1. When j or k is 1, the corresponding AND operation need not be performed; that is, the result of the AND operation is the target bitmap itself.
- Similarly, for target tag values in a "not" relationship, the server may AND the x target bitmaps corresponding to the x third target tag values to obtain a first result, AND the y target bitmaps corresponding to the y fourth target tag values to obtain a second result, perform the NOT operation between the first result and the second result, and determine the identifiers of the bearers corresponding to the bitmap bits that are "1" after the NOT operation.
- x and y are integers greater than or equal to 1. When x or y is 1, the corresponding AND operation need not be performed; that is, the result of the AND operation is the target bitmap itself.
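The three ways of combining target bitmaps can be sketched with bitmaps held as lists of 0/1. Two points are assumptions of this sketch rather than statements from the text: "A not B" is read as "has A and does not have B", and bitmap `b` is made-up sample data (bitmap `a` uses the bits 2, 3, and 8 from the earlier example).

```python
def bitmap_and(*bitmaps):
    """Bitwise AND of one or more equal-length bitmaps (one bitmap returns itself)."""
    return [int(all(bits)) for bits in zip(*bitmaps)]


def bitmap_or(*bitmaps):
    """Bitwise OR of one or more equal-length bitmaps."""
    return [int(any(bits)) for bits in zip(*bitmaps)]


def bitmap_not(bitmap):
    """Bitwise NOT of a bitmap."""
    return [1 - bit for bit in bitmap]


def positions(bitmap):
    """1-based positions of the bits that are 1, i.e. the target bitmap bits."""
    return [pos for pos, bit in enumerate(bitmap, start=1) if bit]


a = [0, 1, 1, 0, 0, 0, 0, 1]    # target bitmap of tag value A: bits 2, 3, 8
b = [0, 1, 0, 0, 1, 0, 0, 1]    # target bitmap of tag value B: bits 2, 5, 8
print(positions(bitmap_and(a, b)))               # [2, 8]        "A and B"
print(positions(bitmap_or(a, b)))                # [2, 3, 5, 8]  "A or B"
print(positions(bitmap_and(a, bitmap_not(b))))   # [3]           "A not B"
```

With several tag values on each side, the same helpers compose: for instance `bitmap_or(bitmap_and(a1, a2), bitmap_and(b1, b2))` for the j-and-k case, where a single-bitmap AND simply returns that bitmap.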
- When a bitmap is represented by an array, the query logic is similar and is not described here again.
- Step 604 Query the tag value that the target bearer has in the data table.
- The data table records the correspondence between the identifier of at least one bearer and the at least one tag value that the bearer has.
- the structure of the data table is similar to the structure described in the foregoing embodiment, and details are not described herein again.
- the server may query at least one tag value corresponding to the identifier of the target bearer according to the foregoing correspondence relationship recorded in the data table.
- Step 605 Feed back the tag value that the target bearer has.
- the server feeds back the queried tag value to the terminal.
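Steps 601 to 605 can be put together in a compact sketch. It assumes an "and" relationship between the target tag values and uses small made-up structures: `bearer_ids`, `data_table`, and `bitmap_index` are hypothetical sample data, and `query` is an illustrative name.

```python
# Bit i (1-based) of every bitmap belongs to bearer_ids[i - 1].
bearer_ids = ["E01", "E02", "F04"]
data_table = {"E01": ["A", "B"], "E02": ["B"], "F04": ["A", "B", "C"]}
bitmap_index = {"A": [1, 0, 1], "B": [1, 1, 1], "C": [0, 0, 1]}


def query(target_tags):
    """Return {target bearer identifier: all tag values of that bearer}."""
    # Step 602: determine the target bitmap of each target tag value.
    targets = [bitmap_index[tag] for tag in target_tags]
    # Step 603: AND the target bitmaps and map the set bits to identifiers.
    combined = [int(all(bits)) for bits in zip(*targets)]
    target_ids = [bearer_ids[i] for i, bit in enumerate(combined) if bit]
    # Steps 604 and 605: look up the tag values in the data table and return them.
    return {bearer_id: data_table[bearer_id] for bearer_id in target_ids}


print(query(["A", "B"]))   # {'E01': ['A', 'B'], 'F04': ['A', 'B', 'C']}
```

Only the final lookup touches the data table; the selection of bearers is done entirely on the bitmap index.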
- In summary, with the data query method provided in this embodiment, after a data query request is received, the data to be queried can be determined directly from the bitmap index; this solves the problem of low data query efficiency in the related art and improves data query efficiency.
- Millisecond-level response can be achieved for multi-tag queries.
- The bitmap index may be divided into a plurality of bitmap index partitions; each bitmap index partition includes at least one sub-bitmap, and the sets of bearer identifiers corresponding to the sub-bitmaps in different bitmap index partitions do not intersect. This is similar to the bitmap index partitions in the foregoing data storage method and is not described here again.
- FIG. 7 is a flowchart of a data query method provided by an embodiment of the present application. This embodiment is described using the example in which the data query method is applied to the server shown in FIG. 2. As shown in FIG. 7, the data query method includes:
- Step 701 Receive a data query request, where the data query request carries at least one target tag value.
- Step 702 Determine a target sub-bitmap corresponding to each target tag value in each of the plurality of bitmap index partitions.
- The server can query the target sub-bitmap corresponding to the target tag value in each bitmap index partition. For example, if the bitmap index is stored in four bitmap index partitions, the server queries target sub-bitmap 1 corresponding to the target tag value in bitmap index partition 1, target sub-bitmap 2 corresponding to the target tag value in bitmap index partition 2, target sub-bitmap 3 corresponding to the target tag value in bitmap index partition 3, and target sub-bitmap 4 corresponding to the target tag value in bitmap index partition 4.
- The server may query the target sub-bitmap corresponding to each target tag value in each bitmap index partition. For example, if the data query request carries target tag value 1, target tag value 2, and target tag value 3, then, taking the query in bitmap index partition 1 as an example, the server may query, in bitmap index partition 1, target sub-bitmap 1 corresponding to target tag value 1, target sub-bitmap 2 corresponding to target tag value 2, and target sub-bitmap 3 corresponding to target tag value 3.
- Step 703 Determine, in the plurality of bitmap index partitions, identifiers of the target bearers corresponding to the target bitmap bits included in the at least one target sub-bitmap.
- In each bitmap index partition, the server can determine the target bitmap bits according to the determined at least one target sub-bitmap and determine the identifiers of the bearers corresponding to the target bitmap bits; the identifiers of the bearers determined in all the bitmap index partitions together are the identifiers of the final target bearers.
- The specific manner of determining the target bitmap bits and the identifiers of the target bearers in each bitmap index partition is similar to the manner of determining the target bitmap bits in the bitmap index and the identifiers of the target bearers in the foregoing embodiment, and is not described here again.
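The partitioned query of steps 702 and 703 can be sketched as running the same bitmap logic inside each partition and concatenating the results. The two partitions below are made-up sample data, an "and" relationship between the target tag values is assumed, and `query_partitions` is a hypothetical name.

```python
# Each partition keeps its own bearer list and its own sub-bitmaps;
# bit i of a sub-bitmap refers to the i-th bearer of that partition only.
partitions = [
    {"bearer_ids": ["B02", "C07"], "sub_bitmaps": {"A": [1, 0], "B": [1, 1]}},
    {"bearer_ids": ["E01", "F04"], "sub_bitmaps": {"A": [0, 1], "B": [1, 1]}},
]


def query_partitions(partitions, target_tags):
    """Return the identifiers of bearers that have all target tag values."""
    target_ids = []
    for partition in partitions:
        # Step 702: target sub-bitmap of each target tag value in this partition.
        subs = [partition["sub_bitmaps"][tag] for tag in target_tags]
        # Step 703: AND them and map the set bits to this partition's identifiers.
        combined = [int(all(bits)) for bits in zip(*subs)]
        target_ids += [
            bearer_id
            for bearer_id, bit in zip(partition["bearer_ids"], combined)
            if bit
        ]
    return target_ids


print(query_partitions(partitions, ["A", "B"]))   # ['B02', 'F04']
```

Because the identifier sets of the partitions do not intersect, the per-partition results can be concatenated without de-duplication, and the partitions could also be processed in parallel.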
- Step 704 Query the tag value of the target bearer in the data table.
- Step 705 Feedback the tag value that the target bearer has.
- Step 704 and step 705 are implemented in a manner similar to step 604 and step 605 in the foregoing embodiment, and details are not described here again.
- In summary, with the data query method provided in this embodiment, after a data query request is received, the data to be queried can be determined directly from the bitmap index; this solves the problem of low data query efficiency in the related art and improves data query efficiency.
- Millisecond-level response can be achieved for multi-tag queries.
- FIG. 8 is a schematic diagram of a data storage device according to an embodiment of the present invention.
- the data storage device 800 can be a computer device, which can be a server (such as the server 220 shown in FIG. 2).
- The data storage device 800 includes at least one processor 801, a communication bus 802, a memory 803, and at least one communication interface 804.
- Processor 801 can be a general purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the program of the present invention.
- Communication bus 802 can include a path for communicating information between the components described above.
- The communication interface 804 uses any transceiver-type device to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
- The memory 803 can be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
- the memory can exist independently and be connected to the processor via a bus.
- the memory can also be integrated with the processor.
- the memory 803 is used to store program code for executing the solution of the present invention, and is controlled by the processor 801 for execution.
- the processor 801 is configured to execute program code stored in the memory 803.
- the processor 801 may include one or more CPUs, such as CPU0 and CPU1 in FIG.
- data storage device 800 can include multiple processors, such as processor 801 and processor 808 in FIG. Each of these processors can be a single-CPU processor or a multi-core processor.
- A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (such as computer program instructions).
- data storage device 800 may also include an output device 805 and an input device 806.
- Output device 805 is in communication with processor 801 and can display information in a variety of ways.
- The output device 805 can be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, a projector, or the like.
- Input device 806 is in communication with processor 801 and can accept user input in a variety of ways.
- input device 806 can be a mouse, keyboard, touch screen device, or sensing device, and the like.
- the data storage device 800 described above can be a general purpose computer device or a special purpose computer device.
- The data storage device 800 can be a desktop computer, a portable computer, a network server, a personal digital assistant (PDA), a mobile phone, a tablet, a wireless terminal device, a communication device, an embedded device, or a device having a structure similar to that shown in FIG. 8.
- Embodiments of the invention do not limit the type of data storage device 800.
- One or more software modules are stored in the memory of the data storage device.
- The data storage device can implement the software modules through the processor and the program code in the memory, thereby implementing the data storage method in the above embodiments.
- FIG. 9 is a schematic structural diagram of a data storage device according to an embodiment of the present application.
- the data storage device may include: an obtaining unit 910 and an establishing unit 920;
- the obtaining unit 910 is configured to perform step 401 in the foregoing embodiment
- the establishing unit 920 is configured to perform step 402 in the above embodiment.
- the data storage device may further include a setting unit
- the setting unit is configured to perform step 301 in the above embodiment
- the obtaining unit 910 is further configured to perform step 302 in the foregoing embodiment.
- the label definition table further includes label configuration information, where the label configuration information includes information indicating whether the bitmap corresponding to each label value is resident in the memory;
- the data storage device may further include a loading unit;
- the loading unit is configured to load a bitmap that needs resident memory into the memory according to the label configuration information.
- the device further includes a dividing unit
- The dividing unit is configured to divide the bitmap index into a plurality of bitmap index partitions; each bitmap index partition includes at least one sub-bitmap and corresponds to a set of bearer identifiers, and the sets of bearer identifiers corresponding to different bitmap index partitions do not intersect.
- the device further includes a dividing unit and a determining unit;
- a dividing unit configured to divide the data table into M data partitions according to a first range of identifiers of the bearers, where each data partition includes at least one sub-data table, and the set of bearer identifiers corresponding to each data partition is The first range, and the set of bearer identifiers corresponding to different data partitions does not have an intersection;
- a determining unit configured to determine N bitmap index partitions according to the M data partitions, where each bitmap index partition includes at least one sub-bitmap, the range of bearer identifiers corresponding to each bitmap index partition is greater than or equal to the first range, the sets of bearer identifiers corresponding to different bitmap index partitions do not intersect, M and N are positive integers, N is less than or equal to M, and N is greater than or equal to 2.
- the number of sub-bitmaps in each bitmap index partition is the number of all label values set in a predefined label definition table.
- the same bitmap bit in different bitmaps in the bitmap index corresponds to the identifier of the same bearer;
- the same bitmap bit of different sub-bitmaps in one bitmap index partition corresponds to the identifier of the same bearer, and the same bitmap bit of sub-bitmaps in different bitmap index partitions corresponds to the identifiers of different bearers.
- the device further includes: a receiving unit, a determining unit, a query unit, and a feedback unit;
- a receiving unit configured to perform step 601 or step 701 in the above embodiment;
- a determining unit configured to perform step 602 and step 603 in the above embodiment, or perform step 702 and step 703;
- a query unit configured to perform step 604 or step 704 in the above embodiment;
- the feedback unit is configured to perform step 605 or step 705 in the above embodiment.
- the obtaining unit 910 is further configured to acquire new data, where the bearer identifier and/or the label value in the new correspondence included in the new data is new;
- the apparatus further includes an update unit, configured to update the bitmap index according to the new correspondence.
- the update unit is further configured to determine the bitmap index partition to which the bearer identifier in the new correspondence belongs and, in the determined bitmap index partition, to update the label value in the new correspondence and/or the bitmap corresponding to that label value.
- after acquiring at least one data record, the data storage device establishes a bitmap index corresponding to the at least one data record, and the bitmap index includes the correspondence between each label value and its bitmap.
- each bitmap includes at least one bitmap bit that records whether the bearer corresponding to a bearer identifier has the corresponding label value, so that a later query based on a label value can determine the matching data directly from the bitmap index, which improves data query efficiency.
- An embodiment of the present application also provides a computer storage medium having instructions stored therein; when a data storage device (which may be a computer device, such as a server) executes the instructions, for example when a processor in the data storage device executes them, the data storage device implements the data storage method described in the above embodiments.
- An embodiment of the present application provides a computer program product that includes instructions; when a data storage device (which may be a computer device, such as a server) executes the instructions, the data storage device performs the data storage method of the foregoing method embodiment.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
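A data storage method, apparatus, and storage medium, relating to the field of information processing technologies. The method includes: acquiring at least one data record (401), where each data record includes one bearer identifier and at least one label value, the at least one data record is stored in a data table, and the data table records the correspondence between bearer identifiers and label values; and establishing a bitmap index corresponding to the at least one data record (402), where the bitmap index includes at least one bitmap, each bitmap corresponds to one label value, each bitmap includes at least one bitmap bit, and each bitmap bit records whether the bearer corresponding to one bearer identifier has the label value corresponding to the current bitmap. This addresses the low data query efficiency of the related art and improves data query efficiency.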
Description
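This application relates to the field of information processing technologies, and in particular to a data storage method, apparatus, and storage medium.
HBase (Hadoop Database) is distributed, highly reliable, high-performance, and based on KeyValue storage, so more and more enterprises and users use HBase to build and store data tables.
Typically, a data table includes multiple rows of data records, and each row includes the identifier of a bearer and the label value of each label the bearer has. For example, user A has two label values, gender "female" and occupation "engineer", so the row for user A in the data table includes user A's identifier, the label value "female", and the label value "engineer". In other words, in the related art the data table records the correspondence between a bearer's identifier and the label values it has.
With this table-based storage, a query by bearer identifier is efficient. A query by a label value or a combination of label values, however, can only use a column value filter to check the label values of each bearer row by row, in bearer-identifier order. Because a data table usually has thousands or tens of thousands of rows, label-value queries in the related art are inefficient.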
发明内容
为了解决相关技术中数据查询效率低的问题,本申请实施例提供了一种数据存储方法、装置和存储介质。所述技术方案如下:
第一方面,提供了一种数据存储方法,该数据存储方法包括:
获取至少一条数据记录,每条数据记录包括一个承载体标识和至少一个标签值。其中,至少一条数据记录可以存储在数据表中,该数据表用于记录承载体标识与标签值的对应关系。
在获取到至少一条数据记录之后,可以建立至少一条数据记录对应的位图索引。其中,位图索引包括至少一个位图,每个位图对应于一个标签值,每个位图包括至少一个位图位,每个位图位用于记录一个承载体标识所对应的承载体是否具备当前位图所对应的标签值。比如,某一承载体具备当前位图所对应的标签值,则其对应的位图位中可以设置有“1”,反之,若不具备当前位图所对应的标签值,则其对应的位图位中可以设置有“0”。其中,位图位为每个承载体的标识在位图中对应有的一个位置。
通过在获取到至少一条数据记录之后,建立该至少一条数据记录对应的位图索引,且位图索引中包括标签值和其对应的位图的对应关系,位图中包括用于记录承载体标识所对应的承载体是否具备对应的标签值的至少一个位图位;使得在后续基于标签值查询时,可以根据位图索引直接确定对应的数据,提高了数据查询效率。
在第一方面的第一种可能的实现方式中,在获取至少一条数据记录之前,还可以预先设置标签定义表,该预先设置的标签定义表包括预设的多个标签值;通常情况下,该标签
定义表为设计人员预先设置并保存的定义表,并且,该标签定义表中包括数据表中可能包括的所有标签值;
相应的,上述获取至少一条数据记录的步骤可以包括:
根据待存储的源数据和预先设置的标签定义表,生成该数据表。
通过预先设置标签定义表,使得在存储待存储的源数据时,可以根据标签定义表中定义的各个标签值确定源数据中的标签值,进而生成得到数据表。
结合第一种可能的实现方式,在第二种可能的实现方式中,实际实现时,标签定义表中还可以包括标签配置信息,该标签配置信息还包括用于表示每个标签值所对应的位图是否常驻内存的信息。可选地,每个标签值可以设置有对应的一条标签配置信息,该标签配置信息用于表示该标签值对应的位图是否常驻内存。对于标签配置信息中用于表示常驻内存的信息所对应的标签值,可以将其对应的位图加载至内存。一种实现中,可以仅包括用于表示需要常驻内存的标签配置信息,也可以仅包括用于表示不需要常驻内存的标签配置信息,还可以同时包括两者。
通常情况下,需要常驻内存的位图所对应的标签值是热查询的标签值,也即通常为查询频率高于预设阈值的标签值,因此,通过将此类标签值对应的位图常驻内存使得后续数据查询时,可以根据内存中的位图直接查询得到标签值对应的数据,提高了数据查询效率。
结合第一方面、第一方面的第一种可能的实现方式或者第一方面的第二种可能的实现方式,在第三种可能的实现方式中,位图索引可以划分为多个位图索引分区,每个位图索引分区中可以包括至少一个子位图,并且,每个位图索引分区对应于一个承载体标识的集合,不同位图索引分区所对应的承载体标识的集合不存在交集。其中,位图索引分区中的每个子位图对应于一个标签值。
通过将位图索引在多个位图索引分区中分布式存储,避免了受存储空间的限制而导致的所能存储的数据量有限的问题。
结合第一方面、第一方面的第一种可能的实现方式或者第一方面的第二种可能的实现方式,在第四种可能的实现方式中,与位图索引类似的,数据表也可以划分为多个数据分区,进而在多个数据分区中分布式存储数据表;可选地,可以根据承载体标识的第一范围将数据表划分为M个数据分区,每个数据分区中包括至少一个子数据表,并且每个数据分区所对应的承载体标识的范围为第一范围,且不同数据分区所对应的承载体标识的集合不存在交集。在数据表划分M个数据分区时,上述所说的位图索引分区可以根据M个数据分区确定得到N个位图索引分区,并且,与第三种可能的实现方式不同的是,在本实现方式中,每个位图索引分区所对应的承载体标识的范围大于等于第一范围,并且不同位图索引分区所对应的承载体的标识的集合不存在交集。其中,M和N为正整数,且N小于或等于M,N大于或等于2。
在本实现方式中,位图索引分区根据数据分区来确定,而实际实现时,位图索引分区还可以根据预先设置的分区方式分区,对此并不做限定。
通过将位图索引和数据表均划分分区,进而分布式存储,避免了受存储空间的限制而导致的所能存储的数据量有限的问题。
结合第三种可能的实现方式或者第四种可能的实现方式,在第五种可能的实现方式中,上述所说的每个位图索引分区中的子位图的数量为预先定义的标签定义表中设置的全部标
签值的数量。比如,标签定义表中的标签值共10个,则每个位图索引分区中的子位图的总数量也是10个。对于标签定义表中的某一标签值,若属于某一位图索引分区的各个承载体的标识所对应的承载体均不具备该标签值,则该标签值对应的位图中的内容均表示不具备标签值。
通过在每个位图索引分区中均设置标签定义表中的各个标签值所对应的位图,达到了在基于某一标签值查询时,可以根据该标签值所对应的位图索引确定得到对应的数据,提高了数据查询效率的效果。
结合第三种可能的实现方式、第四种可能的实现方式或者第五种可能的实现方式,在第六种可能的实现方式中,位图索引中的不同位图中相同的位图位对应于相同的承载体标识;每个位图索引分区中的不同子位图的相同的位图位对应于相同的承载体标识,不同位图索引分区中的不同子位图的相同的位图位对应于不相同的承载体标识。
通过设置位图索引或者位图索引分区中的位图位和承载体标识之间的对应关系,使得在根据位图索引或者位图索引分区进行数据查询时,可以快速的查询到对应的承载体标识,进而后续查询到对应的数据,提高了数据查询效率。
结合第一方面的第三种可能的实现方式、第四种可能的实现方式、第五种可能的实现方式或者第六种可能的实现方式,在第七种可能的实现方式中,当用户需要基于标签值查询时,可以发送数据查询请求,相应的,在接收到数据查询请求之后,执行数据查询。也即上述数据存储方法还包括:
接收数据查询请求,数据查询请求携带有至少一个目标标签值;
在多个位图索引分区中,分别确定至少一个目标标签值对应的至少一个目标子位图,分别确定至少一个目标子位图均包含的目标位图位所对应的目标承载体的标识;
查询数据表中目标承载体具有的标签值;
反馈目标承载体具有的标签值。
通过在基于标签值进行数据查询时,可以根据之前建立的位图索引直接查询该目标标签值对应的目标子位图,进而查询得到目标承载体的标签值;解决了相关技术中根据column value filter进行数据查询时数据查询效率低的问题,达到了可以提高数据查询效率的效果。
结合第一方面或者第一方面的上述任一种可能的实现方式,在第八种可能的实现方式中,在终端需要存储新数据时,服务器可以相应的获取到该新数据。其中,新数据中包括的新对应关系中承载体的标识和/或标签值是新的。这也就是说,该新数据可能是更新已经存储的数据的标签值的数据,也可能是之前未存储过的数据。
在获取到新数据之后,可以根据新对应关系更新位图索引。
通过在获取到新数据之后,更新位图索引,使得在接收到数据查询请求之后,可以根据最新的位图索引查询到所需要的数据,提高了数据查询效率。
结合第八种可能的实现方式,在第九种可能的实现方式中,上述所说的根据新对应关系更新位图索引的步骤可以包括:
确定新对应关系中的承载体的标识所属的位图索引分区;
在确定的位图索引分区中,更新新对应关系中的标签值,和/或更新新对应关系中的标签值所对应的位图。
第二方面,提供了一种数据存储装置,所述数据存储装置包括:存储器和处理器;所
述存储器中存储有指令;数据存储装置(可以是计算机设备,例如服务器)执行该指令,例如数据存储装置中的处理器执行该指令,使得该数据存储装置实现上述第一方面所述的数据存储方法。
第三方面,提供了一种数据存储装置,该装置包括至少一个单元,该至少一个单元用于实现上述第一方面所提供的数据存储方法。
第四方面,提供了一种计算机可读存储介质,该计算机可读存储介质中存储有指令;数据存储装置(可以是计算机设备,例如服务器)执行该指令,例如数据存储装置中的处理器执行该指令,使得该数据存储装置实现第一方面所提供的数据存储方法。
图1是本申请各个实施例所涉及的位图索引的示意图。
图2是本申请各个实施例所涉及的实施环境的示意图。
图3是本申请一个实施例提供的生成数据表的方法流程图。
图4是本申请一个实施例提供的建立位图索引的方法流程图。
图5是本申请一个实施例提供的在底层数据库中位图索引的存储示意图。
图6和图7是本申请一个实施例提供的数据查询方法的方法流程图。
图8和图9是本申请一个实施例提供的数据存储装置的示意图。
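For ease of understanding, the terms used in the following embodiments are briefly introduced first.
Label: a way of organizing content that characterizes a feature of the data and thereby helps people describe and classify content. Common labels include gender, education, occupation, color, and so on. Optionally, labels are defined manually.
In one possible implementation, labels are of two kinds: enumeration labels and Boolean labels. An enumeration label has multiple enumerated values; for example, education includes junior college, bachelor, master, doctorate, and so on, and gender includes male or female. A Boolean label only indicates whether the label applies, for example whether the bearer owns a home, uses drugs, or has a criminal record.
For an enumeration label, the label value is the specific value taken by the label. Taking education as the label, the label value is bachelor when the education is bachelor and master when the education is master. For a Boolean label, the label value is the label itself. For example, when a user owns a home the label value is "homeowner", and when a user has no criminal record the label value is "no criminal record".
Bearer: the object described by the labels. Optionally, a bearer can be a person, a vehicle, a telephone number, a virtual user account, and so on. A bearer can have one label or several. If the bearer is a person, the labels describing it can include gender, education, home ownership, criminal record, and so on. If the bearer is a vehicle, the labels can include color, traffic-violation record, and so on.
Data table: the data records built in the database with the bearer as the index. Each data record in the data table records the identifier of one bearer, all label values that bearer has, and the correspondence between the identifier and those label values.
Bitmap index: a secondary index built in the database with the label values in the data table as the index. Optionally, the bitmap index records label values and bitmaps, together with the one-to-one correspondence between them. Each bitmap bit in a bitmap corresponds to the identifier of one bearer, and different bitmap bits correspond to the identifiers of different bearers; that is, the bitmap bits of a bitmap are in one-to-one correspondence with the bearer identifiers. Each bitmap bit records whether the bearer with the corresponding identifier has the label value of the current bitmap (the bitmap containing that bit). For example, if a bitmap bit in the bitmap of a label value is 1, the bearer corresponding to that bit has the label value; if the bit is 0, the bearer does not. The same bitmap bit in different bitmaps corresponds to the identifier of the same bearer.
Take virtual user accounts as bearers and assume eight accounts, user1, user2, ..., user8. The accounts with the label value "online shopping expert" are user1, user4, and user8, and the accounts with the label value "active forum member" are user1, user2, and user8. The bitmap bits assigned to the eight accounts are 1, 2, 3, ..., 8 in order, as shown in FIG. 1. The bitmap for "online shopping expert" is "10010001", and the bitmap for "active forum member" is "11000001". In the bitmap "10010001", the first "1" indicates that the account at bitmap bit 1 is an online shopping expert, the second "1" indicates that the account at bitmap bit 4 is one as well, and the third "1" indicates the same for the account at bitmap bit 8. The bitmap "11000001" for "active forum member" reads the same way. FIG. 1 thus shows that user1 and user8 have both label values, "online shopping expert" and "active forum member".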
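The eight-account example can be reproduced in a few lines of Python. This is only an illustrative sketch: the helper names `build_bitmap` and `bitmap_and` and the use of '0'/'1' strings are assumptions made here, since the text does not prescribe a concrete representation.

```python
# Bitmap bits 1..8 are assigned to user1..user8, as in FIG. 1.
USERS = [f"user{i}" for i in range(1, 9)]

def build_bitmap(members):
    """Return a '0'/'1' string; character i is '1' if the i-th user has the label value."""
    return "".join("1" if user in members else "0" for user in USERS)

def bitmap_and(a, b):
    """Bitwise AND of two equal-length bitmap strings."""
    return "".join("1" if x == "1" and y == "1" else "0" for x, y in zip(a, b))

shopping_expert = build_bitmap({"user1", "user4", "user8"})  # "online shopping expert"
forum_active = build_bitmap({"user1", "user2", "user8"})     # "active forum member"
both = bitmap_and(shopping_expert, forum_active)
bearers_with_both = [user for user, bit in zip(USERS, both) if bit == "1"]
```

Because the same bit position refers to the same account in every bitmap, intersecting two label values reduces to a bitwise AND.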
请参考图2,其示出了本申请各个实施例的实施环境的示意图。如图2所示,该实施环境中包括终端210和服务器220。其中:
终端210中安装有用于请求数据存储或者数据查询的客户端。可选地,该客户端可以为浏览器。一种可能实现,终端210可以为诸如手机、平板电脑、电子阅读器、台式电脑之类的设备。终端210可以通过有线或者无线网络与服务器220连接。
服务器220可以为一台或多台服务器;可选地,多台服务器可以以服务器集群的方式为终端210提供数据库服务。一种可能实现,服务器220中设置有数据库,该数据库可以为HBase、Mongo数据库(Mongo Database,MongoDB)、分布型关系数据库服务(Distribute Relational Database Service,DRDS)、Volt数据库(Volt Database,VoltDB)、和ScaleBase等分布式数据库。
下面介绍数据表的生成方法
请参考图3,其示出了本申请一个实施例提供的生成数据表的方法的流程图,本实施例以该数据存储方法用于图2所示的服务器中来举例说明。如图3所示,该数据存储方法包括步骤301和步骤302。
步骤301,预先设置标签定义表。
标签定义表可以为服务器预先获取并存储的信息。可选地,标签定义表可以以独立的文件的形式存储,如以可扩展标记语言(Extensible Markup Language,XML)文件的形式存储,也可以在第三方分布式存储系统中存储,如存储至ZooKeeper。
该标签定义表记录预设的多个标签值。一种可选的预设方式,根据历史数据设置标签包含的标签值,或者人为定义标签包含的标签值。
表1示出了一种可能的标签定义表。当然,表1还可能会包括更多或者更少的标签,对此并不做限定。
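Label | Label values | Label configuration information |
Gender | Male, Female | Resident in memory |
Education | Junior college, Bachelor, Master, Doctorate | Not resident in memory |
Occupation | Student, Teacher, Self-employed, Company employee | Resident in memory |
Online shopping expert | Online shopping expert | Resident in memory |
Drug user | Drug user | Not resident in memory |
Table 1
Optionally, as shown in Table 1, the label definition table can also include label configuration information, which indicates whether each label value needs to be resident in memory. The bitmap of a label value that needs to be resident is kept resident in memory as well; the bitmap of a label value that does not need to be resident is not.
In Table 1, label values that need to be resident are marked "Resident in memory" and those that do not are marked "Not resident in memory". It should be understood that it is also possible to mark only the label values that need to be resident and leave the others unmarked; marking non-resident label values as "Not resident in memory" in Table 1 is just one example.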
数据库存储在服务器的存储介质(例如硬盘)中,在初始使用数据库时便将标签值和该标签值对应的位图加载至内存,并且保持内存中的位图与存储介质中的位图同步更新,使得每次查询该标签值时可以直接使用内存中该标签值的位图而不需要访问存储介质中的位图,提高了查询效率,节省了查询时间。
可选地,标签定义表还可以包括每个标签值的生命周期,该生命周期是指该标签值为有效的时间段;即不属于该生命周期的其他时间,该标签值为无效。
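Optionally, the server can also assign a label number to each label value in Table 1. When the mapping between label values and bitmaps is stored, the label number can be used in place of the label value, which saves storage space compared with storing the label value itself. In addition, the label value can be looked up from its label number, and the label number from its label value.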
步骤302,服务器根据标签定义表和源数据生成数据表。
服务器首先获取待存储的源数据。一种可选实现,服务器接收终端发送的数据存储请求,该数据存储请求中携带有待存储的源数据。又一种可选实现,服务器可以主动从终端中获取该源数据,或者,服务器从存储该源数据的数据库中获取该源数据。因此,本发明实施例对服务器如何获取源数据并不做限定。
待存储的源数据的数量可以是一条或者多条。可选地,参考表2,多条源数据可以以表格的形式存在,表格中的每一行表示一条源数据,每条源数据具有唯一的承载体的标识。
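Bearer identifier | Source data |
A01 | gender:male, education:bachelor, occupation:student |
A02 | gender:female, education:junior college, occupation:self-employed, online shopping expert |
B01 | gender:male, education:junior college, occupation:company employee |
B02 | gender:female, education:bachelor, occupation:student |
C01 | gender:male, education:master |
C02 | gender:female, education:junior college, occupation:company employee, online shopping expert |
D01 | gender:male, education:master, occupation:company employee, online shopping expert |
D02 | gender:female, education:bachelor, occupation:student |
D03 | gender:male, education:junior college, occupation:company employee |
E01 | gender:male, education:master, occupation:self-employed, drug user |
E02 | gender:female, education:junior college, occupation:self-employed |
E03 | gender:male, education:bachelor, occupation:student |
F01 | gender:male, education:master, occupation:teacher |
F02 | gender:female, education:junior college, occupation:company employee, online shopping expert |
F03 | gender:male, education:bachelor, occupation:student |
F04 | gender:female, education:master, occupation:self-employed, drug user |
..... | ..... |
Table 2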
服务器获取到源数据之后,服务器可以根据标签定义表确定每条源数据具有的所有标签值。比如,对于表2所示的源数据,表3示出了服务器根据表1确定得到的各条源数据的标签值;其中,表3中每个[]中的内容表示一个标签值。
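Bearer identifier | Label values |
A01 | [gender:male][education:bachelor][occupation:student] |
A02 | [gender:female][education:junior college][occupation:self-employed][online shopping expert] |
B01 | [gender:male][education:junior college][occupation:company employee] |
B02 | [gender:female][education:bachelor][occupation:student] |
C01 | [gender:male][education:master] |
C02 | [gender:female][education:junior college][occupation:company employee][online shopping expert] |
D01 | [gender:male][education:master][occupation:company employee][online shopping expert] |
D02 | [gender:female][education:bachelor][occupation:student] |
D03 | [gender:male][education:junior college][occupation:company employee] |
E01 | [gender:male][education:master][occupation:self-employed][drug user] |
E02 | [gender:female][education:junior college][occupation:self-employed] |
E03 | [gender:male][education:bachelor][occupation:student] |
F01 | [gender:male][education:master][occupation:teacher] |
F02 | [gender:female][education:junior college][occupation:company employee][online shopping expert] |
F03 | [gender:male][education:bachelor][occupation:student] |
F04 | [gender:female][education:master][occupation:self-employed][drug user] |
..... | ..... |
Table 3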
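Table 3 is a data table that includes multiple data records. Each data record includes the identifier of one bearer; for example, the first data record includes the identifier A01 and the ninth includes the identifier D03. Each data record also includes the label values corresponding to the bearer identifier; for example, the first data record includes the label values "[gender:male][education:bachelor][occupation:student]" for A01, and the ninth includes "[gender:male][education:junior college][occupation:company employee]" for D03. As Table 3 shows, each data record holds the correspondence between one bearer identifier and all label values of that bearer; the first data record, for instance, holds the correspondence between A01 and "[gender:male][education:bachelor][occupation:student]".
Given a bearer identifier, the data table can be queried for all label values corresponding to that identifier, that is, all label values the bearer has.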
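Step 302, deriving the data table from the source data and the label definition table, could be sketched as follows. The dictionary `LABEL_DEFINITIONS`, the function names, and the comma/colon source format are assumptions made for illustration (with label values rendered in English); the text does not specify how source data is parsed.

```python
# Hypothetical in-memory form of Table 1: label -> allowed label values.
# A Boolean label lists itself as its only label value.
LABEL_DEFINITIONS = {
    "gender": ["male", "female"],
    "education": ["junior college", "bachelor", "master", "doctorate"],
    "occupation": ["student", "teacher", "self-employed", "company employee"],
    "online shopping expert": ["online shopping expert"],
    "drug user": ["drug user"],
}

def extract_label_values(raw):
    """Parse 'gender:male, online shopping expert' into the label values defined above."""
    values = []
    for field in raw.split(","):
        field = field.strip()
        if not field:
            continue  # ignore empty fields such as a trailing comma
        name, sep, value = field.partition(":")
        name, value = name.strip(), value.strip()
        if not sep:
            value = name  # Boolean label: the label value is the label itself
        if value in LABEL_DEFINITIONS.get(name, ()):
            values.append(f"{name}:{value}" if sep else name)
    return values

def build_data_table(source_rows):
    """Map each bearer identifier to the list of label values found in its source data."""
    return {bearer_id: extract_label_values(raw) for bearer_id, raw in source_rows}

table = build_data_table([
    ("A01", "gender:male, education:bachelor, occupation:student"),
    ("A02", "gender:female, education:junior college, occupation:self-employed, online shopping expert"),
    ("C01", "gender:male, education:master,"),
])
```

Values that are not in the definition table are dropped in this sketch; a real system might instead reject the record or extend the definition table.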
可选地,可以将数据表划分为多个数据分区,分布式地存储数据表的多个数据分区。划分数据分区的可选方式,可以通过指定数据分区的数量来划分数据分区,或者可以直接定义每个数据分区的分区区间。
举例说明,为数据表设置如下分区区间:
数据分区1:[,A)
数据分区2:[,B)
数据分区3:[B,C)
数据分区4:[C,D)
数据分区5:[D,E)
数据分区6:[E,F)
.....
表3按照上述分区区间划分后得到表4所示的数据分区。
表4
数据表的每个数据分区可以自动裂变或者扩展。比如,随着时间推移,某个数据分区的数据(例如承载体的标识,或者标签值)越来越多,在该个数据分区的数据量达到分裂阈值,服务器可以将该个数据分区进行分裂成两个数据分区,从而避免由于该个数据分区的存储空间被存满之后无法继续向该个数据分区写入新数据。
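Each data partition includes at least one sub-data table. Take data partition 1 in Table 4 as an example. If data partition 1 has one sub-data table, that sub-data table includes the first data record (identifier A01 and its label values) and the second data record (identifier A02 and its label values). Data partition 1 can also have two sub-data tables, one holding the first data record and the other holding the second.
The sets of bearer identifiers corresponding to the sub-data tables of different data partitions do not intersect. In Table 4, the identifiers covered by data partition 1 form the interval [,B) and those covered by data partition 2 form the interval [B,C); the two intervals do not intersect. By analogy, no two data partitions in Table 4 cover intersecting sets.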
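Range partitioning of this kind can be sketched with a sorted list of split points. The split points below assume the intervals named in the preceding paragraph ([,B), [B,C), and so on); the names `SPLIT_POINTS`, `data_partition_of`, and `split_into_partitions` are hypothetical.

```python
from bisect import bisect_right

# Assumed split points: partition 1 = [, B), partition 2 = [B, C), ..., partition 6 = [F, ).
SPLIT_POINTS = ["B", "C", "D", "E", "F"]

def data_partition_of(bearer_id):
    """Return the 1-based number of the data partition whose interval contains bearer_id."""
    return bisect_right(SPLIT_POINTS, bearer_id) + 1

def split_into_partitions(bearer_ids):
    """Group bearer identifiers by data partition; every identifier lands in exactly one group."""
    partitions = {}
    for bearer_id in bearer_ids:
        partitions.setdefault(data_partition_of(bearer_id), []).append(bearer_id)
    return partitions

groups = split_into_partitions(["A01", "A02", "B01", "C02"])
```

Because the intervals are half-open and contiguous, each identifier maps to exactly one partition, which is the non-intersection property the text requires.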
下面介绍数据表的更新
(1)、更新标签值。
若数据表中已存储有目标承载体的标识,服务器使用待更新的标签值更新该目标承载体的标识所具有的标签值,更新标签值可以包括增加标签值或者删除标签值。
举例说明,在表4中,标识A01的数据记录具有三个标签值“[性别:男][学历:本科][职业:学生]”。参见表5,如果为标识A01的数据记录增加标签值“[网购达人]”,标签值更新后,标识A01的数据记录具有四个标签值“[性别:男][学历:本科][职业:学生][网购达人]”。参见表6,如果期望删除标识A01的数据记录所具有的标签值“[职业:学生]”,标签值更新后,标识A01的数据记录具有二个标签值“[性别:男][学历:本科]”。
表5
表6
上述举例仅以更新数据记录中的单个标签值为例,当然也可以同时更新数据记录中的多个标签值,还可以同时更新多条数据记录中的标签值,在此不再赘述。
(2)、增加承载体。
服务器在数据表中增加数据记录,包括增加目标承载体的标识、并在数据表中同时增加目标承载体的标识具有的所有标签值。
举例说明,在表4中增加标识A03的数据记录,参见表7,具体是在数据分区1的承载体的标识这一列中,标识A02的下一行记录标识A03;相应地,在标签值这一列记录标识A03对应的“标签值[性别:女][学历:专科][职业:企业员工][网购达人]”。
表7
上述举例仅以增加标识A03的数据记录为例,当然,服务器也可以在数据分区1中增加多条数据记录,还可以在多个数据分区分别增加多条数据记录,在此不再赘述。
另外,服务器还可以删除数据表的数据分区中的数据记录,包括删除承载体的标识和删除该承载体具有的标签值。例如在删除标识A02的数据记录时,会删除标识A02,以及删除标识A02对应的标签值“[性别:女][学历:专科][职业:个体][网购达人]”。
下面介绍位图索引的生成方法
服务器可以为该每个承载体的标识分配一个在位图中的位图位,即位图位为每个承载体的标识在位图中对应有的一个位置。如表8所示的一种位图位分配举例,对于标识A01的承载体,为该承载体分配的位图位为1;对于标识A02的承载体,为该承载体分配的位图位为2,以此类推。
承载体的标识 | 位图位 |
A01 | 1 |
A02 | 2 |
B01 | 3 |
B02 | 4 |
C01 | 5 |
C02 | 6 |
D01 | 7 |
D02 | 8 |
D03 | 9 |
E01 | 10 |
E02 | 11 |
E03 | 12 |
F01 | 13 |
F02 | 14 |
F03 | 15 |
F04 | 16 |
..... | ..... |
表8
请参考图4,其示出了本申请一个实施例提供的建立位图索引方法的流程图,本实施例以该数据存储方法用于图2所示的服务器中来举例说明。如图4所示,该数据存储方法包括步骤401和步骤402。
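Step 401: the server acquires at least one data record.
The at least one data record is stored in a data table. Each data record includes the identifier of one bearer and at least one label value; each record in the data table records the identifier of one bearer and all label values that bearer has. For details of the data table and data records, see the description based on Table 3 or Table 4 above.
Step 402: the server establishes the bitmap index corresponding to the at least one data record.
The data table has at least one data record, and the server builds one bitmap for each label value that appears in those records; the bitmap index therefore includes at least one bitmap.
Each bitmap corresponds to one label value and includes at least one bitmap bit. Each bitmap bit records whether the bearer with the corresponding identifier has the label value of the current bitmap (the bitmap containing that bit), and the same bitmap bit in different bitmaps of the bitmap index corresponds to the identifier of the same bearer. For the structure of the bitmap index, see the description of the bitmap index above.
For example, starting from the data table in Table 3 or Table 4 and using the correspondence between bearer identifiers and bitmap bits in Table 8, the bitmap index in Table 9 is obtained. In Table 9, the label value "drug user" corresponds to the bitmap "[0000000001000000....]"; the 10th bitmap bit is "1", which means the bearer E01 has the label value "drug user". The label value "online shopping expert" corresponds to the bitmap "[0100011000000100....]"; its 10th bitmap bit is "0", which means the bearer E01 does not have the label value "online shopping expert".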
Label value | Bitmap |
gender:male | [1010101011011010....] |
gender:female | [0101010100100101....] |
education:junior college | [0110010010100100....] |
education:bachelor | [1001000100010010....] |
...... | ........ |
online shopping expert | [0100011000000100....] |
drug user | [0000000001000000....] |
...... | ........ |
Table 9
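Building such an index from data records might look like the sketch below. It covers only the first nine bearers (A01 to D03) and two labels to stay short, with label values rendered in English; `build_bitmap_index` and the list-of-bits representation are assumptions, not the method prescribed by the text.

```python
def build_bitmap_index(data_table, bit_positions):
    """Return {label value: list of 0/1 bits}, one bit per bearer (positions are 1-based)."""
    size = max(bit_positions.values())
    index = {}
    for bearer_id, label_values in data_table.items():
        position = bit_positions[bearer_id]
        for label_value in label_values:
            bitmap = index.setdefault(label_value, [0] * size)
            bitmap[position - 1] = 1
    return index

def as_string(bitmap):
    return "".join(str(bit) for bit in bitmap)

# Subset of the data table: gender and the Boolean label only.
DATA_TABLE = {
    "A01": ["gender:male"],
    "A02": ["gender:female", "online shopping expert"],
    "B01": ["gender:male"],
    "B02": ["gender:female"],
    "C01": ["gender:male"],
    "C02": ["gender:female", "online shopping expert"],
    "D01": ["gender:male", "online shopping expert"],
    "D02": ["gender:female"],
    "D03": ["gender:male"],
}
# Bit positions 1..9 in identifier order, matching the first nine rows of Table 8.
BIT_POSITIONS = {bearer_id: i for i, bearer_id in enumerate(DATA_TABLE, start=1)}

index = build_bitmap_index(DATA_TABLE, BIT_POSITIONS)
```

The resulting bitmaps agree with the first nine bits of the corresponding rows of Table 9.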
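Optionally, an array lists the bitmap bits that are "1". In Table 9, the label value "drug user" corresponds to the bitmap "[0000000001000000....]", which can be represented as the array [10]; the label value "online shopping expert" corresponds to "[0100011000000100....]", which can be represented as the array [2 6 7 14]. Storing the array in the database instead of the bitmap saves storage space. Optionally, the bitmap form is still used in memory, so that during a label-value query the target bitmaps can be combined bit by bit (for example with AND or OR operations) to determine the target bitmap bits.
Optionally, an array lists whichever bit value is in the minority in the bitmap. In the bitmap "[0000000001000000....]" for "drug user", "1" is the minority value, so the bitmap can be represented as the array [10] of "1" positions; likewise, in "[0100011000000100....]" for "online shopping expert", "1" is the minority value, so the bitmap can be represented as the array [2 6 7 14]. As above, the array replaces the bitmap in the database to save space, while the bitmap form can still be used in memory for bitwise operations during queries.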
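The conversion between the bit-by-bit form and the array form is straightforward. The function names below are hypothetical, and the sketch only handles arrays of "1" positions.

```python
def bitmap_to_positions(bitmap):
    """Return the 1-based positions of the '1' bits in a '0'/'1' string."""
    return [i for i, bit in enumerate(bitmap, start=1) if bit == "1"]

def positions_to_bitmap(positions, length):
    """Rebuild the '0'/'1' string of the given length from the positions of its '1' bits."""
    bits = ["0"] * length
    for position in positions:
        bits[position - 1] = "1"
    return "".join(bits)

drug_user = bitmap_to_positions("0000000001000000")
shopping_expert = bitmap_to_positions("0100011000000100")
```

For a sparse bitmap the array is much shorter than the bit string, which is the storage saving the text refers to; converting back to bits in memory keeps AND/OR operations cheap.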
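Optionally, Table 9 shows how the bitmap index is stored in memory, while in the database the bitmap index is stored as Base+Delta. Base holds the content of the bitmap index, and its storage structure is similar to the in-memory one: if the proportion of bits in a bitmap that indicate "has the label value" (or "does not have the label value") is below a preset ratio, those bits can be represented by an array; otherwise the bitmap is stored bit by bit. Referring to FIG. 5, Delta includes at least one KeyValue, and each KeyValue corresponds to one change operation on the content of a bitmap. A change operation can update the existing content of a bit, for example "change bit 10 of the bitmap to 1" and "change bit 15 of the bitmap to 0", or it can add preset content at a new bit, for example "add 1 at bit 21 of the bitmap". Optionally, when the data size of Base reaches a first size, Base and Delta are merged; after the merge, Base holds the storage path of the merged file and Delta stores the merged file. In another possible implementation, the bitmap index can be stored in the database as a file; this is not limited here.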
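Replaying Delta operations on a Base bitmap could be sketched as below. The (position, bit) tuple encoding of a change operation is an assumption made for illustration; the actual KeyValue layout and the merge trigger are not modelled.

```python
def apply_delta(base, delta):
    """Replay (position, bit) change operations on a list of bits; positions are 1-based.

    A position beyond the current length models "adding" a bit: the list is
    padded with zeros up to that position before the bit is written.
    """
    bits = list(base)  # leave the stored Base untouched
    for position, bit in delta:
        if position > len(bits):
            bits.extend([0] * (position - len(bits)))
        bits[position - 1] = bit
    return bits

base = [0] * 20
base[14] = 1  # bit 15 is currently 1
# "change bit 10 to 1", "change bit 15 to 0", "add 1 at bit 21"
merged = apply_delta(base, [(10, 1), (15, 0), (21, 1)])
```

Keeping changes in a small append-only Delta avoids rewriting the whole Base bitmap on every update, at the cost of a replay step when reading.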
可选地,在初始使用数据库时,根据表1中的标签定义表的标签配置信息加载需要常驻内存的标签值,以及加载该需要常驻内存的标签值所对应的位图。保持内存中的位图与存储介质中的位图同步更新,使得每次查询该标签值时可以直接使用内存中该标签值的位图而不需要访问存储介质中的位图,提高了查询效率,节省了查询时间。
在本发明实施例提供的数据存储方法中,可以将位图索引划分为多个位图索引分区。
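Dividing the bitmap index into bitmap index partitions is similar to dividing the data table into data partitions, and the same partitioning approaches apply. For example, Table 10 shows one division result: bitmap index partition 1 is the interval [,E), bitmap index partition 2 is the interval [E,G), and so on. Within one bitmap index partition, the bitmap bit of each bearer identifier is unique; across bitmap index partitions, the same bitmap bit corresponds to different bearer identifiers.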
表10
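Optionally, the interval of each bitmap index partition is preset by the user, and the server creates the bitmap index partitions according to the preset intervals. For example, three non-overlapping intervals [,E), [E,G), and [G,) are preset, and from them the server creates bitmap index partition 1 and bitmap index partition 2 shown in Table 10, plus bitmap index partition 3 (not shown).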
可选地,服务器可以将M个数据分区划入N个位图索引分区;M、N为正整数,N小于或等于M,N大于或等于2。M个数据分区中的每个数据分区属于唯一的位图索引分区,N个位图索引分区中的每个位图索引分区包含至少一个数据分区。
确定位图索引分区后,数据表中的承载体的标识所属的位图索引分区也确定了。相应地,一个位图索引分区具有位图索引中的一个子位图索引,该个子位图索引包括该个位图索引分区对应的所有承载体的标识所对应的所有子位图,每个子位图包括该所有承载体的标识所对应的位图位;以此类推,每个位图索引分区中包括至少一个子位图。
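Table 11 shows the bitmap index after it is divided into bitmap index partitions. Bitmap index partition 1 corresponds to one sub-bitmap index that includes multiple sub-bitmaps, for example the sub-bitmap "[101010101]" for the label value "gender:male". This sub-bitmap consists of the first nine bitmap bits (the first through the ninth) of the bitmap "[1010101011011010....]" for "gender:male" in Table 9. The sub-bitmap of every other label value in bitmap index partition 1 is obtained in the same way.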
表11
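Note that for bitmap index partition 2 in Table 11, the correspondence between bitmap bits and bearer identifiers differs from that in Table 9. For example, the first bitmap bit of every sub-bitmap in bitmap index partition 2 corresponds to the bearer identifier E01, whereas in the bitmaps of the Table 9 bitmap index it is the tenth bitmap bit that corresponds to E01.
In addition, like the bitmaps in Table 9, the sub-bitmaps of the bitmap index partitions in Table 11 record the correspondence between a label value and bitmap bits (that is, bearer identifiers); the difference is that a bitmap in Table 9 records more bitmap bits than a sub-bitmap in Table 11.
In summary, the same bitmap bit in different bitmaps of the bitmap index corresponds to the identifier of the same bearer; the same bitmap bit in different sub-bitmaps of one bitmap index partition corresponds to the identifier of the same bearer; and the same bitmap bit in sub-bitmaps of different bitmap index partitions corresponds to the identifiers of different bearers.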
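The relationship between global bit positions and per-partition positions can be sketched as follows, assuming the intervals [,E), [E,G), [G,) mentioned for Table 10. The helper names are hypothetical.

```python
from bisect import bisect_right

# Assumed intervals: partition 1 = [, E), partition 2 = [E, G), partition 3 = [G, ).
INDEX_SPLIT_POINTS = ["E", "G"]

BEARERS = ["A01", "A02", "B01", "B02", "C01", "C02", "D01", "D02", "D03",
           "E01", "E02", "E03", "F01", "F02", "F03", "F04"]

def index_partition_of(bearer_id):
    """Return the 1-based bitmap index partition that owns bearer_id."""
    return bisect_right(INDEX_SPLIT_POINTS, bearer_id) + 1

def assign_local_positions(bearer_ids):
    """Return {bearer id: (partition, 1-based bit position inside that partition)}."""
    counters, assignment = {}, {}
    for bearer_id in sorted(bearer_ids):
        partition = index_partition_of(bearer_id)
        counters[partition] = counters.get(partition, 0) + 1
        assignment[bearer_id] = (partition, counters[partition])
    return assignment

def split_bitmap(bitmap, bearer_ids):
    """Cut a global '0'/'1' bitmap into one sub-bitmap per bitmap index partition."""
    sub_bitmaps = {}
    for bearer_id, bit in zip(bearer_ids, bitmap):
        partition = index_partition_of(bearer_id)
        sub_bitmaps[partition] = sub_bitmaps.get(partition, "") + bit
    return sub_bitmaps

positions = assign_local_positions(BEARERS)
male_sub_bitmaps = split_bitmap("1010101011011010", BEARERS)  # "gender:male" row of Table 9
```

E01 sits at global bit 10 but at local bit 1 of partition 2, which illustrates why the same bit position means different bearers in different partitions.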
可选地,每个位图索引分区中的子位图所对应的承载体的标识的集合相互独立,不存在交集。比如在表10和表11中,位图索引分区1所对应的承载体的标识的集合为区间[,E),位图索引分区2所对应的承载体的标识的集合为区间[E,G);区间[,E)与区间[E,G)没有交集。更具体地,每个承载体的标识属于唯一的集合(即区间),因此该个承载体的标识对应的位图位也唯一属于该集合对应的位图索引分区。举例说明,承载体的标识F04属
于区间[E,G),该区间[E,G)对应位图索引分区2,则承载体的标识F04的位图位在位图索引分区2中,即包含承载体的标识F04的位图位的所有子位图都在位图索引分区2中。
可选的,每个位图索引分区中的子位图所对应的承载体的标识的集合大于等于第一范围,该第一范围为数据分区中的子数据表所对应的承载体的标识的集合。具体地,每个位图索引分区包含至少一个数据分区,数据表中的每个数据分区属于唯一的位图索引分区;因此,每个数据分区的第一范围(即该个数据分区中的子数据表所对应的承载体的标识的集合)小于或等于该个数据分区所属索引分区对应的标识集合(即该索引分区的子位图所对应的承载体的标识的集合)。
可选地,在每个位图索引分区中,为标签定义表中的所有标签值分别建立一个子位图。具体地,无论子位图索引中的承载体是否具有某个标签值,在该子位图索引中为该个标签值建立一个子位图。以表11为例,位图索引分区1中的所有承载体均不具有标签值“吸毒者”,在位图索引分区1的子位图索引中,仍为该标签值“吸毒者”建立子位图“[000000000]”。这样,如果后续位图索引分区1中的某个承载体更新为具有该标签值“吸毒者”,直接更新该子位图“[000000000]”,不需要增加建立该标签值“吸毒者”对应的子位图,由于仅更改子位图的位图位相对于增加子位图不需要额外的存储资源,因此不会导致位图索引分区1因数据量增大而分裂。
举例说明,位图索引分区一旦确定存储容量大小,如果该位图索引分区因增加子位图而导致所需存储容量大幅度增加,当增加到预设的存储容量阈值时,该位图索引分区不能存储新数据,需要对该位图索引分区分裂。
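In addition, taking Table 11 as an example, even if a new bearer with the label value "drug user" is added to bitmap index partition 1, adding a bitmap bit for that bearer's identifier to a sub-bitmap takes very little storage, for example 1 bit. Adding bitmap bits because of new bearers therefore does not usually cause bitmap index partition 1 to split.
Because a sub-bitmap is created in every bitmap index partition for each label value in the label definition table, the number of sub-bitmaps in each bitmap index partition equals the number of label values in the label definition table. For example, if the label definition table defines 10 label values in total, each bitmap index partition has 10 sub-bitmaps once a sub-bitmap has been created for every label value.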
可选地,由于位图索引分区扩展或者分裂会导致位图索引中的所有位图索引分区需要重建,代价较高,在本实施例中服务器将位图索引分区设置为不可分裂或者扩展。
然而,随着时间的推移,位图索引分区中的容量会不断增加或者变化;比如,大数据场景中承载体的个数不断增多,或者,因标签定义表中新增标签值而导致每个位图索引分区相应增加子位图,又或者,大量承载体具有的标签值失效。因此,为了避免因存储容量不够而导致位图索引分区中无法存储新增数据,服务器可以重建位图索引分区。可选地,服务器可以基于MapReduce机制,通过分布式读取数据表中存储的数据记录,并且获取更新后的位图索引分区,根据更新后的位图索引分区重建每个位图索引分区中的子位图索引。具体地,服务器可以为每个位图索引分区的承载体重新分配位图位,并且根据分配的位图位生成各个标签值所对应的子位图。
下面介绍位图索引的更新方法
服务器存储新数据,该新数据可以是数据记录;该新数据包括的对应关系中承载体的
标识和/或标签值是新的,则服务器根据新对应关系更新位图索引。
其中,根据新对应关系更新位图索引的步骤包括:
第一步,确定新对应关系中的承载体的标识所属的位图索引分区。
参考上述关于位图索引分区中的子位图所对应的承载体的标识的集合的介绍,由于每个承载体的标识唯一属于一个位图索引分区,因此可以确定新对应关系中的承载体的标识所属的位图索引分区。
举例说明,结合表10,若新对应关系中的承载体的标识为F03,则可以确定该承载体的标识所属的位图索引分区为位图索引分区2;而若新对应关系中的承载体的标识为B02,则可以确定其所属的位图索引分区为位图索引分区1。
第二步,在确定的位图索引分区中,更新新对应关系中的标签值,和/或更新新对应关系中的标签值所对应的位图。
(1)、更新标签值。
若数据表中已存储有新数据中的目标承载体的标识,更新确定的位图索引分区中更新的标签值所对应的子位图。
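Optionally, if the bearer had the label value before the update and no longer has it afterwards, the server changes the bitmap bit for that bearer's identifier in the sub-bitmap of the label value, within the determined bitmap index partition, from "1" to "0". For example, if the bearer's bitmap bit in the sub-bitmap is 3 and the sub-bitmap before the update is 1011, the sub-bitmap after the update is 1001. Conversely, if the bearer did not have the label value before the update and has it afterwards, the server changes that bitmap bit from "0" to "1". For example, if the bearer's bitmap bit is 5 and the sub-bitmap before the update is 10110, the sub-bitmap after the update is 10111.
Optionally, when an array lists the bitmap bits that are "1" in the sub-bitmap: if the bearer did not have the label value before the update and has it afterwards, the server adds the bitmap bit of the bearer's identifier to the array of the label value in the determined bitmap index partition. For example, if the bearer's bitmap bit is 3 and the initial sub-bitmap of the updated label value is (1,7), the updated sub-bitmap is (1,3,7). If the bearer had the label value before the update and no longer has it afterwards, the server deletes the bearer's bitmap bit from the array of the label value in the determined bitmap index partition. For example, if the bearer's bitmap bit is 3 and the initial sub-bitmap is (1,3,7), the updated sub-bitmap is (1,7). When the array instead lists the bitmap bits that are "0", the server updates the sub-bitmap in a similar way, which is not repeated here.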
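Both update styles, bit-by-bit and array, can be sketched with two small helpers. The names `set_bit` and `update_positions` are hypothetical.

```python
def set_bit(sub_bitmap, position, has_label):
    """Return a copy of a '0'/'1' string with the 1-based position set to 1 or 0."""
    bits = list(sub_bitmap)
    bits[position - 1] = "1" if has_label else "0"
    return "".join(bits)

def update_positions(positions, position, has_label):
    """Same update when the sub-bitmap is stored as a sorted array of '1' positions."""
    updated = set(positions)
    if has_label:
        updated.add(position)
    else:
        updated.discard(position)
    return sorted(updated)

lost_label = set_bit("1011", 3, False)
gained_label = set_bit("10110", 5, True)
array_gain = update_positions([1, 7], 3, True)
array_loss = update_positions([1, 3, 7], 3, False)
```

Either way the update touches a single position, so changing one bearer's label value does not require rewriting the other bits.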
可选地,在某种情况下,若新数据中的某一标签值不是标签定义表中预先定义的标签值,则服务器在确定的位图索引分区中增加新增的标签值所对应的子位图。其中,属于该位图索引分区中的各条数据在新增的子位图中的位图位与在该位图索引分区中的其他子位图中的位图位一致。
(2)、更新承载体。
若新数据(例如数据表中的新数据记录)中增加承载体的标识,则在确定的位图索引分区中为该承载体分配位图位,根据该新数据的承载体的标识具有的所有标签值更新子位图。
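Optionally, for the sub-bitmap of each label value in the determined bitmap index partition: if the bearer of the new data has the label value, information indicating that it has the label value is added at the assigned bitmap bit. For example, if the bitmap bit assigned to the bearer is 5 and the initial sub-bitmap of a label value is 0101, the updated sub-bitmap is 01011. If the bearer does not have the label value, information indicating that it lacks the label value is added at the assigned bitmap bit. With the same assigned bitmap bit 5 and the initial sub-bitmap 0101, the updated sub-bitmap is 01010.
Optionally, when an array lists the bitmap bits that are "1": for the sub-bitmap of each label value in the determined bitmap index partition, if the bearer of the new data has the label value, the newly assigned bitmap bit is added to the sub-bitmap. For example, if the bitmap bit assigned to the bearer is 5, the bearer has a certain label value, and the initial sub-bitmap of that label value is (3), the updated sub-bitmap is (3,5). When the array instead lists the bitmap bits that are "0", the server updates the sub-bitmap in a similar way, which is not repeated here.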
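Adding a bearer to a partition means extending every sub-bitmap by one bit. The sketch below assumes a non-empty partition whose sub-bitmaps all have the same length and assigns the next free position; `append_bearer` and the label names are hypothetical.

```python
def append_bearer(sub_bitmaps, label_values):
    """Append one bit to every sub-bitmap of a partition.

    sub_bitmaps maps label value -> '0'/'1' string; label_values is the set of
    label values the new bearer has. Returns (assigned position, updated sub-bitmaps).
    """
    new_position = len(next(iter(sub_bitmaps.values()))) + 1
    updated = {
        label_value: bitmap + ("1" if label_value in label_values else "0")
        for label_value, bitmap in sub_bitmaps.items()
    }
    return new_position, updated

position, updated = append_bearer(
    {"label A": "0101", "label B": "0101"},
    {"label A"},  # the new bearer has label A but not label B
)
```

Writing an explicit 0 for label values the bearer lacks keeps all sub-bitmaps in the partition the same length, so bit positions stay aligned across label values.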
可选地,若位图索引不划分索引分区,则服务器根据新对应关系直接更新位图索引。其具体更新方法与上述类似,不同的是,在划分索引分区时,需要先确定新数据中的承载体的标识所属的位图索引分区,然后再更新;而此处无需确定直接更新,在此不再赘述。
请参考图6,其示出了本申请一个实施例提供的数据查询方法的方法流程图,本实施例以该数据查询方法用于图2所示的服务器中来举例说明,如图6所示,该数据查询方法包括:
步骤601,接收数据查询请求,数据查询请求中携带有至少一个目标标签值。
终端需要查询具备某个标签值的数据时,终端可以发送数据查询请求至服务器,相应的,服务器可以接收到该数据查询请求。其中,数据查询请求中携带有需要查询的至少一个目标标签值。
可选地,终端需要查询具备标签值A的数据时,终端可以发送携带有标签值A的数据查询请求至服务器;在终端需要查询不具备标签值A的数据时,终端可以发送携带有“not A”的数据查询请求至服务器。
可选地,当数据查询请求中携带有至少两个目标标签值时,该至少两个目标标签值之间可以是“和”的关系,也可以为“或”的关系,还可以为“非”的关系。
比如,当终端需要查询同时满足标签值A和标签值B的数据时,终端可以发送携带有“A and B”的数据查询请求至服务器。又比如,当终端需要查询标签值为A或者标签值B的数据时,终端可以发送携带有“A or B”的数据查询请求至服务器。再比如,当需要查询标签值为A非B的数据时,终端可以发送携带有“A not B”的数据查询请求至服务器。
步骤602,根据位图索引,确定每个目标标签值对应的目标位图。
位图索引包括至少一个位图,每个位图对应于一个标签值,每个位图包括至少一个位图位,每个位图位用于记录一个承载体的标识对应承载体是否具备当前位图所对应的标签值,不同位图中相同的位图位对应于相同的承载体的标识;位图索引为数据表对应的索引。其中,该位图索引为上述实施例中创建的索引,对于其具体实现详见上述实施例,在此不再赘述。
服务器接收到数据查询请求之后,提取数据查询请求中的目标标签值,然后在位图索引中查询提取得到的每个目标标签值所对应的目标位图。
可选地,若目标标签值有至少两个,则服务器可以根据位图索引查询每个目标标签值所对应的目标位图。比如,数据查询请求中携带的目标标签值包括“A and B”,则服务器可以查询标签值A所对应的目标位图1以及标签值B所对应的目标位图2。
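Optionally, if label configuration information indicating whether the bitmap of each label value is resident in memory has been preset, then after receiving the data query request the server checks the label configuration information of the target label value. If the bitmap is resident, the server uses the target bitmap already in memory; if it is not, the server loads the bitmap of the target label value from the database into memory.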
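The resident/non-resident distinction can be sketched as a two-level lookup. `BitmapStore` is a hypothetical class in which a plain dictionary stands in for the database; whether a bitmap loaded on demand stays in memory afterwards is not stated in the text, so this sketch does not keep it.

```python
class BitmapStore:
    """Hypothetical store: a dict standing in for the database plus an in-memory cache."""

    def __init__(self, database, resident_label_values):
        self.database = database  # label value -> bitmap string
        # Resident bitmaps are loaded once, when the database is first used.
        self.memory = {
            label_value: database[label_value]
            for label_value in resident_label_values
            if label_value in database
        }
        self.database_reads = 0  # counts on-demand loads, for illustration only

    def get(self, label_value):
        """Return the bitmap for a label value, reading the database only if needed."""
        if label_value in self.memory:
            return self.memory[label_value]
        self.database_reads += 1
        return self.database[label_value]

store = BitmapStore(
    {"gender:male": "1010", "drug user": "0001"},
    resident_label_values=["gender:male"],
)
```

Queries on frequently used label values are then served without touching the database, which is the efficiency gain the label configuration information is meant to provide.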
步骤603,确定至少一个目标位图均包含的目标位图位所对应的目标承载体的标识。
若目标标签值包括一个,且数据查询请求用于查询具备该目标标签值的数据,则服务器根据目标位图确定所需查询的数据所对应的位图位,并且,由于每个位图位与一个承载体的标识一一对应,因此,可以根据上述对应关系确定查询到的位图位所对应的承载体的标识,查询到的承载体的标识即为目标承载体的标识。
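For example, if the target bitmap determined in step 602 is "01100001", the server determines bitmap bits 2, 3, and 8 from it, and then determines that bitmap bit 2 corresponds to bearer identifier 2, bitmap bit 3 to identifier 3, and bitmap bit 8 to identifier 8.
If there is one target label value and the data query request asks for data that does not have it, the server determines the data corresponding to the bits of the target bitmap that indicate the label value is absent. The procedure is similar to the one above and is not repeated here. Optionally, in one possible implementation, the server applies a NOT operation to the target bitmap and then determines the bearer identifiers corresponding to the bitmap bits that are 1.
If there are at least two target label values and the data query request asks for data that has all of them, the server obtains at least two target bitmaps in step 602. When the bitmaps are binary sequences, the server ANDs the target bitmaps, determines the bitmap bits that are 1 in the result, and looks up the bearer identifiers corresponding to those bits. For example, suppose the target label values are A and B and the request asks for data that has both. If the bitmap for label value A is "01100001" and the bitmap for label value B is "01001011", the AND result is "01000001", and the server determines the bearer identifiers corresponding to bitmap bits 2 and 8.
If there are at least two target label values and the request asks for data that has j first target label values or has k second target label values, then, when the bitmaps are binary sequences, the server ANDs the j target bitmaps of the first target label values, ANDs the k target bitmaps of the second target label values, ORs the two AND results, and determines the bearer identifiers corresponding to the bitmap bits that are "1" in the OR result. Here j and k are integers greater than or equal to 1. When j or k is 1, the corresponding AND operation can be skipped, since its result is the single bitmap itself.
If there are at least two target label values and the request asks for data that has x third target label values but does not have y fourth target label values, the server ANDs the x target bitmaps of the third target label values to obtain a first result, ANDs the y target bitmaps of the fourth target label values to obtain a second result, and then computes the NOT operation between the first result and the second result, that is, the bits that are 1 in the first result and 0 in the second result; it then determines the bearer identifiers corresponding to the bitmap bits that are 1. Here x and y are integers greater than or equal to 1. When x or y is 1, the corresponding AND operation can be skipped, since its result is the single bitmap itself.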
需要说明的是,在位图为数组时,其查询逻辑类似,在此不再赘述。
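The AND, OR, and NOT combinations above can be sketched on '0'/'1' strings by going through Python integers. The helper names are hypothetical, and "A not B" is read here as "has A and does not have B".

```python
def bm_and(*bitmaps):
    """AND any number of equal-length '0'/'1' strings."""
    width = len(bitmaps[0])
    value = int(bitmaps[0], 2)
    for bitmap in bitmaps[1:]:
        value &= int(bitmap, 2)
    return format(value, f"0{width}b")

def bm_or(a, b):
    return format(int(a, 2) | int(b, 2), f"0{len(a)}b")

def bm_not(a):
    mask = (1 << len(a)) - 1  # keep the result within the bitmap width
    return format(~int(a, 2) & mask, f"0{len(a)}b")

def target_bits(bitmap):
    """Return the 1-based bitmap bits that are 1."""
    return [i for i, bit in enumerate(bitmap, start=1) if bit == "1"]

A = "01100001"
B = "01001011"
a_and_b = target_bits(bm_and(A, B))          # bearers that have both A and B
a_or_b = target_bits(bm_or(A, B))            # bearers that have A or B
a_not_b = target_bits(bm_and(A, bm_not(B)))  # bearers that have A but not B
```

The resulting bit positions are then mapped back to bearer identifiers through the position assignment, as described in step 603.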
步骤604,查询数据表中目标承载体具有的标签值。
数据表中记录已存储的数据的至少一个承载体的标识和数据所具备的至少一个标签值的对应关系。可选地,数据表的结构与上述实施例中所说的结构类似,在此不再赘述。
在步骤603中确定得到目标承载体的标识之后,服务器即可根据数据表中记录的上述对应关系查询目标承载体的标识所对应的至少一个标签值。
步骤605,反馈目标承载体具有的标签值。
可选地,服务器将查询到的标签值反馈至终端。
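In summary, in the data query method provided by this embodiment, after a data query request is received, the requested data can be determined directly from the bitmap index, which addresses the low data query efficiency of the related art and improves data query efficiency.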
同时,在进行多标签查询时,通过根据内存中的索引信息直接查询,使得在多标签查询时可以达到ms级查询。
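Note that, in one possible implementation, the bitmap index is divided into multiple bitmap index partitions, say N partitions where N is an integer greater than or equal to 2. In that case step 602 and step 603 of the above embodiment change accordingly, so the following embodiment describes the data query method when the bitmap index is stored in bitmap index partitions. Each bitmap index partition includes at least one sub-bitmap, and the sets of bearer identifiers corresponding to the sub-bitmaps of different partitions do not intersect; this matches the bitmap index partitions described for data storage above and is not repeated here.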
请参考图7,其示出了本申请一个实施例提供的数据查询方法的方法流程图,本实施例以该数据查询方法用于图2所示的服务器中来举例说明,如图7所示,该数据查询方法包括:
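Step 701: receive a data query request, where the request carries at least one target label value.
Step 702: in the multiple bitmap index partitions, determine the target sub-bitmap corresponding to each target label value.
When the bitmap index is stored in N bitmap index partitions, the server looks up, in each partition, the target sub-bitmap corresponding to the target label value. For example, if the bitmap index is stored in 4 bitmap index partitions, the server looks up target sub-bitmap 1 in bitmap index partition 1, target sub-bitmap 2 in partition 2, target sub-bitmap 3 in partition 3, and target sub-bitmap 4 in partition 4.
Optionally, when the data query request carries at least two target label values, the server looks up, in each bitmap index partition, the target sub-bitmap corresponding to each target label value. For example, if the request carries target label values 1, 2, and 3, then in bitmap index partition 1 the server looks up target sub-bitmap 1 for target label value 1, target sub-bitmap 2 for target label value 2, and target sub-bitmap 3 for target label value 3.
Step 703: in the multiple bitmap index partitions, determine the identifiers of the target bearers corresponding to the target bitmap bits that the at least one target sub-bitmap all contain.
After determining the at least one target sub-bitmap in each bitmap index partition, the server determines the target bitmap bits in each partition from those sub-bitmaps and then the bearer identifiers corresponding to the target bitmap bits. The bearer identifiers determined across all bitmap index partitions are the final target bearer identifiers.
Within each bitmap index partition, the target bitmap bits and target bearer identifiers are determined in a way similar to the non-partitioned bitmap index in the preceding embodiment, which is not repeated here.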
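Steps 702 and 703 can be sketched as a loop over partitions whose results are concatenated. The partition layout (a list of bearers ordered by local bit position plus a dictionary of sub-bitmaps) and the sample sub-bitmaps are assumptions chosen to match the earlier tables, with label values rendered in English.

```python
def query_partitions(partitions, target_label_values):
    """Return the bearers that have every target label value, across all partitions."""
    hits = []
    for partition in partitions:
        bearers = partition["bearers"]          # ordered by local bit position
        sub_bitmaps = partition["sub_bitmaps"]  # label value -> '0'/'1' string
        for i, bearer_id in enumerate(bearers):
            if all(sub_bitmaps[label][i] == "1" for label in target_label_values):
                hits.append(bearer_id)
    return hits

PARTITIONS = [
    {   # bitmap index partition 1: [, E)
        "bearers": ["A01", "A02", "B01", "B02", "C01", "C02", "D01", "D02", "D03"],
        "sub_bitmaps": {
            "gender:female": "010101010",
            "online shopping expert": "010001100",
        },
    },
    {   # bitmap index partition 2: [E, G)
        "bearers": ["E01", "E02", "E03", "F01", "F02", "F03", "F04"],
        "sub_bitmaps": {
            "gender:female": "0100101",
            "online shopping expert": "0000100",
        },
    },
]

result = query_partitions(PARTITIONS, ["gender:female", "online shopping expert"])
```

Each partition is evaluated independently, so in a distributed deployment the per-partition work could run in parallel before the identifiers are merged.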
步骤704,查询数据表中目标承载体具有的标签值。
步骤705,反馈目标承载体具有的标签值。
其中,步骤704和步骤705的具体实现与上述实施例中的步骤604和步骤605的具体实现类似,在此不再赘述。
综上所述,本实施例提供的数据查询方法,通过在接收到数据查询请求之后,直接根据位图索引即可确定得到所需查询的数据,解决了相关技术中数据查询效率较低的问题;达到了可以提高数据查询效率的效果。
同时,在进行多标签查询时,通过根据内存中的索引信息直接查询,使得在多标签查询时可以达到ms级查询。
图8所示为本发明实施例提供的数据存储装置的示意图。数据存储装置800可以是计算机设备,该计算机设备可以是上述的服务器(例如图2所示的服务器220),数据存储装置800包括至少一个处理器801,通信总线802,存储器803以及至少一个通信接口804。
处理器801可以是一个通用中央处理器(CPU),微处理器,特定应用集成电路(application-specific integrated circuit,ASIC),或一个或多个用于控制本发明方案程序执行的集成电路。
通信总线802可包括一通路,在上述组件之间传送信息。所述通信接口804,使用任何收发器一类的装置,用于与其他设备或通信网络通信,如以太网,无线接入网(RAN),无线局域网(Wireless Local Area Networks,WLAN)等。
存储器803可以是只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)或者可存储信息和指令的其他类型的动态存储设备,也可以是电可擦可编程只读存储器(Electrically Erasable Programmable Read-Only Memory,EEPROM)、只读光盘(Compact Disc Read-Only Memory,CD-ROM)或其他光盘存储、光碟存储(包括压缩光碟、激光碟、光碟、数字通用光碟、蓝光光碟等)、磁盘存储介质或者其他磁存储设备、或者能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。存储器可以是独立存在,通过总线与处理器相连接。存储器也可以和处理器集成在一起。
其中,所述存储器803用于存储执行本发明方案的程序代码,并由处理器801来控制执行。所述处理器801用于执行所述存储器803中存储的程序代码。
在具体实现中,作为一种实施例,处理器801可以包括一个或多个CPU,例如图8中的CPU0和CPU1。
在具体实现中,作为一种实施例,数据存储装置800可以包括多个处理器,例如图8中的处理器801和处理器808。这些处理器中的每一个可以是一个单核(single-CPU)处理器,也可以是一个多核(multi-CPU)处理器。这里的处理器可以指一个或多个设备、电路、
和/或用于处理数据(例如计算机程序指令)的处理核。
在具体实现中,作为一种实施例,数据存储装置800还可以包括输出设备805和输入设备806。输出设备805和处理器801通信,可以以多种方式来显示信息。例如,输出设备805可以是液晶显示器(liquid crystal display,LCD),发光二级管(light emitting diode,LED)显示设备,阴极射线管(cathode ray tube,CRT)显示设备,或投影仪(projector)等。输入设备806和处理器801通信,可以以多种方式接受用户的输入。例如,输入设备806可以是鼠标、键盘、触摸屏设备或传感设备等。
上述的数据存储装置800可以是一个通用计算机设备或者是一个专用计算机设备。在具体实现中,数据存储装置800可以是台式机、便携式电脑、网络服务器、掌上电脑(Personal Digital Assistant,PDA)、移动手机、平板电脑、无线终端设备、通信设备、嵌入式设备或有图8中类似结构的设备。本发明实施例不限定数据存储装置800的类型。
数据存储装置的存储器中存储了一个或多个软件模块。数据存储装置可以通过处理器以及存储器中的程序代码来实现软件模块,实现上述实施例所说的数据存储方法。
请参考图9,其示出了本申请一个实施例提供的数据存储装置的结构示意图,如图9所示,该数据存储装置可以包括:获取单元910和建立单元920;
其中,获取单元910用于执行上述实施例中的步骤401;
建立单元920用于执行上述实施例中的步骤402。
可选地,该数据存储装置还可以包括设置单元;
该设置单元,用于执行上述实施例中的步骤301;
获取单元910,还用于执行上述实施例中的步骤302。
可选地,所述标签定义表还包括标签配置信息,所述标签配置信息包括用于表示每个标签值所对应的位图是否常驻内存的信息;该数据存储装置还可以包括加载单元;
该加载单元,用于根据所述标签配置信息,将需要常驻内存的位图加载至内存。
可选地,该装置还包括划分单元;
该划分单元,用于将所述位图索引划分为多个位图索引分区,每个位图索引分区中包括至少一个子位图,每个位图索引分区对应于一个承载体标识的集合,不同位图索引分区所对应的承载体标识的集合不存在交集。
可选地,该装置还包括划分单元和确定单元;
划分单元,用于根据承载体的标识的第一范围将所述数据表划分为M个数据分区,每个数据分区包括至少一个子数据表,每个数据分区所对应的承载体标识的集合为所述第一范围,且不同数据分区所对应的承载体标识的集合不存在交集;
确定单元,用于根据所述M个数据分区确定N个位图索引分区,每个位图索引分区中包括至少一个子位图,每个位图索引分区所对应的承载体的标识的范围大于等于所述第一范围,且不同位图索引分区所对应的承载体的标识的集合不存在交集,M、N为正整数,N小于或等于M,N大于或等于2。
可选地,所述每个位图索引分区中的子位图的数量为预先定义的标签定义表中设置的全部标签值的数量。
可选地,所述位图索引中的不同位图中相同的位图位对应于相同的承载体的标识;每
个位图索引分区中的不同子位图的相同的位图位对应于相同的承载体的标识,不同位图索引分区中的不同子位图的相同的位图位对应于不相同的承载体的标识。
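Optionally, the apparatus further includes a receiving unit, a determining unit, a query unit, and a feedback unit;
the receiving unit is configured to perform step 601 or step 701 in the above embodiments;
the determining unit is configured to perform step 602 and step 603 in the above embodiments, or step 702 and step 703;
the query unit is configured to perform step 604 or step 704 in the above embodiments;
the feedback unit is configured to perform step 605 or step 705 in the above embodiments.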
可选地,获取单元910,还用于获取新数据,其中,在所述新数据包括的新对应关系中承载体的标识和/或标签值是新的;
该装置还包括更新单元,该更新单元,用于根据所述新对应关系更新所述位图索引。
可选地,更新单元,还用于确定所述新对应关系中的承载体的标识所属的位图索引分区;在确定的位图索引分区中,更新所述新对应关系中的标签值,和/或更新所述新对应关系中的标签值所对应的位图。
综上所述,本实施例提供的数据存储装置,通过在获取到至少一条数据记录之后,建立该至少一条数据记录对应的位图索引,且位图索引中包括标签值和其对应的位图的对应关系,位图中包括用于记录承载体标识所对应的承载体是否具备对应的标签值的至少一个位图位;使得在后续基于标签值查询时,可以根据位图索引直接确定对应的数据,提高了数据查询效率。
本申请一个实施例还提供了一种计算机存储介质,该计算机存储介质中存储有指令;数据存储装置(可以是计算机设备,例如服务器)执行该指令,例如数据存储装置中的处理器执行该指令,使得该数据存储装置实现上述实施例所说的数据存储方法。
本申请实施例提供一种计算机程序产品,该计算机程序产品包括指令;数据存储装置(可以是计算机设备,例如服务器)执行该指令,使得该数据存储装置执行上述方法实施例的数据存储方法。
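The foregoing is merely a specific implementation of this application, and the protection scope of this application is not limited to it. Any variation or replacement that a person skilled in the art can readily conceive within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.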
Claims (22)
- 一种数据存储方法,其特征在于,所述方法包括:获取至少一条数据记录,每条数据记录包括一个承载体标识和至少一个标签值,所述至少一条数据记录存储于数据表中,所述数据表用于记录承载体标识与标签值的对应关系;建立所述至少一条数据记录对应的位图索引,所述位图索引包括至少一个位图,每个位图对应于一个标签值,每个位图包括至少一个位图位,每个位图位用于记录一个承载体标识所对应的承载体是否具备当前位图所对应的标签值。
- 根据权利要求1所述的方法,其特征在于,所述获取至少一条数据记录之前,所述方法包括:预先设置标签定义表,所述预先设置的标签定义表包括预设的多个标签值;所述获取至少一条数据记录包括:根据待存储的源数据和所述预先设置的标签定义表,生成所述数据表。
- 根据权利要求2所述的方法,其特征在于,所述预先设置的标签定义表还包括标签配置信息,所述标签配置信息包括表示每个标签值所对应的位图是否常驻内存的信息;所述方法还包括:根据所述标签配置信息,将需要常驻内存的位图加载至内存。
- 根据权利要求1至3任一项所述的方法,其特征在于,所述方法还包括:将所述位图索引划分为多个位图索引分区,每个位图索引分区中包括至少一个子位图,每个位图索引分区对应于一个承载体标识的集合,不同位图索引分区所对应的承载体标识的集合不存在交集。
- 根据权利要求1至3任一项所述的方法,其特征在于,所述方法还包括:根据承载体标识的第一范围将所述数据表划分为M个数据分区,每个数据分区包括至少一个子数据表,每个数据分区所对应的承载体标识的集合为所述第一范围,且不同数据分区所对应的承载体标识的集合不存在交集;根据所述M个数据分区确定N个位图索引分区,每个位图索引分区中包括至少一个子位图,每个位图索引分区对应的承载体标识的范围大于等于所述第一范围,且不同位图索引分区所对应的承载体标识的集合不存在交集,M、N为正整数,N小于或等于M,N大于或等于2。
- 根据权利要求4或5所述的方法,其特征在于,所述每个位图索引分区中的子位图的数量为预先定义的标签定义表中设置的全部标签值的数量。
- 根据权利要求4或5或6所述的方法,其特征在于,所述位图索引中的不同位图中相同的位图位对应于相同的承载体的标识;每个位图索引分区中的不同子位图的相同的位图位对应于相同的承载体的标识,不同位图索引分区中的不同子位图的相同的位图位对应于不相同的承载体的标识。
- 根据权利要求4至7任一项所述的方法,其特征在于,接收数据查询请求,所述数据查询请求携带有至少一个目标标签值;在所述多个位图索引分区中,分别确定所述至少一个目标标签值对应的至少一个目标子位图,分别确定所述至少一个目标子位图均包含的目标位图位所对应的目标承载体的标识;查询所述数据表中所述目标承载体具有的标签值;反馈所述目标承载体具有的标签值。
- 根据权利要求1至8任一项所述的方法,其特征在于,所述方法包括:获取新数据,其中,在所述新数据包括的新对应关系中承载体的标识和/或标签值是新的;根据所述新对应关系更新所述位图索引。
- 根据权利要求9所述的方法,其特征在于,所述根据所述新对应关系更新位图索引,包括:确定所述新对应关系中的承载体的标识所属的位图索引分区;在确定的位图索引分区中,更新所述新对应关系中的标签值,和/或更新所述新对应关系中的标签值所对应的位图。
- 一种数据存储装置,其特征在于,所述装置包括:获取单元,用于获取至少一条数据记录,每条数据记录包括一个承载体标识和至少一个标签值,所述至少一条数据记录存储于数据表中,所述数据表用于记录承载体标识与标签值的对应关系;建立单元,用于建立所述至少一条数据记录对应的位图索引,所述位图索引包括至少一个位图,每个位图对应于一个标签值,每个位图包括至少一个位图位,每个位图位用于记录一个承载体标识所对应的承载体是否具备当前位图所对应的标签值。
- 根据权利要求11所述的装置,其特征在于,所述装置包括:设置单元,用于在所述获取单元获取所述至少一条数据记录之前,预先设置标签定义表,所述预先设置的标签定义表包括预设的多个标签值;所述获取单元,还用于根据待存储的源数据和所述预先设置的标签定义表,生成所述数据表。
- 根据权利要求12所述的装置,其特征在于,所述预先设置的标签定义表还包括标签配置信息,所述标签配置信息包括表示每个标签值所对应的位图是否常驻内存的信息;所述装置还包括:加载单元,用于根据所述标签配置信息,将需要常驻内存的位图加载至内存。
- 根据权利要求11至13任一项所述的装置,其特征在于,所述装置还包括:划分单元,用于将所述位图索引划分为多个位图索引分区,每个位图索引分区中包括至少一个子位图,每个位图索引分区对应于一个承载体标识的集合,不同位图索引分区所对应的承载体标识的集合不存在交集。
- 根据权利要求11至13任一项所述的装置,其特征在于,所述装置还包括:划分单元,用于根据承载体标识的第一范围将所述数据表划分为M个数据分区,每个数据分区包括至少一个子数据表,每个数据分区所对应的承载体标识的集合为所述第一范围,且不同数据分区所对应的承载体标识的集合不存在交集;确定单元,用于根据所述M个数据分区确定N个位图索引分区,每个位图索引分区中包括至少一个子位图,每个位图索引分区对应的承载体标识的范围大于等于所述第一范围,且不同位图索引分区所对应的承载体标识的集合不存在交集,M、N为正整数,N小于或等于M,N大于或等于2。
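- The apparatus according to claim 14 or 15, wherein the number of sub-bitmaps in each bitmap index partition is the number of all label values set in the predefined label definition table.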
- 根据权利要求14或15或16所述的装置,其特征在于,所述位图索引中的不同位图中相同的位图位对应于相同的承载体的标识;每个位图索引分区中的不同子位图的相同的位图位对应于相同的承载体的标识,不同位图索引分区中的不同子位图的相同的位图位对应于不相同的承载体的标识。
- 根据权利要求14至17任一项所述的装置,其特征在于,所述装置还包括:接收单元,用于接收数据查询请求,所述数据查询请求携带有至少一个目标标签值;确定单元,用于在所述多个位图索引分区中,分别确定所述至少一个目标标签值对应的至少一个目标子位图,分别确定所述至少一个目标子位图均包含的目标位图位所对应的目标承载体的标识;查询单元,用于查询所述数据表中所述目标承载体具有的标签值;反馈单元,用于反馈所述目标承载体具有的标签值。
- 根据权利要求11至18任一项所述的装置,其特征在于,所述获取单元,还用于获取新数据,其中,在所述新数据包括的新对应关系中承载体的标识和/或标签值是新的;所述装置包括:更新单元;所述更新单元,用于根据所述新对应关系更新所述位图索引。
- 根据权利要求19所述的装置,其特征在于,所述更新单元,还用于:确定所述新对应关系中的承载体的标识所属的位图索引分区;在确定的位图索引分区中,更新所述新对应关系中的标签值,和/或更新所述新对应关系中的标签值所对应的位图。
- 一种数据存储装置,其特征在于,所述装置包括:存储器和处理器,所述存储器中存储有指令,所述处理器通过执行所述存储器中存储的指令使得所述数据存储装置实现如权利要求1至10任一所述的数据存储方法。
- 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有指令,数据存储装置执行所述指令使得所述数据存储装置实现权利要求1至10任一所述的数据存储方法。
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP17920395.5A EP3637280B1 (en) | 2017-08-03 | 2017-08-03 | Data storage method and device, and storage medium |
CN201780003522.9A CN110168529B (zh) | 2017-08-03 | 2017-08-03 | 数据存储方法、装置和存储介质 |
PCT/CN2017/095893 WO2019024060A1 (zh) | 2017-08-03 | 2017-08-03 | 数据存储方法、装置和存储介质 |
US16/748,252 US11249969B2 (en) | 2017-08-03 | 2020-01-21 | Data storage method and apparatus, and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2017/095893 WO2019024060A1 (zh) | 2017-08-03 | 2017-08-03 | 数据存储方法、装置和存储介质 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/748,252 Continuation US11249969B2 (en) | 2017-08-03 | 2020-01-21 | Data storage method and apparatus, and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019024060A1 true WO2019024060A1 (zh) | 2019-02-07 |
Family
ID=65233263
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2017/095893 WO2019024060A1 (zh) | 2017-08-03 | 2017-08-03 | 数据存储方法、装置和存储介质 |
Country Status (4)
Country | Link |
---|---|
US (1) | US11249969B2 (zh) |
EP (1) | EP3637280B1 (zh) |
CN (1) | CN110168529B (zh) |
WO (1) | WO2019024060A1 (zh) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
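CN110263044A (zh) * | 2019-06-21 | 2019-09-20 | 深圳前海微众银行股份有限公司 | Data storage method, apparatus, device, and computer-readable storage medium |
CN111159204A (zh) * | 2020-01-02 | 2020-05-15 | 北京东方金信科技有限公司 | Method and system for generating labels through configuration |
CN112532748A (zh) * | 2020-12-24 | 2021-03-19 | 北京百度网讯科技有限公司 | Message push method, apparatus, device, medium, and computer program product |
CN113360499A (zh) * | 2021-06-01 | 2021-09-07 | 北京沃东天骏信息技术有限公司 | Data query method and apparatus |
CN113704339A (zh) * | 2021-08-30 | 2021-11-26 | 平安普惠企业管理有限公司 | Read-information status recording, apparatus, device, and storage medium |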
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11663207B2 (en) * | 2018-09-24 | 2023-05-30 | Salesforce, Inc. | Translation of tenant identifiers |
US11392606B2 (en) * | 2019-10-30 | 2022-07-19 | Disney Enterprises, Inc. | System and method for converting user data from disparate sources to bitmap data |
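CN111131453A (zh) * | 2019-12-24 | 2020-05-08 | 中国平安财产保险股份有限公司 | Service request response method and apparatus, computer device, and storage medium |
CN111858617B (zh) * | 2020-08-06 | 2024-10-08 | 贝壳技术有限公司 | User search method and apparatus, computer-readable storage medium, and electronic device |
CN112650887B (zh) * | 2020-12-22 | 2022-02-18 | 广州锦行网络科技有限公司 | Fast query method for time attributes in a graph database |
CN113068045A (zh) * | 2021-03-17 | 2021-07-02 | 厦门雅基软件有限公司 | Data storage method and apparatus, electronic device, and computer-readable storage medium |
CN113722533B (zh) * | 2021-08-30 | 2023-10-17 | 康键信息技术(深圳)有限公司 | Information push method and apparatus, electronic device, and readable storage medium |
CN114244595B (zh) * | 2021-12-10 | 2024-03-12 | 北京达佳互联信息技术有限公司 | Method and apparatus for obtaining permission information, computer device, and storage medium |
CN114490656A (zh) * | 2022-01-26 | 2022-05-13 | 阿里云计算有限公司 | Data query method, apparatus, device, and storage medium |
CN115017875B (zh) * | 2022-08-09 | 2022-11-25 | 建信金融科技有限责任公司 | Enterprise information processing method, apparatus, system, device, and medium |
CN117332153B (zh) * | 2023-10-10 | 2024-07-09 | 天翼数字生活科技有限公司 | Label selection method, system, device, and medium based on an interval matrix |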
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
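CN102110146A (zh) * | 2011-02-16 | 2011-06-29 | 清华大学 | Distributed file system metadata management method based on key-value storage |
CN102254012A (zh) * | 2011-07-19 | 2011-11-23 | 北京大学 | External-memory-based graph data storage method and subgraph query method |
CN102779180A (zh) * | 2012-06-29 | 2012-11-14 | 华为技术有限公司 | Operation processing method for a data storage system, and data storage system |
CN105630972A (zh) * | 2015-12-24 | 2016-06-01 | 网易(杭州)网络有限公司 | Data processing method and apparatus |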
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6081800A (en) * | 1997-02-28 | 2000-06-27 | Oracle Corporation | Creating bitmaps from multi-level identifiers |
US6067540A (en) * | 1997-02-28 | 2000-05-23 | Oracle Corporation | Bitmap segmentation |
US6216125B1 (en) * | 1998-07-02 | 2001-04-10 | At&T Corp. | Coarse indexes for a data warehouse |
US6658405B1 (en) * | 2000-01-06 | 2003-12-02 | Oracle International Corporation | Indexing key ranges |
US7171427B2 (en) * | 2002-04-26 | 2007-01-30 | Oracle International Corporation | Methods of navigating a cube that is implemented as a relational object |
US7756853B2 (en) * | 2003-08-18 | 2010-07-13 | Oracle International Corporation | Frequent itemset counting using subsets of bitmaps |
US7774346B2 (en) * | 2005-08-26 | 2010-08-10 | Oracle International Corporation | Indexes that are based on bitmap values and that use summary bitmap values |
CN101087205A (zh) * | 2006-06-07 | 2007-12-12 | 华为技术有限公司 | Method, system, and terminal device for reporting user agent profile information |
US9280780B2 (en) * | 2014-01-27 | 2016-03-08 | Umbel Corporation | Systems and methods of generating and using a bitmap index |
CN106682042B (zh) * | 2015-11-11 | 2019-11-22 | 杭州海康威视数字技术股份有限公司 | Relational data caching and query method and apparatus |
US9489410B1 (en) * | 2016-04-29 | 2016-11-08 | Umbel Corporation | Bitmap index including internal metadata storage |
2017
- 2017-08-03 CN CN201780003522.9A patent/CN110168529B/zh Active
- 2017-08-03 EP EP17920395.5A patent/EP3637280B1/en Active
- 2017-08-03 WO PCT/CN2017/095893 patent/WO2019024060A1/zh unknown
2020
- 2020-01-21 US US16/748,252 patent/US11249969B2/en Active
Non-Patent Citations (1)
Title |
---|
See also references of EP3637280A4 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
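CN110263044A (zh) * | 2019-06-21 | 2019-09-20 | 深圳前海微众银行股份有限公司 | Data storage method, apparatus, device, and computer-readable storage medium |
CN110263044B (zh) * | 2019-06-21 | 2023-03-31 | 深圳前海微众银行股份有限公司 | Data storage method, apparatus, device, and computer-readable storage medium |
CN111159204A (zh) * | 2020-01-02 | 2020-05-15 | 北京东方金信科技有限公司 | Method and system for generating labels through configuration |
CN111159204B (zh) * | 2020-01-02 | 2020-08-11 | 北京东方金信科技有限公司 | Method and system for generating labels through configuration |
CN112532748A (zh) * | 2020-12-24 | 2021-03-19 | 北京百度网讯科技有限公司 | Message push method, apparatus, device, medium, and computer program product |
CN113360499A (zh) * | 2021-06-01 | 2021-09-07 | 北京沃东天骏信息技术有限公司 | Data query method and apparatus |
CN113704339A (zh) * | 2021-08-30 | 2021-11-26 | 平安普惠企业管理有限公司 | Read-information status recording, apparatus, device, and storage medium |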
Also Published As
Publication number | Publication date |
---|---|
EP3637280B1 (en) | 2023-05-24 |
US20200159708A1 (en) | 2020-05-21 |
US11249969B2 (en) | 2022-02-15 |
EP3637280A1 (en) | 2020-04-15 |
EP3637280A4 (en) | 2020-04-29 |
CN110168529A (zh) | 2019-08-23 |
CN110168529B (zh) | 2022-05-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
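WO2019024060A1 (zh) | | Data storage method and apparatus, and storage medium |
WO2019052209A1 (zh) | | Data storage method and apparatus, and storage medium |
WO2018161940A1 (zh) | | Media file push method and apparatus, storage medium, and electronic apparatus |
CN108228817A (zh) | | Data processing method, apparatus, and system |
WO2019024496A1 (zh) | | Enterprise recommendation method and application server |
US10182024B1 (en) | | Reallocating users in content sharing environments |
CN104933173B (zh) | | Data processing method, apparatus, and server for heterogeneous multiple data sources |
CN108319661A (zh) | | Structured storage method and apparatus for spare-part information |
WO2018219285A1 (zh) | | Data object display method and apparatus |
US9754015B2 (en) | | Feature rich view of an entity subgraph |
WO2021217659A1 (zh) | | Method for processing multi-source heterogeneous data, computer device, and storage medium |
WO2022083436A1 (zh) | | Data processing method, apparatus, device, and readable storage medium |
CN111723161A (zh) | | Data processing method, apparatus, and device |
US10311082B2 (en) | | Synchronization of offline instances |
CN109063061B (zh) | | Cross-distributed-system data processing method, apparatus, device, and storage medium |
US8880562B2 (en) | | Generating a supplemental description of an entity |
WO2020024824A1 (zh) | | Method and apparatus for determining a user status identifier |
CN116737753A (zh) | | Service data processing method and apparatus, computer device, and storage medium |
CN104813304A (zh) | | Identifying shared content stored by a service |
CN113821514B (zh) | | Data splitting method and apparatus, electronic device, and readable storage medium |
CN116303657A (zh) | | Group profile generation method and apparatus, computer device, and storage medium |
CN113761102B (zh) | | Data processing method, apparatus, server, system, and storage medium |
WO2022160443A1 (zh) | | Lineage mining method and apparatus, electronic device, and computer-readable storage medium |
US10114864B1 (en) | | List element query support and processing |
WO2019165762A1 (zh) | | Sampling query method and apparatus |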
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | ENP | Entry into the national phase | Ref document number: 2017920395; Country of ref document: EP; Effective date: 20200108 |
 | NENP | Non-entry into the national phase | Ref country code: DE |