WO2013097065A1 - Index data processing method and device - Google Patents

Index data processing method and device

Info

Publication number
WO2013097065A1
Authority
WO
WIPO (PCT)
Prior art keywords
index
data
segment
unit
segmentation
Prior art date
Application number
PCT/CN2011/084609
Other languages
English (en)
French (fr)
Inventor
曹俊亮
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to PCT/CN2011/084609 priority Critical patent/WO2013097065A1/zh
Priority to CN201180003412.5A priority patent/CN102725754B/zh
Publication of WO2013097065A1 publication Critical patent/WO2013097065A1/zh

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2246 Trees, e.g. B+trees

Definitions

  • The present invention relates to the field of data processing technologies, and in particular, to an index data processing method and device.
  • Background
  • A Key-Value database stores key-value pairs <Key, Value>, and a general Key-Value system provides only Key-based operations.
  • To provide structured data retrieval, an adaptation layer needs to be built on top of the Key-Value system.
  • The data models contained in the Values of different Keys are organized as tables.
  • Each Value is abstracted into attribute names and attribute values (or columns and values), so that the Values corresponding to different Keys may share the same attribute names, which makes searching by attribute name possible.
  • The attribute values can then be indexed.
  • In the prior art, index data can be organized in the form of a B+ tree or a prefix hash tree.
  • The B+ tree structure is shown in FIG. 1a. An internal node stores range information; for example, internal node 1 indicates that attribute values less than or equal to 0110 are in its left subtree and attribute values greater than 0110 are in its right subtree. The actual index data is stored in the leaf nodes.
  • The prefix hash tree is a data structure built on a dictionary tree (trie); its principle is to use the common prefixes of strings to organize data.
  • A prefix hash tree over the character set consisting of 0 and 1 is shown in FIG. 1b.
  • The prefix of a node is the string formed by the characters on the path from the root to that node. An intermediate node stores only its relationship with its child nodes, and a leaf node stores the actual index data.
  • For each attribute, a separate B+ tree or prefix hash tree needs to be organized to store the index data of that attribute, and each node in the tree corresponds to a key-value pair in the Key-Value database. Therefore, when a data operation is performed on index data, such as locating, adding, or deleting an index data item, it is necessary to first determine, among the multiple B+ trees or prefix hash trees, the tree corresponding to the attribute information of the index data, and then traverse from the top-level internal node or intermediate node down to a leaf node.
  • Embodiments of the present invention provide an index data processing method and device, which can reduce interaction with a Key-Value system and improve efficiency when processing index data.
  • An index data processing method, in which index data of at least one piece of attribute information is stored in an index structure, the index structure includes a header index and at least one block index, and an index of the at least one block index is stored in the header index.
  • The index of the at least one block index includes attribute information of the index data stored in the block index, and the method includes:
  • receiving a processing instruction for specified index data, where the processing instruction includes attribute information of the specified index data; and
  • if an index of a block index can be determined according to the header index in the index structure and the attribute information of the specified index data, locating the block index according to the index of the block index, and processing the data items of the matching index data in the block index.
  • An index data processing device, in which index data of at least one piece of attribute information is stored in an index structure, the index structure includes a header index and at least one block index, and an index of the at least one block index is stored in the header index.
  • The index of the at least one block index includes attribute information of the index data stored in the block index, and the device includes:
  • an instruction receiving module, configured to receive a processing instruction for specified index data, where the processing instruction includes attribute information of the specified index data; and
  • a data processing module, configured to: when an index of a block index can be determined according to the header index in the index structure and the attribute information of the specified index data, locate the block index according to the index of the block index, and process the data items of the matching index data in the block index.
  • Index data with different attribute information is stored in one index structure, which makes data storage more compact, so that a single Key-Value access can acquire or update more data. This reduces the interaction with the Key-Value system during data processing and improves the performance of index creation and querying.
  • FIG. 1a is a schematic diagram of index data organized in a B+ tree structure in the prior art;
  • FIG. 1b is a schematic diagram of index data organized in the form of a prefix hash tree in the prior art;
  • FIG. 2 is a flowchart of a method for processing index data according to an embodiment of the present invention
  • FIG. 3 is a schematic diagram of an index structure in the embodiment shown in FIG. 2;
  • FIG. 4a is a schematic diagram of an index structure;
  • FIG. 4b is a flowchart of another method for processing index data according to an embodiment of the present invention.
  • FIG. 5 is a flowchart of deleting index data according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of an index structure in the embodiment shown in FIG. 5;
  • FIG. 7 is a schematic structural diagram of a block index in another embodiment of the present invention.
  • FIG. 8 is a flowchart of adding index data according to an embodiment of the present invention.
  • FIG. 9 is a schematic diagram of an index structure of another embodiment of the present invention.
  • FIG. 10 is a schematic structural diagram of an index data processing device according to an embodiment of the present invention.
  • FIG. 11 is a schematic structural diagram of another index data processing device according to an embodiment of the present invention.
  • FIG. 2 is a flowchart of a method for processing index data according to an embodiment of the present invention.
  • In this embodiment, index data of at least one piece of attribute information is stored in an index structure.
  • As shown in FIG. 3, the index structure includes a header index 31 and at least one block index 32. The block index 32 may be used to store index data having a large number of data items. The index 33 of the at least one block index is stored in the header index 31, and the index 33 of the at least one block index includes the attribute information of the index data stored in the block index 32.
  • the processing method of the index data may include:
  • Step 201 Receive a processing instruction for specifying index data, where the processing instruction includes attribute information of the specified index data.
  • The index structure can be stored in a database inside or outside a terminal or server such as a PC.
  • The index data may include attribute information, such as an attribute name (attr) and an attribute value (value), and further includes a specific data item (item), for example <Attrn, Valuen, Itemn>.
  • Step 202 If an index of a block index can be determined according to the header index in the index structure and the attribute information of the specified index data, the block index is located according to the index of the block index, and the data items of the matching index data in the block index are processed.
  • Specifically, the terminal or server such as a PC searches the attribute information in the index 33 of the block index stored in the header index 31. If the attribute information of the specified index data is included, the block index 32 can be located, and the data items of the index data in the block index 32 that matches the specified index data can then be processed.
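  • The lookup described in step 202 can be sketched as follows. This is a minimal illustration only: the dictionary `kv_store` stands in for a Key-Value system, and the key names (`"header"`, `"block:1"`), the field name `block_refs`, and the function `locate_items` are hypothetical, not taken from the patent.

```python
# Hypothetical in-memory stand-in for a Key-Value store (key -> value).
kv_store = {
    # Header index: maps attribute information to the key of a block index.
    "header": {"block_refs": {("color", "red"): "block:1"}},
    # Block index: holds the data items of the index data.
    "block:1": {("color", "red"): ["item1", "item2", "item3"]},
}


def locate_items(store, attr, value):
    """Return the data items stored for (attr, value), or None if the
    header index holds no block index reference for that attribute."""
    header = store["header"]                       # first Key-Value access
    block_key = header["block_refs"].get((attr, value))
    if block_key is None:
        return None
    block = store[block_key]                       # second Key-Value access
    return block[(attr, value)]


found = locate_items(kv_store, "color", "red")
missing = locate_items(kv_store, "color", "blue")
```

  • In this sketch a lookup costs two Key-Value accesses regardless of how many attributes share the structure, which is the compactness argument the text makes.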
  • the index structure may include at least one storage unit.
  • In addition to the block index 41, the index structure may include at least one segment index 42, which is used for storing index data with a small number of data items. The index 44 of the block index and the index 45 of the segment index are stored in the header index 43.
  • The index data is stored in the block index 41 or the segment index 42 according to its number of data items. For example, when the number of data items of the index data is greater than a preset value, the index data may be stored in the block index 41; when it is less than the preset value, the index data may be stored in the segment index 42. Of course, in another embodiment, the index data can also be assigned to the block index or the segment index based on other information, such as the attribute information of the index data.
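  • The placement rule based on the number of data items can be sketched as below. The constant `PRESET_VALUE`, its value, and the function `choose_index` are hypothetical; the text does not say where index data with exactly the preset number of items goes, so this sketch assumes the segment index.

```python
PRESET_VALUE = 3  # hypothetical preset value; the text does not fix one


def choose_index(num_items, preset_value=PRESET_VALUE):
    """Return which index would hold index data with num_items data items."""
    # Assumption: index data with exactly preset_value items stays in the
    # segment index, since the text leaves the equality case open.
    return "block" if num_items > preset_value else "segment"
```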
  • the attribute information may include an attribute name, an attribute value, and the like.
  • In another embodiment, the index structure may include only the segment index.
  • the storage unit in all embodiments of the present invention may be a computer readable storage medium or a database system, which may be a distributed database system.
  • the above index structure may be a data structure stored in the above computer readable storage medium or database system.
  • If an index of a segment index can be determined according to the header index and the attribute information of the specified index data, the segment index is located according to the index of the segment index, and the data items of the matching index data in the segment index are processed. Specifically, the terminal or server such as a PC searches the attribute information in the index of the segment index stored in the header index. If the attribute information of the specified index data is included, the segment index can be located, and the data items of the index data in the segment index that matches the specified index data can then be processed.
  • The determining process may be: first search whether the attribute information of the specified index data exists in the index 44 of the block index; if yes, determine that the attribute information of the specified index data belongs to the index 44 of the block index; if not, determine that it belongs to the index 45 of the segment index. Of course, it is also possible to first search the index 45 of the segment index for the attribute information of the specified index data.
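  • The determining process can be sketched as follows. The header layout (`block_refs`, `segment_refs`), the reference strings, and the function `resolve_index` are hypothetical names chosen for illustration; the sketch checks the index of the block index first, as in the text, and otherwise falls back to the index of the segment index.

```python
def resolve_index(header, attr, value):
    """Return ("block", ref) if the attribute information is in the index of
    the block index, otherwise ("segment", ref) from the index of the segment
    index (ref is None when the segment side has no entry either)."""
    key = (attr, value)
    block_refs = header["block_refs"]
    if key in block_refs:
        return "block", block_refs[key]
    # Per the text, anything not in the index of the block index is treated
    # as belonging to the index of the segment index.
    return "segment", header["segment_refs"].get(key)


header = {
    "block_refs": {("color", "red"): "block:1"},
    "segment_refs": {("size", "10"): "segment_unit:421"},
}
```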
  • In this embodiment, index data with different attribute information is stored in one index structure, which makes data storage more compact, so that a single Key-Value access can acquire or update more data. This reduces the interaction with the Key-Value system during data processing and improves the performance of index creation and querying.
  • FIG. 4b is a flowchart of another method for processing index data according to an embodiment of the present invention.
  • The processing method of the index data may include:
  • Step 401 Receive a processing instruction for specifying index data, where the processing instruction includes attribute information of the specified index data.
  • The index structure can be stored in an internal database of a terminal or server such as a PC, or in an external database independent of the terminal or server.
  • Step 402 Determine whether the attribute information of the specified index data is located in the index of the segment index or in the index of the block index in the header index.
  • The index data may include attribute information, such as an attribute name (attr) and an attribute value (value), and further includes a specific data item (item), for example <Attrn, Valuen, Itemn>.
  • The determining process may be: first search for the attribute information of the specified index data in the index 44 of the block index; if it is found, determine that the attribute information of the specified index data belongs to the index 44 of the block index; if not, determine that it belongs to the index 45 of the segment index. Of course, it is also possible to first search the index 45 of the segment index for the attribute information of the specified index data.
  • If the attribute information of the specified index data belongs to the index 44 of the block index, step 403 is performed; if the attribute information of the specified index data belongs to the index 45 of the segment index, step 404 is performed.
  • Step 403 In the block index, perform data processing on the data items of the matching index data according to the data item of the specified index data.
  • The matching index data in the block index has the same attribute information as the specified index data.
  • The data processing performed on the data items of the matching index data may be: locating the data item of the specified index data among the data items of the matching index data; or deleting the data item of the specified index data from the data items of the matching index data; or adding the data item of the specified index data to the data items of the matching index data.
  • The index data in the processed block index may be further deleted, merged, divided, or migrated.
  • Step 404 In the segment index, perform data processing on the data items of the matching index data according to the data item of the specified index data.
  • The matching index data in the segment index has the same attribute information as the specified index data.
  • The data processing performed on the data items of the matching index data may be: locating the data item of the specified index data among the data items of the matching index data; or deleting the data item of the specified index data from the data items of the matching index data; or adding the data item of the specified index data to the data items of the matching index data.
  • The index data in the processed segment index may be further deleted, merged, divided, or migrated.
  • After the processing instruction for the specified index data is received, the attribute information in the index of the block index stored in the header index may be searched. If the attribute information of the specified index data is included, the block index may be located, and the data items of the index data in the block index that matches the specified index data may then be processed, for example, located, deleted, or added.
  • This embodiment of the invention stores index data with different attribute information in one index structure, which makes data storage more compact, so that a single Key-Value access can acquire or update more data. This reduces the interaction with the Key-Value system during data processing and improves the performance of index creation and querying.
  • FIG. 5 is a flowchart of deleting index data according to an embodiment of the present invention.
  • the processing of the index data is described by taking an example of deleting a specified index data.
  • The index data of the multiple attributes is still stored in the organizational form shown in FIG. 4a. More specifically, as shown in FIG. 6, the segment index 42 includes at least one segment index unit, for example the segment index units 421 and 422, which are stored in parallel. Each segment index unit includes multiple pieces of index data, and the index data may be sorted according to the attribute information. The specific process may be: first sort the index data in the segment index 42 according to the attribute information, and then divide it into multiple segment index units. Sorting by attribute information can be done by comparing the attribute names and attribute values.
  • The specific sorting method is as follows, taking two attribute information pairs (Attr1, Value1) and (Attr2, Value2) as an example: first compare Attr1 with Attr2; if they are not equal, the order of the two attribute information pairs is the same as the order of Attr1 and Attr2; otherwise, the order of the two pairs is the same as the order of Value1 and Value2.
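  • The comparison rule and the subsequent division into segment index units can be sketched as follows. The function names `compare_pairs` and `sort_and_divide`, the `unit_size` parameter, and the sample attribute pairs are hypothetical; attribute names and values are assumed to be strings compared in ordinary string order.

```python
from functools import cmp_to_key


def compare_pairs(pair1, pair2):
    """Order two (attr, value) pairs: by attribute name first,
    and by attribute value only when the names are equal."""
    attr1, value1 = pair1
    attr2, value2 = pair2
    if attr1 != attr2:
        return -1 if attr1 < attr2 else 1
    if value1 != value2:
        return -1 if value1 < value2 else 1
    return 0


def sort_and_divide(pairs, unit_size):
    """Sort the attribute pairs, then cut them into units of at most
    unit_size entries (standing in for segment index units)."""
    ordered = sorted(pairs, key=cmp_to_key(compare_pairs))
    return [ordered[i:i + unit_size] for i in range(0, len(ordered), unit_size)]


units = sort_and_divide(
    [("size", "10"), ("color", "red"), ("color", "blue")], unit_size=2
)
```

  • For plain tuples this rule coincides with Python's built-in tuple ordering; the explicit comparator is kept only to mirror the wording of the text.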
  • The data items of index data having the same attribute information in the block index 41 are divided into at least one segmentation unit, and the segmentation units are linked to form a segmentation unit chain, such as the segmentation unit chains 411 and 412. For any two adjacent segmentation units in a chain, the previous segmentation unit stores the access information, such as the key value, of the next segmentation unit.
  • Alternatively, the segmentation units into which the data items of index data having the same attribute information are divided may be stored side by side, and the block index 41 further stores segmentation information of each segmentation unit, such as the segmentation information 413.
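  • A segmentation unit chain in which each unit stores the key of the next unit can be sketched as below. The store layout, the key names (`"unit:a"`, `"unit:b"`), the field names `items` and `next`, and the functions `iter_chain` and `find_unit` are hypothetical.

```python
# Hypothetical Key-Value store holding one segmentation unit chain.
# Each unit stores its data items and the key (access information)
# of the next unit; the last unit stores None.
chain_store = {
    "unit:a": {"items": ["item1", "item2"], "next": "unit:b"},
    "unit:b": {"items": ["item3"], "next": None},
}


def iter_chain(store, first_key):
    """Yield (key, unit) for every unit in the chain, in order."""
    key = first_key
    while key is not None:
        unit = store[key]          # one Key-Value access per unit
        yield key, unit
        key = unit["next"]


def find_unit(store, first_key, item):
    """Return the key of the unit that holds item, or None if absent."""
    for key, unit in iter_chain(store, first_key):
        if item in unit["items"]:
            return key
    return None
```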
  • the method of deleting the specified data may include:
  • Step 501 Determine whether the attribute information of the specified index data is located in an index of the block index.
  • After the terminal or server such as a PC receives the processing instruction for the specified index data, in this step it may first check whether the attribute information in the index 44 of the block index contains the attribute information (Attrn, Valuen) of the specified index data. If not, the index 45 of the segment index includes the attribute information of the specified index data, that is, the specified index data is stored in the segment index 42, and steps 502 to 506 are performed; if yes, the specified index data is stored in the block index 41, and steps 507 to 509 are performed.
  • Step 502 In the data items of the matching index data of the segment index, delete the data item of the specified index data.
  • The matching index data has the same attribute information (Attrn, Valuen) as the specified index data. The attribute information (Attrn, Valuen) may first be searched in the index 45 of the segment index to determine the segment index unit in which the matching index data is located, such as the segment index unit 421. After the matching index data is determined in the segment index unit 421, the data item Itemn of the specified index data is deleted from the data items of the matching index data.
  • Step 503 Determine whether the number of data items of the matching index data is 0 after the data item is deleted. If the data items of the matching index data are exactly the data item of the specified index data, the number of data items of the matching index data is 0 after the deletion, and step 504 needs to be performed.
  • Step 504 Delete the matching index data in the segment index unit.
  • If the number of data items of the matching index data is 0 after the data item is deleted, the matching index data can be deleted directly from the segment index unit 421. Thereafter, the subsequent steps can also be performed on the segment index unit 421.
  • Step 505 If the index data in the segment index unit is empty, delete the segment index unit. After the matching index data is deleted, it is determined whether the index data exists in the segment index unit 421. If not, the segment index unit 421 can be deleted directly, and the index 45 of the segment index is modified. Further, step 506 can also be performed.
  • Step 506 If the index data does not exist in the segment index, the index of the segment index is deleted from the header index.
  • After the segment index unit 421 is deleted, it is determined whether any index data exists in the segment index 42. If not, that is, no segment index unit remains, the segment index 42 may be further deleted, and the index 45 of the segment index is deleted from the header index 43 in which it is located.
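  • Steps 502 to 506 can be sketched as one cascading delete. The data layout is an assumption made for illustration: `header["segment_refs"]` maps attribute information to a segment index unit id, and `segment_index` maps unit ids to dictionaries of attribute information and data items. The function name `delete_item` is hypothetical.

```python
def delete_item(header, segment_index, attr, value, item):
    """Delete one data item and clean up whatever becomes empty."""
    key = (attr, value)
    unit_id = header["segment_refs"][key]   # look up the index of the segment index
    unit = segment_index[unit_id]           # the segment index unit
    unit[key].remove(item)                  # step 502: delete the data item
    if unit[key]:                           # step 503: any data items left?
        return
    del unit[key]                           # step 504: delete the matching index data
    del header["segment_refs"][key]         # keep the index of the segment index in sync
    if not unit:                            # step 505: delete the empty unit
        del segment_index[unit_id]
    if not segment_index:                   # step 506: drop the index of the segment index
        del header["segment_refs"]


header = {"segment_refs": {("a", "1"): "u1", ("b", "2"): "u1"}}
segment_index = {"u1": {("a", "1"): ["x"], ("b", "2"): ["y", "z"]}}
delete_item(header, segment_index, "a", "1", "x")
```

  • After the call above, the unit `u1` still holds the index data for `("b", "2")`, so neither the unit nor the index of the segment index is removed.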
  • Two adjacent segment index units may also be merged into a single segment index unit, subject to a threshold a, and the index of the segment index is then modified accordingly.
  • The threshold a can be set according to the specific situation of the index structure, which is not limited herein.
  • Step 507 In the block index, delete the data item of the specified index data from the data items of the matching index data.
  • If the result of the determination in step 501 is that the attribute information of the specified index data is located in the index 44 of the block index, the specified index data is located in the block index 41. The matching index data in the block index 41 that has the same attribute information as the specified index data is then determined, and the data item of the specified index data is deleted from the data items of the matching index data.
  • Take the case in which the segmentation units in the block index 41 are linked in a chain as an example. The segmentation unit chain 411 in which the matching index data is located may first be determined according to the attribute information (Attrn, Valuen) of the specified index data; the segmentation unit in which the data item to be deleted is located, say the segmentation unit 411a, is then determined; and the data item Itemn of the specified index data is deleted from the determined segmentation unit 411a. Step 508 is then performed.
  • If the segmentation units in the block index 41 are stored side by side, the segmentation information to which the matching index data belongs, such as the segmentation information 413, may be determined according to the attribute information of the specified index data. Among the segmentation units corresponding to the determined segmentation information 413, the segmentation unit 413a in which the data item to be deleted is located is then determined, the data item of the specified index data is deleted from the determined segmentation unit 413a, and the segmentation information 413 is updated.
  • Step 508 Determine whether the segmentation unit in the block index in which the data item of the specified index data was located is empty.
  • If it is empty, step 509 is performed.
  • Step 509 The empty segmentation unit is deleted from the segmentation unit chain in which it is located.
  • After the empty segmentation unit is deleted, the segmentation unit chain can be re-established: only the access information stored in the segmentation unit preceding the segmentation unit 411a in the original segmentation unit chain 411 needs to be changed to the access information of the segmentation unit following the segmentation unit 411a.
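  • Removing an empty segmentation unit from its chain amounts to relinking its predecessor, which can be sketched as follows. The store layout (`items`, `next`), the key names, and the function `unlink_unit` are hypothetical, and the sketch assumes the target unit really is part of the chain.

```python
def unlink_unit(store, first_key, target_key):
    """Remove target_key from the chain starting at first_key and
    return the key of the (possibly new) first unit."""
    target = store[target_key]
    if first_key == target_key:
        # The first unit has no predecessor; the chain now starts at its successor.
        del store[target_key]
        return target["next"]
    key = first_key
    while store[key]["next"] != target_key:   # find the predecessor
        key = store[key]["next"]
    store[key]["next"] = target["next"]       # predecessor now points past the target
    del store[target_key]
    return first_key


unlink_store = {
    "unit:a": {"items": ["item1"], "next": "unit:b"},
    "unit:b": {"items": [], "next": "unit:c"},   # empty after a deletion
    "unit:c": {"items": ["item2"], "next": None},
}
new_first = unlink_unit(unlink_store, "unit:a", "unit:b")
```

  • Only one existing unit is rewritten in this sketch, which matches the observation in the text that only the preceding unit's access information needs to change.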
  • In another embodiment, if the segmentation units in the block index 41 are stored side by side and the segmentation unit 413a in which the data item was located is empty after the data item Itemn is deleted, the segmentation unit 413a may be deleted directly and its corresponding segmentation information 413 updated.
  • After step 508, in another embodiment, if the number of data items in the segmentation unit in which the data item Itemn was located is not 0 and, in the block index 41, the data items of the matching index data are divided into only one segmentation unit, then regardless of whether the segmentation units in the block index 41 are stored in a chain or side by side, after the data item Itemn is deleted it can be determined whether the number of data items in that segmentation unit is less than a threshold b (the fifth threshold) and not 0. The threshold b can be set according to the index structure, which is not specifically limited herein.
  • If so, the matching index data after the data item is deleted is migrated from the block index 41 to the segment index 42, and the index 45 of the segment index and the index 44 of the block index in the header index 43 are updated. Further, when the matching index data is migrated to the segment index 42, it may be added to an existing segment index unit, or a new segment index unit may be added to store it.
  • In another embodiment, if the segmentation units in the block index 41 are stored in a chain, then after the data item is deleted, two adjacent segmentation units may be merged subject to a threshold c (the fourth threshold), and the access information in the segmentation units of the segmentation unit chain 411 is updated. The threshold c can be set according to the index structure, which is not limited herein.
  • In another embodiment, if the index structure is the structure shown in FIG. 3, that is, only the block index is included, the attribute information in the index of the block index stored in the header index is searched. If the attribute information of the specified index data is included, the block index may be located, and the data item of the index data in the block index that matches the specified index data may then be deleted.
  • The index data with different attribute information is stored in one index structure, which makes data storage more compact, so that when an index data deletion operation is performed, the interaction with the Key-Value system can be reduced, thereby improving the performance of index creation and querying.
  • FIG. 8 is a flowchart of adding index data according to an embodiment of the present invention.
  • the processing of the index data is described by adding a specified index data as an example.
  • the index data of the plurality of attributes is still stored in the organization form shown in FIG. 4a, and the index data in the specific block index 41 and the segment index 42 are still stored in the manner shown in FIGS.
  • The method of adding the specified index data may include:
  • Step 801 Determine whether the attribute information of the specified index data is located in the index of the block index.
  • In this step, it may first be checked whether the attribute information in the index 44 of the block index includes the attribute information (Attrn, Valuen) of the specified index data. If not, the index 45 of the segment index includes the attribute information of the specified index data, that is, the specified index data needs to be added to the segment index 42, and steps 802 to 803 are performed; if yes, the specified index data needs to be added to the block index 41, and steps 804 to 806 are performed.
  • Step 802 Add a data item of the specified index data to the data item of the matching index data of the segment index.
  • The matching index data has the same attribute information (Attrn, Valuen) as the specified index data. The attribute information (Attrn, Valuen) may first be searched in the index 45 of the segment index to determine the segment index unit in which the matching index data is located, such as the segment index unit 421. After the matching index data is determined in the segment index unit 421, the data item Itemn of the specified index data is added to the data items of the matching index data.
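  • Step 803 Determine whether the number of data items of the matching index data exceeds a threshold d after the data item is added.
  • The threshold d (the second threshold) is used to decide whether index data is stored in the segment index or in the block index; its specific value may be set according to the index structure, which is not limited herein.
  • If yes, step 8031 is performed; if not, that is, the threshold d is not exceeded, step 8032 is performed.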
  • Step 8031 Migrate the matching index data after the data item is added from the segment index to the block index. First, the matching index data is deleted from the segment index unit 421 of the segment index 42 and the index 45 of the segment index is modified; then the matching index data with the added data item is added to the block index.
  • Specifically, the matching index data with the added data item is first divided into one or more segmentation units according to the preset number of data items per segmentation unit. If the segmentation units in the block index 41 are stored in a chain, the links between the segmentation units are established, that is, the access information is added to each segmentation unit, a segmentation unit chain is created for storage, and the index 44 of the block index is modified. If the segmentation units in the block index 41 are stored side by side, the divided segmentation units are stored side by side, the segmentation information corresponding to the matching index data is added to the block index 41, and the index 44 of the block index is modified.
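  • Dividing migrated index data into chained segmentation units can be sketched as below. The function `build_chain`, the `unit_size` parameter (standing in for the preset number of data items per segmentation unit), and the generated key scheme are hypothetical.

```python
def build_chain(items, unit_size, key_prefix):
    """Split items into units of at most unit_size data items and link
    them; return (key of the first unit, dict of key -> unit)."""
    chunks = [items[i:i + unit_size] for i in range(0, len(items), unit_size)]
    units = {}
    for n, chunk in enumerate(chunks):
        # Each unit stores the key of the next unit; the last stores None.
        next_key = f"{key_prefix}:{n + 1}" if n + 1 < len(chunks) else None
        units[f"{key_prefix}:{n}"] = {"items": chunk, "next": next_key}
    first_key = f"{key_prefix}:0" if chunks else None
    return first_key, units


first_key, new_units = build_chain(["i1", "i2", "i3", "i4", "i5"], 2, "unit")
```

  • In a fuller implementation the returned first key would be recorded in the index of the block index; that bookkeeping is omitted here.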
  • Step 8032 Determine whether the size of the segment index unit in which the matching index data is located is greater than a threshold e after the data item is added, and if yes, divide the segment index unit into two new segment index units.
  • the threshold e (third threshold) is used to determine the size of the segment index unit. If the index data stored in the segment index unit is greater than the threshold e, the segment index unit needs to be divided, otherwise no division is needed.
  • the threshold e can be set according to the index structure, and the specific value is not limited herein.
  • After the data item is added, if the segment index unit 421 in which the matching index data is located holds too much data and its occupied space exceeds the threshold e, the segment index unit 421 needs to be divided into two new segment index units that replace the original segment index unit 421. After the division, the index 45 of the segment index can be further updated.
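  • Dividing an oversized segment index unit can be sketched as follows. The text measures the unit by its size or occupied space; this sketch assumes, for simplicity, that size means the number of index data entries in the unit. The function `split_segment_unit` and the choice of splitting at the middle of the sorted keys are also assumptions.

```python
def split_segment_unit(unit, threshold_e):
    """Return [unit] unchanged if it holds at most threshold_e entries,
    otherwise two new units covering the lower and upper half of the
    sorted attribute information."""
    if len(unit) <= threshold_e:
        return [unit]
    keys = sorted(unit)               # (attr, value) pairs in sorted order
    mid = len(keys) // 2
    lower = {k: unit[k] for k in keys[:mid]}
    upper = {k: unit[k] for k in keys[mid:]}
    return [lower, upper]


big_unit = {("c", "1"): ["z"], ("a", "1"): ["x"], ("b", "1"): ["y"]}
parts = split_segment_unit(big_unit, threshold_e=2)
```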
  • Step 804 In the block index, add the data item of the specified index data to the data items of the matching index data.
  • If the result of the determination in step 801 is that the attribute information of the specified index data is located in the index 44 of the block index, the specified index data needs to be added to the block index 41. The matching index data in the block index 41 that has the same attribute information as the specified index data is then determined, and the data item of the specified index data is added to the data items of the matching index data.
  • Take the case in which the segmentation units in the block index 41 are linked in a chain as an example. The segmentation unit chain 411 in which the matching index data is located is first determined according to the attribute information (Attrn, Valuen) of the specified index data; the segmentation unit to which the data item is to be added, say the segmentation unit 411a, is then determined; and the data item Itemn of the specified index data is added to the determined segmentation unit 411a. Step 805 is then performed.
  • the segmentation information to which the matching index data belongs may be determined according to the attribute information of the specified index data.
  • the segment information 413 further determines, in the segmentation unit corresponding to the determined segmentation information 413, the segmentation unit 413a in which the data item to be added is located, and then adds the data item of the specified index data in the determined segmentation unit 413a. And update the segmentation information 413.
  • Step 805: Determine whether the number of data items in the segmentation unit that holds the data item of the specified index data exceeds the threshold f.
  • After the data item Itemn is added to the segmentation unit 411a, it is determined whether the number of data items in the segmentation unit 411a exceeds the threshold f (the sixth threshold). If it does, step 806 is performed; if not, the add operation ends.
  • The threshold f bounds the size of a segmentation unit. If the number of data items stored in the segmentation unit exceeds the threshold f, the unit must be divided; otherwise no division is needed. The threshold f can be set according to the index structure, and its specific value is not limited here.
  • Step 806: Divide the segmentation unit that holds the data item of the specified index data into two new segmentation units.
  • In this step, the segmentation unit 411a is divided into two new segmentation units, which replace the segmentation unit 411a in the segmentation unit chain 411, and the access information of the segmentation units in the chain 411 is updated.
  • In another embodiment, if the segmentation units in the block index 41 are stored side by side and the number of data items in the segmentation unit 413a exceeds the threshold f after the data item Itemn is added, the segmentation unit 413a is divided into two new segmentation units that replace the original segmentation unit 413a. The two new segmentation units are stored side by side, and the segmentation information 413 in the block index 41 is updated.
  • In another embodiment, if the index structure is the structure shown in FIG. 3, that is, it includes only the block index, then when index data is added, after the processing instruction for the specified index data is received, the attribute information in the index of the block index stored in the header index is searched. If it includes the attribute information of the specified index data, the block index can be located, and the data item of the specified index data can be added to the index data in the block index that matches the specified index data.
  • The embodiment of the present invention stores index data with different attribute information in one index structure, which makes data storage more compact, so that adding index data requires fewer interactions with the Key-Value system, which improves the performance of index creation and query.
  • In the embodiment of the present invention, based on the above index structure, the following operations may be performed on any segmentation unit or index data in the block index after data processing: when the data items in a segmentation unit are empty, the segmentation unit is deleted; when the total number of data items in two adjacent segmentation units is less than a fourth threshold, the two adjacent segmentation units are merged; when the data items of index data are divided into only one segmentation unit and the number of data items in that segmentation unit is less than a fifth threshold, the index data is migrated from the block index to the segment index; when the number of data items in a segmentation unit is greater than a sixth threshold, the segmentation unit is divided into two new segmentation units.
  • Likewise, the following operations may be performed on any segment index unit or index data in the segment index after data processing: when a segment index unit contains no index data, the segment index unit is deleted; when the total number of index data in two adjacent segment index units is less than a first threshold, the two adjacent segment index units are merged; when the number of data items of index data in a segment index unit exceeds a second threshold, the index data is migrated from the segment index unit in which it is located to the block index; when the size of a segment index unit is greater than a third threshold, the segment index unit is divided into two new segment index units.
  • In another embodiment of the present invention, the index structure is stored in a plurality of storage units of the kind shown in FIG. 4a, and the index structure further includes storage information of the plurality of storage units, as shown in FIG. 9.
  • The storage units 91 may be stored side by side. The index structure may further include storage information 92 of each storage unit. The storage information 92 and the storage units 91 below it form a cascade structure, and the storage information 92 may be, for example, the attribute information of the data stored in each storage unit.
  • When the location of specified index data is determined, the storage information 92 of the storage units 91 is first searched according to the attribute information of the specified index data to determine the storage unit 91 in which the specified index data is located. Data processing is then performed in the determined storage unit 91 in accordance with the method of the previous embodiments.
  • When there are multiple pieces of storage information 92, they may be stored side by side and form a cascade structure with an index node 93 at the level above them. The index node 93 may include index information of the storage information 92 below it, for example the attribute information in the storage information 92. The index structure may also be a large-scale cascade structure formed by multiple levels of nodes including the storage units 91, the storage information 92, and the index nodes 93.
  • Based on this index structure, when data processing such as locating, deleting, or adding data is performed, once the storage unit targeted by the data processing has been determined according to the attribute information of the data, steps similar to those of the foregoing embodiments can be performed on that storage unit. They are not repeated here.
  • FIG. 10 is a schematic structural diagram of an index data processing device according to an embodiment of the present invention.
  • In this embodiment, index data of at least one kind of attribute information is stored in one index structure. The index structure includes a header index and at least one block index, and the index of the at least one block index is stored in the header index and includes the attribute information of the index data stored in the block index.
  • The above index structure can be stored in a database inside or outside a terminal such as a PC, or a server. Based on the above index structure, the device may include:
  • The instruction receiving module 1001, configured to receive a processing instruction for specified index data, where the processing instruction includes attribute information of the specified index data.
  • The data processing module 1002, configured to: when an index of a block index can be determined according to the header index in the index structure and the attribute information of the specified index data, locate the block index according to the index of the block index, and process the data items of the matching index data in the block index.
  • After the instruction receiving module 1001 receives the instruction for the specified index data, the data processing module 1002 locates the position of the specified index data according to its attribute information, and then performs data processing on the data items of the index data that matches the specified index data.
  • The data processing may be locating, deleting, or adding data items of the specified index data. For the specific operations, refer to the foregoing embodiments corresponding to FIG. 5 to FIG. 8.
  • In another embodiment, the index structure may further include at least one segment index. The index of the at least one segment index is stored in the header index and includes the attribute information of the index data stored in the segment index. The data processing module may also be configured to: when an index of a segment index can be determined according to the header index in the index structure and the attribute information of the specified index data, locate the segment index according to the index of the segment index, and process the data items of the matching index data in the segment index. The index structure may take the form shown in FIG. 3, FIG. 4a, or FIG. 9.
  • The embodiment of the present invention stores index data with different attribute information in one index structure, which makes data storage more compact, so that the above modules can acquire or update more data in one Key-Value access. This reduces the interaction with the Key-Value system during data processing and improves the performance of index creation and query.
  • FIG. 11 is a schematic structural diagram of another index data processing device according to an embodiment of the present invention.
  • In the index structure of this embodiment, the segment index includes at least one segment index unit, the segment index units are stored side by side, and each segment index unit contains a plurality of index data. In the block index, the data items of index data having the same attribute information are divided into at least one segmentation unit.
  • The segmentation units are either chained to form a segmentation unit chain, in which, of any two adjacent segmentation units, the previous segmentation unit stores the access information of the next segmentation unit; or they are stored side by side, in which case the block index also stores the segmentation information of each segmentation unit.
  • The device may also include an instruction receiving module 1101 and a data processing module 1102, where the instruction receiving module 1101 is similar to the instruction receiving module 1001.
  • In this embodiment, the data processing module 1102 may further include:
  • the first merging unit 11021, configured to merge two adjacent segment index units when the total number of index data in the two adjacent segment index units is less than the first threshold;
  • the first migration unit 11022, configured to migrate index data from the segment index unit in which it is located to the block index when the number of data items of that index data exceeds the second threshold;
  • the first dividing unit 11023, configured to divide a segment index unit into two new segment index units when the size of the segment index unit is greater than the third threshold;
  • the second merging unit 11024, configured to merge two adjacent segmentation units when the total number of data items in the two adjacent segmentation units is less than the fourth threshold;
  • the second migration unit 11025, configured to migrate index data from the block index to the segment index when the data items of the index data are divided into only one segmentation unit and the number of data items in that segmentation unit is less than the fifth threshold;
  • the second dividing unit 11026, configured to divide a segmentation unit into two new segmentation units when the number of data items in the segmentation unit is greater than the sixth threshold.
  • In this embodiment, the data processing module 1102 can include any combination of the above units, which is not specifically limited.
  • In another embodiment of the present invention, the data processing module 1102 may further include the following units:
  • a positioning unit, configured to locate the data item of the specified index data among the data items of the matching index data;
  • a first deleting unit, configured to delete the data item of the specified index data from the data items of the matching index data;
  • an adding unit, configured to add the data item of the specified index data to the data items of the matching index data;
  • a second deleting unit, configured to delete a segment index unit when the segment index unit contains no index data;
  • a third deleting unit, configured to delete a segmentation unit when the data items in the segmentation unit are empty.
  • The data processing module can include any combination of the above units at the same time.
  • In another embodiment, if the index structure is stored in a plurality of storage units, the storage units may be stored side by side, and the index structure further includes storage information of each storage unit, forming a cascade structure for large-scale data.
  • Under this index structure, when the data processing module performs data processing such as locating, deleting, or adding data, once the storage unit targeted by the data processing has been determined according to the attribute information of the data, steps similar to those of the previous embodiments can be performed on that storage unit. They are not repeated here.
  • From the above description of the embodiments, those skilled in the art can clearly understand that the embodiments of the present invention can be implemented by means of software plus a necessary general-purpose hardware platform.
  • Based on this understanding, the essence of the technical solution of the embodiments of the present invention, or the part of it that contributes to the prior art, may be embodied in the form of a software product. The software product may be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disc, and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments, or in certain parts of the embodiments, of the present invention.
  • The application can be described in the general context of computer-executable instructions executed by a computer, such as program modules.
  • Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • The present application can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are connected through a communication network.
  • In a distributed computing environment, program modules can be located in both local and remote computer storage media, including storage devices.
  • The embodiment of the present invention stores index data with different attribute information in one index structure, which makes data storage more compact, so that the above modules can acquire or update more data in one Key-Value access. This reduces the interaction with the Key-Value system during data processing and improves the performance of index creation and query.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

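An index data processing method, in which index data of at least one kind of attribute information is stored in one index structure, the index structure includes a header index and at least one block index, the index of the at least one block index is stored in the header index, and the index of the at least one block index includes the attribute information of the index data stored in the block index. The method includes: receiving a processing instruction for specified index data, where the processing instruction includes attribute information of the specified index data; and, if an index of a block index can be determined according to the header index in the index structure and the attribute information of the specified index data, locating the block index according to the index of the block index and processing the data items of the matching index data in the block index.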

Description

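An index data processing method and device
Technical Field
The present invention relates to the field of data processing technologies, and in particular to an index data processing method and device.
Background
A Key-Value database stores key-value pairs <Key, Value>, and a typical Key-Value system provides only Key-based operations. For applications that retrieve data by attributes inside the Value, an adaptation layer must be built on top of the Key-Value system to provide structured data retrieval. The adaptation layer tabulates the data models contained in the Values of different Keys, that is, it abstracts each Value into attribute names and attribute values (also called columns and values). The Values of different Keys may then share attribute names, which makes retrieval by attribute name possible. To improve retrieval efficiency, an index can be built on the attribute values.
In the prior art, index data can be organized as a B+ tree or a prefix hash tree. The B+ tree structure is shown in FIG. 1a. Internal nodes hold range information: for example, internal node 1 indicates that attribute values less than or equal to 0110 are in its left subtree and attribute values greater than 0110 are in its right subtree. Leaf nodes hold the actual index data. A prefix hash tree builds a trie-based data structure that organizes data by the common prefixes of strings. FIG. 1b shows a prefix hash tree over a character set consisting of 0 and 1: the prefix of each node is the string formed by the characters on the path from the root to that node, intermediate nodes store only their relationship to child nodes, and leaf nodes store the actual index data.
However, whether index data is organized as a B+ tree or as a prefix hash tree, a separate B+ tree or prefix hash tree must be built for every attribute that needs an index, and every node in the tree corresponds to one key-value pair in the Key-Value database. Consequently, any operation on index data, such as locating, adding, or deleting a piece of index data, must first select, from among many B+ trees or prefix hash trees, the tree corresponding to the attribute information of that index data, and then traverse downward from the top-level internal or intermediate node until it reaches a leaf node. With the prior-art organization of index data, processing index data therefore requires multiple interactions with the Key-Value system and is inefficient.
Summary of the Invention
Embodiments of the present invention provide an index data processing method and device that reduce the interaction with the Key-Value system when index data is processed and thereby improve efficiency.
To solve the above technical problem, the technical solutions of the embodiments of the present invention are as follows:
An index data processing method, in which index data of at least one kind of attribute information is stored in one index structure, the index structure includes a header index and at least one block index, the index of the at least one block index is stored in the header index, and the index of the at least one block index includes the attribute information of the index data stored in the block index, the method including:
receiving a processing instruction for specified index data, where the processing instruction includes attribute information of the specified index data;
if an index of a block index can be determined according to the header index in the index structure and the attribute information of the specified index data, locating the block index according to the index of the block index, and processing the data items of the matching index data in the block index.
An index data processing device, in which index data of at least one kind of attribute information is stored in one index structure, the index structure includes a header index and at least one block index, the index of the at least one block index is stored in the header index, and the index of the at least one block index includes the attribute information of the index data stored in the block index, the device including:
an instruction receiving module, configured to receive a processing instruction for specified index data, where the processing instruction includes attribute information of the specified index data;
a data processing module, configured to: when an index of a block index can be determined according to the header index in the index structure and the attribute information of the specified index data, locate the block index according to the index of the block index, and process the data items of the matching index data in the block index.
The embodiments of the present invention store index data with different attribute information in one index structure, which makes data storage more compact, so that one Key-Value access can acquire or update more data. This reduces the interaction with the Key-Value system during data processing and improves the performance of index creation and query.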
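Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1a is a schematic diagram of index data organized as a B+ tree in the prior art;
FIG. 1b is a schematic diagram of index data organized as a prefix hash tree in the prior art;
FIG. 2 is a flowchart of an index data processing method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the index structure in the embodiment shown in FIG. 2;
FIG. 4a is a schematic diagram of an index structure;
FIG. 4b is a flowchart of another index data processing method according to an embodiment of the present invention;
FIG. 5 is a flowchart of deleting index data according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the index structure in the embodiment shown in FIG. 5;
FIG. 7 is a schematic structural diagram of a block index in another embodiment of the present invention;
FIG. 8 is a flowchart of adding index data according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of an index structure according to another embodiment of the present invention;
FIG. 10 is a schematic structural diagram of an index data processing device according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of another index data processing device according to an embodiment of the present invention.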
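Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
FIG. 2 is a flowchart of an index data processing method according to an embodiment of the present invention.
In this embodiment, index data of at least one kind of attribute information is stored in one index structure. As shown in FIG. 3, the index structure includes a header index (head index) 31 and at least one block index 32. The block index 32 can be used to store index data that has a large number of data items. The index of the at least one block index (block index reference) 33 is stored in the header index 31 and includes the attribute information of the index data stored in the block index 32.
Based on the above index structure, when index data is specified, the method for processing that index data may include:
Step 201: Receive a processing instruction for specified index data, where the processing instruction includes attribute information of the specified index data. The index structure may be stored in a database inside or outside a terminal such as a PC, or a server. The index data may include attribute information such as an attribute name (attr) and an attribute value (value), as well as specific data items (item), for example <Attrn, Valuen, Itemn>.
Step 202: If an index of a block index can be determined according to the header index in the index structure and the attribute information of the specified index data, locate the block index according to the index of the block index, and process the data items of the matching index data in the block index.
A terminal such as a PC, or a server, searches the attribute information in the index 33 of the block index stored in the header index 31. If it includes the attribute information of the specified index data, the block index 32 can be located, and the data items of the index data in the block index 32 that matches the specified index data can then be processed.
In another embodiment of the present invention, the index structure may include at least one storage unit. As shown in FIG. 4a, in addition to the block index 41, a storage unit may include at least one segment index 42, which stores index data that has a small number of data items. The header index 43 stores the index 44 of the block index and the index of the segment index (segment index reference) 45, and the index 45 of the segment index includes the attribute information of the index data stored in the segment index 42. Index data is allocated to the block index 41 or the segment index 42 according to its number of data items: for example, index data whose number of data items is greater than a preset value may be stored in the block index 41, and index data whose number of data items is less than the preset value may be stored in the segment index 42. The block index 41 and the segment index 42 thus store index data separately. In another embodiment, index data may instead be allocated between the block index and the segment index according to other information, such as the attribute information of the index data. In the embodiments of the present invention, the attribute information may include an attribute name, an attribute value, and the like. In yet another embodiment, the index structure may include only the segment index.
It should be noted that the storage unit in all embodiments of the present invention may be a computer-readable storage medium or a database system, and the database system may be a distributed database system. The above index structure may be a data structure stored in such a computer-readable storage medium or database system.
If an index of a segment index can be determined according to the header index in the index structure and the attribute information of the specified index data, the segment index is located according to the index of the segment index, and the data items of the matching index data in the segment index are processed. Specifically, a terminal such as a PC, or a server, searches the attribute information in the index of the segment index stored in the header index. If it includes the attribute information of the specified index data, the segment index can be located, and the data items of the index data in the segment index that matches the specified index data can then be processed.
To determine whether the attribute information of the specified index data is located in the index of the segment index or in the index of the block index, the index 44 of the block index may be searched first. If it contains the attribute information of the specified index data, the attribute information belongs to the index 44 of the block index; if not, the attribute information belongs to the index 45 of the segment index. Alternatively, the index 45 of the segment index may be searched first.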
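The lookup order described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the `HeadIndex` class, its two sets, and the `locate` function are hypothetical names, and attribute information is assumed to be a `(name, value)` tuple. The text treats any attribute not found in the index of the block index as belonging to the index of the segment index; the sketch additionally reports the case where neither contains it.

```python
from dataclasses import dataclass, field


@dataclass
class HeadIndex:
    # Attribute information (name, value) of index data held in the block index.
    block_refs: set = field(default_factory=set)
    # Attribute information (name, value) of index data held in the segment index.
    segment_refs: set = field(default_factory=set)


def locate(head, attr):
    """Return 'block', 'segment', or None for an attribute (name, value) pair."""
    # The index of the block index is consulted first, as in the text.
    if attr in head.block_refs:
        return "block"
    if attr in head.segment_refs:
        return "segment"
    return None


head = HeadIndex(block_refs={("color", "red")}, segment_refs={("size", "L")})
print(locate(head, ("color", "red")))
print(locate(head, ("size", "L")))
```

Because both references live in the header index, one read of the header is enough to decide which part of the structure to visit next.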
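The embodiments of the present invention store index data with different attribute information in one index structure, which makes data storage more compact, so that one Key-Value access can acquire or update more data. This reduces the interaction with the Key-Value system during data processing and improves the performance of index creation and query.
FIG. 4b is a flowchart of another index data processing method according to an embodiment of the present invention.
In this embodiment, based on the index structure shown in FIG. 4a, when index data is specified, the method for processing that index data may include:
Step 401: Receive a processing instruction for specified index data, where the processing instruction includes attribute information of the specified index data. The index structure may be stored in an internal database of a terminal such as a PC, or a server, or in an external database independent of the terminal or server.
Step 402: Determine whether the attribute information of the specified index data is located in the index of the segment index or in the index of the block index in the header index.
In the embodiments of the present invention, the index data may include attribute information such as an attribute name (attr) and an attribute value (value), as well as specific data items (item), for example <Attrn, Valuen, Itemn>. When specified index data is given, its attribute information is first used to determine whether it belongs to the index 44 of the block index or to the index 45 of the segment index in the header index 43.
The determination may proceed as follows: first search the index 44 of the block index for the attribute information of the specified index data. If it is present, the attribute information belongs to the index 44 of the block index; if not, it belongs to the index 45 of the segment index. Alternatively, the index 45 of the segment index may be searched first.
If the attribute information of the specified index data belongs to the index 44 of the block index, step 403 is performed; if it belongs to the index 45 of the segment index, step 404 is performed.
Step 403: In the block index, perform data processing on the data items of the matching index data according to the data item of the specified index data.
The matching index data in the block index has the same attribute information as the specified index data. The data processing performed on the data items of the matching index data may be:
locating the data item of the specified index data among the data items of the matching index data; or deleting the data item of the specified index data from the data items of the matching index data; or adding the data item of the specified index data to the data items of the matching index data.
After the above data processing, the index data in the processed block index may be further deleted, merged, divided, or migrated.
Step 404: In the segment index, perform data processing on the data items of the matching index data according to the data item of the specified index data.
The matching index data in the segment index has the same attribute information as the specified index data. The data processing performed on the data items of the matching index data may be:
locating the data item of the specified index data among the data items of the matching index data; or deleting the data item of the specified index data from the data items of the matching index data; or adding the data item of the specified index data to the data items of the matching index data.
After the above data processing, the index data in the processed segment index may be further deleted, merged, divided, or migrated.
In another embodiment, if the index structure is the structure shown in FIG. 3, that is, it includes only the block index, then when index data is processed, after the processing instruction for the specified index data is received, the attribute information in the index of the block index stored in the header index is searched. If it includes the attribute information of the specified index data, the block index can be located, and the data items of the index data in the block index that matches the specified index data can be processed, for example located, deleted, or added.
The embodiments of the present invention store index data with different attribute information in one index structure, which makes data storage more compact, so that one Key-Value access can acquire or update more data. This reduces the interaction with the Key-Value system during data processing and improves the performance of index creation and query.
FIG. 5 is a flowchart of deleting index data according to an embodiment of the present invention. In this embodiment, the processing of index data is described using the deletion of specified index data as an example.
The index data of multiple attributes is still stored in the organization shown in FIG. 4a. More specifically, as shown in FIG. 6, the segment index 42 includes at least one segment index unit, for example segment index units 421 and 422. The segment index units 421 and 422 are stored side by side, and each segment index unit contains a plurality of index data, which may be sorted by attribute information. Specifically, the index data in the segment index 42 may first be sorted by attribute information and then divided into multiple segment index units. Sorting by attribute information may compare attribute names and attribute values. Taking the two attribute information pairs (Attr1, Value1) and (Attr2, Value2) as an example, Attr1 is first compared with Attr2. If they are not equal, the two pairs are ordered in the same way as Attr1 and Attr2; otherwise the two pairs are ordered in the same way as Value1 and Value2.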
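The ordering rule and the subsequent division into segment index units can be sketched as follows. This is only an illustration: `compare_attr`, `build_units`, and the fixed `unit_size` are assumed names and an assumed splitting policy, since the text does not say how many index data each segment index unit receives.

```python
from functools import cmp_to_key


def compare_attr(a, b):
    """Order two (name, value) pairs: by name first, by value only on a tie."""
    (name_a, value_a), (name_b, value_b) = a, b
    if name_a != name_b:
        return -1 if name_a < name_b else 1
    if value_a != value_b:
        return -1 if value_a < value_b else 1
    return 0


def build_units(attr_pairs, unit_size):
    """Sort the pairs, then cut them into consecutive segment index units."""
    ordered = sorted(attr_pairs, key=cmp_to_key(compare_attr))
    return [ordered[i:i + unit_size] for i in range(0, len(ordered), unit_size)]


units = build_units([("b", 1), ("a", 2), ("a", 1)], unit_size=2)
print(units)
```

With string names and comparable values, the same order falls out of Python's built-in tuple comparison; the explicit comparator is kept only to mirror the two-stage rule in the text.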
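In the block index 41, the data items of index data having the same attribute information are divided into at least one segmentation unit, and the segmentation units are chained to form a segmentation unit chain, for example segmentation unit chains 411 and 412. Of any two adjacent segmentation units in a chain, the previous segmentation unit stores the access information of the next segmentation unit, such as its key value.
In another embodiment, as shown in FIG. 7, the segmentation units into which the data items of index data having the same attribute information are divided may instead be stored side by side in the block index 41, and the block index 41 also stores the segmentation information of each segmentation unit, for example segmentation information 413.
The method for deleting the specified index data may include:
Step 501: Determine whether the attribute information of the specified index data is located in the index of the block index.
After a terminal such as a PC, or a server, receives the processing instruction for the specified index data, this step may first check whether the attribute information in the index 44 of the block index contains the attribute information (Attrn, Valuen) of the specified index data. If not, the index 45 of the segment index contains the attribute information of the specified index data, that is, the specified index data is stored in the segment index 42, and steps 502 to 506 are performed. If so, the specified index data is stored in the block index 41, and steps 507 to 509 are performed.
Step 502: Among the data items of the matching index data in the segment index, delete the data item of the specified index data.
The matching index data has the same attribute information (Attrn, Valuen) as the specified index data. The attribute information (Attrn, Valuen) may first be looked up in the index 45 of the segment index to determine the segment index unit containing the matching index data, for example segment index unit 421. Once the matching index data in the segment index unit 421 has been determined, the data item Itemn of the specified index data is deleted from the data items of the matching index data.
Step 503: Determine whether the number of data items of the matching index data is 0 after the data item is deleted.
If the data items of the matching index data consist solely of the data item of the specified index data, the matching index data has 0 data items after the deletion, and step 504 needs to be performed.
Step 504: Delete the matching index data from the segment index unit.
If the matching index data has 0 data items after the deletion, the matching index data can be deleted directly from the segment index unit 421. The subsequent step 505 may then be performed on the segment index unit 421.
Step 505: If the segment index unit contains no index data, delete the segment index unit.
After the matching index data is deleted, it is determined whether any index data remains in the segment index unit 421. If none remains, the segment index unit 421 can be deleted directly and the index 45 of the segment index is modified. Step 506 may then be performed.
Step 506: If the segment index contains no index data, delete the index of the segment index from the header index.
After the segment index unit 421 is deleted, it is determined whether any index data remains in the segment index 42. If none remains, that is, no segment index unit exists, the segment index 42 can be deleted as well, and the index 45 of the segment index is deleted from the header index 43 in which it is located.
In another embodiment, after the above deletion, if the total number of index data in two adjacent segment index units is less than a threshold a (the first threshold), the two adjacent segment index units may be merged into one segment index unit, and the index of the segment index may be modified accordingly. The threshold a can be set according to the specifics of the index structure and is not limited here.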
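The merge condition above can be sketched as follows, assuming each segment index unit is modeled as a Python list of index data. The function name `merge_adjacent` and the greedy left-to-right policy (a merged unit may absorb further neighbours while the sum stays below the threshold) are assumptions for illustration; the text only states the condition for merging one adjacent pair.

```python
def merge_adjacent(units, threshold_a):
    """Merge neighbouring units whose combined size is below threshold_a."""
    merged = []
    for unit in units:
        if merged and len(merged[-1]) + len(unit) < threshold_a:
            # Combined size is below the first threshold: fold into the previous unit.
            merged[-1] = merged[-1] + list(unit)
        else:
            merged.append(list(unit))
    return merged


print(merge_adjacent([[1], [2], [3, 4, 5], [6]], threshold_a=3))
```

A real implementation would also rewrite the index of the segment index after each merge, which this sketch leaves out.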
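Step 507: In the block index, delete the data item of the specified index data from the data items of the matching index data.
If step 501 determines that the attribute information of the specified index data is located in the index 44 of the block index, the specified index data is located in the block index 41. The matching index data, whose attribute information is the same as that of the specified index data, is then determined in the block index 41, and the data item of the specified index data is deleted from its data items.
This embodiment takes chained segmentation units in the block index 41 as an example. As shown in FIG. 6, the segmentation unit chain 411 containing the matching index data may first be determined from the attribute information (Attrn, Valuen) of the specified index data. The segmentation unit containing the data item to be deleted is then determined, say segmentation unit 411a, and the data item Itemn of the specified index data is deleted from the segmentation unit 411a. Step 508 is then performed.
In another embodiment, if the segmentation units in the block index 41 are stored side by side, as shown in FIG. 7, the segmentation information to which the matching index data belongs, say segmentation information 413, may first be determined from the attribute information of the specified index data. Among the segmentation units corresponding to the segmentation information 413, the segmentation unit 413a containing the data item to be deleted is then determined, the data item of the specified index data is deleted from the segmentation unit 413a, and the segmentation information 413 is updated.
Step 508: Determine whether the segmentation unit in the block index that held the data item of the specified index data is empty.
After the data item Itemn is deleted from the segmentation unit 411a, if no other data item remains in the segmentation unit 411a, step 509 is performed.
Step 509: Delete the empty segmentation unit from the segmentation unit chain in which it is located.
After the segmentation unit 411a is deleted, the segmentation unit chain can be re-established simply by changing the access information in the segmentation unit that preceded 411a in the original chain 411 to the access information of the segmentation unit that followed 411a.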
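Steps 507 to 509 for the chained layout can be sketched as follows. The sketch assumes, hypothetically, that each segmentation unit is one value in a key-value store (modeled here as a `dict`), holding its data items and the key of the next unit as its access information. The names `delete_item`, `"items"`, and `"next"` are illustrative and do not come from the source.

```python
def delete_item(store, head_key, item):
    """Delete item from a chain of units; unlink a unit that becomes empty.

    store maps a key to {"items": list, "next": key or None}.
    Returns the key of the first unit of the chain after the deletion.
    """
    prev_key, key = None, head_key
    while key is not None:
        unit = store[key]
        if item in unit["items"]:
            unit["items"].remove(item)
            if not unit["items"]:
                # Empty unit: point the previous unit at the next one, then drop it.
                if prev_key is None:
                    head_key = unit["next"]
                else:
                    store[prev_key]["next"] = unit["next"]
                del store[key]
            return head_key
        prev_key, key = key, unit["next"]
    return head_key  # item not found; chain unchanged


store = {
    "u1": {"items": [1, 2], "next": "u2"},
    "u2": {"items": [3], "next": "u3"},
    "u3": {"items": [4], "next": None},
}
print(delete_item(store, "u1", 3), store["u1"]["next"])
```

Only the predecessor of the removed unit is rewritten, which matches the observation in the text that re-establishing the chain needs a single access-information update.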
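In another embodiment, if the segmentation units in the block index 41 are stored side by side and the segmentation unit 413a that held the data item Itemn is empty after the deletion, the segmentation unit 413a can be deleted directly and its corresponding segmentation information 413 is updated.
In another embodiment, after step 508, if the number of data items in the segmentation unit that held the data item Itemn is not 0 and the data items of the matching index data are divided into only one segmentation unit in the block index 41, then, regardless of whether the segmentation units in the block index 41 are chained or stored side by side, it can be determined after the deletion whether the number of data items in that segmentation unit is less than a threshold b (the fifth threshold) and not 0. The threshold b can be set according to the index structure and is not specifically limited here. If so, the matching index data remaining after the deletion is migrated from the block index 41 to the segment index 42, and the index 45 of the segment index and the index 44 of the block index in the header index 43 are updated. After the migration, the matching index data may be added to an existing segment index unit, or a new segment index unit may be created to store it.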
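The migration condition can be sketched as follows. The block index is modeled as a `dict` from attribute information to a list of segmentation units, and one segment index unit as a `dict` from attribute information to data items; both models and the name `maybe_migrate_to_segment` are assumptions made for illustration, and the header index update is omitted.

```python
def maybe_migrate_to_segment(block_index, segment_unit, attr, threshold_b):
    """Move index data to the segment index if it shrank to one small unit."""
    units = block_index.get(attr, [])
    # Migrate only when a single, non-empty unit remains and it is below threshold b.
    if len(units) == 1 and 0 < len(units[0]) < threshold_b:
        segment_unit[attr] = units[0]
        del block_index[attr]
        return True
    return False


block = {("color", "red"): [[7, 8]]}
segment = {}
print(maybe_migrate_to_segment(block, segment, ("color", "red"), threshold_b=3))
print(segment)
```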
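In another embodiment, after step 508, if the number of data items in the segmentation unit that held the data item Itemn is not 0, then for chained segmentation units in the block index 41, if after the deletion the total number of data items in two adjacent segmentation units of the segmentation unit chain 411 is less than a threshold c (the fourth threshold), the two adjacent segmentation units are merged, and the access information in the segmentation units of the chain 411 may be updated. For segmentation units stored side by side in the block index 41, if after the deletion the total number of data items in two adjacent segmentation units is less than the threshold c, the two adjacent segmentation units are merged and the corresponding segmentation information 413 is updated. In both cases the threshold c can be set according to the index structure and is not specifically limited.
In another embodiment, if the index structure is the structure shown in FIG. 3, that is, it includes only the block index, then when index data is deleted, after the processing instruction for the specified index data is received, the attribute information in the index of the block index stored in the header index is searched. If it includes the attribute information of the specified index data, the block index can be located, and the data item can be deleted from the index data in the block index that matches the specified index data.
The embodiments of the present invention store index data with different attribute information in one index structure, which makes data storage more compact, so that deleting index data requires fewer interactions with the Key-Value system, which improves the performance of index creation and query.
FIG. 8 is a flowchart of adding index data according to an embodiment of the present invention.
In this embodiment, the processing of index data is described using the addition of specified index data as an example.
The index data of multiple attributes is still stored in the organization shown in FIG. 4a, and the index data in the block index 41 and the segment index 42 is still stored in the manner shown in FIG. 6 and FIG. 7.
The method for adding the specified index data may include:
Step 801: Determine whether the attribute information of the specified index data is located in the index of the block index.
This step may first check whether the attribute information in the index 44 of the block index contains the attribute information (Attrn, Valuen) of the specified index data. If not, the index 45 of the segment index contains the attribute information of the specified index data, that is, the specified index data needs to be added to the segment index 42, and steps 802 and 803 are performed. If so, the index 44 of the block index contains the attribute information of the specified index data, that is, the specified index data needs to be added to the block index 41, and steps 804 to 806 are performed.
Step 802: Among the data items of the matching index data in the segment index, add the data item of the specified index data.
The matching index data has the same attribute information (Attrn, Valuen) as the specified index data. The attribute information (Attrn, Valuen) may first be looked up in the index 45 of the segment index to determine the segment index unit containing the matching index data, for example segment index unit 421. Once the matching index data in the segment index unit 421 has been determined, the data item Itemn of the specified index data is added to the data items of the matching index data.
Step 803: Determine whether the number of data items of the matching index data is greater than a threshold d after the data item is added.
The threshold d (the second threshold) decides whether index data is better stored in the segment index or in the block index. Its specific value can be set according to the index structure and is not specifically limited.
If the number of data items of the matching index data exceeds the threshold d after the data item is added, step 8031 is performed; if it does not exceed the threshold d, step 8032 is performed.
Step 8031: Migrate the matching index data, with the added data item, from the segment index to the block index.
First, the matching index data is deleted from the segment index unit 421 of the segment index 42 and the index 45 of the segment index is modified. The matching index data with the added data item is then added to the block index.
When it is added to the block index 41, the matching index data is first divided into one or more segmentation units according to a preset number of data items per segmentation unit. If the segmentation units in the block index 41 are chained, the chain links between the new segmentation units are established, that is, the access information of each segmentation unit is added and a segmentation unit chain is created for storage, and the index 44 of the block index is modified. If the segmentation units in the block index 41 are stored side by side, the new segmentation units are stored side by side, the segmentation information corresponding to the matching index data is added to the block index 41, and the index 44 of the block index is modified.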
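Steps 802, 803, and 8031 can be sketched together as follows. As before, one segment index unit is modeled as a `dict` from attribute information to data items and the block index as a `dict` from attribute information to a list of segmentation units. The names `add_item`, `threshold_d`, and `unit_capacity` (the preset number of data items per segmentation unit) are illustrative, and the header index updates are omitted.

```python
def add_item(segment_unit, block_index, attr, item, threshold_d, unit_capacity):
    """Add item under attr; migrate to the block index past threshold_d.

    Returns "segment" or "block" to say where the index data now lives.
    """
    items = segment_unit.setdefault(attr, [])
    items.append(item)
    if len(items) <= threshold_d:
        return "segment"
    # Too many data items: remove from the segment index unit ...
    del segment_unit[attr]
    # ... and store in the block index as segmentation units of preset size.
    block_index[attr] = [
        items[i:i + unit_capacity] for i in range(0, len(items), unit_capacity)
    ]
    return "block"


segment = {("color", "red"): [1, 2]}
block = {}
print(add_item(segment, block, ("color", "red"), 3, threshold_d=3, unit_capacity=2))
print(add_item(segment, block, ("color", "red"), 4, threshold_d=3, unit_capacity=2))
print(block)
```

In the chained layout each new segmentation unit would additionally record the key of its successor; the list of lists here corresponds more closely to the side-by-side layout.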
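Step 8032: Determine whether the size of the segment index unit containing the matching index data, after the data item has been added, is greater than a threshold e; if so, divide that segment index unit into two new segment index units.
The threshold e (the third threshold) bounds the size of a segment index unit. If the index data stored in the segment index unit exceeds the threshold e, the unit must be divided; otherwise no division is needed. The threshold e can be set according to the index structure, and its specific value is not limited here.
After the data item is added, if the segment index unit 421 containing the matching index data holds too much data and its size exceeds the threshold e, the segment index unit 421 is divided into two new segment index units that replace the original segment index unit 421. After the division, the index 45 of the segment index can also be updated.
Step 804: In the block index, add the data item of the specified index data to the data items of the matching index data.
If step 801 determines that the attribute information of the specified index data is located in the index 44 of the block index, the specified index data needs to be added to the block index 41. The matching index data, whose attribute information is the same as that of the specified index data, is then determined in the block index 41, and the data item of the specified index data is added to its data items.
This embodiment takes chained segmentation units in the block index 41 as an example. As shown in FIG. 6, the segmentation unit chain 411 containing the matching index data is first determined from the attribute information (Attrn, Valuen) of the specified index data. The segmentation unit that should hold the new data item is then determined, say segmentation unit 411a, and the data item Itemn of the specified index data is added to the segmentation unit 411a. Step 805 is then performed.
In another embodiment, if the segmentation units in the block index 41 are stored side by side, as shown in FIG. 7, the segmentation information to which the matching index data belongs, say segmentation information 413, may first be determined from the attribute information of the specified index data. Among the segmentation units corresponding to the segmentation information 413, the segmentation unit 413a that should hold the new data item is then determined, the data item of the specified index data is added to the segmentation unit 413a, and the segmentation information 413 is updated.
Step 805: Determine whether the number of data items in the segmentation unit that holds the data item of the specified index data exceeds a threshold f.
After the data item Itemn is added to the segmentation unit 411a, it is determined whether the number of data items in the segmentation unit 411a exceeds the threshold f (the sixth threshold). If it does, step 806 is performed; if not, the add operation ends.
The threshold f bounds the size of a segmentation unit. If the number of data items stored in the segmentation unit exceeds the threshold f, the unit must be divided; otherwise no division is needed. The threshold f can be set according to the index structure, and its specific value is not limited here.
Step 806: Divide the segmentation unit that holds the data item of the specified index data into two new segmentation units.
In this step, the segmentation unit 411a is divided into two new segmentation units, which replace the segmentation unit 411a in the segmentation unit chain 411, and the access information of the segmentation units in the chain 411 is updated.
In another embodiment, if the segmentation units in the block index 41 are stored side by side and the number of data items in the segmentation unit 413a exceeds the threshold f after the data item Itemn is added, the segmentation unit 413a is divided into two new segmentation units that replace the original segmentation unit 413a. The two new segmentation units are stored side by side, and the segmentation information 413 in the block index 41 is updated.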
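Steps 804 to 806 for side-by-side segmentation units can be sketched as follows. The units of one piece of index data are modeled as a list of lists; the function name `add_and_split` and the choice to split at the midpoint are assumptions, since the text only says the unit is divided into two new units.

```python
def add_and_split(units, index, item, threshold_f):
    """Add item to units[index]; split that unit in two if it exceeds threshold_f.

    Returns the resulting list of units (the chosen unit is updated in place
    when no split is needed).
    """
    unit = units[index]
    unit.append(item)
    if len(unit) <= threshold_f:
        return units
    mid = len(unit) // 2
    # The two halves replace the original unit at the same position.
    return units[:index] + [unit[:mid], unit[mid:]] + units[index + 1:]


print(add_and_split([[1, 2], [3, 4]], 0, 9, threshold_f=2))
```

After a split, the segmentation information (or, in the chained layout, the access information of the neighbouring units) would also need updating, which the sketch does not model.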
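In another embodiment, if the index structure is the structure shown in FIG. 3, that is, it includes only the block index, then when index data is added, after the processing instruction for the specified index data is received, the attribute information in the index of the block index stored in the header index is searched. If it includes the attribute information of the specified index data, the block index can be located, and the data item of the specified index data can be added to the index data in the block index that matches the specified index data.
The embodiments of the present invention store index data with different attribute information in one index structure, which makes data storage more compact, so that adding index data requires fewer interactions with the Key-Value system, which improves the performance of index creation and query.
In the embodiments of the present invention, based on the above index structure, the following operations may be performed on any segmentation unit or index data in the block index after data processing: when the data items in a segmentation unit are empty, the segmentation unit is deleted; when the total number of data items in two adjacent segmentation units is less than the fourth threshold, the two adjacent segmentation units are merged; when the data items of index data are divided into only one segmentation unit and the number of data items in that segmentation unit is less than the fifth threshold, the index data is migrated from the block index to the segment index; when the number of data items in a segmentation unit is greater than the sixth threshold, the segmentation unit is divided into two new segmentation units.
Likewise, based on the above index structure, the following operations may be performed on any segment index unit or index data in the segment index after data processing: when a segment index unit contains no index data, the segment index unit is deleted; when the total number of index data in two adjacent segment index units is less than the first threshold, the two adjacent segment index units are merged; when the number of data items of index data in a segment index unit exceeds the second threshold, the index data is migrated from the segment index unit in which it is located to the block index; when the size of a segment index unit is greater than the third threshold, the segment index unit is divided into two new segment index units.
In another embodiment of the present invention, the index structure is stored in a plurality of storage units of the kind shown in FIG. 4a, and the index structure further includes storage information of the plurality of storage units. Specifically, as shown in FIG. 9, the storage units 91 may be stored side by side, and the index structure may further include storage information 92 of each storage unit. The storage information 92 and the storage units 91 below it form a cascade structure. The storage information 92 may be, for example, the attribute information of the data stored in each storage unit. When the location of specified index data is determined, the storage information 92 of the storage units 91 is first searched according to the attribute information of the specified index data to determine the storage unit 91 in which the specified index data is located, and data processing is then performed in the determined storage unit 91 according to the methods of the foregoing embodiments. When there are multiple pieces of storage information 92, they may be stored side by side and form a cascade structure with an index node 93 at the level above them. The index node 93 may include index information of the storage information 92 below it, for example the attribute information in the storage information 92. The index structure may also be a large-scale cascade structure formed by multiple levels of nodes including the storage units 91, the storage information 92, and the index nodes 93. Based on this index structure, when data processing such as locating, deleting, or adding data is performed, once the storage unit targeted by the data processing has been determined according to the attribute information of the data, steps similar to those of the foregoing embodiments can be performed on that storage unit. They are not repeated here.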
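The first level of this cascade, choosing a storage unit from the storage information, can be sketched as follows. The sketch assumes, purely for illustration, that the storage information records which attribute names each storage unit covers; the text leaves the exact content of the storage information open, and `find_storage_unit` and the unit identifiers are hypothetical names.

```python
def find_storage_unit(storage_info, attr):
    """Return the id of the storage unit covering attr's name, or None."""
    name, _value = attr
    for unit_id, attr_names in storage_info.items():
        if name in attr_names:
            return unit_id
    return None


storage_info = {"unit-1": {"color", "size"}, "unit-2": {"price"}}
print(find_storage_unit(storage_info, ("price", 10)))
```

Once a storage unit has been chosen, the header index inside that unit is consulted exactly as in the single-unit embodiments.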
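The above describes the method embodiments of the present invention. The devices that implement the above methods are described below.
FIG. 10 is a schematic structural diagram of an index data processing device according to an embodiment of the present invention.
In this embodiment, index data of at least one kind of attribute information is stored in one index structure. The index structure includes a header index and at least one block index, and the index of the at least one block index is stored in the header index and includes the attribute information of the index data stored in the block index. The index structure may be stored in a database inside or outside a terminal such as a PC, or a server. Based on the above index structure, the device may include:
an instruction receiving module 1001, configured to receive a processing instruction for specified index data, where the processing instruction includes attribute information of the specified index data;
a data processing module 1002, configured to: when an index of a block index can be determined according to the header index in the index structure and the attribute information of the specified index data, locate the block index according to the index of the block index, and process the data items of the matching index data in the block index.
After the instruction receiving module 1001 receives the instruction for the specified index data, the data processing module 1002 locates the position of the specified index data according to its attribute information, and then performs data processing on the data items of the index data that matches the specified index data. The data processing may be locating, deleting, or adding data items of the specified index data. For the specific operations, refer to the foregoing embodiments corresponding to FIG. 5 to FIG. 8.
In another embodiment, the index structure may further include at least one segment index. The index of the at least one segment index is stored in the header index and includes the attribute information of the index data stored in the segment index. The data processing module may also be configured to: when an index of a segment index can be determined according to the header index in the index structure and the attribute information of the specified index data, locate the segment index according to the index of the segment index, and process the data items of the matching index data in the segment index. The index structure may take the form shown in FIG. 3, FIG. 4a, or FIG. 9.
The embodiments of the present invention store index data with different attribute information in one index structure, which makes data storage more compact, so that the above modules can acquire or update more data in one Key-Value access. This reduces the interaction with the Key-Value system during data processing and improves the performance of index creation and query.
FIG. 11 is a schematic structural diagram of another index data processing device according to an embodiment of the present invention.
In the index structure of this embodiment, the segment index includes at least one segment index unit, the segment index units are stored side by side, and each segment index unit contains a plurality of index data. In the block index, the data items of index data having the same attribute information are divided into at least one segmentation unit. The segmentation units are either chained to form a segmentation unit chain, in which, of any two adjacent segmentation units, the previous segmentation unit stores the access information of the next segmentation unit; or they are stored side by side, in which case the block index also stores the segmentation information of each segmentation unit.
The device may also include an instruction receiving module 1101 and a data processing module 1102, where the instruction receiving module 1101 is similar to the instruction receiving module 1001.
In this embodiment, the data processing module 1102 may further include:
a first merging unit 11021, configured to merge two adjacent segment index units when the total number of index data in the two adjacent segment index units is less than the first threshold;
a first migration unit 11022, configured to migrate index data from the segment index unit in which it is located to the block index when the number of data items of that index data exceeds the second threshold;
a first dividing unit 11023, configured to divide a segment index unit into two new segment index units when the size of the segment index unit is greater than the third threshold;
a second merging unit 11024, configured to merge two adjacent segmentation units when the total number of data items in the two adjacent segmentation units is less than the fourth threshold;
a second migration unit 11025, configured to migrate index data from the block index to the segment index when the data items of the index data are divided into only one segmentation unit and the number of data items in that segmentation unit is less than the fifth threshold;
a second dividing unit 11026, configured to divide a segmentation unit into two new segmentation units when the number of data items in the segmentation unit is greater than the sixth threshold.
In this embodiment, the data processing module 1102 may include any combination of the above units, which is not specifically limited.
In another embodiment of the present invention, the data processing module 1102 may further include the following units:
a positioning unit, configured to locate the data item of the specified index data among the data items of the matching index data;
a first deleting unit, configured to delete the data item of the specified index data from the data items of the matching index data;
an adding unit, configured to add the data item of the specified index data to the data items of the matching index data;
a second deleting unit, configured to delete a segment index unit when the segment index unit contains no index data;
a third deleting unit, configured to delete a segmentation unit when the data items in the segmentation unit are empty.
The data processing module may include any combination of the above units at the same time.
In another embodiment, if the index structure is stored in a plurality of storage units, the storage units may be stored side by side, and the index structure further includes storage information of each storage unit, forming a cascade structure for large-scale data. Under this index structure, when the data processing module performs data processing such as locating, deleting, or adding data, once the storage unit targeted by the data processing has been determined according to the attribute information of the data, steps similar to those of the foregoing embodiments can be performed on that storage unit. They are not repeated here.
From the above description of the embodiments, those skilled in the art can clearly understand that the embodiments of the present invention can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the essence of the technical solution of the embodiments of the present invention, or the part of it that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disc, and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments, or in certain parts of the embodiments, of the present invention.
The present application can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present application can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media, including storage devices.
The embodiments of the present invention store index data with different attribute information in one index structure, which makes data storage more compact, so that the above modules can acquire or update more data in one Key-Value access. This reduces the interaction with the Key-Value system during data processing and improves the performance of index creation and query.
For the specific implementation of the modules and units in the above devices, refer to the corresponding descriptions of the foregoing method embodiments. They are not repeated here.
The embodiments of the present invention described above do not limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the claims of the present invention.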

Claims

权 利 要 求
1. 一种索引数据处理方法, 其特征在于, 至少一种属性信息的索引数据 存储在一个索引结构中, 所述索引结构包括一头索引和至少一块索引, 所述至 少一块索引的索引存储在所述头索引中,所述至少一块索引的索引包括存储在 所述块索引中的索引数据的属性信息, 所述方法包括:
接收对指定索引数据的处理指令,所述处理指令中包括指定索引数据的属 性信息;
若根据所述索引结构中的头索引和所述指定索引数据的属性信息能确定 一块索引的索引, 则根据所述块索引的索引定位所述块索引, 并对所述块索引 中匹配所述索引数据的数据项进行处理。
2.根据权利要求 1 所述的方法, 其特征在于, 所述索引结构还包括至少 一分段索引, 所述至少一分段索引的索引存储在所述头索引中, 所述至少一分 若根据所述索引结构中的头索引和所述指定索引数据的属性信息能确定 一分段索引的索引, 则根据所述分段索引的索引定位所述分段索引, 并对所述 分段索引中匹配所述索引数据的数据项进行处理。
3.根据权利要求 1或 2所述的方法, 其特征在于, 所述索引结构被存储 在多个存储单元中, 所述索引结构中还包括所述多个存储单元的存储信息。
4、 根据权利要求 2或 3所述的方法, 其特征在于, 所述分段索引中包括 至少一个分段索引单元, 所述分段索引单元并列存储,每个分段索引单元中包 含多个索引数据。
5、 根据权利要求 4所述的方法, 其特征在于, 所述方法还包括: 当相邻两分段索引单元中索引数据的数量和小于第一阈值时,将所述相邻 两分段索引单元进行合并; 或者,
当分段索引单元中索引数据的数据项数量超过第二阈值时,将该索引数据 从其所在的分段索引单元迁移至块索引中; 或者,
当分段索引单元的大小大于第三阈值时,将该分段索引单元划分为两个新 的分段索引单元。
6、 根据权利要求 1至 5中任意一项所述的方法, 其特征在于, 所述块索 引中, 具有相同属性信息的索引数据的数据项划分为至少一个分段单元。
7、 根据权利要求 6所述的方法, 其特征在于,
所述分段单元之间链式连接形成分段单元链,该分段单元链的相邻两分段 单元中, 上一分段单元存储有下一分段单元的访问信息; 或者,
所述分段单元之间并列存储,所述块索引中还存储有各分段单元的分段信 息。
8、 根据权利要求 6所述的方法, 其特征在于, 所述方法还包括: 当相邻两分段单元中数据项的数量和小于第四阈值时,将所述相邻两分段 单元合并; 或者,
当索引数据的数据项只划分为一个分段单元时,若该分段单元中的数据项 数量小于第五阈值, 则将该索引数据从块索引迁移至分段索引中; 或者, 当分段单元中数据项的数量大于第六阈值时,将该分段单元划分为两个新 的分段单元。
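9. An index data processing device, characterized in that index data of at least one type of attribute information is stored in one index structure, the index structure comprises a head index and at least one block index, an index of the at least one block index is stored in the head index, and the index of the at least one block index comprises attribute information of the index data stored in the block index, the device comprising:
an instruction receiving module, configured to receive a processing instruction for specified index data, wherein the processing instruction comprises attribute information of the specified index data; and
a data processing module, configured to, when an index of a block index can be determined according to the head index in the index structure and the attribute information of the specified index data, locate the block index according to the index of the block index, and process the data items in the block index that match the index data.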
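10. The device according to claim 9, characterized in that the index structure further comprises at least one segment index, an index of the at least one segment index is stored in the head index, the at least one segment […]
the data processing module is further configured to, when an index of a segment index can be determined according to the head index in the index structure and the attribute information of the specified index data, locate the segment index according to the index of the segment index, and process the data items in the segment index that match the index data.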
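11. The device according to claim 9 or 10, characterized in that the index structure is stored in a plurality of storage units, and the index structure further comprises storage information of the plurality of storage units.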
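12. The device according to claim 10, characterized in that the segment index comprises at least one segment index unit, the segment index units are stored in parallel, and each segment index unit contains a plurality of pieces of index data.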
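13. The device according to claim 12, characterized in that the data processing module comprises at least one of the following:
a first merging unit, configured to merge two adjacent segment index units when the total quantity of index data in the two adjacent segment index units is less than a first threshold;
a first migration unit, configured to, when the quantity of data items of index data in a segment index unit exceeds a second threshold, migrate the index data from the segment index unit in which it resides to a block index;
a first dividing unit, configured to divide a segment index unit into two new segment index units when the size of the segment index unit is greater than a third threshold.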
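14. The device according to any one of claims 9 to 13, characterized in that, in the block index, the data items of index data having the same attribute information are divided into at least one segment unit.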
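15. The device according to claim 14, characterized in that
the segment units are linked in a chain to form a segment unit chain, and of any two adjacent segment units in the chain, the preceding segment unit stores access information of the following segment unit; or
the segment units are stored in parallel, and the block index further stores segment information of each segment unit.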
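16. The device according to claim 14 or 15, characterized in that the data processing module comprises at least one of the following:
a second merging unit, configured to merge two adjacent segment units when the total quantity of data items in the two adjacent segment units is less than a fourth threshold;
a second migration unit, configured to, when the data items of index data are divided into only one segment unit and the quantity of data items in the segment unit is less than a fifth threshold, migrate the index data from the block index to a segment index;
a second dividing unit, configured to divide a segment unit into two new segment units when the quantity of data items in the segment unit is greater than a sixth threshold.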
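
The merge and split rules for segment units in claims 8 and 16 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `rebalance` and the parameters `merge_below` and `split_above` (standing in for the fourth and sixth thresholds) are hypothetical, a segment unit is modeled as a plain list of data items, and both rules are applied in one pass even though the claims present them as alternatives. The migration rule tied to the fifth threshold is left out.

```python
def rebalance(segments, merge_below, split_above):
    """Return a new list of segment units after one split pass and one merge pass.

    segments: list of lists of data items for one attribute value.
    merge_below: stands in for the fourth threshold (assumed value).
    split_above: stands in for the sixth threshold (assumed value).
    """
    # Split any segment unit holding more than split_above items into two halves.
    split = []
    for seg in segments:
        if len(seg) > split_above:
            mid = len(seg) // 2
            split.extend([seg[:mid], seg[mid:]])
        else:
            split.append(list(seg))

    # Merge adjacent segment units whose combined item count is below merge_below.
    merged = []
    for seg in split:
        if merged and len(merged[-1]) + len(seg) < merge_below:
            merged[-1] = merged[-1] + seg
        else:
            merged.append(seg)

    # Drop segment units that ended up with no data items.
    return [seg for seg in merged if seg]


units = rebalance([[1, 2], [3], [4, 5, 6, 7, 8, 9]], merge_below=4, split_above=4)
```

The sketch assumes `merge_below` does not exceed `split_above`, so that a merge never produces a unit that immediately needs splitting. A unit more than twice the split threshold would need repeated passes.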

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2011/084609 WO2013097065A1 (zh) 2011-12-26 2011-12-26 Index data processing method and device
CN201180003412.5A CN102725754B (zh) 2011-12-26 2011-12-26 Index data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2011/084609 WO2013097065A1 (zh) 2011-12-26 2011-12-26 Index data processing method and device

Publications (1)

Publication Number Publication Date
WO2013097065A1 (zh) 2013-07-04

Family

ID=46950465

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/084609 WO2013097065A1 (zh) 2011-12-26 2011-12-26 Index data processing method and device

Country Status (2)

Country Link
CN (1) CN102725754B (zh)
WO (1) WO2013097065A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
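CN103970739B (zh) * 2013-01-24 2017-04-26 中兴通讯股份有限公司 Method and apparatus for processing stored information
CN104346347A (zh) * 2013-07-25 2015-02-11 深圳市腾讯计算机系统有限公司 Data storage method, apparatus, server, and system
CN107688567B (zh) * 2016-08-03 2021-02-09 腾讯科技(深圳)有限公司 Index storage method and related apparatus
CN106570093B (zh) * 2016-10-24 2020-03-27 南京中新赛克科技有限责任公司 Method and apparatus for massive data migration based on an independent metadata organization structure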

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4841433A (en) * 1986-11-26 1989-06-20 American Telephone And Telegraph Company, At&T Bell Laboratories Method and apparatus for accessing data from data attribute tables
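CN101055589A (zh) * 2007-05-30 2007-10-17 北京航空航天大学 Storage management method for an embedded database
CN101853283A (zh) * 2010-05-21 2010-10-06 南京邮电大学 Method for constructing a semantic-index peer-to-peer network for multidimensional data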

Also Published As

Publication number Publication date
CN102725754A (zh) 2012-10-10
CN102725754B (zh) 2014-08-13

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase. Ref document number: 201180003412.5; country of ref document: CN
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 11878435; country of ref document: EP; kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 11878435; country of ref document: EP; kind code of ref document: A1
