WO2013097065A1 - Index data processing method and device - Google Patents
- Publication number
- WO2013097065A1 (PCT/CN2011/084609)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- index
- data
- segment
- unit
- segmentation
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2246—Trees, e.g. B+trees
Definitions
- The present invention relates to the field of data processing technologies, and in particular to an index data processing method and device.
Background
- A Key-Value database stores key-value pairs <Key, Value>, and a general Key-Value system only provides Key-based operations.
- To provide structured data retrieval, an adaptation layer needs to be built on top of the Key-Value system.
- The data models contained in the Values of different Keys are tabulated.
- The Value is abstracted into attribute names and attribute values (or columns and values), so that the Values corresponding to different Keys may share the same attribute name, which creates the precondition for searching by attribute name.
- The attribute values can then be indexed.
- Index data can be organized in the form of a B+ tree or a prefix hash tree in the prior art.
- The B+ tree structure is shown in FIG. 1a. The internal nodes store range information; for example, internal node 1 indicates that attribute values less than or equal to 0110 are in the subtree on its left, and attribute values greater than 0110 are in the subtree on its right. The actual index data is saved in the leaf nodes.
- The prefix hash tree builds its data structure on a dictionary tree (trie); the principle is to use the common prefixes of strings to organize the data.
- A prefix hash tree over the character set {0, 1} is shown in FIG. 1b.
- The prefix of a node is the string formed by the characters on the path from the root to that node; intermediate nodes store only their relationship to child nodes, and the leaf nodes store the actual index data.
- In the prior art, a B+ tree or a prefix hash tree must be organized separately to store the index data of each attribute, and each node in the tree corresponds to a key-value pair in the Key-Value database. Therefore, when performing a data operation on the index data, such as locating, adding, or deleting an index data, it is necessary to first determine, from the plurality of B+ trees or prefix hash trees, the tree corresponding to the attribute information of the index data, and then traverse from the upper internal or intermediate nodes until a leaf node is reached.
- Embodiments of the present invention provide an index data processing method and device, which can reduce interaction with a Key-Value system and improve efficiency when processing index data.
- An index data processing method, in which index data of at least one piece of attribute information is stored in an index structure, the index structure includes a header index and at least one block index, and an index of the at least one block index is stored in the header index.
- The index of the at least one block index includes attribute information of the index data stored in the block index, and the method includes:
- receiving a processing instruction for specified index data, where the processing instruction includes attribute information of the specified index data;
- if an index of a block index can be determined according to the header index in the index structure and the attribute information of the specified index data, locating the block index according to the index of the block index, and processing the data items of the index data in the block index that matches the specified index data.
- An index data processing device, in which index data of at least one piece of attribute information is stored in an index structure, the index structure includes a header index and at least one block index, and an index of the at least one block index is stored in the header index.
- The index of the at least one block index includes attribute information of the index data stored in the block index, and the device includes:
- an instruction receiving module, configured to receive a processing instruction for specified index data, where the processing instruction includes attribute information of the specified index data;
- a data processing module, configured to: when an index of a block index can be determined according to the header index in the index structure and the attribute information of the specified index data, locate the block index according to the index of the block index, and process the data items of the index data in the block index that matches the specified index data.
- In the embodiments of the present invention, index data with different attribute information is stored in one index structure, so that data storage is more compact and a single Key-Value access can acquire or update more data. This reduces the interaction between data processing and the Key-Value system and improves the performance of index creation and querying.
- FIG. 1a is a schematic diagram of index data organized in a B+ tree structure in the prior art
- FIG. 1b is a schematic diagram of index data organized in the form of a prefix hash tree in the prior art
- FIG. 2 is a flowchart of a method for processing index data according to an embodiment of the present invention
- FIG. 3 is a schematic diagram of an index structure in the embodiment shown in FIG. 2;
- FIG. 4a is a schematic diagram of an index structure
- FIG. 4b is a flowchart of another method for processing index data according to an embodiment of the present invention.
- FIG. 5 is a flowchart of deleting index data according to an embodiment of the present invention.
- FIG. 6 is a schematic diagram of an index structure in the embodiment shown in FIG. 5;
- FIG. 7 is a schematic structural diagram of a block index in another embodiment of the present invention.
- FIG. 8 is a flowchart of adding index data according to an embodiment of the present invention.
- FIG. 9 is a schematic diagram of an index structure of another embodiment of the present invention.
- FIG. 10 is a schematic structural diagram of an index data processing device according to an embodiment of the present invention.
- FIG. 11 is a schematic structural diagram of another index data processing device according to an embodiment of the present invention.
- FIG. 2 is a flowchart of a method for processing index data according to an embodiment of the present invention.
- In this embodiment, the index data of at least one piece of attribute information is stored in an index structure.
- As shown in FIG. 3, the index structure includes a header index 31 and at least one block index 32. The block index 32 may be used to store index data having a large number of data items. The index 33 of the at least one block index is stored in the header index 31, and the index 33 of the at least one block index includes the attribute information of the index data stored in the block index 32.
- the processing method of the index data may include:
- Step 201 Receive a processing instruction for specifying index data, where the processing instruction includes attribute information of the specified index data.
- The index structure can be stored in a database inside or outside a terminal or server such as a PC.
- The index data may include attribute information such as an attribute name (attr) and an attribute value (value), and further includes specific data items (item), for example <Attrn, Valuen, Itemn>.
- Step 202 If an index of a block index can be determined according to the header index in the index structure and the attribute information of the specified index data, the block index is located according to the index of the block index, and the data items of the index data in the block index that matches the specified index data are processed.
- Specifically, the terminal or server such as a PC searches the attribute information in the index 33 of the block index stored in the header index 31. If it includes the attribute information of the specified index data, the block index 32 can be located, and the data items of the index data in the block index 32 that matches the specified index data can then be processed.
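- The lookup in steps 201 and 202 can be sketched as follows. This is a minimal illustration, not the patented implementation: the class names, the use of in-memory dictionaries in place of a Key-Value database, and the string keys that address block indexes are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Attribute information is modelled as an (attribute name, attribute value) pair.
AttrInfo = Tuple[str, str]


@dataclass
class BlockIndex:
    # Attribute information -> data items of the index data with that information.
    entries: Dict[AttrInfo, List[str]] = field(default_factory=dict)


@dataclass
class IndexStructure:
    # Header index: attribute information -> key of the block index that holds it
    # (this mapping plays the role of the "index of the block index").
    header: Dict[AttrInfo, str] = field(default_factory=dict)
    # Block indexes addressed by a hypothetical storage key.
    blocks: Dict[str, BlockIndex] = field(default_factory=dict)

    def locate(self, attr: AttrInfo) -> Optional[List[str]]:
        """Return the data items matching `attr`, or None if no block index holds it."""
        block_key = self.header.get(attr)  # search the header index first
        if block_key is None:
            return None
        return self.blocks[block_key].entries.get(attr)
```

- Because the header index answers "which block index holds this attribute information" directly, no per-attribute tree has to be traversed before the block index is read.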
- As shown in FIG. 4a, the index structure may include at least one storage unit.
- In addition to the block index 41, the storage unit may include at least one segment index 42, which is used for storing index data with a small number of data items.
- The index 44 of the block index and the index 45 of the segment index are stored in the header index 43.
- The index data is stored in the block index 41 or the segment index 42 according to its number of data items. For example, when the number of data items of the index data is greater than a preset value, the index data may be stored in the block index 41; when it is less than the preset value, the index data may be stored in the segment index 42. The block index 41 and the segment index 42 thus serve to separate the stored index data. Of course, in another embodiment, the index data can also be distributed between the block index and the segment index based on other information, such as the attribute information of the index data.
- The attribute information may include an attribute name, an attribute value, and the like.
- In another embodiment, the index structure may include only the segment index.
- the storage unit in all embodiments of the present invention may be a computer readable storage medium or a database system, which may be a distributed database system.
- the above index structure may be a data structure stored in the above computer readable storage medium or database system.
- Correspondingly, if an index of a segment index can be determined according to the header index and the attribute information of the specified index data, the segment index is located according to the index of the segment index, and the data items of the matching index data in the segment index are processed. Specifically, the terminal or server such as a PC searches the attribute information in the index of the segment index stored in the header index; if it includes the attribute information of the specified index data, the segment index can be located, and the data items of the index data in the segment index that matches the specified index data can then be processed.
- The determining process may be: first search whether the attribute information of the specified index data exists in the index 44 of the block index; if yes, determine that the attribute information of the specified index data belongs to the index 44 of the block index; if not, determine that it belongs to the index 45 of the segment index. Of course, it is also possible to first search the index 45 of the segment index for the attribute information of the specified index data.
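- The placement rule and the determining process described above can be sketched as two small functions. The names and the concrete preset value are hypothetical. The text leaves two cases open, which the sketch resolves by assumption: index data whose item count equals the preset value goes to the segment index, and attribute information found in neither index yields None rather than defaulting to the segment index.

```python
from typing import Container, Optional, Tuple

AttrInfo = Tuple[str, str]

PRESET_VALUE = 4  # hypothetical preset value separating "large" from "small" index data


def choose_store(item_count: int, preset_value: int = PRESET_VALUE) -> str:
    """Decide where index data is stored, based on its number of data items."""
    # Assumption: a count equal to the preset value is treated as "small".
    return "block" if item_count > preset_value else "segment"


def find_owner(attr: AttrInfo,
               block_refs: Container[AttrInfo],
               segment_refs: Container[AttrInfo],
               block_first: bool = True) -> Optional[str]:
    """Determine whether `attr` belongs to the index of the block index or of the segment index."""
    order = [("block", block_refs), ("segment", segment_refs)]
    if not block_first:
        order.reverse()  # the text allows searching the segment side first
    for name, refs in order:
        if attr in refs:
            return name
    return None  # assumption: report a miss explicitly
```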
- In this embodiment, index data with different attribute information is stored in one index structure, so that data storage is more compact and a single Key-Value access can acquire or update more data. This reduces the interaction between data processing and the Key-Value system and improves the performance of index creation and querying.
- Referring to FIG. 4b, which is a flowchart of another method for processing index data according to an embodiment of the present invention.
- The processing method of the index data may include:
- Step 401 Receive a processing instruction for specified index data, where the processing instruction includes attribute information of the specified index data.
- The index structure can be stored in an internal database of a terminal or server such as a PC, or in an external database independent of the terminal or server.
- Step 402 Determine whether the attribute information of the specified index data is located in the index of the segment index or in the index of the block index in the header index.
- The index data may include attribute information such as an attribute name (attr) and an attribute value (value), and further includes specific data items (item), for example <Attrn, Valuen, Itemn>.
- The determining process may be: first search for the attribute information of the specified index data in the index 44 of the block index; if it is found, determine that the attribute information of the specified index data belongs to the index 44 of the block index; if not, determine that it belongs to the index 45 of the segment index. Of course, it is also possible to first search the index 45 of the segment index for the attribute information of the specified index data.
- If the attribute information of the specified index data belongs to the index 44 of the block index, step 403 is performed; if it belongs to the index 45 of the segment index, step 404 is performed.
- Step 403 Perform data processing on the data items of the matching index data in the block index according to the data item of the specified index data.
- The matching index data in the block index has the same attribute information as the specified index data.
- The data processing performed on the data items of the matching index data may be: locating the data item of the specified index data among the data items of the matching index data; or deleting the data item of the specified index data from the data items of the matching index data; or adding the data item of the specified index data to the data items of the matching index data. The processing is not limited to these operations.
- The index data in the processed block index may further be deleted, merged, divided, or migrated.
- Step 404 Perform data processing on the data items of the matching index data in the segment index according to the data item of the specified index data.
- The matching index data in the segment index has the same attribute information as the specified index data.
- The data processing performed on the data items of the matching index data may be: locating the data item of the specified index data among the data items of the matching index data; or deleting the data item of the specified index data from the data items of the matching index data; or adding the data item of the specified index data to the data items of the matching index data. The processing is not limited to these operations.
- The index data in the processed segment index may further be deleted, merged, divided, or migrated.
- After the processing instruction for the specified index data is received, the attribute information in the index of the block index stored in the header index may be searched. If it includes the attribute information of the specified index data, the block index may be located, and the data items of the index data in the block index that matches the specified index data may then be processed, for example located, deleted, or added.
- The embodiment of the invention stores index data with different attribute information in one index structure, so that data storage is more compact and a single Key-Value access can acquire or update more data. This reduces the interaction between data processing and the Key-Value system, thereby improving the performance of index creation and querying.
- Referring to FIG. 5, which is a flowchart of deleting index data according to an embodiment of the present invention.
- In this embodiment, the processing of the index data is described by taking the deletion of a specified index data as an example.
- The index data of the multiple attributes is still stored in the organizational form shown in FIG. 4a. More specifically, as shown in FIG. 6, the segment index 42 includes at least one segment index unit, for example the segment index units 421 and 422, which are stored side by side. Each segment index unit includes a plurality of index data, and the index data may be sorted according to the attribute information. The specific process may be: first, the index data in the segment index 42 is sorted according to the attribute information, and then it is divided into multiple segment index units. Sorting by attribute information can be done by comparing the attribute names and attribute values.
- The specific sorting method is as follows, taking the two attribute information pairs (Attr1, Value1) and (Attr2, Value2) as an example: first compare Attr1 and Attr2; if they are not equal, the order of the two attribute information pairs is the same as the order of Attr1 and Attr2; otherwise, the order of the two pairs is the same as the order of Value1 and Value2.
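- The comparison rule above is an ordinary lexicographic order on (attribute name, attribute value) pairs. A minimal sketch follows; the function names and the fixed `unit_size` used to cut the sorted data into segment index units are assumptions, since the text does not say how unit boundaries are chosen.

```python
from functools import cmp_to_key
from typing import List, Tuple

AttrInfo = Tuple[str, str]
IndexData = Tuple[AttrInfo, List[str]]  # (attribute information, data items)


def compare_attr_info(a: AttrInfo, b: AttrInfo) -> int:
    """Order two pairs by attribute name first, then by attribute value."""
    (attr1, value1), (attr2, value2) = a, b
    if attr1 != attr2:
        return -1 if attr1 < attr2 else 1
    if value1 != value2:
        return -1 if value1 < value2 else 1
    return 0


def split_into_units(index_data: List[IndexData], unit_size: int) -> List[List[IndexData]]:
    """Sort index data by attribute information, then cut it into segment index units."""
    ordered = sorted(
        index_data,
        key=cmp_to_key(lambda x, y: compare_attr_info(x[0], y[0])),
    )
    return [ordered[i:i + unit_size] for i in range(0, len(ordered), unit_size)]
```

- In Python the same order is obtained by comparing the tuples directly; the explicit comparator is kept only to mirror the two-stage rule in the text.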
- The data items of index data having the same attribute information in the block index 41 are divided into at least one segmentation unit, and the segmentation units are linked to form a segmentation unit chain, such as the segmentation unit chains 411 and 412. Of two adjacent segmentation units in a chain, the previous segmentation unit stores access information of the next segmentation unit, such as its key value.
- Alternatively, the at least one segmentation unit into which the data items of index data having the same attribute information are divided may be stored side by side, in which case the block index 41 further stores segmentation information for each segmentation unit, such as the segmentation information 413.
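- The chained layout can be pictured with a plain dictionary standing in for the Key-Value database. The record layout (`items`, `next`) and the key names are hypothetical; the only property taken from the text is that each segmentation unit stores the key of the next one.

```python
from typing import Dict, List, Optional

# A segmentation unit record: its data items and the key of the next unit (or None).
Unit = Dict[str, object]


def read_chain(kv: Dict[str, Unit], head_key: Optional[str]) -> List[str]:
    """Collect all data items of one index data by following a segmentation unit chain."""
    items: List[str] = []
    key = head_key
    while key is not None:
        unit = kv[key]              # one Key-Value access per segmentation unit
        items.extend(unit["items"])  # data items stored in this unit
        key = unit["next"]          # access information of the next unit
    return items
```

- Reading a whole chain costs one access per segmentation unit, so larger units mean fewer round trips to the Key-Value system.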
- The method of deleting the specified index data may include:
- Step 501 Determine whether the attribute information of the specified index data is located in the index of the block index.
- After the terminal or server such as a PC receives the processing instruction for the specified index data, this step may first check whether the attribute information in the index 44 of the block index contains the attribute information (Attrn, Valuen) of the specified index data. If not, the index 45 of the segment index includes the attribute information of the specified index data, that is, the specified index data is stored in the segment index 42, and steps 502 to 506 are performed; if yes, the specified index data is stored in the block index 41, and steps 507 to 509 are performed.
- Step 502 Delete the data item of the specified index data from the data items of the matching index data in the segment index.
- The matching index data has the same attribute information (Attrn, Valuen) as the specified index data. The attribute information (Attrn, Valuen) may first be searched in the index 45 of the segment index to determine the segment index unit in which the matching index data is located, such as the segment index unit 421; after the matching index data is determined in the segment index unit 421, the data item Itemn of the specified index data is deleted from the data items of the matching index data.
- Step 503 Determine whether the number of data items of the matching index data is 0 after the data item is deleted. If the data items of the matching index data were completely identical to the data items of the specified index data, the number of data items of the matching index data is 0 after the deletion, and step 504 needs to be performed.
- Step 504 Delete the matching index data from the segment index unit.
- If the number of data items of the matching index data is 0 after the data item is deleted, the matching index data can be deleted directly from the segment index unit 421. Thereafter, the subsequent steps can also be performed on the segment index unit 421.
- Step 505 If the index data in the segment index unit is empty, delete the segment index unit. After the matching index data is deleted, it is determined whether any index data remains in the segment index unit 421. If not, the segment index unit 421 can be deleted directly, and the index 45 of the segment index is modified. Further, step 506 can also be performed.
- Step 506 If no index data exists in the segment index, the index of the segment index is deleted from the header index.
- After the segment index unit 421 is deleted, it is determined whether any index data remains in the segment index 42. If not, that is, no segment index unit remains, the segment index 42 may be further deleted, and the index 45 of the segment index is deleted from the header index 43 in which it is located.
- Two adjacent segment index units may also be merged into a single segment index unit, and the index of the segment index is modified accordingly.
- The threshold a used for this merging can be set according to the specific situation of the index structure, which is not limited herein.
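- The cascade in steps 502 to 506 can be sketched as follows. The layout is an assumption made for the example: `header["segment_refs"]` stands for the index of the segment index and maps attribute information to the key of a segment index unit, and `units` maps those keys to the unit contents. The merging of adjacent units under the threshold a is left out.

```python
from typing import Dict, List, Tuple

AttrInfo = Tuple[str, str]
SegmentUnit = Dict[AttrInfo, List[str]]


def delete_from_segment_index(header: Dict[str, Dict[AttrInfo, str]],
                              units: Dict[str, SegmentUnit],
                              attr: AttrInfo,
                              item: str) -> bool:
    """Delete one data item and clean up whatever becomes empty as a result."""
    refs = header.get("segment_refs")
    if not refs or attr not in refs:
        return False
    unit_key = refs[attr]
    unit = units[unit_key]
    items = unit[attr]
    if item not in items:
        return False
    items.remove(item)                 # step 502: delete the data item
    if not items:                      # step 503: no data items left?
        del unit[attr]                 # step 504: delete the matching index data
        del refs[attr]                 # keep the index of the segment index in sync
        if not unit:                   # step 505: segment index unit is empty
            del units[unit_key]
        if not units:                  # step 506: segment index holds no index data
            del header["segment_refs"]
    return True
```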
- Step 507 In the block index, delete the data item of the specified index data from the data items of the matching index data.
- If the result of the determination in step 501 is that the attribute information of the specified index data is located in the index 44 of the block index, the specified index data is located in the block index 41. The matching index data in the block index 41, which has the same attribute information as the specified index data, is then determined, and the data item of the specified index data is deleted from the data items of the matching index data.
- Taking the chain connection between the segmentation units in the block index 41 as an example, the segmentation unit chain 411 in which the matching index data is located may first be determined according to the attribute information (Attrn, Valuen) of the specified index data; the segmentation unit in which the data item to be deleted is located, say the segmentation unit 411a, is then determined, and the data item Itemn of the specified index data is deleted from the determined segmentation unit 411a. Step 508 is then performed.
- If the segmentation units in the block index 41 are stored side by side, the segmentation information to which the matching index data belongs, such as the segmentation information 413, may be determined according to the attribute information of the specified index data; the segmentation unit 413a in which the data item to be deleted is located is then determined among the segmentation units corresponding to the determined segmentation information 413, the data item of the specified index data is deleted from the determined segmentation unit 413a, and the segmentation information 413 is updated.
- Step 508 Determine whether the segmentation unit in the block index in which the data item of the specified index data was located is empty.
- If it is empty, step 509 is performed.
- Step 509 Delete the empty segmentation unit from the segmentation unit chain in which it is located.
- The segmentation unit chain can then be re-established: only the access information in the segmentation unit preceding the segmentation unit 411a in the original segmentation unit chain 411 needs to be changed to the access information of the segmentation unit following the segmentation unit 411a.
- If the segmentation units in the block index 41 are stored side by side and the segmentation unit 413a in which the data item was located is empty after the data item Itemn is deleted, the segmentation unit 413a may be deleted directly and its corresponding segmentation information 413 updated.
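- For the chained case, steps 507 to 509 amount to deleting from a singly linked list whose nodes live in the Key-Value store. The sketch below assumes the same hypothetical record layout (`items`, `next`) and returns the head key of the chain, which changes only when the first unit is removed; the index 44 of the block index would then have to point at the new head, which is not modelled here.

```python
from typing import Dict, Optional

Unit = Dict[str, object]


def delete_from_chain(kv: Dict[str, Unit], head_key: Optional[str], item: str) -> Optional[str]:
    """Delete `item` from a segmentation unit chain and unlink the unit if it becomes empty."""
    prev_key, key = None, head_key
    while key is not None:
        unit = kv[key]
        if item in unit["items"]:
            unit["items"].remove(item)           # step 507: delete the data item
            if not unit["items"]:                # step 508: is the unit empty?
                next_key = unit["next"]
                del kv[key]                      # step 509: drop the empty unit
                if prev_key is None:
                    return next_key              # the chain now starts at the next unit
                kv[prev_key]["next"] = next_key  # previous unit points past the removed one
            return head_key
        prev_key, key = key, unit["next"]
    return head_key  # item not found: chain unchanged
```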
- In another embodiment, if the result of step 508 is that the number of data items in the segmentation unit in which the data item Itemn was located is not 0, and the data items of the matching index data in the block index 41 are divided into only one segmentation unit, then, regardless of whether the segmentation units in the block index 41 are stored in a chain or side by side, after the data item Itemn is deleted it can be determined whether the number of data items in that segmentation unit is less than the threshold b (the fifth threshold) and not 0. The threshold b can be set according to the index structure, which is not specifically limited herein.
- If so, the matching index data after the deletion is migrated from the block index 41 to the segment index 42, and the index 45 of the segment index and the index 44 of the block index in the header index 43 are updated. Further, when the matching index data is migrated to the segment index 42, it may be added to an existing segment index unit, or a new segment index unit may be added to store it.
- If the segmentation units in the block index 41 are stored in a chain manner, then, when the condition set by the threshold c (the fourth threshold) is met after the data item is deleted, two adjacent segmentation units are merged, and the access information in the segmentation units of the segmentation unit chain 411 is updated. The threshold c can be set according to the index structure, which is not limited herein.
- If the index structure is the structure shown in FIG. 3, that is, only the block index is included, the attribute information in the index of the block index stored in the header index is searched; if it includes the attribute information of the specified index data, the block index may be located, and the data item of the index data in the block index that matches the specified index data may then be deleted.
- In this embodiment, index data with different attribute information is stored in one index structure, so that data storage is more compact. When an index data deletion operation is performed, interaction with the Key-Value system is therefore reduced, which improves the performance of indexing and querying.
- FIG. 8 is a flowchart of adding index data according to an embodiment of the present invention.
- In this embodiment, the processing of the index data is described by taking the addition of a specified index data as an example.
- the index data of the plurality of attributes is still stored in the organization form shown in FIG. 4a, and the index data in the specific block index 41 and the segment index 42 are still stored in the manner shown in FIGS.
- The method of adding the specified index data may include:
- Step 801 Determine whether the attribute information of the specified index data is located in the index of the block index.
- It may first be checked whether the attribute information in the index 44 of the block index includes the attribute information (Attrn, Valuen) of the specified index data. If not, the index 45 of the segment index includes the attribute information of the specified index data, that is, the specified index data needs to be added to the segment index 42, and steps 802 to 803 are performed; if yes, the index 44 of the block index includes the attribute information of the specified index data, that is, the specified index data needs to be added to the block index 41, and steps 804 to 806 are performed.
- Step 802 Add the data item of the specified index data to the data items of the matching index data in the segment index.
- The matching index data has the same attribute information (Attrn, Valuen) as the specified index data. The attribute information (Attrn, Valuen) may first be searched in the index 45 of the segment index to determine the segment index unit in which the matching index data is located, such as the segment index unit 421; after the matching index data is determined in the segment index unit 421, the data item Itemn of the specified index data is added to the data items of the matching index data.
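- Step 803 Determine whether the number of data items of the matching index data exceeds the threshold d after the data item is added.
- The threshold d (the second threshold) is used to decide whether index data is stored in the segment index or in the block index; its specific value may be set according to the index structure, which is not limited herein.
- If yes, step 8031 is performed; if not, that is, the threshold d is not exceeded, step 8032 is performed.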
- Step 8031 Migrate the matching index data, after the data item is added, from the segment index to the block index. First, the matching index data is deleted from the segment index unit 421 of the segment index 42 and the index 45 of the segment index is modified; then the matching index data with the added data item is added to the block index.
- Specifically, the matching index data with the added data item is first divided into one or more segmentation units according to the preset number of data items per segmentation unit. If the segmentation units in the block index 41 are stored in a chain manner, the chain connection between the segmentation units is established, that is, access information is added to each segmentation unit, the segmentation unit chain is created and stored, and the index 44 of the block index is modified. If the segmentation units in the block index 41 are stored side by side, the divided segmentation units are stored side by side, the segmentation information corresponding to the matching index data is added to the block index 41, and the index 44 of the block index is modified.
- Step 8032 Determine whether the size of the segment index unit in which the matching index data is located exceeds a threshold e after the data item is added, and if yes, divide the segment index unit into two new segment index units.
- The threshold e (the third threshold) is used to bound the size of a segment index unit. If the index data stored in the segment index unit is greater than the threshold e, the segment index unit needs to be divided; otherwise no division is needed.
- The threshold e can be set according to the index structure, and its specific value is not limited herein.
- After the data item is added, if the segment index unit 421 in which the matching index data is located holds too much data and its occupied space exceeds the threshold e, the segment index unit 421 needs to be divided into two new segment index units that replace the original segment index unit 421. After the division, the index 45 of the segment index can be further updated.
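- Steps 802 to 8032 can be sketched on a single segment index unit. Several details are assumptions: the unit is a dictionary from attribute information to data items, the "size" compared against the threshold e is the total number of data items in the unit, a unit is split at the midpoint of its sorted attribute information, and attribute information not yet present in the unit is simply created. Updating the index 45 of the segment index and storing the migrated data in the block index are left to the caller.

```python
from typing import Dict, List, Optional, Tuple

AttrInfo = Tuple[str, str]
SegmentUnit = Dict[AttrInfo, List[str]]


def add_to_segment_unit(unit: SegmentUnit,
                        attr: AttrInfo,
                        item: str,
                        threshold_d: int,
                        threshold_e: int
                        ) -> Tuple[List[SegmentUnit], Optional[Tuple[AttrInfo, List[str]]]]:
    """Add a data item; return (resulting units, index data to migrate to the block index)."""
    unit.setdefault(attr, []).append(item)        # step 802: add the data item
    if len(unit[attr]) > threshold_d:             # step 803: too many items for the segment index?
        migrated = (attr, unit.pop(attr))         # step 8031: hand the index data to the block index
        return [unit], migrated
    size = sum(len(items) for items in unit.values())
    if size > threshold_e and len(unit) > 1:      # step 8032: unit too large, divide it
        keys = sorted(unit)
        mid = len(keys) // 2
        left = {k: unit[k] for k in keys[:mid]}
        right = {k: unit[k] for k in keys[mid:]}
        return [left, right], None
    return [unit], None
```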
- Step 804 In the block index, add the data item of the specified index data to the data items of the matching index data.
- If the result of the determination in step 801 is that the attribute information of the specified index data is located in the index 44 of the block index, the specified index data needs to be added to the block index 41. The matching index data in the block index 41, which has the same attribute information as the specified index data, is then determined, and the data item of the specified index data is added to the data items of the matching index data.
- Taking the chain connection between the segmentation units in the block index 41 as an example, the segmentation unit chain 411 in which the matching index data is located is first determined according to the attribute information (Attrn, Valuen) of the specified index data; the segmentation unit to which the data item is to be added, say the segmentation unit 411a, is then determined, and the data item Itemn of the specified index data is added to the determined segmentation unit 411a. Step 805 is then performed.
- If the segmentation units in the block index 41 are stored side by side, the segmentation information to which the matching index data belongs, such as the segmentation information 413, may be determined according to the attribute information of the specified index data; the segmentation unit 413a to which the data item is to be added is then determined among the segmentation units corresponding to the determined segmentation information 413, the data item of the specified index data is added to the determined segmentation unit 413a, and the segmentation information 413 is updated.
- Step 805 Determine whether the number of data items in the segmentation unit in which the data item of the specified index data is located exceeds the threshold f.
- After the data item Itemn is added, it is determined whether the number of data items in the segmentation unit 411a exceeds the threshold f (the sixth threshold); if yes, step 806 is performed, and if not, the operation of adding the data item ends.
- The threshold f is used to bound the size of a segmentation unit. If the number of data items stored in the segmentation unit is greater than the threshold f, the segmentation unit needs to be divided; otherwise no division is required.
- The threshold f can be set according to the index structure, and its specific value is not limited herein.
- Step 806: Divide the segmentation unit in which the data item of the specified index data is located into two new segmentation units.
- The segmentation unit 411a needs to be divided into two new segmentation units. The two new segmentation units then replace the segmentation unit 411a in the segmentation unit chain 411, and the access information of each segmentation unit in the segmentation unit chain 411 is updated.
- If the segmentation units in the block index 41 are stored side by side and the number of data items in the segmentation unit 413a exceeds the threshold f after the data item Itemn is added, the segmentation unit 413a is divided into two new segmentation units that replace the original segmentation unit 413a. The two new segmentation units are stored side by side, and the segmentation information 413 in the block index 41 is updated.
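- Steps 804 to 806 for the chained case can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `SegmentationUnit`, `add_item`, `chain_contents`, and `THRESHOLD_F` are hypothetical, data items are assumed to be sortable strings, and the value of the threshold f is arbitrary because the text leaves it open. As a simplification, the first half of a divided unit reuses the existing object so that the access information held by its predecessor stays valid.

```python
from dataclasses import dataclass, field
from typing import List, Optional

THRESHOLD_F = 4  # hypothetical value for threshold f (the sixth threshold)


@dataclass
class SegmentationUnit:
    items: List[str] = field(default_factory=list)
    # Access information of the next segmentation unit in the chain.
    next: Optional["SegmentationUnit"] = None


def add_item(head: SegmentationUnit, item: str) -> SegmentationUnit:
    """Add one data item to a segmentation unit chain and split if needed."""
    # Step 804: walk the chain to the unit that should hold the item
    # (assumption: units hold sorted, non-overlapping ranges of items).
    unit = head
    while unit.next is not None and unit.items and unit.items[-1] < item:
        unit = unit.next
    unit.items.append(item)
    unit.items.sort()

    # Step 805: compare the number of data items with threshold f.
    if len(unit.items) > THRESHOLD_F:
        # Step 806: divide the unit in two and relink the chain.
        mid = len(unit.items) // 2
        second = SegmentationUnit(items=unit.items[mid:], next=unit.next)
        unit.items = unit.items[:mid]
        unit.next = second
    return head


def chain_contents(head: SegmentationUnit) -> List[List[str]]:
    """Return the items of every unit, in chain order."""
    out = []
    unit: Optional[SegmentationUnit] = head
    while unit is not None:
        out.append(list(unit.items))
        unit = unit.next
    return out
```

  With `THRESHOLD_F = 4`, a unit is divided only when a fifth item arrives, so most additions touch a single unit.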
- If the index structure is the structure shown in FIG. 3, that is, it includes only the block index, data is added as follows: after the processing instruction for the specified index data is received, the attribute information is looked up in the index of the block index stored in the header index. If the attribute information of the specified index data is included, the block index can be located, and the data item of the specified index data can then be added to the data items of the matching index data in the block index.
- In this embodiment, index data with different attribute information is stored in one index structure, which makes data storage more compact. When index data is added, interaction with the Key-Value system can therefore be reduced, which improves indexing and query performance.
- The following operations may be performed on any segmentation unit or index data in the block index: when the data items in a segmentation unit are empty, the segmentation unit is deleted; when the number of data items in two adjacent segmentation units is less than a fourth threshold, the two adjacent segmentation units are merged; when the data items of index data are divided into only one segmentation unit and the number of data items in that segmentation unit is less than a fifth threshold, the index data is migrated from the block index to the segment index; when the number of data items in a segmentation unit is greater than a sixth threshold, the segmentation unit is divided into two new segmentation units.
- The following operations may be performed on any segment index unit or index data in the segment index: when the number of index data in a segment index unit is 0, the segment index unit is deleted; when the number of index data in two adjacent segment index units is less than a first threshold, the two adjacent segment index units are merged; when the number of data items of index data in a segment index unit exceeds a second threshold, the index data is migrated from the segment index unit in which it is located to the block index; when the size of a segment index unit is greater than a third threshold, the segment index unit is divided into two new segment index units.
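- The segment index maintenance rules above can be sketched as a single pass. This is only an illustration under assumptions that the text does not state: the threshold values are arbitrary, the "size" of a segment index unit is taken to be its total number of data items, the first threshold is compared against the combined number of index data in the two adjacent units, and all names (`maintain`, `unit_size`, `Unit`, and the threshold constants) are hypothetical.

```python
from typing import Dict, List, Tuple

Attr = Tuple[str, str]        # (attribute name, attribute value)
Unit = Dict[Attr, List[str]]  # one segment index unit: index data -> data items

# Hypothetical values; the text does not fix the thresholds.
FIRST_THRESHOLD = 3   # merge adjacent units with fewer index data than this
SECOND_THRESHOLD = 3  # migrate index data with more data items than this
THIRD_THRESHOLD = 6   # divide a unit holding more data items than this


def unit_size(unit: Unit) -> int:
    """Size of a unit, assumed here to be its total number of data items."""
    return sum(len(items) for items in unit.values())


def maintain(units: List[Unit], block_index: Unit) -> List[Unit]:
    """Apply the migrate, delete, merge, and divide rules once, in that order."""
    # Migrate index data that has outgrown the segment index.
    for unit in units:
        for attr in [a for a, items in unit.items() if len(items) > SECOND_THRESHOLD]:
            block_index[attr] = unit.pop(attr)

    # Delete units that no longer hold any index data.
    units = [u for u in units if u]

    # Merge adjacent units whose combined number of index data is small.
    merged: List[Unit] = []
    for unit in units:
        if merged and len(merged[-1]) + len(unit) < FIRST_THRESHOLD:
            merged[-1] = {**merged[-1], **unit}
        else:
            merged.append(unit)

    # Divide units that are too large into two new units.
    result: List[Unit] = []
    for unit in merged:
        if unit_size(unit) > THIRD_THRESHOLD and len(unit) > 1:
            keys = sorted(unit)
            mid = len(keys) // 2
            result.append({k: unit[k] for k in keys[:mid]})
            result.append({k: unit[k] for k in keys[mid:]})
        else:
            result.append(unit)
    return result
```

  The block index rules follow the same pattern with the fourth, fifth, and sixth thresholds, with migration running in the opposite direction.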
- the index structure is stored in a plurality of storage units as shown in FIG. 4a, and the index structure further includes storage information of the plurality of storage units, as shown in FIG.
- Each storage unit 91 may be stored in parallel.
- the index structure may further include storage information 92 of each storage unit.
- The storage information 92 and its lower-level storage units 91 form a cascade structure, and the storage information 92 may be that of each storage unit 91.
- During data processing, the storage information 92 of each storage unit 91 is first searched according to the attribute information of the specified index data to determine the storage unit 91 of the specified index data, and data processing is then performed in the determined storage unit 91 in accordance with the method of the previous embodiments.
- The storage information 92 may be stored side by side and form a cascade structure with the index node 93 at the level above it. The index node 93 may include index information of its lower-level storage information 92, for example the attribute information stored in the storage information 92. The index structure may also be a large-scale cascade structure formed by a plurality of nodes including the storage units 91, the storage information 92, and the index nodes 93.
- Based on this index structure, when data processing such as locating, deleting, or adding data is performed, the storage unit targeted by the data processing is first determined according to the attribute information of the data, and steps similar to those of the foregoing embodiments are then performed on that storage unit. The details are not repeated here.
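- The two-level lookup described above (search the storage information first, then work inside one storage unit) can be sketched as follows. The sketch assumes that the storage information of a storage unit is the smallest attribute information it holds and that storage units cover sorted, non-overlapping ranges; neither detail is given in the text, and the class and method names are hypothetical.

```python
import bisect
from typing import Dict, List, Tuple

Attr = Tuple[str, str]  # (attribute name, attribute value)


class CascadedIndex:
    """Storage units stored side by side plus one entry of storage information each."""

    def __init__(self, units: List[Dict[Attr, List[str]]]) -> None:
        # Assumption: units are non-empty and already ordered by attribute information.
        self.units = units
        # Assumed form of the storage information: smallest attribute per unit.
        self.storage_info: List[Attr] = [min(unit) for unit in units]

    def locate_unit(self, attr: Attr) -> Dict[Attr, List[str]]:
        """Pick the storage unit whose range would contain attr."""
        pos = bisect.bisect_right(self.storage_info, attr) - 1
        return self.units[max(pos, 0)]

    def find(self, attr: Attr) -> List[str]:
        """Return the data items stored for attr, or an empty list."""
        return self.locate_unit(attr).get(attr, [])
```

  Only the selected storage unit is read after the storage information has been consulted, which is the property a cascade structure is meant to provide for large data sets.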
- FIG. 10 is a schematic structural diagram of an index data processing device according to an embodiment of the present invention.
- The index data of at least one piece of attribute information is stored in an index structure, where the index structure includes a header index and at least one block index, and an index of the at least one block index is stored in the header index.
- The above index structure can be stored in a medium such as a PC, or in a database inside or outside a server. Based on the above index structure, the device may include:
- The instruction receiving module 1001 is configured to receive a processing instruction for specified index data, where the processing instruction includes attribute information of the specified index data.
- The data processing module 1002 is configured to: when an index of a block index is determined according to the header index in the index structure and the attribute information of the specified index data, locate the block index according to the index of the block index, and process the data items of the matching index data in the block index.
- That is, the data processing module 1002 locates the position of the specified index data according to the attribute information of the specified index data, and then performs data processing on the data items of the index data that the specified index data matches.
- The data processing may be locating, deleting, or adding data items of the specified index data. For the specific operations, refer to the foregoing embodiments corresponding to FIG. 5 to FIG. 8.
- The index structure may further include at least one segment index. The index of the at least one segment index is stored in the header index, and the index of a segment index is the attribute information of the index data stored in that segment index.
- The data processing module may also be configured to: when the index of a segment index is determined according to the header index in the index structure and the attribute information of the specified index data, locate the segment index according to the index of the segment index, and process the data items of the matching index data in the segment index. The index structure can adopt the form as shown in FIG. 3 or FIG. 4a or FIG.
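- The dispatch performed by the data processing module can be sketched as below. The sketch assumes that the header index maps attribute information directly to the kind of index holding it and models each index as a plain dictionary; the class `IndexStructure`, its methods, and the operation names are hypothetical. How an instruction is handled when the attribute information is absent from the header index is outside this passage, so the sketch simply reports failure in that case.

```python
from typing import Dict, List, Optional, Tuple

Attr = Tuple[str, str]  # (attribute name, attribute value)


class IndexStructure:
    """Header index plus one block index and one segment index (simplified)."""

    def __init__(self) -> None:
        # Header index: attribute information -> kind of index that holds it.
        self.header: Dict[Attr, str] = {}
        self.indexes: Dict[str, Dict[Attr, List[str]]] = {"block": {}, "segment": {}}

    def register(self, attr: Attr, kind: str) -> None:
        """Record in the header index which index holds attr."""
        self.header[attr] = kind
        self.indexes[kind].setdefault(attr, [])

    def process(self, op: str, attr: Attr, item: str) -> bool:
        """Locate, add, or delete one data item of the matching index data."""
        kind: Optional[str] = self.header.get(attr)
        if kind is None:
            return False  # attribute information not found in the header index
        items = self.indexes[kind][attr]
        if op == "locate":
            return item in items
        if op == "add":
            if item not in items:
                items.append(item)
            return True
        if op == "delete":
            if item in items:
                items.remove(item)
                return True
            return False
        raise ValueError(f"unknown operation: {op}")
```

  For example, after `register(("city", "x"), "block")`, a call to `process("add", ("city", "x"), "k1")` is routed to the block index without consulting the segment index.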
- The embodiment of the present invention stores index data having different attribute information in one index structure, which makes data storage more compact. The above modules can therefore acquire or update more data in a single Key-Value access, which reduces the number of interactions with the Key-Value system during data processing and improves the performance of index creation and query.
- FIG. 11 is a schematic structural diagram of another index data processing device according to an embodiment of the present invention.
- In this embodiment, the segment index includes at least one segment index unit, the segment index units are stored side by side, and each segment index unit includes a plurality of index data. In the block index, the data items of index data having the same attribute information are divided into at least one segmentation unit.
- The segmentation units are either chained to form a segmentation unit chain, in which, of any two adjacent segmentation units, the previous segmentation unit stores access information of the next segmentation unit; or the segmentation units are stored side by side, in which case segmentation information of each segmentation unit is also stored in the block index.
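- The two layouts of segmentation units can be compared with a small simulation. Everything here is an assumption made for illustration: the Key-Value system is modeled as an in-memory dictionary that counts reads, access information is modeled as the key of the next unit, segmentation information is modeled as a list of (first item, key) pairs stored under its own key, and all names are hypothetical.

```python
from typing import Any, Dict, List, Optional


class CountingStore:
    """Stand-in for a Key-Value system that counts read accesses."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}
        self.reads = 0

    def get(self, key: str) -> Any:
        self.reads += 1
        return self.data[key]

    def put(self, key: str, value: Any) -> None:
        self.data[key] = value


def build_chain(store: CountingStore, prefix: str, chunks: List[List[str]]) -> None:
    """Chained layout: each unit stores the key of the next unit."""
    for i, chunk in enumerate(chunks):
        nxt = f"{prefix}:{i + 1}" if i + 1 < len(chunks) else None
        store.put(f"{prefix}:{i}", {"items": chunk, "next": nxt})


def find_in_chain(store: CountingStore, head_key: str, item: str) -> bool:
    key: Optional[str] = head_key
    while key is not None:
        unit = store.get(key)
        if item in unit["items"]:
            return True
        key = unit["next"]  # follow the access information
    return False


def build_side_by_side(store: CountingStore, prefix: str, chunks: List[List[str]]) -> None:
    """Side-by-side layout: segmentation information lists each unit."""
    info = []
    for i, chunk in enumerate(chunks):
        store.put(f"{prefix}:{i}", {"items": chunk})
        info.append((chunk[0], f"{prefix}:{i}"))
    store.put(f"{prefix}:info", info)


def find_side_by_side(store: CountingStore, prefix: str, item: str) -> bool:
    target = None
    for first_item, key in store.get(f"{prefix}:info"):
        if first_item <= item:
            target = key  # last unit whose first item is not past the target
        else:
            break
    return target is not None and item in store.get(target)["items"]
```

  In this model a chain lookup reads every unit up to the one holding the item, while the side-by-side layout reads the segmentation information and then exactly one unit.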
- The device may also include an instruction receiving module 1101 and a data processing module 1102, where the instruction receiving module 1101 is similar to the instruction receiving module 1001.
- the data processing module 1102 may further include:
- the first merging unit 11021 is configured to merge the two adjacent segment index units when the number of index data in the two adjacent segment index units is less than the first threshold;
- the first migration unit 11022 is configured to: when the number of data items of the index data in the segment index unit exceeds a second threshold, migrate the index data from the segment index unit in which it is located to the block index;
- the first dividing unit 11023 is configured to divide the segment index unit into two new segment index units when the size of the segment index unit is greater than a third threshold;
- the second merging unit 11024 is configured to merge the two adjacent segmentation units when the number of data items in the two adjacent segmentation units is less than a fourth threshold;
- the second migration unit 11025 is configured to: when the data items of the index data are divided into only one segmentation unit and the number of data items in that segmentation unit is less than a fifth threshold, migrate the index data from the block index to the segment index;
- the second dividing unit 11026 is configured to divide the segmentation unit into two new segmentation units when the number of data items in the segmentation unit is greater than a sixth threshold.
- the data processing module 1102 can include any combination of the above units, and is not specifically limited.
- the data processing module 1102 may further include the following units:
- a positioning unit configured to locate, in the data item of the matching index data, the data item of the specified index data
- a first deleting unit configured to delete, in the data item of the matching index data, the data item of the specified index data
- an adding unit configured to add, in the data item of the matching index data, the data item of the specified index data.
- a second deleting unit configured to delete the segment index unit when the number of index data in the segment index unit is 0;
- a third deleting unit configured to delete the segmentation unit when the data items in the segmentation unit are empty.
- the data processing module can include any combination of the above units at the same time.
- The index structure may be stored in a plurality of storage units that are stored side by side, and the index structure further includes storage information of each storage unit, forming a cascade structure for large-scale data.
- The data processing module may perform, based on the storage unit, steps similar to those of the previous embodiments; the description is not repeated here.
- the embodiments of the present invention can be implemented by means of software plus a necessary general hardware platform.
- The technical solution of the embodiments of the present invention may in essence be embodied in the form of a software product, which may be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disc, and which includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention or in certain parts of the embodiments.
- the application can be described in the general context of computer-executable instructions executed by a computer, such as a program module.
- program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types.
- the present application can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are connected through a communication network.
- program modules can be located in both local and remote computer storage media including storage devices.
- The embodiment of the present invention stores index data having different attribute information in one index structure, which makes data storage more compact. The above modules can therefore acquire or update more data in a single Key-Value access, which reduces the number of interactions with the Key-Value system during data processing and improves the performance of index creation and query.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2011/084609 WO2013097065A1 (zh) | 2011-12-26 | 2011-12-26 | Index data processing method and device |
CN201180003412.5A CN102725754B (zh) | 2011-12-26 | 2011-12-26 | Index data processing method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2011/084609 WO2013097065A1 (zh) | 2011-12-26 | 2011-12-26 | Index data processing method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2013097065A1 true WO2013097065A1 (zh) | 2013-07-04 |
Family
ID=46950465
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2011/084609 WO2013097065A1 (zh) | Index data processing method and device | 2011-12-26 | 2011-12-26 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN102725754B (zh) |
WO (1) | WO2013097065A1 (zh) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
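CN103970739B (zh) * | 2013-01-24 | 2017-04-26 | 中兴通讯股份有限公司 | Method and apparatus for processing stored information |
CN104346347A (zh) * | 2013-07-25 | 2015-02-11 | 深圳市腾讯计算机系统有限公司 | Data storage method, apparatus, server, and system |
CN107688567B (zh) * | 2016-08-03 | 2021-02-09 | 腾讯科技(深圳)有限公司 | Index storage method and related apparatus |
CN106570093B (zh) * | 2016-10-24 | 2020-03-27 | 南京中新赛克科技有限责任公司 | Massive data migration method and apparatus based on an independent metadata organization structure |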
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4841433A (en) * | 1986-11-26 | 1989-06-20 | American Telephone And Telegraph Company, At&T Bell Laboratories | Method and apparatus for accessing data from data attribute tables |
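CN101055589A (zh) * | 2007-05-30 | 2007-10-17 | 北京航空航天大学 | Storage management method for an embedded database |
CN101853283A (zh) * | 2010-05-21 | 2010-10-06 | 南京邮电大学 | Method for constructing a semantic-index peer-to-peer network for multidimensional data |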
-
2011
- 2011-12-26 CN CN201180003412.5A patent/CN102725754B/zh not_active Expired - Fee Related
- 2011-12-26 WO PCT/CN2011/084609 patent/WO2013097065A1/zh active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN102725754A (zh) | 2012-10-10 |
CN102725754B (zh) | 2014-08-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase |
Ref document number: 201180003412.5 Country of ref document: CN |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 11878435 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 11878435 Country of ref document: EP Kind code of ref document: A1 |