CN102725754B - Method and device for processing index data - Google Patents

Publication number
CN102725754B
Authority
CN
China
Prior art keywords
index
data
unit
segmented
segmenting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201180003412.5A
Other languages
Chinese (zh)
Other versions
CN102725754A (en
Inventor
曹俊亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of CN102725754A
Application granted
Publication of CN102725754B
Expired - Fee Related
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22: Indexing; Data structures therefor; Storage structures
    • G06F16/2228: Indexing structures
    • G06F16/2246: Trees, e.g. B+trees

Abstract

Provided is a method for processing index data, wherein index data of at least one piece of attribute information is stored in an index structure; the index structure comprises a head index and at least one block index; the index of the at least one block index is stored in the head index; the index of the at least one block index comprises the attribute information of the index data stored in the block index. The method comprises receiving a processing instruction on designated index data, wherein the processing instruction comprises the attribute information of the designated index data; if the index of a block index can be determined according to the head index in the index structure and the attribute information of the designated index data, then the block index is positioned on the basis of the index of the block index, and a data item, which matches the index data, in the block index is processed.

Description

Method and device for processing index data
Technical field
The present invention relates to the technical field of data processing, and in particular to a method and device for processing index data.
Background
A Key-Value database stores key-value pairs <Key, Value>, and a typical Key-Value system only supports operations based on the Key. For applications that retrieve data by attributes inside the Value, an adaptation layer must be built on top of the Key-Value system to provide structured data retrieval. The adaptation layer presents the data contained in the Values of different Keys as a table model, that is, it abstracts each Value into attribute names and attribute values (also called columns and values). The Values of different Keys may then share the same attribute names, which makes retrieval by attribute name possible. To improve retrieval efficiency, an index can be built on the attribute values.
In the prior art, index data can be organized as a B+ tree or a prefix hash tree. In the B+ tree structure shown in Fig. 1a, an internal node stores range information: for example, internal node 1 indicates that attribute values less than or equal to 0110 are in its left subtree and attribute values greater than 0110 are in its right subtree. The leaf nodes store the actual index data. A prefix hash tree is a data structure built on a dictionary tree (trie); its principle is to organize data by the common prefixes of character strings. Fig. 1b shows a prefix hash tree over the character set consisting of 0 and 1: the prefix of each node is the string formed by the characters on the path from the root to that node, intermediate nodes store only their relationship to child nodes, and leaf nodes store the actual index data.
However, with either the B+ tree or the prefix hash tree organization, a separate B+ tree or prefix hash tree must be built for each attribute that needs an index in order to store the index data of that attribute, and each node in the tree corresponds to one key-value pair in the Key-Value database. Therefore, any operation on index data, such as locating, adding, or deleting a piece of index data, must first identify, among the many B+ trees or prefix hash trees, the tree that corresponds to the attribute information of that index data, and then traverse downward from the upper-level internal or intermediate nodes until a leaf node is reached. As a result, with the prior-art organization of index data, processing index data requires multiple interactions with the Key-Value system, and efficiency is low.
Summary of the invention
Embodiments of the present invention provide a method and device for processing index data that reduce interaction with the Key-Value system when index data is processed and thereby improve efficiency.
To solve the above technical problem, the technical solutions of the embodiments of the present invention are as follows:
A method for processing index data, wherein index data of at least one piece of attribute information is stored in an index structure, the index structure comprises a head index and at least one block index, the index of the at least one block index is stored in the head index, and the index of the at least one block index comprises the attribute information of the index data stored in the block index, the method comprising:
receiving a processing instruction for designated index data, the processing instruction comprising the attribute information of the designated index data; and
if the index of a block index can be determined according to the head index in the index structure and the attribute information of the designated index data, locating the block index according to the index of the block index, and processing the data items in the block index that match the index data.
A device for processing index data, wherein index data of at least one piece of attribute information is stored in an index structure, the index structure comprises a head index and at least one block index, the index of the at least one block index is stored in the head index, and the index of the at least one block index comprises the attribute information of the index data stored in the block index, the device comprising:
an instruction receiving module, configured to receive a processing instruction for designated index data, the processing instruction comprising the attribute information of the designated index data; and
a data processing module, configured to, when the index of a block index can be determined according to the head index in the index structure and the attribute information of the designated index data, locate the block index according to the index of the block index, and process the data items in the block index that match the index data.
By storing index data with different attribute information in one index structure, the embodiments of the present invention make data storage more compact, so that a single Key-Value access can obtain or update more data. This reduces interaction with the Key-Value system during data processing and improves the performance of index building and querying.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention or of the prior art more clearly, the accompanying drawings needed for the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1a is a schematic diagram of index data organized as a B+ tree in the prior art;
Fig. 1b is a schematic diagram of index data organized as a prefix hash tree in the prior art;
Fig. 2 is a flowchart of a method for processing index data according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the index structure in the embodiment shown in Fig. 2;
Fig. 4a is a schematic diagram of an index structure;
Fig. 4b is a flowchart of another method for processing index data according to an embodiment of the present invention;
Fig. 5 is a flowchart of deleting index data according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of the index structure in the embodiment shown in Fig. 5;
Fig. 7 is a schematic structural diagram of a block index in another embodiment of the present invention;
Fig. 8 is a flowchart of adding index data according to an embodiment of the present invention;
Fig. 9 is a schematic diagram of an index structure in another embodiment of the present invention;
Fig. 10 is a schematic structural diagram of a device for processing index data according to an embodiment of the present invention;
Fig. 11 is a schematic structural diagram of another device for processing index data according to an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Fig. 2 is a flowchart of a method for processing index data according to an embodiment of the present invention.
In this embodiment, index data of at least one piece of attribute information is stored in an index structure. As shown in Fig. 3, the index structure comprises a head index 31 and at least one block index 32. The block index 32 can be used to store index data with a larger number of data items. The index of the at least one block index (block index reference) 33 is stored in the head index 31, and the block index reference 33 comprises the attribute information of the index data stored in the block index 32.
Based on the above index structure, when a piece of index data is designated, the method for processing that index data may comprise:
Step 201: receive a processing instruction for designated index data, the processing instruction comprising the attribute information of the designated index data.
All methods in the embodiments of the present invention can be executed by a terminal such as a PC or by a server, and the index structure can be stored in a database inside or outside the terminal or server.
The index data may comprise attribute information such as an attribute name (attr) and an attribute value (value), and also a concrete data item (item), for example <Attrn, Valuen, Itemn>.
Step 202: if the block index reference can be determined according to the head index in the index structure and the attribute information of the designated index data, locate the block index according to the block index reference, and process the data items of the matching index data in the block index.
The terminal or server looks up the attribute information in the block index reference 33 stored in the head index 31. If it contains the attribute information of the designated index data, the block index 32 can be located, and the data items of the index data in the block index 32 that matches the designated index data can then be processed.
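Steps 201 and 202 can be illustrated with a minimal sketch. It assumes, purely for illustration, that the Key-Value store is a Python dict, that the block index reference maps an (attribute name, attribute value) pair to the key of a block index, and that a block index is a dict from attribute information to a list of data items. The names HeadIndex, block_refs, locate_items and the sample keys are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class HeadIndex:
    # Block index references: attribute information (attr, value) -> key of a block index.
    block_refs: dict = field(default_factory=dict)


def locate_items(kv_store, head, attr, value):
    """Step 202 as a lookup: return the data items for (attr, value), or None."""
    block_key = head.block_refs.get((attr, value))
    if block_key is None:
        return None  # no block index reference holds this attribute information
    block_index = kv_store[block_key]  # a single access to the Key-Value store
    return block_index.get((attr, value))


# Hypothetical contents: one block index holding index data for two attribute values.
kv_store = {
    "block:1": {
        ("color", "red"): ["item1", "item2"],
        ("color", "blue"): ["item3"],
    }
}
head = HeadIndex(block_refs={("color", "red"): "block:1", ("color", "blue"): "block:1"})
print(locate_items(kv_store, head, "color", "red"))
```

In this sketch, index data for several attribute values shares one stored value, which is the sense in which a single Key-Value access can return more data than a one-node-per-pair tree layout.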
In another embodiment of the present invention, the index structure may comprise at least one storage unit. As shown in Fig. 4a, in addition to a block index 41, a storage unit may also comprise at least one segment index 42, which stores index data with a smaller number of data items. The head index 43 stores the block index reference 44 and the index of the segment index (segment index reference) 45, and the segment index reference 45 comprises the attribute information of the index data stored in the segment index 42. Index data is distributed between the block index 41 and the segment index 42 according to its number of data items. For example, index data whose number of data items is greater than a preset value can be stored in the block index 41, and index data whose number of data items is less than the preset value can be stored in the segment index 42, so that the block index 41 and the segment index 42 store different index data. In another embodiment, index data can of course also be distributed between the block index and the segment index according to other information, such as the attribute information of the index data. In the embodiments of the present invention, the attribute information may comprise an attribute name, an attribute value, and the like. In yet another embodiment, the index structure may also comprise only the segment index.
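The placement rule based on the number of data items can be sketched as follows. The preset value of 4 is an arbitrary illustration, and the text does not say where index data with exactly the preset number of data items goes; this sketch assumes it goes to the segment index. The names PRESET_ITEM_COUNT and choose_index are hypothetical.

```python
PRESET_ITEM_COUNT = 4  # hypothetical preset value; the patent does not fix a number


def choose_index(item_count, preset=PRESET_ITEM_COUNT):
    """Return which index should hold index data with the given number of data items."""
    # Assumption: a count equal to the preset value is treated as "small".
    return "block" if item_count > preset else "segment"


print(choose_index(10))  # larger than the preset value
print(choose_index(2))   # smaller than the preset value
```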
It should be noted that the storage unit in all embodiments of the present invention may be a computer-readable storage medium or a database system, and the database system may be a distributed database system. The above index structure may be a data structure stored in that computer-readable storage medium or database system.
If the segment index reference can be determined according to the head index in the index structure and the attribute information of the designated index data, the segment index is located according to the segment index reference, and the data items of the matching index data in the segment index are processed. Specifically, the terminal or server looks up the attribute information in the segment index reference stored in the head index. If it contains the attribute information of the designated index data, the segment index can be located, and the data items of the index data in the segment index that matches the designated index data can then be processed.
To determine whether the attribute information of the designated index data is in the segment index reference or in the block index reference, the following process can be used: first check whether the attribute information of the designated index data exists in the block index reference 44; if so, the attribute information belongs to the block index reference 44; if not, it belongs to the segment index reference 45. Alternatively, the segment index reference 45 can be checked first.
By storing index data with different attribute information in one index structure, the embodiments of the present invention make data storage more compact, so that a single Key-Value access can obtain or update more data. This reduces interaction with the Key-Value system during data processing and improves the performance of index building and querying.
Fig. 4b is a flowchart of another method for processing index data according to an embodiment of the present invention.
In this embodiment, based on the index structure shown in Fig. 4a, when a piece of index data is designated, the method for processing that index data may comprise:
Step 401: receive a processing instruction for designated index data, the processing instruction comprising the attribute information of the designated index data.
All methods in the embodiments of the present invention can be executed by a terminal such as a PC or by a server. The index structure can be stored in an internal database of the terminal or server, or in an external database that is independent of the terminal or server.
Step 402: determine whether the attribute information of the designated index data is in the segment index reference or in the block index reference in the head index.
In the embodiments of the present invention, the index data may comprise attribute information such as an attribute name (attr) and an attribute value (value), and also a concrete data item (item), for example <Attrn, Valuen, Itemn>. When designated index data is given, its attribute information is first used to determine whether it belongs to the block index reference 44 or to the segment index reference 45 in the head index 43.
This determination can proceed as follows: first check whether the attribute information of the designated index data exists in the block index reference 44; if so, the attribute information belongs to the block index reference 44; if not, it belongs to the segment index reference 45. Alternatively, the segment index reference 45 can be checked first.
If the attribute information of the designated index data belongs to the block index reference 44, step 403 is performed; if it belongs to the segment index reference 45, step 404 is performed.
Step 403: in the block index, perform data processing on the data items of the matching index data according to the data item of the designated index data.
The matching index data in the block index has the same attribute information as the designated index data.
The data processing performed on the data items of the matching index data may be:
locating the data item of the designated index data among the data items of the matching index data; or
deleting the data item of the designated index data from the data items of the matching index data; or
adding the data item of the designated index data to the data items of the matching index data.
After the above data processing, the index data in the processed block index can further be deleted, merged, split, or migrated.
Step 404: in the segment index, perform data processing on the data items of the matching index data according to the data item of the designated index data.
The matching index data in the segment index has the same attribute information as the designated index data.
The data processing performed on the data items of the matching index data may be:
locating the data item of the designated index data among the data items of the matching index data; or
deleting the data item of the designated index data from the data items of the matching index data; or
adding the data item of the designated index data to the data items of the matching index data.
After the above data processing, the index data in the processed segment index can further be deleted, merged, split, or migrated.
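Steps 402 to 404 can be sketched as a single dispatch function. This is a simplified illustration, assuming the head index is a dict with hypothetical fields block_refs and segment_refs that map attribute information to the key of the stored index, and that each stored index is a dict from attribute information to a list of data items. The function name process and the operation names are likewise assumptions.

```python
def process(head, store, attr_info, item, op):
    """Dispatch a 'locate', 'delete' or 'add' operation for one data item."""
    # Step 402: check the block index reference first, then the segment index reference.
    if attr_info in head["block_refs"]:
        target = store[head["block_refs"][attr_info]]    # step 403
    elif attr_info in head["segment_refs"]:
        target = store[head["segment_refs"][attr_info]]  # step 404
    else:
        return None  # the attribute information is not indexed
    items = target[attr_info]  # data items of the matching index data
    if op == "locate":
        return item in items
    if op == "delete":
        if item in items:
            items.remove(item)
        return items
    if op == "add":
        items.append(item)
        return items
    raise ValueError(f"unknown operation: {op}")


head = {
    "block_refs": {("color", "red"): "block:1"},
    "segment_refs": {("size", "s"): "seg:1"},
}
store = {
    "block:1": {("color", "red"): ["i1", "i2", "i3"]},
    "seg:1": {("size", "s"): ["i9"]},
}
print(process(head, store, ("color", "red"), "i2", "locate"))
```

The follow-up maintenance mentioned above (deleting, merging, splitting, or migrating index data) is left out of this sketch.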
In another embodiment, if the index structure is the one shown in Fig. 3, which comprises only block indexes, index data can be processed as follows: after the processing instruction for the designated index data is received, the attribute information in the block index reference stored in the head index is looked up. If it contains the attribute information of the designated index data, the block index can be located, and the data items of the index data in the block index that matches the designated index data can then be processed, for example located, deleted, or added.
By storing index data with different attribute information in one index structure, the embodiments of the present invention make data storage more compact, so that a single Key-Value access can obtain or update more data. This reduces interaction with the Key-Value system during data processing and improves the performance of index building and querying.
Fig. 5 is a flowchart of deleting index data according to an embodiment of the present invention. In this embodiment, the processing of index data is described using the deletion of a piece of designated index data as an example.
The index data of multiple attributes is still stored in the organization shown in Fig. 4a. More specifically, as shown in Fig. 6, the segment index 42 comprises at least one segment index unit, for example segment index units 421 and 422, which are stored side by side. Each segment index unit contains multiple pieces of index data, which can be sorted by attribute information. In detail, the index data in the segment index 42 is first sorted by attribute information and then divided into multiple segment index units. Sorting by attribute information means comparing attribute names and attribute values. Taking two attribute information pairs (Attr1, Value1) and (Attr2, Value2) as an example: first compare Attr1 and Attr2; if they are not equal, the order of the two pairs is the same as the order of Attr1 and Attr2; otherwise, the order of the two pairs is the same as the order of Value1 and Value2.
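The ordering described above is a lexicographic comparison on (attribute name, attribute value), which Python tuples provide directly. The sketch below sorts index data that way and cuts the sorted list into segment index units. The fixed unit_size is an assumption made for illustration; the text does not state how the split points are chosen. The name split_into_units is hypothetical.

```python
def split_into_units(index_data, unit_size):
    """Sort entries by (attr, value), then cut them into segment index units."""
    # Each entry is (attr, value, items); compare attr first, then value.
    ordered = sorted(index_data, key=lambda entry: (entry[0], entry[1]))
    return [ordered[i:i + unit_size] for i in range(0, len(ordered), unit_size)]


entries = [
    ("size", "m", ["i3"]),
    ("color", "red", ["i1"]),
    ("color", "blue", ["i2"]),
]
units = split_into_units(entries, unit_size=2)
for unit in units:
    print(unit)
```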
In the block index 41, the data items of index data with the same attribute information are divided into at least one segment unit, and the segment units are linked to form a segment unit chain, for example segment unit chains 411 and 412. For two adjacent segment units in a chain, the preceding segment unit stores the access information of the next segment unit, such as its key value.
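A segment unit chain of this kind can be sketched as a linked list stored in a key-value map, where each unit carries the key of its successor. The class name SegmentUnit, the field next_key, and the keys "u1" and "u2" are hypothetical; the sketch assumes the access information is simply the key of the next unit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SegmentUnit:
    items: list
    next_key: Optional[str] = None  # access information of the next segment unit


def walk_chain(kv_store, first_key):
    """Yield every data item in the chain, following next_key from unit to unit."""
    key = first_key
    while key is not None:
        unit = kv_store[key]  # one Key-Value access per segment unit
        yield from unit.items
        key = unit.next_key


kv_store = {
    "u1": SegmentUnit(["i1", "i2"], next_key="u2"),
    "u2": SegmentUnit(["i3"]),
}
print(list(walk_chain(kv_store, "u1")))
```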
In another embodiment, as shown in Fig. 7, the segment units into which the data items of index data with the same attribute information are divided are stored side by side in the block index 41, and the block index 41 also stores the segment information of the segment units, for example segment information 413.
The method for deleting the designated index data may comprise:
Step 501: determine whether the attribute information of the designated index data is in the block index reference.
After the terminal or server receives the processing instruction for the designated index data, this step can first check whether the attribute information in the block index reference 44 includes the attribute information (Attrn, Valuen) of the designated index data. If not, the segment index reference 45 contains the attribute information of the designated index data, that is, the designated index data is stored in the segment index 42, and steps 502 to 506 are performed. If so, the designated index data is stored in the block index 41, and steps 507 to 509 are performed.
Step 502: in the segment index, delete the data item of the designated index data from the data items of the matching index data.
The matching index data has the same attribute information (Attrn, Valuen) as the designated index data. The attribute information (Attrn, Valuen) can first be looked up in the segment index reference 45 to determine the segment index unit that contains the matching index data, for example segment index unit 421. Once the matching index data in segment index unit 421 has been determined, the data item Itemn of the designated index data is deleted from the data items of the matching index data.
Step 503: determine whether the number of data items of the matching index data is 0 after the deletion.
If the data items of the matching index data consist of exactly the data item of the designated index data, the number of data items of the matching index data is 0 after the deletion, and step 504 is performed.
Step 504: delete the matching index data from the segment index unit.
If the number of data items of the matching index data is 0 after the deletion, the matching index data can be deleted directly from segment index unit 421. Step 505 can then be performed on segment index unit 421.
Step 505: if the segment index unit contains no index data, delete the segment index unit.
After the matching index data is deleted, determine whether segment index unit 421 still contains index data. If not, segment index unit 421 can be deleted directly, and the segment index reference 45 is revised. Step 506 can then be performed.
Step 506: if the segment index contains no index data, delete the segment index reference from the head index.
After segment index unit 421 is deleted, determine whether the segment index 42 still contains index data. If not, that is, no segment index unit remains, the segment index 42 can be deleted, and the segment index reference 45 is deleted from the head index 43 that contains it.
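Steps 502 to 506 form a cascade: remove the data item, then remove whatever has become empty. The sketch below is a simplified model, assuming the segment index is a dict from a unit key to a segment index unit, each unit is a dict from attribute information to its data items, and the segment index reference is a dict from attribute information to the unit key. All names, including delete_from_segment_index, are hypothetical.

```python
def delete_from_segment_index(head, segment_index, attr_info, item):
    """Delete one data item and clean up empty structures; return True if deleted."""
    unit_key = head["segment_refs"].get(attr_info)
    if unit_key is None:
        return False  # attribute information is not in the segment index reference
    unit = segment_index[unit_key]
    items = unit[attr_info]
    if item not in items:
        return False
    items.remove(item)                       # step 502
    if not items:                            # step 503: no data items left?
        del unit[attr_info]                  # step 504: drop the matching index data
        del head["segment_refs"][attr_info]  # keep the reference in step with the unit
        if not unit:                         # step 505: unit holds no index data
            del segment_index[unit_key]
    if not segment_index:                    # step 506: segment index is empty
        head["segment_refs"].clear()
    return True


head = {"segment_refs": {("size", "s"): "unit:1", ("size", "m"): "unit:1"}}
segment_index = {"unit:1": {("size", "s"): ["i1"], ("size", "m"): ["i2", "i3"]}}
delete_from_segment_index(head, segment_index, ("size", "s"), "i1")
print(segment_index)
```

After the call shown, the index data for ("size", "s") is gone but unit:1 remains, because it still holds index data for ("size", "m").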
In another embodiment, after the above deletion, if the sum of the numbers of index data in two adjacent segment index units is less than a threshold a (the first threshold), the two adjacent segment index units can be merged into one segment index unit, and the segment index reference can be revised accordingly. The threshold a can be set according to the specifics of the index structure and is not limited here.
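The merge condition can be sketched as a single left-to-right pass over the units. The text only describes merging one pair of adjacent units; this sketch goes a step further and keeps merging while the condition holds, which is an assumption. Units are modelled as plain lists, and the name merge_adjacent_units is hypothetical.

```python
def merge_adjacent_units(units, threshold_a):
    """Merge neighbouring units whose combined size is below threshold_a."""
    merged = []
    for unit in units:
        if merged and len(merged[-1]) + len(unit) < threshold_a:
            merged[-1] = merged[-1] + list(unit)  # fold this unit into its left neighbour
        else:
            merged.append(list(unit))
    return merged


units_before = [["a"], ["b"], ["c", "d", "e"]]
print(merge_adjacent_units(units_before, threshold_a=3))
```

A real implementation would also update the segment index reference so that it points at the merged unit.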
Step 507: in the block index, delete the data item of the designated index data from the data items of the matching index data.
If the result of step 501 is that the attribute information of the designated index data is in the block index reference 44, the designated index data is in the block index 41. The matching index data in the block index 41 that has the same attribute information as the designated index data is then determined, and the data item of the designated index data is deleted from the data items of the matching index data.
This embodiment is described using chained segment units in the block index 41 as an example. As shown in Fig. 6, the segment unit chain 411 that contains the matching index data can first be determined according to the attribute information (Attrn, Valuen) of the designated index data. The segment unit that contains the data item to be deleted is then determined, assumed here to be segment unit 411a, and the data item Itemn of the designated index data is deleted from segment unit 411a. Step 508 is then performed.
In another embodiment, if the segment units in the block index 41 are stored side by side, as shown in Fig. 7, the segment information to which the matching index data belongs can first be determined according to the attribute information of the designated index data, assumed here to be segment information 413. Among the segment units corresponding to segment information 413, the segment unit 413a that contains the data item to be deleted is then determined, the data item of the designated index data is deleted from segment unit 413a, and segment information 413 is updated.
Step 508: determine whether the segment unit in the block index that contained the data item of the designated index data is empty.
After the data item Itemn is deleted from segment unit 411a, if no other data item remains in segment unit 411a, step 509 is performed.
Step 509: delete the empty segment unit from the segment unit chain that contains it.
After segment unit 411a is deleted, the segment unit chain can be re-established: the access information stored in the segment unit that preceded segment unit 411a in the original chain 411 only needs to be changed to the access information of the segment unit that followed segment unit 411a.
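Step 509 amounts to unlinking a node from a singly linked list. The sketch below assumes each segment unit is a dict with hypothetical fields "items" and "next" (the key of the next unit, or None), and that the unit to remove is known to be in the chain. The function name remove_empty_unit is also an assumption.

```python
def remove_empty_unit(kv_store, first_key, empty_key):
    """Unlink empty_key from the chain; return the key of the chain's first unit."""
    next_key = kv_store[empty_key]["next"]
    if first_key == empty_key:
        del kv_store[empty_key]
        return next_key  # the chain now starts at the following unit
    key = first_key
    while kv_store[key]["next"] != empty_key:  # find the preceding unit
        key = kv_store[key]["next"]
    kv_store[key]["next"] = next_key  # predecessor now points past the removed unit
    del kv_store[empty_key]
    return first_key


chain = {
    "u1": {"items": ["i1"], "next": "u2"},
    "u2": {"items": [], "next": "u3"},
    "u3": {"items": ["i3"], "next": None},
}
first = remove_empty_unit(chain, "u1", "u2")
print(first, chain)
```

Only the predecessor is rewritten, which matches the observation above that a single access information field needs to change.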
In another embodiment, if the segment units in the block index 41 are stored side by side and the segment unit 413a that contained the data item is empty after the data item Itemn is deleted, segment unit 413a can be deleted directly and its corresponding segment information 413 updated.
After step 508, in another embodiment, consider the case in which the number of data items in the segment unit that contained the data item Itemn is not 0 and the data items of the matching index data in the block index 41 are divided into only one segment unit. Regardless of whether the segment units in the block index 41 are stored as a chain or side by side, after the data item Itemn is deleted it can be determined whether the number of data items in that segment unit is less than a threshold b (the fifth threshold) and not 0. The threshold b can be set according to the index structure and is not specifically limited here. If so, the matching index data is migrated from the block index 41 to the segment index 42, and the segment index reference 45 and the block index reference 44 in the head index 43 are updated. When the matching index data is migrated to the segment index 42, it can be added to an existing segment index unit, or a new segment index unit can be created to store it.
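The migration check can be sketched as follows. The sketch assumes the block index maps attribute information to a list of segment units (each a list of data items), the segment index maps attribute information directly to data items, and the two references in the head index are modelled as sets of attribute information. The choice of segment index unit on the receiving side is omitted. All names, including maybe_migrate, are hypothetical.

```python
def maybe_migrate(head, block_index, segment_index, attr_info, threshold_b):
    """Move index data to the segment index if its only unit has become small."""
    units = block_index.get(attr_info)
    if units is None or len(units) != 1:
        return False  # applies only when the data items sit in a single segment unit
    count = len(units[0])
    if not (0 < count < threshold_b):
        return False
    segment_index[attr_info] = units[0]    # migrate the matching index data
    del block_index[attr_info]
    head["block_refs"].discard(attr_info)  # update the block index reference
    head["segment_refs"].add(attr_info)    # update the segment index reference
    return True


head = {"block_refs": {("color", "red")}, "segment_refs": set()}
block_index = {("color", "red"): [["i1"]]}
segment_index = {}
print(maybe_migrate(head, block_index, segment_index, ("color", "red"), threshold_b=3))
```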
After step 508, in another embodiment, consider the case in which the number of data items in the segment unit that contained the data item Itemn is not 0. If the segment units in the block index 41 are stored as a chain, then after the data item is deleted, if the sum of the numbers of data items in two adjacent segment units of the segment unit chain 411 is less than a threshold c (the fourth threshold), the two adjacent segment units are merged, and the access information in the segment units of the chain 411 can be updated accordingly. If the segment units in the block index 41 are stored side by side, then after the data item is deleted, if the sum of the numbers of data items in two adjacent segment units is less than the threshold c, the two adjacent segment units are merged and the corresponding segment information 413 is updated. In both cases the threshold c can be set according to the index structure and is not specifically limited.
In another embodiment, if the index structure is the one shown in Fig. 3, which includes only piece indexes, then when index data is deleted, after the processing instruction for the specified index data is received, the attribute information in the index of the piece index stored in the master index can be searched. If it includes the attribute information of the specified index data, the piece index can be located, and the data item of the index data in that piece index that matches the specified index data can then be deleted.
By storing index data with different attribute information in one index structure, the embodiment of the present invention makes data storage more compact, so that deleting index data requires fewer interactions with the Key-Value system, which improves the performance of index building and querying.
Referring to Fig. 8, a flowchart of a method for adding index data according to an embodiment of the present invention.
In this embodiment, the processing of index data is described by taking the addition of one specified index data as an example.
The index data of multiple attributes is still stored in the organization shown in Fig. 4a, and the index data in piece index 41 and segmented index 42 is still stored in the manner shown in Figs. 6 and 7.
The method of adding the specified index data can comprise:
Step 801: judge whether the attribute information of the specified index data is located in the index of the piece index.
In this step, it can first be checked whether the attribute information in the index 44 of the piece index includes the attribute information (Attrn, Valuen) of the specified index data. If not, the attribute information of the specified index data is included in the index 45 of the segmented index, so the specified index data needs to be added to segmented index 42, and steps 802 to 803 are performed. If so, the attribute information is included in the index 44 of the piece index, so the specified index data needs to be added to piece index 41, and steps 804 to 806 are performed.
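The routing decision of step 801 can be sketched as below. The function name and the use of Python sets for the indexes 44 and 45 are assumptions made for illustration. The final `None` branch, for an attribute that appears in neither index, is also an addition of this sketch and is not covered by the step as described.

```python
def route_add(piece_index_keys, segmented_index_keys, attr):
    """Decide which sub-index receives a new data item (step 801).

    piece_index_keys:     set of attribute information held in index 44
    segmented_index_keys: set of attribute information held in index 45
    """
    if attr in piece_index_keys:
        return "piece"        # continue with steps 804 to 806
    if attr in segmented_index_keys:
        return "segmented"    # continue with steps 802 to 803
    return None               # attribute not indexed yet (outside this step)


index_44 = {("city", "Paris")}
index_45 = {("city", "Oslo")}
target_a = route_add(index_44, index_45, ("city", "Paris"))
target_b = route_add(index_44, index_45, ("city", "Oslo"))
```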
Step 802: add the data item of the specified index data to the data items of the matching index data in the segmented index.
The matching index data has the same attribute information (Attrn, Valuen) as the specified index data. The attribute information (Attrn, Valuen) can first be looked up in the index 45 of the segmented index to determine the segmented index unit containing the matching index data, for example segmented index unit 421. Once the matching index data in segmented index unit 421 has been determined, the data item Itemn of the specified index data is added to the data items of the matching index data.
Step 803: judge whether the number of data items of the matching index data, after the data item is added, is greater than threshold d.
Threshold d (the second threshold) is used to decide whether index data is suited to storage in the segmented index or in the piece index. Its value can be set according to the index structure and is not specifically limited.
If the number of data items in the matching index data exceeds threshold d after the data item is added, step 8031 is performed; otherwise, step 8032 is performed.
Step 8031: migrate the matching index data, with the data item added, from the segmented index to the piece index, and update the index of the segmented index and the index of the piece index in the master index.
First, the matching index data is deleted from segmented index unit 421 of segmented index 42 and the index 45 of the segmented index is modified; then the matching index data, with the data item added, is added to the piece index.
When adding to piece index 41, the matching index data is first divided into one or more segmenting units according to the preset number of data items per segmenting unit. If the segmenting units in piece index 41 are stored in chained form, chained connections are established between these segmenting units, the visit information of each segmenting unit is added, a segmenting unit chain is created and stored, and the index 44 of the piece index is modified. If the segmenting units in piece index 41 are stored side by side, the segmenting units obtained by the division are stored side by side, segment information corresponding to the matching index data is created in piece index 41, and the index 44 of the piece index is modified.
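Step 8031 can be sketched as follows, assuming a hypothetical layout in which a segmented index unit maps attribute information to a list of data items and the piece index maps attribute information to a list of segmenting units. The name `promote` and the parameter `unit_size` (the preset number of data items per segmenting unit) are illustrative. Chain links and segment information are not modelled.

```python
def promote(segmented_unit, piece_index, attr, unit_size):
    """Migrate one attribute's data items from a segmented index unit to the piece index.

    The data items are cut into segmenting units of at most unit_size items.
    Removing the key from segmented_unit stands in for updating index 45;
    adding the key to piece_index stands in for updating index 44.
    """
    items = segmented_unit.pop(attr)
    piece_index[attr] = [
        items[i:i + unit_size] for i in range(0, len(items), unit_size)
    ]
    return piece_index[attr]


seg_unit = {"k": [1, 2, 3, 4, 5], "j": [9]}
piece = {}
units = promote(seg_unit, piece, "k", unit_size=2)
```

After the call, the five data items of "k" occupy three segmenting units in the piece index and the segmented index unit holds only "j".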
Step 8032: judge whether the size of the segmented index unit containing the matching index data, after the data item is added, is greater than threshold e; if so, divide the segmented index unit into two new segmented index units.
Threshold e (the third threshold) is used to bound the size of a segmented index unit: if the index data stored in the segmented index unit is larger than threshold e, the segmented index unit needs to be divided; otherwise no division is needed. Threshold e can be set according to the index structure, and its value is not limited here.
After the data item is added, if segmented index unit 421, which contains the matching index data, holds too much data and occupies more space than threshold e, segmented index unit 421 needs to be divided into two new segmented index units that replace the original segmented index unit 421. After the division, the index 45 of the segmented index can further be updated.
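A minimal sketch of step 8032 under stated assumptions: a segmented index unit is modelled as a mapping from attribute information to data items, and its size is measured as the total number of data items rather than bytes. The source leaves the size measure open, so both the measure and the choice of splitting at the middle key are assumptions of this sketch.

```python
def split_segmented_unit(unit, threshold_e):
    """Divide a segmented index unit in two when its size exceeds threshold_e.

    unit: dict, attribute information -> list of data items.
    Size is taken as the total number of data items (an assumption).
    Returns a list containing either the original unit or two new units.
    """
    size = sum(len(items) for items in unit.values())
    if size <= threshold_e or len(unit) < 2:
        return [unit]                      # small enough, or cannot be divided
    keys = sorted(unit)                    # keep attributes in a stable order
    mid = len(keys) // 2
    left = {k: unit[k] for k in keys[:mid]}
    right = {k: unit[k] for k in keys[mid:]}
    return [left, right]


unit_421 = {"a": [1, 2], "b": [3], "c": [4, 5, 6]}
parts = split_segmented_unit(unit_421, threshold_e=4)
```

The two returned units would replace the original unit, after which the index 45 of the segmented index would be updated to point at them.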
Step 804: add the data item of the specified index data to the data items of the matching index data in the piece index.
If the result of step 801 is that the attribute information of the specified index data is located in the index 44 of the piece index, the specified index data needs to be added to piece index 41. The matching index data in piece index 41, which has the same attribute information as the specified index data, is then determined, and the data item of the specified index data is added to the data items of the matching index data.
This embodiment is described taking chained connection between the segmenting units in piece index 41 as an example. As shown in Fig. 6, the segmenting unit chain 411 containing the matching index data is first determined according to the attribute information (Attrn, Valuen) of the specified index data; then the segmenting unit in which the data item is to be added is determined, assumed here to be segmenting unit 411a; and the data item Itemn of the specified index data is added to segmenting unit 411a. Step 805 is then performed.
In another embodiment, if the segmenting units in piece index 41 are stored side by side, as shown in Fig. 7, the segment information to which the matching index data belongs, assumed here to be segment information 413, can first be determined according to the attribute information of the specified index data. Among the segmenting units corresponding to segment information 413, the segmenting unit 413a in which the data item is to be added is then determined, the data item of the specified index data is added to segmenting unit 413a, and segment information 413 is updated.
Step 805: judge whether the number of data items in the segmenting unit containing the data item of the specified index data exceeds threshold f.
After data item Itemn is added to segmenting unit 411a, it is judged whether the number of data items in segmenting unit 411a exceeds threshold f (the sixth threshold). If so, step 806 is performed; if not, the operation of adding the data item ends.
Threshold f is used to bound the size of a segmenting unit: if the number of data items stored in the segmenting unit is greater than threshold f, the segmenting unit needs to be divided; otherwise no division is needed. Threshold f can be set according to the index structure, and its value is not limited here.
Step 806: divide the segmenting unit containing the data item of the specified index data into two new segmenting units, and update the visit information between the segmenting units in the segmenting unit chain concerned.
In this step, segmenting unit 411a is divided into two new segmenting units, the two new segmenting units replace segmenting unit 411a in segmenting unit chain 411, and the visit information of each segmenting unit in segmenting unit chain 411 is updated.
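Steps 804 to 806 for the chained layout can be sketched as a singly linked list. The class `SegmentingUnit`, its `next` field (standing in for the visit information of the next segmenting unit) and the helper names are hypothetical. As a simplification, the sketch keeps the original object as the first half instead of creating two entirely new units.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SegmentingUnit:
    items: List[int] = field(default_factory=list)
    next: Optional["SegmentingUnit"] = None   # stands in for the visit information


def insert_and_split(unit, item, threshold_f):
    """Add item to unit (step 804); split the unit if it now exceeds threshold_f."""
    unit.items.append(item)
    if len(unit.items) <= threshold_f:        # step 805: no split needed
        return
    mid = len(unit.items) // 2                # step 806: divide into two halves
    tail = SegmentingUnit(unit.items[mid:], unit.next)
    unit.items = unit.items[:mid]
    unit.next = tail                          # relink the chain


def chain_to_lists(head):
    """Walk the chain from head and return the items of each unit."""
    out = []
    while head is not None:
        out.append(list(head.items))
        head = head.next
    return out


last = SegmentingUnit([7, 8])
head = SegmentingUnit([1, 2, 3], last)
insert_and_split(head, 4, threshold_f=3)
layout = chain_to_lists(head)
```

Only the link of the divided unit changes; units later in the chain are untouched, which is why a chained layout needs no separate segment information.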
In another embodiment, if the segmenting units in piece index 41 are stored side by side and the number of data items in segmenting unit 413a exceeds threshold f after data item Itemn is added, segmenting unit 413a is divided into two new segmenting units that replace the original segmenting unit 413a. The two new segmenting units are stored side by side, and the segment information 413 in piece index 41 is updated.
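For the side-by-side layout, the same division can be sketched with a flat list of segmenting units and a parallel list standing in for the segment information. The source does not say what the segment information contains; this sketch assumes, purely for illustration, that it records the number of data items in each segmenting unit.

```python
def split_parallel(units, segment_info, idx, threshold_f):
    """Divide units[idx] in two if it holds more than threshold_f data items.

    units:        list of segmenting units stored side by side
    segment_info: list with one entry per unit (here: its item count, an assumption)
    Both lists are updated in place so they stay aligned.
    """
    unit = units[idx]
    if len(unit) <= threshold_f:
        return False
    mid = len(unit) // 2
    units[idx:idx + 1] = [unit[:mid], unit[mid:]]
    segment_info[idx:idx + 1] = [mid, len(unit) - mid]
    return True


units_413 = [[1, 2, 3, 4], [9]]
info_413 = [4, 1]
changed = split_parallel(units_413, info_413, 0, threshold_f=3)
```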
In another embodiment, if the index structure is the one shown in Fig. 3, which includes only piece indexes, then when index data is added, after the processing instruction for the specified index data is received, the attribute information in the index of the piece index stored in the master index can be searched. If it includes the attribute information of the specified index data, the piece index can be located, and the data item of the specified index data can then be added to the index data in that piece index that matches the specified index data.
By storing index data with different attribute information in one index structure, the embodiment of the present invention makes data storage more compact, so that adding index data requires fewer interactions with the Key-Value system, which improves the performance of index building and querying.
In the embodiment of the present invention, based on the above index structure, after data processing is carried out, the following operations can be performed on any segmenting unit or index data in the piece index: when a segmenting unit contains no data items, delete the segmenting unit; when the sum of the numbers of data items in two adjacent segmenting units is less than the fourth threshold, merge the two adjacent segmenting units; when the data items of index data are divided into only one segmenting unit and the number of data items in that segmenting unit is less than the fifth threshold, migrate the index data from the piece index to the segmented index; when the number of data items in a segmenting unit is greater than the sixth threshold, divide the segmenting unit into two new segmenting units.
In the embodiment of the present invention, based on the above index structure, after data processing is carried out, the following operations can be performed on any segmented index unit or index data in the segmented index: when a segmented index unit contains no index data, delete the segmented index unit; when the sum of the numbers of index data in two adjacent segmented index units is less than the first threshold, merge the two adjacent segmented index units; when the number of data items of index data in a segmented index unit exceeds the second threshold, migrate the index data from the segmented index unit containing it to the piece index; when the size of a segmented index unit is greater than the third threshold, divide the segmented index unit into two new segmented index units.
In another embodiment of the present invention, the index structure shown in Fig. 4a is stored in a plurality of storage units, and the index structure also includes the storage information of the plurality of storage units. Specifically, as shown in Fig. 9, the storage units 91 can be stored side by side, and the index structure can also include the storage information 92 of each storage unit. The storage information 92 and its subordinate storage units 91 form a cascade structure. The storage information 92 can be, for example, the attribute information of the data stored in each storage unit. It is used when determining the position of specified index data: the storage information 92 of each storage unit 91 is first searched according to the attribute information of the specified index data to determine the storage unit 91 containing the specified index data, and data processing is then carried out in the determined storage unit 91 according to the methods of the preceding embodiments. When there are multiple pieces of storage information 92, they can be stored side by side and form a cascade structure with an index node 93 at the next level up. The index node 93 can include index information for its subordinate storage information 92, such as the attribute information in the storage information 92. The index structure can also be a large-scale cascade structure formed by multiple levels of nodes comprising storage units 91, storage information 92 and index nodes 93. Based on this index structure, when data processing such as locating, deleting or adding data is carried out, once the storage unit targeted by the data processing has been determined according to the attribute information of the data, steps similar to those of the preceding embodiments can be performed on that storage unit; they are not described again here.
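The cascaded lookup can be sketched with two levels. The layout is hypothetical: an index node is modelled as a list of pairs (set of attribute information, storage information), and each storage information as a list of pairs (set of attribute information, storage unit). The source only says that these levels carry attribute information; the set representation is an assumption.

```python
def locate_storage_unit(index_node, attr):
    """Descend from an index node through storage information to a storage unit.

    index_node: list of (attribute set, storage_info) pairs
    storage_info: list of (attribute set, storage_unit) pairs
    Returns the storage unit holding attr, or None if no level lists it.
    """
    for info_attrs, storage_info in index_node:
        if attr not in info_attrs:
            continue                      # skip whole branches that cannot match
        for unit_attrs, unit in storage_info:
            if attr in unit_attrs:
                return unit
    return None


unit_a = {"a": [1]}
unit_b = {"b": [2]}
unit_c = {"c": [3]}
info_1 = [({"a"}, unit_a), ({"b"}, unit_b)]
info_2 = [({"c"}, unit_c)]
node_93 = [({"a", "b"}, info_1), ({"c"}, info_2)]
found = locate_storage_unit(node_93, "b")
```

Once the storage unit is found, the add, delete and locate steps of the earlier embodiments would run inside it unchanged.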
The above describes the method embodiments of the present invention. The device that implements the above method is introduced below.
Referring to Fig. 10, a schematic structural diagram of an index data processing device according to an embodiment of the present invention.
In this embodiment, index data of at least one attribute information is stored in an index structure. The index structure comprises a master index and at least one piece index; the index of the at least one piece index is stored in the master index, and the index of the at least one piece index comprises the attribute information of the index data stored in the piece index. The index structure can be stored in a database inside or outside a terminal such as a PC or a server. Based on this index structure, the device can comprise:
Instruction receiving module 1001, configured to receive a processing instruction for specified index data, the processing instruction comprising the attribute information of the specified index data.
Data processing module 1002, configured to, when the index of a piece index can be determined according to the master index of the index structure and the attribute information of the specified index data, locate the piece index according to the index of the piece index, and process the data items of the matching index data in the piece index.
After instruction receiving module 1001 receives the instruction for the specified index data, data processing module 1002 locates the position of the specified index data according to its attribute information, and then processes the data items of the index data that matches the specified index data. The data processing can be locating, deleting or adding the data item of the specified index data; for the specific operations, refer to the embodiments corresponding to Figs. 5 to 8.
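The division of work between the two modules can be sketched as two small classes. The class names, the instruction format (a dictionary with the keys `op`, `attr` and `item`) and the flat dictionary layout of the index structure are all assumptions made for illustration. Thresholds, segmenting units and migration are left out, and adding to an attribute that is not yet indexed is simply reported as a failure.

```python
class DataProcessingModule:
    """Locates the matching index data by attribute and processes its data items."""

    def __init__(self, index_structure):
        # index_structure: {"piece": {attr: [items]}, "segmented": {attr: [items]}}
        self.index = index_structure

    def process(self, op, attr, item):
        for name in ("piece", "segmented"):
            items = self.index[name].get(attr)
            if items is None:
                continue                  # attribute not in this sub-index
            if op == "locate":
                return item in items
            if op == "add":
                items.append(item)
                return True
            if op == "delete":
                if item in items:
                    items.remove(item)
                    return True
                return False
            raise ValueError("unknown operation: " + op)
        return False                      # attribute not indexed anywhere


class InstructionReceivingModule:
    """Receives a processing instruction and hands it to the data processing module."""

    def __init__(self, processor):
        self.processor = processor

    def receive(self, instruction):
        return self.processor.process(
            instruction["op"], instruction["attr"], instruction["item"]
        )


structure = {"piece": {"a": [1, 2]}, "segmented": {"b": [3]}}
device = InstructionReceivingModule(DataProcessingModule(structure))
added = device.receive({"op": "add", "attr": "b", "item": 4})
located = device.receive({"op": "locate", "attr": "a", "item": 2})
deleted = device.receive({"op": "delete", "attr": "a", "item": 1})
```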
In another embodiment, the index structure can also comprise at least one segmented index. The index of the at least one segmented index is stored in the master index and comprises the attribute information of the index data stored in the segmented index. The data processing module can also be configured to, when the index of a segmented index can be determined according to the master index of the index structure and the attribute information of the specified index data, locate the segmented index according to the index of the segmented index, and process the data items of the matching index data in the segmented index. The index structure can take the form shown in, for example, Fig. 3, Fig. 4a or Fig. 9.
By storing index data with different attribute information in one index structure, the embodiment of the present invention makes data storage more compact, so that the above modules can obtain or update more data in a single Key-Value access. This reduces the interactions with the Key-Value system during data processing and thereby improves the performance of index building and querying.
Referring to Fig. 11, a schematic structural diagram of another index data processing device according to an embodiment of the present invention.
In the index structure of this embodiment, the segmented index comprises at least one segmented index unit, the segmented index units are stored side by side, and each segmented index unit comprises a plurality of index data. In the piece index, the data items of index data having the same attribute information are divided into at least one segmenting unit. The segmenting units are either connected in chained form to form a segmenting unit chain, in which, of two adjacent segmenting units, the preceding one stores the visit information of the next; or stored side by side, in which case the piece index also stores the segment information of each segmenting unit.
The device can likewise comprise an instruction receiving module 1101 and a data processing module 1102, where instruction receiving module 1101 is similar to instruction receiving module 1001.
In this embodiment, data processing module 1102 can further comprise:
First merging unit 11021, configured to merge two adjacent segmented index units when the sum of the numbers of index data in the two adjacent segmented index units is less than the first threshold;
First migration unit 11022, configured to migrate index data from the segmented index unit containing it to the piece index when the number of data items of that index data in the segmented index unit exceeds the second threshold;
First dividing unit 11023, configured to divide a segmented index unit into two new segmented index units when the size of the segmented index unit is greater than the third threshold;
Second merging unit 11024, configured to merge two adjacent segmenting units when the sum of the numbers of data items in the two adjacent segmenting units is less than the fourth threshold;
Second migration unit 11025, configured to migrate index data from the piece index to the segmented index when the data items of the index data are divided into only one segmenting unit and the number of data items in that segmenting unit is less than the fifth threshold;
Second dividing unit 11026, configured to divide a segmenting unit into two new segmenting units when the number of data items in the segmenting unit is greater than the sixth threshold.
In this embodiment, data processing module 1102 can comprise any combination of the above units, which is not specifically limited.
In another embodiment of the present invention, data processing module 1102 can further comprise the following units:
Locating unit, configured to locate the data item of the specified index data among the data items of the matching index data;
First deleting unit, configured to delete the data item of the specified index data from the data items of the matching index data;
Adding unit, configured to add the data item of the specified index data to the data items of the matching index data;
Second deleting unit, configured to delete a segmented index unit when the segmented index unit contains no index data;
Third deleting unit, configured to delete a segmenting unit when the segmenting unit contains no data items.
The data processing module can comprise any combination of the above units at the same time.
In another embodiment, if the index structure is stored in a plurality of storage units, the storage units can be stored side by side, and the index structure also includes the storage information of each storage unit, so as to form a cascade structure for large-scale data. Under this index structure, when the data processing module carries out data processing such as locating, deleting or adding data, once the storage unit targeted by the data processing has been determined according to the attribute information of the data, steps similar to those of the preceding embodiments can be performed on that storage unit; they are not described again here.
From the description of the above embodiments, those skilled in the art can clearly understand that the embodiments of the present invention can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the embodiments of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium such as a ROM/RAM, magnetic disk or optical disc, and comprises a number of instructions that cause a computer device (which can be a personal computer, a server, a network device or the like) to perform the methods described in the embodiments of the present invention or in certain parts of the embodiments.
The present application can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types. The present application can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media, including storage devices.
By storing index data with different attribute information in one index structure, the embodiment of the present invention makes data storage more compact, so that the above modules can obtain or update more data in a single Key-Value access. This reduces the interactions with the Key-Value system during data processing and thereby improves the performance of index building and querying.
For the specific implementation of each module and unit in the above device, refer to the corresponding description in the preceding method embodiments; it is not repeated here.
The embodiments of the present invention described above do not limit the scope of protection of the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the claims of the present invention.

Claims (8)

1. An index data processing method, characterized in that index data of at least one attribute information is stored in an index structure, the index structure comprises a master index and at least one piece index, the index of the at least one piece index is stored in the master index, the index of the at least one piece index comprises the attribute information of the index data stored in the piece index, and the method comprises:
Receiving a processing instruction for specified index data, the processing instruction comprising the attribute information of the specified index data;
If the index of a piece index can be determined according to the master index in the index structure and the attribute information of the specified index data, locating the piece index according to the index of the piece index, and processing the data items of the matching index data in the piece index;
The index structure further comprises at least one segmented index, the index of the at least one segmented index is stored in the master index, and the index of the at least one segmented index comprises the attribute information of the index data stored in the segmented index;
If the index of a segmented index can be determined according to the master index in the index structure and the attribute information of the specified index data, locating the segmented index according to the index of the segmented index, and processing the data items of the matching index data in the segmented index;
The segmented index comprises at least one segmented index unit, the segmented index units are stored side by side, and each segmented index unit comprises a plurality of index data;
In the piece index, the data items of index data having the same attribute information are divided into at least one segmenting unit;
The segmenting units are connected in chained form to form a segmenting unit chain, and of two adjacent segmenting units in the segmenting unit chain, the preceding segmenting unit stores the visit information of the next segmenting unit; or,
The segmenting units are stored side by side, and the piece index also stores the segment information of each segmenting unit.
2. The method according to claim 1, characterized in that the index structure is stored in a plurality of storage units, and the index structure also comprises the storage information of the plurality of storage units.
3. The method according to claim 1, characterized in that the method further comprises:
When the sum of the numbers of index data in two adjacent segmented index units is less than a first threshold, merging the two adjacent segmented index units; or,
When the number of data items of index data in a segmented index unit exceeds a second threshold, migrating the index data from the segmented index unit containing it to the piece index; or,
When the total size of the data items of the index data in a segmented index unit is greater than a third threshold, dividing the segmented index unit into two new segmented index units.
4. The method according to claim 1, characterized in that the method further comprises:
When the sum of the numbers of data items in two adjacent segmenting units is less than a fourth threshold, merging the two adjacent segmenting units; or,
When the data items of index data are divided into only one segmenting unit, if the number of data items in that segmenting unit is less than a fifth threshold, migrating the index data from the piece index to the segmented index; or,
When the number of data items in a segmenting unit is greater than a sixth threshold, dividing the segmenting unit into two new segmenting units.
5. An index data processing device, characterized in that index data of at least one attribute information is stored in an index structure, the index structure comprises a master index and at least one piece index, the index of the at least one piece index is stored in the master index, the index of the at least one piece index comprises the attribute information of the index data stored in the piece index, and the device comprises:
An instruction receiving module, configured to receive a processing instruction for specified index data, the processing instruction comprising the attribute information of the specified index data;
A data processing module, configured to, when the index of a piece index can be determined according to the master index of the index structure and the attribute information of the specified index data, locate the piece index according to the index of the piece index, and process the data items of the matching index data in the piece index;
The index structure further comprises at least one segmented index, the index of the at least one segmented index is stored in the master index, and the index of the at least one segmented index comprises the attribute information of the index data stored in the segmented index;
The data processing module is further configured to, when the index of a segmented index can be determined according to the master index of the index structure and the attribute information of the specified index data, locate the segmented index according to the index of the segmented index, and process the data items of the matching index data in the segmented index;
The segmented index comprises at least one segmented index unit, the segmented index units are stored side by side, and each segmented index unit comprises a plurality of index data;
In the piece index, the data items of index data having the same attribute information are divided into at least one segmenting unit;
The segmenting units are connected in chained form to form a segmenting unit chain, and of two adjacent segmenting units in the segmenting unit chain, the preceding segmenting unit stores the visit information of the next segmenting unit; or,
The segmenting units are stored side by side, and the piece index also stores the segment information of each segmenting unit.
6. The device according to claim 5, characterized in that the index structure is stored in a plurality of storage units, and the index structure also comprises the storage information of the plurality of storage units.
7. The device according to claim 5, characterized in that the data processing module comprises at least one of the following:
A first merging unit, configured to merge two adjacent segmented index units when the sum of the numbers of index data in the two adjacent segmented index units is less than a first threshold;
A first migration unit, configured to migrate index data from the segmented index unit containing it to the piece index when the number of data items of that index data in the segmented index unit exceeds a second threshold;
A first dividing unit, configured to divide a segmented index unit into two new segmented index units when the total size of the data items of the index data in the segmented index unit is greater than a third threshold.
8. The device according to claim 5, characterized in that the data processing module comprises at least one of the following:
A second merging unit, configured to merge two adjacent segmenting units when the sum of the numbers of data items in the two adjacent segmenting units is less than a fourth threshold;
A second migration unit, configured to migrate index data from the piece index to the segmented index when the data items of the index data are divided into only one segmenting unit and the number of data items in that segmenting unit is less than a fifth threshold;
A second dividing unit, configured to divide a segmenting unit into two new segmenting units when the number of data items in the segmenting unit is greater than a sixth threshold.
CN201180003412.5A 2011-12-26 2011-12-26 Method and device for processing index data Expired - Fee Related CN102725754B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2011/084609 WO2013097065A1 (en) 2011-12-26 2011-12-26 Index data processing method and device

Publications (2)

Publication Number Publication Date
CN102725754A CN102725754A (en) 2012-10-10
CN102725754B true CN102725754B (en) 2014-08-13

Family

ID=46950465

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201180003412.5A Expired - Fee Related CN102725754B (en) 2011-12-26 2011-12-26 Method and device for processing index data

Country Status (2)

Country Link
CN (1) CN102725754B (en)
WO (1) WO2013097065A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103970739B (en) * 2013-01-24 2017-04-26 中兴通讯股份有限公司 Storage information processing method and device
CN104346347A (en) * 2013-07-25 2015-02-11 深圳市腾讯计算机系统有限公司 Data storage method, device, server and system
CN107688567B (en) * 2016-08-03 2021-02-09 腾讯科技(深圳)有限公司 Index storage method and related device
CN106570093B (en) * 2016-10-24 2020-03-27 南京中新赛克科技有限责任公司 Mass data migration method and device based on independent metadata organization structure

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4841433A (en) * 1986-11-26 1989-06-20 American Telephone And Telegraph Company, At&T Bell Laboratories Method and apparatus for accessing data from data attribute tables
CN101055589A (en) * 2007-05-30 2007-10-17 北京航空航天大学 Embedded database storage management method
CN101853283A (en) * 2010-05-21 2010-10-06 南京邮电大学 Construction method for multidimensional data-oriented semantic indexing peer-to-peer network

Also Published As

Publication number Publication date
WO2013097065A1 (en) 2013-07-04
CN102725754A (en) 2012-10-10

Similar Documents

Publication Publication Date Title
EP2924594B1 (en) Data encoding and corresponding data structure in a column-store database
CN102725754B (en) Method and device for processing index data
CN102867049B (en) Chinese PINYIN quick word segmentation method based on word search tree
CN105989015B (en) Database capacity expansion method and device and method and device for accessing database
CN103605776A (en) Method and device for processing data of information database
CN112463774B (en) Text data duplication eliminating method, equipment and storage medium
CN105653716A (en) Database construction method and system based on classification-attribute-value
CN105446705A (en) Method and device used for determining configuration file feature
CN104077385A (en) Classification and retrieval method of files
CN111666468A (en) Method for searching personalized influence community in social network based on cluster attributes
CN110427404A (en) Cross-chain data retrieval system for blockchain
CN110969517A (en) Bidding life cycle association method, system, storage medium and computer equipment
CN108628907A (en) Method for Trie-tree multiple-fault diagnosis based on Aho-Corasick
CN106802928B (en) Power grid historical data management method and system
EP3955256A1 (en) Non-redundant gene clustering method and system, and electronic device
CN109741034B (en) Grid tree organization management method and device
KR101955376B1 (en) Processing method for a relational query in distributed stream processing engine based on shared-nothing architecture, recording medium and device for performing the method
CN110263104A (en) JSON character string processing method and device
CN105740371A (en) Density-based incremental clustering data mining method and system
CN106569986A (en) Character string replacement method and device
CN115017161A (en) Method, device and application for updating tree data structure by combining virtual DOM
CN108197295A (en) Application method of attribute reduction based on multi-granularity attribute trees in text classification
CN110825846B (en) Data processing method and device
KR20220099745A (en) A spatial decomposition-based tree indexing and query processing methods and apparatus for geospatial blockchain data retrieval
CN108170987B (en) BIM technology-based PBS structure automatic hanging method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140813

Termination date: 20191226
