CN112765171A - Optimization algorithm for multi-field combined index access of block chain data uplink - Google Patents


Publication number
CN112765171A
Authority
CN
China
Prior art keywords
field
data
algorithm
row
combined index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110038939.7A
Other languages
Chinese (zh)
Other versions
CN112765171B (en)
Inventor
洪薇
洪健
李京昆
刘文思
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei Chenweixi Chain Information Technology Co ltd
Original Assignee
Hubei Chenweixi Chain Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hubei Chenweixi Chain Information Technology Co ltd
Priority to CN202110038939.7A
Publication of CN112765171A
Application granted
Publication of CN112765171B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2264 Multidimensional index structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses an optimization algorithm for multi-field combined index access for blockchain data uplink, comprising the following steps: obtaining the structure of a data table through an SQL command provided by the database system or through its built-in tools; extracting a combined index; performing a DISTINCT deduplication operation on the values of each field in the combined index to calculate the unique value count of each field; sorting the values in the combined index using ORDER BY; scanning the sorted values for violations of non-decreasing order to locate the breakpoints; and segmenting the data at the breakpoints and reading the data segment by segment. The invention makes full use of the combined index to improve reading efficiency, serves the two objectives of maximizing parallel processing and minimizing read batches, achieves the highest reading and processing efficiency, and guarantees that the data ranges do not overlap, which ensures the correctness of the application function.

Description

Optimization algorithm for multi-field combined index access of block chain data uplink
Technical Field
The present invention relates to the field of blockchains, and more particularly to an optimization algorithm for multi-field combined index access for blockchain data uplink.
Background
Relational database systems are widely used across fields and industries for their structured data management capabilities and standardized SQL interfaces. With a database's data export tool, data can be conveniently exported to a byte stream file or to a specific file format supported by the database system. The index function of a database system greatly facilitates record-level search and positioning: a single record can be located quickly, and records can be selected by a range of values of a single field. In multi-threaded processing, each thread can then process and analyze data in parallel within intervals whose data ranges are isolated from one another according to the values of the index field.
However, for a multi-field combined unique index, the question is how to divide the isolated data ranges accurately while still using the index effectively, so as to achieve either of the two objectives of maximizing parallel processing or minimizing read batches. The conventional practice is to locate and divide using a single index field or a partial index. This method is simple to implement, but it has two disadvantages. First, it cannot make full use of the combined index, which reduces reading efficiency. Second, because the value space of a single field or of a subset of fields cannot reflect the value space of all fields of the combined index, the data ranges processed by the threads overlap, that is, data is read and processed repeatedly, which increases processing time and resource overhead. The present invention provides an optimization algorithm for multi-field combined index access for blockchain data uplink to solve these problems.
Disclosure of Invention
The invention realizes a novel method for parallelized reading and processing based on a combined index. It serves the two objectives of maximizing parallel processing and minimizing read batches, achieves the highest reading and processing efficiency, and guarantees that the data ranges do not overlap, which ensures the correctness of the application function.
The technical purpose of the invention is realized by the following technical scheme:
An optimization algorithm for multi-field combined index access for blockchain data uplink, comprising the following steps:
S1, acquiring the structure of a data table through an SQL command provided by the database system or through its built-in tools;
S2, extracting the combined indexes, and if a plurality of combined indexes exist, randomly selecting one of them (a combined index is an identifier used for reading data, and one such identifier is selected for the operation);
S3, performing a DISTINCT deduplication operation on the values of each field in the combined index, and calculating the unique value count of each field of the extracted combined index, expressed as COUNT(DISTINCT());
S4, sorting the values in the combined index using ORDER BY;
S5, scanning the sorted values for violations of non-decreasing order and finding the positions of the breakpoints;
S6, segmenting the data at the breakpoints, and reading the data segment by segment to ensure the integrity of the data read.
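As an illustration of steps S1 and S2 only, the following minimal sketch assumes SQLite as the database system (the patent does not name one) and uses its `PRAGMA index_list` and `PRAGMA index_info` commands to discover multi-field indexes. The table name `records`, the index names, and the helper `combined_indexes` are hypothetical.

```python
import random
import sqlite3


def combined_indexes(conn, table):
    """Return {index_name: [field, ...]} for every multi-field index on `table`."""
    result = {}
    # Each index_list row is (seq, name, unique, origin, partial).
    for row in conn.execute(f'PRAGMA index_list("{table}")').fetchall():
        index_name = row[1]
        # Each index_info row is (seqno, cid, name); sorting restores field order.
        info = conn.execute(f'PRAGMA index_info("{index_name}")').fetchall()
        fields = [name for _seqno, _cid, name in sorted(info)]
        if len(fields) > 1:  # keep only combined (multi-field) indexes
            result[index_name] = fields
    return result


# Hypothetical table with one combined unique index and one single-field index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (F1 INTEGER, F2 INTEGER, F3 INTEGER, payload TEXT)")
conn.execute("CREATE UNIQUE INDEX idx_combo ON records (F1, F2, F3)")
conn.execute("CREATE INDEX idx_payload ON records (payload)")

indexes = combined_indexes(conn, "records")
# S2: if several combined indexes exist, one is selected at random.
chosen_fields = random.choice(list(indexes.values()))
```

Other database systems expose the same information through their own catalog views or commands, so the two `PRAGMA` calls are the only SQLite-specific part of the sketch.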
Further, in step S4: A.2, the following operation is performed for each field of the combined index sorted in step A.1:
scanning all values of the field and searching for breakpoints, that is, values that do not satisfy the non-decreasing order; taking the sorting result of step A.1 as an example, the field values marked blue are the breakpoints of that field.
After all fields are processed, the result is as follows:
the first sorting field, i.e., F3, has no breakpoint because all of its values satisfy the non-decreasing order, so its processing can be skipped in actual operation and in the implementation of the algorithm.
A.3, range cutting is performed with the results of step A.2:
A.3.1, scanning each row of the combined index in sequence until a row containing a blue-marked column is encountered;
A.3.2, marking the row immediately above the row containing the blue-marked column as green, as the lower boundary of an interval;
A.3.3, continuing the scan from the row containing the blue-marked column until the next row containing a blue-marked column is encountered;
A.3.4, with the row found in step A.3.3 as reference, jumping to step A.3.2 and looping until all data rows have been scanned, which gives the following result:
A.4, each row marked green is the lower boundary of a range block, and the upper boundary is a white (unmarked) row or the green row itself, that is, a range block cannot contain more than one green-marked row.
Further, the upper and lower boundaries of the finally formed data block are:
upper and lower boundaries
{[1,1,1],[1,1,2]}
{[1,2,1],[1,2,1]}
{[2,1,1],[2,1,2]}
{[2,2,1],[2,2,1]}
{[3,1,1],[3,1,2]}
{[3,2,1],[3,2,1]}
{[4,1,2],[4,1,2]}
A.5, the result of step A.4 defines the interval ranges for data processing; any access that spans two intervals may result in incomplete or lost data.
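The breakpoint search and range cutting of steps A.2 to A.4 can be sketched as below. This is an illustrative reading of the text, not the patented implementation: a row counts as a breakpoint row when any of its fields is smaller than the same field in the previous row, and the row just above it closes the current block. The function names are hypothetical, and the rows are the (F3, F1, F2) sorting result used in the description.

```python
def find_breakpoints(rows):
    """Indices of rows in which at least one field breaks the non-decreasing order."""
    return [
        i
        for i in range(1, len(rows))
        if any(cur < prev for prev, cur in zip(rows[i - 1], rows[i]))
    ]


def cut_ranges(rows):
    """Split sorted rows into (upper boundary, lower boundary) blocks at the breakpoints."""
    blocks = []
    start = 0
    for bp in find_breakpoints(rows):
        # The row above the breakpoint row is the lower boundary of the block.
        blocks.append((rows[start], rows[bp - 1]))
        start = bp
    if rows:
        blocks.append((rows[start], rows[-1]))  # the last block ends at the final row
    return blocks


# Rows sorted by F3, F1, F2, as in the description.
rows_a = [
    (1, 1, 1), (1, 1, 2), (1, 2, 1), (2, 1, 1), (2, 1, 2),
    (2, 2, 1), (3, 1, 1), (3, 1, 2), (3, 2, 1), (4, 1, 2),
]
blocks_a = cut_ranges(rows_a)
```

For this input the sketch produces seven blocks, the same boundaries as listed in the table above.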
Further, B.1, the fields of the combined index are arranged by their unique value counts calculated in step S3 from small to large, i.e. F1, F2, F3 for the data of step S3, and the following operation is executed:
ORDER BY F1 ASC,F2 ASC,F3 ASC
the sorting result is as follows:
F1,F2,F3
{1,1,1}
{1,1,2}
{1,1,3}
{1,2,1}
{1,2,2}
{1,2,3}
{1,2,4}
{2,1,1}
{2,1,2}
{2,1,3}
B.2, the following operation is performed for each field of the combined index sorted in step B.1:
scanning all values of the field and searching for breakpoints, that is, values that do not satisfy the non-decreasing order; taking the sorting result of step B.1 as an example, the field values marked blue are the breakpoints of that field. After all fields are processed, the result is as follows:
the first sorting field, i.e., F1, has no breakpoint because all of its values satisfy the non-decreasing order, so its processing can be skipped in actual operation and in the implementation of the algorithm.
Further, B.3, range cutting is performed with the results of step B.2:
B.3.1, scanning each row of the combined index in sequence until a row containing a blue-marked column is encountered;
B.3.2, marking the row immediately above the row containing the blue-marked column as green, as the lower boundary of an interval;
B.3.3, continuing the scan from the row containing the blue-marked column until the next row containing a blue-marked column is encountered;
B.3.4, with the row found in step B.3.3 as reference, jumping to step B.3.2 and looping until all data rows have been scanned, which gives the following result:
B.4, each row marked green is the lower boundary of a range block, and the upper boundary is a white (unmarked) row or the green row itself, that is, a range block cannot contain more than one green-marked row.
Further, the upper and lower boundaries of the finally formed data block are:
upper and lower boundaries
{[1,1,1],[1,1,3]}
{[1,2,1],[1,2,4]}
{[2,1,1],[2,1,3]}
B.5, the result of step B.4 defines the interval ranges for data processing; any access that spans two intervals may result in incomplete or lost data.
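To show how such blocks might drive the actual reads, the sketch below fetches each block with an inclusive range predicate over all three index fields. It assumes SQLite 3.15 or later for row-value comparisons; the table `records`, the index `idx_combo`, and the helper `read_block` are hypothetical, and the blocks are the three listed above.

```python
import sqlite3

FIELDS = ("F1", "F2", "F3")
ROWS = [
    (1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 1), (1, 2, 2),
    (1, 2, 3), (1, 2, 4), (2, 1, 1), (2, 1, 2), (2, 1, 3),
]
# (upper boundary, lower boundary) = (first row, last row) of each block.
BLOCKS = [((1, 1, 1), (1, 1, 3)), ((1, 2, 1), (1, 2, 4)), ((2, 1, 1), (2, 1, 3))]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (F1 INTEGER, F2 INTEGER, F3 INTEGER)")
conn.execute("CREATE UNIQUE INDEX idx_combo ON records (F1, F2, F3)")
conn.executemany("INSERT INTO records VALUES (?, ?, ?)", ROWS)


def read_block(conn, first_row, last_row):
    """Read one block; both boundaries are inclusive and compared lexicographically."""
    cols = ", ".join(FIELDS)
    marks = ", ".join("?" for _ in FIELDS)
    sql = (
        f"SELECT {cols} FROM records "
        f"WHERE ({cols}) >= ({marks}) AND ({cols}) <= ({marks}) "
        f"ORDER BY {cols}"
    )
    return conn.execute(sql, (*first_row, *last_row)).fetchall()


# Each block could be handed to its own worker thread; here they are read in turn.
batches = [read_block(conn, first, last) for first, last in BLOCKS]
```

Because every row falls into exactly one block, concatenating the batches gives back the full table with no row read twice.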
Further, consider a special case in which no blue-marked column exists, taking the following data as an example:
F1,F2,F3
{1,1,1}
{2,2,2}
{3,3,3}
{3,4,5}
In this case, the data may be divided into anywhere from 1 to N intervals, where N equals the number of rows, which here is also the number of values of each field.
1 interval:
upper and lower boundaries
{[1,1,1],[3,4,5]}
2 intervals, 3 cases in total:
2.1:
upper and lower boundaries
{[1,1,1],[1,1,1]}
{[2,2,2],[3,4,5]}
2.2:
Upper and lower boundaries
{[1,1,1],[2,2,2]}
{[3,3,3],[3,4,5]}
2.3:
Upper and lower boundaries
{[1,1,1],[3,3,3]}
{[3,4,5],[3,4,5]}
3 intervals, 3 cases in total:
3.1:
upper and lower boundaries
{[1,1,1],[1,1,1]}
{[2,2,2],[2,2,2]}
{[3,3,3],[3,4,5]}
3.2:
Upper and lower boundaries
{[1,1,1],[2,2,2]}
{[3,3,3],[3,3,3]}
{[3,4,5],[3,4,5]}
3.3:
Upper and lower boundaries
{[1,1,1],[1,1,1]}
{[2,2,2],[3,3,3]}
{[3,4,5],[3,4,5]}
4 intervals:
upper and lower boundaries
{[1,1,1],[1,1,1]}
{[2,2,2],[2,2,2]}
{[3,3,3],[3,3,3]}
{[3,4,5],[3,4,5]}
6. The above rules can be extended to combined primary keys/indexes with any number of fields.
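The special case without breakpoints amounts to choosing where to place k - 1 cut points among the N - 1 gaps between consecutive rows. The sketch below enumerates these choices with `itertools.combinations`; the function name `partitions` is hypothetical and the data is the four-row example above.

```python
from itertools import combinations


def partitions(rows, k):
    """All ways to split sorted rows into k contiguous (upper, lower) blocks."""
    n = len(rows)
    result = []
    # Choose k - 1 cut positions among the n - 1 gaps between consecutive rows.
    for cuts in combinations(range(1, n), k - 1):
        edges = (0, *cuts, n)
        result.append([(rows[a], rows[b - 1]) for a, b in zip(edges, edges[1:])])
    return result


rows = [(1, 1, 1), (2, 2, 2), (3, 3, 3), (3, 4, 5)]
cases_per_k = {k: len(partitions(rows, k)) for k in range(1, len(rows) + 1)}
```

For four rows this yields 1, 3, 3 and 1 cases for 1, 2, 3 and 4 intervals respectively, in line with the cases enumerated above (the order of the cases within each group may differ).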
In conclusion, the invention has the following beneficial effects:
The invention realizes a novel method for parallelized reading and processing based on a combined index. For a multi-field combined unique index, and on the premise of using the index effectively, the method makes full use of the combined index to improve reading efficiency, serves the two objectives of maximizing parallel processing and minimizing read batches, achieves the highest reading and processing efficiency, and guarantees that the data ranges do not overlap, which ensures the correctness of the application function.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
In a preferred embodiment of the present invention, an optimization algorithm for multi-field combined index access for blockchain data uplink comprises the following steps:
S1, acquiring the structure of a data table through an SQL command provided by the database system or through its built-in tools;
S2, extracting the combined indexes, and if a plurality of combined indexes exist, randomly selecting one of them (a combined index is an identifier used for reading data, and one such identifier is selected for the operation);
S3, performing a DISTINCT deduplication operation on the values of each field in the combined index, and calculating the unique value count of each field of the extracted combined index, expressed as COUNT(DISTINCT());
S4, sorting the values in the combined index using ORDER BY;
S5, scanning the sorted values for violations of non-decreasing order and finding the positions of the breakpoints;
S6, segmenting the data at the breakpoints, and reading the data segment by segment to ensure the integrity of the data read.
In step S4: A.2, the following operation is performed for each field of the combined index sorted in step A.1:
scanning all values of the field and searching for breakpoints, that is, values that do not satisfy the non-decreasing order; taking the sorting result of step A.1 as an example, the field values marked blue are the breakpoints of that field. After all fields are processed, the result is as follows:
the first sorting field, i.e., F3, has no breakpoint because all of its values satisfy the non-decreasing order, so its processing can be skipped in actual operation and in the implementation of the algorithm.
A.3, range cutting is performed with the results of step A.2:
A.3.1, scanning each row of the combined index in sequence until a row containing a blue-marked column is encountered;
A.3.2, marking the row immediately above the row containing the blue-marked column as green, as the lower boundary of an interval;
A.3.3, continuing the scan from the row containing the blue-marked column until the next row containing a blue-marked column is encountered;
A.3.4, with the row found in step A.3.3 as reference, jumping to step A.3.2 and looping until all data rows have been scanned, which gives the following result:
A.4, each row marked green is the lower boundary of a range block, and the upper boundary is a white (unmarked) row or the green row itself, that is, a range block cannot contain more than one green-marked row.
The upper and lower boundaries of the finally formed data block are:
upper and lower boundaries
{[1,1,1],[1,1,2]}
{[1,2,1],[1,2,1]}
{[2,1,1],[2,1,2]}
{[2,2,1],[2,2,1]}
{[3,1,1],[3,1,2]}
{[3,2,1],[3,2,1]}
{[4,1,2],[4,1,2]}
A.5, the result of step A.4 defines the interval ranges for data processing; any access that spans two intervals may result in incomplete or lost data.
B.1, the fields of the combined index are arranged by their unique value counts calculated in step S3 from small to large, i.e. F1, F2, F3 for the data of step S3, and the following operation is executed:
ORDER BY F1 ASC,F2 ASC,F3 ASC
the sorting result is as follows:
F1,F2,F3
{1,1,1}
{1,1,2}
{1,1,3}
{1,2,1}
{1,2,2}
{1,2,3}
{1,2,4}
{2,1,1}
{2,1,2}
{2,1,3}
B.2, the following operation is performed for each field of the combined index sorted in step B.1:
scanning all values of the field and searching for breakpoints, that is, values that do not satisfy the non-decreasing order; taking the sorting result of step B.1 as an example, the field values marked blue are the breakpoints of that field. After all fields are processed, the result is as follows:
the first sorting field, i.e., F1, has no breakpoint because all of its values satisfy the non-decreasing order, so its processing can be skipped in actual operation and in the implementation of the algorithm.
B.3, range cutting is performed with the results of step B.2:
B.3.1, scanning each row of the combined index in sequence until a row containing a blue-marked column is encountered;
B.3.2, marking the row immediately above the row containing the blue-marked column as green, as the lower boundary of an interval;
B.3.3, continuing the scan from the row containing the blue-marked column until the next row containing a blue-marked column is encountered;
B.3.4, with the row found in step B.3.3 as reference, jumping to step B.3.2 and looping until all data rows have been scanned, which gives the following result:
B.4, each row marked green is the lower boundary of a range block, and the upper boundary is a white (unmarked) row or the green row itself, that is, a range block cannot contain more than one green-marked row.
The upper and lower boundaries of the finally formed data block are:
upper and lower boundaries
{[1,1,1],[1,1,3]}
{[1,2,1],[1,2,4]}
{[2,1,1],[2,1,3]}
B.5, the result of step B.4 defines the interval ranges for data processing; any access that spans two intervals may result in incomplete or lost data.
Example 2
In a preferred embodiment of the present invention, the optimization algorithm for multi-field combined index access for blockchain data uplink considers a special case in which no blue-marked column exists, taking the following data as an example:
F1,F2,F3
{1,1,1}
{2,2,2}
{3,3,3}
{3,4,5}
In this case, the data may be divided into anywhere from 1 to N intervals, where N equals the number of rows, which here is also the number of values of each field.
1 interval:
upper and lower boundaries
{[1,1,1],[3,4,5]}
2 intervals, 3 cases in total:
2.1:
upper and lower boundaries
{[1,1,1],[1,1,1]}
{[2,2,2],[3,4,5]}
2.2:
Upper and lower boundaries
{[1,1,1],[2,2,2]}
{[3,3,3],[3,4,5]}
2.3:
Upper and lower boundaries
{[1,1,1],[3,3,3]}
{[3,4,5],[3,4,5]}
3 intervals, 3 cases in total:
3.1:
upper and lower boundaries
{[1,1,1],[1,1,1]}
{[2,2,2],[2,2,2]}
{[3,3,3],[3,4,5]}
3.2:
Upper and lower boundaries
{[1,1,1],[2,2,2]}
{[3,3,3],[3,3,3]}
{[3,4,5],[3,4,5]}
3.3:
Upper and lower boundaries
{[1,1,1],[1,1,1]}
{[2,2,2],[3,3,3]}
{[3,4,5],[3,4,5]}
4 intervals:
upper and lower boundaries
{[1,1,1],[1,1,1]}
{[2,2,2],[2,2,2]}
{[3,3,3],[3,3,3]}
{[3,4,5],[3,4,5]}
6. The above rules can be extended to combined primary keys/indexes with any number of fields.
DISTINCT: performs a deduplication operation on an array of field values; for example, {1, 1, 2, 3, 3} becomes {1, 2, 3} after the DISTINCT operation;
COUNT: counts the number of elements in an array of field values, e.g. COUNT({1, 2, 3}) = 3.
Taking the combined index {F1, F2, F3} as an example, the values are as follows:
F1,F2,F3
{1,1,1}
{1,1,2}
{1,1,3}
{1,2,1}
{1,2,2}
{1,2,3}
{1,2,4}
{2,1,1}
{2,1,2}
{2,1,3}
the unique value count of F1 is 2, the unique values being 1 and 2;
the unique value count of F2 is 2, the unique values being 1 and 2;
the unique value count of F3 is 4, the unique values being 1, 2, 3 and 4;
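The unique value counts of step S3 can be reproduced with a few lines of Python, where a set plays the role of DISTINCT and `len` the role of COUNT. The helper name `distinct_counts` is hypothetical; the rows are the example values above.

```python
FIELDS = ("F1", "F2", "F3")
ROWS = [
    (1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 1), (1, 2, 2),
    (1, 2, 3), (1, 2, 4), (2, 1, 1), (2, 1, 2), (2, 1, 3),
]


def distinct_counts(fields, rows):
    """COUNT(DISTINCT field) for every field of the combined index."""
    return {name: len({row[i] for row in rows}) for i, name in enumerate(fields)}


counts = distinct_counts(FIELDS, ROWS)
```

In a database the same figures would typically come from one `SELECT COUNT(DISTINCT F1), COUNT(DISTINCT F2), COUNT(DISTINCT F3)` query against the table.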
S4, according to the two types of optimization objective:
A, maximizing parallel processing;
B, minimizing the read batches;
A.1, for objective A, the fields of the combined index are arranged by their unique value counts calculated in step S3 from large to small, i.e. F3, F1, F2 for the data of step S3, and the following operation is executed:
ORDER BY F3 ASC,F1 ASC,F2 ASC
the ranking results were obtained as follows:
F3,F1,F2
{1,1,1}
{1,1,2}
{1,2,1}
{2,1,1}
{2,1,2}
{2,2,1}
{3,1,1}
{3,1,2}
{3,2,1}
{4,1,2}
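The field ordering and sort of step S4 can be sketched as follows. The sketch assumes that fields with equal unique value counts keep their original index order, which matches the F3, F1, F2 ordering of the example but is not stated as a rule in the text; the function name `sort_for_objective` is hypothetical.

```python
FIELDS = ("F1", "F2", "F3")
ROWS = [
    (1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 1), (1, 2, 2),
    (1, 2, 3), (1, 2, 4), (2, 1, 1), (2, 1, 2), (2, 1, 3),
]


def sort_for_objective(fields, rows, maximize_parallelism):
    """Order the fields by unique value count, then sort the rows in that order.

    maximize_parallelism=True  -> counts from large to small (objective A)
    maximize_parallelism=False -> counts from small to large (objective B)
    """
    counts = {name: len({row[i] for row in rows}) for i, name in enumerate(fields)}
    # sorted() is stable, also with reverse=True, so ties keep their index order.
    order = sorted(fields, key=lambda name: counts[name], reverse=maximize_parallelism)
    positions = [fields.index(name) for name in order]
    # Equivalent to ORDER BY <order[0]> ASC, <order[1]> ASC, ...
    reordered = sorted(tuple(row[p] for p in positions) for row in rows)
    return order, reordered


order_a, rows_a = sort_for_objective(FIELDS, ROWS, maximize_parallelism=True)
order_b, rows_b = sort_for_objective(FIELDS, ROWS, maximize_parallelism=False)
```

Placing the field with the most unique values first makes the later fields restart more often, which produces more breakpoints and therefore more, smaller blocks; the opposite order produces fewer, larger blocks.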
ORDER BY: performs a sorting operation on the field values;
ASC: used in combination with ORDER BY to sort in ascending order; for example, F1 = {1, 3, 2, 0} becomes {0, 1, 2, 3} after the ORDER BY F1 ASC operation;
ORDER BY F2 ASC, F1 ASC: the rows are first sorted in ascending order by the value of F2, and rows with the same value of F2 are then sorted in ascending order by the value of F1.
In summary: the invention realizes a novel method for parallelized reading and processing based on a combined index. It serves the two objectives of maximizing parallel processing and minimizing read batches, achieves the highest reading and processing efficiency, and guarantees that the data ranges do not overlap, which ensures the correctness of the application function.
The foregoing shows and describes the general principles and broad features of the present invention and advantages thereof. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the specification and illustrated only to illustrate the principle of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the present invention, which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (8)

1. An optimization algorithm for multi-field combined index access for blockchain data uplink, characterized by comprising the following steps:
S1, acquiring the structure of a data table through an SQL command provided by the database system or through its built-in tools;
S2, extracting the combined indexes, and if a plurality of combined indexes exist, randomly selecting one of them;
S3, performing a DISTINCT deduplication operation on the values of each field in the combined index to calculate the unique value count of each field of the extracted combined index;
S4, sorting the values in the combined index using ORDER BY;
S5, scanning the sorted values for violations of non-decreasing order and finding the positions of the breakpoints;
S6, segmenting the data at the breakpoints, and reading the data segment by segment to ensure the integrity of the data read.
2. The algorithm of claim 1, wherein in step S4: A.2, the following operation is performed for each field of the combined index sorted in step A.1:
scanning all values of the field and searching for breakpoints, that is, values that do not satisfy the non-decreasing order; taking the sorting result of step A.1 as an example, the field values marked blue are the breakpoints of that field.
3. The algorithm of claim 2, wherein after all fields are processed, the result is as follows:
the first sorting field, i.e., F3, has no breakpoint because all of its values satisfy the non-decreasing order, so its processing can be skipped in actual operation and in the implementation of the algorithm;
A.3, range cutting is performed with the results of step A.2:
A.3.1, scanning each row of the combined index in sequence until a row containing a blue-marked column is encountered;
A.3.2, marking the row immediately above the row containing the blue-marked column as green, as the lower boundary of an interval;
A.3.3, continuing the scan from the row containing the blue-marked column until the next row containing a blue-marked column is encountered;
A.3.4, with the row found in step A.3.3 as reference, jumping to step A.3.2 and looping until all data rows have been scanned, which gives the following result:
A.4, each row marked green is the lower boundary of a range block, and the upper boundary is a white (unmarked) row or the green row itself, that is, a range block cannot contain more than one green-marked row.
4. The algorithm of claim 3, wherein the upper and lower boundaries of the finally formed data blocks are:
upper and lower boundaries
{[1,1,1],[1,1,2]}
{[1,2,1],[1,2,1]}
{[2,1,1],[2,1,2]}
{[2,2,1],[2,2,1]}
{[3,1,1],[3,1,2]}
{[3,2,1],[3,2,1]}
{[4,1,2],[4,1,2]}
A.5, the result of step A.4 defines the interval ranges for data processing; any access that spans two intervals may result in incomplete or lost data.
5. The algorithm of claim 4, wherein: B.1, the fields of the combined index are arranged by their unique value counts calculated in step S3 from small to large, and the following operation is executed:
ORDER BY F1 ASC,F2 ASC,F3 ASC
the ranking results were obtained as follows:
F1,F2,F3
{1,1,1}
{1,1,2}
{1,1,3}
{1,2,1}
{1,2,2}
{1,2,3}
{1,2,4}
{2,1,1}
{2,1,2}
{2,1,3}.
6. The algorithm of claim 5, wherein: B.2, the following operation is performed for each field of the combined index sorted in step B.1:
scanning all values of the field and searching for breakpoints, that is, values that do not satisfy the non-decreasing order; taking the sorting result of step B.1 as an example, the field values marked blue are the breakpoints of that field; after all fields are processed, the result is as follows:
the first sorting field, i.e., F1, has no breakpoint because all of its values satisfy the non-decreasing order, so its processing can be skipped in actual operation and in the implementation of the algorithm.
7. The algorithm of claim 6, wherein: B.3, range cutting is performed with the results of step B.2:
B.3.1, scanning each row of the combined index in sequence until a row containing a blue-marked column is encountered;
B.3.2, marking the row immediately above the row containing the blue-marked column as green, as the lower boundary of an interval;
B.3.3, continuing the scan from the row containing the blue-marked column until the next row containing a blue-marked column is encountered;
B.3.4, with the row found in step B.3.3 as reference, jumping to step B.3.2 and looping until all data rows have been scanned, which gives the following result:
B.4, each row marked green is the lower boundary of a range block, and the upper boundary is a white (unmarked) row or the green row itself, that is, a range block cannot contain more than one green-marked row.
8. The algorithm of claim 7, wherein the upper and lower boundaries of the finally formed data blocks are:
upper and lower boundaries
{[1,1,1],[1,1,3]}
{[1,2,1],[1,2,4]}
{[2,1,1],[2,1,3]}
B.5, the result of step B.4 defines the interval ranges for data processing; any access that spans two intervals may result in incomplete or lost data.
CN202110038939.7A 2021-01-12 2021-01-12 Optimization algorithm for multi-field combined index fetch of block chain data uplink Active CN112765171B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110038939.7A CN112765171B (en) 2021-01-12 2021-01-12 Optimization algorithm for multi-field combined index fetch of block chain data uplink

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110038939.7A CN112765171B (en) 2021-01-12 2021-01-12 Optimization algorithm for multi-field combined index fetch of block chain data uplink

Publications (2)

Publication Number Publication Date
CN112765171A true CN112765171A (en) 2021-05-07
CN112765171B CN112765171B (en) 2023-05-23

Family

ID=75701699

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110038939.7A Active CN112765171B (en) 2021-01-12 2021-01-12 Optimization algorithm for multi-field combined index fetch of block chain data uplink

Country Status (1)

Country Link
CN (1) CN112765171B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102033948A (en) * 2010-12-22 2011-04-27 中国农业银行股份有限公司 Method and device for updating data
CN102722531A (en) * 2012-05-17 2012-10-10 北京大学 Query method based on regional bitmap indexes in cloud environment
US20120323923A1 (en) * 2011-06-14 2012-12-20 Bank Of America Corporation Sorting Data in Limited Memory
US20140280400A1 (en) * 2013-03-15 2014-09-18 Stephane G. Legay System and method for improved data accessibility
CN104317966A (en) * 2014-11-18 2015-01-28 国家电网公司 Dynamic indexing method applied to quick combined querying of big electric power data
US20170076213A1 (en) * 2015-09-14 2017-03-16 International Business Machines Corporation Detecting Interesting Decision Rules in Tree Ensembles
CN106547755A (en) * 2015-09-17 2017-03-29 北京国双科技有限公司 A kind of data processing method and device based on piece key
US20190236201A1 (en) * 2018-01-31 2019-08-01 Salesforce.Com, Inc. Techniques for processing database tables using indexes
CN110413624A (en) * 2019-08-07 2019-11-05 南京录信软件技术有限公司 A method of the multiple row stored in association deposited based on column
CN111143373A (en) * 2019-12-30 2020-05-12 卓尔智联(武汉)研究院有限公司 Data processing method and device, electronic equipment and storage medium
CN112163019A (en) * 2020-09-29 2021-01-01 台州师同人信息技术有限公司 Trusted electronic batch record processing method based on block chain and block chain service platform

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
L. YANG ET AL: "A new index structure combines a cluster algorithm with block distance", 2015 8TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING (CISP) *
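宗平 et al.: "Research on Combined Index Technology Based on NoSQL Systems", Computer Technology and Development *
徐旭平: "Design and Implementation of a High-Performance Cloud Storage System", China Master's Theses Full-text Database, Information Science and Technology *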

Also Published As

Publication number Publication date
CN112765171B (en) 2023-05-23

Similar Documents

Publication Publication Date Title
US6185557B1 (en) Merge join process
US8359316B2 (en) Database table look-up
US5899986A (en) Methods for collecting query workload based statistics on column groups identified by RDBMS optimizer
US5179699A (en) Partitioning of sorted lists for multiprocessors sort and merge
US7245762B2 (en) Color image processing method
US7685510B2 (en) System and method for grouping data
CN107679104B (en) Large-flow parallel high-speed data comparison method
USRE26429E (en) Information retrieval system and method
CN108733790B (en) Data sorting method, device, server and storage medium
CN112765171A (en) Optimization algorithm for multi-field combined index access of block chain data uplink
CN105447135A (en) Data search method and device
CN106649385A (en) Data ranking method and device based on HBase database
KR100436500B1 (en) Color image processing method
Holanda et al. Cracking KD-Tree: The First Multidimensional Adaptive Indexing (Position Paper).
CN102841988A (en) System and method for matching nucleotide sequence information
JPS59121436A (en) Sorting method of data group
CN113378986A (en) Clustering strategy optimization of density peak clustering algorithm
Esmat et al. A parallel hash‐based method for local sequence alignment
CN112860734A (en) Seismic data multi-dimensional range query method and device
CN110766087A (en) Method for improving data clustering quality of k-means based on dispersion maximization method
CN114791916B (en) Rapid comparison method of clinical test data
JP2585851B2 (en) Join processing method
CN110517727B (en) Sequence alignment method and system
CN109977269B (en) Data self-adaptive fusion method for XML file
CN114817308A (en) Method for optimizing execution of multiple percentile _ cont analysis functions in database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant