CN112000707B - Variable-length sequence matching method, database access method and device - Google Patents


Publication number
CN112000707B
Authority
CN
China
Prior art keywords
sequence
comparison unit
matched
minimum comparison
template
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010639824.9A
Other languages
Chinese (zh)
Other versions
CN112000707A (en)
Inventor
鄢贵海
孔浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yusur Technology Co ltd
Original Assignee
Yusur Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yusur Technology Co ltd filed Critical Yusur Technology Co ltd
Priority to CN202010639824.9A priority Critical patent/CN112000707B/en
Publication of CN112000707A publication Critical patent/CN112000707A/en
Application granted granted Critical
Publication of CN112000707B publication Critical patent/CN112000707B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation

Abstract

The invention provides a variable-length sequence matching method, a database access method and a device. The variable-length sequence matching method comprises the following steps: caching data bits of the sequence stream to be matched and data bits of the template sequence, each in sequence stream order and with the minimum comparison unit as granularity, where the minimum comparison units of the sequence stream to be matched and of the template sequence contain data bits of the same bit width; reading the data bits of one minimum comparison unit from the cached data bits of the sequence stream to be matched and from the cached data bits of the template sequence, and performing a matching comparison; and, when the comparison matches, if the currently read data bits of the sequence stream to be matched belong to the last minimum comparison unit of their subsequence and the currently read data bits of the template sequence belong to the last minimum comparison unit of the template, acquiring and outputting the index value of the current subsequence. This scheme can improve the performance of a dedicated database processor and, in turn, the performance of the database.

Description

Variable-length sequence matching method, database access method and device
Technical Field
The invention relates to the technical field of databases, in particular to a variable-length sequence matching method, a database access method and a database access device.
Background
In a database, data can be of many types, e.g., char, varchar, text, binary, tinyint, int, decimal, etc. A data table in a database may include data columns of multiple attributes; the widths of data columns of different attributes may differ, and the lengths of different data elements within a data column of the same attribute may also differ. In brief, different data columns in a data table, or different data elements within a column, may be variable in length. Accessing data in a database often involves matching data elements or data sequences, for example, querying a data table for a data column meeting a certain condition, or querying a data table for a data element meeting a certain condition.
Designing a dedicated processor (e.g., FPGA, ASIC, etc. hardware) for the database can speed up database access. However, how to match variable length sequences becomes a key factor affecting the performance of a dedicated processor.
Disclosure of Invention
The invention provides a variable-length sequence matching method, a database access method and a device, which are used for improving the performance of a special processor for a database and further improving the performance of the database.
According to an aspect of the present invention, there is provided a variable length sequence matching method, including:
caching data bits of the sequence flow to be matched by taking the minimum comparison unit as granularity according to the sequence flow sequence, and caching the data bits of the template sequence by taking the minimum comparison unit as granularity according to the sequence flow sequence; the minimum comparison unit of the sequence flow to be matched and the minimum comparison unit of the template sequence comprise data bits with the same bit width;
reading a data bit of a minimum comparison unit from the cached data bits of the sequence flow to be matched according to the sequence flow order, and reading a data bit of a minimum comparison unit from the cached data bits of the template sequence according to the sequence flow order;
performing matching comparison on the read data bit of the minimum comparison unit of the sequence stream to be matched and the read data bit of the minimum comparison unit of the template sequence;
and under the condition of consistent matching comparison, if the currently read minimum comparison unit to which the data bit of the sequence stream to be matched belongs is the last minimum comparison unit of the subsequence to which the data bit belongs and the currently read minimum comparison unit to which the data bit of the template sequence belongs is the last minimum comparison unit of the template, acquiring the index value of the currently read subsequence to which the data bit of the sequence stream to be matched belongs and outputting the acquired index value.
In some embodiments, before buffering the data bits of the sequence stream to be matched with the smallest comparing unit as granularity in the sequence stream order and buffering the data bits of the template sequence with the smallest comparing unit as granularity in the sequence stream order, the method further comprises: dividing subsequences of the sequence flow to be matched by taking the minimum comparison unit as granularity according to the sequence flow sequence, and dividing the template sequence by taking the minimum comparison unit as granularity according to the sequence flow sequence; the minimum comparison unit of the sequence flow to be matched comprises a data bit with a set bit width and a flag bit for identifying a subsequence to which the minimum comparison unit belongs, and the minimum comparison unit of the template sequence comprises the data bit with the set bit width.
In some embodiments, the method further comprises: under the condition of matching comparison consistency, if the currently read minimum comparison unit to which the data bit of the sequence flow to be matched belongs is not the last minimum comparison unit of the subsequence to which the data bit belongs, and the currently read minimum comparison unit to which the data bit of the template sequence belongs is not the last minimum comparison unit of the template, acquiring the data bit of the next minimum comparison unit of the minimum comparison unit to which the data bit of the sequence flow to be matched belongs currently, and acquiring the data bit of the next minimum comparison unit of the minimum comparison unit to which the data bit of the template sequence belongs currently read, so as to perform next matching comparison.
In some embodiments, the method further comprises: and under the condition that the matching comparison is not consistent, resetting the reading position of the template sequence to the initial position of the template, and jumping the reading position of the sequence stream to be matched to the initial minimum comparison unit of the next subsequence of the currently read subsequence to which the data bits belong.
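The three outcomes described above (match on the last unit of both sequences, match with more units to follow, and mismatch) can be sketched in software as follows. This is a minimal illustration, not the claimed hardware implementation. It assumes a one-byte minimum comparison unit, a single template, and a one-bit flag in which 1 means "the next unit belongs to the same subsequence" and 0 means "this is the last unit of its subsequence". The name `match_stream` is hypothetical. Treating a length mismatch (one sequence ends before the other) as a non-match is also an assumption; the text above does not spell out that case.

```python
from typing import Iterable, List, Tuple

# (data byte, flag): flag 1 = next unit is in the same subsequence, 0 = last unit (assumed encoding)
Unit = Tuple[int, int]


def match_stream(units: Iterable[Unit], template: bytes) -> List[int]:
    """Return the index values of the subsequences whose bytes equal `template`."""
    matches: List[int] = []
    index = 0          # index value of the subsequence currently being read
    t_pos = 0          # read position in the template
    skipping = False   # models the jump to the start of the next subsequence
    for data, flag in units:
        last_in_subseq = flag == 0
        if not skipping:
            last_in_template = t_pos == len(template) - 1
            if t_pos < len(template) and data == template[t_pos]:
                if last_in_subseq and last_in_template:
                    matches.append(index)      # output the index value
                elif not last_in_subseq and not last_in_template:
                    t_pos += 1                 # advance both read positions
                else:
                    skipping = True            # lengths differ: assumed non-match
            else:
                skipping = True                # mismatch
        if last_in_subseq:
            index += 1
            t_pos = 0                          # reset template read position
            skipping = False
    return matches


stream = [
    (ord("a"), 1), (ord("b"), 0),                  # subsequence 0: "ab"
    (ord("a"), 1), (ord("b"), 1), (ord("c"), 0),   # subsequence 1: "abc"
    (ord("a"), 1), (ord("b"), 0),                  # subsequence 2: "ab"
]
print(match_stream(stream, b"ab"))  # [0, 2]
```

In this sketch the loop still visits every unit of a mismatched subsequence and ignores it, whereas the described method moves the read position directly to the first unit of the next subsequence.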
In some embodiments, when buffering the data bits of the sequence stream to be matched with the smallest comparing unit as the granularity in the sequence stream order, the method further includes: and counting the sub-sequences to which the data bits of the cached sequence flow to be matched belong to so as to obtain the index values of the corresponding sub-sequences.
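One way to picture this counting is a counter that advances each time the last unit of a subsequence is cached. The sketch below is illustrative only; the class name `SubsequenceCounter` and the convention that a flag value of 0 marks the last unit of a subsequence are assumptions made for the example.

```python
class SubsequenceCounter:
    """Tags each cached unit with the index value of its subsequence."""

    def __init__(self) -> None:
        self._count = 0

    def tag(self, flag: int) -> int:
        """Return the index of the subsequence this unit belongs to.

        The counter advances after a unit whose flag is 0, i.e. after the
        last unit of a subsequence (assumed flag convention).
        """
        index = self._count
        if flag == 0:
            self._count += 1
        return index


counter = SubsequenceCounter()
flags = [1, 0, 0, 1, 1, 0]   # three subsequences of 2, 1 and 3 units
print([counter.tag(f) for f in flags])  # [0, 0, 1, 2, 2, 2]
```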
In some embodiments, caching data bits of a template sequence in a sequence flow order with a minimum unit of comparison as a granularity includes: and sequentially caching the data bits of all the minimum comparison units of the template sequence by taking the minimum comparison unit as granularity according to the sequence flow sequence.
In some embodiments, buffering data bits of the sequence stream to be matched with a minimum compare unit as granularity in the sequence stream order comprises: and sequentially caching the data bits of the minimum comparison units of the sequence flow to be matched by taking the minimum comparison unit as granularity according to the sequence flow sequence.
In some embodiments, the set bit width is a bit width of one byte.
In some embodiments, the flag of the minimum comparison unit of the sequence stream to be matched is a first value, so as to indicate that the minimum comparison unit and the next minimum comparison unit which is immediately adjacent to the minimum comparison unit belong to the same subsequence; and the flag bit of the minimum comparison unit of the sequence stream to be matched is a second value different from the first value so as to indicate that the minimum comparison unit and the next minimum comparison unit which is adjacent to the minimum comparison unit belong to different subsequences.
In some embodiments, buffering data bits of the sequence stream to be matched with a minimum compare unit as granularity in the sequence stream order comprises: and under the condition of receiving a read data instruction of an external sequence flow output device, caching data bits of the sequence flow to be matched by taking the minimum comparison unit as granularity according to the sequence of the sequence flow.
In some embodiments, reading a data bit of a smallest unit of comparison from data bits of the buffered sequence to be matched in sequence flow order and reading a data bit of a smallest unit of comparison from data bits of the buffered template sequence in sequence flow order comprises: in the case of receiving a match instruction of the external match result receiving apparatus, reading a data bit of a minimum comparison unit from the buffered data bits of the sequence stream to be matched in the sequence stream order, and reading a data bit of a minimum comparison unit from the buffered data bits of the template sequence in the sequence stream order.
In some embodiments, after reading the data bits of the smallest comparison unit from the buffered data bits of the sequence stream to be matched in the sequence stream order, the method further includes: and sending a request for acquiring a new minimum comparison unit of the sequence flow to be matched to an external sequence flow output device to cache a data bit of a next minimum comparison unit of a minimum comparison unit to which a latest cached data bit of the sequence flow to be matched belongs under the condition that an idle cache space for storing the data bit of the sequence flow to be matched exists.
In some embodiments, the sequence of templates is one template.
In some embodiments, the sequence of templates is divided into a plurality of templates; the minimum comparison unit of the template sequence comprises a data bit with set bit width and a flag bit for identifying the template to which the minimum comparison unit belongs.
According to another aspect of the present invention, there is provided a database access method including:
obtaining a sequence flow to be matched and a template sequence based on a database access statement;
matching and comparing the sequence flow to be matched with the template sequence by using the variable-length sequence matching method of any embodiment of the invention to obtain the index values of the subsequences of the sequence flow to be matched which are consistent in matching and comparison;
and obtaining a data access result corresponding to the database access statement based on the obtained index value of the subsequence of the sequence flow to be matched.
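The three steps of the database access method can be sketched end to end as follows. This is a hedged illustration: the table layout (a dict of columns), the function name `access_rows`, and the already-parsed query parameters are all hypothetical, and the per-element equality test is only a software stand-in for the variable-length sequence matching method.

```python
from typing import Dict, List


def access_rows(table: Dict[str, List[str]], column: str, value: str) -> List[Dict[str, str]]:
    """Return the rows of `table` whose `column` equals `value`."""
    # Step 1: derive the sequence stream to be matched and the template sequence.
    stream = [cell.encode("utf-8") for cell in table[column]]
    template = value.encode("utf-8")

    # Step 2: obtain the index values of the matching subsequences.
    # (Stand-in for the unit-by-unit matcher; assumption for brevity.)
    indices = [i for i, subsequence in enumerate(stream) if subsequence == template]

    # Step 3: build the data access result from the index values.
    return [{name: cells[i] for name, cells in table.items()} for i in indices]


table = {
    "name": ["Li", "Wang", "Zhao"],
    "address": ["Beijing", "Harbin", "Beijing"],
}
print(access_rows(table, "address", "Beijing"))
# [{'name': 'Li', 'address': 'Beijing'}, {'name': 'Zhao', 'address': 'Beijing'}]
```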
According to still another aspect of the present invention, there is provided a variable-length sequence matching apparatus including:
the temporary cache processing module is used for caching the data bits of the sequence flow to be matched by taking the minimum comparison unit as granularity according to the sequence flow sequence;
the comparison template module is used for caching data bits of the template sequence by taking the minimum comparison unit as granularity according to the sequence flow sequence; the minimum comparison unit of the sequence flow to be matched and the minimum comparison unit of the template sequence comprise data bits with the same bit width;
the temporary cache processing module is further configured to read a data bit of a minimum comparison unit from the cached data bits of the sequence stream to be matched according to the sequence stream order;
the comparison template module is used for reading a data bit of a minimum comparison unit from the cached data bits of the template sequence according to the sequence flow order;
the comparison module is used for performing matching comparison on the read data bit of the minimum comparison unit of the sequence stream to be matched and the read data bit of the minimum comparison unit of the template sequence;
and the comparison module is further configured to, under the condition that the matching comparison is consistent, if the currently read minimum comparison unit to which the data bit of the sequence stream to be matched belongs is the last minimum comparison unit of the subsequence to which the data bit of the sequence stream to be matched belongs, and the currently read minimum comparison unit to which the data bit of the template sequence belongs is the last minimum comparison unit of the template, obtain an index value of the subsequence to which the data bit of the sequence stream to be matched belongs, which is currently read, and output the obtained index value.
In some embodiments, the apparatus further comprises: the preprocessing module is used for dividing the subsequences of the sequence flow to be matched by taking the minimum comparison unit as the granularity according to the sequence flow sequence and dividing the template sequence by taking the minimum comparison unit as the granularity according to the sequence flow sequence; the minimum comparison unit of the sequence flow to be matched comprises a data bit with a set bit width and a flag bit for identifying a subsequence to which the minimum comparison unit belongs, and the minimum comparison unit of the template sequence comprises the data bit with the set bit width.
In some embodiments, the comparison module is further configured to: under the condition of matching comparison consistency, if the currently read minimum comparison unit to which the data bit of the sequence flow to be matched belongs is not the last minimum comparison unit of the subsequence to which the data bit belongs and the currently read minimum comparison unit to which the data bit of the template sequence belongs is not the last minimum comparison unit of the template, acquiring the data bit of the next minimum comparison unit of the minimum comparison unit to which the data bit of the sequence flow to be matched belongs currently read and acquiring the data bit of the next minimum comparison unit of the minimum comparison unit to which the data bit of the template sequence belongs currently read for next matching comparison; or, under the condition that the matching comparison is not consistent, resetting the reading position of the template sequence to the initial position of the template, and jumping the reading position of the sequence stream to be matched to the initial minimum comparison unit of the subsequence next to the currently read data bit.
In some embodiments, the temporary cache processing module is further configured to count the cached sub-sequences to which the data bits of the sequence stream to be matched belong, so as to obtain the index values of the corresponding sub-sequences.
In some embodiments, the temporary cache processing module allows caching data bits of at least two minimum comparison units of the sequence stream to be matched; and/or the comparison template module is implemented based on a register.
According to still another aspect of the present invention, there is provided a database system including: the variable length sequence matching apparatus as in any of the above embodiments.
The variable-length sequence matching method, the database access method, the variable-length sequence matching device and the database system can realize matching comparison of variable-length sequence streams to be matched, so that the performance of a special processor for the database can be improved, and the performance of the database is further improved.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort. In the drawings:
FIG. 1 is a flow chart of a variable length sequence matching method according to an embodiment of the invention;
FIG. 2 is a flow chart illustrating a database access method according to an embodiment of the invention;
FIG. 3 is a schematic structural diagram of a variable-length sequence matching apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a variable-length sequence matching apparatus according to another embodiment of the present invention;
FIG. 5 is a flowchart illustrating a variable length sequence matching apparatus according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating a minimum comparison unit between a sequence A used as a template and a sequence flow B to be matched according to an embodiment of the present invention;
FIG. 7 is a schematic diagram illustrating a temporary cache processing module provided with cache space for two minimum comparison units according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention are further described in detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
Fig. 1 is a flowchart illustrating a variable-length sequence matching method according to an embodiment of the present invention. As shown in fig. 1, the variable-length sequence matching method of some embodiments may include the following steps S110 to S140.
A detailed description will be given of a specific implementation of the embodiment of the present invention based on steps S110 to S140.
Step S110: caching data bits of the sequence flow to be matched by taking the minimum comparison unit as granularity according to the sequence flow sequence, and caching the data bits of the template sequence by taking the minimum comparison unit as granularity according to the sequence flow sequence; and the minimum comparison unit of the sequence flow to be matched and the minimum comparison unit of the template sequence comprise data bits with the same bit width.
In step S110, the template sequence may be determined according to parameter values in the database query condition, and the sequence stream to be matched may be determined according to the data query object. The matching method of the embodiment of the present invention may be used to query a data column for a data element matching specified data. For example, suppose rows whose address is "Beijing" are to be found in an address column that also contains data elements such as "Urumqi" and "Harbin". These data elements occupy different numbers of data bits from one another and from "Beijing", so the data elements of the address column are variable in length, whereas the template sequence contains only one template and is therefore fixed in length. The character string "Beijing" may then correspond to the template sequence, the address column may correspond to the sequence stream to be matched, and each character of a data element, such as "bei", "jing" or "wu", may correspond to a minimum comparison unit. The sequence stream order of the sequence stream to be matched has the same direction as that of the template sequence; for example, the direction may run from "bei" to "jing" in "Beijing", and the direction from "black" to "full" may likewise correspond to the sequence stream direction. Alternatively, the method may be used to query a data table for a data column matching specified data. It may also serve as one operation stage within database access processing, with the matching result further processed by other operations to obtain the database access result. For the sequence stream to be matched, the data bits of one, two or more minimum comparison units may be cached at a time, as determined by the data type, the data prefetch redundancy requirement, and the like.
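To make the variable-length point concrete, the short sketch below counts how many one-byte comparison units a few city names would occupy. It is an illustration only: the UTF-8 encoding, the one-byte unit, the sample values and the function name `unit_count` are assumptions, not details taken from the embodiment.

```python
def unit_count(value: str, encoding: str = "utf-8") -> int:
    """Number of one-byte minimum comparison units needed for `value`."""
    return len(value.encode(encoding))


# Hypothetical address column and template (Beijing, Urumqi, Harbin).
address_column = ["北京", "乌鲁木齐", "哈尔滨"]
template = "北京"

for element in address_column:
    print(element, unit_count(element))   # 6, 12 and 9 units respectively
print("template", unit_count(template))  # 6 units, fixed for a single template
```

The data elements occupy 6, 12 and 9 bytes, so the stream is variable in length, while the single template always occupies 6 bytes.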
In order to facilitate caching, reading and the like by taking the minimum comparison unit as granularity, the data bits of the original sequence flow to be matched can be preprocessed to obtain the minimum comparison unit of the sequence flow to be matched, and the data bits of the template sequence can be preprocessed to obtain the minimum comparison unit of the template sequence. In short, the variable-length sequence matching method of the embodiment of the present invention may further include a step of preprocessing the sequence. Of course, in other embodiments, such preprocessing may be performed in other devices, for example, the sequence stream to be matched and/or the template sequence output by other devices are marked with the smallest comparison unit.
For example, before the step S110, that is, before the data bits of the sequence flow to be matched are buffered in the sequence flow order with the smallest comparing unit as the granularity, and before the data bits of the template sequence are buffered in the sequence flow order with the smallest comparing unit as the granularity, the variable-length sequence matching method shown in fig. 1 may further include the steps of:
Step S150: dividing subsequences of the sequence flow to be matched by taking the minimum comparison unit as granularity according to the sequence flow sequence, and dividing the template sequence by taking the minimum comparison unit as granularity according to the sequence flow sequence; the minimum comparison unit of the sequence flow to be matched comprises a data bit with a set bit width and a flag bit for identifying a subsequence to which the minimum comparison unit belongs, and the minimum comparison unit of the template sequence comprises the data bit with the set bit width.
In step S150, the set bit width may be the bit width of one byte, i.e., 8 bits, so that the data bits in a minimum comparison unit may be 8 bits. Division with the minimum comparison unit as granularity may be done in any way that allows different minimum comparison units to be distinguished. In a minimum comparison unit of the sequence stream to be matched, the first bit or the first several bits may serve as flag bits, and the remaining bits may serve as data bits. A fixed-length template sequence needs no flag bit for distinguishing different minimum comparison units: when the template sequence includes only one template, its length is fixed, its minimum comparison units include only data bits, and each read fetches data bits of the set bit width, i.e., the data bits of one minimum comparison unit. If the template sequence includes a plurality of templates, different templates may differ in length and the template sequence may be variable in length; in that case, as for the sequence stream to be matched, a flag bit may be set in each minimum comparison unit of the template sequence to identify the template to which the unit belongs.
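A minimum comparison unit with a leading flag bit followed by 8 data bits can be pictured as a 9-bit word. The sketch below assumes exactly one flag bit placed in the most significant position; the names `pack_unit` and `unpack_unit` are hypothetical, and other layouts (several flag bits, a different data width) are equally compatible with the description.

```python
from typing import Tuple

DATA_BITS = 8                      # set bit width: one byte
DATA_MASK = (1 << DATA_BITS) - 1   # 0xFF


def pack_unit(data: int, flag: int) -> int:
    """Pack one flag bit (first bit) and 8 data bits into a 9-bit unit."""
    if not 0 <= data <= DATA_MASK:
        raise ValueError("data must fit in 8 bits")
    if flag not in (0, 1):
        raise ValueError("flag must be 0 or 1")
    return (flag << DATA_BITS) | data


def unpack_unit(unit: int) -> Tuple[int, int]:
    """Split a 9-bit unit back into (data, flag)."""
    return unit & DATA_MASK, unit >> DATA_BITS


unit = pack_unit(0x41, 1)
print(hex(unit))            # 0x141
print(unpack_unit(unit))    # (65, 1)
```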
In the step S150, the flag bit of the minimum comparison unit of the sequence flow to be matched may be a first value, so as to indicate that the minimum comparison unit and the next minimum comparison unit that is next to the minimum comparison unit belong to the same subsequence; the flag bit of the minimum comparison unit of the sequence stream to be matched may be a second value different from the first value, so as to indicate that the minimum comparison unit and the immediately next minimum comparison unit belong to different subsequences.
Here "next" is with respect to the sequence stream order. For example, in the data element "Urumqi" (a subsequence, "wu-lu-mu-qi" in pinyin), the minimum comparison unit corresponding to "lu" may be the immediately next minimum comparison unit after the one corresponding to "wu". Optionally, the flag bit may be represented by a one-bit value, for example, with the first value being 1 and the second value being 0. Of course, in other embodiments, a value made up of a plurality of bits may be used as the flag bit.
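The flag convention can be sketched as a preprocessing step that turns a list of subsequences into flagged units. The sketch assumes one-byte data bits, the first value 1 and the second value 0 mentioned as an example, and a hypothetical function name `to_units`. Rejecting empty subsequences is an assumption of this sketch, since an empty subsequence has no unit to carry its flag.

```python
from typing import List, Sequence, Tuple

Unit = Tuple[int, int]  # (data byte, flag)


def to_units(subsequences: Sequence[bytes]) -> List[Unit]:
    """Divide subsequences into one-byte units and attach a flag to each.

    flag 1: the immediately next unit belongs to the same subsequence.
    flag 0: the immediately next unit belongs to a different subsequence.
    """
    units: List[Unit] = []
    for subsequence in subsequences:
        if not subsequence:
            raise ValueError("an empty subsequence cannot carry a flag")
        last = len(subsequence) - 1
        for position, byte in enumerate(subsequence):
            units.append((byte, 1 if position < last else 0))
    return units


print(to_units([b"ab", b"c"]))  # [(97, 1), (98, 0), (99, 0)]
```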
In step S110, caching the data bits of the sequence stream to be matched in sequence stream order with the minimum comparison unit as granularity may specifically include: sequentially caching the data bits of a plurality of minimum comparison units of the sequence stream to be matched, in sequence stream order and with the minimum comparison unit as granularity. With a plurality of minimum comparison units cached, once the data bits of one minimum comparison unit have been compared, the data bits of the next minimum comparison unit can still be obtained directly from the cache space, so the redundancy provided by data pre-reading reduces the time spent waiting for data.
In the step S110, the data bits of the template sequence are buffered by using the minimum comparison unit as the granularity in the sequence flow order, which may specifically include the steps of: and sequentially caching the data bits of all the minimum comparison units of the template sequence by taking the minimum comparison unit as granularity according to the sequence flow sequence. All data bits of the template sequence may be buffered in this way, as allowed by hardware resources, to facilitate reading of the data bits of the smallest comparison unit of the template sequence. When the template sequence needs to occupy a large cache space, in order to avoid occupying too much cache space, the data bits of one or more minimum comparison units can be cached at one time, and when new data bits are needed, the data bits of the new minimum comparison unit of the template sequence are cached in.
In some embodiments, a specific implementation manner of buffering the data bits of the sequence stream to be matched in the sequence of the sequence stream by using the minimum comparison unit as the granularity in the step S110 may include: and under the condition of receiving a read data instruction of an external sequence flow output device, caching data bits of the sequence flow to be matched by taking the minimum comparison unit as granularity according to the sequence of the sequence flow.
In this embodiment, the external sequence stream output device may be a hardware-side device or a CPU-side device. Caching the data bits only after a read data instruction is received from the external sequence stream output device makes it easier to obtain valid data bits, reduces gaps between valid data bits (e.g., between minimum comparison units), and reduces wasted clock cycles.
Step S120: and reading a data bit of a minimum comparison unit from the buffered data bits of the sequence flow to be matched according to the sequence flow order, and reading a data bit of a minimum comparison unit from the buffered data bits of the template sequence according to the sequence flow order.
In step S120, for both the sequence stream to be matched and the template sequence, the data bits of the minimum comparison unit are read in the sequence of the sequence stream, and the bit width of the data bits of the minimum comparison unit is the same, so the data bits of the two sequences are read in alignment.
In some embodiments, the step S120, that is, a specific implementation that reads a data bit of a smallest comparison unit from the buffered data bits of the sequence to be matched in the sequence flow order and reads a data bit of a smallest comparison unit from the buffered data bits of the template sequence in the sequence flow order, may include: in the case of receiving a match instruction of the external match result receiving apparatus, reading a data bit of a minimum comparison unit from the buffered data bits of the sequence stream to be matched in the sequence stream order, and reading a data bit of a minimum comparison unit from the buffered data bits of the template sequence in the sequence stream order.
In this embodiment, reading the data bits for matching comparison only when a match instruction is received from the external matching result receiving device prevents matching results from being lost, as could happen if a comparison were performed while the external matching result receiving device was unable to receive the result.
In some embodiments, after step S120, that is, after the data bits of one minimum comparison unit have been read from the cached data bits of the sequence stream to be matched in sequence stream order, the method shown in fig. 1 may further include the step of: when there is free cache space for storing data bits of the sequence stream to be matched, sending a request for a new minimum comparison unit of the sequence stream to be matched to an external sequence stream output device, so as to cache the data bits of the minimum comparison unit that follows the one to which the most recently cached data bits of the sequence stream to be matched belong.
In this embodiment, a new minimum comparison unit is requested and cached whenever cache space is free, so data bits are read ahead of time. Data bits to be matched are therefore already available when the next comparison starts, which reduces the time spent waiting for a matching comparison.
Step S130: performing matching comparison on the read data bits of the minimum comparison unit of the sequence stream to be matched and the read data bits of the minimum comparison unit of the template sequence.
In step S130, a comparator or a comparison circuit may be used to perform the matching comparison. The data bits of the minimum comparison units are compared one after another, so that for one comparison unit (subsequence), if the data bits of every minimum comparison unit of the subsequence, as well as the number and order of its minimum comparison units, are consistent with those of the template, the subsequence can be regarded as a matched result.
Step S140: when the matching comparison is consistent, if the minimum comparison unit to which the currently read data bits of the sequence stream to be matched belong is the last minimum comparison unit of its subsequence, and the minimum comparison unit to which the currently read data bits of the template sequence belong is the last minimum comparison unit of the template, acquiring the index value of the subsequence to which the currently read data bits of the sequence stream to be matched belong, and outputting the acquired index value.
In step S140, the template sequence may consist of a single template. In this case, the length of the template is fixed, and the minimum comparison unit of the template sequence may contain data bits only; if the data bits of a subsequence in the sequence stream to be matched match those of the template, the subsequence can be taken as a matched result. Alternatively, the template sequence may be divided into a plurality of templates, in which case each template corresponds to one sub-template sequence, and the sub-template sequences may differ in length, so the template sequence can be regarded as variable-length even though its total length may be fixed. In this case, the minimum comparison unit of the template sequence may include data bits of a set bit width and a flag bit for identifying the template to which the minimum comparison unit belongs.
To make the positions of the subsequences known, the subsequences may be counted.
For example, when the data bits of the sequence stream to be matched are cached in sequence stream order with the minimum comparison unit as the granularity in step S110, the method shown in fig. 1 may further include the step of: counting the subsequences to which the cached data bits of the sequence stream to be matched belong, so as to obtain the index values of the corresponding subsequences. The index value identifies the position of a matched subsequence and can be used to retrieve the matched data. The count value of the subsequence may be used directly as the index value, or the index value may be calculated from, or combined with, the count value.
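The counting step can be sketched in software as follows. This is a minimal illustration rather than the claimed implementation: it assumes each cached minimum comparison unit is represented as a `(data, flag)` pair in which a flag of 0 marks the last unit of a subsequence, and the function name `index_subsequences` and the `base` offset (one possible way of calculating the index value from the count value) are hypothetical.

```python
from typing import List, Sequence, Tuple

Unit = Tuple[int, int]  # (data, flag); flag 0 = last unit of its subsequence (assumed convention)


def index_subsequences(units: Sequence[Unit], base: int = 0) -> List[Tuple[int, int, int]]:
    """Return (index_value, first_unit_position, length_in_units) per subsequence."""
    entries: List[Tuple[int, int, int]] = []
    count = 0  # number of subsequences completed so far
    start = 0  # position of the first unit of the current subsequence
    for pos, (_data, flag) in enumerate(units):
        if flag == 0:  # the current subsequence ends at this unit
            entries.append((base + count, start, pos - start + 1))
            count += 1
            start = pos + 1
    return entries


# Two subsequences: "ab" (two units) and "c" (one unit).
stream = [(ord("a"), 1), (ord("b"), 0), (ord("c"), 0)]
entries = index_subsequences(stream)
```

With `base=0` the count value is used directly as the index value; a non-zero `base` shows an index value derived from the count value instead.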
For other various matching comparison results, the same or different processing can be performed, so as to achieve the purpose of matching and comparing the subsequence in the sequence flow to be matched with the template in the template sequence.
For example, for the case where the matching comparison is consistent, the variable-length sequence matching method shown in fig. 1 may further include the step of: when the matching comparison is consistent, if the minimum comparison unit to which the currently read data bits of the sequence stream to be matched belong is not the last minimum comparison unit of its subsequence, and the minimum comparison unit to which the currently read data bits of the template sequence belong is not the last minimum comparison unit of the template, acquiring the data bits of the next minimum comparison unit of the sequence stream to be matched and the data bits of the next minimum comparison unit of the template sequence, so as to perform the next matching comparison. For example, for "Beijing" (北京) in the template sequence and "Beijing" (北京) or "Beijing Chaoyang" (北京朝阳) in the sequence stream to be matched, if the first character "北" of the template sequence matches the first character "北" of the sequence stream to be matched, the second character "京" of the template sequence and the second character "京" of the sequence stream to be matched are then acquired for matching comparison, that is, the next matching comparison is performed.
For another example, for the case where the matching comparison is inconsistent, the variable-length sequence matching method shown in fig. 1 may further include the step of: when the matching comparison is inconsistent, resetting the read position of the template sequence to the start position of the template, and moving the read position of the sequence stream to be matched to the first minimum comparison unit of the subsequence that follows the subsequence to which the currently read data bits belong. The comparison template module can be implemented with a register, so the read position of the template sequence can be reset to the start position of the template by resetting a register pointer. The sequence stream to be matched can also be cached in a register that holds a plurality of minimum comparison units, so its read position can be reset by moving the pointer in steps of the data bit width of the minimum comparison unit.
More specifically, the inconsistent case may include the following situations. When the matching comparison is inconsistent, the read position of the template sequence is reset to the start position, and the read position of the sequence stream to be matched jumps to the subsequence following the one to which the currently read data bits belong, if: (1) the currently read data bits of the sequence stream to be matched belong to the last minimum comparison unit of their subsequence and the currently read data bits of the template sequence belong to the last minimum comparison unit; or (2) the currently read data bits of the sequence stream to be matched do not belong to the last minimum comparison unit of their subsequence and the currently read data bits of the template sequence belong to the last minimum comparison unit; or (3) the currently read data bits of the sequence stream to be matched belong to the last minimum comparison unit of their subsequence and the currently read data bits of the template sequence do not belong to the last minimum comparison unit.
For example, for "Beijing" (北京) in the template sequence and "Beijing Chaoyang" (北京朝阳) in the sequence stream to be matched, the character "京" in the template sequence and the character "京" in the sequence stream to be matched are consistent, but "京" is the last character (minimum comparison unit) of the template, whereas it is not the last character (minimum comparison unit) of the subsequence in the sequence stream to be matched, so "Beijing" and "Beijing Chaoyang" can be regarded as not matching. Of course, if the goal is to find addresses beginning with "Beijing", "Beijing" and "Beijing Chaoyang" can be regarded as matching. Conversely, for "Beijing Chaoyang" in the template sequence and "Beijing" in the sequence stream to be matched, the character "京" in the template sequence and the character "京" in the sequence stream to be matched are consistent, but "京" is not the last character (minimum comparison unit) of the template, whereas it is the last character (minimum comparison unit) of the subsequence, so "Beijing" and "Beijing Chaoyang" can again be regarded as not matching. As a further example, for "Beijing" (北京) in the template sequence and "Urumqi" (乌鲁木齐) in the sequence stream to be matched, the first character "北" of the template sequence and the first character "乌" of the sequence stream to be matched do not match, so the subsequence "Urumqi" in the sequence stream can directly be regarded as not matching the template "Beijing" in the template sequence.
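The consistent and inconsistent cases described above can be combined into one loop. The following is a software sketch under stated assumptions, not the hardware implementation of the embodiments: each minimum comparison unit is modeled as a `(data, flag)` pair with one character per unit, a flag of 0 marks the last unit of a subsequence, and the names `to_units`, `match_stream`, and the `prefix` option (for the "addresses beginning with Beijing" variant) are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

Unit = Tuple[int, int]  # (data, flag): flag 1 = next unit is in the same subsequence, 0 = last unit


def to_units(values: Iterable[str]) -> List[Unit]:
    """Encode non-empty strings as a flagged unit stream, one character per unit."""
    units: List[Unit] = []
    for value in values:
        for i, ch in enumerate(value):
            units.append((ord(ch), 1 if i < len(value) - 1 else 0))
    return units


def match_stream(template: Sequence[int], stream: Iterable[Unit], prefix: bool = False) -> List[int]:
    """Return the index values of the subsequences that match the template."""
    if not template:
        raise ValueError("template must contain at least one unit")
    matches: List[int] = []
    index = 0         # count of subsequences seen so far, used as the index value
    pos = 0           # read position in the template
    skipping = False  # True while the rest of the current subsequence is skipped
    for data, flag in stream:
        last_in_sub = flag == 0
        if not skipping:
            last_in_tpl = pos == len(template) - 1
            if data != template[pos]:
                skipping = True                 # data bits differ
            elif last_in_tpl and (last_in_sub or prefix):
                matches.append(index)           # both ends reached: output the index value
                skipping = True
            elif not last_in_tpl and not last_in_sub:
                pos += 1                        # read the next unit on both sides
            else:
                skipping = True                 # lengths disagree
        if last_in_sub:                         # subsequence finished: reset for the next one
            index += 1
            pos = 0
            skipping = False
    return matches


template = [ord(ch) for ch in "北京"]
stream = to_units(["北京", "北京朝阳", "乌鲁木齐", "北京"])
exact = match_stream(template, stream)
starts_with = match_stream(template, stream, prefix=True)
```

Here `exact` contains the index values 0 and 3, while `starts_with` also contains 1 because "北京朝阳" begins with the template.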
Fig. 2 is a flowchart illustrating a database access method according to an embodiment of the present invention. As shown in fig. 2, the database access method of the embodiments may include:
step S210: obtaining a sequence flow to be matched and a template sequence based on a database access statement;
step S220: matching and comparing the sequence flow to be matched with the template sequence by using the variable-length sequence matching method of the embodiment of the invention to obtain the index values of the subsequences of the sequence flow to be matched which are consistent in matching and comparison;
step S230: obtaining a data access result corresponding to the database access statement based on the obtained index values of the subsequences of the sequence flow to be matched.
In step S210, the database access statement may be an SQL statement. For example, if an SQL statement is used to search a national school table for schools whose address is in Beijing, "Beijing" corresponds to the template sequence and the address column corresponds to the sequence stream to be matched; both "Beijing" and the address column can be obtained by parsing the SQL statement. Character data is used here only as an example; in other embodiments the data may be of other types, for example varchar, text, binary, tinyint, int, or decimal. In addition, the method is not limited to querying a data element from a data column; for example, the address data column may itself be looked up from a data table.
In step S220, the specific matching process is as described in the above embodiments and is not repeated here. In step S230, after the matching, the data access result may be output directly, or further operations such as searching, sorting, and aggregating may be performed.
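Steps S210 to S230 can be illustrated with a small in-memory table. This is only a sketch under assumptions: the table layout, the function names `matching_indices` and `access`, and the sample rows are hypothetical, no SQL parsing is performed, and step S220 is replaced by a whole-value comparison that stands in for the unit-by-unit matching described above.

```python
from typing import Dict, List

Row = Dict[str, str]


def matching_indices(template: bytes, column: List[bytes]) -> List[int]:
    """Stand-in for step S220: index values of the column entries equal to the template.

    A software shortcut; the method described above compares one minimum
    comparison unit at a time instead of whole values.
    """
    return [i for i, value in enumerate(column) if value == template]


def access(table: List[Row], column_name: str, wanted: str) -> List[Row]:
    # S210: derive the template sequence and the sequence stream to be matched.
    template = wanted.encode("utf-8")
    column = [row[column_name].encode("utf-8") for row in table]
    # S220: obtain the index values of the matching subsequences.
    indices = matching_indices(template, column)
    # S230: use the index values to fetch the data access result.
    return [table[i] for i in indices]


schools = [
    {"name": "School A", "address": "Beijing"},
    {"name": "School B", "address": "Urumqi"},
    {"name": "School C", "address": "Beijing"},
]
# Roughly corresponds to: SELECT * FROM schools WHERE address = 'Beijing'
result = access(schools, "address", "Beijing")
```

The point of the sketch is the division of labor: matching yields only index values, and the rows themselves are fetched afterwards from those index values.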
Based on the same inventive concept as the variable-length sequence matching method shown in fig. 1, the embodiment of the present invention further provides a variable-length sequence matching apparatus, as described in the following embodiments. Because the principle of solving the problem of the variable-length sequence matching device is similar to that of the variable-length sequence matching method, the implementation of the variable-length sequence matching device can refer to the implementation of the variable-length sequence matching method, and repeated details are not repeated.
Fig. 3 is a schematic structural diagram of a variable-length sequence matching apparatus according to an embodiment of the present invention. As shown in fig. 3, the variable-length sequence matching apparatus of the embodiments may include: a temporary cache processing module 310, a comparison template module 320, and a comparison module 330.
The temporary buffering processing module 310 may be configured to buffer data bits of the sequence flow to be matched with a minimum comparison unit as a granularity according to the sequence flow order.
The comparison template module 320 may be configured to cache data bits of the template sequence in a sequence flow order with a minimum comparison unit as a granularity; and the minimum comparison unit of the sequence flow to be matched and the minimum comparison unit of the template sequence comprise data bits with the same bit width.
The temporary buffer processing module 310 may be further configured to read a data bit of a smallest comparison unit from the buffered data bits of the sequence stream to be matched according to the sequence stream order.
The compare template module 320 may be configured to read the data bits of a smallest compare unit from the buffered data bits of the template sequence in a sequence flow order.
The comparison module 330 is configured to compare the read data bit of a smallest comparison unit of the sequence stream to be matched with the read data bit of a smallest comparison unit of the template sequence.
The comparing module 330 is further configured to, if the matching comparison is consistent, if the minimum comparing unit to which the data bit of the currently read sequence stream to be matched belongs is the last minimum comparing unit of the subsequence to which the data bit belongs, and the minimum comparing unit to which the data bit of the currently read template sequence belongs is the last minimum comparing unit of the template, obtain the index value of the subsequence to which the data bit of the currently read sequence stream to be matched belongs, and output the obtained index value.
Fig. 4 is a schematic structural diagram of a variable-length sequence matching apparatus according to another embodiment of the present invention. As shown in fig. 4, the variable-length sequence matching apparatus shown in fig. 3 may further include: a pre-processing module 340. The pre-processing module 340 may be used to: dividing subsequences of the sequence flow to be matched by taking the minimum comparison unit as granularity according to the sequence flow sequence, and dividing the template sequence by taking the minimum comparison unit as granularity according to the sequence flow sequence; the minimum comparison unit of the sequence flow to be matched comprises a data bit with a set bit width and a flag bit for identifying a subsequence to which the minimum comparison unit belongs, and the minimum comparison unit of the template sequence comprises the data bit with the set bit width.
In some embodiments, the comparison module 330 may also be configured to: under the condition of matching comparison consistency, if the currently read minimum comparison unit to which the data bit of the sequence flow to be matched belongs is not the last minimum comparison unit of the subsequence to which the data bit belongs, and the currently read minimum comparison unit to which the data bit of the template sequence belongs is not the last minimum comparison unit of the template, acquiring the data bit of the next minimum comparison unit of the minimum comparison unit to which the data bit of the sequence flow to be matched belongs currently, and acquiring the data bit of the next minimum comparison unit of the minimum comparison unit to which the data bit of the template sequence belongs currently read, so as to perform next matching comparison. Alternatively, the comparing module 330 may be further configured to: and under the condition that the matching comparison is not consistent, resetting the reading position of the template sequence to the initial position of the template, and jumping the reading position of the sequence stream to be matched to the initial minimum comparison unit of the next subsequence of the currently read subsequence to which the data bits belong.
In some embodiments, the temporary cache processing module 310 may be further configured to: and counting the sub-sequences to which the data bits of the cached sequence flow to be matched belong to so as to obtain the index values of the corresponding sub-sequences.
In some embodiments, the temporary cache processing module allows caching data bits of at least two minimum comparison units of the sequence stream to be matched; and/or the comparison template module is implemented based on a register.
In some embodiments, the comparison template module 320 may be specifically configured to sequentially cache data bits of all minimum comparison units of the template sequence in the sequence flow order with the minimum comparison unit as a granularity.
In some embodiments, the temporary buffering processing module 310 may be specifically configured to sequentially buffer the data bits of the multiple minimum comparison units of the sequence flow to be matched by using the minimum comparison unit as a granularity according to the sequence flow order.
In some embodiments, the set bit width is a bit width of one byte.
In some embodiments, the flag of the minimum comparison unit of the sequence stream to be matched is a first value, so as to indicate that the minimum comparison unit and the next minimum comparison unit which is immediately adjacent to the minimum comparison unit belong to the same subsequence; and the flag bit of the minimum comparison unit of the sequence stream to be matched is a second value different from the first value so as to indicate that the minimum comparison unit and the next minimum comparison unit which is adjacent to the minimum comparison unit belong to different subsequences.
In some embodiments, the temporary cache processing module 310 is specifically configured to, upon receiving a read-data instruction from an external sequence stream output device, cache the data bits of the sequence stream to be matched in sequence stream order with the minimum comparison unit as the granularity.
In some embodiments, the temporary cache processing module 310 and the comparison template module 320 are specifically configured to, upon receiving a matching instruction from the external matching-result receiving device, read the data bits of one minimum comparison unit from the cached data bits of the sequence stream to be matched in sequence stream order, and read the data bits of one minimum comparison unit from the cached data bits of the template sequence in sequence stream order.
In some embodiments, the temporary buffering processing module 310 is further configured to, in a case that there is a free buffer space for storing data bits of the sequence flow to be matched, send a request to obtain a new minimum comparison unit of the sequence flow to be matched to an external sequence flow output device, so as to buffer a data bit of a next minimum comparison unit of the minimum comparison unit to which a latest buffered data bit of the sequence flow to be matched belongs.
In some embodiments, the template sequence is one template.
In some embodiments, the template sequence is divided into a plurality of templates; the minimum comparison unit of the template sequence comprises data bits of a set bit width and a flag bit for identifying the template to which the minimum comparison unit belongs.
In addition, an embodiment of the present invention further provides a database system, where the database system includes: the variable length sequence matching apparatus as in any of the above embodiments. The database system may include hardware (e.g., FPGA (field programmable gate array), ASIC (application specific integrated circuit)), and may also include devices on the CPU side, such as computers, servers, and the like.
In order that those skilled in the art will better understand the present invention, a specific embodiment of the present invention will be described below.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention are further described in detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
Fig. 5 is a schematic processing flow diagram of a variable-length sequence matching apparatus according to an embodiment of the present invention. As shown in fig. 5, the apparatus can be divided into the following parts: a temporary cache processing module, a comparison template module, and a comparison module. The input is a fixed-length sequence A and a variable-length sequence stream B, and the output is the index value of each subsequence in sequence stream B that matches sequence A. Here, matching means that the data-bit portion of a subsequence in sequence stream B is exactly the same as sequence A. The gray arrows in fig. 5 represent control flow information, the white arrows represent data flow information, and the width of an arrow roughly reflects the relative amount of data. It can also be seen that the data flows at the two ends of the temporary cache processing module differ greatly in volume, so congestion, once it occurs, would severely affect system performance.
Sequence A serves as the comparison template and has a fixed length, that is, it contains a fixed number of minimum comparison units, where the minimum comparison unit is chosen according to the use case. Fig. 6 is a schematic diagram of the minimum comparison units of sequence A, used as the template, and of sequence stream B to be matched, in an embodiment of the present invention. As shown in fig. 6, the length of sequence A is M, and M does not change during the comparison once it has been determined; the minimum comparison unit of sequence A may be set to 8 data bits. Sequence A can be cached in its entirety by the comparison template module, its length being within the range allowed by the hardware storage resources. Sequence stream B is a sequence stream of indefinite length. The data stream is composed of a plurality of subsequences; each subsequence corresponds to one comparison unit, and each comparison unit is composed of a number of minimum comparison units. As shown in fig. 6, in subsequence n, that is, the nth comparison unit, 8 data bits and a 1-bit flag bit together form one minimum comparison unit of sequence stream B. The flag bit determines whether adjacent minimum comparison units belong to the same comparison unit: 1 indicates that the immediately following minimum comparison unit belongs to the same comparison unit as the current one, and 0 indicates that it does not.
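The 9-bit minimum comparison unit of sequence stream B can be illustrated as follows. This is a sketch under assumptions: the position of the flag bit inside the unit is not stated in the text, so placing it in the lowest bit is an arbitrary choice made for the example, and the function names are hypothetical.

```python
from typing import List, Tuple

DATA_BITS = 8  # bit width of the data bits in this embodiment


def pack_unit(data: int, flag: int) -> int:
    """Pack 8 data bits and a 1-bit flag into a 9-bit integer (flag in the lowest bit, assumed)."""
    if not 0 <= data < (1 << DATA_BITS) or flag not in (0, 1):
        raise ValueError("data must fit in 8 bits and flag must be 0 or 1")
    return (data << 1) | flag


def unpack_unit(unit: int) -> Tuple[int, int]:
    """Split a 9-bit unit back into (data, flag)."""
    return unit >> 1, unit & 1


def encode_stream(subsequences: List[bytes]) -> List[int]:
    """Turn non-empty byte strings into flagged units: 1 = more units follow, 0 = last unit."""
    units: List[int] = []
    for sub in subsequences:
        for i, byte in enumerate(sub):
            units.append(pack_unit(byte, 1 if i < len(sub) - 1 else 0))
    return units


def decode_stream(units: List[int]) -> List[bytes]:
    """Rebuild the subsequences by cutting the stream wherever the flag bit is 0."""
    subsequences: List[bytes] = []
    current = bytearray()
    for unit in units:
        data, flag = unpack_unit(unit)
        current.append(data)
        if flag == 0:
            subsequences.append(bytes(current))
            current = bytearray()
    return subsequences


packed = encode_stream([b"ab", b"c"])
restored = decode_stream(packed)
```

Because the boundary information travels with every unit, the stream needs no separate length field, which is what allows subsequences of arbitrary length to be processed one unit at a time.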
Unlike sequence A, sequence stream B cannot be cached in its entirety, on the one hand because its length is not known in advance and its data volume is generally large. On the other hand, when the apparatus works together with other devices, those devices may fail to receive the output of the apparatus in time, and processing results would be lost. To avoid this, the apparatus interacts with the other devices to maintain an output space. When the output space is about to become full, the apparatus queries the other devices to obtain the size of a new output space; if no suitable, valid output space can be obtained, the apparatus enters a waiting state and signals that sequence stream B must not advance, so that no data is lost. For these reasons, a temporary cache unit is provided to cache part of the data.
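The output-space handshake can be pictured as a simple credit scheme. The sketch below is an assumption-laden simplification: the class `OutputSpace`, the callback `query_free_slots`, and the policy of querying only when the space is exhausted (rather than when it is about to become full) are all hypothetical and are not taken from the embodiment.

```python
from typing import Callable, List


class OutputSpace:
    """Tracks how many results the receiving device can still accept."""

    def __init__(self, query_free_slots: Callable[[], int]) -> None:
        self.query_free_slots = query_free_slots  # hypothetical query to the receiving device
        self.free = 0                             # remaining slots in the current output space

    def try_emit(self, index_value: int, sink: List[int]) -> bool:
        """Emit one index value if space is available; otherwise report that the caller must wait."""
        if self.free == 0:
            self.free = self.query_free_slots()   # ask for the size of a new output space
        if self.free == 0:
            return False                          # waiting state: the stream must not advance
        sink.append(index_value)
        self.free -= 1
        return True


grants = iter([1, 0, 2])  # sizes reported by the receiving device on successive queries
space = OutputSpace(lambda: next(grants))
received: List[int] = []
first = space.try_emit(7, received)   # one slot granted, result delivered
second = space.try_emit(9, received)  # no slot granted, caller has to wait
third = space.try_emit(9, received)   # retried after waiting, result delivered
```

The second call shows the waiting state: nothing is written and the same index value is offered again later, so no result is lost.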
During each comparison, the temporary cache processing module needs to complete read, write, and count operations. "Read" means reading the minimum comparison units of sequence stream B one by one in sequence stream order and caching them in the temporary cache processing module. Fig. 7 is a schematic diagram of a temporary cache processing module provided with cache space for two minimum comparison units in an embodiment of the present invention. As shown in fig. 7, the size of the temporary cache processing module may be set to twice the size of the minimum comparison unit, which ensures that the module can hold two minimum comparison units at the same time, one for comparison and one for data prefetching. Prefetching allows one minimum comparison unit to be supplied to the comparison module in every clock cycle without waiting. Pipelining prefetch and comparison in this way reduces idle clock cycles, compensates to some extent for the large difference in data volume between the two ends of the module, and improves processing speed.
For data belonging to different comparison units, when a comparison unit reaches its end, that is, when the flag bit of the minimum comparison unit is 0, the temporary cache processing module records the index value of the current comparison unit by counting. The index value reflects the position of the comparison unit in the input sequence stream B; with this index information recorded, other devices can perform operations such as data positioning, filtering, and screening after obtaining valid output from the apparatus. "Write" means that the temporary cache processing module writes the minimum comparison unit data to the comparison module in cache order, to be compared with the minimum comparison unit data of sequence A. After the data in the temporary cache processing module has been compared, a new minimum comparison unit is requested from sequence stream B, and the previously prefetched data is sent to the comparison module for the next round of comparison.
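The read, write, and count operations of the two-slot temporary cache can be modeled in software as follows. This is a behavioral sketch only: it does not model clock cycles, and the class `TemporaryBuffer`, its method names, and the `(data, flag)` representation of a minimum comparison unit are assumptions made for the example.

```python
from collections import deque
from typing import Deque, Iterator, List, Optional, Tuple

Unit = Tuple[int, int]  # (data, flag); flag 0 = last unit of its comparison unit


class TemporaryBuffer:
    """Two-slot buffer: one unit for the current comparison, one prefetched."""

    def __init__(self, source: Iterator[Unit], capacity: int = 2) -> None:
        self.source = source
        self.capacity = capacity
        self.slots: Deque[Unit] = deque()
        self.index = 0  # index value of the comparison unit currently being handed out

    def prefetch(self) -> None:
        """'Read': request new units from the stream while free space exists."""
        while len(self.slots) < self.capacity:
            unit = next(self.source, None)
            if unit is None:  # the stream is exhausted
                return
            self.slots.append(unit)

    def next_unit(self) -> Optional[Tuple[int, int, int]]:
        """'Write' and 'count': hand the oldest unit to the comparator with its index value."""
        self.prefetch()
        if not self.slots:
            return None
        data, flag = self.slots.popleft()
        result = (self.index, data, flag)
        if flag == 0:  # end of the comparison unit: count it
            self.index += 1
        self.prefetch()  # refill the slot that was just freed
        return result


stream = iter([(0x61, 1), (0x62, 0), (0x63, 0)])
buffer = TemporaryBuffer(stream)
handed: List[Tuple[int, int, int]] = []
while (item := buffer.next_unit()) is not None:
    handed.append(item)
```

After each hand-off the freed slot is refilled immediately, so the next unit is already cached when the comparator asks for it.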
After the comparison template module has cached sequence A in its entirety, a read pointer is set for reading the corresponding minimum comparison unit data from the comparison template; the data pointed to by the read pointer is the minimum comparison unit data of sequence A used for comparison at the current moment. The movement of the read pointer depends on feedback from the comparison module and the temporary cache processing module. When the two minimum comparison units processed by the comparison module are consistent, the read pointer advances by one position to read the next data of sequence A. When the comparison module determines that its two inputs do not match, the read pointer is reset to the start position, and the comparison template module notifies the temporary cache processing module to skip the current comparison unit. When the temporary cache processing module detects that a comparison unit has reached its end, that is, the flag bit is 0, it notifies the comparison template module to reset the read pointer to the start position.
When the read pointer points to the last minimum comparison unit of the template, the comparison template module asks the temporary cache processing module whether the flag bit is 0. If so, the data bit lengths of the two are exactly the same, and the comparison template module notifies the temporary cache processing module to output the recorded index value.
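The read-pointer rules of the comparison template module can be summarized as a small state holder. The sketch below is hypothetical in its naming (`CompareTemplate`, `feedback`, and the action strings are not from the embodiment) and folds the feedback from the comparison module (`equal`) and from the temporary cache processing module (`flag`) into a single call.

```python
from typing import List


class CompareTemplate:
    """Holds sequence A and a read pointer into it."""

    def __init__(self, template: bytes) -> None:
        if not template:
            raise ValueError("template must contain at least one unit")
        self.template = template
        self.pointer = 0  # read pointer: position of the unit used for the current comparison

    def current(self) -> int:
        return self.template[self.pointer]

    def feedback(self, equal: bool, flag: int) -> str:
        """Update the read pointer after one comparison and report the action taken."""
        at_last = self.pointer == len(self.template) - 1
        end_of_unit = flag == 0
        if equal and at_last and end_of_unit:
            self.pointer = 0
            return "output"   # lengths agree: the recorded index value is output
        if equal and not at_last and not end_of_unit:
            self.pointer += 1
            return "advance"  # read the next unit of sequence A
        self.pointer = 0
        return "reset"        # mismatch: the rest of the comparison unit is skipped


module = CompareTemplate(b"ab")
actions: List[str] = []
for data, flag in [(0x61, 1), (0x62, 0)]:
    actions.append(module.feedback(data == module.current(), flag))
```

For the two units of "ab" the recorded actions are "advance" followed by "output"; any mismatch, or a length disagreement at either end, returns the pointer to the start position.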
In summary, the variable-length sequence matching method, the database access method, the variable-length sequence matching apparatus, and the database system according to the embodiments of the present invention enable matching comparison on variable-length sequence streams to be matched, which improves the performance of a database-specific processor and, in turn, the performance of the database.
In the description herein, reference to the description of the terms "one embodiment," "a particular embodiment," "some embodiments," "for example," "an example," "a particular example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. The sequence of steps involved in the various embodiments is provided to schematically illustrate the practice of the invention, and the sequence of steps is not limited and can be suitably adjusted as desired.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (18)

1. A variable length sequence matching method, comprising:
caching data bits of the sequence flow to be matched by taking the minimum comparison unit as granularity according to the sequence flow sequence, and caching the data bits of the template sequence by taking the minimum comparison unit as granularity according to the sequence flow sequence; the minimum comparison unit of the sequence flow to be matched and the minimum comparison unit of the template sequence comprise data bits with the same bit width;
reading a data bit of a minimum comparison unit from the cached data bits of the sequence flow to be matched according to the sequence flow order, and reading a data bit of a minimum comparison unit from the cached data bits of the template sequence according to the sequence flow order;
performing matching comparison on the read data bit of the minimum comparison unit of the sequence stream to be matched and the read data bit of the minimum comparison unit of the template sequence;
in the case where the matching comparison is consistent, if the minimum comparison unit to which the currently read data bits of the sequence stream to be matched belong is the last minimum comparison unit of its subsequence, and the minimum comparison unit to which the currently read data bits of the template sequence belong is the last minimum comparison unit of the template, acquiring the index value of the subsequence to which the currently read data bits of the sequence stream to be matched belong, and outputting the acquired index value;
in the case where the matching comparison is consistent, if the minimum comparison unit to which the currently read data bits of the sequence stream to be matched belong is not the last minimum comparison unit of its subsequence, and the minimum comparison unit to which the currently read data bits of the template sequence belong is not the last minimum comparison unit of the template, acquiring the data bits of the minimum comparison unit that follows the currently read minimum comparison unit of the sequence stream to be matched, and acquiring the data bits of the minimum comparison unit that follows the currently read minimum comparison unit of the template sequence, for the next matching comparison;
and in the case where the matching comparison is not consistent, resetting the reading position of the template sequence to the initial position of the template, and moving the reading position of the sequence stream to be matched to the initial minimum comparison unit of the subsequence that follows the subsequence to which the currently read data bits belong.
2. The variable length sequence matching method of claim 1, wherein before caching the data bits of the sequence stream to be matched at the granularity of the minimum comparison unit in sequence stream order, and before caching the data bits of the template sequence at the granularity of the minimum comparison unit in sequence stream order, the method further comprises:
dividing subsequences of the sequence flow to be matched by taking the minimum comparison unit as granularity according to the sequence flow sequence, and dividing the template sequence by taking the minimum comparison unit as granularity according to the sequence flow sequence; the minimum comparison unit of the sequence flow to be matched comprises a data bit with a set bit width and a flag bit for identifying a subsequence to which the minimum comparison unit belongs, and the minimum comparison unit of the template sequence comprises the data bit with the set bit width.
3. The variable length sequence matching method of claim 1, wherein when buffering data bits of a sequence stream to be matched with a minimum comparison unit as granularity in a sequence stream order, the method further comprises:
counting the subsequences to which the cached data bits of the sequence stream to be matched belong, so as to obtain the index values of the corresponding subsequences.
4. The variable length sequence matching method of claim 1, wherein buffering data bits of the template sequence at a minimum unit of comparison as a granularity in a sequence flow order comprises:
and sequentially caching the data bits of all the minimum comparison units of the template sequence by taking the minimum comparison unit as granularity according to the sequence flow sequence.
5. The variable-length sequence matching method of claim 1, wherein buffering data bits of a sequence stream to be matched with a minimum comparison unit as granularity in a sequence stream order comprises:
and sequentially caching the data bits of the minimum comparison units of the sequence flow to be matched by taking the minimum comparison unit as granularity according to the sequence flow sequence.
6. The variable-length sequence matching method according to claim 2, wherein the set bit width is a bit width of one byte.
7. The variable-length sequence matching method of claim 2, wherein the flag bit of a minimum comparison unit of the sequence stream to be matched takes a first value to indicate that the minimum comparison unit and the immediately following minimum comparison unit belong to the same subsequence, and takes a second value, different from the first value, to indicate that the minimum comparison unit and the immediately following minimum comparison unit belong to different subsequences.
8. The variable-length sequence matching method of claim 1, wherein buffering data bits of a sequence stream to be matched with a minimum comparison unit as granularity in a sequence stream order comprises:
and under the condition of receiving a read data instruction of an external sequence flow output device, caching data bits of the sequence flow to be matched by taking the minimum comparison unit as granularity according to the sequence of the sequence flow.
9. The variable-length sequence matching method of claim 1, wherein reading a data bit of a minimum comparison unit from data bits of the sequence stream to be matched buffered in sequence flow order and reading a data bit of a minimum comparison unit from data bits of the template sequence buffered in sequence flow order comprises:
in the case of receiving a match instruction of the external match result receiving apparatus, reading a data bit of a minimum comparison unit from the buffered data bits of the sequence stream to be matched in the sequence stream order, and reading a data bit of a minimum comparison unit from the buffered data bits of the template sequence in the sequence stream order.
10. The variable-length sequence matching method of claim 1, wherein after reading a data bit of a minimum comparison unit from data bits of the sequence stream to be matched buffered in sequence stream order, the method further comprises:
in the case where there is free cache space for storing data bits of the sequence stream to be matched, sending a request for a new minimum comparison unit of the sequence stream to be matched to an external sequence stream output device, so as to cache the data bits of the minimum comparison unit that follows the minimum comparison unit to which the most recently cached data bits of the sequence stream to be matched belong.
11. The variable length sequence matching method of claim 1, wherein the template sequence consists of one template.
12. The variable length sequence matching method of claim 1, wherein the template sequence is divided into a plurality of templates; the minimum comparison unit of the template sequence comprises a data bit with set bit width and a flag bit for identifying the template to which the minimum comparison unit belongs.
13. A database access method, comprising:
obtaining a sequence flow to be matched and a template sequence based on a database access statement;
matching and comparing the sequence stream to be matched with the template sequence by using the variable-length sequence matching method according to any one of claims 1 to 12 to obtain index values of subsequences of the sequence stream to be matched which are consistent in matching and comparison;
and obtaining a data access result corresponding to the database access statement based on the obtained index value of the subsequence of the sequence flow to be matched.
14. A variable length sequence matching apparatus, comprising:
the temporary cache processing module is used for caching the data bits of the sequence flow to be matched by taking the minimum comparison unit as granularity according to the sequence flow sequence;
the comparison template module is used for caching data bits of the template sequence by taking the minimum comparison unit as granularity according to the sequence flow sequence; the minimum comparison unit of the sequence flow to be matched and the minimum comparison unit of the template sequence comprise data bits with the same bit width;
the temporary cache processing module is further configured to read a data bit of a minimum comparison unit from the cached data bits of the sequence stream to be matched according to the sequence stream order;
the comparison template module is used for reading a data bit of a minimum comparison unit from the cached data bits of the template sequence according to the sequence flow order;
the comparison module is used for performing matching comparison on the read data bit of the minimum comparison unit of the sequence stream to be matched and the read data bit of the minimum comparison unit of the template sequence;
the comparison module is further configured to, if the currently read minimum comparison unit to which the data bit of the sequence stream to be matched belongs is the last minimum comparison unit of the subsequence to which the data bit belongs and the currently read minimum comparison unit to which the data bit of the template sequence belongs is the last minimum comparison unit of the template, obtain an index value of the subsequence to which the data bit of the sequence stream to be matched belongs and output the obtained index value;
the comparison module is further configured to:
in the case where the matching comparison is consistent, if the minimum comparison unit to which the currently read data bits of the sequence stream to be matched belong is not the last minimum comparison unit of its subsequence, and the minimum comparison unit to which the currently read data bits of the template sequence belong is not the last minimum comparison unit of the template, acquire the data bits of the minimum comparison unit that follows the currently read minimum comparison unit of the sequence stream to be matched, and acquire the data bits of the minimum comparison unit that follows the currently read minimum comparison unit of the template sequence, for the next matching comparison; or,
in the case where the matching comparison is not consistent, reset the reading position of the template sequence to the initial position of the template, and move the reading position of the sequence stream to be matched to the initial minimum comparison unit of the subsequence that follows the subsequence to which the currently read data bits belong.
15. The variable length sequence matching apparatus of claim 14, further comprising:
the preprocessing module is used for dividing the subsequences of the sequence flow to be matched by taking the minimum comparison unit as the granularity according to the sequence flow sequence and dividing the template sequence by taking the minimum comparison unit as the granularity according to the sequence flow sequence; the minimum comparison unit of the sequence flow to be matched comprises a data bit with a set bit width and a flag bit for identifying a subsequence to which the minimum comparison unit belongs, and the minimum comparison unit of the template sequence comprises the data bit with the set bit width.
16. The variable length sequence matching apparatus of claim 14,
the temporary cache processing module is further configured to count the sub-sequences to which the data bits of the cached sequence stream to be matched belong, so as to obtain the index values of the corresponding sub-sequences.
17. The variable-length sequence matching apparatus of claim 14, wherein the temporary buffering processing module allows buffering of data bits of at least two minimum comparison units of the sequence stream to be matched; and/or the comparison template module is implemented based on a register.
18. A database system, comprising: a variable length sequence matching apparatus as claimed in any one of claims 14 to 17.
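
For illustration only, the matching loop recited in claim 1, combined with the flag bit of claims 2 and 7 and the one-byte unit of claim 6, can be sketched in software as follows. This is a minimal Python sketch under stated assumptions, not the claimed apparatus: the names `split_into_units`, `match_stream`, `SAME_SUBSEQ` and `LAST_IN_SUBSEQ` are hypothetical, the concrete flag values 0 and 1 are chosen arbitrarily, subsequences are assumed to be non-empty, and treating the case where only one of the two sides has reached its last unit as a mismatch is an assumption, since claim 1 does not recite that case.

```python
from typing import Iterable, List, Tuple

# Hypothetical flag values (claim 7 only requires two distinct values).
SAME_SUBSEQ = 0     # the following unit belongs to the same subsequence
LAST_IN_SUBSEQ = 1  # the following unit belongs to a different subsequence

Unit = Tuple[int, int]  # (data bits of one byte, flag bit)


def split_into_units(subsequences: Iterable[bytes]) -> List[Unit]:
    """Divide non-empty subsequences into byte-wide minimum comparison units."""
    units: List[Unit] = []
    for sub in subsequences:
        for i, byte in enumerate(sub):
            flag = LAST_IN_SUBSEQ if i == len(sub) - 1 else SAME_SUBSEQ
            units.append((byte, flag))
    return units


def match_stream(units: List[Unit], template: bytes) -> List[int]:
    """Return the index values of the subsequences that equal the template."""
    if not template:
        raise ValueError("template must contain at least one unit")
    matches: List[int] = []
    pos = 0      # reading position in the sequence stream to be matched
    t_pos = 0    # reading position in the template
    index = 0    # index value of the subsequence currently being read
    while pos < len(units):
        data, flag = units[pos]
        stream_last = flag == LAST_IN_SUBSEQ
        template_last = t_pos == len(template) - 1
        if data == template[t_pos] and stream_last == template_last:
            if stream_last:
                # Both sides are at their last unit: output the index value.
                matches.append(index)
                index += 1
                t_pos = 0
            else:
                # Neither side is at its last unit: read the next unit of each.
                t_pos += 1
            pos += 1
        else:
            # Inconsistent (or, by assumption, a length mismatch): reset the
            # template and jump to the first unit of the next subsequence.
            while pos < len(units) and units[pos][1] != LAST_IN_SUBSEQ:
                pos += 1
            pos += 1
            index += 1
            t_pos = 0
    return matches
```

With the subsequences `b"abc"`, `b"ab"`, `b"abcd"`, `b"abc"` and `b"xbc"` and the template `b"abc"`, `match_stream(split_into_units(...), b"abc")` returns `[0, 3]`. In the database access method of claim 13, such index values would identify the records whose field equals the value taken from the access statement.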
CN202010639824.9A 2020-07-06 2020-07-06 Variable-length sequence matching method, database access method and device Active CN112000707B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010639824.9A CN112000707B (en) 2020-07-06 2020-07-06 Variable-length sequence matching method, database access method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010639824.9A CN112000707B (en) 2020-07-06 2020-07-06 Variable-length sequence matching method, database access method and device

Publications (2)

Publication Number Publication Date
CN112000707A CN112000707A (en) 2020-11-27
CN112000707B true CN112000707B (en) 2021-08-24

Family

ID=73467649

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010639824.9A Active CN112000707B (en) 2020-07-06 2020-07-06 Variable-length sequence matching method, database access method and device

Country Status (1)

Country Link
CN (1) CN112000707B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113434544B (en) * 2021-06-02 2022-11-18 中科驭数(北京)科技有限公司 Database data reading method, database data writing method and device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105224543A (en) * 2014-05-30 2016-01-06 国际商业机器公司 For the treatment of seasonal effect in time series method and apparatus

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11561951B2 (en) * 2005-05-16 2023-01-24 Panvia Future Technologies, Inc. Multidimensional associative memory and data searching
CN1719444A (en) * 2005-07-19 2006-01-11 无敌科技(西安)有限公司 Method of implementing multi data translation
KR101607178B1 (en) * 2008-10-23 2016-03-29 아브 이니티오 테크놀로지 엘엘시 A method, a system, and a computer-readable medium storing a computer program for performing a data operation, measuring data quality, or joining data elements
CN104182460B (en) * 2014-07-18 2017-06-13 浙江大学 Time Series Similarity querying method based on inverted index
CN104679870B (en) * 2015-03-06 2018-01-30 成都维远艾珏信息技术有限公司 A kind of method of data acquisition for information system

Also Published As

Publication number Publication date
CN112000707A (en) 2020-11-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant