One-key query method and device for structured database
[ technical field ]
The invention relates to the technical field of data query, and provides a one-key query method and a one-key query device for a structured database.
[ background of the invention ]
Structured data, also called row data, is data that is logically expressed and implemented by a two-dimensional table structure. Its fields are relatively fixed, and the data stored in different columns have different properties: for example, the name column stores Chinese characters, while the telephone number column and the identity card column store digits and letters. In a big data platform application, a core function is real-time search over structured data for which an ES index has been created, and real-time search requires a quick response. However, in the conventional one-key search function, once the query keyword is determined, the keyword carries no column name information for any related data column in the database, so all fields in the two-dimensional table of the structured database must be searched according to the query conditions; search performance is poor and search efficiency is low. For example, if the current query condition is an identity card number, all columns must be queried one by one, and the identity card column cannot be located quickly, which greatly degrades search performance.
In view of the above, it is an urgent problem in the art to overcome the above-mentioned drawbacks of the prior art.
[ summary of the invention ]
The technical problems to be solved by the invention are as follows:
in the traditional structured data search, the one-key search function must search all fields in the two-dimensional table according to the search conditions, so search performance is poor and search efficiency is low.
The invention achieves the above purpose by the following technical scheme:
in a first aspect, the present invention provides a one-key query method for a structured database, including:
analyzing each column of data in the structured database to obtain the characteristic attributes of each column of data;
loading the obtained characteristic attributes of each column of data into a cache;
matching the characteristic attributes of the keyword to be queried with the characteristic attributes of each column of data in the cache;
and after the characteristic attributes of one or more columns of data are matched successfully, performing the one-key query in the corresponding column or columns according to the keyword.
Preferably, the characteristic attributes include one or more of: a maximum data length, character types, appearing characters, a maximum length of consecutively appearing digits, and a maximum length of consecutively appearing letters.
Preferably, the analyzing each column of data in the structured database to obtain the characteristic attributes of each column of data specifically comprises:
determining and recording the maximum data length in each column of data;
determining and recording the character type contained in each column of data;
determining and recording characters appearing in each column of data;
wherein the recording is made only once for the repeatedly appearing characters.
Preferably, the matching the characteristic attributes of the keyword to be queried with the characteristic attributes of each column of data in the cache specifically includes:
acquiring the data length of the keyword, matching it against the maximum data length of each column of data, retaining the columns that match successfully, and pruning the columns that fail to match;
acquiring the character types contained in the keyword, matching them against the character types contained in each column of data, retaining the columns that match successfully, and pruning the columns that fail to match;
and acquiring the characters contained in the keyword, matching them against the characters appearing in each column of data, retaining the columns that match successfully, and pruning the columns that fail to match.
Preferably, when any characteristic attribute of the keyword fails to match the corresponding characteristic attribute of column k, the keyword fails to match column k, and matching of the remaining characteristic attributes between the keyword and column k is stopped; when all the characteristic attributes of the keyword match the corresponding characteristic attributes of column k, the keyword matches column k successfully; where k is any column in the structured database.
Preferably, when the data length of the keyword is greater than the maximum data length of column k, the data length attribute matching between the keyword and column k fails; otherwise, the matching succeeds;
when the character types contained in the keyword are consistent with the character types contained in column k, the character type attribute matching between the keyword and column k succeeds; otherwise, the matching fails;
when the characters appearing in the keyword are consistent with the characters appearing in column k, the character attribute matching between the keyword and column k succeeds; otherwise, the matching fails; where k is any column in the structured database.
Preferably, after obtaining the characteristic attributes of each column of data, the method further includes:
comparing the maximum data lengths of the columns, and establishing a mapping relationship between columns with the same maximum data length;
comparing the character types of the columns, and establishing a mapping relationship between columns with the same character types;
and comparing the appearing characters of the columns, and establishing a mapping relationship between columns with the same appearing characters.
Preferably, when any characteristic attribute of the keyword fails to match the corresponding characteristic attribute of column k, the keyword fails to match column k; meanwhile, when column k is determined to have one or more mapped columns for that attribute, the keyword is determined to fail to match the one or more mapped columns as well, and matching of the keyword against the one or more mapped columns is skipped;
when any characteristic attribute of the keyword matches the corresponding characteristic attribute of column k successfully and column k has one or more mapped columns for that attribute, the matching of that characteristic attribute between the keyword and the one or more mapped columns is confirmed to succeed as well, and matching of that characteristic attribute between the keyword and the one or more mapped columns is skipped; where k is any column in the structured database.
Preferably, before performing the characteristic attribute matching, the method further includes: sorting the columns of data in the structured database according to search frequency or search count;
and when matching the characteristic attributes, matching the characteristic attributes between the keyword and each column of data in descending order of search frequency or search count.
In a second aspect, the present invention further provides a one-key query apparatus for a structured database, including at least one processor and a memory, where the at least one processor and the memory are connected through a data bus, and the memory stores instructions executable by the at least one processor, where the instructions are used to complete the one-key query method for the structured database according to the first aspect after being executed by the processor.
Compared with the prior art, the invention has the following beneficial effects: the characteristic attributes of each column of data in the structured database are obtained by data analysis performed in advance; by matching the queried keyword against each column of data, more than half of the fields can be pruned automatically during a one-key search query, and the corresponding data column is located quickly according to the keyword, so that search efficiency and query performance are greatly improved.
[ description of the drawings ]
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments of the present invention will be briefly described below. It is obvious that the drawings described below are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
FIG. 1 is a flowchart of a one-key query method for a structured database according to an embodiment of the present invention;
FIG. 2 is a two-dimensional table representation of a structured database according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for obtaining the characteristic attributes of each column of data according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the characteristic attributes in a two-dimensional table according to an embodiment of the present invention;
FIG. 5 is a flowchart of a method for matching characteristic attributes according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the mapping relationship between columns in a two-dimensional table according to an embodiment of the present invention;
FIG. 7 is an architecture diagram of a one-key query device for a structured database according to an embodiment of the present invention.
[ detailed description of the embodiments ]
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other. The invention will be described in detail below with reference to the figures and examples.
Example 1:
the embodiment of the invention provides a one-key query method for a structured database, which specifically comprises the following steps as shown in fig. 1:
Step 10: analyzing each column of data in the structured database to obtain the characteristic attributes of each column of data. The characteristic attributes comprise one or more of the maximum data length, the character types, the appearing characters, the maximum length of consecutively appearing digits and the maximum length of consecutively appearing letters, and the characteristic attributes generally differ from column to column; when the data are cleaned and loaded into storage, the characteristic attributes of each column are obtained in advance through data analysis.
Step 20: loading the obtained characteristic attributes of each column of data into a cache. The characteristic attributes summarize the important data characteristics of each column and play the role of a 'catalogue' or an 'abstract', while the specific data in each column correspond to the 'main text'.
Step 30: matching the characteristic attributes of the keyword to be queried with the characteristic attributes of each column of data in the cache. During matching, each characteristic attribute of the keyword is matched against the corresponding characteristic attribute of each column. For any column k in the structured database, as soon as any characteristic attribute of the keyword fails to match the corresponding characteristic attribute of column k, the keyword is judged to have failed to match column k, column k is pruned, and the remaining characteristic attributes are not matched against column k; the next characteristic attribute is matched against column k only when the previous characteristic attribute has matched successfully; and when all the characteristic attributes of the keyword match the corresponding characteristic attributes of column k, the keyword matches column k successfully, and column k is retained.
Step 40: after the characteristic attributes of one or more columns of data are matched successfully, performing the one-key query in the corresponding column or columns according to the keyword. After the keyword has been matched against every column, the columns that failed to match are pruned and the one or more columns that matched successfully are retained, which greatly reduces the number of fields to be searched; finally, the one-key query only needs to be carried out in the retained columns.
According to the one-key query method of the structured database, the characteristic attributes of each column of data in the structured database are obtained by data analysis performed in advance; by matching the queried keyword against each column of data, more than half of the fields can be pruned automatically during a one-key search query, and the corresponding data column can be located quickly according to the keyword, so that search efficiency and query performance are greatly improved.
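Steps 10 to 40 can be sketched end to end as follows. This is a minimal illustration rather than the claimed implementation: it assumes the table is held in memory as a mapping from column name to a list of strings, uses only two characteristic attributes (maximum data length and appearing characters), treats "consistent" characters as "every keyword character appears in the column", and uses substring containment as the final query. The names `build_cache`, `one_key_query` and the sample data are hypothetical.

```python
def build_cache(table: dict[str, list[str]]) -> dict[str, dict]:
    """Steps 10 and 20: profile every column once and keep the result in memory."""
    return {
        col: {
            "max_len": max((len(v) for v in values), default=0),
            "chars": set("".join(values)),  # each character is recorded once
        }
        for col, values in table.items()
    }


def one_key_query(keyword: str, table: dict[str, list[str]], cache: dict[str, dict]):
    """Steps 30 and 40: prune columns by attribute, then search only the rest."""
    kept = [
        col
        for col, attrs in cache.items()
        # `and` short-circuits, so a failed length check skips the character check
        if len(keyword) <= attrs["max_len"] and set(keyword) <= attrs["chars"]
    ]
    hits = [
        (col, row)
        for col in kept
        for row, value in enumerate(table[col])
        if keyword in value  # assumed query semantics: substring match
    ]
    return kept, hits


# Hypothetical sample table with three columns.
table = {
    "name": ["张三", "李四"],
    "phone": ["13800138000", "13900139000"],
    "id_card": ["11010519900101123X", "220102198505052345"],
}
cache = build_cache(table)
kept, hits = one_key_query("13900139000", table, cache)
```

In this sample the name column is pruned by the length check alone, so only the two remaining columns are actually scanned for the keyword.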
Taking the two-dimensional table data structure shown in fig. 2 as an example, and assuming that the characteristic attributes include the maximum data length, the character types and the appearing characters, the process of obtaining the characteristic attributes of each column of data in step 10 may refer to fig. 3 and includes the following steps:
Step 101: determining and recording the maximum data length in each column of data. For example, for an identity card column, since identity card numbers all consist of 18 characters (digits, or digits plus letters), every data item in the column has the same data length, which can be recorded directly; for columns such as address and mailbox, the data items in the column may have different lengths, in which case the maximum data length is determined by comparison and recorded.
Step 102: determining and recording the character types contained in each column of data. For example, for the identity card column, the data usually contain only digits, and some also contain letters; for the telephone number column, the data contain only digits; for the name column, the data contain only Chinese characters; the other columns are not enumerated here.
Step 103: determining and recording the characters appearing in each column of data, wherein a repeatedly appearing character is recorded only once. For example, for a data item "21032530881" consisting entirely of digits, the appearing characters are 0, 1, 2, 3, 5 and 8; the same statistics are performed for all data in each column, finally giving all the characters appearing in each column of data.
The implementation order of the above three steps can be interchanged and is not uniquely limited; it can be determined from easy to difficult according to how hard each characteristic attribute is to obtain. For example, in the present scheme, since the data length is fast to analyze, the data length is determined first, and then the other characteristic attributes are determined. When the characteristic attributes also include the maximum length of consecutively appearing digits and the maximum length of consecutively appearing letters, these two items also need to be determined and recorded during data analysis. For example, for a data item "2103b53acb81" consisting of digits and letters, the longest run of consecutive digits is 2103 and the longest run of consecutive letters is acb, so the maximum lengths of consecutive digits and of consecutive letters for that item are 4 and 3 respectively; the same statistics are performed for all data in each column, finally giving the maximum length of consecutive digits and/or letters in each column.
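The attribute extraction described in steps 101 to 103, together with the two optional run-length attributes, can be sketched as below. The function names, the dictionary keys and the four character-type labels are assumptions made for illustration; the source does not specify how character types are encoded.

```python
import re


def char_types(value: str) -> set[str]:
    """Classify characters into assumed type labels: digit, cjk, letter, other."""
    types = set()
    for ch in value:
        if ch.isdigit():
            types.add("digit")
        elif "\u4e00" <= ch <= "\u9fff":  # common CJK ideographs
            types.add("cjk")
        elif ch.isalpha():
            types.add("letter")
        else:
            types.add("other")
    return types


def longest_run(value: str, pattern: str) -> int:
    """Length of the longest substring matching `pattern` (0 if none)."""
    return max((len(m) for m in re.findall(pattern, value)), default=0)


def profile_column(values: list[str]) -> dict:
    """Compute the characteristic attributes of one column in a single pass."""
    profile = {
        "max_len": 0,
        "types": set(),
        "chars": set(),          # a repeated character is recorded only once
        "max_digit_run": 0,
        "max_letter_run": 0,
    }
    for v in values:
        profile["max_len"] = max(profile["max_len"], len(v))
        profile["types"] |= char_types(v)
        profile["chars"] |= set(v)
        profile["max_digit_run"] = max(profile["max_digit_run"], longest_run(v, r"[0-9]+"))
        profile["max_letter_run"] = max(profile["max_letter_run"], longest_run(v, r"[A-Za-z]+"))
    return profile


# The two worked examples from the text.
digits_only = profile_column(["21032530881"])
mixed = profile_column(["2103b53acb81"])
```

Here `digits_only["chars"]` holds the six characters 0, 1, 2, 3, 5 and 8, and `mixed` records a longest digit run of 4 and a longest letter run of 3, matching the examples above.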
Assuming that the two-dimensional table has n columns in total, the letters A, B and C represent the three characteristic attributes of maximum data length, character type and appearing characters respectively; A1, B1 and C1 represent the three characteristic attributes of the first column in the two-dimensional table, and so on: Ak, Bk and Ck represent those of the kth column, and An, Bn and Cn represent those of the nth column. After the characteristic attributes are obtained, the recorded result can refer to fig. 4. Meanwhile, for convenience of description, the three characteristic attributes of the keyword (data length, character type and appearing characters) are represented by A', B' and C' respectively.
With continuing reference to fig. 5, the step 30 specifically includes the following steps:
Step 301: acquiring the data length of the keyword, matching it against the maximum data length of each column of data, retaining the columns that match successfully, and pruning the columns that fail to match. Specifically, in conjunction with fig. 4, the A' feature of the keyword is compared with the A1, A2, ..., An features of the respective columns; assuming that the A' feature matches A1, A2, A5 and A7 successfully and fails to match all other columns, only columns 1, 2, 5 and 7 are retained, and all other columns are pruned. For any column k in the structured database, when the data length of the keyword is greater than the maximum data length of column k, the data length attribute matching between the keyword and column k fails; otherwise, the matching succeeds.
Step 302: acquiring the character types contained in the keyword, matching them against the character types contained in each column of data, retaining the columns that match successfully, and pruning the columns that fail to match. Continuing with fig. 4, only columns 1, 2, 5 and 7 now remain in the two-dimensional table; the B' feature of the keyword is compared with the B1, B2, B5 and B7 features of the four remaining columns, and the columns whose A features failed to match need not be matched on the B feature. Assuming that the B' feature matches the B1 and B7 features successfully and fails to match the B2 and B5 features, only columns 1 and 7 are retained, and columns 2 and 5 are pruned. For any column k in the structured database, when the character types contained in the keyword are consistent with the character types contained in column k, the character type attribute matching between the keyword and column k succeeds; otherwise, the matching fails.
Step 303: acquiring the characters contained in the keyword, matching them against the characters appearing in each column of data, retaining the columns that match successfully, and pruning the columns that fail to match. Continuing with fig. 4, only columns 1 and 7 now remain in the two-dimensional table; the C' feature of the keyword is compared with the C1 and C7 features of the two remaining columns, and the columns whose B features failed to match need not be matched on the C feature. Assuming that the C' feature matches the C1 feature successfully and fails to match the C7 feature, only column 1 is retained, and column 7 is pruned. For any column k in the structured database, when the characters appearing in the keyword are consistent with the characters appearing in column k, the character attribute matching between the keyword and column k succeeds; otherwise, the matching fails.
After the above process, only column 1 remains in the two-dimensional table, and the one-key search only needs to be carried out in column 1 according to the keyword. Because most columns are pruned through characteristic attribute matching, the final one-key search takes far less time and search efficiency is improved. The implementation order of steps 301, 302 and 303 may be interchanged and is not limited.
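The stage-by-stage pruning of steps 301 to 303 can be sketched as a cascade in which each check runs only on the columns that survived the previous one. The sketch rests on two interpretive assumptions: "consistent" character types means the keyword's types are a subset of the column's types, and "consistent" characters means every keyword character appears in the column. `Profile`, `char_types`, `candidate_columns` and the sample profiles are hypothetical names and data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    max_len: int      # attribute A: maximum data length
    types: frozenset  # attribute B: character types
    chars: frozenset  # attribute C: appearing characters


def char_types(text: str) -> frozenset:
    """Assumed type labels: 'digit', 'letter' (ASCII), 'other' (everything else)."""
    kinds = set()
    for ch in text:
        if ch.isdigit():
            kinds.add("digit")
        elif ch.isascii() and ch.isalpha():
            kinds.add("letter")
        else:
            kinds.add("other")
    return frozenset(kinds)


def length_ok(keyword: str, p: Profile) -> bool:   # step 301: A' against Ak
    return len(keyword) <= p.max_len


def types_ok(keyword: str, p: Profile) -> bool:    # step 302: B' against Bk
    return char_types(keyword) <= p.types


def chars_ok(keyword: str, p: Profile) -> bool:    # step 303: C' against Ck
    return frozenset(keyword) <= p.chars


CHECKS = (length_ok, types_ok, chars_ok)  # the order may be interchanged


def candidate_columns(keyword: str, profiles: dict[str, Profile]) -> list[str]:
    """Apply each check only to the columns retained by the previous check."""
    remaining = list(profiles)
    for check in CHECKS:
        remaining = [col for col in remaining if check(keyword, profiles[col])]
        if not remaining:
            break  # every column has been pruned; nothing left to match
    return remaining


# Hypothetical cached profiles for three columns.
profiles = {
    "name": Profile(4, frozenset({"other"}), frozenset("张三李四王五")),
    "phone": Profile(11, frozenset({"digit"}), frozenset("0123456789")),
    "id_card": Profile(18, frozenset({"digit", "letter"}), frozenset("0123456789X")),
}
```

With these profiles, an 11-digit keyword such as "13800138000" retains the phone and id_card columns, while an 18-character keyword ending in X retains only id_card, because the phone column is already pruned by the length check.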
In combination with the embodiment of the present invention, there is also a preferred implementation in which, after the characteristic attributes of each column of data are obtained and before they are matched with the keyword, the method further includes:
a. Comparing the maximum data lengths of the columns, and establishing a mapping relationship between columns with the same maximum data length. Referring to fig. 6, assuming that the maximum data length of column 1 is the same as those of columns 5 and 7, that is, the A1 feature is the same as the A5 and A7 features, columns 1, 5 and 7 are mapped to each other. Specifically, the labeling can be performed as shown in fig. 6: the numbers of the mapped columns are labeled after the corresponding characteristic attribute of each column, that is, (5,7) is labeled after the A1 feature of column 1, (1,7) after the A5 feature of column 5, and (1,5) after the A7 feature of column 7. The absence of a label indicates that no mapping relationship exists with any other column.
b. Comparing the character types of the columns, and establishing a mapping relationship between columns with the same character types. The specific method is the same as described in step a and is not repeated here.
c. Comparing the appearing characters of the columns, and establishing a mapping relationship between columns with the same appearing characters. The specific method is the same as described in step a and is not repeated here.
The order of steps a, b and c can be interchanged and is not uniquely limited; the mapping relationship may be determined by comparing only one or two of the characteristic attributes, or by comparing all three.
In the preferred scheme in which the mapping relationship is determined, during data matching, for any column k in the structured database, when any characteristic attribute of the keyword fails to match the corresponding characteristic attribute of column k, the keyword fails to match column k; meanwhile, when column k is determined to have one or more mapped columns for that attribute, the keyword also fails to match the one or more mapped columns, so matching of the keyword against the one or more mapped columns is skipped. When any characteristic attribute of the keyword matches the corresponding characteristic attribute of column k successfully and column k is determined to have one or more mapped columns for that attribute, the matching of that characteristic attribute between the keyword and the one or more mapped columns is confirmed to succeed as well, and matching of that characteristic attribute between the keyword and the one or more mapped columns is skipped. Specific examples are as follows:
With reference to fig. 6, assuming that the A' feature of the keyword matches the A1 feature of column 1 successfully, it can be known from the label (5,7) in column 1 that the A features of columns 5 and 7 are the same as the A feature of column 1, so it can be determined that the A' feature of the keyword also matches the A5 and A7 features successfully; that is, the A' feature need not be matched against the A5 and A7 features, and columns 1, 5 and 7 are retained directly. Similarly, when the A' feature of the keyword fails to match the A1 feature of column 1, it can be determined that the A' feature of the keyword necessarily fails to match the A5 and A7 features as well, and columns 1, 5 and 7 are pruned directly without matching the A' feature against the A5 and A7 features. This greatly reduces the number of matching operations, improves matching efficiency, and thereby improves the final query performance.
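The mapping shortcut can be sketched for the maximum data length attribute as follows; the same idea applies to the other attributes. The column numbers follow the fig. 6 example, while the concrete length values, the function names and the comparison counter are assumptions added for illustration.

```python
from collections import defaultdict


def build_mapping(attr_by_column: dict[int, int]) -> dict[int, list[int]]:
    """For each column, list the other columns that share the same attribute value."""
    groups = defaultdict(list)
    for col, value in attr_by_column.items():
        groups[value].append(col)
    return {
        col: [other for other in cols if other != col]
        for cols in groups.values()
        for col in cols
    }


def match_lengths(keyword: str, max_len: dict[int, int], mapping: dict[int, list[int]]):
    """Match the length attribute, reusing each outcome for all mapped columns."""
    result: dict[int, bool] = {}
    comparisons = 0
    for col, limit in max_len.items():
        if col in result:
            continue  # outcome already propagated from a mapped column
        ok = len(keyword) <= limit
        comparisons += 1
        result[col] = ok
        for other in mapping[col]:
            result[other] = ok  # same attribute value, so same outcome
    return result, comparisons


# Columns 1, 5 and 7 share a maximum data length, as in the fig. 6 example.
max_len = {1: 18, 2: 11, 5: 18, 7: 18}
mapping = build_mapping(max_len)
result, comparisons = match_lengths("x" * 15, max_len, mapping)
```

For this sample, `mapping[1]` is `[5, 7]`, and a 15-character keyword is resolved against all four columns with two comparisons instead of four.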
In combination with the embodiment of the present invention, there is also a preferred implementation in which, before characteristic attribute matching is performed, the method further includes: sorting the columns of data in the structured database according to search frequency or search count. Depending on users' retrieval needs, one or several columns of data in the two-dimensional table may be queried more often than the others, which indicates greater query demand for those columns. After the columns are sorted in descending order of search frequency or search count, characteristic attribute matching between the keyword and the columns is carried out in that order. For example, if column 5 has the highest search frequency and therefore ranks first, the characteristic attributes of the keyword are matched against column 5 first. In this way, user demand is taken into account, and query performance can be improved to a certain extent.
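The ordering step can be sketched as a sort over per-column search counts. How the counts are collected is not described in the source, so the `Counter` of past searches below is an assumed input, and `match_order` is a hypothetical name.

```python
from collections import Counter


def match_order(columns: list[str], search_counts: Counter) -> list[str]:
    """Return the columns in descending order of search count.

    Columns that were never searched count as 0, and because sorted() is
    stable they keep their original relative order.
    """
    return sorted(columns, key=lambda col: search_counts[col], reverse=True)


# Hypothetical search statistics gathered from earlier queries.
counts = Counter({"id_card": 50, "phone": 20, "name": 5})
order = match_order(["name", "phone", "id_card", "address"], counts)
```

Characteristic attribute matching would then iterate over `order`, so the most frequently searched column is checked first.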
Example 2:
On the basis of the one-key query method for the structured database provided in embodiment 1, the present invention further provides a one-key query device for the structured database, which is used for implementing the method described above; fig. 7 is a schematic diagram of the device architecture in an embodiment of the present invention. The one-key query device of the structured database of the present embodiment includes one or more processors 21 and a memory 22. In fig. 7, one processor 21 is taken as an example.
The processor 21 and the memory 22 may be connected by a bus or other means, and fig. 7 illustrates the connection by a bus as an example.
The memory 22, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as those implementing the one-key query method of the structured database in embodiment 1. The processor 21 executes the various functional applications and data processing of the one-key query device of the structured database by running the non-volatile software programs, instructions and modules stored in the memory 22, that is, implements the one-key query method of the structured database of embodiment 1.
The memory 22 may include high speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory 22 may optionally include memory located remotely from the processor 21, and these remote memories may be connected to the processor 21 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The program instructions/modules are stored in the memory 22 and, when executed by the one or more processors 21, perform the one-key query method of the structured database in embodiment 1 described above, for example, perform the steps shown in fig. 1, fig. 3, and fig. 5 described above.
Those of ordinary skill in the art will appreciate that all or part of the steps of the various methods of the embodiments may be implemented by associated hardware as instructed by a program, which may be stored on a computer-readable storage medium, which may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic or optical disk, or the like.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.