CN114817243A - Method, device and equipment for establishing database joint index and storage medium

Info

Publication number: CN114817243A
Application number: CN202210316051.XA
Authority: CN
Inventors: 黄哲
Original assignee: Ping An International Smart City Technology Co Ltd
Current assignee: Ping An International Smart City Technology Co Ltd
Priority date: 2022-03-29
Filing date: 2022-03-29
Publication date: 2022-07-29

Abstract

The invention relates to the field of artificial intelligence and discloses a method, a device, equipment and a storage medium for establishing a database joint index. The method comprises the following steps: acquiring a historical query log in a database, and parsing the historical query log to obtain at least two field item sets; respectively calculating the support degree of each field item set relative to all field item sets, and screening out, from all the field item sets, the field item sets whose support degree is larger than a preset support degree threshold value; acquiring, according to a preset number of field combinations, a plurality of groups of sequentially arranged condition fields from all field item sets as field association units, and constructing a corresponding confidence link according to each field association unit; and filtering the confidence links according to a preset confidence threshold value, and establishing a joint index of the database according to the filtered confidence links and the screened field item sets. The method and the device thereby establish joint indexes for the data in the database efficiently.

Description

Method, device and equipment for establishing database joint index and storage medium

Technical Field

The invention relates to the field of artificial intelligence, in particular to a method, a device, equipment and a storage medium for establishing a database joint index.

Background

With the development of internet technology, data query is used in more and more scenarios, and the demands on query speed keep rising. Database indexes are therefore used to speed up queries and updates on database tables. Query performance depends directly on how well the indexes are designed, and it can differ greatly between a poorly optimized and a well optimized configuration. Because databases are complex, optimizing indexes by manual configuration involves a heavy workload and is difficult. How to improve database query performance while reducing the time spent on index optimization is a key problem to be solved in database index optimization work.

Currently, index creation relies mainly on the experience of developers: when a business system function is implemented, indexes are created together with the tables according to past development experience. As a result, it is hard to predict which interfaces will be accessed frequently and which structured query statements will be hot statements, and therefore hard to create good indexes. Moreover, the underlying implementation of a joint index imposes an order rule on the indexed fields; a query that does not use the fields in the required order cannot use the index and falls back to a full table scan, which reduces query efficiency. In short, the current database index establishing method is low in efficiency.

Disclosure of Invention

The invention mainly aims to solve the problem that the existing database index establishing method is low in efficiency.

The first aspect of the present invention provides a method for establishing a database joint index, the method including: acquiring a historical query log in a database, and parsing the historical query log to obtain at least two field item sets, wherein each field item set comprises at least two condition fields which are sequentially arranged; respectively calculating the support degree of each field item set relative to all field item sets, and screening out the field item sets with the support degree larger than a preset support degree threshold value from all the field item sets; acquiring, according to a preset number of field combinations, a plurality of groups of sequentially arranged condition fields from all field item sets as field association units, and constructing corresponding confidence links according to each field association unit; and filtering the confidence links according to a preset confidence threshold value, and establishing a joint index of the database according to the filtered confidence links and the screened field item sets.

Optionally, in a first implementation manner of the first aspect of the present invention, the parsing the historical query log to obtain at least two field item sets includes: identifying separators in the historical query log, and splitting the historical query log into statements by adopting the separators to obtain at least two query statements; and extracting the field item set in each query statement according to the statement structure of each query statement.

Optionally, in a second implementation manner of the first aspect of the present invention, the respectively calculating the support degree of each field item set relative to all field item sets includes: respectively extracting the sequentially ordered condition fields in each field item set as a field item; comparing each field item with each field item set in sequence, and determining, according to the comparison result, the field item sets that contain the same field item; respectively calculating the proportion between the number of field item sets determined for each field item and the number of all field item sets; and taking the proportion calculated for each field item as the support degree of the corresponding field item set relative to all field item sets.

Optionally, in a third implementation manner of the first aspect of the present invention, the constructing a corresponding confidence link according to each of the field association units includes: traversing each field item set by sequentially adopting each field association unit, and determining the field item set containing the field association unit in each field item set according to the traversal result; respectively calculating the proportion between the number of the field item sets correspondingly determined by each field association unit and the number of all the field item sets; and carrying out permutation and combination on the proportion calculated by each field association unit to obtain a corresponding confidence link.

Optionally, in a fourth implementation manner of the first aspect of the present invention, the confidence threshold includes a first confidence threshold and a second confidence threshold, and the filtering the confidence links according to the preset confidence threshold includes: traversing the confidence link in sequence by adopting the first confidence threshold, and determining a first confidence, namely the first confidence appearing in the confidence link that is lower than the first confidence threshold; extracting the segmented confidence link that precedes the first confidence in the confidence link; traversing the segmented confidence link in sequence by adopting the second confidence threshold, and determining the second confidences which are lower than the second confidence threshold in the segmented confidence link; and combining the determined second confidences according to the traversal order to obtain the filtered confidence link.

Optionally, in a fifth implementation manner of the first aspect of the present invention, the establishing a joint index of the database according to the filtered confidence links and the screened field item sets includes: extracting each field item set in the filtered confidence link; and selecting, from the extracted field item sets, the field item sets which are the same as the screened field item sets, and establishing the joint index of the database by adopting the selected field item sets.

The second aspect of the present invention provides an apparatus for establishing a database joint index, including: a log parsing module, configured to acquire a historical query log in a database and parse the historical query log to obtain at least two field item sets, wherein each field item set comprises at least two condition fields which are sequentially arranged; a calculation module, configured to respectively calculate the support degree of each field item set relative to all field item sets and screen out the field item sets with the support degree larger than a preset support degree threshold value from all the field item sets; a link construction module, configured to acquire, according to a preset number of field combinations, a plurality of groups of sequentially arranged condition fields from all field item sets as field association units, and construct corresponding confidence links according to each field association unit; and an index establishing module, configured to filter the confidence links according to a preset confidence threshold value and establish the joint index of the database according to the filtered confidence links and the screened field item sets.

Optionally, in a first implementation manner of the second aspect of the present invention, the log parsing module includes: the symbol identification unit is used for identifying separators in the historical query logs and carrying out statement splitting on the historical query logs by adopting the separators to obtain at least two query statements; and the item set extraction unit is used for extracting the field item set in each query statement according to the statement structure of each query statement.

Optionally, in a second implementation manner of the second aspect of the present invention, the calculation module includes: a field extraction unit, configured to extract the sequentially ordered condition fields in each field item set as a field item; a comparison unit, configured to compare each field item with each field item set in sequence and determine, according to the comparison result, the field item sets that contain the same field item; a proportion calculation unit, configured to respectively calculate the proportion between the number of field item sets determined for each field item and the number of all field item sets; and a support degree calculation unit, configured to take the proportion calculated for each field item as the support degree of the corresponding field item set relative to all field item sets.

Optionally, in a third implementation manner of the second aspect of the present invention, the link constructing module includes: the traversal unit is used for sequentially traversing each field item set by adopting each field association unit and determining the field item set containing the field association unit in each field item set according to the traversal result; the proportion calculation unit is used for respectively calculating the proportion between the number of the field item sets correspondingly determined by each field association unit and the number of all the field item sets; and the permutation and combination unit is used for carrying out permutation and combination on the proportion calculated by each field association unit to obtain a corresponding confidence link.

Optionally, in a fourth implementation manner of the second aspect of the present invention, the index establishing module includes: a first traversal unit, configured to traverse the confidence link in sequence by adopting the first confidence threshold and determine a first confidence, namely the first confidence appearing in the confidence link that is lower than the first confidence threshold; a segmentation extracting unit, configured to extract the segmented confidence link that precedes the first confidence in the confidence link; a second traversal unit, configured to traverse the segmented confidence link in sequence by adopting the second confidence threshold and determine the second confidences which are lower than the second confidence threshold in the segmented confidence link; and a combination unit, configured to combine the determined second confidences according to the traversal order to obtain the filtered confidence link.

Optionally, in a fifth implementation manner of the second aspect of the present invention, the index establishing module further includes: an extraction unit, configured to extract each field item set in the filtered confidence link; and an index establishing unit, configured to select, from the extracted field item sets, the field item sets which are the same as the screened field item sets, and establish the joint index of the database by adopting the selected field item sets.

The third aspect of the present invention provides a device for establishing a database joint index, including: a memory and at least one processor, the memory having instructions stored therein; the at least one processor calls the instructions in the memory to enable the database joint index building device to execute the steps of the database joint index building method.

A fourth aspect of the present invention provides a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to perform the steps of the above-described method for establishing a database joint index.

According to the technical scheme provided by the invention, a historical query log in a database is acquired and parsed to obtain at least two field item sets, wherein each field item set comprises at least two condition fields which are sequentially arranged; the support degree of each field item set relative to all field item sets is respectively calculated, and the field item sets with the support degree larger than a preset support degree threshold value are screened out from all the field item sets. Compared with the prior art, calculating the support degree of each field item set relative to all field item sets and screening the results identifies the field item sets that are queried most frequently, so that the field item sets whose fields are closely related can be screened out.

A plurality of groups of sequentially arranged condition fields are then acquired from all field item sets as field association units according to a preset number of field combinations, and corresponding confidence links are constructed according to each field association unit; the confidence links are filtered according to a preset confidence threshold value, and a joint index of the database is established according to the filtered confidence links and the screened field item sets. Compared with the prior art, a confidence link is established for each field item set, the strongly associated field items are determined by using the screened field item sets together with the confidence links, and the index relationship is established accordingly, so that the joint index relationship of the data in the database is established efficiently and the query efficiency of the related data in the database is improved.

Drawings

FIG. 1 is a schematic diagram of a first embodiment of a method for establishing a database joint index according to the present invention;

FIG. 2 is a diagram of a second embodiment of the method for establishing a database joint index according to the present invention;

FIG. 3 is a diagram of a third embodiment of the method for establishing a database joint index according to the present invention;

FIG. 4 is a diagram illustrating a fourth embodiment of a method for establishing a database joint index according to the present invention;

FIG. 5 is a diagram of a fifth embodiment of the method for establishing a database joint index according to the present invention;

FIG. 6 is a schematic diagram of an embodiment of an apparatus for creating a database joint index according to the present invention;

FIG. 7 is a diagram of another embodiment of the apparatus for creating a database joint index according to the present invention;

FIG. 8 is a diagram of an embodiment of a device for creating a database joint index according to the present invention.

Detailed Description

The embodiment of the invention provides a method, a device, equipment and a storage medium for establishing a database joint index, wherein the method comprises the following steps: acquiring a historical query log in a database, and parsing the historical query log to obtain at least two field item sets; respectively calculating the support degree of each field item set relative to all field item sets, and screening out, from all the field item sets, the field item sets whose support degree is larger than a preset support degree threshold value; acquiring, according to a preset number of field combinations, a plurality of groups of sequentially arranged condition fields from all field item sets as field association units, and constructing a corresponding confidence link according to each field association unit; and filtering the confidence links according to a preset confidence threshold value, and establishing a joint index of the database according to the filtered confidence links and the screened field item sets. The method and the device thereby establish joint indexes for the data in the database efficiently.

The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

For ease of understanding, a detailed flow of the embodiment of the present invention is described below. Referring to FIG. 1, a first embodiment of the method for establishing a database joint index according to the embodiment of the present invention includes:

101. Acquiring a historical query log in a database, and parsing the historical query log to obtain at least two field item sets, wherein each field item set comprises at least two condition fields which are sequentially arranged;

It is to be understood that the execution subject of the present invention may be a device for establishing a database joint index, or may be a terminal or a server, which is not limited herein. The embodiment of the present invention is described by taking a server as the execution subject.

The embodiment of the application can acquire and process related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain the best result.

The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.

In this embodiment, the database refers to a large collection of data that is stored in a computer over the long term and is organized, sharable, and uniformly managed; MySQL (a relational database management system) is taken as an example here. The historical query log refers to the record of users' queries on data in the MySQL database. The field item set refers to the set of field items in the queried data.

In practical application, the query log data of the MySQL database is retrieved. The retrieval can be limited to a certain time range (for example, 3 days), so that the historical query log of the database within that time range is obtained. The obtained historical query log is then parsed: separators in the historical query log are identified, and the historical query log is split into statements by the separators to obtain at least two query statements; the field item set in each query statement is then extracted according to the statement structure of the query statement. Each field item set contains at least two condition fields arranged in sequence. By acquiring the historical query log of the database for a certain period, the field item sets queried by users during that period can be obtained through parsing, so that the data to be processed is acquired accurately.

102. Respectively calculating the support degree of each field item set relative to all field item sets, and screening out the field item sets with the support degree larger than a preset support degree threshold value from all the field item sets;

In this embodiment, the support degree represents the probability that field item set 1 (e.g., AB) and field item set 2 (e.g., ABC) occur simultaneously, which in this example is the probability that field A and field B occur together. The support degree threshold is the occurrence probability of commonly used index fields, obtained by big data analysis of historical index fields and then adjusted by calculation. A support degree is calculated for each field item set, and the calculated support degrees are then screened with the support degree threshold. This screens out the field item sets with high query frequency, that is, the field item sets that users query most, and provides the data basis for the corresponding index relationships.

In practical application, for the field item sets obtained above, the sequentially ordered condition fields in each field item set are respectively extracted as a field item; each extracted field item is then compared with each field item set in sequence, and the field item sets that contain the same field item are determined according to the comparison result. The proportion between the number of field item sets determined for each field item and the number of all field item sets is then calculated, and the proportion calculated for each field item is taken as the support degree of the corresponding field item set relative to all field item sets.
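The support degree calculation described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not define precisely what it means for a field item set to "contain" a field item, so the sketch assumes the fields must appear as a contiguous run in the same order. The function names, the threshold value 0.5, and the use of the five example item sets from the second embodiment are choices made for this sketch.

```python
def contains_run(item_set, fields):
    """Return True if `fields` occurs in `item_set` as a contiguous, ordered run.

    Assumption of this sketch: "contains" means ordered and contiguous.
    """
    n = len(fields)
    return any(
        tuple(item_set[i:i + n]) == tuple(fields)
        for i in range(len(item_set) - n + 1)
    )


def support(fields, item_sets):
    """Proportion of all item sets that contain `fields`."""
    hits = sum(1 for item_set in item_sets if contains_run(item_set, fields))
    return hits / len(item_sets)


def screen_by_support(item_sets, threshold):
    """Keep the item sets whose support degree is greater than the threshold."""
    return [s for s in item_sets if support(s, item_sets) > threshold]


# The five example item sets from the second embodiment (one per select statement).
ITEM_SETS = [
    ("A", "B"),
    ("A", "B", "C"),
    ("A", "B", "D"),
    ("A", "D", "B", "E", "F"),
    ("G", "B", "C", "A"),
]

print(support(("A", "B"), ITEM_SETS))     # 0.6
print(screen_by_support(ITEM_SETS, 0.5))  # [('A', 'B')]
```

With a looser containment rule (for example, an ordered but non-contiguous subsequence) the item set {A, D, B, E, F} would also count toward {A, B}, so the rule should be chosen to match how the target database actually uses joint index prefixes.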

103. Acquiring a plurality of groups of condition fields with sequential arrangement from all field item sets as field association units according to the number of preset field combinations, and constructing corresponding confidence links according to the field association units;

In this embodiment, the number of field combinations is the preset parameter according to which the field items available for user queries in the database are arranged and combined to obtain the field combinations corresponding to the data. The confidence link is obtained by arranging and combining the field association units according to the field combinations and calculating the probability that adjacent fields occur simultaneously as the confidence of each connection between fields, thereby constructing the confidence link corresponding to each field association unit. By analyzing each field item set with the preset number of field combinations and constructing the corresponding confidence link from the resulting field association units, the confidence of the connection between field items is obtained. The field items that are strongly linked can thus be identified, which enables screening and preliminary index binding of the connected field items according to their confidence.

In practical application, the field items in each field item set are analyzed according to the preset number of field combinations, so that a plurality of groups of sequentially arranged condition fields are obtained as field association units; each field item set is traversed by sequentially adopting each field association unit, and the field item sets containing the field association unit are determined according to the traversal result; the proportion between the number of field item sets determined for each field association unit and the number of all field item sets is respectively calculated; and the proportions calculated for the field association units are arranged and combined to obtain the corresponding confidence links among the field items in the field item sets.
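A minimal sketch of the confidence link construction might look like the following. It assumes a preset number of field combinations of 2, so that each field association unit is a pair of adjacent fields, and it assumes that "containing" a unit means containing it as a contiguous ordered run. Following the text, each confidence is the proportion of all field item sets that contain the unit. The function names and the list-of-pairs representation of a link are hypothetical.

```python
def association_units(item_set, size=2):
    """Split one item set into ordered groups of `size` adjacent fields."""
    return [tuple(item_set[i:i + size]) for i in range(len(item_set) - size + 1)]


def contains_run(item_set, unit):
    """Return True if `unit` occurs in `item_set` as a contiguous, ordered run."""
    n = len(unit)
    return any(
        tuple(item_set[i:i + n]) == tuple(unit)
        for i in range(len(item_set) - n + 1)
    )


def confidence_link(item_set, item_sets, size=2):
    """Build the link for one item set as a list of (unit, confidence) pairs.

    The confidence of a unit is the share of all item sets that contain it,
    and the pairs are kept in the field order of `item_set`.
    """
    link = []
    for unit in association_units(item_set, size):
        hits = sum(1 for other in item_sets if contains_run(other, unit))
        link.append((unit, hits / len(item_sets)))
    return link


# The five example item sets from the second embodiment.
ITEM_SETS = [
    ("A", "B"),
    ("A", "B", "C"),
    ("A", "B", "D"),
    ("A", "D", "B", "E", "F"),
    ("G", "B", "C", "A"),
]

print(confidence_link(("A", "B", "C"), ITEM_SETS))
# [(('A', 'B'), 0.6), (('B', 'C'), 0.4)]
```

Here the pair (A, B) appears in three of the five item sets and (B, C) in two, so the link for {A, B, C} drops from 0.6 to 0.4 at the second connection.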

104. Filtering the confidence links according to a preset confidence threshold, and establishing a joint index of the database according to the filtered confidence links and the screened field item sets.

In this embodiment, the confidence threshold is a threshold obtained by big data analysis of the historical link fields, namely by adjusting and calculating the probability of frequently queried link fields; the confidence threshold includes a first confidence threshold and a second confidence threshold. The constructed confidence links are filtered, the filtered confidence links and the screened field item sets are then used to select the field items present in both, and the corresponding joint index is established according to the link. Joint indexes are thereby established for the frequently queried field item sets, which speeds up users' queries on the related data.

In practical application, the confidence link is traversed in sequence by adopting the first confidence threshold, and a first confidence is determined, namely the first confidence appearing in the confidence link that is lower than the first confidence threshold; the segmented confidence link that precedes the first confidence in the confidence link is extracted; the segmented confidence link is traversed in sequence by adopting the second confidence threshold, and the second confidences lower than the second confidence threshold in the segmented confidence link are determined; and the determined second confidences are combined according to the traversal order to obtain the filtered confidence link. Each field item set in the filtered confidence link is then extracted; the field item sets which are the same as the screened field item sets are selected from the extracted field item sets, and the joint index of the database is established by adopting the selected field item sets.
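The filtering and index creation steps could be sketched as below. The sketch follows the wording of the text literally, including the second pass that keeps the confidences lower than the second threshold; whether that comparison direction is intended is not clear from the text, so it is isolated in one line. The threshold values, the table name `orders`, the index naming scheme, and the representation of a link as (unit, confidence) pairs are all assumptions made for illustration. The generated statement uses the standard MySQL `CREATE INDEX ... ON ... (...)` form.

```python
def filter_link(link, first_threshold, second_threshold):
    """Two-pass filter over a confidence link of (unit, confidence) pairs."""
    # Pass 1: keep the segment before the first confidence below the first threshold.
    segment = []
    for unit, confidence in link:
        if confidence < first_threshold:
            break
        segment.append((unit, confidence))
    # Pass 2 (literal reading of the text): keep entries below the second threshold,
    # in traversal order.
    return [(unit, c) for unit, c in segment if c < second_threshold]


def select_index_fields(filtered_link, screened_item_sets):
    """Keep the units of the filtered link that equal a screened item set."""
    screened = set(screened_item_sets)
    selected = []
    for unit, _ in filtered_link:
        if unit in screened and unit not in selected:
            selected.append(unit)
    return selected


def create_index_sql(table, fields):
    """Render a MySQL CREATE INDEX statement for an ordered field tuple."""
    name = "idx_" + "_".join(field.lower() for field in fields)
    return f"CREATE INDEX {name} ON {table} ({', '.join(fields)})"


# Hypothetical inputs: a link for {A, B, C} and the item sets that passed screening.
link = [(("A", "B"), 0.6), (("B", "C"), 0.4)]
screened = [("A", "B")]

filtered = filter_link(link, first_threshold=0.5, second_threshold=0.8)
statements = [create_index_sql("orders", f) for f in select_index_fields(filtered, screened)]
print(statements)  # ['CREATE INDEX idx_a_b ON orders (A, B)']
```

The field order in the generated statement is taken directly from the item set, which matters because a joint index can only serve queries that use its leading fields in order.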

In the embodiment of the invention, a historical query log in a database is acquired and parsed to obtain at least two field item sets, wherein each field item set comprises at least two condition fields which are sequentially arranged; the support degree of each field item set relative to all field item sets is respectively calculated, and the field item sets with the support degree larger than a preset support degree threshold value are screened out from all the field item sets. Compared with the prior art, calculating the support degree of each field item set relative to all field item sets and screening the results identifies the field item sets that are queried most frequently, so that the field item sets whose fields are closely related can be screened out.

A plurality of groups of sequentially arranged condition fields are then acquired from all field item sets as field association units according to a preset number of field combinations, and corresponding confidence links are constructed according to the field association units; the confidence links are filtered according to a preset confidence threshold, and a joint index of the database is established according to the filtered confidence links and the screened field item sets. Compared with the prior art, a confidence link is established for each field item set, the strongly associated field items are determined by using the screened field item sets together with the confidence links, and the index relationship is established accordingly, so that the joint index relationship of the data in the database is established efficiently and the query efficiency of the related data in the database is improved.

Referring to FIG. 2, a second embodiment of the method for establishing a database joint index according to the embodiment of the present invention includes:

201. Identifying separators in the historical query log, and splitting the historical query log into statements by adopting the separators to obtain at least two query statements;

In this embodiment, a separator is a delimiter symbol that is set in advance so that text can be split into statements. Parsing the historical query log with the preset separator yields the individual query statements, from which the required field item sets are obtained.

In practical application, a certain time range is set as the query period, and the query log data of the MySQL database is retrieved, so that the historical query log of the database within that time range is obtained. The separators in the historical query log are then identified, and the character with the highest frequency of occurrence is selected as the field separator of the sample log. Specifically, the field separator is obtained as follows: letters, digits and special characters that are generally not used as field separators (such as ^) are excluded, and the preset separator character is taken from the remaining characters. The historical query log is then split into statements by the separator to obtain at least two query statements.
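One way to read the separator selection described above is sketched below: count every character that is not a letter, a digit, or an excluded special character, and take the most frequent one. The text only names ^ as an excluded character; the longer default exclusion list (SQL operators, quotes, brackets) and the exclusion of whitespace are assumptions of this sketch, as are the function name and the sample log.

```python
from collections import Counter

# Assumed exclusion list: characters that occur inside SQL statements and
# are therefore unlikely to act as separators. Only "^" comes from the text.
DEFAULT_EXCLUDED = "^=<>*'\"(),._"


def detect_separator(log_text, excluded=DEFAULT_EXCLUDED):
    """Return the most frequent candidate separator character, or None."""
    counts = Counter(
        ch for ch in log_text
        if not ch.isalnum() and not ch.isspace() and ch not in excluded
    )
    if not counts:
        return None
    return counts.most_common(1)[0][0]


sample_log = (
    "SELECT * FROM t WHERE a = 1 AND b = 2;"
    "SELECT * FROM t WHERE a = 1 AND b = 2 AND c = 3;"
)

separator = detect_separator(sample_log)
statements = [s for s in sample_log.split(separator) if s.strip()]
print(separator)        # ;
print(len(statements))  # 2
```

In a real log the exclusion list would need tuning, since any punctuation left in the candidate set can outnumber the true separator.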

202. extracting field item sets in the query sentences according to the sentence structures of the query sentences;

In this embodiment, the statement structure refers to a normalized structured query statement preset for the database data. The field item set comprises the field items corresponding to the data to be queried by the query statement. By analyzing the statement structure of the historical query statements, the corresponding field item sets can be obtained, that is, the field item sets queried by users in the corresponding time period, which lays a data basis for establishing the index relationship of the corresponding field items.

In practical application, the field item set in each query statement is extracted according to the statement structure of the query statement. Specifically, the field item sets are counted over a preset time period. Suppose the query field table provided by the system has the query fields {A, B, C, D, E, F, G}. Statistics are collected over a preset time period, for example 3 days, and the data are summarized on the evening of day 3, giving the query fields of 5 select statements in total. The fields must be kept in order: "A = xx and B = yy" and "B = xx and A = yy" are different and correspond to the item sets {A, B} and {B, A} respectively. One select statement is one transaction, so there are 5 transactions at this time: item_ab = {A, B}, item_abc = {A, B, C}, item_abd = {A, B, D}, item_adcef = {A, D, B, E, F} and item_gbca = {G, B, C, A}, thereby obtaining 5 field item sets. Each of the field item sets comprises at least two condition fields arranged in sequence;
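A minimal sketch of step 202 is given below, assuming a simplified statement structure in which each select statement has a single WHERE clause whose conditions are joined by AND and take the form "field operator value". The regular expressions and the function name extract_field_item_set are hypothetical; a production parser would have to handle subqueries, OR, parentheses and quoted text.

```python
import re

# Simplified patterns (assumptions): one WHERE clause, AND-joined conditions.
WHERE_RE = re.compile(r"\bwhere\b(.*)", re.IGNORECASE | re.DOTALL)
FIELD_RE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*(=|<>|!=|<=|>=|<|>)")


def extract_field_item_set(statement):
    """Return the condition fields of a statement as an ordered tuple."""
    match = WHERE_RE.search(statement)
    if match is None:
        return ()
    fields = []
    for condition in re.split(r"\band\b", match.group(1), flags=re.IGNORECASE):
        field = FIELD_RE.match(condition)
        if field:
            fields.append(field.group(1).upper())
    return tuple(fields)


# Order is preserved, so the two statements give different item sets.
item_ab = extract_field_item_set("SELECT * FROM t WHERE a = 'xx' AND b = 'yy'")
item_ba = extract_field_item_set("SELECT * FROM t WHERE b = 'xx' AND a = 'yy'")
```

Returning a tuple rather than a set keeps the field order, which the description treats as significant.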

203. respectively calculating the support degree of each field item set relative to all field item sets, and screening out the field item sets with the support degree larger than a preset support degree threshold value from all the field item sets;

204. acquiring a plurality of groups of condition fields with sequential arrangement from all field item sets as field association units according to the number of preset field combinations, and constructing corresponding confidence links according to the field association units;

205. and filtering the confidence link according to a preset confidence threshold, and establishing a joint index of the database according to the filtered confidence link and the screened field item set.

In the embodiment of the invention, separators in historical query logs are identified, and the separators are adopted to carry out statement splitting on the historical query logs to obtain at least two query statements; and field item sets in the query statements are extracted according to the statement structure of each query statement. Compared with the prior art, the query statements are obtained by splitting the historical query logs with separators and are then analyzed using the preset statement structure, so that the field item sets queried by users are obtained. This allows the query data of the corresponding database to be analyzed relatively simply, and collecting the field item sets for which index associations are to be established lays a data foundation for the analysis of the relationships among the field items.

Referring to fig. 3, a third embodiment of the method for establishing a database joint index according to the embodiment of the present invention includes:

301. obtaining a historical query log in a database, analyzing the historical query log to obtain at least two field item sets, wherein each field item set comprises at least two condition fields which are sequentially arranged;

302. respectively extracting condition fields of sequence ordering in each field item set and taking the condition fields as field items;

In this embodiment, the sequentially ordered condition fields refer to the results of permuting and combining the user search fields provided by the system. If the set of user search fields provided by the system is {A, B, C}, the sequentially ordered field items are {A, B}, {B, C}, {A, C}, {B, A}, {C, B} and {C, A}.
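The enumeration of sequentially ordered field items can be illustrated with the Python standard library. This is only a sketch; the function name ordered_field_items and the default combination size of 2 are assumptions.

```python
from itertools import permutations


def ordered_field_items(fields, size=2):
    """Enumerate every ordered arrangement of `size` distinct fields."""
    return list(permutations(fields, size))


pairs = ordered_field_items(["A", "B", "C"])
```

For the three fields {A, B, C} this yields the six ordered pairs listed above, although in a different listing order.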

In practical application, according to the field item set obtained by the above processing, condition fields sequentially arranged in each field item set are respectively extracted as required field items.

303. Comparing each field item with each field item set in sequence, and determining the field item sets containing the same field items in each field item set according to the comparison result;

In this embodiment, each field item obtained above is compared in sequence with the field item sets obtained from the historical query log, and the field item sets that contain the same field item are determined according to the comparison result. For example, take the field item {A, B}. The above processing yields 5 field item sets: item_ab = {A, B}, item_abc = {A, B, C}, item_abd = {A, B, D}, item_adcef = {A, D, B, E, F} and item_gbca = {G, B, C, A}. The field item {A, B}, in which field A and field B appear together, occurs in transaction 1, transaction 2 and transaction 3. (Note that transaction 4 contains A and B, but D lies between them, so it is not counted; transaction 5 also contains A and B, but it is not counted because the order does not match.) According to the comparison result, the field item sets containing the same field item are therefore transaction 1, transaction 2 and transaction 3.

304. Respectively calculating the proportion between the number of the field item sets correspondingly determined by each field item and the number of all the field item sets;

In this embodiment, according to the comparison and determination results, the ratio between the number of field item sets determined for each field item and the number of all field item sets is calculated. According to the comparison and determination result, the field item sets containing the field item {A, B} are transaction 1, transaction 2 and transaction 3, so the ratio of the number of field item sets determined for the field item {A, B} to the number of all field item sets is 3:5 = 60%.

305. Taking the proportion obtained by calculating each field item as the support degree of the corresponding field item set relative to all field item sets;

In this embodiment, according to the result of the above processing, the ratio calculated for each field item is used as the support degree of the corresponding field item set with respect to all field item sets. The support degree of the field item {A, B} above is 3/5 = 60%. The support degree is calculated for each item set, giving for example support 1 = 60% for the field item {A, B}, support 2 = 20% for the field item {B, C}, support 3 = 20% for the field item {C, D}, and so on. The field item sets whose support degree is larger than a preset support degree threshold are then screened out from all the field item sets; assuming a threshold of 20% (which can be adjusted manually), the items with a support degree above 20% are defined as the frequent item sets.
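The support calculation of steps 303 to 305 can be sketched as follows. The sketch assumes, following the remark about transaction 4 and transaction 5, that a field item counts as contained only when its fields appear adjacent and in the same order; the function names are hypothetical. It reproduces the 60% support of the field item {A, B} on the five example transactions.

```python
# The five example transactions, kept as ordered tuples.
TRANSACTIONS = [
    ("A", "B"),
    ("A", "B", "C"),
    ("A", "B", "D"),
    ("A", "D", "B", "E", "F"),
    ("G", "B", "C", "A"),
]


def contains_in_order(item_set, field_item):
    """True if field_item occurs in item_set as a contiguous, ordered run."""
    n = len(field_item)
    return any(
        tuple(item_set[i:i + n]) == tuple(field_item)
        for i in range(len(item_set) - n + 1)
    )


def support(field_item, transactions):
    """Share of transactions that contain the field item."""
    hits = sum(contains_in_order(t, field_item) for t in transactions)
    return hits / len(transactions)


def frequent_items(candidates, transactions, threshold):
    """Keep the candidates whose support exceeds the threshold."""
    return [c for c in candidates if support(c, transactions) > threshold]


support_ab = support(("A", "B"), TRANSACTIONS)  # 3 of 5 transactions
```

The strict "greater than" comparison follows the wording "larger than a preset support degree threshold".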

306. Acquiring a plurality of groups of condition fields with sequential arrangement from all field item sets as field association units according to the number of preset field combinations, and constructing corresponding confidence links according to the field association units;

307. and filtering the confidence link according to a preset confidence threshold, and establishing a joint index of the database according to the filtered confidence link and the screened field item set.

In the embodiment of the invention, condition fields of sequence ordering in each field item set are respectively extracted and used as field items; each field item is compared with each field item set in sequence, and the field item sets containing the same field items are determined according to the comparison result; the proportion between the number of the field item sets determined for each field item and the number of all the field item sets is calculated; and the proportion calculated for each field item is taken as the support degree of the corresponding field item set relative to all field item sets. Compared with the prior art, the support degree is calculated for the field item sets using the sequentially arranged field items. Analyzing the support degree shows which field items are queried frequently, so that the frequently queried field items are screened preliminarily and a better index relation is established for them.

Referring to fig. 4, a fourth embodiment of the method for establishing a database joint index according to the embodiment of the present invention includes:

401. acquiring a historical query log in a database, and analyzing the historical query log to obtain at least two field item sets, wherein each field item set comprises at least two condition fields which are sequentially arranged;

402. respectively calculating the support degree of each field item set relative to all field item sets, and screening out the field item sets with the support degree larger than a preset support degree threshold value from all the field item sets;

403. traversing each field item set by using each field association unit in sequence, and determining the field item set containing the field association unit in each field item set according to the traversal result;

In this embodiment, a field association unit is an ordered combination of condition fields obtained by permuting and combining the query fields provided by the system.

In practical application, according to the result of analyzing the historical query log and the number of preset field combinations, multiple groups of condition fields with sequential arrangement are obtained from all field item sets as field association units, each field item set is traversed by each field association unit in sequence, and then the field item set containing the field association units in each field item set is determined according to the traversal result.

404. Respectively calculating the proportion between the number of the field item sets correspondingly determined by each field association unit and the number of all the field item sets;

In this embodiment, according to the determination result, the ratio between the number of the field item sets determined for each field association unit and the number of all the field item sets is calculated. The confidence represents the probability that event 2 occurs when event 1 occurs; taking the item {A, B} as an example, it is the probability that field B occurs when field A occurs. The support count of {A, B} is 3 and the support count of {A} is 5, so the confidence is 3/5 = 60%. By analogy, the confidence of {B, C} is 2/5 = 40%.

405. Arranging and combining the proportions calculated by the field association units to obtain corresponding confidence links;

In this embodiment, the proportions calculated for the field association units are arranged and combined according to the calculation result, so as to obtain the corresponding confidence link. For example, in the above calculation the support count of {A, B} is 3 and the support count of {A} is 5, so the confidence is 3/5 = 60%. By analogy, the confidence of {B, C} is 2/5 = 40%. The confidence link [60%, 40%, 0%, 0%, 100%, 0%] of the above table is constructed in this way.
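A minimal sketch of the confidence link construction is shown below. It assumes that the confidence of a pair (X, Y) is the number of transactions in which Y immediately follows X divided by the number of transactions containing X, which matches the 3/5 and 2/5 figures above; the function names and the field order A to G are assumptions. On the five example transactions it reproduces the link [60%, 40%, 0%, 0%, 100%, 0%].

```python
TRANSACTIONS = [
    ("A", "B"),
    ("A", "B", "C"),
    ("A", "B", "D"),
    ("A", "D", "B", "E", "F"),
    ("G", "B", "C", "A"),
]


def contains_in_order(item_set, unit):
    """True if unit occurs in item_set as a contiguous, ordered run."""
    n = len(unit)
    return any(
        tuple(item_set[i:i + n]) == tuple(unit)
        for i in range(len(item_set) - n + 1)
    )


def confidence(first, second, transactions):
    """Support count of (first, second) over support count of first."""
    base = sum(first in t for t in transactions)
    if base == 0:
        return 0.0
    hits = sum(contains_in_order(t, (first, second)) for t in transactions)
    return hits / base


def confidence_link(fields, transactions):
    """Confidence of each consecutive field pair, in field order."""
    return [confidence(x, y, transactions) for x, y in zip(fields, fields[1:])]


link = confidence_link(list("ABCDEFG"), TRANSACTIONS)
```

Each position of the link corresponds to one field association unit: position 0 to (A, B), position 1 to (B, C), and so on up to (F, G).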

406. And filtering the confidence link according to a preset confidence threshold, and establishing a joint index of the database according to the filtered confidence link and the screened field item set.

In the embodiment of the invention, each field item set is traversed by adopting each field association unit in sequence, and the field item set containing the field association unit in each field item set is determined according to the traversal result; the proportion between the number of the field item sets determined for each field association unit and the number of all the field item sets is calculated; and the proportions calculated for the field association units are arranged and combined to obtain a corresponding confidence link. Compared with the prior art, the proportion of each field association unit is calculated over the acquired field item sets and the confidence link is established from these proportions, so that the associations among the field items and their strength can be obtained through analysis, and the index relation among the associated field items is established better.

Referring to fig. 5, a fifth embodiment of the method for establishing a database joint index according to the embodiment of the present invention includes:

501. acquiring a historical query log in a database, and analyzing the historical query log to obtain at least two field item sets, wherein each field item set comprises at least two condition fields which are sequentially arranged;

502. respectively calculating the support degree of each field item set relative to all field item sets, and screening out the field item sets with the support degree larger than a preset support degree threshold value from all the field item sets;

503. acquiring a plurality of groups of condition fields with sequential arrangement from all field item sets as field association units according to the number of preset field combinations, and constructing corresponding confidence links according to the field association units;

504. traversing the confidence links in sequence by adopting a first confidence threshold, and determining a first confidence which is lower than the first confidence threshold and appears in the confidence links for the first time;

In this embodiment, the confidence link obtained above is traversed in sequence using a preset first confidence threshold, and the first confidence in the link that is lower than the first confidence threshold is determined. For example, with a preset first confidence threshold of 1%, the confidence link [60%, 40%, 0%, 0%, 100%, 0%] of the above table is traversed. The link means that when A occurs the probability of B occurring is 60%, when B occurs the probability of C occurring is 40%, and when C occurs the probability of D occurring is 0%. If the threshold is set to 40%, the filtered fields are {A, B}, and the 100% probability of F occurring when E occurs is not included: since the probability of field E occurring when D occurs is 0%, the co-occurrence of field E and field F is not statistically significant. The third entry of the link, which corresponds to field C, is thus found to be below the first confidence threshold.

505. Extracting a segmented confidence link before the first confidence in the confidence links;

In this embodiment, according to the determined first confidence, the segmented confidence link that precedes the first confidence in the confidence link is extracted. If the first confidence is the 0% in the third position, the preceding segmented confidence link {60%, 40%} is extracted.

506. Traversing the segmented confidence links by adopting a second confidence threshold sequence, and determining a second confidence which is lower than the second confidence threshold in the segmented confidence links;

In this embodiment, the segmented confidence link obtained above is traversed in sequence using a second confidence threshold, and the second confidences in the segmented confidence link that are lower than the second confidence threshold are determined. For example, if the second confidence threshold is 60%, traversing the segmented confidence link {60%, 40%} determines that 40% is below the second confidence threshold.

507. Combining the determined second confidence degrees according to the traversal sequence to obtain a filtered confidence degree link;

In this embodiment, the determined second confidences are combined in traversal order according to the processing result, so as to obtain the filtered confidence link. If the above processing leaves only {60%}, the filtered confidence link is {60%}.
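The two-stage filtering of steps 504 to 507 may be sketched as follows. The description is not fully explicit about which second confidences are retained; this sketch assumes, in line with the worked example in which {60%, 40%} becomes {60%} under a second threshold of 60%, that entries not lower than the second threshold are kept. The function name and the choice to return positions together with values are assumptions.

```python
def filter_confidence_link(link, first_threshold, second_threshold):
    """Return (position, confidence) pairs that survive both thresholds."""
    # Stage 1: cut the link at the first confidence below the first threshold.
    cut = next(
        (i for i, c in enumerate(link) if c < first_threshold),
        len(link),
    )
    segment = link[:cut]
    # Stage 2 (assumed reading): keep entries not below the second threshold,
    # in traversal order.
    return [(i, c) for i, c in enumerate(segment) if c >= second_threshold]


link = [0.6, 0.4, 0.0, 0.0, 1.0, 0.0]
filtered = filter_confidence_link(link, first_threshold=0.01, second_threshold=0.6)
```

Keeping the position alongside each confidence makes it possible to map a surviving entry back to its field association unit, for example position 0 to {A, B}. The 100% entry for (E, F) is discarded because it lies after the cut.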

508. Extracting each field item set in the filtered confidence coefficient link;

In this embodiment, each field item set in the filtered confidence link is extracted according to the filtered confidence link. If the filtered confidence link is {60%}, the extracted field item set is {A, B}.

509. And selecting the field item set which is the same as the screened field item set according to the extracted field item sets, and establishing the joint index of the database by adopting the selected field item set.

In this embodiment, the field item sets whose field items are the same as those in the screened field item sets are selected from the extraction result, and a joint index is then established for the selected field item sets. For example, if the statistics over the item sets give the high-frequency item set [{A, B}] and the field set after confidence filtering is {A, B}, then field A and field B are both present in the combined result, so the joint index of the database is established on field A and field B.
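Steps 508 and 509 can be illustrated with the sketch below, which maps the surviving positions of the confidence link back to field pairs, keeps the pairs that are also frequent item sets, and renders a composite index statement. The table name query_table, the index naming scheme and the function names are hypothetical; the statement uses the common "CREATE INDEX name ON table (columns)" form.

```python
def select_index_fields(kept_positions, fields, frequent_item_sets):
    """Field pairs that survive confidence filtering and are also frequent."""
    extracted = [(fields[i], fields[i + 1]) for i in kept_positions]
    frequent = {tuple(item_set) for item_set in frequent_item_sets}
    return [pair for pair in extracted if pair in frequent]


def create_index_sql(table, columns):
    """Render a composite (joint) index statement for the given columns."""
    name = "idx_" + "_".join(column.lower() for column in columns)
    return f"CREATE INDEX {name} ON {table} ({', '.join(columns)});"


fields = list("ABCDEFG")
selected = select_index_fields([0], fields, [("A", "B")])
statements = [create_index_sql("query_table", pair) for pair in selected]
```

The column order in the generated statement follows the order of the field item set, since the column order of a composite index determines which query conditions it can serve.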

In the embodiment of the invention, the confidence links are traversed by adopting a first confidence threshold sequence, and a first confidence which is lower than the first confidence threshold and appears in the confidence links for the first time is determined; extracting a segmented confidence link before the first confidence in the confidence links; traversing the segmented confidence links by adopting a second confidence threshold sequence, and determining a second confidence which is lower than the second confidence threshold in the segmented confidence links; combining the determined second confidence degrees according to the traversal sequence to obtain a filtered confidence degree link; extracting each field item set in the filtered confidence coefficient link; and selecting the field item set which is the same as the screened field item set according to the extracted field item sets, and establishing the joint index of the database by adopting the selected field item set. Compared with the prior art, the method and the device have the advantages that the index fields are selected and established by utilizing the confidence links obtained through analysis and the field item sets obtained through screening, the joint index of the database can be established for the related field items frequently inquired by the user, and the acquisition of related data of the database can be accelerated.

The method for establishing a database joint index in the embodiment of the present invention is described above. Referring to fig. 6, an embodiment of the apparatus for establishing a database joint index in the embodiment of the present invention includes:

a log analysis module 601, configured to obtain a historical query log in a database, and analyze the historical query log to obtain at least two field item sets, where each field item set includes at least two condition fields arranged in sequence;

a calculating module 602, configured to calculate a support degree of each field item set with respect to all field item sets, and filter out, from all field item sets, a field item set with a support degree greater than a preset support degree threshold;

a link establishing module 603, configured to obtain multiple groups of sequentially arranged condition fields from all field item sets as field association units according to a preset number of field combinations, and establish corresponding confidence links according to each field association unit;

and an index establishing module 604, configured to filter the confidence links according to a preset confidence threshold, and establish a joint index of the database according to the filtered confidence links and the screened field item set.

Referring to fig. 7, another embodiment of the apparatus for establishing a database joint index according to the embodiment of the present invention includes:

Further, the log analysis module 601 includes:

a symbol identification unit 6011, configured to identify a separator in the historical query log, and perform statement splitting on the historical query log by using the separator to obtain at least two query statements;

an item set extracting unit 6012, configured to extract, according to a statement structure of each query statement, a field item set in each query statement.

Further, the calculating module 602 includes:

a field extraction unit 6021, configured to extract sequentially ordered condition fields in each field item set respectively and serve as field items;

a comparing unit 6022, configured to compare each field item with each field item set in sequence, and determine, according to a comparison result, a field item set in each field item set that includes the same field item;

a proportion calculation unit 6023, configured to calculate a proportion between the number of field item sets determined corresponding to each field item and the number of all field item sets respectively;

a support degree calculation unit 6024 configured to use the ratio calculated by each field item as a support degree of the corresponding field item set with respect to all field item sets.

Further, the link establishing module 603 includes:

a traversal unit 6031, configured to sequentially traverse each field item set by using each field association unit, and determine, according to a result of the traversal, a field item set in each field item set that includes the field association unit;

a ratio calculation unit 6032, configured to calculate a ratio between the number of field item sets determined by each of the field association units and the number of all field item sets;

and a permutation and combination unit 6033, configured to perform permutation and combination on the proportions calculated by each field association unit to obtain a corresponding confidence link.

Further, the index establishing module 604 includes:

a first traversal unit 6041, configured to sequentially traverse the confidence links by using the first confidence threshold, and determine a first confidence that occurs in the confidence links for the first time and is lower than the first confidence threshold;

a segment extraction unit 6042 configured to extract a segment confidence link that is before the first confidence in the confidence links;

a second traversing unit 6043, configured to sequentially traverse the segmented confidence links by using the second confidence threshold, and determine a second confidence that is lower than the second confidence threshold in the segmented confidence links;

and a combining unit 6044, configured to combine the determined second confidence degrees according to a traversal order, so as to obtain a filtered confidence degree link.

Further, the index establishing module 604 further includes:

an extracting unit 6045, configured to extract each field item set in the filtered confidence link;

an index establishing unit 6046, configured to select, according to each extracted field item set, a field item set that is the same as the screened field item set, and establish a joint index of the database by using the selected field item set.

In the embodiment of the invention, through the statistical analysis of the historical query logs of the database, the field item set obtained by the analysis is screened, the confidence coefficient link is established, and then the joint index of the corresponding field item is established by utilizing the confidence coefficient link and the screening result, thereby achieving the optimal query effect of the whole system.

Fig. 6 and fig. 7 describe the apparatus for establishing a database joint index in the embodiment of the present invention in detail from the perspective of modular functional entities; the device for establishing a database joint index in the embodiment of the present invention is described in detail below from the perspective of hardware processing.

Fig. 8 is a schematic structural diagram of a device for establishing a database joint index according to an embodiment of the present invention. The device 800 for establishing a database joint index may vary considerably depending on configuration or performance, and may include one or more processors (central processing units, CPUs) 810, a memory 820, and one or more storage media 830 (e.g., one or more mass storage devices) storing an application 833 or data 832. The memory 820 and the storage medium 830 may be transient or persistent storage. The program stored in the storage medium 830 may include one or more modules (not shown), each of which may include a series of instruction operations for the device 800 for establishing a database joint index. Further, the processor 810 may be configured to communicate with the storage medium 830 and to execute, on the device 800 for establishing a database joint index, the series of instruction operations in the storage medium 830.

The device 800 for establishing a database joint index may also include one or more power supplies 840, one or more wired or wireless network interfaces 850, one or more input-output interfaces 860, and/or one or more operating systems 831, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, etc. Those skilled in the art will appreciate that the device structure illustrated in fig. 8 does not limit the device for establishing a database joint index, which may include more or fewer components than illustrated, combine some components, or arrange the components differently.

The invention further provides a device for establishing a database joint index. The device comprises a memory and a processor; computer readable instructions are stored in the memory and, when executed by the processor, cause the processor to execute the steps of the method for establishing a database joint index in the above embodiments.

The present invention also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium, having stored therein instructions, which, when executed on a computer, cause the computer to perform the steps of the database joint index establishment method.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A method for establishing a database joint index is characterized by comprising the following steps:

acquiring a historical query log in a database, and analyzing the historical query log to obtain at least two field item sets, wherein each field item set comprises at least two condition fields which are sequentially arranged;

respectively calculating the support degree of each field item set relative to all field item sets, and screening out the field item sets with the support degree larger than a preset support degree threshold value from all the field item sets;

acquiring a plurality of groups of condition fields with sequential arrangement from all field item sets as field association units according to the number of preset field combinations, and constructing corresponding confidence links according to each field association unit;

and filtering the confidence coefficient links according to a preset confidence coefficient threshold value, and establishing a joint index of the database according to the filtered confidence coefficient links and the screened field item set.

2. The method for building a database joint index according to claim 1, wherein the parsing the historical query log to obtain at least two field item sets comprises:

identifying separators in the historical query logs, and performing statement splitting on the historical query logs by adopting the separators to obtain at least two query statements;

and extracting field item sets in the query sentences according to the sentence structure of the query sentences.

3. The method for establishing a database joint index according to claim 1, wherein the calculating the support degree of each field item set relative to all field item sets respectively comprises:

respectively extracting the sequentially arranged condition fields in each field item set and taking them as a field item;

comparing each field item with each field item set in sequence, and determining, according to the comparison result, the field item sets that contain the same field item;

respectively calculating, for each field item, the proportion of the number of field item sets so determined to the number of all field item sets;

and taking the proportion calculated for each field item as the support degree of the corresponding field item set relative to all field item sets.
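The support calculation of claim 3, together with the screening step of claim 1, can be sketched as follows. The sketch assumes that a field item set "contains" a field item when every condition field of the item appears in the set, regardless of position; the claim does not fix this matching rule, so it is an illustrative choice. The function names are hypothetical.

```python
def support_degrees(item_sets):
    """Map each distinct field item set to its support degree.

    Assumption: a field item set contains a field item when all condition
    fields of the item appear in the set (order is ignored here).
    """
    total = len(item_sets)
    supports = {}
    for item in item_sets:
        if item in supports:
            continue  # identical field items share one support degree
        needed = set(item)
        hits = sum(1 for other in item_sets if needed <= set(other))
        supports[item] = hits / total
    return supports


def screen_by_support(item_sets, min_support):
    """Keep the field item sets whose support degree exceeds the threshold."""
    supports = support_degrees(item_sets)
    return [item for item, support in supports.items() if support > min_support]
```

For example, with the four field item sets ("a", "b"), ("a", "b", "c"), ("a", "c") and ("a", "b"), the set ("a", "b") is contained in three of the four sets, so its support degree under this matching rule is 0.75.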

4. The method for establishing a database joint index according to claim 1, wherein the constructing a corresponding confidence link according to each of the field association units comprises:

traversing the field item sets with each field association unit in turn, and determining, according to the traversal result, the field item sets that contain the field association unit;

respectively calculating, for each field association unit, the proportion of the number of field item sets so determined to the number of all field item sets;

and arranging and combining the proportions calculated for the field association units to obtain the corresponding confidence link.
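One possible reading of claim 4 is sketched below. It assumes that a field association unit is a run of consecutive condition fields of the preset combination size, that a field item set contains a unit when the unit occurs in it as a consecutive run, and that the confidence link simply lists the units with their proportions in order of first appearance. None of these details is fixed by the claim, and the function names are hypothetical.

```python
def association_units(item_sets, size):
    """Collect distinct runs of `size` consecutive condition fields."""
    units = []
    for fields in item_sets:
        for i in range(len(fields) - size + 1):
            unit = fields[i:i + size]
            if unit not in units:
                units.append(unit)
    return units


def contains_unit(fields, unit):
    """True when `unit` occurs in `fields` as a consecutive, ordered run."""
    n = len(unit)
    return any(fields[i:i + n] == unit for i in range(len(fields) - n + 1))


def build_confidence_link(item_sets, size):
    """Return a list of (unit, proportion) pairs forming a confidence link.

    Assumption: the link keeps the units in order of first appearance.
    """
    total = len(item_sets)
    link = []
    for unit in association_units(item_sets, size):
        hits = sum(1 for fields in item_sets if contains_unit(fields, unit))
        link.append((unit, hits / total))
    return link
```

The field item sets are expected to be tuples, so that slices can be compared directly with the units.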

5. The method for establishing a database joint index according to claim 1, wherein the confidence threshold comprises a first confidence threshold and a second confidence threshold, and the filtering the confidence link according to a preset confidence threshold comprises:

traversing the confidence links in sequence using the first confidence threshold, and determining a first confidence, the first confidence being the first confidence appearing in the confidence links that is lower than the first confidence threshold;

extracting, from the confidence links, the segmented confidence links that precede the first confidence;

traversing the segmented confidence links in sequence using the second confidence threshold, and determining second confidences in the segmented confidence links that are lower than the second confidence threshold;

and combining the determined second confidences according to the traversal sequence to obtain a filtered confidence link.
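The two-stage filtering of claim 5 can be sketched by following the wording of the claim literally. The sketch assumes that a confidence link is a list of (unit, confidence) pairs, which is an illustrative representation only. The comparison directions ("lower than" in both stages) are taken directly from the claim text, and the function name is hypothetical.

```python
def filter_confidence_link(link, first_threshold, second_threshold):
    """Filter a confidence link with two thresholds, as worded in claim 5.

    Stage 1: cut the link at the first confidence below `first_threshold`
             and keep the segment that precedes it.
    Stage 2: within that segment, keep the confidences below
             `second_threshold`, in traversal order.
    """
    segment = []
    for unit, confidence in link:
        if confidence < first_threshold:
            break  # first confidence lower than the first threshold
        segment.append((unit, confidence))
    return [
        (unit, confidence)
        for unit, confidence in segment
        if confidence < second_threshold
    ]
```

If no confidence in the link falls below the first confidence threshold, this sketch treats the whole link as the segment, which is an assumption, since the claim does not address that case.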

6. The method for establishing a database joint index according to claim 5, wherein establishing the joint index of the database according to the filtered confidence links and the screened field item sets comprises:

extracting each field item set in the filtered confidence links;

and selecting, from the extracted field item sets, the field item sets that are the same as the screened field item sets, and establishing the joint index of the database using the selected field item sets.
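The selection step of claim 6 can be sketched as an ordered intersection followed by the generation of index statements. The sketch assumes that the filtered confidence link is a list of (field item set, confidence) pairs, that field names are plain SQL identifiers, and that the joint index is created with a standard CREATE INDEX statement. The table name and the index naming scheme are hypothetical.

```python
def select_index_field_sets(filtered_link, screened_item_sets):
    """Keep the field item sets of the link that also passed support screening."""
    screened = set(screened_item_sets)
    selected = []
    for fields, _confidence in filtered_link:
        if fields in screened and fields not in selected:
            selected.append(fields)
    return selected


def create_index_statements(table, field_sets):
    """Build one CREATE INDEX statement per selected field item set.

    Assumption: the column order of the joint index follows the order of
    the condition fields in the field item set.
    """
    return [
        f"CREATE INDEX idx_{table}_{'_'.join(fields)} ON {table} ({', '.join(fields)});"
        for fields in field_sets
    ]
```

For instance, if ("a", "b") appears both in the filtered confidence link and among the screened field item sets, the sketch yields one statement creating a joint index on columns a and b of the given table.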

7. An apparatus for establishing a database joint index, the apparatus comprising:

a log analysis module, configured to acquire a historical query log in a database and parse the historical query log to obtain at least two field item sets, wherein each field item set comprises at least two sequentially arranged condition fields;

a calculation module, configured to respectively calculate the support degree of each field item set relative to all field item sets, and to screen out, from all the field item sets, the field item sets whose support degree is larger than a preset support degree threshold;

a link construction module, configured to acquire, according to a preset field combination number, a plurality of groups of sequentially arranged condition fields from all the field item sets as field association units, and to construct corresponding confidence links according to each field association unit;

and an index establishing module, configured to filter the confidence links according to a preset confidence threshold, and to establish the joint index of the database according to the filtered confidence links and the screened field item sets.

8. The apparatus for establishing a database joint index according to claim 7, wherein the calculation module comprises:

a field extraction unit, configured to extract the sequentially arranged condition fields in each field item set as a field item;

a comparison unit, configured to compare each field item with each field item set in sequence, and to determine, according to the comparison result, the field item sets that contain the same field item;

a proportion calculation unit, configured to respectively calculate, for each field item, the proportion of the number of field item sets so determined to the number of all field item sets;

and a support degree calculation unit, configured to take the proportion calculated for each field item as the support degree of the corresponding field item set relative to all field item sets.

9. Equipment for establishing a database joint index, the equipment comprising: a memory and at least one processor, the memory having instructions stored therein;

the at least one processor invokes the instructions in the memory to cause the equipment to perform the steps of the method for establishing a database joint index according to any one of claims 1 to 6.

10. A computer-readable storage medium having stored thereon instructions which, when executed by a processor, perform the steps of the method for establishing a database joint index according to any one of claims 1 to 6.