CN107402920B - Method and device for determining correlation complexity of relational database table - Google Patents
- Publication number
- CN107402920B, CN201610329065.XA
- Authority
- CN
- China
- Prior art keywords
- association
- database table
- data structure
- marking
- complexity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a method and a device for determining the association complexity of relational database tables. The association complexity of an application system's tables is derived from the log files or configuration files of a relational-database-based application system, which provides quantitative support and an objective decision basis for optimization and quality management of the application system. The method comprises the following steps: acquiring a log file and/or a configuration file of an application system based on a relational database, and segmenting each record in the acquired file to obtain a plurality of words; screening the words in each record against a pre-stored table data dictionary to obtain a word segmentation sequence for each record; generating an association fingerprint for each word segmentation sequence, and generating an association graph data structure from the obtained association fingerprints; calculating the complexity of the association graph data structure, and taking that complexity as the association complexity of the relational database tables.
Description
Technical Field
The invention relates to the technical field of computers and software thereof, in particular to a method and a device for determining the correlation complexity of a relational database table.
Background
Application systems based on a relational database are generally developed with a three-layer architecture (data access layer, business logic layer and presentation layer) or a multi-layer architecture (in which the business logic layer is split into several sub-layers). The data access layer mainly carries the basic data access logic (insert, delete, update and query). In practice, the longer such an application system runs and the more features are added and bugs are fixed, the more SQL table associations accumulate, the more complex the data access layer becomes, and the harder the system is to maintain.
Only what can be measured can be managed: only when the association complexity of the SQL database is determined can deterioration in the maintainability of the data access layer be identified in time and a remedy be found. Cyclomatic complexity is a measure of code complexity. In software testing, cyclomatic complexity measures the complexity of a module's decision structure. Quantitatively it is the number of linearly independent paths, that is, the minimum number of paths that must be tested to reasonably guard against errors. A high cyclomatic complexity indicates that the program code may be of low quality and difficult to test and maintain, and experience shows that program defects correlate strongly with high cyclomatic complexity. Cyclomatic complexity is based on graph theory, and the general formula is V(G) = e - n + 2, where e is the number of edges in the control flow graph (each edge being a transfer of control between sequential parts of the code) and n is the number of nodes in the control flow graph, including the start point and the end point.
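As a minimal sketch of the formula V(G) = e - n + 2, the following Python snippet counts the edges and nodes of a small control flow graph. The graph is a hypothetical if/else construct chosen only for illustration; the function name and node labels are assumptions and do not come from the patent.

```python
def cyclomatic_complexity(edges):
    """Return V(G) = e - n + 2 for a connected graph given as (source, target) edges."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2


# Hypothetical control flow graph of a single if/else statement:
# the entry node branches to "then" or "else", and both rejoin at the exit node.
if_else_edges = [
    ("entry", "then"),
    ("entry", "else"),
    ("then", "exit"),
    ("else", "exit"),
]

print(cyclomatic_complexity(if_else_edges))  # 4 edges - 4 nodes + 2 = 2
```

The result of 2 matches the two independent paths through an if/else (one per branch).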
Although the existing complexity calculation method can be used for solving the problem of program complexity measurement, the existing complexity calculation method cannot be directly used for evaluating the table association complexity, cannot provide a quantitative result of the database table association complexity, and cannot provide objective decision basis for system optimization and quality management.
Disclosure of Invention
In view of the above, the present invention provides a method and a device for determining the association complexity of relational database tables. The association fingerprints of database tables are extracted from the log files or configuration files of a relational-database-based application system, and the table association complexity of the application system is measured with the cyclomatic complexity algorithm. This provides quantitative support for quality management of the application system, a quantitative result for the database table association complexity, and periodic reports on the database table association complexity, and it provides an objective decision basis for optimization and quality management of the application system.
To achieve the above objects, according to one aspect of the present invention, there is provided a method of determining the association complexity of a relational database table.
The method for determining the association complexity of a relational database table comprises the following steps: acquiring a log file and/or a configuration file of an application system based on a relational database, and segmenting each record in the acquired file to obtain a plurality of words; screening the words in each record according to a pre-stored table data dictionary to obtain a word segmentation sequence of each record, wherein the table data dictionary comprises database table names and preset grammar keywords; generating an association fingerprint corresponding to each word segmentation sequence, and generating an association graph data structure according to the obtained association fingerprints; and calculating the complexity of the association graph data structure, and taking the complexity of the association graph data structure as the association complexity of the relational database table.
Optionally, before the step of obtaining the log file and/or the configuration file in the system, the method further includes: acquiring a database table name in a database; and storing the database table name and preset grammar keywords according to a set format to obtain a table data dictionary.
Optionally, the step of screening the words in each record according to a pre-stored table data dictionary to obtain a word segmentation sequence of each record includes: comparing the words in each record with the words in the table data dictionary, and taking all the words of the record that appear in the table data dictionary as the word segmentation sequence of the record.
Optionally, the step of generating an associated fingerprint corresponding to each word segmentation sequence according to the word segmentation sequence includes: searching a grammar keyword for marking the beginning of association and a grammar keyword for marking the end of association in the word segmentation sequence; and generating the association fingerprint of the word segmentation sequence according to the database table name between each pair of grammar keywords for marking association start and grammar keywords for marking association end of the word segmentation sequence.
Optionally, the step of generating the association fingerprint of the word segmentation sequence according to the database table names between each pair of grammar keywords marking the association start and the association end includes: deduplicating the database table names between the first pair of grammar keywords marking the association start and the association end of the word segmentation sequence, and then recording the association relations between the deduplicated database table names in the order of the database table names, to obtain a sub-association fingerprint of the word segmentation sequence; comparing, in sequence, the database table names between the second pair of grammar keywords marking the association start and the association end with the database table names in the sub-association fingerprint, and, when a database table name does not coincide with any database table name in the sub-association fingerprint, recording the association relation between that database table name and the last database table name in the sub-association fingerprint; and processing the database table names between the remaining pairs of grammar keywords marking the association start and the association end in the order in which the pairs appear in the word segmentation sequence, thereby obtaining the association fingerprint of the word segmentation sequence.
Optionally, the step of generating an association graph data structure according to the obtained association fingerprints comprises: collecting the database table names in the association fingerprints of all the word segmentation sequences, deduplicating them, and marking the database table names remaining after deduplication as the vertices of the association graph data structure; and recording the association relations among the remaining database table names as the edges of the association graph data structure, thereby obtaining the association graph data structure.
Optionally, the step of calculating the complexity of the association graph data structure comprises: counting the number of vertices and edges included in the association graph data structure; and calculating the complexity of the association graph data structure according to the formula V(G) = e - n + 2, wherein e is the number of edges of the association graph data structure, n is the number of vertices of the association graph data structure, and V(G) is the complexity of the association graph data structure.
According to another aspect of the invention, an apparatus for determining the association complexity of a relational database table is provided.
The device for determining the association complexity of a relational database table comprises the following modules: an acquisition module, used for acquiring a log file and/or a configuration file of an application system based on a relational database, and segmenting each record in the acquired file to obtain a plurality of words; a screening module, used for screening the words in each record according to a pre-stored table data dictionary to obtain a word segmentation sequence of each record, wherein the table data dictionary comprises database table names and preset grammar keywords; a generation module, used for generating an association fingerprint corresponding to each word segmentation sequence and generating an association graph data structure according to the obtained association fingerprints; and a calculation module, used for calculating the complexity of the association graph data structure and taking the complexity of the association graph data structure as the association complexity of the relational database table.
Optionally, the device further comprises a data dictionary generating module, configured to obtain the database table names in a database, and then store the database table names and the preset grammar keywords in a set format to obtain the table data dictionary.
Optionally, the screening module is further configured to: compare the words in each record with the words in the table data dictionary, and take all the words of the record that appear in the table data dictionary as the word segmentation sequence of the record.
Optionally, the generating module is further configured to search the word segmentation sequence for the grammar keyword marking the association start and the grammar keyword marking the association end, and then generate the association fingerprint of the word segmentation sequence according to the database table names between each pair of grammar keywords marking the association start and the association end in the word segmentation sequence.
Optionally, the generating module is further configured to: deduplicate the database table names between the first pair of grammar keywords marking the association start and the association end of the word segmentation sequence, and then record the association relations between the deduplicated database table names in the order of the database table names, to obtain a sub-association fingerprint of the word segmentation sequence; compare, in sequence, the database table names between the second pair of grammar keywords marking the association start and the association end with the database table names in the sub-association fingerprint, and, when a database table name does not coincide with any database table name in the sub-association fingerprint, record the association relation between that database table name and the last database table name in the sub-association fingerprint; and process the database table names between the remaining pairs of grammar keywords marking the association start and the association end in the order in which the pairs appear in the word segmentation sequence, thereby obtaining the association fingerprint of the word segmentation sequence.
Optionally, the generating module is further configured to collect the database table names in the association fingerprints of all the word segmentation sequences, deduplicate them, and mark the database table names remaining after deduplication as the vertices of the association graph data structure; and to record the association relations among the remaining database table names as the edges of the association graph data structure, thereby obtaining the association graph data structure.
Optionally, the calculation module is further configured to: count the number of vertices and edges included in the association graph data structure; and calculate the complexity of the association graph data structure according to the formula V(G) = e - n + 2, wherein e is the number of edges of the association graph data structure, n is the number of vertices of the association graph data structure, and V(G) is the complexity of the association graph data structure.
According to still another aspect of embodiments of the present invention, there is provided an electronic apparatus including: one or more processors; and the storage device is used for storing one or more programs, and when the one or more programs are executed by the one or more processors, the one or more processors realize the method for determining the association complexity of the relational database table provided by the invention.
According to a further aspect of the embodiments of the present invention, there is provided a computer readable medium, on which a computer program is stored, which when executed by a processor, implements the method for determining the association complexity of a relational database table provided by the present invention.
According to the technical scheme of the invention, the association fingerprints of database tables can be extracted from the log files or configuration files of a relational-database-based application system, and the table association complexity of the application system is measured with the cyclomatic complexity algorithm. Quantitative support can therefore be provided for quality management of the application system, together with a quantitative result for the database table association complexity and periodic reports on the database table association complexity, and an objective decision basis is provided for optimization and quality management of the application system.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of an apparatus for determining the association complexity of a relational database table according to an embodiment of the invention;
FIG. 2 is a schematic diagram of a method for determining the association complexity of a relational database table according to an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
FIG. 1 is a schematic diagram of an apparatus for determining the association complexity of a relational database table according to an embodiment of the invention. As shown in FIG. 1, the apparatus 10 for determining the association complexity of a relational database table according to an embodiment of the present invention mainly includes an obtaining module 11, a screening module 12, a generating module 13, and a calculating module 14. The obtaining module 11 is configured to obtain a log file and/or a configuration file of an application system based on a relational database, and perform word segmentation on each record in the obtained file to obtain a plurality of words; the screening module 12 is configured to screen the words in each record according to a table data dictionary saved in advance to obtain a word segmentation sequence of each record, wherein the table data dictionary comprises database table names and preset syntax keywords; the generating module 13 is configured to generate an association fingerprint corresponding to each word segmentation sequence, and then generate an association graph data structure according to the obtained association fingerprints; the calculating module 14 is configured to calculate the complexity of the association graph data structure, and use that complexity as the association complexity of the relational database table. The preset syntax keywords may be SQL syntax keywords.
The apparatus 10 for determining the correlation complexity of the relational database table according to the embodiment of the present invention may further include a data dictionary generating module (not shown in the figure) configured to obtain a database table name in the database, and then store the database table name and a preset SQL syntax keyword according to a set format to obtain a table data dictionary.
The screening module 12 of the apparatus 10 for determining the correlation complexity of the relational database table according to the embodiment of the present invention may be further configured to: comparing the words in each record with the words in the table data dictionary, and if the words exist in the table data dictionary, keeping the words; otherwise, deleting the word; and then taking the remaining words in the record as the word segmentation sequence of the record.
The generating module 13 of the apparatus 10 for determining the correlation complexity of the relational database table according to the embodiment of the present invention may also be configured to search for an SQL syntax keyword for indicating the start of correlation and an SQL syntax keyword for indicating the end of correlation in the word segmentation sequence, and then generate an association fingerprint of the word segmentation sequence according to a database table name between each pair of the SQL syntax keyword for indicating the start of correlation and the SQL syntax keyword for indicating the end of correlation in the word segmentation sequence.
The generating module 13 of the apparatus 10 for determining the association complexity of the relational database table according to the embodiment of the present invention may be further configured to: deduplicate the database table names between the first pair of SQL syntax keywords marking the association start and the association end of a word segmentation sequence, and then record the association relations between the deduplicated database table names in the order of the database table names, to obtain a sub-association fingerprint of the word segmentation sequence; compare, in sequence, the database table names between the second pair of SQL syntax keywords marking the association start and the association end with the database table names in the sub-association fingerprint, and, when a database table name does not coincide with any database table name in the sub-association fingerprint, record the association relation between that database table name and the last database table name in the sub-association fingerprint; and process the database table names between the remaining pairs of SQL syntax keywords marking the association start and the association end in the order in which the pairs appear in the word segmentation sequence, thereby obtaining the association fingerprint of the word segmentation sequence.
The generating module 13 of the apparatus 10 for determining the association complexity of the relational database table according to the embodiment of the present invention may be further configured to collect the database table names in the association fingerprints of all the word segmentation sequences, deduplicate them, and mark the database table names remaining after deduplication as the vertices of an association graph data structure; and to record the association relations among the remaining database table names as the edges of the association graph data structure, thereby obtaining the association graph data structure.
The calculating module 14 of the apparatus 10 for determining the association complexity of the relational database table according to the embodiment of the present invention may be further configured to: count the number of vertices and edges included in the association graph data structure; and calculate the complexity of the association graph data structure according to the formula V(G) = e - n + 2, wherein e is the number of edges of the association graph data structure, n is the number of vertices of the association graph data structure, and V(G) is the complexity of the association graph data structure.
FIG. 2 is a schematic diagram of a method for determining the association complexity of a relational database table according to an embodiment of the invention. As shown in FIG. 2, the execution subject of the method for determining the association complexity of the relational database table according to the embodiment of the present invention may be the apparatus 10 for determining the association complexity of the relational database table mentioned in FIG. 1, and the method mainly includes the following steps S20 to S23.
Step S20: obtaining a log file and/or a configuration file of an application system based on a relational database, and performing word segmentation on each record in the obtained file to obtain a plurality of words. The log files referred to in this step include system logs, SQL log files, and other text files containing standard SQL (ANSI SQL92) statements; the configuration files include XML files in which SQL is configured, program source code and the like. After the log file and/or the configuration file is obtained, the apparatus 10 for determining the association complexity of the relational database table performs word segmentation on each record in the file to obtain the plurality of words included in each record.
Before step S20, the apparatus 10 for determining the association complexity of the relational database table generates a table data dictionary from the data in the database. That is, the apparatus 10 first obtains the database table names in the relational database, and then stores the database table names and the preset SQL syntax keywords in a set format to obtain the table data dictionary. The table data dictionary comprises two parts, namely table metadata words and SQL syntax keywords. The table metadata words can be entered manually or fetched from the relational database through a JDBC interface, and include at least table names and view names. The SQL syntax keywords include at least from and where; the keyword from can be set to mark the beginning of a table association, and the keyword where can be set to mark the end of a table association. The relational databases referred to here are databases that support the ANSI SQL92 standard, such as Oracle, DB2, Informix and MySQL. If the database table names in the relational database are updated, for example when a new table is added, the apparatus 10 retrieves the database table names in the relational database again to update the table data dictionary.
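The construction of the table data dictionary can be sketched roughly as follows. This is only an illustration under stated assumptions: the patent fetches table and view names through JDBC from databases such as Oracle or MySQL, whereas this sketch swaps in Python's built-in sqlite3 module and SQLite's sqlite_master catalog as a stand-in. The dictionary layout, the function name and the V_1 view are hypothetical.

```python
import sqlite3

# Assumed layout: the keyword "from" marks the start of a table association
# and "where" marks its end, as described in the text.
SYNTAX_KEYWORDS = {"start": "from", "end": "where"}


def build_table_data_dictionary(connection):
    """Collect table and view names and pair them with the preset SQL keywords."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type IN ('table', 'view')"
    ).fetchall()
    return {
        "table_metadata": sorted(name for (name,) in rows),
        "syntax_keywords": dict(SYNTAX_KEYWORDS),
    }


# Stand-in database with one table and one view (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_1 (code TEXT)")
conn.execute("CREATE VIEW V_1 AS SELECT code FROM T_1")

dictionary = build_table_data_dictionary(conn)
print(dictionary["table_metadata"])
```

Calling build_table_data_dictionary again after the schema changes would refresh the dictionary, which corresponds to the update case described above.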
Step S21: and screening the words in each record according to a pre-stored table data dictionary to obtain a word segmentation sequence of each record. In this step, the apparatus 10 for determining the correlation complexity of the relational database table filters the words in each record obtained in step S20 according to the table data dictionary stored in advance, that is, compares the words in each record with the words in the table data dictionary, and if the words exist in the table data dictionary, the words are retained; otherwise, deleting the word; and then taking the remaining words in the record as the word segmentation sequence of the record.
Step S22: and generating an association fingerprint corresponding to each word segmentation sequence according to each word segmentation sequence, and generating an association diagram data structure according to the obtained association fingerprint. In this step, the device 10 for determining the correlation complexity of the relational database table searches for the SQL syntax keyword for marking the beginning of the correlation and the SQL syntax keyword for marking the end of the correlation in the word segmentation sequence; for example, a syntax keyword from is set for indicating the start of table association, and a syntax keyword where is set for indicating the end of table association;
The following description uses two specific word segmentation sequences, assuming that the system log file contains two SQL log records:
INFO-2016-02-23 13:44:33.094 com.demo.Callback::DelEvent – select name,code from T_1,T_2 as f where T_1.code=f.code and exists(select 1 from T_3,T_1 as g where T_3.type=f.code and T_3.flag=g.flag) execute time:257ms;
INFO-2016-02-23 13:45:08.432 com.demo.Callback::DelEvent – select count(*) from T_2,T_4 where T_2.group=T_4.groupcode execute time:549ms.
The word segmentation sequences obtained after the two log records are screened with the table data dictionary are as follows:
Word segmentation sequence 1: [from, T_1, T_2, where, T_1, from, T_3, T_1, where, T_3, T_3];
Word segmentation sequence 2: [from, T_2, T_4, where, T_2, T_4].
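Word segmentation and screening for the two log records can be sketched as follows. The patent does not specify a tokenizer, so the regular expression (which extracts identifier-like words and therefore splits T_1.code into T_1 and code) is an assumption, as are the function names and the flat set used for the table data dictionary.

```python
import re

# Assumed table data dictionary: the two SQL keywords plus the four table names.
TABLE_DATA_DICTIONARY = {"from", "where", "T_1", "T_2", "T_3", "T_4"}

RECORD_1 = (
    "INFO-2016-02-23 13:44:33.094 com.demo.Callback::DelEvent - "
    "select name,code from T_1,T_2 as f where T_1.code=f.code and "
    "exists(select 1 from T_3,T_1 as g where T_3.type=f.code and "
    "T_3.flag=g.flag) execute time:257ms"
)
RECORD_2 = (
    "INFO-2016-02-23 13:45:08.432 com.demo.Callback::DelEvent - "
    "select count(*) from T_2,T_4 where T_2.group=T_4.groupcode "
    "execute time:549ms"
)


def segment(record):
    """Split a record into identifier-like words (assumed tokenization rule)."""
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", record)


def screen(words, dictionary):
    """Keep only the words found in the table data dictionary, preserving order."""
    return [word for word in words if word in dictionary]


sequence_1 = screen(segment(RECORD_1), TABLE_DATA_DICTIONARY)
sequence_2 = screen(segment(RECORD_2), TABLE_DATA_DICTIONARY)
print(sequence_1)
print(sequence_2)
```

With this tokenization the two printed lists correspond to word segmentation sequences 1 and 2. The comparison is case-sensitive here; logs with upper-case SQL keywords would need normalization first.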
Firstly, the apparatus 10 for determining the association complexity of the relational database table searches for the SQL syntax keywords from and where in the word segmentation sequence, and thereby determines the database table names between each pair of an SQL syntax keyword marking the beginning of an association and an SQL syntax keyword marking the end of an association. Sequence 1 contains two such pairs: the database table names between the first pair are T_1 and T_2, and the database table names between the second pair are T_3 and T_1. Sequence 2 contains one such pair, and the database table names between its keywords are T_2 and T_4.
Secondly, the apparatus 10 for determining the association complexity of the relational database table generates the association fingerprint of each word segmentation sequence according to the database table names between each pair of SQL syntax keywords marking the beginning and the end of an association in that sequence.
For word segmentation sequence 1, the database table names between the first pair of SQL syntax keywords marking the beginning and the end of an association are deduplicated. Because the first pair encloses T_1 and T_2 and no database table name is repeated, only the association relation between the database table names needs to be recorded in the order of the database table names; that is, the association relation between T_1 and T_2 is recorded, which gives the sub-association fingerprint of word segmentation sequence 1. The database table names between the second pair of keywords, T_3 and T_1, are then compared in sequence with the database table names in the sub-association fingerprint. T_3 does not coincide with T_1 or T_2, so the association relation between T_3 and the last database table name in the sub-association fingerprint (namely T_2) is recorded. T_1 coincides with the T_1 already in the sub-association fingerprint, so it is not processed. All the database table names enclosed by keyword pairs in word segmentation sequence 1 have now been processed, and the association fingerprint of word segmentation sequence 1 is obtained.
For word segmentation sequence 2, the database table names between the SQL grammar keyword marking the association start and the SQL grammar keyword marking the association end are deduplicated. The first pair of grammar keywords encloses T_2 and T_4, and no database table name is repeated, so the association relation only needs to be recorded in the order in which the names appear; that is, the association relation between T_2 and T_4 is recorded, which yields the association fingerprint of word segmentation sequence 2.
If a word segmentation sequence includes more than two pairs of SQL (Structured Query Language) grammar keywords marking association start and association end, the database table names between each remaining pair are processed pair by pair, in the order in which the pairs appear, thereby obtaining the association fingerprint of the word segmentation sequence.
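The fingerprint procedure described above can be sketched in Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: the tokens `<START>` and `<END>` are hypothetical placeholders for the SQL grammar keywords that mark association start and end, the function name `association_fingerprint` is invented for illustration, and the sketch assumes that a newly recorded table name becomes the new "last" name of the sub-association fingerprint (the text does not state this explicitly).

```python
START, END = "<START>", "<END>"  # hypothetical placeholders for the start/end keywords


def association_fingerprint(tokens, start=START, end=END):
    """Return (names, edges) for one word segmentation sequence."""
    # Gather the table names enclosed by each start/end keyword pair, in order.
    groups, current = [], None
    for token in tokens:
        if token == start:
            current = []
        elif token == end:
            if current is not None:
                groups.append(current)
            current = None
        elif current is not None:
            current.append(token)

    names, edges = [], []
    for group in groups:
        for name in group:
            if name in names:
                continue  # coincides with a recorded name: no processing
            if names:
                edges.append((names[-1], name))  # relate to the last recorded name
            names.append(name)  # assumption: the new name becomes the last one
    return names, edges


# Word segmentation sequences 1 and 2 of the embodiment, with placeholder keywords.
seq1 = [START, "T_1", "T_2", END, START, "T_3", "T_1", END]
seq2 = [START, "T_2", "T_4", END]
print(association_fingerprint(seq1))
print(association_fingerprint(seq2))
```

Under the stated assumption, deduplicating the first pair and comparing later pairs against the sub-association fingerprint reduce to the same rule, which is why the sketch uses a single loop for all pairs.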
Finally, the apparatus 10 for determining the association complexity of the relational database table obtains the association graph data structure from the association fingerprints of all the word segmentation sequences. In this embodiment, the database table names in the association fingerprints of word segmentation sequence 1 and word segmentation sequence 2 are collected and deduplicated; the database table names remaining after deduplication (namely T_1, T_2, T_3 and T_4) are recorded as the vertices of the association graph data structure; and the association relations between the remaining database table names are recorded as its edges (that is, the association relations between T_1 and T_2, between T_2 and T_3, and between T_2 and T_4 are each recorded as one edge), thereby obtaining the association graph data structure.
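As a rough illustration of this merging step, the following Python sketch combines the two association fingerprints of the embodiment into vertices and edges. The function name `build_association_graph` and the `(names, edges)` fingerprint representation are hypothetical. The sketch also assumes that edges are undirected and that a relation appearing in several fingerprints is counted once; the text only states deduplication for table names.

```python
def build_association_graph(fingerprints):
    """Merge (names, edges) fingerprints into one vertex list and one edge list."""
    vertices, edges = [], []
    for names, pairs in fingerprints:
        for name in names:
            if name not in vertices:  # deduplicate table names
                vertices.append(name)
        for a, b in pairs:
            edge = frozenset((a, b))  # assumption: undirected edge
            if edge not in edges:     # assumption: repeated relations count once
                edges.append(edge)
    return vertices, edges


# Association fingerprints of word segmentation sequences 1 and 2.
fingerprints = [
    (["T_1", "T_2", "T_3"], [("T_1", "T_2"), ("T_2", "T_3")]),
    (["T_2", "T_4"], [("T_2", "T_4")]),
]
vertices, edges = build_association_graph(fingerprints)
print(vertices)    # the four table names T_1 .. T_4
print(len(edges))  # three edges, as in the embodiment
```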
Step S23: calculate the complexity of the association graph data structure, and take the complexity of the association graph data structure as the association complexity of the relational database table. In this step, the apparatus 10 for determining the association complexity of the relational database table counts the number of vertices and edges included in the association graph data structure obtained in step S22 (in this embodiment, the association graph data structure includes 4 vertices and 3 edges), and calculates the complexity according to the formula V(G) = e - n + 2, which gives V(G) = 3 - 4 + 2 = 1 for this embodiment; wherein e is the number of edges of the association graph data structure, n is the number of vertices of the association graph data structure, and V(G) is the complexity of the association graph data structure. Thus, by collecting logs and configuration files, the technical solution of the embodiment of the invention obtains a measure of the table association complexity of a running system based on a relational database.
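The calculation in step S23 is simple arithmetic and can be checked with a short Python sketch. The function name `association_complexity` is hypothetical, and the second call uses an invented graph (one extra edge added to the embodiment's graph) purely to show how the value changes.

```python
def association_complexity(num_edges, num_vertices):
    """Complexity V(G) = e - n + 2 of an association graph data structure."""
    return num_edges - num_vertices + 2


# Embodiment: 3 edges and 4 vertices.
print(association_complexity(3, 4))  # 3 - 4 + 2 = 1

# Hypothetical variation: one more edge among the same 4 vertices closes a cycle.
print(association_complexity(4, 4))  # 4 - 4 + 2 = 2
```

The formula has the same form as the cyclomatic number of a graph with a single connected component, so each additional edge among the same tables raises the value by one.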
According to the technical solution of the embodiment of the invention, the association fingerprints of database tables can be extracted from the logs or configuration files of an application system based on a relational database, and the table association complexity of the application system is measured with the cyclomatic complexity algorithm. This provides quantitative support for quality management of the application system, a quantitative result for the association complexity of the database tables, and periodic reports on that complexity, and therefore an objective basis for decisions on the optimization and quality management of the application system.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (14)
1. A method of determining the association complexity of a relational database table, comprising:
acquiring a log file and/or a configuration file of an application system based on a relational database, and segmenting each record in the acquired file to obtain a plurality of words;
screening the words in each record according to a pre-stored table data dictionary to obtain a word segmentation sequence of each record; the table data dictionary comprises database table names and preset grammar keywords;
generating an association fingerprint corresponding to each word segmentation sequence according to that word segmentation sequence, and generating an association graph data structure according to the obtained association fingerprints; the association fingerprint is the association relation between the database table names located between a grammar keyword for marking association start and a grammar keyword for marking association end, the database table names are recorded as vertices of the association graph data structure, and the association relations between the database table names are recorded as edges of the association graph data structure;
and calculating the complexity of the association graph data structure, and taking the complexity of the association graph data structure as the association complexity of the relational database table.
2. The method of claim 1, wherein the step of obtaining log files and/or configuration files in the system is preceded by the step of:
acquiring a database table name in a database;
and storing the database table name and preset grammar keywords according to a set format to obtain a table data dictionary.
3. The method of claim 1, wherein the step of filtering words in each of the records according to a pre-stored table data dictionary to obtain a word segmentation sequence for each record comprises:
and comparing the words in each record with the words in the table data dictionary, and taking all the words of the record that are in the table data dictionary as the word segmentation sequence of the record.
4. The method of claim 1, wherein generating an association fingerprint corresponding to each word segmentation sequence according to the word segmentation sequence comprises:
searching a grammar keyword for marking the beginning of association and a grammar keyword for marking the end of association in the word segmentation sequence;
and generating the association fingerprint of the word segmentation sequence according to the database table name between each pair of grammar keywords for marking association start and grammar keywords for marking association end of the word segmentation sequence.
5. The method of claim 4, wherein the step of generating the association fingerprint of the word segmentation sequence according to the database table names between each pair of a grammar keyword for marking association start and a grammar keyword for marking association end comprises:
removing duplication of database table names included between a first pair of grammar keywords for marking association start and grammar keywords for marking association end of a word segmentation sequence, and then recording association relations between the database table names after duplication removal according to the sequence of the database table names to obtain sub-association fingerprints of the word segmentation sequence;
comparing, in order, the database table names included between a second pair of grammar keywords for marking association start and association end of the word segmentation sequence with the database table names in the sub-association fingerprint, and, where a database table name does not coincide with any database table name in the sub-association fingerprint, recording the association relation between that database table name and the last database table name in the sub-association fingerprint;
and processing the database table names included between the remaining pairs of grammar keywords for marking association start and association end in the word segmentation sequence, in the order of those pairs, thereby obtaining the association fingerprint of the word segmentation sequence.
6. The method according to any one of claims 1 to 5, wherein the step of generating an association graph data structure according to the obtained association fingerprints comprises:
counting the database table names in the association fingerprints of all the word segmentation sequences, deduplicating the database table names, and recording the database table names remaining after deduplication as vertices of the association graph data structure; and recording the association relations between the remaining database table names as edges of the association graph data structure, thereby obtaining the association graph data structure.
7. The method according to any one of claims 1 to 5, wherein the step of calculating the complexity of the association graph data structure comprises:
counting the number of vertices and edges included in the association graph data structure;
calculating the complexity of the association graph data structure according to the formula V(G) = e - n + 2; wherein e is the number of edges of the association graph data structure; n is the number of vertices of the association graph data structure; and V(G) is the complexity of the association graph data structure.
8. An apparatus for determining the association complexity of a relational database table, comprising:
the acquisition module is used for acquiring a log file and/or a configuration file of an application system based on a relational database, and segmenting each record in the acquired file to obtain a plurality of words;
the screening module is used for screening the words in each record according to a pre-stored table data dictionary to obtain a word segmentation sequence of each record; the table data dictionary comprises database table names and preset grammar keywords;
the generation module is used for generating an association fingerprint corresponding to each word segmentation sequence according to that word segmentation sequence, and generating an association graph data structure according to the obtained association fingerprints; the association fingerprint is the association relation between the database table names located between a grammar keyword for marking association start and a grammar keyword for marking association end, the database table names are recorded as vertices of the association graph data structure, and the association relations between the database table names are recorded as edges of the association graph data structure;
and the calculation module is used for calculating the complexity of the association graph data structure and taking the complexity of the association graph data structure as the association complexity of the relational database table.
9. The apparatus of claim 8, further comprising a data dictionary generating module, configured to obtain a database table name in the database, and then store the database table name and a preset syntax keyword according to a set format to obtain a table data dictionary.
10. The apparatus of claim 8, wherein the screening module is further configured to: compare the words in each record with the words in the table data dictionary, and take all the words of the record that are in the table data dictionary as the word segmentation sequence of the record.
11. The apparatus of claim 8, wherein the generating module is further configured to find a syntax keyword indicating a beginning of association and a syntax keyword indicating an end of association in the segmentation sequence, and then generate the association fingerprint of the segmentation sequence according to a database table name between each pair of the syntax keyword indicating the beginning of association and the syntax keyword indicating the end of association in the segmentation sequence.
12. The apparatus of claim 11, wherein the generating module is further configured to:
removing duplication of database table names included between a first pair of grammar keywords for marking association start and grammar keywords for marking association end of a word segmentation sequence, and then recording association relations between the database table names after duplication removal according to the sequence of the database table names to obtain sub-association fingerprints of the word segmentation sequence;
comparing, in order, the database table names included between a second pair of grammar keywords for marking association start and association end of the word segmentation sequence with the database table names in the sub-association fingerprint, and, where a database table name does not coincide with any database table name in the sub-association fingerprint, recording the association relation between that database table name and the last database table name in the sub-association fingerprint;
and processing the database table names included between the remaining pairs of grammar keywords for marking association start and association end in the word segmentation sequence, in the order of those pairs, thereby obtaining the association fingerprint of the word segmentation sequence.
13. The apparatus according to any one of claims 8 to 12, wherein the generation module is further configured to count the database table names in the association fingerprints of all the word segmentation sequences, deduplicate the database table names, and record the database table names remaining after deduplication as vertices of the association graph data structure; and to record the association relations between the remaining database table names as edges of the association graph data structure, thereby obtaining the association graph data structure.
14. The apparatus according to any one of claims 8 to 12, wherein the calculation module is further configured for:
counting the number of vertices and edges included in the association graph data structure;
calculating the complexity of the association graph data structure according to the formula V(G) = e - n + 2; wherein e is the number of edges of the association graph data structure; n is the number of vertices of the association graph data structure; and V(G) is the complexity of the association graph data structure.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610329065.XA CN107402920B (en) | 2016-05-18 | 2016-05-18 | Method and device for determining correlation complexity of relational database table |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107402920A (en) | 2017-11-28 |
CN107402920B (en) | 2020-02-07 |
Family ID: 60394012
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610329065.XA Active CN107402920B (en) | 2016-05-18 | 2016-05-18 | Method and device for determining correlation complexity of relational database table |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107402920B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108108441A (en) * | 2017-12-21 | 2018-06-01 | 新博卓畅技术(北京)有限公司 | A kind of database table structure analysis method and system |
CN109325019B (en) * | 2018-08-17 | 2022-02-08 | 国家电网有限公司客户服务中心 | Data association relationship network construction method |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101145162A (en) * | 2007-10-31 | 2008-03-19 | 金蝶软件(中国)有限公司 | Data base dynamic inquiry method and system |
CN102289482A (en) * | 2011-08-02 | 2011-12-21 | 北京航空航天大学 | Unstructured data query method |
US8166074B2 (en) * | 2005-11-14 | 2012-04-24 | Pettovello Primo M | Index data structure for a peer-to-peer network |
CN103593469A (en) * | 2013-11-30 | 2014-02-19 | 合一网络技术(北京)有限公司 | Method and device for calculating associated keywords through complementary information |
CN104021198A (en) * | 2014-06-16 | 2014-09-03 | 北京理工大学 | Relational database information retrieval method and device based on ontology semantic index |
CN104424269A (en) * | 2013-08-30 | 2015-03-18 | 中国电信股份有限公司 | Data linage analysis method and device |
Also Published As
Publication number | Publication date |
---|---|
CN107402920A (en) | 2017-11-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||