CN113010611A - Method and system for automatically generating relations between relational database tables - Google Patents

Method and system for automatically generating relations between relational database tables Download PDF

Info

Publication number
CN113010611A
CN113010611A CN201911314469.1A CN201911314469A CN113010611A CN 113010611 A CN113010611 A CN 113010611A CN 201911314469 A CN201911314469 A CN 201911314469A CN 113010611 A CN113010611 A CN 113010611A
Authority
CN
China
Prior art keywords
tables
field
inter
relationship
dimension
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911314469.1A
Other languages
Chinese (zh)
Inventor
余宙
杨永智
刘福博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Abbott Technology Co Ltd
Original Assignee
Beijing Abbott Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Abbott Technology Co Ltd filed Critical Beijing Abbott Technology Co Ltd
Priority to CN201911314469.1A priority Critical patent/CN113010611A/en
Publication of CN113010611A publication Critical patent/CN113010611A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method and a system for automatically generating relations among relational database tables. Necessary information read from each table in the relational database is analyzed and calculated to generate inter-table relation suggestions, and a user selects a final inter-table relation and a mode thereof from the suggested inter-table relations; the intermediate information used or generated in the analysis and calculation process, the inter-table relationship recommended to the user by the inter-table relationship analysis and generation device, the model and the like are stored in the storage device in the form of a data table; the structure of the data table is preset and stored in the storage device; the inter-table relationship analysis information table, the table association relationship suggestion table, and the model suggestion table are displayed in a display device such as a display in a form that can be visually recognized by a user. The method does not need to rely on any artificial rules, artificial judgment and predefined description, does not need to process the data table in advance, can analyze all tables and all fields, and improves the accuracy of judging the relationship between the tables.

Description

Method and system for automatically generating relations between relational database tables
Technical Field
The invention belongs to the technical field of business intelligence, and particularly relates to a method and a system for automatically generating relations among relational database tables.
Background
In Business Intelligence (BI) software, data warehouse, ETL and other software, correlation between tables is needed for modeling or data correlation analysis. In the traditional technical scheme, the relation between relational database tables is established mainly by means of field naming specification, manual judgment, PDM file compiling and the like. The traditional technology is seriously dependent on manual work, and if the naming is not standard, personnel cannot know the data structure and the relation, and the relation between tables is very difficult to establish under the conditions of lack of PDM files or description documents and the like. Especially, when the number of tables is large and the relationship is complex, a large amount of manpower is required to be invested for judgment, and even then, the problems of correlation error, inexhaustibility or incomplete coverage and the like still exist.
The invention patent application with the application number of 201711396232.3 provides a database table structure analysis method and a system, wherein html files are exported from a database, the html files are analyzed, and random sample data with proper quantity are obtained from the files; and obtaining a possible association relation by using a data association analysis algorithm according to the sample data. According to the technical scheme, a certain number of samples need to be extracted from the database, and the generated html file is adopted to carry out the relation analysis among the tables. The disadvantages are that: (1) because the sample is extracted instead of analyzing all tables of the database, the generated relationships between the tables may not be accurate due to incomplete samples, for example, in a data sampling mode, the number of dimensional values may affect the relationship judgment between two fields in the tables, and an inaccurate or even wrong conclusion may be obtained; if the data is completely exported, the file is not easy to export when the data size is large; (2) the html file is required to be generated firstly and then analyzed, and the processing flow is complex and inconvenient to operate.
The invention patent application No. 201810972829.6 proposes a table relation analysis support device and a table relation analysis support method. The table relation analysis support device is preferably configured to analyze a relation of data in a specific column between tables of a relational database, and includes: the table-to-table relationship analysis information defines a table in which the relationship of data of the table is in a one-to-one relationship or a many-to-one relationship as a table group; a one-to-one relationship convolution section that outputs inter-table relationship analysis information that summarizes the tables in the one-to-one relationship into a table group, based on the inter-table relationship information; and a many-to-one relationship convolution unit that updates the inter-table relationship analysis information by grouping tables having many-to-one relationships into a table group. The technical scheme has the following disadvantages: the method is only suitable for analyzing the relation of data in specific columns among tables, and the table types are classified into one-to-one relation and many-to-one relation, then the specific columns are analyzed for the relation among the tables respectively, and finally the specific columns are presented in a table group form.
Disclosure of Invention
The invention provides a method and a system for automatically generating the relationship between relational database tables, when the association relationship between tables is established in a business intelligence and data warehouse system, all tables and all fields in the relational database can be directly analyzed without depending on any artificial rules, artificial judgment, predefined description files and the like and processing the data tables in advance, so that the accuracy of judging the relationship between the tables is improved, and the association relationship between the table fields is automatically judged.
In order to solve the above technical problem, the present invention provides a method for automatically generating relationships between relational database tables, comprising:
for all tables of the database, acquiring fields of each table and a dimension value c _ ct _ d of each field;
for all tables of the database, acquiring the data line number T _ ct of each table;
for each table, calculating a value of an intermediate parameter V _ MK according to the field dimension value and the table row number, wherein V _ MK equals c _ ct _ d/T _ ct; when V _ MK obtained by calculation according to the dimension value c _ ct _ d of a certain field is 1, taking the column where the field is located as a main key of a table where the field is located; thereby obtaining a primary key for each table;
for each table, comparing the main key of the table with the dimensional values of all fields of the table and other tables, and calculating to obtain a similarity coefficient J _ AB of the dimensional values among columns; the key structure comprises a main key A field, a main key B field and a main key B field, wherein J _ AB is CTD _ A/CTD _ AB, CTD _ A is a dimension value of the main key A field after duplication is removed, and CTD _ AB is a dimension value of intersection of the field A after duplication removal and any field B in the table or other tables; when J _ AB is more than or equal to 0.97 and less than or equal to 1, the association relationship between the columns of the field A and the field B is one-to-many or one-to-one;
for each table, if there is a case where V _ MK ═ 1, determining the type of the table as a dimension table; for each table, if there is a case of 0.97 ≦ J _ AB ≦ 1, the type of the table is determined as a fact table; for each table, if there is a case where V _ MK ≦ 1 and 0.97 ≦ J _ AB ≦ 1, then determine the type of the table as the dimension table for the snowflake model middle layer;
and generating the association relationship among the tables by combining the association relationship among the columns according to the given table types.
As a preferred embodiment, when J _ AB is 1, there is a one-to-many or one-to-one relationship for field a to field B; for each table, if there is a case where J _ AB ═ 1, the type of the table is determined as a fact table; for each table, if there is a case where V _ MK is 1 and J _ AB is 1, the type of the table is determined as a dimension table of the snowflake model middle layer.
As a preferred implementation mode, a corresponding star model or snowflake model is generated according to the table association relation.
As a preferred embodiment, each item of information used or generated in the foregoing process is stored in the storage device in the form of an inter-table relationship analysis information table including one or more of the following list:
a table field list for storing table fields;
the table row number numerical table is used for storing the table row number;
a field dimension value table for storing dimension values;
table primary key decision suggestion table for storing the value of V _ MK;
a dimension value similarity calculation table for storing a similarity coefficient J _ AB of the dimension values between the columns;
the table attribute judgment table is used for storing the table type judgment mark;
the table association relationship suggestion table is used for storing the relationship between tables;
and the model suggestion table is used for storing the relation model between the tables.
In a preferred embodiment, the table structure of the inter-table relationship analysis information table is prepared in advance and stored in the storage device.
As a preferred embodiment, the system further includes an inter-table relationship analysis and generation device, the inter-table relationship analysis and generation device sequentially calculates and obtains a value of the intermediate parameter V _ MK and a similarity coefficient J _ AB of the inter-column dimension value according to the field dimension value and the number of rows of tables stored in the storage device, determines the type of each table according to the value of the intermediate parameter V _ MK and the similarity coefficient J _ AB of the inter-column dimension value, and generates an inter-table association relationship and a corresponding model according to the similarity coefficient J _ AB of each table type and the inter-column dimension value.
In a preferred embodiment, the inter-table relationship analysis information table, the table association relationship suggestion table, and the model suggestion table are displayed on the display device in a form that is visually recognizable by the user.
The invention also provides a system for automatically generating the relationship between the relational database tables, which interacts with the data of the relational database and comprises a table data reading device, an inter-table relationship analysis and generation device, a storage device and a display device;
in a relational database, various data are contained in various tables;
the table data reading device reads the number of rows, fields and field dimension information of each table for analyzing and generating the relationship among the tables from all the tables of the relational database;
the device for analyzing and generating the relationship among the tables generates the relationship among the tables and a corresponding model after analysis and calculation according to the information acquired by the device for reading the table data;
the storage device is used for storing various items of information read from the relational database by the data reading device, intermediate information used or generated in the analysis and calculation process by the inter-table relational analysis and generation device, and the inter-table relations and the models thereof generated by the inter-table relational analysis and generation device in the form of data tables;
the relationships between tables and their models are displayed on a display device in a form that is visually recognizable to a user.
As a preferred embodiment, the pieces of information used or generated by the inter-table relationship analysis and generation means are stored in the storage means in the form of an inter-table relationship analysis information table including one or more of the following list:
a table field list for storing table fields;
the table row number numerical table is used for storing the table row number;
a field dimension value table for storing dimension values;
table primary key decision suggestion table for storing the value of V _ MK;
a dimension value similarity calculation table for storing a similarity coefficient J _ AB of the dimension values between the columns;
the table attribute judgment table is used for storing the table type judgment mark;
the table association relationship suggestion table is used for storing the relationship between tables;
and the model suggestion table is used for storing the relationship between the tables.
In a preferred embodiment, the table structure of the inter-table relationship analysis information table is prepared in advance and stored in the storage device.
As a preferred embodiment:
for all tables of the database, reading and storing fields in each table by a table data reading device so as to form a table field list;
for all tables of the database, the table data reading device reads the data line number of each table, and obtains and stores the data line number T _ ct of each table, so as to form a table line number numerical table;
the table data reading device reads and stores the dimension value of each field, so as to form a field dimension value numerical table;
for each table, the inter-table relationship analysis and generation device calculates the value of an intermediate parameter V _ MK according to the field dimension value and the table row number, wherein V _ MK is c _ ct _ d/T _ ct; when V _ MK obtained by calculation according to the dimension value c _ ct _ d of a certain field is 1, taking the column where the field is located as a main key of a table where the field is located; thereby obtaining a primary key for each table; the value of the intermediate parameter V _ MK is stored in the table main key judgment suggestion table;
for each table, the device for analyzing and generating the relationship among the tables compares the primary key of the device with the dimensional values of all fields of the table and other tables, and calculates and obtains the similarity coefficient J _ AB of the dimensional values among the columns; the key structure comprises a main key A field, a main key B field and a main key B field, wherein J _ AB is CTD _ A/CTD _ AB, CTD _ A is a dimension value of the main key A field after duplication is removed, and CTD _ AB is a dimension value of intersection of the field A after duplication removal and any field B in the table or other tables; when J _ AB is more than or equal to 0.97 and less than or equal to 1, the association relationship between the columns of the field A and the field B is one-to-many or one-to-one; the similarity coefficient J _ AB is stored in a dimension value similarity calculation table;
for each table, the inter-table relationship analyzing and generating device makes the following judgments: if there is a case where V _ MK ═ 1, determining the type of the table as a dimension table; for each table, if there is a case of 0.97 ≦ J _ AB ≦ 1, the type of the table is determined as a fact table; for each table, if there is a case where V _ MK ≦ 1 and 0.97 ≦ J _ AB ≦ 1, then determine the type of the table as the dimension table for the snowflake model middle layer;
the inter-table relationship analysis and generation device generates the inter-table association relationship by combining the inter-column association relationship according to the given table types.
In a preferred embodiment, the device for analyzing and generating relationships between tables generates a corresponding star model or snowflake model according to the relationships between tables.
Compared with the prior art, the invention has the remarkable advantages that:
(1) the invention analyzes all tables of the database and all fields of all tables, can obtain comprehensive and accurate analysis data of the relationship among the tables, and the obtained relationship among the tables is comprehensive and accurate; in addition, the database is directly inquired through the program, analysis is carried out in a mode of directly connecting with the database, and the operation step of exporting files in the middle is not needed, so that the analysis process is simplified, and the analysis efficiency is improved. The technical scheme is simple and convenient to integrate and load with other functional modules.
(2) The invention can implement undifferentiated analysis on all tables in the database, namely, the information of the type, the field and the like of the table does not need to be solved before the analysis, and the manual intervention is not needed.
(3) In the invention, in the process of analyzing and generating the relationships among the tables, the used or generated relationship analysis information among the tables and the relationships among the tables are stored in the storage device in the form of the data tables so as to be called by the device for analyzing and generating the relationships among the tables at any time, so that the device is suitable for all the tables of the database and does not need to classify the tables in the database in advance.
(4) The invention obtains the correlation among tables through the correlation analysis algorithm, can rapidly and accurately analyze the relation among the whole tables of the unknown database, and can visually display the relation, thereby saving a great amount of labor and time cost.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
FIG. 1 is a hardware/software composition diagram of one embodiment of a system for automatically generating relationships between relational database tables of the present invention.
FIG. 2 is a functional block diagram of one embodiment of a system for automatically generating relationships between relational database tables according to the present invention.
FIG. 3 is a flow diagram of one embodiment of a system for automatically generating relationships between relational database tables of the present invention.
FIG. 4 is a flow chart of analyzing and generating inter-table associative relations in one embodiment of the method for automatically generating relations between relational database tables of the present invention.
Detailed Description
It is easily understood that various embodiments of the present invention can be conceived by those skilled in the art according to the technical solution of the present invention without changing the essential spirit of the present invention. Therefore, the following detailed description and the accompanying drawings are merely illustrative of the technical aspects of the present invention, and should not be construed as all of the present invention or as limitations or limitations on the technical aspects of the present invention. Rather, these embodiments are provided so that this disclosure will be thorough and complete. The preferred embodiments of the present invention will now be described in detail with reference to the accompanying drawings, which form a part hereof, and which together with the embodiments of the invention serve to explain the innovative concepts of the invention.
In many scenarios, a user needs to know or use the association relationship between database tables, i.e., the relationship between tables. For example:
(1) the user uses a dispersed statistical table, such as various electronic tables based on system and manual arrangement, and summarizes the statistical table to a data warehouse for modeling and correlation analysis;
(2) and the data system directly accesses the database of the service system. A business system of a general enterprise is modeled according to a transaction requirement in a normal mode, modeling is not performed according to data warehouse dimensions, and the inter-table relation needs to be determined when the business system is directly accessed to be analyzed.
(3) The data system is directly connected with the data mart system. After modeling through a data warehouse, a general enterprise can have a huge data mart system, wherein a plurality of mart wide tables generated for specific scene reports exist, and when the relationship between the wide tables and the wide tables is constructed and analyzed, a clear relationship between the tables is also needed.
In actual service, the relationships among the tables are not unique, various relationships among the tables can be obtained through analysis and calculation, and the most suitable relationships among the various relationships among the tables can be selected by a user. That is, a plurality of inter-table relationship suggestions may be provided for the user, and then the user may select the final inter-table relationship as the inter-table relationship formally output by the system. Of course, the relationships between the tables may be unique, which does not prevent the application and implementation of the solution described in the present invention.
In order to achieve the above object, the present invention provides a method and system for automatically generating relationships between relational database tables.
In one embodiment of the invention, the system for generating the relationship between the relational database tables directly interacts with the data of the relational database and comprises a table data reading device, an inter-table relationship analysis and generation device, a storage device and a display device.
Wherein the content of the first and second substances,
in a relational database, various data are contained in various tables;
the table data reading device reads the necessary information of each table from the relational database and provides the necessary information to the inter-table relational analysis and generation device; the table data reading means further reads the inter-table relationship analysis information from the storage means and supplies the same to the inter-table relationship analysis and generation means, and further, the table data reading means reads the table association relationship suggestion table and the model suggestion table from the storage means and supplies the same to the display means.
The inter-table relation analyzing and generating device generates inter-table relation suggestions after analysis and calculation according to necessary information read from each table in the relational database by the table data reading device, and a user selects the final inter-table relation and the mode thereof in the suggested inter-table relations.
The storage device stores the necessary information of each table read from the relational database by the data reading device, the intermediate information used or generated by the table relation analyzing and generating device in the analyzing and calculating process, the table relation recommended to the user by the table relation analyzing and generating device, the model thereof and the like in the form of data tables so as to be called by the table relation analyzing and generating device or the user at any time. The structure of the data table is preset and stored in the storage device, and the inter-table relation analyzing and generating device writes or calls various data or information according to the analyzing and calculating process. Of course, during the use process, the user can also modify the structure of the data table through the man-machine interaction device.
For convenience of description, the information necessary for each table read from the relational database by the data reading device and the intermediate information used or generated by the inter-table relationship analyzing and generating device in the analyzing and calculating process are collectively referred to as inter-table relationship analyzing information, and the data table for storing the inter-table relationship analyzing information in the storage device is collectively referred to as an inter-table relationship analyzing information table. In one embodiment, the inter-table relationship analysis information table may include:
table 1 full database table list db _ table;
table 2 table field list db _ column;
table 3 table row number value table db _ table _ ct;
table 4 field dimension numerical table db _ colony _ ct;
table 5 table primary key determination suggestion table db _ table _ mk;
table 6 dimension value similarity calculation table db _ column _ J;
table 7 table attribute judgment table db _ table _ FD;
the inter-table relationship analysis and generation means stores the suggestions of inter-table relationships recommended to the user and the inter-table relationship models in the form of table association relationship suggestion tables and model suggestion tables in the storage means. Therefore, the storage device also stores the following data table in advance:
table 8 table association suggestion table db _ column _ rel;
table 9 the model suggests table db _ model.
The inter-table relationship analysis information table, the table association relationship suggestion table, and the model suggestion table are displayed in a display device such as a display in a form that can be visually recognized by a user.
The table data reading device, the inter-table relationship analyzing and generating device, the storage device, and the display device may be implemented on a general information processing device, as shown in fig. 1, having a cpu (central processing unit), a display device, a main memory, a data reading device, an external storage device, and a network communication device.
The CPU, the display device, the main memory, the data reading device, the external storage device, and the network communication device are connected via a bus, and can transmit and receive data.
The CPU executes the program and data loaded from the external storage device to the main memory, and controls the operations of the respective parts of the table data reading device, the inter-table relationship analyzing and generating device, the storage device, and the display device.
The external storage device is a mass storage device such as an HDD (Hard Disk Drive) or an SDD (Solid state Drive). The external storage device is provided with a relational database inter-table relational analysis and generation program and stores inter-table relational analysis information, an inter-table relational analysis information table, a table association relation suggestion table and a model suggestion table. The CPU executes a relational analysis and generation program between relational database tables loaded from an external storage device to a main memory. The relational database inter-table relation analysis and generation program is used for executing the functions of the inter-table relation analysis and generation device.
In one embodiment below, it will be described how the aforementioned relational database inter-table relationship generation system performs inter-table relationship analysis and calculation under the control of the CPU. The process of analyzing and calculating the relationships between tables shown in this embodiment includes the following steps:
and S100, configuring data connection according to the connection parameters of the database, and testing the communication condition of the table data reading device and the database so that the table data reading device can read the data information in the relational database.
S200, reading lists of all the tables from the database by the table data reading device, and storing the table list information into the tables preset in the storage device to form a full database table list. The full database table list is shown in table 1.
Table 1 full database table list db _ table
table_id table_name comment
141 com_balance Balance sheet
146 com_basic_info Company basic information table
208 com_profit Profit table
497 ntf_jz_com_team Character basic information table
635 sec_industry_new Securities belonging to trade
652 sec_plate_category Security block level table
In table 1, table _ ID is the ID of each table, table _ name is the name of each table, and comment is the chinese name of each table. The Chinese names shown in the comment column are for convenience of illustration only, and in actual implementation, the preset full database table list does not necessarily need to contain a column of data information of the Chinese names.
S300, after acquiring the list of the full database table, reading and comparing the parameters of each table and each field in the database to generate the relationship among the tables for the user to refer and determine:
s301, the table data reading device reads the fields in each table and stores the fields in the tables preset in the storage device, and therefore a table field list is formed. A portion of the table field list is shown in table 2.
Table 2 table field list db _ column (partial examples)
id db_table_id column_name column_data_type column_comment
4916 141 com_uni_code bigint Corporate unified coding
4921 141 currency_code bigint Currency code
5063 141 whether_published varchar Whether report publishes value
5064 141 special_case_description varchar Description of specific cases
5082 141 come_source varchar Data source
5086 141 remark varchar Note
5087 141 creator varchar Creators
5088 141 editor varchar Editor
5089 141 src_id varchar Data correspondence advertisement ID
5734 146 com_name varchar Company full name
5735 146 com_ename varchar Company English name
5736 146 com_sname varchar Company abbreviation
5737 146 com_spell varchar Company Pinyin abbreviation
5738 146 com_espell varchar Company English abbreviation
5741 146 com_type_name varchar Company type (name)
5742 146 concept varchar The related concept
5743 146 major_product_type varchar Type of main camp product
5744 146 major_product_name varchar Name of main camp product
5746 146 board_chair_man varchar President
5747 146 legal_rp varchar Representative of a legal person
5748 146 uni_credit varchar Unified credit coding
5749 146 busi_number varchar Trade and industry registration number
Table 2 is a partial field list in table 1, table _ id being 141 and name com _ balance (balance sheet), and table _ id being 146 and name com _ basic _ info (company basic information sheet). In table 2, ID is the ID of each field, column _ name is the field name, column _ data _ type is the field type, and column _ comment is the field literal name. Also, the chinese names shown in the column of column _ comment are merely for convenience of illustration, and in actual implementation, the preset table field list does not necessarily need to contain a column of data information of the chinese names.
S302, the table data reading device reads the data line number of each table in the database and stores the data line number into a table preset in the storage device, so as to form a table line number numerical table, and the table line number numerical table is shown in table 3.
Table 3 table row number value table db _ table _ ct
T_id T_nm T_ct
141 com_balance 183685
146 com_basic_info 25395
208 com_profit 108287
497 ntf_jz_com_team 133799
635 sec_industry_new 73510
652 sec_plate_category 1977
In table 3, T _ ID is the table ID, T _ nm is the table name, and T _ ct is the number of table rows.
S303, the table data reading device reads the dimension value of each field and stores the dimension value into the table preset in the storage device, so as to form a field dimension number table, which is shown in table 4.
TABLE 4 field dimension numerical Table db _ colony _ ct
T_id C_nm c_ct_d
141 com_uni_code 3897
141 currency_code 3
141 whether_published 3
141 special_case_description 3
141 come_source 5
141 remark 6
141 creator 2
141 editor 10
141 src_id 20246
146 com_uni_code 25395
146 com_name 25395
146 com_ename 24480
146 com_sname 24875
146 com_spell 2
146 com_espell 8510
146 com_type_name 13
146 concept 3413
146 industry_code 131
In table 4, C _ nm is the field name, and C _ ct _ d is the dimension value of the field.
S304, reading the number of rows in the table 3 and the dimension values of the fields in the table 4 by the table data reading device and providing the dimension values to the device for analyzing and generating the relationship among the tables; the inter-table relationship analysis and generation device calculates a V _ MK value, and stores the V _ MK value in a table preset in the storage device, thereby forming a table main key judgment suggestion table; the inter-table relationship analysis and generation device gives a primary key suggestion according to the V _ MK value. Table main key determination suggestion table is shown in table 5.
Table 5 table primary key decision suggestion table db _ table _ mk
T_id T_nm T_ct C_id C_nm c_ct_d V_MK
146 com_basic_info 25395 …… com_uni_code 25395 1
146 com_basic_info 25395 …… com_name 25395 1
…… …… …… …… …… …… ……
In table 5, C _ ID is a field ID. The invention uses the value of V _ MK as the judgment basis of the main key of the field C _ nm in the table T _ nm, thereby giving the suggestion of the main key, wherein the suggested main keys may be several or one, and are collectively called the main key suggestion. Wherein V _ MK is c _ ct _ d/T _ ct.
When V _ MK is 1, the inter-table relationship analysis and generation means suggests the column in which the corresponding field C _ nm is located as the primary key of the table T _ nm to which it belongs. That is, when the dimension value is equal to the table row number, the column of the corresponding field C _ nm may be the only primary key or one of the primary keys of the table T _ nm. For example, in table 5, for the table com _ basic _ info, the row number is 25395, the dimension value of the field com _ uni _ code is 25395, and at this time, V _ MK is 1, the column where the field com _ uni _ code is located is taken as the main key of the table com _ basic _ info. Similarly, in the table com _ basic _ info, the dimension value of the field com _ name is also 25395, and if V _ MK is 1, the column in which the field com _ name is located is also used as the primary key of the table com _ basic _ info. Therefore, the table com _ basic _ info primary key suggests: the column in which the field com _ uni _ code is located and the column in which the field com _ name is located both serve as the primary key of the table com _ basic _ info.
And S305, reading the dimensional value of the main key confirmed in the step S304 by the table data reading device and providing the dimensional value to the inter-table relationship analysis and generation device, comparing the dimensional value of the main key with dimensional values of other columns in the table 2 or the table 4 in pairs by the inter-table relationship analysis and generation device, calculating a similarity coefficient of the dimensional values among the columns, and taking the similarity coefficient as a judgment basis of the association relationship among the columns. The dimension value numerical values of the main key, the dimension value numerical values of other columns, the similarity coefficient and the like are stored in a table preset in a storage device, so that a dimension value similarity calculation table is formed. The dimensional value similarity calculation table is shown in table 6.
TABLE 6 dimension value similarity calculation Table db _ column _ J
C_ID_A C_ID_B CTD_A CTD_AB J_AB
25395 25475 0.9969
3897 25475 0.153
…… …… ……
Table 6 is an example of a dimension value similarity calculation table of a primary key column a and any other column B, where C _ ID _ a is an ID of the primary key column a (i.e., a column in which the field a is located), C _ ID _ B is an ID of any other column B (i.e., a column in which any field B in all the other fields is located except the primary key a, and any other column B may be other columns of the table in which the primary key column a is located, or may be any column of other tables), CTD _ a is a dimension value of a field of the primary key column a after deduplication, and CTD _ AB is a dimension value of an intersection of the field a and the field B after deduplication.
J _ AB is a dimension value similarity coefficient between the field a and the field B, and J _ AB is CTD _ a/CTD _ AB.
When J _ AB is 1, the inter-table relationship analysis and generation apparatus determines that the dimension value of field B is a subset of the dimension value of field a. That is, there may be a one-to-one, one-to-many, many-to-one relationship for field a versus field B. The relation type can be judged according to the V _ MK value corresponding to the field A, and the technical scheme only considers the judgment of one-to-many or one-to-one relation.
Considering data quality and other factors, certain fault tolerance is considered for J _ AB when necessary, for example, when J _ AB > is 0.99 or J _ AB > is 0.99, it can be determined that there may be one-to-many or one-to-one relationship between field a and field B.
S306, the device for analyzing and generating the relationship between tables judges the type of the tables according to the V _ MK obtained in the step S304 and the J _ AB value obtained in the step S305, namely, judges that a certain table is a dimension table or/and a fact table, and the judging method is shown as a table attribute judging table shown in the table 7.
Table 7 table attribute determination table db _ table _ FD
T_id isF isD
141 1
146 1
208 …… ……
497 …… ……
635 …… ……
652 …… ……
In table 7, isF is a decision flag of the fact table type, isD is a decision flag of the dimension table type, and the decision rule is as follows:
in table 5, if a record having V _ MK ═ 1 exists in a certain table T _ id, the isD value corresponding to the table T _ id in table 7 is updated to 1. For example, for the table with T _ id of 146, in table 5, there is a record with V _ MK equal to 1, so its corresponding isD value in table 7 is updated to 1, i.e. the type of the table suggests the dimension table type.
In table 6, if there is a record with J _ AB ═ 1 in field B of a certain table T _ id (considering fault tolerance, J _ AB ═ 0.99 or J _ AB ═ 0.97 is also possible), the isF value corresponding to the table T _ id in table 7 is updated to 1, i.e. the type of the table is suggested as the fact table type.
For a table corresponding to a certain T _ id, there may be a case where both isF and isF are 1, such as a dimension table of an intermediate layer in a snowflake model.
S307, according to the fact table and the judgment result of the dimension table given in the table 7 in the step S306, the inter-column association relationship suggestion (namely, the one-to-many or one-to-one relationship between the field A and the field B) given in the table 6 is combined to generate the inter-table association relationship suggestion. Table association suggestions are shown in table 8.
Table 8 table association relationship suggestion table db _ column _ rel
Figure BDA0002325464830000131
The model _ ID is the ID of the model, the left _ table _ ID is the left table ID, the right _ table _ ID is the right ID, the left _ column _ ID is the left field ID, the right _ column _ ID is the right field ID, the relationship is the association relationship, and the left join is the left association.
In this embodiment, a format of the fact table left association dimension table is adopted. The invention can also adopt the association relationship with other formats, such as right join right association, inner join inner association and the like.
S400, generating corresponding star model or snowflake model suggestions by the table relation analyzing and generating device according to the table relation suggestions in the table association relationship suggestion table shown in the table 8, and storing the star model or snowflake model suggestions in a storage device in the form of a model suggestion table.
And S500, confirming the final inter-table relation and model by the user according to the table association relation suggestion and the model suggestion given by the inter-table relation analysis and generation device, and formally generating the final inter-table relation and model into the system after confirmation.
The structure of each table in the above embodiments is merely an example, and in actual operation, the number of column data items is not necessarily only the items shown in each table in the above embodiments, and other item data may be provided.
The invention automatically generates the relationship between tables by analyzing and comparing all the field values of the tables in the relational database; and displaying the inter-table relation of the database in a model suggestion table mode according to the obtained incidence relation. The invention aims to obtain the relationship between tables of the database through the data association analysis of the unknown database, so that a user can clearly know the table structure of the unknown database, and the use and the utilization of the database are facilitated. The display device displays the analysis information of the relationships among the tables and displays the relationships among the tables of the database in the form of the model suggestion table.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.
It should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes described in a single embodiment or with reference to a single figure, for the purpose of streamlining the disclosure and aiding in the understanding of various aspects of the invention by those skilled in the art. However, the present invention should not be construed such that the features included in the exemplary embodiments are all the essential technical features of the patent claims.
Those skilled in the art will appreciate that all or part of the flow of the method implementing the above embodiments may be implemented by a computer program, which is stored in a computer readable storage medium, to instruct related hardware. The computer readable storage medium is a magnetic disk, an optical disk, a read-only memory or a random access memory.
It should be understood that the devices, modules, units, components, etc. included in the system of one embodiment of the present invention may be adaptively changed to be provided in an apparatus or system different from that of the embodiment. The different devices, modules, units or components comprised by the system of an embodiment may be combined into one device, module, unit or component or may be divided into a plurality of sub-devices, sub-modules, sub-units or sub-components.
The means, modules, units or components in the embodiments of the present invention may be implemented in hardware, or may be implemented in software running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that embodiments in accordance with the present invention may be practiced using a microprocessor or Digital Signal Processor (DSP). The present invention may also be embodied as a computer program product or computer-readable medium for performing a portion or all of the methods described herein.

Claims (12)

1. A method for automatically generating the relation between relational database tables,
for all tables of the database, acquiring fields of each table and a dimension value c _ ct _ d of each field;
for all tables of the database, acquiring the data line number T _ ct of each table;
for each table, calculating a value of an intermediate parameter V _ MK according to the field dimension value and the table row number, wherein V _ MK equals c _ ct _ d/T _ ct; when V _ MK obtained by calculation according to the dimension value c _ ct _ d of a certain field is 1, taking the column where the field is located as a main key of a table where the field is located; thereby obtaining a primary key for each table;
for each table, comparing the main key of the table with the dimensional values of all fields of the table and other tables, and calculating to obtain a similarity coefficient J _ AB of the dimensional values among columns; the key structure comprises a main key A field, a main key B field and a main key B field, wherein J _ AB is CTD _ A/CTD _ AB, CTD _ A is a dimension value of the main key A field after duplication is removed, and CTD _ AB is a dimension value of intersection of the field A after duplication removal and any field B in the table or other tables; when J _ AB is more than or equal to 0.97 and less than or equal to 1, the association relationship between the columns of the field A and the field B is one-to-many or one-to-one;
for each table, if there is a case where V _ MK ═ 1, determining the type of the table as a dimension table; for each table, if there is a case of 0.97 ≦ J _ AB ≦ 1, the type of the table is determined as a fact table; for each table, if there is a case where V _ MK ≦ 1 and 0.97 ≦ J _ AB ≦ 1, then determine the type of the table as the dimension table for the snowflake model middle layer;
and generating the association relationship among the tables by combining the association relationship among the columns according to the given table types.
2. The method of automatically generating relationships between relational database tables as recited in claim 1, wherein when J _ AB ═ 1, there is a one-to-many or one-to-one relationship for field a to field B; for each table, if there is a case where J _ AB ═ 1, the type of the table is determined as a fact table; for each table, if there is a case where V _ MK is 1 and J _ AB is 1, the type of the table is determined as a dimension table of the snowflake model middle layer.
3. The method of automatically generating relationships between relational database tables as recited in claim 1, wherein corresponding star models or snowflake models are generated based on table associations.
4. A method of automatically generating relationships between relational database tables according to claim 1, wherein the items of information used or generated in the preceding process are stored in the storage means in the form of tables of inter-table relational analysis information including one or more of the following:
a table field list for storing table fields;
the table row number numerical table is used for storing the table row number;
a field dimension value table for storing dimension values;
table primary key decision suggestion table for storing the value of V _ MK;
a dimension value similarity calculation table for storing a similarity coefficient J _ AB of the dimension values between the columns;
the table attribute judgment table is used for storing the table type judgment mark;
the table association relationship suggestion table is used for storing the relationship between tables;
and the model suggestion table is used for storing the relation model between the tables.
5. The method of automatically generating relationships between relational database tables according to claim 4, wherein the table structure of each of the tables of relational analysis information is pre-established and stored in a storage device.
6. The method of automatically generating relations between relational database tables according to claim 1, further comprising an inter-table relation analyzing and generating means for sequentially calculating a similarity coefficient J _ AB that obtains a value of an intermediate parameter V _ MK and an inter-column dimension value from the field dimension value and the number of rows of tables stored in the storage means, judging the type of each table from the value of the intermediate parameter V _ MK and the similarity coefficient J _ AB of the inter-column dimension value, and generating an inter-table association relation and a corresponding model from the similarity coefficient J _ AB of each table type and inter-column dimension value.
7. The method of automatically generating relationships between relational database tables according to claim 1 wherein the table-to-table relationship analysis information table, the table association relationship suggestion table and the model suggestion table are displayed in a user visually recognizable form on a display device.
8. A system for automatically generating the relation between relational database tables interacts with the data of the relational database, which is characterized by comprising a table data reading device, an inter-table relation analyzing and generating device, a storage device and a display device;
in a relational database, various data are contained in various tables;
the table data reading device reads the number of rows, fields and field dimension information of each table for analyzing and generating the relationship among the tables from all the tables of the relational database;
the device for analyzing and generating the relationship among the tables generates the relationship among the tables and a corresponding model after analysis and calculation according to the information acquired by the device for reading the table data;
the storage device is used for storing various items of information read from the relational database by the data reading device, intermediate information used or generated in the analysis and calculation process by the inter-table relational analysis and generation device, and the inter-table relations and the models thereof generated by the inter-table relational analysis and generation device in the form of data tables;
the relationships between tables and their models are displayed on a display device in a form that is visually recognizable to a user.
9. A system for automatically generating relationships between relational database tables according to claim 8, wherein the items of information used or generated by the inter-table relational analysis and generation means are stored in the storage means in the form of an inter-table relational analysis information table comprising one or more of the following:
a table field list for storing table fields;
the table row number numerical table is used for storing the table row number;
a field dimension value table for storing dimension values;
table primary key decision suggestion table for storing the value of V _ MK;
a dimension value similarity calculation table for storing a similarity coefficient J _ AB of the dimension values between the columns;
the table attribute judgment table is used for storing the table type judgment mark;
the table association relationship suggestion table is used for storing the relationship between tables;
and the model suggestion table is used for storing the relationship between the tables.
10. The system of automatically generating relationships between relational database tables according to claim 9, wherein the table structure of the tables of relational analysis information is pre-established and stored in a storage device.
11. The system for automatically generating relationships between relational database tables according to any one of claims 8 to 10,
for all tables of the database, reading and storing fields in each table by a table data reading device so as to form a table field list;
for all tables of the database, the table data reading device reads the data line number of each table, and obtains and stores the data line number T _ ct of each table, so as to form a table line number numerical table;
the table data reading device reads and stores the dimension value of each field, so as to form a field dimension value numerical table;
for each table, the inter-table relationship analysis and generation device calculates the value of an intermediate parameter V _ MK according to the field dimension value and the table row number, wherein V _ MK is c _ ct _ d/T _ ct; when V _ MK obtained by calculation according to the dimension value c _ ct _ d of a certain field is 1, taking the column where the field is located as a main key of a table where the field is located; thereby obtaining a primary key for each table; the value of the intermediate parameter V _ MK is stored in the table main key judgment suggestion table;
for each table, the device for analyzing and generating the relationship among the tables compares the primary key of the device with the dimensional values of all fields of the table and other tables, and calculates and obtains the similarity coefficient J _ AB of the dimensional values among the columns; the key structure comprises a main key A field, a main key B field and a main key B field, wherein J _ AB is CTD _ A/CTD _ AB, CTD _ A is a dimension value of the main key A field after duplication is removed, and CTD _ AB is a dimension value of intersection of the field A after duplication removal and any field B in the table or other tables; when J _ AB is more than or equal to 0.97 and less than or equal to 1, the association relationship between the columns of the field A and the field B is one-to-many or one-to-one; the similarity coefficient J _ AB is stored in a dimension value similarity calculation table;
for each table, the inter-table relationship analyzing and generating device makes the following judgments: if there is a case where V _ MK ═ 1, determining the type of the table as a dimension table; for each table, if there is a case of 0.97 ≦ J _ AB ≦ 1, the type of the table is determined as a fact table; for each table, if there is a case where V _ MK ≦ 1 and 0.97 ≦ J _ AB ≦ 1, then determine the type of the table as the dimension table for the snowflake model middle layer;
the inter-table relationship analysis and generation device generates the inter-table association relationship by combining the inter-column association relationship according to the given table types.
12. The system for automatically generating relationships between relational database tables according to claim 11, wherein the inter-table relationship analysis and generation means generates the corresponding star or snowflake models based on the inter-table associations.
CN201911314469.1A 2019-12-19 2019-12-19 Method and system for automatically generating relations between relational database tables Pending CN113010611A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911314469.1A CN113010611A (en) 2019-12-19 2019-12-19 Method and system for automatically generating relations between relational database tables

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911314469.1A CN113010611A (en) 2019-12-19 2019-12-19 Method and system for automatically generating relations between relational database tables

Publications (1)

Publication Number Publication Date
CN113010611A true CN113010611A (en) 2021-06-22

Family

ID=76381122

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911314469.1A Pending CN113010611A (en) 2019-12-19 2019-12-19 Method and system for automatically generating relations between relational database tables

Country Status (1)

Country Link
CN (1) CN113010611A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115292508A (en) * 2022-06-29 2022-11-04 江苏昆山农村商业银行股份有限公司 Knowledge graph construction method and system based on table data

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103793422A (en) * 2012-10-31 2014-05-14 国际商业机器公司 Methods for generating cube metadata and query statements on basis of enhanced star schema
US20140172775A1 (en) * 2012-12-17 2014-06-19 Sap Ag Field extensibility in a multi-tenant environment with columnar database support
CN104391948A (en) * 2014-12-01 2015-03-04 广东电网有限责任公司清远供电局 Data standardization construction method and system of data warehouse
CN104899291A (en) * 2015-06-05 2015-09-09 北京京东尚科信息技术有限公司 Method and device for multidimensional analysis of relational database
CN105824914A (en) * 2016-03-15 2016-08-03 浙江大学 Configuration-based snowflake model information extraction method
CN110019564A (en) * 2017-09-28 2019-07-16 株式会社日立制作所 Table relation decomposing auxiliary device and table relation decomposing householder method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103793422A (en) * 2012-10-31 2014-05-14 国际商业机器公司 Methods for generating cube metadata and query statements on basis of enhanced star schema
US20140172775A1 (en) * 2012-12-17 2014-06-19 Sap Ag Field extensibility in a multi-tenant environment with columnar database support
CN104391948A (en) * 2014-12-01 2015-03-04 广东电网有限责任公司清远供电局 Data standardization construction method and system of data warehouse
CN104899291A (en) * 2015-06-05 2015-09-09 北京京东尚科信息技术有限公司 Method and device for multidimensional analysis of relational database
CN105824914A (en) * 2016-03-15 2016-08-03 浙江大学 Configuration-based snowflake model information extraction method
CN110019564A (en) * 2017-09-28 2019-07-16 株式会社日立制作所 Table relation decomposing auxiliary device and table relation decomposing householder method

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115292508A (en) * 2022-06-29 2022-11-04 江苏昆山农村商业银行股份有限公司 Knowledge graph construction method and system based on table data
CN115292508B (en) * 2022-06-29 2024-02-02 江苏昆山农村商业银行股份有限公司 Knowledge graph construction method and system based on table data

Similar Documents

Publication Publication Date Title
US8341131B2 (en) Systems and methods for master data management using record and field based rules
CN104866426A (en) Software test integrated control method and system
US20090150447A1 (en) Data warehouse test automation framework
CN104299105A (en) Credit data management system supporting complex enterprise environment and credit data management method
CN110119395B (en) Method for realizing association processing of data standard and data quality based on metadata in big data management
CN110795524B (en) Main data mapping processing method and device, computer equipment and storage medium
CN107533554A (en) Document verification system
CN110728422A (en) Building information model, method, device and settlement system for construction project
CN112000656A (en) Intelligent data cleaning method and device based on metadata
CN112883042A (en) Data updating and displaying method and device, electronic equipment and storage medium
CN112419030B (en) Method, system and equipment for evaluating financial fraud risk
CN113377758A (en) Data quality auditing engine and auditing method thereof
CN114840531B (en) Data model reconstruction method, device, equipment and medium based on blood edge relation
Zhao et al. A Graph‐Based Method for IFC Data Merging
JP2007133624A (en) Information management method and device using connection relation information
CN113010611A (en) Method and system for automatically generating relations between relational database tables
CN114722789B (en) Data report integrating method, device, electronic equipment and storage medium
CN112786124A (en) Problem troubleshooting method and device, storage medium and equipment
CN116701506A (en) Demand plan compliance verification method fusing unstructured data
CN111723129B (en) Report generation method, report generation device and electronic equipment
US9489438B2 (en) Systems and methods for visualizing master data services information
CN113806336A (en) Data quality evaluation method and system
Gill et al. An open source ETL tool-medium and small scale enterprise ETL (MaSSEETL)
CN113610594B (en) Equipment review price data processing method and system
CN116610727B (en) Analysis processing method and device for enterprise statistical data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination