CN111143483A

CN111143483A - Method, apparatus and computer readable storage medium for determining data table relationships

Info

Publication number: CN111143483A
Application number: CN201911382193.0A
Authority: CN
Inventors: 王燕忠
Original assignee: Beijing Qiqi Technology Co Ltd
Current assignee: Beijing Qiqi Technology Co Ltd
Priority date: 2019-12-27
Filing date: 2019-12-27
Publication date: 2020-05-12

Abstract

The invention discloses a method for determining relationships between data tables in a database, comprising: automatically performing a single or multi-cycle relationship identification operation on the data tables according to one or more configuration parameters; wherein, in the multi-cycle relationship identification operation, the method performs: judging whether a relationship or a new relationship exists between the data tables; extracting the data relationship in response to the relationship or the new relationship existing; and adding a loop record and repeating the judging and extracting operations; and outputting the automatically identified table relationship result after the multi-cycle relationship identification operation ends. Performing table relationship identification automatically speeds up data identification, and the automatically identified result can serve as a reference for further data analysis.

Description

Method, apparatus and computer readable storage medium for determining data table relationships

Technical Field

The present invention relates generally to the field of databases. More particularly, the present invention relates to a method, apparatus, and computer-readable storage medium for determining relationships between data tables in a database.

Background

With the development of the digital era, data needs to be uploaded to a designated information system for unified analysis and processing. In this process, however, the issued planning tables are often incomplete and the data uploaded from each region is often inaccurate, so the uploaded data cannot be analyzed centrally. A data acquisition tool can be developed to address the inaccurate uploaded data and the incomplete acquisition standard planning tables. For the collected data, however, the relationships between the uploaded data tables are commonly determined by manual operation, which makes relationship identification inefficient and incurs considerable labor cost. Determining the relationships between data tables efficiently and automatically, while also improving the accuracy of relationship identification, is therefore a technical challenge.

Disclosure of Invention

To at least partially solve the technical problems noted in the background, aspects of the present invention provide a method, apparatus, and computer-readable storage medium for determining relationships between data tables in a database.

In one aspect, the present invention provides a method for determining relationships between data tables in a database, comprising: automatically performing a single or multi-cycle relationship identification operation on the data tables according to one or more configuration parameters; wherein, in the multi-cycle relationship identification operation, the method performs: judging whether a relationship or a new relationship exists between the data tables; extracting the data relationship in response to the relationship or the new relationship existing; and adding a loop record and repeating the judging and extracting operations; and outputting the automatically identified table relationship result after the multi-cycle relationship identification operation ends.

In one embodiment, the above method further comprises configuring the configuration parameters according to one or more of: whether relationship identification takes the analysis result or the annotation as its reference; the number of data records examined in a single analysis within a cycle; the number of data records added in each cycle; the maximum number of cycles; and whether the data is extracted randomly or in a certain order.

In another embodiment, the relationship identification operation comprises analyzing relationships between data tables according to foreign keys in the data tables.

In yet another embodiment, the relationship identification operation comprises identifying relationships between data tables by determining whether a similarity between table fields of the data tables is greater than or equal to a predetermined threshold.

In yet another embodiment, the single relationship identification operation comprises automatically performing a single relationship identification on all data tables based on the one or more configuration parameters.

In yet another embodiment, in response to the presence of a new relationship, a loop record is added and data relationship identification is performed, and in response to the absence of a new relationship, the loop is ended.

In yet another embodiment, the automatically identified table relationship result is an entity-relationship (ER) diagram.

In another aspect, the present invention provides an apparatus for determining relationships between data tables in a database, comprising: at least one processor; at least one memory storing computer program instructions that, when executed by at least one processor, cause the apparatus to perform the above-described method.

In yet another aspect, the invention provides a computer readable storage medium comprising program instructions for determining relationships between data tables in a database, which when executed by a processor, perform the method described above and its various embodiments.

The technical solution for table relationship identification disclosed by the invention can substantially reduce the time needed to identify table relationships and improve identification accuracy. Furthermore, based on the data relationship diagram established through table relationship identification, the solution can also provide various analysis and retrieval functions over that diagram, so that the automatically identified table relationships can be corrected and the accuracy of relationship identification further improved.

Drawings

The above and other objects, features and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. In the accompanying drawings, which are meant to be exemplary and not limiting, several embodiments of the invention are shown and indicated, like or corresponding reference numerals being used for like or corresponding parts, wherein:

FIG. 1 is a functional block diagram illustrating a data service system according to an embodiment of the present invention;

FIG. 2 is a flow diagram illustrating a method for data collection and automatic identification of a data service system according to an embodiment of the present invention;

FIG. 3 is a flow diagram illustrating a method for automatic identification of data table relationships according to an embodiment of the invention;

FIG. 4 is a flow diagram illustrating a method for automatic identification of data table relationships in accordance with another embodiment of the present invention; and

FIG. 5 is a functional block diagram illustrating data table relationships according to an embodiment of the present invention.

Detailed Description

As a whole, the technical solution of the present invention provides a method, apparatus and computer-readable storage medium for determining relationships between data tables in a database. Unlike existing data identification approaches, the invention automatically identifies tables and establishes a table relationship diagram, which increases the identification speed of the system. In some aspects, the table relationship diagram also makes it easier for a user to analyze and search the data tables, thereby increasing analysis accuracy.

The technical solution of the present invention and various embodiments thereof will be described in detail below with reference to the accompanying drawings.

Fig. 1 is a functional block diagram illustrating a data service system 100 according to an embodiment of the present invention. As shown in fig. 1, the data service system 100 of the present invention can be divided into a data layer 110 and an application layer 120 according to functions and roles, wherein the data layer can be used to identify and save data. In one or more embodiments, the application layer may be divided into three functional blocks, task management 122, analysis tool 124, and system management 126, depending on function and role. The following will be described in detail with respect to the respective functional blocks:

The main functions of the task management function block 122 run throughout the data analysis process. Its specific operations may include, but are not limited to, creating, viewing, deleting, importing, exporting and sharing tasks. Task content may comprise data connection, extraction configuration, analysis configuration, template identification, code table identification, log table identification, table field identification, automatic analysis relationship identification, data label identification, data processing configuration, task starting, task logs and other identification work related to establishing table relationships. The results of a completed task can be displayed by adding labels to the data or by establishing table relationships.

The main function of the analysis tool function block 124 is to analyze again the results (table relationships) obtained after automatic execution completes, including: filtering empty tables, filtering empty fields, data table analysis, table field analysis, table relationship analysis, table field retrieval, table field value retrieval, and the like. In this way, the accuracy of the automatic analysis can be verified, and data table relationships and field value annotations can be analyzed in greater depth.

The main functions of the system management function block 126 relate to user login and user management, where user management includes message reminders during task execution, operation log viewing, login password modification, user login switching, help document viewing, and the like. In addition, system management is also used for subsequent updating and maintenance of the data semantic library and the industry semantic library. In some embodiments, functions such as database setup and system setup may also be performed by the system management function block.

In one or more embodiments, the database of the present invention may use SQL Server, which is queried using Structured Query Language (SQL). With SQL it is possible to query a database, retrieve data from a database, insert new records, update data, delete records, create a new database, create new tables, create stored procedures, create views, and set permissions on tables, stored procedures and views. In other embodiments, the database of the present invention may use a Remote Dictionary Server (Redis). Redis is a data structure server that supports data persistence: it can save in-memory data to disk and load it again for use on restart. Based on the above description, those skilled in the art will appreciate that the database of the present invention may use various database management systems, whether currently available or developed in the future, as long as the database management system provides safe and reliable storage for structured data.
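The SQL operations listed above can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the table and column names (classes, students, class_id and so on) are hypothetical. The final query reads the declared foreign keys, which is the kind of metadata the relationship identification described later can draw on.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")

# Create tables and insert records; all names here are hypothetical.
conn.executescript("""
CREATE TABLE classes (class_id INTEGER PRIMARY KEY, teacher TEXT);
CREATE TABLE students (
    student_no INTEGER PRIMARY KEY,
    name TEXT,
    class_id INTEGER REFERENCES classes(class_id)
);
INSERT INTO classes VALUES (1, 'Wang');
INSERT INTO students VALUES (100, 'Li', 1);
""")

# Update a record, then query across both tables.
conn.execute("UPDATE students SET name = ? WHERE student_no = ?", ("Li Lei", 100))
rows = conn.execute(
    "SELECT s.name, c.teacher FROM students s "
    "JOIN classes c ON s.class_id = c.class_id"
).fetchall()

# Read declared foreign keys as (column, referenced table, referenced column).
foreign_keys = [
    (r[3], r[2], r[4]) for r in conn.execute("PRAGMA foreign_key_list(students)")
]
```

Other database systems expose the same information through their own catalog views, so the metadata query would differ there.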

In some embodiments, when data needs to be read from a database and analyzed, the system may automatically perform operations such as data collection, extraction, cleaning, verification and warehousing through a collection template. Extraction is performed because the obtained data may have multiple structures and types, and the extraction process helps convert such complex data into a single type, or a type that is convenient to process, so that it can be analyzed and processed quickly. Cleaning is performed because not all data is valuable: some data is of no interest, and other data is entirely erroneous or irrelevant. Such useless data can therefore be removed by the cleaning operation so that valid data is retained.
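A minimal sketch of the extraction and cleaning steps described above, assuming that rows arrive as dictionaries and that a row is useless when a required field is empty. The function name, the required-field rule and the sample rows are illustrative assumptions, not part of the disclosed system.

```python
def extract_and_clean(raw_rows, required=("id",)):
    """Convert values to one easy-to-process type and drop useless rows."""
    cleaned = []
    for row in raw_rows:
        # Extraction: normalize every value to a stripped string.
        normalized = {
            key: "" if value is None else str(value).strip()
            for key, value in row.items()
        }
        # Cleaning: discard rows whose required fields are empty (assumed rule).
        if all(normalized.get(field) for field in required):
            cleaned.append(normalized)
    return cleaned


sample = [
    {"id": 1, "name": " Ann "},
    {"id": None, "name": "x"},
    {"id": "2", "name": None},
]
valid_rows = extract_and_clean(sample)
```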

In general, for data for which there is already a matching acquisition template, the data acquisition procedure may be performed directly through the matching acquisition template. However, for data where there is no matching template, data recognition can be accomplished by the automated analysis of the present invention. For this reason, the specific flow of the data automatic identification and automatic analysis method of the present invention will be described below with reference to fig. 2 and 3.

FIG. 2 is a flow diagram illustrating a method 200 for automatic identification of a data service system in accordance with an embodiment of the present invention. As shown in FIG. 2, at step 201, the method 200 performs data acquisition preparations, such as identifying the database type and the data to be acquired. At step 202, the method 200 may identify a template based on the type of the source database of the data to be collected, in order to determine whether there is a template matching the source database. When it is determined that a matching template exists, the method 200 collects only the valid data from the source database through the template at step 203, rather than collecting all of the data. Next, at step 204, the method 200 outputs the collected valid data to a designated database for data processing.
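Steps 202 to 204 can be sketched as a template lookup keyed by source database type. This is a sketch under stated assumptions: the template registry, the source type "hospital_v1" and the field names are hypothetical, and returning None merely signals that no template matched so the caller can fall back to automatic analysis.

```python
# Hypothetical registry: source database type -> fields considered valid data.
COLLECTION_TEMPLATES = {"hospital_v1": ["patient_id", "name"]}


def collect_with_template(source_type, rows, templates=COLLECTION_TEMPLATES):
    """Return only the templated fields, or None when no template matches."""
    fields = templates.get(source_type)  # step 202: look for a matching template
    if fields is None:
        return None  # no match: automatic analysis is needed instead
    # Step 203: collect only the valid fields instead of all of the data.
    return [{field: row.get(field) for field in fields} for row in rows]


collected = collect_with_template(
    "hospital_v1", [{"patient_id": 7, "name": "Li", "debug_flag": 1}]
)
```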

Conversely, if at step 202 the method 200 does not find a matching template for the data to be collected, one or more fields in the data, or relationships between data tables, may be identified by performing an automated analysis. The system of the invention therefore provides two automatic analysis schemes, corresponding respectively to field annotation identification and table relationship identification, as follows:

1. Field annotation identification: the remark column of a field name in each data table carries a corresponding label, and identifying these labels makes it possible to determine clearly which data table fields are correlated. For example, for a field holding a person's name such as Zhang San, the remark column is labeled 'name'.

2. Table relationship identification: the fields in the data tables are compared one by one or in batches, data tables whose fields are related are associated according to the comparison result, and a table relationship diagram is then established. The table relationship diagram can be presented as a tree diagram in which the data tables and fields are the nodes and the field relationships that associate the data tables are the branches.

Based on the above two identification schemes, at step 205, the method 200 may determine, according to a received external instruction, whether to perform field annotation identification or table relationship identification. When field annotation identification is performed, at step 206, the method 200 determines whether the remark column for the field name in the data table has already been annotated. When an annotation already exists, the method 200 performs a table field identification operation at step 208 and, at step 209, updates any existing annotation that has been flagged as erroneous. When no annotation exists, the method 200 looks up the semantic library at step 207 and performs the table field identification operation at step 208. Next, at step 209, the remark column for the field is marked with the corresponding label and stored.
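The annotation branch (steps 206 to 209) can be sketched as below. The semantic library contents, the field abbreviations and the way erroneous annotations are flagged (an explicit set passed in by the caller) are all assumptions made for illustration; the source does not specify them.

```python
# Hypothetical semantic library: field name -> label.
SEMANTIC_LIBRARY = {"xm": "name", "sfzh": "identification number", "nl": "age"}


def annotate_fields(fields, remarks, flagged_wrong=(), library=SEMANTIC_LIBRARY):
    """Return remarks with missing or flagged annotations filled from the library."""
    updated = dict(remarks)
    for field in fields:
        has_remark = bool(remarks.get(field))  # step 206: already annotated?
        if has_remark and field not in flagged_wrong:
            continue  # existing annotation is kept as-is
        label = library.get(field.lower())  # step 207: semantic library lookup
        if label is not None:
            updated[field] = label  # step 209: mark the remark column
    return updated


annotated = annotate_fields(
    ["xm", "nl", "zz"], {"xm": "age"}, flagged_wrong={"xm"}
)
```

Fields with no library entry (here "zz") are left unannotated rather than guessed.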

When it is determined that table relationship identification is to be performed, the method 200 proceeds to the table relationship identification operation represented by block A. To help those skilled in the art understand this operation more clearly, automatic table relationship identification is described in detail below in conjunction with figs. 3 and 4.

FIG. 3 is a flow diagram illustrating a method 300 for automatic identification of data table relationships, according to an embodiment of the invention. As shown in FIG. 3, the method 300 automatically performs the table relationship identification operation shown in block A of FIG. 2. Specifically, at step 301, a single or multi-cycle relationship identification operation is automatically performed on the data tables according to one or more configuration parameters. In one embodiment, the configuration parameters may be configured according to one or more of the following: whether relationship identification is based on the analysis result or on the annotation; the number of data records examined in a single analysis within a cycle; the number of data records added in each cycle; the maximum number of cycles; and whether the data is extracted randomly or in a certain order. In one scenario, each field in a data table may be treated as one data piece. In other words, the number of data pieces may be a number of fields, and each data table includes a plurality of data pieces.

One or more of the above-described methods for configuring configuration parameters are described in detail below:

1. Setting whether relationship identification takes the analysis result or the annotation as its reference

Under this setting, the analysis result may be the table relationship obtained by the method 300 through field comparison, and the annotation may be the label in the remark column of a field name in each data table; identifying these labels also makes it possible to determine clearly which data table fields are related to each other. Through this configuration parameter, the method 300 may decide, after completing the relationship identification operation, whether to use the table relationship obtained by field comparison or the labels of the field names in each data table as the basis for relationship identification.

2. Setting the number of data records examined in a single analysis within the cyclic relationship identification operation

A data record is one row of a data table and may be composed of one or more fields. For example, one data record in a personal table includes fields such as identification number, name and age. The number of data records for a single analysis is the number of rows of the child table that are compared with the fields of the parent table at one time.

3. Setting the number of data records added in each cycle of the cyclic relationship identification operation

The amount of data processed by the system each time is quite large. If only the result of a single comparison is used, relationships are easily missed, so comparison in batches is usually required. For example, when thousands of data tables are analyzed, the number of data records per comparison may be set to 50. Similarly, when tens of thousands of data tables are to be analyzed, the number may be set to 500. These values are merely exemplary and not limiting, and one skilled in the art may envision other ways of increasing the number of data records based on the teachings herein.

4. Setting maximum number of cycles

The accuracy of relationship identification can be increased by performing identification through multiple cycles. However, a larger number of cycles also means that the time taken is relatively increased. Therefore, by setting the maximum number of cycles as a reference for the cycle, re-extraction of data can be stopped when the number of times data is extracted reaches the maximum number of cycles.

5. Setting whether data is extracted randomly or in a certain order

When data is selected for the relationship identification operation, extracting the data randomly widens the range of fields covered by the extracted data and avoids repeatedly comparing the same or similar fields. In some scenarios, the data may instead be extracted in a certain order, for example when the fields included in each piece of data differ significantly.
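The five configuration parameters described above can be gathered into one structure, as in the following sketch. The class name, field names, default values and the pick_records helper are illustrative assumptions; the source names the parameters but not a concrete representation.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class RelationConfig:
    basis: str = "analysis"      # "analysis" or "annotation" (parameter 1)
    batch_size: int = 50         # records examined in a single analysis (2)
    batch_increment: int = 50    # records added in each cycle (3)
    max_cycles: int = 10         # maximum number of cycles (4)
    random_order: bool = True    # random or ordered extraction (5)
    seed: Optional[int] = None   # assumption: makes random extraction repeatable


def pick_records(records, cycle, cfg):
    """Select the records to compare in the given zero-based cycle."""
    size = min(len(records), cfg.batch_size + cycle * cfg.batch_increment)
    if cfg.random_order:
        return random.Random(cfg.seed).sample(records, size)
    return list(records[:size])


ordered_cfg = RelationConfig(batch_size=2, batch_increment=1, random_order=False)
first_batch = pick_records(list(range(10)), 0, ordered_cfg)
third_batch = pick_records(list(range(10)), 2, ordered_cfg)
```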

In one embodiment, when the amount of data is small or the data types differ little, a single relationship identification operation may be performed. Conversely, when the amount of data is large or the data types differ greatly, the relationship identification operation may be performed in multiple cycles. At step 302, when the method 300 performs a single relationship identification operation, the fields of all data tables may be compared with the foreign keys in the data tables, so that data tables sharing the same foreign key are associated and the automatically identified table relationship result is output.
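A minimal sketch of the single relationship identification in step 302, assuming each table is described by its field names and its foreign key names, and assuming that two tables are related when a foreign key of one appears as a field name in the other. Matching by name is a simplification chosen for illustration; the table definitions are hypothetical.

```python
def single_pass_relations(tables):
    """Relate every table to the other tables that contain its foreign keys."""
    relations = set()
    for name, meta in tables.items():
        for foreign_key in meta["foreign_keys"]:
            for other, other_meta in tables.items():
                # Assumed rule: same field name in another table means related.
                if other != name and foreign_key in other_meta["fields"]:
                    relations.add((name, foreign_key, other))
    return sorted(relations)


example_tables = {
    "student": {"fields": ["student_no", "name", "class_id"], "foreign_keys": ["class_id"]},
    "class": {"fields": ["class_id", "teacher"], "foreign_keys": []},
    "score": {"fields": ["student_no", "score"], "foreign_keys": ["student_no"]},
}
single_result = single_pass_relations(example_tables)
```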

Further, when the multi-cycle relationship identification operation is selected, at step 303, the method 300 determines whether a relationship or a new relationship exists between the data tables. Before this determination, the method 300 may optionally determine the number of fields to be compared in each cycle, and the number of data pieces to be extracted in each cycle may be set through the configuration parameters. The relationship between the data tables may be determined by comparing the extracted data pieces with the foreign keys in the data tables and analyzing whether the data tables are related, or by determining whether the similarity between fields of the data tables is greater than or equal to a predetermined threshold. For example, after the first cycle is executed, it is determined whether the extracted data includes a field identical to the above-described foreign key. If such a field is included, a relationship exists. Next, at step 304, in response to the existence of a relationship or a new relationship, the method 300 extracts the data relationship. After obtaining the extracted data relationship, the method 300 continues to determine whether this is the only relationship between the data tables or whether other relationships exist. If necessary, different data pieces can be extracted again and compared repeatedly, so that the accuracy of relationship identification is not reduced by extracting too little data.

After completing the preliminary relationship establishment between the data tables, the method 300 adds a loop record and repeats the determining and extracting operations at step 305 in order to again confirm whether there are any more new relationships. For example, a set of data pieces may be added for comparison. After at least two more loop comparisons, if the same relationship still exists, it indicates that no new relationship exists, and the loop may be ended. On the contrary, if a new relationship exists, a new group of data can be added for comparison, and the above operations of judging and extracting relationships are repeated until no relationship exists or no new relationship exists between the data tables. Further, at step 306, after the end of the multiple-loop relationship identification operation, the method 300 outputs the automatically identified table relationship result.
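The judge, extract and record loop of steps 303 to 306 can be sketched as follows. Several details are assumptions made for illustration only: a child field counts as related when at least one sampled value equals a parent key value, records are taken in order, and the loop ends after two consecutive cycles with no new relationship or when the maximum number of cycles is reached.

```python
def cyclic_identify(parent_keys, child_rows, batch_size=2, increment=1,
                    max_cycles=10, patience=2):
    """Return (related child fields, loop record) after cyclic comparison."""
    found = set()
    loop_record = []          # one entry per cycle: (cycle number, new relations)
    cycles_without_new = 0
    for cycle in range(max_cycles):
        sample = child_rows[: batch_size + cycle * increment]
        # Judge: which child fields hold a value equal to a parent key value?
        current = {
            field
            for row in sample
            for field, value in row.items()
            if value is not None and value in parent_keys
        }
        new = current - found
        loop_record.append((cycle + 1, sorted(new)))
        if new:
            found |= new      # extract the newly found relationships
            cycles_without_new = 0
        else:
            cycles_without_new += 1
            if cycles_without_new >= patience:
                break         # no new relationship: end the loop
    return sorted(found), loop_record


child = [
    {"owner": "P1", "ref": None},
    {"owner": "P2", "ref": None},
    {"owner": "P9", "ref": "P3"},
    {"owner": "P1", "ref": None},
]
fields_found, record = cyclic_identify({"P1", "P2", "P3"}, child)
```

In this example the second relationship only appears once the third record enters the sample, which is the situation the incremental batches are meant to cover.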

To further illustrate the specific operation of data table relationship identification, FIG. 4 of the present invention shows a flow diagram of a method 400 for automatic identification of data table relationships, according to another embodiment of the present invention.

As shown in FIG. 4, the method 400 automatically performs the table relationship identification operation shown in block A of FIG. 2. Because a large number of data tables are obtained by collection, they may include many system tables, such as code tables. In view of this, at step 401, table selection may be performed before table relationship identification, and system tables are excluded in advance to reduce the amount of data to be identified. Next, the method 400 selects one of the data tables to be compared as a parent table and takes the remaining data tables to be compared as child tables. Further, at step 402, the number of fields and the manner in which all fields in the parent table are compared with the fields in the child tables are determined by setting the configuration parameters (for the setting of the configuration parameters, refer to the description of fig. 3). The relationship identification between the parent table and the child tables is further described at step 404.
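Step 401 and the parent and child selection can be sketched as below. Recognizing system tables by a name prefix is purely an assumption for illustration (the source does not say how system tables are detected), and the sketch lets every remaining table take the parent role in turn.

```python
def parent_child_pairs(table_names, system_prefixes=("sys_", "code_", "log_")):
    """Exclude system tables, then pair each table with the remaining ones."""
    # Assumed rule: a table is a system table if its name has a known prefix.
    candidates = [
        name for name in table_names
        if not name.lower().startswith(system_prefixes)
    ]
    return [
        (parent, [child for child in candidates if child != parent])
        for parent in candidates
    ]


pairs = parent_child_pairs(["sys_users", "person", "student", "code_gender"])
```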

In one scenario, when the amount of data is small or the data types differ little, a single relationship identification operation may be performed. In contrast, when the amount of data is large or the data types differ greatly, the relationship identification operation may be performed in multiple cycles. Thus, at step 403, when performing a single relationship identification operation, the method 400 may compare all data tables according to the foreign keys in the data tables, so that a relationship is established between data tables having the same foreign key. After the comparison, at step 408, the method 400 outputs the automatic identification result obtained by identifying the primary-foreign key relationships between the parent table and the child tables.

Further, when the multi-cycle relationship identification operation is selected, the method 400 may operate according to one or more of the configuration parameters previously described, such as the number of data records examined in a single analysis, the number of data records added in each cycle, the maximum number of cycles, and whether data is extracted randomly or in a certain order. At step 404, the method 400 determines the data table relationship by comparing a field or foreign key in the parent table with a plurality of child tables, for example by comparing the data pieces extracted from the child tables against the foreign key in the data table. Next, at step 405, the method 400 determines whether there is an unprocessed new relationship between the data tables. One possible way to make this determination is to designate one of the data tables in the database as a parent table (also called a master table) and the remaining data tables as child tables (also called slave tables, each including a plurality of fields), extract all fields in the parent table (including a primary key and a plurality of foreign keys), and compare those fields one by one with the data in the plurality of child tables.

In one or more embodiments, when performing table relationship identification, the method 400 may also base the identification on the primary and foreign keys of the data tables in the database. The data tables in a relational database each include a primary key, which is a field or set of fields that uniquely identifies a row in a table. For example, assume that the data table is a personal table of personal information whose records contain fields such as identification number, name and age. Since only the identification number uniquely identifies an individual, while other fields such as name and age may be duplicated, the identification number is the primary key of the personal table.

In some application scenarios, the primary key may also be a field group. For example, another personal table similar to the one described above has records that include fields for name, age and gender. When the two personal tables are compared, name alone or age alone may match multiple duplicate records, and only the combination of name and age uniquely identifies a record. In view of this, the field group of name and age may be regarded as a primary key. In addition, since the primary key uniquely identifies a given row, it ensures that no error occurs when data is updated or deleted. The primary key can therefore be used to associate the table with other tables and serves as the unique identifier in the master table.
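The idea of a single-field or field-group primary key can be illustrated with a small uniqueness check. This is a sketch, not the disclosed procedure: it returns the first field combination whose values are unique across the given rows, and uniqueness in a sample only suggests a candidate key rather than proving one. The sample rows are hypothetical.

```python
from itertools import combinations


def find_candidate_key(rows, fields, max_width=2):
    """Return the smallest field combination that is unique across rows."""
    for width in range(1, max_width + 1):
        for combo in combinations(fields, width):
            values = [tuple(row[field] for field in combo) for row in rows]
            if len(set(values)) == len(values):
                return combo  # every row has a distinct value for this combo
    return None


people = [
    {"name": "Li", "age": 20, "gender": "F"},
    {"name": "Li", "age": 30, "gender": "F"},
    {"name": "Wu", "age": 20, "gender": "F"},
]
candidate = find_candidate_key(people, ["name", "age", "gender"])
```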

In addition, a data table may also include one or more foreign keys, which are used to create an association with another table. For example, a record in a student table includes the student number, name, sex, class, and the like. The name in the personal table is not the primary key of the personal table, but the name in the personal table and the name in the student table may correspond to each other. If the name in the student table is the primary key of the student table, the name field may serve as a foreign key of the personal table that references the student table. Further, the table relationship identification operation may identify the relationship between data tables by determining whether a similarity between table fields of the data tables is greater than or equal to a predetermined threshold. In one scenario, when comparison shows that the parent table and the child table have no primary-foreign key relationship, a new relationship may be determined to exist when the similarity between the table fields reaches a predetermined threshold (e.g., 80%). Exemplary ways of identifying table relationships according to the present invention will be apparent to those skilled in the art from the above description.
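The threshold test described above can be sketched with a simple similarity measure. The source does not state how similarity is computed, so the Jaccard overlap of the two fields' value sets used here is an assumption chosen for illustration; the 0.8 default mirrors the 80% example.

```python
def field_similarity(values_a, values_b):
    """Jaccard overlap of two fields' distinct values (assumed measure)."""
    a, b = set(values_a), set(values_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def fields_related(values_a, values_b, threshold=0.8):
    """True when similarity is greater than or equal to the threshold."""
    return field_similarity(values_a, values_b) >= threshold


high = fields_related([1, 2, 3, 4, 5], [1, 2, 3, 4])   # overlap 4 of 5 values
low = fields_related([1, 2, 3], [3, 4, 5])             # overlap 1 of 5 values
```

Other measures, such as containment of the child values in the parent values or string similarity of field names, would fit the same threshold check.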

Further, at step 406, after confirming the relationships between the data tables by field comparison, the method 400 extracts the associated fields and the primary-foreign key relationships between the data. In one scenario, the method 400 continues to determine whether this is the only relationship between the data tables or whether new relationships exist. To do so, at step 407, the fields in the parent table are compared with another set of fields in the child table, and the method then returns to step 405 to determine whether there is an unprocessed new relationship between the data tables. When it is determined that a new relationship exists, steps 406-407 may be repeated to extract the data relationship and add a new field set for comparison. Specifically, depending on whether a relationship or a new relationship exists between the data tables, the method 400 may loop through determining the relationship (step 405), extracting the data relationship (step 406), and performing data relationship identification (step 407) until it determines that no relationship or no new relationship exists between the data tables. Extracting and comparing different data multiple times ensures that the accuracy of relationship identification is not reduced by extracting too little data.

At step 408, the method 400 outputs an automatic identification result after identifying the primary-foreign key relationship between the parent data table and the child data table. The automatic identification result may be displayed in the form of an ER diagram (also called an entity-relationship diagram); the ER diagram of a specific table relationship is shown in fig. 5.

FIG. 5 is a functional block diagram illustrating table relationships according to an embodiment of the present invention. The interface of the data service system of the present invention has one or more display areas available for operation; for convenience of description, only a portion of the data service system, or a simplified version thereof, is shown. The following description is directed only to the portion of the main operation display area that relates to the disclosed technical solution.

As shown in fig. 5, the interface 500 is divided into a first display area 510 and a second display area 520. The first display area presents the function stage currently being executed by the system, displayed in the form of function blocks. According to the execution flow of the system, the function stages can be divided into task management, data query, analysis tools, system management, and the like. The second display area presents the result obtained after a function block of the first display area is executed, which includes the table relationship ER diagram.

As described above, the relationship between data tables can be determined by comparing primary and foreign keys and by comparing the similarity between table fields, and the ER diagram of the table relationships according to the embodiment of the invention can be constructed from the associations between a plurality of data tables and a plurality of table fields. As shown in the second display area, the blocks 521-526 in the table relationship ER diagram each correspond to one data table. Further, the field relationships between the data tables can be clearly identified through the tree structure (including views, relationships, and the like) expanded from the ER diagram.

In one application scenario, data table a includes a plurality of fields a11, a12, a13 …, and data table b similarly includes a plurality of fields b11, b12, b13 …. Data table c, data table d, data table e, data table f, and so on likewise each include a plurality of fields, and the data table relationships have been established based on field comparisons across these data tables. Further, through system analysis, a relationship line 527 may be generated between data tables between which a relationship exists. When the relationship between two data tables is judged to be fuzzy (for example, comparison shows that the fields are the same but the corresponding class codes are different), the relationship can be marked with a colored table relationship line after automatic analysis. It is to be understood that the display lines in fig. 5 are merely exemplary and not restrictive, and their number and representation may vary depending on the number of related tables.
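One way to picture the relationship lines described above is to emit the identified relationships as Graphviz DOT text, with fuzzy relationships drawn in a different color. This is a hedged sketch only: the description does not say how the ER diagram is rendered, and the `Relationship` class, the `to_dot` function, the choice of red for fuzzy lines, and the sample tables are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """Hypothetical record of one identified table relationship."""
    parent: str
    child: str
    parent_field: str
    child_field: str
    fuzzy: bool = False  # e.g. same fields but differing class codes


def to_dot(tables, relationships) -> str:
    """Render tables as boxes and relationships as lines in DOT syntax."""
    lines = ["graph er {", "  node [shape=box];"]
    for name, fields in tables.items():
        # "\\n" is a literal backslash-n, which DOT treats as a line break.
        label = "\\n".join([name, *fields])
        lines.append(f'  "{name}" [label="{label}"];')
    for r in relationships:
        color = "red" if r.fuzzy else "black"  # assumed color convention
        lines.append(
            f'  "{r.parent}" -- "{r.child}" '
            f'[label="{r.parent_field} = {r.child_field}", color={color}];'
        )
    lines.append("}")
    return "\n".join(lines)


# Hypothetical tables and relationships mirroring data tables a, b and c.
tables = {"a": ["a11", "a12", "a13"], "b": ["b11", "b12"], "c": ["c11"]}
relationships = [
    Relationship("a", "b", "a11", "b11"),
    Relationship("a", "c", "a12", "c11", fuzzy=True),
]
dot_text = to_dot(tables, relationships)
```

The resulting text can be passed to any DOT renderer; an undirected graph is used because the description speaks of relationship lines rather than arrows.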

The scheme of the present invention is described in detail above. Based on the description, those skilled in the art can understand that the technical scheme of table relationship identification disclosed by the invention can complete the identification operation through automatic relationship identification even when no corresponding template was used to collect the data, so that a large amount of the time cost of manual identification can be saved. In addition, rule-based identification by the system also reduces the error rate, which is a great advantage for the subsequent integration of data. Further, as will be apparent to those skilled in the art from the description of the present specification, the present invention also discloses an apparatus for determining relationships between data tables in a database, comprising: at least one processor; and at least one memory storing computer program instructions that, when executed by the at least one processor, cause the apparatus to perform the methods and embodiments thereof described in connection with the figures. Also disclosed is a computer readable storage medium comprising program instructions for determining relationships between data tables in a database, which, when executed by a processor, perform the method described in connection with the figures and its various embodiments.

It should be appreciated that aspects of the invention may be performed by any module, unit, component, server, computer, terminal, or device executing instructions, and that such module, unit, component, server, computer, terminal, or device may include or otherwise access a computer-readable medium, such as a storage medium, computer storage medium, or data storage device (removable and/or non-removable) such as, for example, a magnetic disk, optical disk, or magnetic tape. Computer storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules or other data.

Examples of computer storage media include RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by an application, a module, or both. Any such computer storage media may be part of, or accessible or connectable to, a device. Any applications or modules described herein may be implemented using computer-readable/executable instructions that may be stored or otherwise maintained by such computer-readable media.

It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification and claims of this application, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the specification and claims of this application refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.

The principles of the present invention have been explained above by means of a number of embodiments, and this explanation is intended only to help in understanding the method of the present invention and its core idea. The invention is not limited to the embodiments described above and may be applied in various fields of application.

Claims

1. A method for determining relationships between data tables in a database, comprising:

automatically performing a single or multiple-cycle relationship identification operation on the data table according to one or more configuration parameters;

wherein in the relationship identification operation that loops multiple times, the method performs:

judging whether a relationship or a new relationship exists between the data tables;

extracting a data relationship in response to the relationship or the new relationship existing; and

adding a loop record and repeatedly executing the judging and extracting operation;

and outputting the automatically identified table relationship result after the multiple-cycle relationship identification operation is finished.

2. The method of claim 1, further comprising:

selecting a data table before executing the multiple-cycle relationship identification operation; and

performing the relationship identification operation on the data tables other than system tables among the data tables.

3. The method of claim 1, further comprising configuring the configuration parameters according to one or more of:

setting whether relationship identification takes the analysis result or an annotation as its reference;

setting the number of data records analyzed in a single analysis within a cycle;

setting the number of data records added in each cycle;

setting the maximum number of cycles; and

setting whether the data is extracted randomly or in a certain order.

4. The method of claim 1, wherein the relationship identification operation comprises analyzing relationships between data tables according to foreign keys in the data tables.

5. The method of claim 4, wherein the relationship identifying operation comprises identifying relationships between data tables by determining whether a similarity between table fields of the data tables is greater than or equal to a predetermined threshold.

6. The method of claim 2 or 5, wherein the single instance of the relationship identification operation comprises automatically performing a single instance of the relationship identification for all data tables according to the one or more configuration parameters.

7. The method of claim 1, wherein, in response to a new relationship existing, a loop record is added and data relationship identification is performed, and in response to no new relationship existing, the loop is ended.

8. The method of claim 1, wherein the automatic identification result is an entity-relationship (ER) diagram.

9. An apparatus for determining relationships between data tables in a database, comprising:

at least one processor;

at least one memory storing computer program instructions that, when executed by at least one processor, cause the apparatus to perform the method of any of claims 1-8.

10. A computer readable storage medium comprising program instructions for determining relationships between data tables in a database, which, when executed by a processor, perform the method of any one of claims 1-8.