WO2023097521A1 - 数据模型生成的方法和装置 - Google Patents
数据模型生成的方法和装置 Download PDFInfo
- Publication number
- WO2023097521A1 WO2023097521A1 PCT/CN2021/134650 CN2021134650W WO2023097521A1 WO 2023097521 A1 WO2023097521 A1 WO 2023097521A1 CN 2021134650 W CN2021134650 W CN 2021134650W WO 2023097521 A1 WO2023097521 A1 WO 2023097521A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- target
- data
- identifier
- database
- statement
- Prior art date
Links
- 238000013499 data model Methods 0.000 title claims abstract description 119
- 238000000034 method Methods 0.000 title claims abstract description 73
- 230000008520 organization Effects 0.000 claims abstract description 11
- 239000000284 extract Substances 0.000 claims description 16
- 238000003860 storage Methods 0.000 claims description 10
- 230000004931 aggregating effect Effects 0.000 claims description 5
- 238000001914 filtration Methods 0.000 claims description 2
- 238000012827 research and development Methods 0.000 abstract description 5
- 101100328886 Caenorhabditis elegans col-2 gene Proteins 0.000 description 20
- 101100328884 Caenorhabditis elegans sqt-3 gene Proteins 0.000 description 19
- 238000004891 communication Methods 0.000 description 14
- 238000010586 diagram Methods 0.000 description 13
- 238000012545 processing Methods 0.000 description 9
- 230000006870 function Effects 0.000 description 8
- 230000008569 process Effects 0.000 description 8
- 230000002776 aggregation Effects 0.000 description 4
- 238000004220 aggregation Methods 0.000 description 4
- 238000004458 analytical method Methods 0.000 description 4
- 238000004590 computer program Methods 0.000 description 4
- 230000005540 biological transmission Effects 0.000 description 3
- 230000008878 coupling Effects 0.000 description 3
- 238000010168 coupling process Methods 0.000 description 3
- 238000005859 coupling reaction Methods 0.000 description 3
- 238000013461 design Methods 0.000 description 3
- 240000008790 Musa x paradisiaca Species 0.000 description 2
- 244000061456 Solanum tuberosum Species 0.000 description 2
- 235000002595 Solanum tuberosum Nutrition 0.000 description 2
- 235000021015 bananas Nutrition 0.000 description 2
- 238000013500 data storage Methods 0.000 description 2
- 239000003599 detergent Substances 0.000 description 2
- 238000009826 distribution Methods 0.000 description 2
- 238000004519 manufacturing process Methods 0.000 description 2
- 235000012015 potatoes Nutrition 0.000 description 2
- -1 table2 Proteins 0.000 description 2
- 235000007688 Lycopersicon esculentum Nutrition 0.000 description 1
- 244000141359 Malus pumila Species 0.000 description 1
- 240000003768 Solanum lycopersicum Species 0.000 description 1
- 235000021016 apples Nutrition 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 239000002131 composite material Substances 0.000 description 1
- 230000001419 dependent effect Effects 0.000 description 1
- 235000013399 edible fruits Nutrition 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 230000002452 interceptive effect Effects 0.000 description 1
- 238000007726 management method Methods 0.000 description 1
- 239000000203 mixture Substances 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 230000002085 persistent effect Effects 0.000 description 1
- 230000002441 reversible effect Effects 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 235000013311 vegetables Nutrition 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/211—Schema design and management
- G06F16/212—Schema design and management with details for data modelling support
Definitions
- the present application relates to the field of database technology, and more specifically, to a method and device for generating a data model.
- the present application provides a method and device for generating a data model, which can generate a data model closely related to a target business with high accuracy and efficiency, so that an enterprise can conduct independent research and development based on the generated data model.
- a method for generating a data model including: acquiring a data access statement sent from a client to a database, the data access statement being used to access data of a target business in the database; according to the data access statement Obtaining the identification of each target table in at least one target table in the database called by the target service and the identification of the target column in the target table; according to the identification of the target table and the organization of the tables in the database and structure, determine the data type of the target column; determine the data model of the database according to the identifier of the target table, the identifier of the target column, and the data type.
- the data access statement related to the user's operation will be sent from the user end to the database. Therefore, the data access statement obtained by the above technical solution is closely related to the target business accessed by the user, and the data model generated based on the data access statement is also closely related to the target business accessed by the user, which can effectively avoid the problem of generating a large number of invalid data models . Further, the above technical solution does not rely on manual experience, effectively improving the efficiency and accuracy of data model generation.
- the obtaining the data access statement sent by the client to the database includes: filtering a plurality of pieces of information generated based on the target service to obtain the data access statement, wherein the multiple This information includes the data access statement.
- a plurality of pieces of information can be generated based on the target service, and the pieces of information include other information besides the data access statement.
- the above technical solution filters the multiple information to filter out information other than data access statements in the multiple information, so that the remaining information is data access statements closely related to the target business, thereby further avoiding the generation of Lots of problems with invalid data models.
- acquiring the identifier of each target table in at least one target table in the database called by the target service and the identifier of the target column in the target table comprising: when the data access statement is a query statement, extracting a first character string between a secondary keyword and a conditional keyword in the query statement, where the first character string is an identifier of the target table; Extracting a second character string between the selection key and the secondary key in the query statement, where the second character string is an identifier of the target column.
- the method further includes: obtaining the multiple target tables according to the join query statement The relationship of the table on the target business; said determining the data model of the database according to the identification of the target table, the identification of the target column, and the data type includes: according to the identification of the target table, The identifier of the target column, the relationship of the multiple target tables on the target service, and the data type determine the data model.
- the structural information in the data access statement is used to analyze the potential relationship between tables, and a data model is generated according to the relationship. Since the above technical solution does not depend on the reference foreign key in the schema in the database, it avoids the disadvantages of the generated data model being inaccurate or even unable to generate the data model relationship due to the absence of the reference foreign key in the schema in the database, which further improves the generated data. model accuracy.
- the obtaining the relationship of the multiple target tables on the target business according to the connection query statement includes: following the condition keyword in the connection query statement In the third character string, the relationship between the multiple target tables on the target business is acquired.
- acquiring the identifier of each target table in at least one target table in the database called by the target service and the identifier of the target column in the target table comprising: when the data access statement is an insert statement, extracting a fourth character string between an insert keyword and a value keyword in the insert statement, where the fourth character string is an identifier of the target table; Extracting a fifth character string in brackets after the value keyword in the insert statement, where the fifth character string is an identifier of the target column.
- the method further includes: aggregating the identifier of the target table and the identifier of the target column;
- the data type, determining the data model of the database includes: determining the data model based on the aggregated identifier of the target table and the identifier of the target column, and based on the data type.
- the above technical solution before generating the data model, aggregates the identification of the target table and the identification of the target column. Compared with the identification of the target table and the identification of the target column in a discrete state, it is convenient for subsequent operations and is conducive to improving the efficiency of data model generation. efficiency.
- a device for generating a data model including: a memory for storing a program; a processor for executing the program stored in the memory, and when the program stored in the memory is executed, the processing
- the device is configured to execute the method in the above first aspect or its various implementation manners.
- a computer-readable storage medium which stores program codes for device execution, where the program codes include instructions for executing the steps in the methods of the above-mentioned first aspect or various implementations thereof.
- a device for generating a data model including: a preprocessor, configured to acquire a data access statement sent from a client to a database, and the data access statement is used to access data of a target service in the database; An analyzer, configured to obtain the identifier of each target table in at least one target table in the database called by the target service and the identifier of the target column in the target table according to the data access statement; the model generator, It is used to determine the data type of the target column according to the identification of the target table and the organization and structure of the table in the database; the model generator is also used to determine the target column according to the identification of the target table and the target column The identifier and the data type determine the data model of the database.
- an apparatus for generating a data model including units configured to execute the method in the above first aspect or various implementations thereof.
- FIG. 1 is a schematic diagram of a system architecture according to an embodiment of the present application.
- Fig. 2 is a schematic diagram of a method for generating a data model according to an embodiment of the present application.
- FIG. 3 is a flow chart of a method for generating a data model according to an embodiment of the present application.
- FIG. 4 is a flowchart of another method for generating a data model according to an embodiment of the present application.
- Fig. 5 is a schematic block diagram of an apparatus for generating a data model according to an embodiment of the present application.
- Fig. 6 is a schematic block diagram of another apparatus for generating a data model according to an embodiment of the present application.
- Fig. 7 is a schematic block diagram of another device for generating a data model according to an embodiment of the present application.
- sequence numbers of the processes do not mean the order of execution, and the execution order of the processes should be determined by their functions and internal logic, rather than by the execution order of the embodiments of the present application.
- the implementation process constitutes no limitation.
- the first method is based on manual analysis. Specifically, this method mainly sorts out the columns (or called fields) and data on the interactive forms of each application system when processing various businesses manually, and finds the columns and data of tables that match the columns and data in the database. data. Then, the information is summarized and sorted, and finally the tables and corresponding columns of the database that the application system may call are analyzed.
- the application system is a financial system, and the user needs to query the sales of each quarter of the year. After the user queries, the output data of the financial system are "A", "B", "C” and "D".
- the user can reversely look up "A”, “B”, “C” and “D” in the database, or determine how many A data with characteristics, such as “C” and “D”, and then look up “C” and “D” in the database, and finally analyze the tables and corresponding columns of the database that the financial system may call.
- This method mainly relies on manual experience, so the efficiency and accuracy are relatively low, especially in the case of a database including a large number of tables.
- the calculation logic of the application system after form submission may involve aggregation query and processing of many tables or intermediate table structures, these queries and processing cannot be visually seen from the user interface (UI), so manual
- UI user interface
- the second method is an analysis method based on the organization and structure (Schema) of tables in the database. Specifically, the method obtains the table structure and relationship of all databases in the current application system by calling the relevant interface of the corresponding database of the application system, and finally reversely generates the data model.
- this method cannot dynamically perceive which tables in a certain business database have been accessed, it cannot help users effectively sort out the data models used by the current application system. For example, there are 1000 tables in the database. When a user accesses the data of a certain business, the application system may only use 100 of them. However, since this method cannot know which 100 of the 1000 tables are used, it can only Perform reverse data model generation for all 1000 tables, which takes a long time and is inefficient. In addition, since there may be data generated by non-human-computer interaction or data irrelevant to the business currently accessed by the user in the database, the generated data model may be separated from the business and has nothing to do with the business currently accessed by the user.
- the relationship in the data model generated by this method is highly dependent on the reference foreign key of the table in the database. If there is no such information in the table, the relationship of the data model cannot be generated. A large number of actual production environments have proved that the larger the enterprise system, the less such strong reference foreign key relationships are included. Therefore, the feasibility of this method in the actual production environment is not high.
- the embodiment of the present application proposes a data model generation method, which can generate a data model closely related to the target business with high accuracy and efficiency, so that enterprises can conduct independent research and development based on the generated data model.
- FIG. 1 is a schematic diagram of a system architecture according to an embodiment of the present application.
- the system architecture shown in FIG. 1 includes a client 110 , a database 120 and a third-party device 130 .
- the client 110 may be an interface for providing data query service management.
- the user terminal 110 may also be called a client or other names.
- the database 120 can be, for example, mysql, oracle, sqlserver, sqlite, etc. Information transmission can be carried out between the client terminal 110 and the database 120. For example, when a user needs to inquire about data, such as inquiring about the sales volume of each quarter in a year, the query information can be input at the client terminal 110, and the client terminal 110 receives the query information. After querying the information, convert the query information into a database language, such as structured query language (structured query language, SQL), and send the database language to the database 120 . Afterwards, the database 120 queries the sales of each quarter of the year based on the received database language, and sends the query results to the client 110 . Afterwards, the user terminal 110 can output the query results to the user, so that the user can obtain the sales of each quarter of the year.
- a database language such as structured query language (structured query language, SQL)
- the third-party device 130 may communicate with the client 110 and the database 120 to obtain information transmitted between the client 110 and the database 120 , and perform some operations based on the information, such as generating a data model of the database 120 .
- the third party 130 may include a communication interface to implement a communication connection with other devices (such as the client 110).
- the communication connection may be wired or wireless.
- the third-party device 130 can obtain the information during the information transmission; or, if the client 110 sends information to the database 120, the third-party device 130 can obtain the information from the database 120 after the information is sent to the database 120 .
- the third party device 130 may be a server.
- the server is a device that provides computing services.
- the composition of the server includes a processor, hard disk, memory, system bus, etc.
- the server is similar to a general-purpose computer architecture, but due to the need to provide high Reliability, security, scalability, manageability and other aspects have high requirements.
- FIG. 1 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship among devices, devices, modules, etc. shown in the figure does not constitute any limitation.
- FIG. 2 shows a schematic flowchart of a method 200 for generating a data model according to an embodiment of the present application.
- the method 200 may be executed by a third-party device other than the client and the database, such as the third-party device 130 shown in FIG. 1 .
- the method 200 may include at least part of the following contents.
- Step 210 Obtain the data access statement sent by the client to the database, the data access statement is used to access the data of the target service in the database.
- Step 220 According to the data access statement, obtain the identification of each target table in at least one target table in the database called by the target service and the identification of the target column in each target table.
- Step 230 Determine the data type of the target column according to the identifier of the target table and the schema of the database.
- Step 240 Determine the data model of the database according to the identifier of the target table, the identifier and data type of the target column.
- the data access statement related to the user's operation will be sent from the user end to the database. Therefore, the data access statement obtained in the embodiment of the present application is closely related to the target business accessed by the user, and the data model generated based on the data access statement is also closely related to the target business accessed by the user, which can effectively avoid the generation of a large number of invalid data models. question. Further, the above technical solution does not rely on manual experience, effectively improving the efficiency and accuracy of data model generation.
- the third-party device may include a preprocessor, an analyzer, and a model generator.
- step 210 may be performed by a preprocessor
- step 220 may be performed by an analyzer
- steps 230 and 240 may be performed by a model generator.
- the third-party device may also include other devices, such as a data type processor.
- step 210 may be performed by a preprocessor
- step 220 may be performed by an analyzer
- step 230 may be performed by a data type processor
- step 240 may be performed by a model generator.
- the method in the embodiment of the present application will be described below by taking step 210 to be executed by the preprocessor, step 220 to be executed by the analyzer, and steps 230 and 240 to be executed by the model generator as examples.
- the preprocessor can obtain data access statements through plugins provided by the database, such as command-line tools.
- the preprocessor may enable a certain option, and then may monitor the data access statement sent by the client to the database, so as to obtain the data access statement.
- the preprocessor may obtain the data access statement from the database, or may also obtain the data access statement during the process of transmitting the data access statement from the client to the database.
- the data access statement may include but not limited to SQL statement.
- the embodiment of the present application uses an SQL statement as an example for description.
- the SQL statement may include a data definition language (data definition language, DDL) and a data manipulation language (data manipulation language, DML), and the SQL statement in the embodiment of the present application is a DML statement.
- DML statements may include insert (insert) statements, delete (delete) statements, modify (update) statements, and query (select) statements.
- the SQL statement in the embodiment of the present application is a select statement or an insert statement.
- the database can generate context information based on the SQL statement, such as a generated timestamp, session ID or process ID, etc. That is to say, the database can generate multiple pieces of information based on the target service, and the multiple pieces of information include other information besides multiple SQL statements. Moreover, even SQL statements may include delete statements and update statements.
- the user end may also transmit other information to the database.
- step 210 may specifically include: the preprocessor filters the multiple information generated based on the target business, filters out information other than the SQL statement in the multiple information, and filters out the delete statement and the update statement in the SQL statement , to get the select statement or insert statement.
- the technical solution filters the plurality of information to filter out information other than data access statements in the plurality of information, so that the remaining information is data access statements closely related to the target business, thereby further avoiding the generation of Lots of problems with invalid data models.
- the preprocessor may receive multiple lines of information at the same time, therefore, optionally, as shown in Figure 3 and Figure 4, the preprocessor may also analyze the received multiple lines of information line by line, and then perform Filter to get the SQL statement.
- the identifier of the target table may be a table name of the target table, an ID of the target table, or a serial number of the target table in all tables in the database.
- the identifier of the target column may be a column name of the target column, an ID of the target column, or a number of the target column in the target table.
- the identification of the target table is the table name
- the identification of the target column is the column name.
- the analyzer obtains the table name of the target table and the column name of the target column in different ways.
- the analyzer may extract the first character string between the from (from) keyword and the condition keyword in the select statement, and the first character string is the table name of the target table.
- the select statement is select col1, col2, col3 from table_1where..., the character string between the from keyword and the condition keyword in the select statement is table_1, and the extracted table name is table_1.
- the select statement is select col1, col2, col3 from table_1 left join table_2 on..., the strings between the from keyword and the condition keyword in the select statement are table_1 and table_2, and the extracted table names are table_1 and table_2.
- the multiple first character strings can be separated by commas to obtain a set of table names, such as [table_1, table_2] above.
- the analyzer may extract a second character string between the select keyword and the from keyword in the select statement, where the second character string is the column name of the target column. If there are multiple column names, multiple column names can also be separated by commas.
- the select statement is select col1, col2, col3 from table_1 where...
- the string between the select keyword and the from keyword in the select statement is col1, col2, col3, and the column name of the target column is [col1, col2, col3].
- the analyzer can extract the fourth character string between the insert into keyword and the values keyword in the insert statement, and the fourth character string is the table name of the target table.
- the analyzer may extract the fifth character string in the parentheses behind the values keyword in the insert statement, where the fifth character string is the column name of the target column.
- method 200 It may also include: the analyzer obtains the relationship of multiple target tables in the target business according to the connection query statement. In other words, the analyzer can obtain which columns are associated among multiple target tables in the target business according to the connection query statement.
- the analyzer can obtain the relationship of multiple target tables in the target business from the third character string after the condition keyword in the join query statement.
- table_1 and table_2 After determining the relationship between table_1 and table_2, you can concatenate table1, col1, table2, and col2 together to form a quadruple [table1, col1, table2, col2].
- This quadruple can indicate that table_1 and table_2 are associated through the column named col1 in table_1 and the column named col2 in table_2.
- the third character string after the condition keyword can be extracted respectively to obtain a multi-dimensional quaternion array. For example:
- the analyzer in addition to determining the data model according to the table name of the target table, the column name and data type of the target column, the analyzer can also determine the data model according to the relationship between multiple tables in the target business.
- the structural information in the data access statement is used to analyze the potential relationship between tables, and a data model is generated according to the relationship. Since the above technical solution does not depend on the reference foreign key in the schema in the database, it avoids the disadvantages of the generated data model being inaccurate or even unable to generate the data model relationship due to the absence of the reference foreign key in the schema in the database, which further improves the generated data. model accuracy.
- the analyzer obtains the table name of the target table and the column name of the target column in different ways. Therefore, in one implementation, there can be two analyzers, one of which (for convenience of description, called the first analyzer) can obtain the table name of the target table, the column name of the target column and multiple For the relationship of the target table in the target business, another analyzer (called the second analyzer) can obtain the table name of the target table and the column name of the target column in the insert statement.
- the first analyzer for convenience of description, called the first analyzer
- the second analyzer can obtain the table name of the target table and the column name of the target column in the insert statement.
- the analyzer in Figure 3 is the second analyzer, and the second analyzer can obtain table names and column names; the analyzer in Figure 4 is the first analyzer, and the first analysis The browser can get the table name, column name and the relationship between the table and the table.
- the preprocessor may determine the type of the SQL statement, and then send the SQL statement to the corresponding analyzer. That is, the select statement is sent to the first analyzer in Figure 4, and the insert statement is sent to the second analyzer in Figure 3.
- the analyzer in FIG. 4 .
- the analyzer can perform three steps: obtaining the table name of the target table in the SQL statement, obtaining the column name of the target column, and obtaining the relationship of multiple target tables in the target business. If the SQL statement is a connection query statement, the analyzer can execute the three steps in sequence; if the SQL statement is another statement (such as an insert statement), the analyzer does not need to execute the relationship between multiple target tables in the target business. In this step, only the first two steps are performed.
- the method 200 may also include: aggregating the table name of the target table and the column name of the target column, or aggregating the table name of the target table, the column name of the target column Column names and the relationship between multiple target tables in the target business are aggregated.
- the aggregated table names and column names may be referred to as a structured result set, or the aggregated table names, column names, and relationships may be referred to as a structured result set.
- the structured result set can be understood to describe an object in a non-discrete manner. For example, when describing what items are in the supermarket, it can be described as apples, bananas, etc. for fruits, laundry detergent, towels, etc. for daily necessities, and potatoes, tomatoes, etc. for vegetables, instead of bananas, laundry detergent, potatoes, etc. in the supermarket.
- the table names are table_X and table_Y respectively, where the column names of the target columns in table_X are col1, col2 and colN respectively, and the column names of the target columns in table_Y are col1, col2 and colN respectively , and table_X and table_Y are associated through the column named col1 in table_X and the column named col2 in table_Y.
- JSON JS object notation
- the above technical solution before generating the data model, aggregates the identification of the target table and the identification of the target column. Compared with the identification of the target table and the identification of the target column in a discrete state, it is convenient for subsequent operations and is conducive to improving the efficiency of data model generation. efficiency.
- Model Builder After the analyzer determines the column name of the target column, Model Builder also needs to determine the data type of the target column. For example, whether the data type of the target column is an integer type or a string type. Specifically, the model generator can extract the data type of the target column in the target table from the schema of the database according to the table name of the target table, so that a list of triples can be obtained, for example, [[table_1,col1,datatype] ,[table_1,col2,datatype],[table_N,colN,datatype]]. Among them, datatype represents the data type.
- the model generator can base on the table name of the target table, the column name and data type of the target column, or based on the table name of the target table, the column name of the target column, multiple target tables in the target business
- the relationships and data types on the database determine the data model of the database.
- the table name of the target table can be used as the entity name of the data model
- the column name of the target column can be used as the attribute name of the corresponding entity
- the data type of the target column can be used as the data type of the corresponding entity attribute.
- the target table of the target business call includes four, and the table names are Table_X, Table_Y, Table_Z, and Table_N, where the column name of the target column in Table_X is Col_X, and the data type of the target column is Datatype_X; the column name of the target column in Table_Y Col_Y, the data type of the target column is Datatype_Y; the column name of the target column in Table_Z is Col_Z, the data type of the target column is Datatype_Z; the column name of the target column in Table_N is Col_N, and the data type of the target column is Datatype_N.
- the generated data model can be shown in Table 1.
- the table names of the multiple target tables that are related to each other can be used as the entity names of the multiple related entities, and the column names of the multiple target columns that are related to each other As an associated property of multiple entities that are related to each other.
- Table 1 Take Table 1 as an example for illustration. Assuming that Table_X and Table_Y are related to each other through the column named Col_X in Table_X and the column named Col_Y in Table_Y, and Table_Z and TableN are related to each other through the column named Col_Z in Table_Z and the column named Col_Z in Table_N, then The obtained relationships among the multiple entities can be shown in Table 2.
- from_tab and to_tab are the entity names of the two entities that are related to each other.
- from_col and to_col are the associated attributes of the two entities associated with each other.
- the analyzer can also perform attribute expansion on the entity with the same name.
- the method 200 may further include: persisting the data model, so as to store the data model in an underlying database.
- Fig. 5 shows a schematic block diagram of an apparatus 500 for generating a data model according to an embodiment of the present application.
- the apparatus 500 for generating a data model may execute the method 200 for generating a data model in the embodiment of the present application, and the apparatus 500 for generating a data model may be a third-party device in the foregoing method.
- the device 500 for generating the data model may include:
- the preprocessor 510 is configured to obtain a data access statement sent by the client to the database, and the data access statement is used to access data of a target service in the database.
- the analyzer 520 is configured to obtain, according to the data access statement, an identifier of each target table in at least one target table in the database called by the target service and an identifier of a target column in the target table.
- a model generator 530 configured to determine the data type of the target column according to the identifier of the target table and the organization and structure of the tables in the database.
- the model generator 530 is further configured to determine the data model of the database according to the identifier of the target table, the identifier of the target column, and the data type.
- the preprocessor 510 is specifically configured to: filter a plurality of information generated based on the target service to obtain the data access statement, wherein the plurality of The information includes the data access statements.
- the analyzer 520 is specifically configured to: when the data access statement is a query statement, extract the first keyword between the secondary keyword and the conditional keyword in the query statement.
- a character string, the first character string is the identification of the target table; extract the second character string between the selection keyword and the secondary keyword in the query access statement, the second character string is The ID of the target column.
- the analyzer 520 is specifically configured to: according to the connection query statement , to obtain the relationship of the multiple target tables on the target business; according to the identifier of the target table, the identifier of the target column, the relationship of the multiple target tables on the target business and the data Type, which identifies the data model.
- the analyzer 520 is specifically configured to: obtain the multiple target tables from the third character string after the condition keyword in the connection query statement.
- the target business relationship is specifically configured to: obtain the multiple target tables from the third character string after the condition keyword in the connection query statement.
- the analyzer 520 is specifically configured to: when the data access statement is an insert statement, extract the first key between the insert keyword and the value keyword in the insert statement.
- the fourth character string is the identifier of the target table; extract the fifth character string in brackets after the value keyword in the insert statement, the fifth character string is the target column logo.
- the analyzer 520 is further configured to: aggregate the identifier of the target table and the identifier of the target column; the model generator 530 is specifically configured to: based on aggregation The identification of the target table and the identification of the target column, and based on the data type, determine the data model.
- preprocessor 510 the analyzer 520, and the model generator 530 can respectively implement the corresponding operations of the preprocessor, analyzer, and model generator in the method 200, and details are not repeated here for brevity.
- FIG. 6 shows a schematic block diagram of an apparatus 600 for generating a data model according to another embodiment of the present application.
- the apparatus 600 for generating a data model may execute the method 200 for generating a data model in the embodiment of the present application, and the apparatus 600 for generating a data model may be a third-party device in the foregoing method.
- the device 600 for generating the data model may include:
- the obtaining unit 610 is configured to obtain a data access statement sent by the client to the database, where the data access statement is used to access data of a target service in the database.
- the obtaining unit 610 is further configured to obtain, according to the data access statement, an identifier of each target table in at least one target table in the database called by the target service and an identifier of a target column in the target table.
- the determining unit 620 is configured to determine the data type of the target column according to the identifier of the target table and the organization and structure of the tables in the database.
- the determining unit 620 is further configured to determine the data model of the database according to the identifier of the target table, the identifier of the target column, and the data type.
- the obtaining unit 610 is specifically configured to: filter a plurality of pieces of information generated based on the target service to obtain the data access statement, wherein the pieces of information Include the data access statement.
- the acquiring unit 610 is specifically configured to: when the data access statement is a query statement, extract the first keyword between the secondary keyword and the conditional keyword in the query statement.
- a character string, the first character string is the identifier of the target table; extract the second character string between the selection keyword and the secondary keyword in the query statement, the second character string is the The ID of the target column described above.
- the obtaining unit 610 is specifically configured to: according to the connection query statement , to obtain the relationship of the multiple target tables on the target business; according to the identification of the target table, the identification of the target column, the relationship of the multiple target tables on the target business and the data Type, which identifies the data model.
- the acquiring unit 610 is specifically configured to: acquire the multiple target tables from the third character string after the condition keyword in the connection query statement.
- the target business relationship is specifically configured to: acquire the multiple target tables from the third character string after the condition keyword in the connection query statement.
- the acquiring unit 610 is specifically configured to: when the data access statement is an insert statement, extract the first key between the insert keyword and the value keyword in the insert statement.
- the fourth character string is the identification of the target table; extract the fifth character string in the query brackets of the value keyword in the insert statement, the fifth character string is the target column logo.
- the data model generation apparatus 600 further includes an aggregation unit configured to aggregate the identifier of the target table and the identifier of the target column; the determining unit 620 specifically It is configured to: determine the data model based on the aggregated identifier of the target table and the identifier of the target column, and based on the data type.
- FIG. 7 is a schematic diagram of a hardware structure of an apparatus 700 for generating a data model according to an embodiment of the present application.
- the apparatus 700 for generating a data model shown in FIG. 7 may be a third-party device, and the apparatus 700 for generating a data model includes a memory 701 , a processor 702 , a communication interface 703 and a bus 704 .
- the memory 701 , the processor 702 , and the communication interface 703 are connected to each other through a bus 704 .
- the memory 701 may be a read-only memory (read-only memory, ROM), a static storage device and a random access memory (random access memory, RAM).
- the memory 701 may store a program. When the program stored in the memory 701 is executed by the processor 702, the processor 702 and the communication interface 703 are used to execute each step of the method for generating a data model in the embodiment of the present application.
- the processor 702 can adopt a general-purpose CPU, a microprocessor, an application specific integrated circuit (application specific integrated circuit, ASIC), a graphics processing unit (graphics processing unit, GPU) or one or more integrated circuits for executing related programs, In order to realize the functional functions required to be executed by the units in the device of the embodiment of the present application, or execute the method for generating the data model of the embodiment of the present application.
- ASIC application specific integrated circuit
- GPU graphics processing unit
- the processor 702 may also be an integrated circuit chip, which has a signal processing capability. During implementation, each step of the method for generating a data model in the embodiment of the present application may be completed by an integrated logic circuit of hardware in the processor 702 or instructions in the form of software.
- processor 702 can also be general-purpose processor, digital signal processor (digital signal processing, DSP), ASIC, off-the-shelf programmable gate array (field programmable gate array, FPGA) or other programmable logic device, discrete gate or transistor logic devices, discrete hardware components.
- DSP digital signal processor
- ASIC off-the-shelf programmable gate array
- FPGA field programmable gate array
- Various methods, steps, and logic block diagrams disclosed in the embodiments of the present application may be implemented or executed.
- a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
- the steps of the methods disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor.
- the software module can be located in a mature storage medium in the field such as random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, register.
- the storage medium is located in the memory 701, and the processor 702 reads the information in the memory 701, and combines its hardware to complete the functions required by the units included in the device of the embodiment of the present application, or execute the functions generated by the data model of the embodiment of the present application. method.
- the communication interface 703 uses a transceiver device such as but not limited to a transceiver to implement communication between the data model generation apparatus 700 and other devices or communication networks.
- a transceiver device such as but not limited to a transceiver to implement communication between the data model generation apparatus 700 and other devices or communication networks.
- the bus 704 may include a path for transferring information between various components of the data model generation apparatus 700 (eg, memory 701 , processor 702 , communication interface 703 ).
- the device 700 for generating a data model only shows a memory, a processor, and a communication interface
- the device 700 for generating a data model may also include other necessary devices.
- the apparatus 700 for generating a data model may also include hardware devices for implementing other additional functions.
- the apparatus 700 for generating a data model may only include components necessary to implement the embodiment of the present application, and does not necessarily include all the components shown in FIG. 7 .
- the embodiment of the present application also provides a computer-readable storage medium, which stores program code for execution by a device, where the program code includes instructions for executing the steps in the above method for generating a data model.
- the embodiment of the present application also provides a computer program product, the computer program product includes a computer program stored on a computer-readable storage medium, the computer program includes program instructions, and when the program instructions are executed by the computer, the The computer executes the above-mentioned method for generating a data model.
- the above-mentioned computer-readable storage medium may be a transitory computer-readable storage medium, or a non-transitory computer-readable storage medium.
- the disclosed devices and methods may be implemented in other ways.
- the device embodiments described above are only illustrative.
- the division of the units is only a logical function function division.
- multiple units or components can be combined Or it can be integrated into another system, or some features can be ignored, or not implemented.
- the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
- the aspects, implementations, implementations or features of the described embodiments can be used alone or in any combination. Aspects of the described embodiments can be implemented by software, hardware or a combination of hardware and software.
- the described embodiments may also be embodied by a computer-readable medium storing computer-readable code comprising instructions executable by at least one computing device.
- the computer readable medium can be associated with any data storage device that can store data that can be read by a computer system.
- Exemplary computer readable media may include read-only memory, random access memory, compact disc read-only memory (CD-ROM), hard disk drive (HDD), digital Video disc (digital video disc, DVD), magnetic tape, and optical data storage device, etc.
- the computer readable medium can also be distributed over network coupled computer systems so that the computer readable code is stored and executed in a distributed manner.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
本申请实施例提供了一种数据模型生成的方法和装置,能够以较高的准确度和效率生成与目标业务紧密相关的数据模型,使得企业可以基于生成的数据模型进行自主研发。该数据模型生成的方法包括:获取用户端向数据库发送的数据访问语句,所述数据访问语句用于访问所述数据库中目标业务的数据;根据所述数据访问语句,获取所述目标业务调用的所述数据库中的至少一个目标表中每一个目标表的标识和所述目标表中目标列的标识;根据所述目标表的标识和所述数据库中表的组织和结构,确定所述目标列的数据类型;根据所述目标表的标识、所述目标列的标识和所述数据类型,确定所述数据库的数据模型。
Description
本申请涉及数据库技术领域,并且更为具体地,涉及一种数据模型生成的方法和装置。
目前,大中型企业中可能会有大量的应用系统,且应用系统中数据库的表的分布和结构比较复杂。由于这些应用系统通常是由第三方公司实施或开发的,因此,企业并不知道这些应用系统上的业务模块使用的是底层数据库中的哪些表和哪些列,这给企业未来自主研发带来了很大的困难。
发明内容
本申请提供了一种数据模型生成的方法和装置,能够以较高的准确度和效率生成与目标业务紧密相关的数据模型,使得企业可以基于生成的数据模型进行自主研发。
第一方面,提供了一种数据模型生成的方法,包括:获取用户端向数据库发送的数据访问语句,所述数据访问语句用于访问所述数据库中目标业务的数据;根据所述数据访问语句,获取所述目标业务调用的所述数据库中的至少一个目标表中每一个目标表的标识和所述目标表中目标列的标识;根据所述目标表的标识和所述数据库中表的组织和结构,确定所述目标列的数据类型;根据所述目标表的标识、所述目标列的标识和所述数据类型,确定所述数据库的数据模型。
在用户操作应用系统(如访问目标业务的数据)时,和用户的操作有关的数据访问语句会从用户端发送到数据库。因此,上述技术方案获取到的数据访问语句与用户访问的目标业务紧密相关,基于该数据访问语句生成的数据模型同样与用户访问的目标业务紧密相关,能够有效避免生成大量的无效数据模型的问题。进一步地,上述技术方案不依赖于人工经验,有效提高了数据模型生成的效率和准确度。
在一些可能的实现方式中,所述获取用户端向数据库发送的数据访问语句,包括:对基于所述目标业务生成的多个信息进行过滤,以得到所述数据访问语句,其中,所述多个信息包括所述数据访问语句。
基于目标业务可以生成多个信息,该多个信息中除了包括数据访问语句之外还包括其他信息。上述技术方案对该多个信息进行过滤,以滤除多个信息中除数据访问语句之外的信息,这样,剩下的信息都是与目标业务紧密相关的数据访问语句,从而能够进一步避免生成大量的无效数据模型的问题。
在一些可能的实现方式中,所述根据所述数据访问语句,获取所述目标业务调用的所述数据库中的至少一个目标表中每一个目标表的标识和所述目标表中目标列的标识,包括:当所述数据访问语句为查询语句时,提取所述查询语句中的从关键字与条件关键字之间的第一字符串,所述第一字符串为所述目标表的标识;提取所述查询语句中的选 择关键字与所述从关键字之间的第二字符串,所述第二字符串为所述目标列的标识。
在一些可能的实现方式中,当所述查询语句为连接查询语句且所述至少一个目标表包括多个目标表时,所述方法还包括:根据所述连接查询语句,获取所述多个目标表在所述目标业务上的关系;所述根据所述目标表的标识、所述目标列的标识和所述数据类型,确定所述数据库的数据模型,包括:根据所述目标表的标识、所述目标列的标识、所述多个目标表在所述目标业务上的关系以及所述数据类型,确定所述数据模型。
上述技术方案,利用数据访问语句中的结构信息,分析出表与表之间潜在的关系,并根据该关系生成数据模型。由于上述技术方案不依赖于数据库中schema中的参考外键,因此避免了数据库中schema无参考外键而造成生成的数据模型不准确甚至无法生成数据模型关系的弊端,使得进一步提高了生成的数据模型的准确度。
在一些可能的实现方式中,所述根据所述连接查询语句,获取所述多个目标表在所述目标业务上的关系,包括:从所述连接查询语句中的所述条件关键字之后的第三字符串中,获取所述多个目标表在所述目标业务上的关系。
在一些可能的实现方式中,所述根据所述数据访问语句,获取所述目标业务调用的所述数据库中的至少一个目标表中每一个目标表的标识和所述目标表中目标列的标识,包括:当所述数据访问语句为插入语句时,提取所述插入语句中的插入关键字与值关键字之间的第四字符串,所述第四字符串为所述目标表的标识;提取所述插入语句中所述值关键字之后的括号内的第五字符串,所述第五字符串为所述目标列的标识。
在一些可能的实现方式中,所述方法还包括:对所述目标表的标识和所述目标列的标识进行聚合;所述根据所述目标表的标识、所述目标列的标识和所述数据类型,确定所述数据库的数据模型,包括:基于聚合后的所述目标表的标识和所述目标列的标识,以及基于所述数据类型,确定所述数据模型。
上述技术方案,在生成数据模型之前,对目标表的标识和目标列的标识进行聚合,与目标表的标识和目标列的标识处于离散状态相比,方便后续操作,有利于提高数据模型生成的效率。
第二方面,提供了一种数据模型生成的装置,包括:存储器,用于存储程序;处理器,用于执行所述存储器存储的程序,当所述存储器存储的程序被执行时,所述处理器用于执行上述第一方面或其各实现方式中的方法。
第三方面,提供了一种计算机可读存储介质,存储用于设备执行的程序代码,所述程序代码包括用于执行上述第一方面或其各实现方式中的方法中的步骤的指令。
第四方面,提供了一种数据模型生成的装置,包括:预处理器,用于获取用户端向数据库发送的数据访问语句,所述数据访问语句用于访问所述数据库中目标业务的数据;分析器,用于根据所述数据访问语句,获取所述目标业务调用的所述数据库中的至少一个目标表中每一个目标表的标识和所述目标表中目标列的标识;模型生成器,用于根据所述目标表的标识和所述数据库中表的组织和结构,确定所述目标列的数据类型;所述模型生成器还用于,根据所述目标表的标识、所述目标列的标识和所述数据类型,确定所述数据库的数据模型。
第五方面,提供了一种数据模型生成的装置,包括用于执行上述第一方面或其各实现方式中的方法的各单元。
图1是本申请实施例的一种系统架构的示意图。
图2是本申请实施例的数据模型生成的方法的示意性图。
图3是本申请实施例的一种数据模型生成的方法的流程图。
图4是本申请实施例的另一种数据模型生成的方法的流程图。
图5是本申请实施例的一种数据模型生成的装置的示意性框图。
图6是本申请实施例的另一种数据模型生成的装置的示意性框图。
图7是本申请实施例的再一种数据模型生成的装置的示意性框图。
附图标记列表:
110,用户端;
120,数据库;
130,第三方设备;
200,本申请实施例的数据模型生成的方法;
210,获取用户端向数据库发送的数据访问语句;
220,根据数据访问语句,获取目标业务调用的数据库中的至少一个目标表中每一个目标表的标识和每一个目标表中目标列的标识;
230,根据目标表的标识和数据库中表的组织和结构,确定目标列的数据类型;
240,根据目标表的标识、目标列的标识和数据类型,确定数据模型;
Pr,预处理器;
An,分析器;
Bu,模型生成器;
Re,逐行解析多行信息;
Fi,过滤;
Di,将SQL语句发送给对应的分析器;
Tn,获取目标表的表名;
Cn,获取目标列的列名;
Dt,获取目标列的数据类型;
Ge,生成数据模型;
Er,获取关系;
As,聚合;
500,数据模型生成的装置;
510,预处理器;
520,分析器;
530,模型生成器;
600,据模型生成的装置;
610,获取单元;
620,确定单元;
700,数据模型生成的装置;
701,存储器;
702,处理器;
703,通信接口;
704,总线。
下面结合附图,对本申请实施例中的技术方案进行描述。应理解,本说明书中的具体的例子只是为了帮助本领域技术人员更好地理解本申请实施例,而非限制本申请实施例的范围。
应理解,在本申请的各种实施例中,各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
还应理解,本说明书中描述的各种实施方式,既可以单独实施,也可以组合实施,本申请实施例对此不作限定。
除非另有说明,本申请实施例所使用的所有技术和科学术语与本申请的技术领域的技术人员通常理解的含义相同。本申请中所使用的术语只是为了描述具体的实施例的目的,不是旨在限制本申请的范围。
通常,大中型企业中可能会有大量的应用系统,且应用系统中数据库的表的分布和结构比较复杂。由于这些应用系统通常是由第三方公司实施或开发的,即使第三方公司会向企业提供关于该应用系统的设计文档,但有时候会存在设计文档的内容不全面或设计文档中的说明书和实际落地的应用系统不一致的问题。因此,企业并不知道这些应用系统上的业务模块使用的是底层数据库中的哪些表和哪些列,这给企业未来自主研发带来了很大的困难。比如,企业要基于现有的应用系统的数据对应用系统进行信息化升级改造或者进行业务拓展。
目前业内普遍采用两种方法解决上述问题,第一种为基于人工解析的方法。具体而言,该方法主要通过人工来梳理各个应用系统在处理各种业务时的交互表单上的列(或称为字段)和数据,找到数据库中与该列和数据相匹配的表的列和数据。然后,再将这些信息进行汇总整理,最终分析出应用系统可能调用的数据库的表和对应的列。比如,该应用系统为财务系统,用户需要查询一年中每个季度的销售额,在用户查询后,该财务系统输出的数据为“A”、“B”、“C”和“D”,则用户可以在数据库中反向地查找“A”、“B”、“C”和“D”,或者,在“A”、“B”、“C”和“D”四个数据中确定几个具有特征的数据,如“C”和“D”,然后,在数据库中查找“C”和“D”,并最终分析出该财务系统可能调用的数据库的表和对应的列。
该方法主要依赖于人工经验,因此效率和准确度相对来说较低,特别是面对数据库包括大量表的情况下。此外,由于应用系统表单提交后的计算逻辑可能涉及众多表或者中间表结构的聚合查询和处理,这些查询和处理无法通过视觉从用户界面(user interface,UI)上直观的看到,因此,人工解析的方法可能会忽略掉很多潜在的数据模型,导致获取到的数据模型不太准确。
第二种方法为基于数据库中表的组织和结构(Schema)的分析法。具体而言,该方法通过调用应用系统对应数据库的相关接口,来获取目前应用系统中所有的数据库的表结构和关系,最后反向生成数据模型。
由于该方法无法动态感知针对某个业务数据库中的哪些表被访问过,因此,无法帮助用户有效梳理出目前应用系统用到的数据模型。例如,数据库中有1000张表,在用户访问某个业务的数据时,应用系统可能仅使用了其中的100张表,但是由于该方法无法获知1000张表中的哪100张表被使用,所以只能对所有的1000张表进行反向数据模型生成,使得花费的时间较长且效率低下。此外,由于数据库中可能存在非人机交互产生的数据或与用户当前访问的业务无关的数据,使得可能出现生成的数据模型脱离了业务,与用户当前访问的业务无关的问题。
进一步地,该方法生成的数据模型中的关系,高度依赖数据库的表的参考外键。如果表中没有此类信息,则无法生成数据模型的关系。通过大量的实际生产环境证明,越庞大的企业系统,越不包含此类强参考外键关系。所以该方法在实际生产环境下的可行性不高。
鉴于此,本申请实施例提出了一种数据模型生成的方法,能够以较高的准确度和效率生成与目标业务紧密相关的数据模型,使得企业可以基于生成的数据模型进行自主研发。
图1是本申请实施例的一种系统架构的示意图。
图1所示的系统架构包括用户端110、数据库120和第三方设备130。用户端110可以为用于提供数据查询服务管理的接口。其中,用户端110也可以称为客户端或其他名称。
数据库120例如可以为mysql、oracle、sqlserver、sqlite等。用户端110和数据库120之间可进行信息传输,例如,在用户需要查询数据时,比如查询一年中每个季度的销售额时,可在用户端110输入查询信息,用户端110接收到该查询信息后,将该查询信息转化为数据库语言,如结构化查询语言(structured query language,SQL),并将数据库语言发送给数据库120。之后,数据库120基于接收到的数据库语言查询到一年中每个季度的销售额,并向用户端110发送查询到的结果。之后,用户端110可向用户输出查询到的结果,从而用户能够获取到一年中每个季度的销售额。
第三方设备130可以与用户端110和数据库120通信连接,以获取到用户端110和数据库120之间传输的信息,并基于该信息执行一些操作,如生成数据库120的数据模型。具体地,第三方130可以包括通信接口,以实现与其他设备(如用户端110)的通信连接。该通信连接可以是有线方式,也可以是无线方式。第三方设备130可以在信息传输的过程中获取到该信息;或者,若用户端110向数据库120发送信息,则第三方设备130可以在信息发送给数据库120之后,从数据库120中获取到该信息。
示例性地,第三方设备130可以为服务器。其中,服务器为提供计算服务的设备,服务器的构成包括处理器、硬盘、内存、系统总线等,服务器和通用的计算机架构类似,但是由于需要提供高可靠的服务,因此在处理能力、稳定性、可靠性、安全性、可扩展性、可管理性等方面要求较高。
应理解,图1仅是本申请实施例提供的一种系统架构的示意图,图中所示设备、器 件、模块等之间的位置关系不构成任何限制。
下面结合图2,对本申请实施例的数据模型生成的方法的主要过程进行介绍。
图2示出了本申请实施例的数据模型生成的方法200的示意性流程图。方法200可以由除用户端和数据库之外的第三方设备执行,如图1所示的第三方设备130。如图2所示,方法200可以包括以下内容中的至少部分内容。
步骤210:获取用户端向数据库发送的数据访问语句,该数据访问语句用于访问数据库中目标业务的数据。
步骤220:根据数据访问语句,获取目标业务调用的数据库中的至少一个目标表中每一个目标表的标识和每一个目标表中目标列的标识。
步骤230:根据目标表的标识和数据库的schema,确定目标列的数据类型。
步骤240:根据目标表的标识、目标列的标识和数据类型,确定数据库的数据模型。
在用户操作应用系统(如访问目标业务的数据)时,和用户的操作有关的数据访问语句会从用户端发送到数据库。因此,本申请实施例获取到的数据访问语句与用户访问的目标业务紧密相关,基于该数据访问语句生成的数据模型同样与用户访问的目标业务紧密相关,能够有效避免生成大量的无效数据模型的问题。进一步地,上述技术方案不依赖于人工经验,有效提高了数据模型生成的效率和准确度。
在本申请实施例中,第三方设备可以包括预处理器、分析器和模型生成器。示例性地,步骤210可以由预处理器执行,步骤220可以由分析器执行,步骤230和步骤240可以由模型生成器执行。当然,该第三方设备也可以包括其他器件,如数据类型处理器。此时,步骤210可以由预处理器执行,步骤220可以由分析器执行,步骤230可以由数据类型处理器执行,步骤240可以由模型生成器执行。为了方便描述,下文将以步骤210由预处理器执行,步骤220由分析器执行,步骤230和步骤240由模型生成器执行为例描述本申请实施例的方法。
作为一种示例,预处理器可以通过数据库提供的插件,如命令行工具,获取到数据访问语句。
作为另一种示例,预处理器可以启动某个选项,之后可以监听到用户端发送给数据库的数据访问语句,从而能够获取到该数据访问语句。
可选地,预处理器可以从数据库中获取到该数据访问语句,或者,也可以在数据访问语句从用户端向数据库传输的过程中获取到该数据访问语句。
可选地,数据访问语句可以包括但不限于SQL语句。本申请实施例以SQL语句为示例进行说明。SQL语句可以包括数据定义语言(data definition language,DDL)和数据操纵语言(data manipulation language,DML),本申请实施例的SQL语句为DML语句。DML语句可以包括插入(insert)语句、删除(delete)语句、修改(update)语句和查询(select)语句。考虑到delete语句和update语句的颗粒度太小,比如通常为删除一个列或修改一个列,导致基于delete语句和update语句可能无法分析出数据模型。因此,本申请实施例的SQL语句为select语句或insert语句。
若SQL语句为预处理器从数据库中获取到的,考虑到在数据库接收到用户端发送的SQL语句后,数据库可以基于SQL语句生成上下文信息,如生成时间戳、会话ID或者 进程ID等。也就是说,数据库可以基于目标业务生成多个信息,该多个信息除了多个SQL语句之外,还包括其他信息。并且,即使是SQL语句,也有可能包括delete语句和update语句。
或者,若SQL语句为从用户端向数据库传输的过程中获取到的,那么除了SQL语句之外,用户端还可能向数据库传输其他信息。
因此,步骤210具体可以包括:预处理器对基于目标业务生成的多个信息进行过滤,滤除该多个信息中除SQL语句之外的其他信息以及滤除SQL语句中的delete语句和update语句,以得到select语句或insert语句。
该技术方案对该多个信息进行过滤,以滤除多个信息中除数据访问语句之外的信息,这样,剩下的信息都是与目标业务紧密相关的数据访问语句,从而能够进一步避免生成大量的无效数据模型的问题。
预处理器可能同时接收多行信息,因此,可选地,如图3和图4所示,预处理器还可以对接收到的多行信息进行逐行解析,之后,再对每行信息进行过滤,以得到SQL语句。
可选地,在本申请实施例中,目标表的标识可以为目标表的表名或目标表的ID或目标表在数据库中所有表中的编号。与目标表的标识类似,目标列的标识可以为目标列的列名或目标列的ID或目标列在目标表中的编号。下文均以目标表的标识为表名,目标列的标识为列名为例进行说明。
针对select语句和insert语句,分析器获取目标表的表名和目标列的列名的方式不同。
对于select语句,具体而言,分析器可以提取select语句中的从(from)关键字与条件关键字之间的第一字符串,该第一字符串为目标表的表名。例如,select语句为select col1,col2,col3 from table_1where…,该select语句中from关键字与条件关键字之间的字符串为table_1,则提取出的表名为table_1。再例如,select语句为select col1,col2,col3 from table_1 left join table_2 on…,该select语句中from关键字与条件关键字之间的字符串为table_1和table_2,则提取出的表名为table_1和table_2。
若分析器提取出多个第一字符串,则该多个第一字符串可以通过逗号分隔,即可获得一组表名,例如上文中的[table_1,table_2]。
另外,分析器可以提取select语句中的选择(select)关键字和from关键字之间的第二字符串,该第二字符串为目标列的列名。若列名有多个,则多个列名同样可以通过逗号分隔。例如,select语句为select col1,col2,col3 from table_1 where…该select语句中select关键字和from关键字之间的字符串为col1,col2,col3,则目标列的列名为[col1,col2,col3]。
对于insert语句,具体而言,分析器可以提取insert语句中的插入(insert into)关键字和值(values)关键字之间的第四字符串,该第四字符串为目标表的表名。另外,分析器可以提取insert语句中的values关键字后面括号内的第五字符串,该第五字符串为目标列的列名。
进一步地,若select语句为连接(join)查询语句,如上文中的select col1,col2,col3 from table_1 left join table_2 on…语句,且目标表包括多个目标表,则如图4所示,方法 200还可以包括:分析器根据连接查询语句,获取多个目标表在目标业务上的关系。换言之,分析器可以根据连接查询语句,获取到多个目标表之间在目标业务上是通过哪些列关联在一起的。
具体而言,分析器可以从join查询语句中条件关键字之后的第三字符串中,获取多个目标表在目标业务上的关系。比如,join查询语句为select col1,col2,col3 from table_1 left join table_2 on table_1.col1=table_2.col2,可以看出,table_1和table_2之间有关系,且table_1和table_2之间通过列名为col1和col2的列关联在一起,即table_1中列名为col1的列的值和table_2中列名为col2的列的值相互参考,且table_2中列名为col2的列的值来自table_1中列名为col1的列中的值。
在确定table_1和table_2的关系后,可以将table1,col1,table2,col2串接在一起形成一个四元组[table1,col1,table2,col2]。该四元组可以表示table_1和table_2之间通过table_1中列名为col1的列和table_2中列名为col2的列关联在一起。
如果join查询语句包括多个join连用的情况,则可以分别提取条件关键字之后的第三字符串,得到一个多维四元数组。例如:
[[table1,col1,table2,col2],
[table2,col2,table3,col3],
[…,…,…,…],
[tableN-1,colN-1,tableN,colN]]
在这种情况下,分析器除了可以根据目标表的表名、目标列的列名和数据类型确定数据模型之外,还可以根据多个表在目标业务上的关系确定数据模型。
上述技术方案,利用数据访问语句中的结构信息,分析出表与表之间潜在的关系,并根据该关系生成数据模型。由于上述技术方案不依赖于数据库中schema中的参考外键,因此避免了数据库中schema无参考外键而造成生成的数据模型不准确甚至无法生成数据模型关系的弊端,使得进一步提高了生成的数据模型的准确度。
应理解,在本申请实施例中,“第一”、“第二”、“第三”和“第四”仅仅为了区分不同的对象,但并不对本申请实施例的范围构成限制。
还应理解,本申请实施例中的具体的例子只是为了帮助本领域技术人员更好地理解本申请实施例,而非限制本申请实施例的范围。
通过上述描述可以看出,针对select语句和insert语句,分析器获取目标表的表名和目标列的列名的方式不同。因此,在一种实现方式中,分析器可以为两个,其中一个分析器(为了方便描述,称为第一分析器)可以获取select语句中目标表的表名、目标列的列名和多个目标表在目标业务上的关系,另一个分析器(称为第二分析器)可以获取insert语句中目标表的表名和目标列的列名。
例如,如图3和图4所示,图3中的分析器为第二分析器,第二分析器可以获取表名、列名;图4中的分析器为第一分析器,第一分析器可以获取表名、列名和表与表之间的关系。
在该实现方式中,在步骤220之前,预处理器可以确定SQL语句的类型,之后再将SQL语句发送给对应的分析器。即将select语句发送给图4中的第一分析器,将insert语句发送给图3中的第二分析器。
在另一种实现方式中,分析器可以为三个,其中一个分析器可以获取连接查询语句的表名和列名,另一个分析器可以获取select语句中除连接查询语句的其他语句的表名和列名,剩下的一个分析器可以获取insert语句的表名和列名。
在另一种实现方式中,分析器可以为一个,例如,为图4中的分析器。该分析器可以执行获取SQL语句中目标表的表名、获取目标列的列名以及获取多个目标表在目标业务上的关系共三个步骤。若SQL语句为连接查询语句,则该分析器可以依次执行该三个步骤;若SQL语句为其他语句(如insert语句),则该分析器可以不执行获取多个目标表在目标业务上的关系这个步骤,仅执行前两个步骤。
为了方便后续模型生成器处理,可选地,如图4所示,方法200还可以包括:对目标表的表名和目标列的列名进行聚合,或者,对目标表的表名、目标列的列名以及多个目标表在目标业务上的关系进行聚合。
聚合后的表名和列名可以称为结构化的结果集,或者,聚合后的表名、列名和关系可以称为结构化的结果集。其中,结构化的结果集可以理解用非离散的方式来描述一个对象。比如,在描述超市有什么物品时,可以描述为水果有苹果、香蕉等,日用品有洗衣液、毛巾等,蔬菜有土豆、西红柿等,而不是描述为超市有香蕉、洗衣液、土豆等。
示例性地,假设目标表有两个,表名分别为table_X和table_Y,其中,table_X中目标列的列名分别为col1、col2和colN,table_Y中目标列的列名分别为col1、col2和colN,且table_X和table_Y通过table_X中列名为col1的列和table_Y中列名为col2的列关联在一起。在将目标表的表名、目标列的列名以及两个目标表的关系聚合后可以得到如下所示的JS对象简谱(javascript object notation,JSON)复合结构体:
上述技术方案,在生成数据模型之前,对目标表的标识和目标列的标识进行聚合,与目标表的标识和目标列的标识处于离散状态相比,方便后续操作,有利于提高数据模型生成的效率。
在分析器确定目标列的列名后,模型生成器还需要确定目标列的数据类型。比如,目标列的数据类型是整数类型还是字符串类型等。具体而言,模型生成器可以根据目标表的表名,从数据库的schema中提取出该目标表中目标列的数据类型,从而可以得到三元组列表,例如,[[table_1,col1,datatype],[table_1,col2,datatype],[table_N,colN,datatype]]。其中,datatype表示数据类型。
在得到目标列的数据类型后,模型生成器可以基于目标表的表名、目标列的列名和数据类型,或者,基于目标表的表名、目标列的列名、多个目标表在目标业务上的关系和数据类型,确定数据库的数据模型。
可选地,目标表的表名可以作为数据模型的实体名,目标列的列名可以作为对应实体的属性名,目标列的数据类型可以作为对应实体属性的数据类型。
假设目标业务调用的目标表包括四个,表名分别为Table_X、Table_Y、Table_Z和Table_N,其中,Table_X中目标列的列名为Col_X,目标列的数据类型是Datatype_X;Table_Y中目标列的列名为Col_Y,目标列的数据类型是Datatype_Y;Table_Z中目标列的列名为Col_Z,目标列的数据类型是Datatype_Z;Table_N中目标列的列名为Col_N,目标列的数据类型是Datatype_N。则生成的数据模型可以如表1所示。
表1
进一步地,若多个目标表在目标业务上具有关系,则可以将相互关联的多个目标表的表名作为相互关联的多个实体的实体名,将相互关联的多个目标列的列名作为相互关联的多个实体的关联属性。
以表1为例进行说明。假设Table_X和Table_Y通过Table_X中列名为Col_X的列和Table_Y中列名为Col_Y的列相互关联,Table_Z和TableN通过Table_Z中列名为Col_Z 的列和Table_N中列名为Col_Z的列相互关联,则得到的多个实体之间的关系可以如表2所示。
表2
From_Entity | From_Attribute_Name | To_Entity | To_Attribute_Name |
Table_X | Col_X | Table_Y | Col_Y |
Table_N-1 | Col_N-1 | Table_N | Col_N |
其中,from_tab和to_tab的value即为相互关联的两个实体的实体名。from_col和to_col即为相关关联的两个实体的关联属性。
根据insert语句以及select语句中非join查询语句最终生成的数据模型如表1所示,根据join查询语句最终生成的数据模型如表1和表2所示。
需要说明的是,如果有同名实体的存在,则分析器还可以对同名实体进行属性扩充。
可选地,方法200还可以包括:对数据模型进行持久化,以将数据模型存储到底层数据库中。
上文详细描述了本申请实施例的方法实施例,下面描述本申请实施例的装置实施例,装置实施例与方法实施例相互对应,因此未详细描述的部分可参见前面各方法实施例,装置可以实现上述方法中任意可能实现的方式。
图5示出了本申请一个实施例的数据模型生成的装置500的示意性框图。该数据模型生成的装置500可以执行上述本申请实施例的数据模型生成的方法200,该数据模型生成的装置500可以为前述方法中的第三方设备。
如图5所示,该数据模型生成的装置500可以包括:
预处理器510,用于获取用户端向数据库发送的数据访问语句,所述数据访问语句用于访问所述数据库中目标业务的数据。
分析器520,用于根据所述数据访问语句,获取所述目标业务调用的所述数据库中的至少一个目标表中每一个目标表的标识和所述目标表中目标列的标识。
模型生成器530,用于根据所述目标表的标识和所述数据库中表的组织和结构,确定所述目标列的数据类型。
所述模型生成器530还用于,根据所述目标表的标识、所述目标列的标识和所述数据类型,确定所述数据库的数据模型。
可选地,在本申请一个实施例中,所述预处理器510具体用于:对基于所述目标业务生成的多个信息进行过滤,以得到所述数据访问语句,其中,所述多个信息包括所述数据访问语句。
可选地,在本申请一个实施例中,所述分析器520具体用于:当所述数据访问语句为查询语句时,提取所述查询语句中的从关键字与条件关键字之间的第一字符串,所述第一字符串为所述目标表的标识;提取所述查询访问语句中的选择关键字与所述从关键字之间的第二字符串,所述第二字符串为所述目标列的标识。
可选地,在本申请一个实施例中,当所述查询语句为连接查询语句且所述至少一个目标表包括多个目标表时,所述分析器520具体用于:根据所述连接查询语句,获取所述多个目标表在所述目标业务上的关系;根据所述目标表的标识、所述目标列的标识、 所述多个目标表在所述目标业务上的关系以及所述数据类型,确定所述数据模型。
可选地,在本申请一个实施例中,所述分析器520具体用于:从所述连接查询语句中的所述条件关键字之后的第三字符串中,获取所述多个目标表在所述目标业务上的关系。
可选地,在本申请一个实施例中,所述分析器520具体用于:当所述数据访问语句为插入语句时,提取所述插入语句中的插入关键字与值关键字之间的第四字符串,所述第四字符串为所述目标表的标识;提取所述插入语句中所述值关键字之后的括号内的第五字符串,所述第五字符串为所述目标列的标识。
可选地,在本申请一个实施例中,所述分析器520还用于:对所述目标表的标识和所述目标列的标识进行聚合;所述模型生成器530具体用于:基于聚合后的所述目标表的标识和所述目标列的标识,以及基于所述数据类型,确定所述数据模型。
应理解,预处理器510、分析器520以及模型生成器530分别可以实现法200中的预处理器、分析器和模型生成器的相应操作,为了简洁,在此不再赘述。
图6示出了本申请另一个实施例的数据模型生成的装置600的示意性框图。该数据模型生成的装置600可以执行上述本申请实施例的数据模型生成的方法200,该数据模型生成的装置600可以为前述方法中的第三方设备。
如图6所示,该数据模型生成的装置600可以包括:
获取单元610,用于获取用户端向数据库发送的数据访问语句,所述数据访问语句用于访问所述数据库中目标业务的数据。
所述获取单元610还用于,根据所述数据访问语句,获取所述目标业务调用的所述数据库中的至少一个目标表中每一个目标表的标识和所述目标表中目标列的标识。
确定单元620,用于根据所述目标表的标识和所述数据库中表的组织和结构,确定所述目标列的数据类型。
所述确定单元620还用于,根据所述目标表的标识、所述目标列的标识和所述数据类型,确定所述数据库的数据模型。
可选地,在本申请一个实施例中,所述获取单元610具体用于:对基于所述目标业务生成的多个信息进行过滤,以得到所述数据访问语句,其中,所述多个信息包括所述数据访问语句。
可选地,在本申请一个实施例中,所述获取单元610具体用于:当所述数据访问语句为查询语句时,提取所述查询语句中的从关键字与条件关键字之间的第一字符串,所述第一字符串为所述目标表的标识;提取所述查询语句中的选择关键字与所述从关键字之间的第二字符串,所述第二字符串为所述目标列的标识。
可选地,在本申请一个实施例中,当所述查询语句为连接查询语句且所述至少一个目标表包括多个目标表时,所述获取单元610具体用于:根据所述连接查询语句,获取所述多个目标表在所述目标业务上的关系;根据所述目标表的标识、所述目标列的标识、所述多个目标表在所述目标业务上的关系以及所述数据类型,确定所述数据模型。
可选地,在本申请一个实施例中,所述获取单元610具体用于:从所述连接查询语句中的所述条件关键字之后的第三字符串中,获取所述多个目标表在所述目标业务上的 关系。
可选地,在本申请一个实施例中,所述获取单元610具体用于:当所述数据访问语句为插入语句时,提取所述插入语句中的插入关键字与值关键字之间的第四字符串,所述第四字符串为所述目标表的标识;提取所述插入语句中所述值关键字查询括号内的第五字符串,所述第五字符串为所述目标列的标识。
可选地,在本申请一个实施例中,所述数据模型生成的装置600还包括聚合单元,用于对所述目标表的标识和所述目标列的标识进行聚合;所述确定单元620具体用于:基于聚合后的所述目标表的标识和所述目标列的标识,以及基于所述数据类型,确定所述数据模型。
图7是本申请实施例的数据模型生成的装置700的硬件结构示意图。图7所示的数据模型生成的装置700可以为第三方设备,数据模型生成的装置700包括存储器701、处理器702、通信接口703以及总线704。其中,存储器701、处理器702、通信接口703通过总线704实现彼此之间的通信连接。
存储器701可以是只读存储器(read-only memory,ROM),静态存储设备和随机存取存储器(random access memory,RAM)。存储器701可以存储程序,当存储器701中存储的程序被处理器702执行时,处理器702和通信接口703用于执行本申请实施例的数据模型生成的方法的各个步骤。
处理器702可以采用通用的CPU,微处理器,应用专用集成电路(application specific integrated circuit,ASIC),图形处理器(graphics processing unit,GPU)或者一个或多个集成电路,用于执行相关程序,以实现本申请实施例的装置中的单元所需执行的功能函数,或者执行本申请实施例的数据模型生成的方法。
处理器702还可以是一种集成电路芯片,具有信号的处理能力。在实现过程中,本申请实施例的数据模型生成的方法的各个步骤可以通过处理器702中的硬件的集成逻辑电路或者软件形式的指令完成。
上述处理器702还可以是通用处理器、数字信号处理器(digital signal processing,DSP)、ASIC、现成可编程门阵列(field programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件处理器执行完成,或者用处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器701,处理器702读取存储器701中的信息,结合其硬件完成本申请实施例的装置中包括的单元所需执行的功能函数,或者执行本申请实施例的数据模型生成的方法。
通信接口703使用例如但不限于收发器一类的收发装置,来实现数据模型生成的装置700与其他设备或通信网络之间的通信。
总线704可包括在数据模型生成的装置700各个部件(例如,存储器701、处理器702、通信接口703)之间传送信息的通路。
应注意,尽管上述数据模型生成的装置700仅仅示出了存储器、处理器、通信接口,但是在具体实现过程中,本领域的技术人员应当理解,数据模型生成的装置700还可以包括实现正常运行所必须的其他器件。同时,根据具体需要,本领域的技术人员应当理解,数据模型生成的装置700还可包括实现其他附加功能函数的硬件器件。此外,本领域的技术人员应当理解,数据模型生成的装置700也可仅仅包括实现本申请实施例所必须的器件,而不必包括图7中所示的全部器件。
本申请实施例还提供了一种计算机可读存储介质,存储用于设备执行的程序代码,所述程序代码包括用于执行上述数据模型生成的方法中的步骤的指令。
本申请实施例还提供了一种计算机程序产品,所述计算机程序产品包括存储在计算机可读存储介质上的计算机程序,所述计算机程序包括程序指令,当所述程序指令被计算机执行时,使所述计算机执行上述数据模型生成的方法。
上述的计算机可读存储介质可以是暂态计算机可读存储介质,也可以是非暂态计算机可读存储介质。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的装置的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能函数划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
本申请中使用的用词仅用于描述实施例并且不用于限制权利要求。如在实施例以及权利要求的描述中使用的,除非上下文清楚地表明,否则单数形式的“一个”和“所述”旨在同样包括复数形式。类似地,如在本申请中所使用的术语“和/或”是指包含一个或一个以上相关联的列出的任何以及所有可能的组合。另外,当用于本申请中时,术语“包括”指陈述的特征、整体、步骤、操作、元素,和/或组件的存在,但不排除一个或一个以上其它特征、整体、步骤、操作、元素、组件和/或这些的分组的存在或添加。
所描述的实施例中的各方面、实施方式、实现或特征能够单独使用或以任意组合的方式使用。所描述的实施例中的各方面可由软件、硬件或软硬件的结合实现。所描述的实施例也可以由存储有计算机可读代码的计算机可读介质体现,该计算机可读代码包括可由至少一个计算装置执行的指令。所述计算机可读介质可与任何能够存储数据的数据存储装置相关联,该数据可由计算机系统读取。用于举例的计算机可读介质可以包括只读存储器、随机存取存储器、紧凑型光盘只读储存器(compact disc read-only memory,CD-ROM)、硬盘驱动器(hard disk drive,HDD)、数字视频光盘(digital video disc,DVD)、磁带以及光数据存储装置等。所述计算机可读介质还可以分布于通过网络联接的计算机系统中,这样计算机可读代码就可以分布式存储并执行。
上述技术描述可参照附图,这些附图形成了本申请的一部分,并且通过描述在附图中示出了依照所描述的实施例的实施方式。虽然这些实施例描述的足够详细以使本领域 技术人员能够实现这些实施例,但这些实施例是非限制性的;这样就可以使用其它的实施例,并且在不脱离所描述的实施例的范围的情况下还可以做出变化。比如,流程图中所描述的操作顺序是非限制性的,因此在流程图中阐释并且根据流程图描述的两个或两个以上操作的顺序可以根据若干实施例进行改变。作为另一个例子,在若干实施例中,在流程图中阐释并且根据流程图描述的一个或一个以上操作是可选的,或是可删除的。另外,某些步骤或功能函数可以添加到所公开的实施例中,或两个以上的步骤顺序被置换。所有这些变化被认为包含在所公开的实施例以及权利要求中。
以上所述,仅为本申请实施例的具体实施方式,但本申请实施例的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请实施例揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请实施例的保护范围之内。因此,本申请实施例的保护范围应以所述权利要求的保护范围为准。
Claims (11)
- 一种数据模型生成的方法,其特征在于,所述方法包括:获取(210)用户端向数据库发送的数据访问语句,所述数据访问语句用于访问所述数据库中目标业务的数据;根据所述数据访问语句,获取(220)所述目标业务调用的所述数据库中的至少一个目标表中每一个目标表的标识和所述目标表中目标列的标识;根据所述目标表的标识和所述数据库中表的组织和结构,确定(230)所述目标列的数据类型;根据所述目标表的标识、所述目标列的标识和所述数据类型,确定(240)所述数据库的数据模型。
- 根据权利要求1所述的方法,其特征在于,所述获取(210)用户端向数据库发送的数据访问语句,包括:对基于所述目标业务生成的多个信息进行过滤,以得到所述数据访问语句,其中,所述多个信息包括所述数据访问语句。
- 根据权利要求1或2所述的方法,其特征在于,所述根据所述数据访问语句,获取(220)所述目标业务调用的所述数据库中的至少一个目标表中每一个目标表的标识和所述目标表中目标列的标识,包括:当所述数据访问语句为查询语句时,提取所述查询语句中的从关键字与条件关键字之间的第一字符串,所述第一字符串为所述目标表的标识;提取所述查询语句中的选择关键字与所述从关键字之间的第二字符串,所述第二字符串为所述目标列的标识。
- 根据权利要求3所述的方法,其特征在于,当所述查询语句为连接查询语句且所述至少一个目标表包括多个目标表时,所述方法还包括:根据所述连接查询语句,获取所述多个目标表在所述目标业务上的关系;所述根据所述目标表的标识、所述目标列的标识和所述数据类型,确定(240)所述数据库的数据模型,包括:根据所述目标表的标识、所述目标列的标识、所述多个目标表在所述目标业务上的关系以及所述数据类型,确定所述数据模型。
- 根据权利要求4所述的方法,其特征在于,所述根据所述连接查询语句,获取所述多个目标表在所述目标业务上的关系,包括:从所述连接查询语句中的所述条件关键字之后的第三字符串中,获取所述多个目标表在所述目标业务上的关系。
- 根据权利要求1或2所述的方法,其特征在于,所述根据所述数据访问语句,获取(220)所述目标业务调用的所述数据库中的至少一个目标表中每一个目标表的标识和所述目标表中目标列的标识,包括:当所述数据访问语句为插入语句时,提取所述插入语句中的插入关键字与值关键字之间的第四字符串,所述第四字符串为所述目标表的标识;提取所述插入语句中所述值关键字之后的括号内的第五字符串,所述第五字符串为 所述目标列的标识。
- 根据权利要求1至6中任一项所述的方法,其特征在于,所述方法还包括:对所述目标表的标识和所述目标列的标识进行聚合;所述根据所述目标表的标识、所述目标列的标识和所述数据类型,确定(240)所述数据库的数据模型,包括:基于聚合后的所述目标表的标识和所述目标列的标识,以及基于所述数据类型,确定所述数据模型。
- 一种数据模型生成的装置(700),其特征在于,包括:存储器(701),用于存储程序;处理器(702),用于执行所述存储器存储的程序,当所述存储器存储的程序被执行时,所述处理器用于获取用户端向数据库发送的数据访问语句,所述数据访问语句用于访问所述数据库中目标业务的数据;所述处理器(702)还用于根据所述数据访问语句,获取所述目标业务调用的所述数据库中的至少一个目标表中每一个目标表的标识和所述目标表中目标列的标识;所述处理器(702)还用于根据所述目标表的标识和所述数据库中表的组织和结构,确定所述目标列的数据类型;所述处理器(702)还用于根据所述目标表的标识、所述目标列的标识和所述数据类型,确定所述数据库的数据模型。
- 一种计算机可读存储介质,其特征在于,所述计算机可读介质存储用于设备执行的程序代码,所述程序代码包括用于执行根据权利要求1至7中任一项所述的数据模型生成的方法中的步骤的指令。
- 一种数据模型生成的装置(500),其特征在于,包括:预处理器(510),用于获取用户端向数据库发送的数据访问语句,所述数据访问语句用于访问所述数据库中目标业务的数据;分析器(520),用于根据所述数据访问语句,获取所述目标业务调用的所述数据库中的至少一个目标表中每一个目标表的标识和所述目标表中目标列的标识;模型生成器(530),用于根据所述目标表的标识和所述数据库中表的组织和结构,确定所述目标列的数据类型;所述模型生成器(530)还用于,根据所述目标表的标识、所述目标列的标识和所述数据类型,确定所述数据库的数据模型。
- 一种数据模型生成的装置(600),其特征在于,包括:获取单元(610),用于获取用户端向数据库发送的数据访问语句,所述数据访问语句用于访问所述数据库中目标业务的数据;所述获取单元(610)还用于,根据所述数据访问语句,获取所述目标业务调用的所述数据库中的至少一个目标表中每一个目标表的标识和所述目标表中目标列的标识;确定单元(620),用于根据所述目标表的标识和所述数据库中表的组织和结构,确定所述目标列的数据类型;所述确定单元(620)还用于,根据所述目标表的标识、所述目标列的标识和所述数据类型,确定所述数据库的数据模型。
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP21965958.8A EP4432117A1 (en) | 2021-11-30 | 2021-11-30 | Data model generation method and apparatus |
CN202180104732.3A CN118556232A (zh) | 2021-11-30 | 2021-11-30 | 数据模型生成的方法和装置 |
PCT/CN2021/134650 WO2023097521A1 (zh) | 2021-11-30 | 2021-11-30 | 数据模型生成的方法和装置 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/134650 WO2023097521A1 (zh) | 2021-11-30 | 2021-11-30 | 数据模型生成的方法和装置 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023097521A1 true WO2023097521A1 (zh) | 2023-06-08 |
Family
ID=86611424
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/134650 WO2023097521A1 (zh) | 2021-11-30 | 2021-11-30 | 数据模型生成的方法和装置 |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP4432117A1 (zh) |
CN (1) | CN118556232A (zh) |
WO (1) | WO2023097521A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117113960A (zh) * | 2023-09-05 | 2023-11-24 | 北京数聚智连科技股份有限公司 | 业务数据表单的生成方法、装置、电子设备及存储介质 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110019377A (zh) * | 2017-12-14 | 2019-07-16 | 中国移动通信集团山西有限公司 | 动态脱敏方法、装置、设备及介质 |
CN110795756A (zh) * | 2019-09-25 | 2020-02-14 | 江苏满运软件科技有限公司 | 一种数据脱敏方法、装置、计算机设备及计算机可读存储介质 |
CN111324647A (zh) * | 2020-01-21 | 2020-06-23 | 北京东方金信科技有限公司 | 一种生成etl代码的方法及装置 |
CN111712809A (zh) * | 2018-04-16 | 2020-09-25 | 甲骨文国际公司 | 通过示例来学习etl规则 |
WO2021168331A1 (en) * | 2020-02-20 | 2021-08-26 | Oracle International Corporation | System and method for automatic generation of bi models using data introspection and curation |
-
2021
- 2021-11-30 CN CN202180104732.3A patent/CN118556232A/zh active Pending
- 2021-11-30 EP EP21965958.8A patent/EP4432117A1/en active Pending
- 2021-11-30 WO PCT/CN2021/134650 patent/WO2023097521A1/zh active Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110019377A (zh) * | 2017-12-14 | 2019-07-16 | 中国移动通信集团山西有限公司 | 动态脱敏方法、装置、设备及介质 |
CN111712809A (zh) * | 2018-04-16 | 2020-09-25 | 甲骨文国际公司 | 通过示例来学习etl规则 |
CN110795756A (zh) * | 2019-09-25 | 2020-02-14 | 江苏满运软件科技有限公司 | 一种数据脱敏方法、装置、计算机设备及计算机可读存储介质 |
CN111324647A (zh) * | 2020-01-21 | 2020-06-23 | 北京东方金信科技有限公司 | 一种生成etl代码的方法及装置 |
WO2021168331A1 (en) * | 2020-02-20 | 2021-08-26 | Oracle International Corporation | System and method for automatic generation of bi models using data introspection and curation |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117113960A (zh) * | 2023-09-05 | 2023-11-24 | 北京数聚智连科技股份有限公司 | 业务数据表单的生成方法、装置、电子设备及存储介质 |
Also Published As
Publication number | Publication date |
---|---|
CN118556232A (zh) | 2024-08-27 |
EP4432117A1 (en) | 2024-09-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11379475B2 (en) | Analyzing tags associated with high-latency and error spans for instrumented software | |
US11775501B2 (en) | Trace and span sampling and analysis for instrumented software | |
US9002825B2 (en) | Estimating rows returned by recursive queries using fanout | |
CN107798038B (zh) | 数据响应方法及数据响应设备 | |
US8924373B2 (en) | Query plans with parameter markers in place of object identifiers | |
US9495437B1 (en) | System and method for accessing dimensional databases | |
US8543535B2 (en) | Generation of star schemas from snowflake schemas containing a large number of dimensions | |
CN106687955B (zh) | 简化将数据从数据源转移到数据目标的导入过程的调用 | |
US10496645B1 (en) | System and method for analysis of a database proxy | |
US11132363B2 (en) | Distributed computing framework and distributed computing method | |
WO2020088262A1 (zh) | 数据分析方法、设备及存储介质 | |
US9141654B2 (en) | Executing user-defined function on a plurality of database tuples | |
CN113918605A (zh) | 数据查询方法、装置、设备以及计算机存储介质 | |
WO2023097521A1 (zh) | 数据模型生成的方法和装置 | |
CN112905600A (zh) | 数据查询方法、装置和存储介质及电子设备 | |
CN114116764A (zh) | 一种基于语法树的指标查询方法、装置、介质及电子设备 | |
WO2024082881A2 (zh) | 数据库查询方法和装置 | |
US20060230020A1 (en) | Improving Efficiency in processing queries directed to static data sets | |
WO2021217119A1 (en) | Analyzing tags associated with high-latency and error spans for instrumented software | |
US9129001B2 (en) | Character data compression for reducing storage requirements in a database system | |
US20180046669A1 (en) | Eliminating many-to-many joins between database tables | |
CN114490724B (zh) | 处理数据库查询语句的方法和装置 | |
US10789249B2 (en) | Optimal offset pushdown for multipart sorting | |
JP2024518051A (ja) | スキーマレスデータの効率的なストレージおよびクエリ | |
CN114969125A (zh) | 一种通用数据查询和统计方法及系统 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21965958 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2021965958 Country of ref document: EP |
|
ENP | Entry into the national phase |
Ref document number: 2021965958 Country of ref document: EP Effective date: 20240612 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |