Summary of the invention
The problem to be solved by the present invention is to overcome the technical shortcomings of the prior art by providing an XML-based data integration method for heterogeneous relational databases that suits small and medium-sized enterprises. The method is inexpensive and efficient, highly extensible and portable, and works across operating systems and database platforms. Equivalently, the invention provides an XML-based data integration system for heterogeneous relational databases.
To solve the above technical problem, the present invention adopts the following technical solution. The XML-based data integration method for heterogeneous relational databases comprises the following steps:
1. Obtain the data integration requirements and complete the preliminary preparation for integration.
2. If the integration application involves a new database product, add support for that product.
3. In the data source view of the global query request configuration interface, add the data sources to be integrated and the target data source into which the data will be imported.
4. On the global query request configuration interface, enter the information of a query scheme, choose where to save the query request document, and generate the query request configuration document.
5. On the persistence parameter setting interface, enter the persistence parameters, choose the storage path of the integration scheme document, and generate the integration scheme document. If the target data table does not exist, enter the information of the new table on the persistence parameter setting interface to create it, and then enter the persistence parameters.
6. On the integration scheme execution interface, choose an integration scheme document, execute it, and wait for the software system to process it; the execution state and any exceptions of the integration scheme are displayed on that interface. When execution finishes, the result report interface displays a statistical report of the integrated result data, and the data in the exception table are handled separately. Steps 4 to 6 are repeated until all integration tasks are complete.
The step of obtaining the data integration requirements and completing the preliminary preparation, as described in the technical solution, is as follows:
1. Obtain the data integration requirements, determine the scope of integration, draw up the data integration plan, and write the requirements specification.
2. Determine the database environment in which each data source to be integrated resides and obtain the data source information, including: the database product type, version, database name, database IP address, database service port, and a login user name and password with permission to operate the database through JDBC.
3. Determine the data tables involved in the integration and obtain the table information, including: table schema, table name, and the list of fields with their data types.
4. Analyze the requirements, decompose the integration task into several independent subtasks, determine the integration logic of each subtask, and formalize that logic as a query scheme. A query scheme can be expressed in Structured Query Language (SQL). Note that every table and attribute column in the query statement must be prefixed with the identifier of the database to which it belongs.
5. Complete the design of the target database for the integration.
6. Obtain the result persistence parameters:
1) Target data source information.
2) Target data table information.
3) Mapping between the query result attributes and the target table attributes.
4) The exception table into which data that fail to import are written, together with the information of that table.
7. Verify and confirm the acquired parameters, completing the review and confirmation of the obtained query schemes and of the target database design.
As described in the technical solution, if the integration application involves a new database product, support for that product is added as follows:
1. Obtain information on all data types of the newly added database product, including the name and characteristics of each data type.
2. According to the characteristics of each data type, specify the general data type to which it maps.
3. Add the obtained general data type mappings and characteristic information to the data type dictionary.
A data integration system implementing the XML-based data integration method for heterogeneous relational databases comprises:
An integration task manager, which collects the global query requests and persistence parameters submitted by the user, checks their validity, dispatches integration tasks, monitors the integration process, and reports the integration results.
A data source manager, which manages data sources dynamically, handles the mapping of relational tables, and provides the user with a global view of the data sources.
A query decomposition optimizer, which decomposes a query request into several sub-query requests and global query metadata according to the data mapping relations, optimizes the sub-query requests, and generates sub-query plans.
A data extractor, which assigns each received sub-query plan to a data extraction engine that issues SQL queries to the underlying data source to extract the data, and converts the query results into intermediate-result XML data.
A result integrator, which receives the intermediate-result XML data and the global query metadata, integrates the intermediate results with a relational algebra engine, and generates the final result of the global query.
A result persister, which imports the integrated result data into an existing database.
Compared with the prior art, the beneficial effects of the invention are as follows:
1. The XML-based data integration method for heterogeneous relational databases of the present invention requires no changes to the data sources during integration, so the risk and cost of integration are small.
2. The method uses dynamic data source management, so the queried data are always the latest data, which guarantees data consistency.
3. The method does not require building a central data warehouse, which saves the cost of database software products and mass storage devices and reduces the risk of the integration project.
4. The method is implemented in the platform-independent Java programming language, so the system is platform-independent and easy to port.
5. When the method needs to support a new database product or data source, it is only necessary to register the corresponding entry in the data source dictionary or the data type dictionary. This gives the system good extensibility, makes it easy to maintain, and keeps its cost low.
6. The data to be extracted are determined by the integration requirements, and queries are optimized before extraction, so the utilization of the extracted data is high.
Embodiment
The present invention is described in detail below with reference to the accompanying drawings:
The invention provides an XML-based data integration method for heterogeneous relational databases that suits small and medium-sized enterprises, is inexpensive and efficient, is highly extensible and portable, and works across operating systems and database platforms.
The method is based on a purpose-written computer program for XML-based data integration of heterogeneous relational databases. The program runs in a network environment and is a cross-platform data integration software system built on Java and a self-defined XML data type system; its functions include dynamic data source management, integration task configuration and optimization, data extraction and integration, and result persistence. The method also provides a procedure for adding support for new database products to the software system, as well as the concrete processing steps of a data integration application.
I. XML-based data integration method for heterogeneous relational databases
Following the flow of the computer program, the steps of the method are as follows:
1. Obtain the data integration requirements and complete the preliminary preparation
1) Obtain the data integration requirements, determine the scope of integration, draw up the data integration plan, and write the requirements specification.
2) Determine the database environment in which each data source to be integrated resides. Obtain the data source information, including: the database product type, version, database name, database IP address, database service port, and a login user name and password with permission to operate the database through JDBC.
3) Determine the data tables involved in the integration and obtain the table information, including: table schema, table name, and the list of fields with their data types.
4) Analyze the requirements, decompose the integration task into several independent subtasks, determine the integration logic of each subtask, and formalize that logic as a query scheme. A query scheme can be expressed in SQL. Note that every table and attribute column in the query statement must be prefixed with the identifier of the database to which it belongs.
5) Complete the design of the target database for the integration.
6) Obtain the result persistence parameters:
(1) Target data source information.
(2) Target data table information.
(3) Mapping between the query result attributes and the target table attributes.
(4) The exception table into which data that fail to import are written, together with the information of that table.
7) Verify and confirm the acquired parameters, completing the review and confirmation of the obtained query schemes and of the target database design.
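The data source information of item 2) and the prefix rule of item 4) can be sketched in Java, the language the system is said to be written in. This is a minimal illustration, not the patented implementation: the record `DataSourceInfo`, the class `QualifiedNames` and the choice of four database products are assumptions made for the example; the URL patterns follow the formats documented for the respective JDBC drivers.

```java
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical holder for the data source information collected in item 2):
 * product type, version, database name, IP address, service port and a JDBC login.
 */
record DataSourceInfo(String productType, String version, String databaseName,
                      String host, int port, String user, String password) {

    /** Builds a JDBC URL using each vendor's documented URL syntax (four products only). */
    String jdbcUrl() {
        switch (productType.toUpperCase(Locale.ROOT)) {
            case "MYSQL":
                return "jdbc:mysql://" + host + ":" + port + "/" + databaseName;
            case "POSTGRESQL":
                return "jdbc:postgresql://" + host + ":" + port + "/" + databaseName;
            case "ORACLE": // thin driver, service-name form
                return "jdbc:oracle:thin:@//" + host + ":" + port + "/" + databaseName;
            case "SQLSERVER":
                return "jdbc:sqlserver://" + host + ":" + port + ";databaseName=" + databaseName;
            default:
                throw new IllegalArgumentException("unsupported product: " + productType);
        }
    }
}

/** Checks the rule of item 4): every table or column reference carries a database prefix. */
final class QualifiedNames {
    private QualifiedNames() {}

    /** True if ref, e.g. "db1.emp.salary", starts with one of the registered database identifiers. */
    static boolean hasDatabasePrefix(String ref, Set<String> databaseIds) {
        int dot = ref.indexOf('.');
        return dot > 0 && databaseIds.contains(ref.substring(0, dot));
    }
}
```

A check such as `hasDatabasePrefix` could be run over every table and column name of a query scheme during the verification of item 7).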
2. If the integration application involves a new database product, add support for that product to the software system.
3. In the data source view of the global query request configuration interface of the integration software system, add the data sources relevant to the integration, including the data sources to be integrated and the target data source into which the data will be imported.
4. Choose a query scheme, enter its information on the global query request configuration interface, choose where to save the query request document, and generate a query request configuration document.
5. On the system's persistence parameter setting interface, enter the persistence parameters, choose the storage path of the integration scheme document, and generate an integration scheme document. If the target data table does not exist, the information of a new table can be entered on the interface to create it, after which the persistence parameters are entered.
6. On the system's integration scheme execution interface, choose an integration scheme document, execute it, and wait for the software system to process it; the execution state and any exceptions of the integration scheme are displayed on the interface. When execution finishes, the result report interface displays a statistical report of the integrated result data, and the data in the exception table are handled separately. Steps 4 to 6 are repeated until all integration tasks are complete.
II. Adding support for a new database product to the software system of the XML-based data integration method
Support for a new database product is added as follows:
1. Obtain information on all data types of the newly added database product, including the name and characteristics of each data type.
2. According to the characteristics of each data type, specify the general data type to which it maps.
3. Add the obtained general data type mappings and characteristic information to the data type dictionary of the integration software system.
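A minimal Java sketch of the data type dictionary used in steps 2 and 3, assuming a hash map keyed by product name and native type name. The four general types follow the data source manager description (CHAR, NUMBER, DATE, BOOLEAN); the class names, the key format and the example mappings are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** The four general data types named in the description. */
enum GeneralType { CHAR, NUMBER, DATE, BOOLEAN }

/**
 * Hypothetical data type dictionary: maps a (database product, native type name)
 * pair to a general data type plus a free-text description of its characteristics.
 */
final class DataTypeDictionary {
    record Entry(GeneralType generalType, String characteristics) {}

    private final Map<String, Entry> entries = new HashMap<>();

    /** Case-insensitive key such as "ORACLE:VARCHAR2". */
    private static String key(String product, String nativeType) {
        return product.toUpperCase(Locale.ROOT) + ":" + nativeType.toUpperCase(Locale.ROOT);
    }

    /** Step 3: register one mapping gathered for a newly supported product. */
    void register(String product, String nativeType, GeneralType type, String characteristics) {
        entries.put(key(product, nativeType), new Entry(type, characteristics));
    }

    /** Looks up the general type of a native type; empty if the product or type is unregistered. */
    Optional<GeneralType> lookup(String product, String nativeType) {
        return Optional.ofNullable(entries.get(key(product, nativeType))).map(Entry::generalType);
    }
}
```

Supporting another product then amounts to a series of `register` calls, which is the extensibility argument made in benefit 5.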
III. XML-based data integration system for heterogeneous relational databases
Referring to Fig. 2, which describes the same computer program in terms of its device structure, the XML-based data integration system for heterogeneous relational databases is built from six functional modules: the integration task manager, the data source manager, the query decomposition optimizer, the data extractor, the result integrator, and the result persister.
1. Integration task manager
The integration task manager collects the global query requests and persistence parameters submitted by the user, checks their validity, dispatches integration tasks, monitors the integration process, and reports the integration results. It is implemented as follows:
1) Determine the representation of query requests. The representation must:
(1) express each data table involved in the integration accurately and unambiguously;
(2) clearly express the semantics of SQL. Fig. 4 to Fig. 6 show one embodiment of such a representation.
2) Create the user's visual interface, which has four parts: global query request configuration, persistence parameter setting, integration scheme execution, and result report.
The global query request user interface must:
(1) display a unified view of the data source table schemas;
(2) be able to express the semantics of SQL;
(3) provide the information needed for reference and selection, and be user-friendly.
Referring to Fig. 7, which shows one design of the global query request configuration interface: on the left is the data source view, which shows the available data sources and allows data sources to be added, removed, and refreshed. On the right is the query configuration view, divided into six sub-areas, each corresponding to one clause of an SQL query. Items can be added to each sub-area; an item contains the data tables and column information available for the user to choose, operators, and fields for user input. At the bottom is the result view, where the query request can be created and its save location configured.
The persistence parameter setting interface collects the parameters for persisting the query result, including: target data source information, target data table information, the mapping between query result attributes and target table attributes, and the information of the data import exception table. A query request must be selected before the persistence parameters are collected, and the target table attributes in the persistence parameters must correspond to the attributes of the query result in number and type. The save location of the integration scheme can be configured on this interface, and a new data table can also be created in a data source through it.
The integration scheme execution interface provides the following functions: selection of an integration scheme, feedback on the execution state of the scheme, and prompts for errors and exceptions.
The result report interface displays how the integration scheme was executed and shows the statistical report of the integrated result data.
3) Build the communication interfaces with the data source manager, the query decomposition optimizer, the data extractor, the result integrator, and the result persister, and determine the semantics of message and data transfer and the way errors and exceptions are handled.
4) Implement the business processing flow:
(1) The global query request configuration interface interacts with the user and communicates with the data source manager to add, remove, and refresh data sources.
(2) The global query request configuration interface interacts with the user and communicates with the data source manager to obtain the global query information, generates the global query request configuration document, and checks it lexically and syntactically; if an error is found, it is reported to the user for correction.
(3) The persistence parameter setting interface interacts with the user, communicates with the data source manager, obtains the query request configuration document and the persistence parameters, and generates the integration scheme document. If the target data table does not yet exist, it can be created with the table builder before the persistence parameters are specified. The table builder proceeds as follows: read the user's table-creation parameters; convert them into DDL code; exchange data with the data source manager to obtain the data source information; open a database connection and execute the table-creation code; catch any exceptions and report the table-creation result.
(4) The user interacts with the integration scheme execution interface to obtain an integration scheme document. After the integration command is issued, the global query request is obtained and passed to the query decomposition optimizer. Communication links are established with the query decomposition optimizer, the data extractor, and the result integrator in order to monitor the execution state, collect exception messages and statistics, write logs, and report the execution status to the user. The persistence parameters are then read, the persistence parameters and the integrated result data are passed to the result persister, and the persistence state is monitored: exceptions and statistics are collected, logs are written, and the persistence status is reported to the user.
(5) After persistence completes, collect the statistics of the integration process, and generate and display the statistical report of the integrated result data.
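The rule that the target table attributes must correspond to the query result attributes in number and type lends itself to a check before the integration scheme document is generated. The Java sketch below is hypothetical: it assumes the mapping is positional (the i-th result attribute goes to the i-th target column) and requires exact equality of general types, because the description does not spell out the compatible-conversion conditions.

```java
import java.util.ArrayList;
import java.util.List;

/** The four general data types named in the description. */
enum GeneralType { CHAR, NUMBER, DATE, BOOLEAN }

/** A named attribute together with its general data type. */
record Attribute(String name, GeneralType type) {}

/** Hypothetical validity check of the persistence parameters. */
final class PersistenceParameterCheck {
    private PersistenceParameterCheck() {}

    /** Returns a list of problems; an empty list means the mapping is acceptable. */
    static List<String> validate(List<Attribute> queryResult, List<Attribute> targetColumns) {
        List<String> problems = new ArrayList<>();
        if (queryResult.size() != targetColumns.size()) {
            problems.add("column count differs: query result has " + queryResult.size()
                    + ", target mapping has " + targetColumns.size());
            return problems;
        }
        // Positional mapping: compare the general type of each pair.
        for (int i = 0; i < queryResult.size(); i++) {
            Attribute src = queryResult.get(i);
            Attribute dst = targetColumns.get(i);
            if (src.type() != dst.type()) {
                problems.add(src.name() + " (" + src.type() + ") -> "
                        + dst.name() + " (" + dst.type() + ")");
            }
        }
        return problems;
    }
}
```

Returning a list of problems rather than throwing on the first one lets the interface report every mismatch to the user at once.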
2. Data source manager
Referring to Fig. 3, which shows the internal structure of the data source manager: the data source manager provides dynamic data source management, handles the mapping of relational tables, and provides the user with a global view of the data sources. It is implemented as follows:
1) Set up the data source dictionary, which stores the data sources involved in the integration and information on their data tables. The data source information includes: the database product type, version, database name, database IP address, database service port, and a login user name and password with permission to operate the database through JDBC. The data table information includes: table schema, table name, and the list of fields with their data types. The data source manager maintains this dictionary while the system runs; adding, removing, and refreshing a data source update the corresponding information in the dictionary.
2) Create four general data types: CHAR, NUMBER, DATE, and BOOLEAN. Each general type carries metadata describing its characteristics, its compatibility with the other general types, and the conditions under which a compatible conversion is allowed.
3) Create the data type dictionary, which stores the mapping between database data types and general data types. When a new database product is added to the system, the corresponding mapping entries must be added to the data type dictionary.
4) Build the global view of the data sources, which shows the data sources, data tables, and attribute columns registered in the data source dictionary, together with the general data type corresponding to the data type of each attribute column. Every data source, data table, and attribute column registered in the data source dictionary is visible in the global view.
5) Build the metadata extraction engine, which comprises a common data access interface and a general metadata extraction method. The engine accepts the user's operation command together with the data source information and table names, opens a JDBC connection, verifies that the data source is available, reads the attribute columns and their data types from the data source, and synchronizes this information into the data source dictionary.
3. Query decomposition optimizer
The query decomposition optimizer decomposes a query request into several sub-query requests and global query metadata according to the data mapping relations, optimizes the sub-query requests, and generates sub-query plans. It works as follows:
1) Receive the query request and generate a relational algebra expression tree. The tables in the FROM clause become the leaf nodes of the tree. The leaf nodes are merged using the join conditions in the WHERE clause: join conditions between tables of the same database are merged first, producing a forest of binary trees, and the join conditions between tables of different databases then merge this forest into a single binary tree. If the join conditions are not sufficient to produce a single binary tree, the remaining trees are merged with Cartesian products. Projections, aggregations, and ordinary selection conditions are all placed at the root, awaiting optimization.
2) Simplify the relational algebra expression tree by pushing selections, projections, and aggregations down the tree.
3) Generate the query plans and query metadata. After the push-down optimizations, traverse the tree and find every node such that the subtree rooted at it involves only one database while the subtree rooted at its parent does not. The relational algebra expressed by each such subtree is converted into an SQL query plan. The conditions in the WHERE clause that join tables of different databases become global query metadata.
4) Send the resulting sub-query plans to the data extractor.
5) Send the resulting global query metadata to the result integrator.
6) Send progress, exceptions, and errors to the integration task manager.
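The core of steps 1) to 3), separating work that can run inside a single database from cross-database joins, can be illustrated with a deliberately simplified Java sketch. It assumes equi-join conditions only and groups all FROM-clause tables of one database into a single sub-query; it does not build the full relational algebra tree or perform selection, projection, and aggregation push-down. All type names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Equi-join condition between two database-qualified columns, e.g. db1.emp.dept_id = db1.dept.id. */
record JoinCondition(String left, String right) {
    /** The database identifier is the first segment of a qualified name. */
    static String databaseOf(String qualified) {
        return qualified.substring(0, qualified.indexOf('.'));
    }

    boolean isLocal() {
        return databaseOf(left).equals(databaseOf(right));
    }
}

/** A sub-query plan: the tables and join conditions that stay inside one database. */
record SubQueryPlan(String database, List<String> tables, List<JoinCondition> localJoins) {
    /** Drops the database prefix, since the sub-query runs inside that database. */
    static String strip(String qualified) {
        return qualified.substring(qualified.indexOf('.') + 1);
    }

    /** Simplified SELECT * text; projection and selection push-down are omitted. */
    String toSql() {
        String sql = "SELECT * FROM "
                + tables.stream().map(SubQueryPlan::strip).collect(Collectors.joining(", "));
        if (!localJoins.isEmpty()) {
            sql += " WHERE " + localJoins.stream()
                    .map(j -> strip(j.left()) + " = " + strip(j.right()))
                    .collect(Collectors.joining(" AND "));
        }
        return sql;
    }
}

final class QueryDecomposer {
    /** Sub-query plans keyed by database, plus cross-database joins kept as global metadata. */
    record Result(Map<String, SubQueryPlan> subQueries, List<JoinCondition> globalJoins) {}

    private QueryDecomposer() {}

    static Result decompose(List<String> fromTables, List<JoinCondition> joins) {
        // Group the FROM-clause tables (the leaf nodes) by the database that owns them.
        Map<String, List<String>> tablesByDb = new LinkedHashMap<>();
        for (String table : fromTables) {
            tablesByDb.computeIfAbsent(JoinCondition.databaseOf(table), k -> new ArrayList<>()).add(table);
        }
        // Same-database joins go into that database's sub-query; the rest become global metadata.
        Map<String, List<JoinCondition>> localByDb = new HashMap<>();
        List<JoinCondition> global = new ArrayList<>();
        for (JoinCondition join : joins) {
            if (join.isLocal()) {
                localByDb.computeIfAbsent(JoinCondition.databaseOf(join.left()), k -> new ArrayList<>()).add(join);
            } else {
                global.add(join);
            }
        }
        Map<String, SubQueryPlan> plans = new LinkedHashMap<>();
        tablesByDb.forEach((db, tables) ->
                plans.put(db, new SubQueryPlan(db, tables, localByDb.getOrDefault(db, List.of()))));
        return new Result(plans, global);
    }
}
```

For tables `db1.emp`, `db1.dept` and `db2.salary`, the join `db1.emp.dept_id = db1.dept.id` is pushed into the `db1` sub-query, while `db1.emp.id = db2.salary.emp_id` is left for the result integrator.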
4. Data extractor
The main function of the data extractor is to receive the sub-query plans and assign them to data extraction engines. Each data extraction engine executes one sub-query plan, issuing an SQL query to the underlying data source to extract the data and converting the query result into intermediate-result XML data. The data extractor is implemented as follows:
1) Determine the representation of intermediate-result XML data, which has the following characteristics:
(1) it expresses the structure of the data table;
(2) it expresses the data records;
(3) it expresses the size (number of records) of the data table;
(4) its structure is clear and simple, it is unambiguous, it contains little redundant data, and it is easy to parse and generate. The intermediate-result XML data can be defined as two XML documents, a table structure document and a table data document; each XML table data document references exactly one XML table structure document.
Referring to Fig. 8, which shows one implementation: the table data document records the data size of the document and the information of each attribute column of every record, where the information of each attribute column comprises the name of the referenced table structure document, the column number, and the data. The table structure document records its own name, the data source information, and, for each attribute column, its number, name, and data type, together with the general data type it maps to and that type's attributes.
2) Implement virtual file storage by setting up a management mechanism for very large files. The Java File class can be rewritten so that a file is logically a single file but physically a group of files. Each file in the group is kept below the file size limit of the file system, which prevents file overflow.
3) Set up a memory buffering mechanism and generate XML documents with a stream-based output method.
4) Build the data extraction engine, which comprises a common data access interface and a general data extraction method.
5) Implement the data extraction flow:
(1) Receive the sub-query plan.
(2) Communicate with the data source manager to obtain the metadata of the sub-query.
(3) Open a JDBC connection, obtain the result set of the sub-query, and convert the result set into intermediate-result XML data.
(4) Pass the intermediate-result XML data to the result integrator.
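Stream-based generation of an intermediate-result table data document can be sketched with the StAX `XMLStreamWriter` API from the Java SE `javax.xml.stream` package. The element and attribute names (`tableData`, `record`, `column`, `scale`) are hypothetical stand-ins, since the actual format is only shown in Fig. 8; rows are passed as string arrays rather than read from a JDBC `ResultSet` to keep the example self-contained.

```java
import java.io.Writer;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/**
 * Streams rows into a hypothetical "table data" XML document that references its
 * table structure document by name. Element and attribute names are illustrative.
 */
final class TableDataWriter {
    private TableDataWriter() {}

    static void write(Writer out, String structureDoc, Iterable<String[]> rows) throws XMLStreamException {
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        long count = 0;
        xml.writeStartDocument();
        xml.writeStartElement("tableData");
        xml.writeAttribute("structure", structureDoc);
        for (String[] row : rows) {
            xml.writeStartElement("record");
            for (int col = 0; col < row.length; col++) {
                xml.writeStartElement("column");
                xml.writeAttribute("no", Integer.toString(col + 1));
                if (row[col] == null) {
                    xml.writeAttribute("null", "true"); // SQL NULL
                } else {
                    xml.writeCharacters(row[col]); // escapes markup characters
                }
                xml.writeEndElement();
            }
            xml.writeEndElement();
            count++;
        }
        // The row count is only known once streaming ends, so it is written last.
        xml.writeEmptyElement("scale");
        xml.writeAttribute("rows", Long.toString(count));
        xml.writeEndElement();
        xml.writeEndDocument();
        xml.flush();
        xml.close(); // does not close the underlying Writer
    }
}
```

In the extraction flow, the rows would come from iterating over the JDBC `ResultSet` of the sub-query, so the whole result never has to be held in memory.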
5. Result integrator
The result integrator receives the intermediate-result XML data and the global query metadata, integrates the intermediate results with the relational algebra engine, and generates the final result of the global query. It is implemented as follows:
1) Build the sort engine
Input: an XML table data document, the numbers of the XML table structure and of the key attribute column to sort on, the general data type of that column, and an ascending/descending order parameter. Output: sorted intermediate-result XML data. Implementation:
(1) Set up the data structure for data records in memory.
(2) According to the system configuration, define the maximum number of records held for in-memory sorting.
(3) Set up a memory buffering mechanism and use the StAX tool to parse the XML file. Generate XML documents with a stream-based output method.
(4) Read the data in batches, sort each batch in memory with merge sort, and write it to an external storage file.
(5) Perform an external merge sort to obtain the sorted result.
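The batch-then-merge procedure of steps (4) and (5) is a classic external merge sort. The Java sketch below shows that technique under simplifying assumptions: records are single text lines compared with a caller-supplied `Comparator`, rather than XML records parsed with StAX, and the class name `ExternalSorter` is hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Minimal external merge sort over text records, one record per line. Batches of at
 * most maxInMemory records are sorted in memory and spilled to temporary run files,
 * which are then merged into the output file.
 */
final class ExternalSorter {
    private ExternalSorter() {}

    static void sort(Iterable<String> records, int maxInMemory, Comparator<String> order, Path output)
            throws IOException {
        List<Path> runs = new ArrayList<>();
        List<String> batch = new ArrayList<>();
        for (String record : records) {
            batch.add(record);
            if (batch.size() == maxInMemory) {
                runs.add(spill(batch, order));
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            runs.add(spill(batch, order));
        }
        merge(runs, order, output);
    }

    /** Internal sort of one batch (List.sort is a stable merge sort), written to a run file. */
    private static Path spill(List<String> batch, Comparator<String> order) throws IOException {
        batch.sort(order);
        Path run = Files.createTempFile("sort-run-", ".txt");
        Files.write(run, batch);
        return run;
    }

    /** k-way merge: the queue holds the current head line of every run. */
    private static void merge(List<Path> runs, Comparator<String> order, Path output) throws IOException {
        record Head(String line, BufferedReader reader) {}
        PriorityQueue<Head> queue = new PriorityQueue<>((a, b) -> order.compare(a.line(), b.line()));
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter out = Files.newBufferedWriter(output)) {
            for (Path run : runs) {
                BufferedReader reader = Files.newBufferedReader(run);
                readers.add(reader);
                String first = reader.readLine();
                if (first != null) queue.add(new Head(first, reader));
            }
            while (!queue.isEmpty()) {
                Head head = queue.poll();
                out.write(head.line());
                out.newLine();
                String next = head.reader().readLine();
                if (next != null) queue.add(new Head(next, head.reader()));
            }
        } finally {
            for (BufferedReader reader : readers) reader.close();
            for (Path run : runs) Files.deleteIfExists(run);
        }
    }
}
```

The ascending/descending parameter maps onto passing `order` or `order.reversed()`. A real engine would derive the comparator from the general data type of the key column, so that, for example, NUMBER keys are not compared as strings.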
2) Build the natural join engine
Input: the two XML table data documents to be joined, the numbers of the join attribute columns, and flags indicating whether each input is already sorted. Output: intermediate-result XML data after the tables are joined. Implementation:
(1) If an XML table data document is not sorted on its join attribute column, call the sort engine to sort it on that column;
(2) Set up the data structure for data records in memory;
(3) Parse the two XML table data documents in sorted order with StAX; whenever the join attribute columns have equal values, generate a joined record and output it with the stream-based method. Continue until parsing of the documents is complete, which yields the join result.
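Step (3) describes a sort-merge join. A hypothetical Java sketch over in-memory rows is given below; it pairs every record in a run of equal keys on one side with every record in the matching run on the other side, so duplicate key values are joined correctly. Rows are string arrays, keys are compared with `String.compareTo` (which must match the order produced by the sort engine), and the duplicate join column of the right input is dropped, as in a natural join.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sort-merge natural join on one pair of key columns. Both inputs must already be
 * sorted on their key column in String.compareTo order.
 */
final class MergeJoin {
    private MergeJoin() {}

    static List<String[]> join(List<String[]> left, int leftKey, List<String[]> right, int rightKey) {
        List<String[]> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i)[leftKey].compareTo(right.get(j)[rightKey]);
            if (cmp < 0) {
                i++;
            } else if (cmp > 0) {
                j++;
            } else {
                // Equal keys: find the run of this key value on each side.
                String key = left.get(i)[leftKey];
                int iEnd = i;
                int jEnd = j;
                while (iEnd < left.size() && left.get(iEnd)[leftKey].equals(key)) iEnd++;
                while (jEnd < right.size() && right.get(jEnd)[rightKey].equals(key)) jEnd++;
                // Every pairing of the two runs is a result row.
                for (int a = i; a < iEnd; a++) {
                    for (int b = j; b < jEnd; b++) {
                        out.add(combine(left.get(a), right.get(b), rightKey));
                    }
                }
                i = iEnd;
                j = jEnd;
            }
        }
        return out;
    }

    /** Left row followed by the right row without its duplicate join column. */
    private static String[] combine(String[] l, String[] r, int skip) {
        String[] row = new String[l.length + r.length - 1];
        System.arraycopy(l, 0, row, 0, l.length);
        int k = l.length;
        for (int c = 0; c < r.length; c++) {
            if (c != skip) row[k++] = r[c];
        }
        return row;
    }
}
```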
3) Build the Cartesian product engine
Input: the two XML table data documents whose Cartesian product is to be computed. Output: intermediate-result XML data after the Cartesian product. Implementation:
(1) Set up the data structure for data records in memory.
(2) Parse one of the XML table data documents with StAX; for each record Ri, traverse every record Rj of the other XML table data document and output the new record (Ri, Rj) with the stream-based method.
(3) When the table data document has been fully parsed, the Cartesian product result is obtained.
4) Build the aggregation engine
Input: the table data document to be aggregated, the GROUP BY clause, the HAVING clause, and the aggregation operator types and attribute columns. Implementation:
(1) Set up the data structure for data records in memory.
(2) According to the system configuration, define the maximum number of records held in memory.
(3) Call the sort engine on the attribute columns in the GROUP BY clause.
(4) Parse the table data document with StAX and scan the records in order. Each time the grouping attribute columns reach a new value, a new group is started. If the records of a group exceed the in-memory record limit, all records of the group are written to external storage as a stream, and processing continues with the next record until the scan ends. The records of each group are then scanned once to compute the aggregate, yielding one record per group, which is screened with the HAVING condition before it is output. When all groups have been scanned, the aggregation result is obtained.
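Because the input is sorted on the GROUP BY columns, groups can be formed in one sequential scan. The hypothetical Java sketch below computes COUNT and SUM per group and applies a HAVING predicate to each finished group. It keeps only running totals, so it needs no spill to external storage; holding a group's records, as described in step (4), is only necessary for aggregates that need all values of the group at once, such as a median.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** One output row of the aggregation: group value, row count and sum of the measure column. */
record GroupResult(String group, long count, double sum) {}

final class SortedAggregator {
    private SortedAggregator() {}

    /** Rows must already be sorted on groupCol; having filters each finished group. */
    static List<GroupResult> sumByGroup(List<String[]> sortedRows, int groupCol, int measureCol,
                                        Predicate<GroupResult> having) {
        List<GroupResult> out = new ArrayList<>();
        String current = null;
        long count = 0;
        double sum = 0;
        for (String[] row : sortedRows) {
            // A new group value closes the previous group.
            if (current != null && !current.equals(row[groupCol])) {
                emit(out, new GroupResult(current, count, sum), having);
                count = 0;
                sum = 0;
            }
            current = row[groupCol];
            count++;
            sum += Double.parseDouble(row[measureCol]);
        }
        if (current != null) {
            emit(out, new GroupResult(current, count, sum), having);
        }
        return out;
    }

    private static void emit(List<GroupResult> out, GroupResult g, Predicate<GroupResult> having) {
        if (having.test(g)) out.add(g);
    }
}
```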
5) Implement the data integration flow
(1) Obtain the query metadata and load it into in-memory data structures.
(2) Determine the strategy for choosing the order in which tables are joined. A heuristic rule can be used: at each step, join the two tables whose data sizes have the smallest sum.
(3) According to the metadata of the query result, determine the relational algebra operations to perform, and call the relational algebra engines to process and integrate the intermediate results.
(4) Send progress, exceptions, and errors to the integration task manager, and pass the integrated result data to the integration task manager.
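The heuristic in step (2) can be sketched as a greedy loop in Java. The class and the row-size estimator are assumptions: the description does not say how the size of a join result is estimated, so the caller supplies `estimateRows`, and the sketch does not check whether a join condition exists between the chosen pair (if none exists, that step is a Cartesian product).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongBinaryOperator;

/**
 * Greedy join ordering: repeatedly pick the two inputs whose sizes (row counts) have
 * the smallest sum, join them, and put the estimated result back into the pool.
 */
final class JoinOrderPlanner {
    record Input(String name, long rows) {}

    private JoinOrderPlanner() {}

    /** Returns the join steps in execution order, e.g. "(B JOIN C)". */
    static List<String> plan(List<Input> tables, LongBinaryOperator estimateRows) {
        List<Input> pool = new ArrayList<>(tables);
        List<String> steps = new ArrayList<>();
        while (pool.size() > 1) {
            int bestA = 0;
            int bestB = 1;
            for (int a = 0; a < pool.size(); a++) {
                for (int b = a + 1; b < pool.size(); b++) {
                    if (pool.get(a).rows() + pool.get(b).rows()
                            < pool.get(bestA).rows() + pool.get(bestB).rows()) {
                        bestA = a;
                        bestB = b;
                    }
                }
            }
            Input x = pool.get(bestA);
            Input y = pool.get(bestB);
            Input joined = new Input("(" + x.name() + " JOIN " + y.name() + ")",
                    estimateRows.applyAsLong(x.rows(), y.rows()));
            pool.remove(bestB); // remove the higher index first so bestA stays valid
            pool.remove(bestA);
            pool.add(joined);
            steps.add(joined.name());
        }
        return steps;
    }
}
```

With inputs A (100 rows), B (10), C (20) and D (1000), and `Math::max` as a stand-in estimator, the plan joins B with C first, then A, and D last.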
6. Result persister
The result persister imports the integrated result data into an existing database. It is implemented as follows:
1) Build the data transformation engine. It traverses every record of the query result and, according to the correspondence between the query result attribute columns and the attribute columns of the target table, generates an SQL INSERT statement script.
2) Build the data import engine, which comprises a common data import interface and general data import methods. It reads the persistence information and the data source information, opens a JDBC connection, executes the SQL INSERT script, catches exceptions during the import, and tries to insert the rows that failed into the exception table.
3) Implement the data persistence flow: obtain the persistence parameters and the integrated result data, call the data transformation engine to convert the result data into an SQL script, and call the data import engine to execute the script in batches. Send the import state and any exceptions to the integration task manager.
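A hedged Java sketch of the transformation and import engines: it builds one INSERT statement from the column mapping and executes it in JDBC batches with `PreparedStatement.addBatch` and `executeBatch`. Using a parameterised statement instead of a literal SQL script is a substitution made here to avoid quoting and escaping problems; the class name, the positional mapping and the batch size are hypothetical, and routing failed rows to the exception table is not shown.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical result persister: the i-th result column is written to targetColumns.get(i). */
final class ResultPersister {
    private ResultPersister() {}

    /** Builds a parameterised INSERT for the target table and its mapped columns. */
    static String insertSql(String targetTable, List<String> targetColumns) {
        String cols = String.join(", ", targetColumns);
        String marks = targetColumns.stream().map(c -> "?").collect(Collectors.joining(", "));
        return "INSERT INTO " + targetTable + " (" + cols + ") VALUES (" + marks + ")";
    }

    /** Sends the rows in JDBC batches of batchSize; returns the number of rows sent. */
    static int importRows(Connection conn, String sql, List<Object[]> rows, int batchSize)
            throws SQLException {
        int sent = 0;
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (Object[] row : rows) {
                for (int c = 0; c < row.length; c++) {
                    ps.setObject(c + 1, row[c]); // JDBC parameters are 1-based
                }
                ps.addBatch();
                if (++sent % batchSize == 0) {
                    ps.executeBatch();
                }
            }
            ps.executeBatch(); // flush the last, partial batch
        }
        return sent;
    }
}
```

If a batch fails, the driver reports it with a `BatchUpdateException`; the rows of that batch could then be retried one by one and the failing ones written to the exception table named in the persistence parameters.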