CN105335412A - Method and device for data conversion and data migration - Google Patents

Publication number
CN105335412A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410375147.9A
Other languages
Chinese (zh)
Other versions
CN105335412B (en)
Inventor
史策
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201410375147.9A
Publication of CN105335412A
Application granted
Publication of CN105335412B
Legal status: Active
Anticipated expiration

Abstract

The invention discloses a method and device for data conversion, and a method and device for data migration. The data conversion method comprises: reading, from a pre-generated configuration file, a data conversion rule described in a domain-specific language, the rule specifying the conversion operations to perform in order to generate target data from the data to be converted; parsing the data conversion rule and generating a corresponding data conversion algorithm; and reading the data to be converted and converting it according to the data conversion algorithm. The method allows data conversion code to be reused, which reduces developers' workload. Because the rule is described in a domain-specific language that is simple, easy to understand, and cheap to learn, ordinary users who carry out data conversion can configure or modify the rules in the configuration file themselves.

Description

Method and apparatus for data conversion and data migration
Technical field
The application relates to data processing technology, and specifically provides a method and apparatus for data conversion. The application also provides a method and apparatus for data migration.
Background
With the development of the Internet, data such as social data, online transaction records, and social media content keeps growing, and many enterprises now hold massive amounts of customer information. The volume of data that computers must store and process is growing exponentially, and the data types are also becoming more complex. The data is usually stored in various data sources: some in traditional relational databases such as MySQL and Oracle; some in NoSQL databases, which have flourished in recent years, such as HBase and Redis; and some in file systems, such as HDFS (Hadoop Distributed File System), the distributed file system of the Hadoop cluster framework.
Because of needs such as system backup, upgrades, data exchange, and data sharing, data sometimes has to be migrated between these data sources, that is, exported from a source data source and imported into a destination data source. The data models of the data sources differ: for example, a relational database organizes data differently from a NoSQL database, and two relational databases may have different table structures. Either case requires a data conversion operation during export and import, so that the data read from the source is converted into data that conforms to the destination's data model.
In the prior art there are usually two solutions to the data conversion requirement in the data export process. One is case by case: for each migration requirement, code is written to export, convert, and import the data according to the data models of the specific data source and data destination. The other is to migrate with the DataX tool, using the reader plug-ins that DataX provides to read data from the data source and convert it, and the writer plug-ins that write data to the data destination, to migrate data between different data sources.
In practice, both solutions have defects:
Solution one: if data is migrated between multiple data sources case by case, every added data source requires migration code to be written between it and each of the other data sources. For example, to migrate data between one new data source and 3 existing data sources, developers have to write three sets of code, one per existing data source. Code written this way cannot be reused, and whenever a data conversion rule changes the code has to be modified and recompiled. Developers perform a great deal of repetitive work, which raises development cost and makes the code hard to maintain.
Solution two: using the plug-ins provided by the DataX framework reduces developers' workload to some extent, but some of the existing DataX data source readers provide no data conversion function, so developers still have to write the relevant code themselves within the DataX framework. For example, the "aerial ladder" reader provides no data conversion capability, and developers have to compute the data to be imported through map/reduce tasks. In addition, users of the DataX plug-ins have to learn how to configure them. Reader plug-ins in particular usually configure data conversion rules with SQL statements, which are costly to learn. When a data conversion rule changes, the ordinary users responsible for migrating business data usually cannot modify the configuration file themselves, and developers still have to be involved.
Summary of the invention
The application provides a method and apparatus for data conversion, to solve the problems that existing data conversion code cannot be reused and that configuring data conversion rules is complicated. The application additionally provides a method and apparatus for data migration.
The application provides a method for data conversion, comprising:
Reading, from a configuration file generated in advance, a data conversion rule described in a domain-specific language, the data conversion rule specifying the conversion operations required to generate target data from the data to be converted;
Parsing the data conversion rule and generating a data conversion algorithm corresponding to the data conversion rule;
Reading the data to be converted, and converting the data to be converted according to the data conversion algorithm.
Optionally, the generated data conversion algorithm corresponding to the data conversion rule is stored as an abstract syntax tree.
Optionally, the syntax rules of the domain-specific language used to describe the data conversion rule comprise:
A data conversion rule consists of one or more data branch statements, and adjacent statements are separated by a specific separator; and,
Each data branch statement consists of a destination field name and a conversion expression joined by an equals sign; and,
A destination field name cannot contain the equals sign or the specific separator; and,
A conversion expression takes one of the following forms: a constant, a built-in variable, or constants and/or built-in variables joined by operators; and,
The name of a built-in variable consists of a preset letter, indicating that the variable's value comes from the data to be converted, and the subscript of the specific field in the data to be converted that corresponds to the variable.
Optionally, the operators in the syntax rules comprise one or more of the addition, subtraction, multiplication, division, and modulo operators.
Optionally, the syntax rules further comprise:
The supported data types comprise the long integer type long, the double-precision floating-point type double, the date type Date, and the character string type String, and constants of these types are written in the corresponding literal notation;
The supported data types further comprise the large numeric types BigInteger and BigDecimal, and constants of these types are written using explicit type conversion.
Optionally, the syntax rules further comprise:
Built-in variables may be written using explicit type conversion.
Correspondingly, the application also provides a device for data conversion, comprising:
A rule reading unit, configured to read, from a configuration file generated in advance, a data conversion rule described in a domain-specific language;
An algorithm generation unit, configured to parse the data conversion rule and generate a data conversion algorithm corresponding to the data conversion rule;
A data conversion unit, configured to read the data to be converted and convert it according to the data conversion algorithm.
Optionally, the data conversion algorithm generated by the algorithm generation unit is stored as an abstract syntax tree.
In addition, the application provides a method for data migration, comprising:
Reading data to be migrated from a data source;
Converting the data to be migrated, according to an already generated data conversion algorithm, into data that conforms to the data model of the data destination, the data conversion algorithm being generated from a data conversion rule that is described in a domain-specific language in a configuration file generated in advance;
Writing the converted data to the data destination.
Optionally, before the data to be migrated is converted into data that conforms to the data model of the data destination, the following operations are performed:
Reading the data conversion rule described in the domain-specific language from the configuration file generated in advance;
Parsing the data conversion rule and generating the data conversion algorithm corresponding to the data conversion rule.
Optionally, the generated data conversion algorithm corresponding to the data conversion rule is stored as an abstract syntax tree.
Optionally, the syntax rules of the domain-specific language used to describe the data conversion rule comprise:
A data conversion rule consists of one or more data branch statements, and adjacent statements are separated by a specific separator; and,
Each data branch statement consists of a destination field name and a conversion expression joined by an equals sign; and,
A destination field name cannot contain the equals sign or the specific separator; and,
A conversion expression takes one of the following forms: a constant, a built-in variable, or constants and/or built-in variables joined by operators; and,
The name of a built-in variable consists of a preset letter, indicating that the variable's value comes from the data to be migrated, and the subscript of the specific field in the data to be migrated that corresponds to the variable.
Optionally, the syntax rules further comprise:
The supported operators comprise one or more of the addition, subtraction, multiplication, division, and modulo operators; and/or,
The supported data types comprise the long integer type long, the double-precision floating-point type double, the date type Date, and the character string type String, and constants of these types are written in the corresponding literal notation; and/or,
The supported data types further comprise the large numeric types BigInteger and BigDecimal, and constants of these types are written using explicit type conversion; and/or,
Built-in variables may be written using explicit type conversion.
Optionally, the method further comprises:
Storing the data to be migrated that is read from the data source in a data buffer;
Correspondingly, converting the data to be migrated into data that conforms to the data model of the data destination means reading the data to be migrated from the data buffer and converting it into data that conforms to the data model of the data destination.
Optionally, before the data to be migrated is read from the data buffer and converted into data that conforms to the data model of the data destination, the following step is performed:
Creating a corresponding number of worker threads according to the concurrency number specified in the configuration file;
Correspondingly, reading the data to be migrated from the data buffer and converting it into data that conforms to the data model of the data destination means that the created worker threads perform the reading and conversion operations concurrently.
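The buffered, concurrent conversion described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `run_workers` and `convert` are hypothetical, the conversion is a hard-coded stand-in for a rule such as "name=F1; sum=F2+F3", and the buffer is filled before the workers start, whereas a real migration would fill it while reading from the data source.

```python
import queue
import threading


def convert(record):
    # Hypothetical stand-in for the generated conversion algorithm:
    # name <- F1, sum <- (Long) F2 + (Long) F3.
    return {"name": record[1], "sum": int(record[2]) + int(record[3])}


def run_workers(records, concurrency):
    """Convert buffered records with `concurrency` worker threads."""
    buffer = queue.Queue()          # data buffer holding data to be migrated
    for record in records:
        buffer.put(record)

    results = []
    lock = threading.Lock()         # protects the shared result list

    def worker():
        while True:
            try:
                record = buffer.get_nowait()
            except queue.Empty:
                return              # buffer drained, worker exits
            converted = convert(record)
            with lock:
                results.append(converted)

    # The thread count corresponds to the concurrency number from the config.
    threads = [threading.Thread(target=worker) for _ in range(concurrency)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Because the workers finish in no fixed order, the converted records are not guaranteed to come out in source order.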
Optionally, the method comprises:
Judging whether the size of the converted data is greater than or equal to the batch import threshold specified in the configuration file;
If so, performing the step of writing the converted data to the data destination;
If not, caching the converted data.
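A minimal sketch of the batch import check, assuming that the "size" of the converted data is measured as a row count and that the destination is reached through a caller-supplied function. The class name `BatchWriter` and the final `flush` call for leftover rows are illustrative additions, not part of the described method.

```python
class BatchWriter:
    """Cache converted rows and write them once the threshold is reached."""

    def __init__(self, threshold, write_batch):
        self.threshold = threshold      # batch import threshold from the config
        self.write_batch = write_batch  # callable that writes a list of rows
        self.pending = []               # cache of converted rows

    def add(self, row):
        self.pending.append(row)
        if len(self.pending) >= self.threshold:
            self.flush()                # threshold reached: write to destination

    def flush(self):
        if self.pending:
            self.write_batch(self.pending)
            self.pending = []           # start a new cache list
```

For example, with a threshold of 3 and seven rows added, two batches of three rows are written and one row stays cached until `flush` is called.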
Optionally, the method comprises:
Monitoring whether an error occurs in the step of writing the converted data to the data destination; if an error occurs, performing the following steps:
Recording information about the error and incrementing the error count;
Judging whether the error count is less than or equal to the error logging threshold specified in the configuration file; if so, writing the information about the error to a log file.
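The error handling above amounts to counting every write error but logging only the first few. A rough sketch, with hypothetical names (`ErrorRecorder`, `write_all`) and an in-memory list standing in for the log file:

```python
class ErrorRecorder:
    """Count write errors and log them only up to a configured threshold."""

    def __init__(self, log_threshold):
        self.log_threshold = log_threshold  # error logging threshold from config
        self.error_count = 0
        self.logged = []                    # stand-in for a log file

    def record(self, row, error):
        self.error_count += 1               # every error is counted
        if self.error_count <= self.log_threshold:
            self.logged.append("write failed for %r: %s" % (row, error))


def write_all(rows, write_row, recorder):
    """Write rows one by one, recording errors instead of aborting."""
    for row in rows:
        try:
            write_row(row)
        except Exception as exc:            # monitored write step
            recorder.record(row, exc)
```

Capping the log this way keeps a badly failing migration from flooding the log file while the total error count stays accurate.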
Correspondingly, the application also provides a device for data migration, comprising:
A data reading unit, configured to read data to be migrated from a data source;
A data conversion unit, configured to convert the data to be migrated, according to an already generated data conversion algorithm, into data that conforms to the data model of the data destination, the data conversion algorithm being generated from a data conversion rule that is described in a domain-specific language in a configuration file generated in advance;
A data writing unit, configured to write the converted data to the data destination.
Optionally, the device further comprises:
A rule reading unit, configured to read the data conversion rule described in the domain-specific language from the configuration file generated in advance;
An algorithm generation unit, configured to parse the data conversion rule and generate the data conversion algorithm corresponding to the data conversion rule.
Optionally, the data conversion algorithm generated by the algorithm generation unit is stored as an abstract syntax tree.
Optionally, the device further comprises:
A data buffering unit, configured to store the data to be migrated that is read from the data source in a data buffer;
Correspondingly, the data conversion unit is specifically configured to read the data to be migrated from the data buffer and convert it into data that conforms to the data model of the data destination.
Optionally, the device further comprises:
A thread creation unit, configured to create a corresponding number of worker threads according to the concurrency number specified in the configuration file;
Correspondingly, the data reading and data conversion operations of the data conversion unit are performed concurrently by the worker threads.
Optionally, the device further comprises:
A judging unit, configured to judge whether the size of the converted data is greater than or equal to the batch import threshold specified in the configuration file; if so, to trigger the data writing unit; if not, to cache the converted data.
Optionally, the device further comprises:
An error monitoring and recording unit, configured to monitor whether an error occurs in the step of writing the converted data to the data destination; if an error occurs, to record information about the error and increment the error count, and, when the error count is less than or equal to the error logging threshold specified in the configuration file, to write the information about the error to a log file.
Compared with the prior art, the application has the following advantages:
The method for data conversion provided by the application reads a data conversion rule described in a domain-specific language from a configuration file generated in advance, parses the rule, generates the corresponding data conversion algorithm, then reads the data to be converted and converts it according to that algorithm. With this method, a different data conversion rule only needs to be described in the domain-specific language in the configuration file; no new code has to be written to complete the data conversion. Data conversion code is thus reused, which effectively reduces developers' workload. Because the rule is described in a domain-specific language that is easy to understand and cheap to learn, even ordinary users who carry out data conversion can configure or modify the data conversion rules in the configuration file.
The method for data migration provided by the application reads data to be migrated from a data source, converts it into data that conforms to the data model of the data destination according to a data conversion algorithm generated from the data conversion rule that is read from the configuration file and described in a domain-specific language, and writes the converted data to the data destination. To migrate data between data sources with different data models, only the data conversion rule in the configuration file has to be described in the domain-specific language; the code of the data conversion part does not need to be modified, so that code is reused. Because the rule is described in a custom domain-specific language that is easy to understand and cheap to learn, even the business staff responsible for data migration can modify the data conversion rules in the configuration file as needed.
Brief description of the drawings
Fig. 1 is a flowchart of an embodiment of a method for data conversion of the application;
Fig. 2 is a schematic diagram of the abstract syntax tree corresponding to a data conversion rule, as provided by an embodiment of the application;
Fig. 3 is a schematic diagram of an embodiment of a device for data conversion of the application;
Fig. 4 is a flowchart of an embodiment of a method for data migration of the application;
Fig. 5 is a schematic diagram of an embodiment of a device for data migration of the application.
Detailed description
Many specific details are set forth in the following description to provide a thorough understanding of the application. The application can, however, be implemented in many ways other than those described here, and those skilled in the art can make similar generalizations without departing from its spirit. The application is therefore not limited by the specific implementations disclosed below.
The application provides a method and apparatus for data conversion and a method and apparatus for data migration. The embodiments are described in detail one by one below.
Please refer to Fig. 1, which is a flowchart of an embodiment of a method for data conversion of the application. The method comprises the following steps:
Step 101: read a data conversion rule described in a domain-specific language from a configuration file generated in advance.
Configuration information is stored in a configuration file, and program code performs the corresponding processing according to the configuration information read from that file. This approach reduces the work of modifying and recompiling code and improves the flexibility and reusability of the code. The technical solution of this embodiment follows the same idea: data conversion rules are written into the configuration file in advance, and when a rule needs to be redefined or modified, only the configuration file has to be changed, so the data conversion code is reused.
A data conversion rule in the application specifies the conversion operations required to generate target data from data to be converted. For example, multiplying the data to be converted A by a preset constant C to obtain the target data D is a simple data conversion rule: it specifies that the conversion operation required to generate D from A is to multiply A by C. The conversion operations can include addition, subtraction, multiplication, division, modulo, data type conversion, and other operations. The multiplication in this example is only illustrative; in a specific implementation, more complex rules can be defined as needed.
In a specific implementation, the data conversion rule can be stored in configuration files of different formats, for example an ini file, a Reg file, or an xml file. Because xml is a W3C (World Wide Web Consortium) standard and offers good portability, a tree-like hierarchy that makes information easy to locate, and good extensibility, one specific example of this embodiment uses an xml file as the configuration file. The data conversion rule is described in it as follows:
Here, the part "name=F1; Chinese=F2; Math=F3; Sum=(Long) F2+ (Long) F3" is the data conversion rule described in the domain-specific language.
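As a rough illustration of step 101, the sketch below reads a rule string from an xml configuration. The element names `job` and `transform` and the function `read_rule` are hypothetical and chosen only for this example; they are not taken from the application's own configuration file, and only the rule string itself comes from the text above.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration layout; only the rule text is from the example.
CONFIG = """
<job>
  <transform>name=F1; Chinese=F2; Math=F3; Sum=(Long) F2+ (Long) F3</transform>
</job>
"""


def read_rule(xml_text):
    """Return the data conversion rule string stored in the configuration."""
    root = ET.fromstring(xml_text)
    node = root.find("transform")
    if node is None or node.text is None:
        raise ValueError("no data conversion rule in configuration")
    return node.text.strip()
```

Changing the conversion then means editing the text of the rule element rather than the program that calls `read_rule`.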
A domain-specific language, commonly abbreviated DSL (Domain Specific Language), is a language developed for a particular domain. Languages that developers normally use, such as C/C++, Java, and Python, are general-purpose languages. A general-purpose language looks at code from the standpoint of solving problems in general, can be used to program in every field, and is highly versatile. A DSL, unlike a general-purpose language, does not aim to cover all software problems; it is designed specifically for the problems of one domain, so that problems can be described and solved more quickly. A DSL usually looks at code from the standpoint of users in that domain, so its style is more user-oriented and its syntax friendlier.
As a language, every DSL has its own syntax rules. One specific example of this embodiment defines a custom DSL for describing data conversion rules, the data conversion rule configuration language, whose syntax rules are as follows:
1) A data conversion rule consists of one or more data branch statements, and adjacent statements are separated by a specific separator;
2) Each data branch statement consists of a destination field name and a conversion expression joined by an equals sign;
3) A destination field name cannot contain the equals sign or the specific separator;
4) A conversion expression takes one of the following forms: a constant, a built-in variable, or constants and/or built-in variables joined by operators;
5) The name of a built-in variable consists of a preset letter, indicating that the variable's value comes from the data to be converted, and the subscript of the specific field in the data to be converted that corresponds to the variable;
6) The operators comprise one or more of the addition, subtraction, multiplication, division, and modulo operators;
7) The supported data types comprise the long integer type long, the double-precision floating-point type double, the date type Date, and the character string type String, and constants of these types are written in the corresponding literal notation;
The supported data types further comprise the large numeric types BigInteger and BigDecimal, and constants of these types are written using explicit type conversion;
8) Built-in variables may be written using explicit type conversion.
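Rules 1) to 3) can be illustrated with a small splitter that breaks a rule into (destination field name, conversion expression) pairs. This is a simplified sketch: the function name `split_rule` is hypothetical, the separator is assumed to be a semicolon, and a string constant that itself contains the separator would need a real lexer rather than a plain split.

```python
def split_rule(rule, separator=";"):
    """Split a rule into (destination field name, conversion expression) pairs."""
    statements = []
    for part in rule.split(separator):
        part = part.strip()
        if not part:
            continue                    # tolerate a trailing separator
        # Rule 3: the field name contains no '=', so split at the first one.
        target, equals, expression = part.partition("=")
        if not equals or not target.strip() or not expression.strip():
            raise ValueError("malformed statement: %r" % part)
        statements.append((target.strip(), expression.strip()))
    return statements
```

For the rule "name=F1; sum=(Long) F2+ (Long) F3;" this yields two statements, one per destination field.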
Note that constants can be written in literal notation: for example, 99 is a long integer constant and 99.0 is a double-precision floating-point constant. For data types that have no literal notation, such as BigInteger and BigDecimal, explicit type conversion can be used, for example (BigInteger) 99. Explicit type conversion can likewise be applied to built-in variables, so that the correct data type is used for data conversion in subsequent steps.
The data conversion rule in the example xml file above follows these syntax rules. In this example, the letter indicating that a variable's value comes from the data to be converted is F. The data to be converted contains four fields, which are written with subscripts as F0, F1, F2, and F3 (by the usual convention in computing, subscripts count from 0). In the data conversion rule, "name=F1;" specifies that the value of the target data name equals the value of the second field of the data to be converted, and "sum=(Long) F2+ (Long) F3;" specifies that the value of the target data sum equals the sum of the values of the third and fourth fields, both of which are first converted to the Long type before the addition. The part to the right of the equals sign, "(Long) F2+ (Long) F3", is also called the data conversion expression.
To improve the readability of data conversion rules, the custom DSL used in the example above, the data conversion rule configuration language, also supports Java-style comments. For example, the data branch statement "sum=(Long) F2+ (Long) F3" can also be written as "sum=(Long) F2/* Chinese language */+(Long) F3/* mathematics */", which shows that F2 corresponds to the Chinese score field of the data to be converted and F3 to the mathematics score field. Adding comments in this way improves the readability of data conversion rules.
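One simple way to handle such comments is to remove them before parsing. The sketch below strips only block comments of the form /* ... */ with a regular expression; the name `strip_comments` is hypothetical, and whether the language also accepts // line comments is not stated in the text, so they are not handled here.

```python
import re

# Non-greedy match so that each /* ... */ pair is removed separately.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def strip_comments(statement):
    """Remove Java-style block comments from a data branch statement."""
    return BLOCK_COMMENT.sub("", statement)
```

A regular expression like this would also strip comment markers that appear inside a string constant, so a full implementation would more likely drop comments during lexical analysis.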
As the example above shows, describing data conversion rules in a custom DSL specifies concisely how target data is generated from the data to be converted. The notation is easy to understand, so even ordinary operators who do not know professional development techniques can configure or modify data conversion rules.
The example above is only illustrative. In a specific implementation, the syntax of the DSL for describing data conversion rules can be defined according to actual needs, and the rules are then described in the configuration file according to that syntax.
Step 102: parse the data conversion rule and generate the corresponding data conversion algorithm, stored as an abstract syntax tree.
In this step, the data conversion rule is parsed to generate the corresponding data conversion algorithm. The generated algorithm can be stored in memory in different forms; in this embodiment it is stored as an abstract syntax tree.
An abstract syntax tree (Abstract Syntax Tree, AST) is a tree representation of the abstract syntactic structure of a language or of code. It differs from a parse tree, which reflects the concrete syntactic structure: an abstract syntax tree usually omits some details of the language (such as parentheses) and uses the minimum information needed to represent the abstract syntactic structure, so it is an abstract or condensed form of the parse tree.
The parsing in this step converts the data conversion rule, described as text, into a data conversion algorithm held in an in-memory structure; that is, the data conversion algorithm corresponding to the rule is stored in memory as an abstract syntax tree.
Parsing generally comprises two processes, lexical analysis and syntax analysis:
1) Lexical analysis: the plain text stream that is read in (that is, the data conversion rule string) is broken into individual elements (also called tokens), so that the text stream becomes a token stream, and a corresponding expression object is generated for each element;
2) Syntax analysis: following the order described by the data conversion rule, the expression objects are combined into a tree structure, which yields the corresponding abstract syntax tree. What this abstract syntax tree stores is the corresponding data conversion algorithm.
In an abstract syntax tree, each leaf node represents an operand such as a constant or a variable, and each internal node represents an operator. Using an abstract syntax tree as the storage structure of the algorithm not only saves storage space but also makes the algorithm much easier to implement. Please refer to Fig. 2, which is a schematic diagram of the abstract syntax tree corresponding to the data branch statement data1=(F1+2) × F3.
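The two parsing phases can be sketched with a small hand-written recursive-descent parser. Everything here is an assumption made for illustration: the function names, the tuple-based node layout, and the restriction to numeric constants, built-in variables, the five operators, and parentheses (explicit type conversions, dates, and strings are left out). Multiplication is written as `*` instead of the × used in the figure.

```python
import re

TOKEN = re.compile(r"\d+\.\d+|\d+|F\d+|[-+*/%()]")


def tokenize(expression):
    """Lexical analysis: turn the expression text into a list of tokens."""
    tokens = TOKEN.findall(expression)
    if "".join(tokens) != "".join(expression.split()):
        raise ValueError("unexpected character in %r" % expression)
    return tokens


def parse(tokens):
    """Syntax analysis: combine tokens into a tuple-based abstract syntax tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def factor():                        # constant, variable, or ( expr )
        if peek() is None:
            raise ValueError("unexpected end of expression")
        token = advance()
        if token == "(":
            node = expr()
            if peek() != ")":
                raise ValueError("missing ')'")
            advance()
            return node                  # parentheses leave no node in the AST
        if token.startswith("F"):
            return ("var", int(token[1:]))
        if token[0].isdigit():
            return ("const", float(token) if "." in token else int(token))
        raise ValueError("unexpected token %r" % token)

    def term():                          # binds tighter: * / %
        node = factor()
        while peek() in ("*", "/", "%"):
            op = advance()
            node = (op, node, factor())
        return node

    def expr():                          # binds looser: + -
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise ValueError("trailing token %r" % peek())
    return tree
```

For the right-hand side of the statement shown in Fig. 2, `parse(tokenize("(F1+2)*F3"))` produces a tree whose root is the multiplication node, with the addition subtree and the variable F3 as its children; the parentheses themselves do not appear in the tree.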
In a specific implementation, the code that decomposes expressions and builds the AST can be written by hand, or a tool can be used to generate the required parser, for example SAG or the open-source parser generator ANTLR.
If the data conversion rule contains multiple data conversion statements, the statements can be separated at the statement separator (for example, the semicolon used as the statement separator in the above xml file). The parsing operation described in this step is then performed on each data conversion statement, generating one abstract syntax tree per statement in preparation for the data conversion operation of the subsequent step 103.
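As a small sketch of the splitting described above, assuming the semicolon is the statement separator and that empty pieces may be ignored (the function name split_statements is hypothetical):

```python
def split_statements(rule, separator=";"):
    """Split a data conversion rule into its individual statements."""
    # Whitespace around each statement is dropped; empty pieces are skipped.
    return [part.strip() for part in rule.split(separator) if part.strip()]

statements = split_statements("name=F1; chinese=F2; math=F3; sum=(Long)F2+(Long)F3")
```

Each element of statements would then be parsed into its own abstract syntax tree.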
Step 103: read the data to be converted, and convert the data to be converted according to the data conversion algorithm.
In a specific implementation, the read operation depends on how the data to be converted is stored: for example, it can be read from a data file, from a data table, or from a database. The concrete reading method can be chosen according to actual needs and is not limited by this embodiment. Once the data to be converted has been read, the data conversion operation can be performed according to the data conversion algorithm stored in the abstract syntax tree.
For the abstract syntax tree generated in step 102 there are usually two options: interpretation or compilation. Interpretation, briefly, means traversing the abstract syntax tree and performing the required operations along the way. Compilation means converting the abstract syntax tree into concrete target machine code for the machine to execute, which is usually faster. In the technical solution of this embodiment, the value of the data conversion expression has to be calculated according to the data conversion algorithm stored in the abstract syntax tree, so the first option is adopted: the abstract syntax tree is traversed, the corresponding operation is performed at each node, the value of the data conversion expression is calculated, and this value is assigned to the corresponding target data, which completes the data conversion operation for that target data.
The abstract syntax tree of Fig. 2 is again used below to illustrate how this step performs data conversion according to the data conversion algorithm stored in the abstract syntax tree. The illustrated abstract syntax tree has 7 nodes, each of which corresponds to an expression object. The tree is traversed depth-first and the value of each node is calculated in turn. Nodes 1, 2 and 4 are leaf nodes (also called atomic nodes), so a method such as getValue() of the expression object can be executed and the value obtained is returned directly to the level above. Non-leaf nodes 3 and 5 each contain an operator, so the corresponding operation has to be performed and its result returned to the level above.
Specifically, for node 1, the expression object contains the built-in variable F1, so the getValue() function returns the value of the second field of the data to be converted (see the explanation in step 101). For node 2, the expression object contains a constant, so the constant value itself is returned directly to the level above. For node 3, the expression object contains the addition operator, so it receives the value of the expression object of its left child node (the return value of node 1) and the value of the expression object of its right child node (the return value of node 2), adds them, and returns the result to its parent node. Node 4 is processed like node 1 and returns the value of the fourth field of the data to be converted. Node 5 is processed like node 3: it calculates the product of the value of its left child node (the return value of node 3) and the value of its right child node (the return value of node 4), and returns the result to its parent node. Node 7 is the root node of the abstract syntax tree and contains the assignment operation, so the value returned by its right child node is assigned directly to the variable corresponding to its left child node. This completes the data conversion operation for the target data data1.
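The traversal just described can be sketched as a small interpreter. The class names, the get_value and execute methods, the sample row and the hand-built tree below are assumptions made for illustration; field indices are taken to start at 0, so that F1 is the second field, as in the text.

```python
import operator
from dataclasses import dataclass

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

@dataclass
class Const:                       # leaf node holding a constant
    value: int
    def get_value(self, row):
        return self.value

@dataclass
class Field:                       # leaf node holding a built-in variable Fn
    index: int
    def get_value(self, row):
        return row[self.index]

@dataclass
class BinOp:                       # internal node holding an operator
    op: str
    left: object
    right: object
    def get_value(self, row):
        # Depth-first: left child, then right child, then apply the operator.
        return OPS[self.op](self.left.get_value(row), self.right.get_value(row))

@dataclass
class Assign:                      # root node holding the assignment
    name: str
    expr: object
    def execute(self, row, target):
        target[self.name] = self.expr.get_value(row)

# Hand-built tree for data1=(F1+2)*F3, with * standing in for the × of Fig. 2.
tree = Assign("data1", BinOp("*", BinOp("+", Field(1), Const(2)), Field(3)))

row = [10, 3, 7, 4]                # fields F0..F3 of one record to be converted
target = {}
tree.execute(row, target)          # (3 + 2) * 4
```

After the call, target holds the single converted value under the key data1.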
If the data conversion rule contains multiple data conversion statements, that is, more than one abstract syntax tree was generated in step 102, then in this step the data to be converted is used as the input and the above calculation is performed for each abstract syntax tree, which completes the whole data conversion operation.
As can be seen from the above description, with the method provided by this embodiment, different data conversion rules only need to be described in the configuration file using the domain-specific language; no code has to be rewritten to complete the data conversion operation. The data conversion code is thereby reused and the workload of developers is effectively reduced. Moreover, because the data conversion rule is described in a domain-specific language that is easy to understand and cheap to learn, even ordinary users who perform data conversion operations are able to modify the data conversion rules in the configuration file.
The above embodiment provides a method for data conversion; correspondingly, the present application also provides a device for data conversion. Please refer to Fig. 3, which is a schematic diagram of an embodiment of a device for data conversion of the present application. Because the device embodiment is substantially similar to the method embodiment, it is described relatively briefly; for relevant details, see the corresponding description of the method embodiment. The device embodiment described below is merely illustrative.
A device for data conversion of this embodiment comprises: a rule reading unit 301, for reading, from a configuration file generated in advance, a data conversion rule described in a domain-specific language; an algorithm generation unit 302, for parsing the data conversion rule and generating the data conversion algorithm corresponding to the data conversion rule; and a data conversion unit 303, for reading the data to be converted and converting the data to be converted according to the data conversion algorithm.
Optionally, the data conversion algorithm generated by the algorithm generation unit is stored in an abstract syntax tree.
Corresponding to the above method for data conversion, the present application also provides a method for data migration. Please refer to Fig. 4, which is a flowchart of an embodiment of a method for data migration provided by the present application. The parts of this embodiment that are identical to steps of the first embodiment are not repeated; the description below focuses on the differences.
For ease of understanding, the technical solution of this embodiment is first described briefly. Data migration in this embodiment refers to the process of reading data from a data source and writing the data that was read to a data destination. Because the data models of the data source and the data destination may differ, the data read from the data source has to be converted before the write operation is performed.
Based on this need to convert data between data sources with different data models, the technical solution of this embodiment provides a data migration method: a corresponding data conversion algorithm is generated from the data conversion rule described in the configuration file in a domain-specific language; according to this algorithm, the data to be migrated that was read from the data source is converted into data conforming to the data model of the data destination and is written to the data destination. Describing the data conversion rule in the configuration file in a domain-specific language not only allows the data conversion part of the code to be reused; because the rule is described in a custom domain-specific language that is easy to understand and cheap to learn, even the business personnel responsible for data migration can modify the data conversion rules in the configuration file as needed, which reduces the workload of developers.
In the computer field, a data model (Data Model) is generally considered to be associated with a database system: it is the formal framework with which a database system represents and operates on information. Put differently, the data model is the way data is stored in the database and is the foundation of the database system. The data model of a relational database naturally differs from that of a NoSQL database. Even if two data sources are both relational databases, they are usually regarded as data sources with different data models if the table structures they contain differ.
The data model referred to in the technical solution of this embodiment is a broader concept: it refers generally to the way data is organized and stored, and is not limited to the database systems described above. For example, the concept of a data model can also be applied to a file system that uses a particular way of organizing and storing data. Data migration between data sources with different data models, as addressed by this embodiment, is therefore not limited to migration between database tables; it also includes migration between data sources whose data organization and storage differ, for example between HDFS (the distributed file system of Hadoop) and OceanBase (a relational database). The method provided by this embodiment can be applied to data migration between any such data sources with different data models.
The technical solution of this embodiment is described in further detail below. The method for data migration provided by this embodiment comprises:
Step 401: read the data to be migrated from the data source.
The main task of this step is to read the data to be migrated from the data source. In a specific implementation, it must be determined whether a data connection to the data source has been established; if not, the data connection has to be established before this step is performed. For example, if the data source is a data table in a relational database, a connection to the relational database can be established through an ODBC or JDBC interface. The information required to establish the connection, such as the database name, user name and password, can be written directly in the program code, or can be set in a configuration file in advance and read from that file when the connection is established. In a specific example of this embodiment, the connection to a MySQL database is established through the getConnection() interface provided by JDBC.
After the connection to the data source database has been established, this step can read the data to be migrated from the data source. For example, when the data source is a relational database, the functions provided by JDBC can be called: on the basis of the established data connection, a Statement object is created and its executeQuery(query) method is called, where query is the SQL select statement that reads data from the data source. If the data source is a NoSQL database, the Get() function or a similar function provided by the NoSQL database system can usually be called to obtain the value corresponding to a specified key.
If the data source is not a database but a file system, no data connection needs to be established; it is only necessary to open the data file corresponding to the data source, after which this step can call fread() or a similar function provided by the system to read the data to be migrated.
The above process of reading the data to be migrated can read one row of data at a time (for example, one record in a relational database); a row usually contains multiple fields. In a specific example of this embodiment, this step reads a record from a data table of score statistics, which contains four fields: serial number, name, Chinese score and mathematics score. The subsequent step 402 converts this data to be migrated and generates data containing four fields: name, Chinese score, mathematics score and total score. The subsequent step 403 writes the converted data to the data table of the destination (see the description of steps 402 and 403 below). The next piece of data to be migrated can then be read and the above process repeated until all the data to be migrated has been processed, which completes the data migration operation.
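The row-by-row loop described above can be sketched as follows. The embodiment uses JDBC and MySQL; this sketch swaps in Python's built-in sqlite3 module with in-memory databases so that it is self-contained. The table names score and result, the column name total, the sample records and the hard-coded convert function are all assumptions; in the embodiment the conversion is driven by the configured rule instead.

```python
import sqlite3

# Source table with serial number, name, Chinese score and mathematics score.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE score (id INTEGER, name TEXT, chinese INTEGER, math INTEGER)")
src.executemany("INSERT INTO score VALUES (?, ?, ?, ?)",
                [(1, "Li", 90, 85), (2, "Wang", 70, 95)])

# Destination table with name, Chinese score, mathematics score and total score.
dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE result (name TEXT, chinese INTEGER, math INTEGER, total INTEGER)")

def convert(row):
    """Stand-in for step 402: drop the serial number and append the total."""
    _, name, chinese, math = row
    return (name, chinese, math, chinese + math)

# Step 401 reads one record, step 402 converts it, step 403 writes it; repeat.
for row in src.execute("SELECT id, name, chinese, math FROM score ORDER BY id"):
    dst.execute("INSERT INTO result VALUES (?, ?, ?, ?)", convert(row))
dst.commit()

migrated = dst.execute("SELECT * FROM result ORDER BY name").fetchall()
```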
The above implementation is simple and realizes the technical solution of this embodiment, but when the amount of data to be migrated is large, its execution efficiency may not meet requirements. To improve the efficiency of the whole process, this embodiment provides a preferred implementation that uses the producer-consumer pattern: a data buffer is added between the read operation and the convert-and-write operation. In this step the data to be migrated can be read continuously, row by row or in batches as required, and every row that is read is stored in the data buffer (the production process); when a subsequent step needs to process the data, it takes the data out of the buffer (the consumption process). This pattern decouples the producer code from the consumer code, lets the producer and the consumer work concurrently without waiting for each other, and smooths out any imbalance in speed between production and consumption.
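A minimal sketch of this producer-consumer arrangement, assuming a bounded in-memory queue as the data buffer. The buffer size of 100, the DONE sentinel and the generated sample rows are illustrative choices, not part of the embodiment.

```python
import queue
import threading

buffer = queue.Queue(maxsize=100)    # data buffer between reading and converting
DONE = object()                      # sentinel telling the consumer to stop

def producer(rows):
    """Step 401: read rows and store them in the buffer."""
    for row in rows:
        buffer.put(row)              # blocks while the buffer is full
    buffer.put(DONE)

results = []

def consumer():
    """Steps 402 and 403: take rows out of the buffer and process them."""
    while True:
        row = buffer.get()           # blocks while the buffer is empty
        if row is DONE:
            break
        results.append(row)          # placeholder for convert and write

rows = [(i, f"name{i}", 60 + i, 70 + i) for i in range(1000)]
t_prod = threading.Thread(target=producer, args=(rows,))
t_cons = threading.Thread(target=consumer)
t_prod.start()
t_cons.start()
t_prod.join()
t_cons.join()
```

The bounded queue is what smooths the speed imbalance: a fast producer is paused when the buffer is full, and a fast consumer simply waits for the next row.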
Step 402: convert the data to be migrated into data conforming to the data model of the data destination according to the generated data conversion algorithm, where the data conversion algorithm is generated from the data conversion rule described in a domain-specific language in the configuration file generated in advance.
In this step, the data to be migrated obtained in step 401 is converted, according to the generated data conversion algorithm, into data conforming to the data model of the data destination. The data conversion algorithm is generated in advance; its generation comprises steps 402-1 and 402-2, which are described below.
Step 402-1: read, from the configuration file generated in advance, the data conversion rule described in a domain-specific language.
For the description of data conversion rules and domain-specific languages, see the relevant part of step 101 in the first embodiment; it is not repeated here.
In a specific example of this embodiment, a custom DSL for describing data conversion rules, the data conversion rule configuration language (for its syntax rules, see the explanation in step 101 of the first embodiment), is used in advance to write the data conversion rule into an xml configuration file. In this step, the data conversion rule is read from this xml configuration file in preparation for step 402-2.
Step 402-2: parse the data conversion rule and generate the data conversion algorithm corresponding to the data conversion rule, stored in an abstract syntax tree.
Parsing the data conversion rule generally comprises two processes, lexical analysis and syntax analysis. When both have been completed, an abstract syntax tree is generated that stores the data conversion algorithm corresponding to the data conversion rule. In other words, parsing the data conversion rule is the process of converting a data conversion rule described in textual form into a data conversion algorithm held in an in-memory structure. The parsing is substantially the same as in the first embodiment; see the related description in step 102 of the first embodiment.
After the data conversion algorithm has been generated in advance by steps 402-1 and 402-2, the data conversion operation of this step can be performed. As in the calculation described in the first embodiment, this step traverses the abstract syntax tree that stores the data conversion algorithm, performs the corresponding operation at each node, calculates the value of the data conversion expression, and assigns this value to the corresponding target data, which completes the data conversion operation for that target data. For the detailed calculation, see the related description of step 103 in the first embodiment; it is not repeated here.
If the data conversion rule contains multiple data conversion statements, that is, more than one abstract syntax tree was generated in step 402-2, then in this step the row of data to be migrated that is currently being read is used as the input and the above calculation is performed for each abstract syntax tree in turn. The value of each target field conforming to the destination data model is thus calculated from the fields of the data to be migrated, which completes the data conversion operation for this row.
In a specific example of this embodiment, the source data table contains four fields: serial number, name, Chinese score and mathematics score. The destination data table contains four fields: name (name), Chinese score (chinese), mathematics score (math) and total score (sum). The data conversion rule configured in the xml file is "name=F1; chinese=F2; math=F3; sum=(Long)F2+(Long)F3". With the method provided by this embodiment, four abstract syntax trees corresponding to this data conversion rule are generated in step 402-2, which is performed in advance. In this step, the row of data to be migrated currently being read is used as the input and a calculation is performed for each abstract syntax tree, giving the values of the four target fields name, chinese, math and sum; the target data conforming to the destination data model is thus obtained from the data to be migrated.
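To make the effect of this rule concrete, the sketch below applies it to one sample record. It deliberately uses a much simpler evaluator than the abstract syntax trees of the embodiment: it only understands terms of the form Fn or an integer constant, optionally prefixed by (Long) and joined by +. The reading of (Long) as a conversion to an integer, the 0-based field indices and the sample record are assumptions.

```python
import re

RULE = "name=F1; chinese=F2; math=F3; sum=(Long)F2+(Long)F3"

# One term: optional (Long) cast, then a built-in variable Fn or an integer.
TERM_RE = re.compile(r"^(\(Long\))?\s*(F(\d+)|\d+)$")

def eval_term(term, row):
    m = TERM_RE.match(term.strip())
    if m is None:
        raise ValueError(f"unsupported term: {term!r}")
    cast, atom, index = m.groups()
    value = row[int(index)] if index is not None else int(atom)
    return int(value) if cast else value     # (Long) assumed to mean "to integer"

def convert(rule, row):
    """Evaluate every statement of the rule against one source row."""
    target = {}
    for statement in rule.split(";"):
        name, expr = statement.split("=", 1)
        values = [eval_term(t, row) for t in expr.split("+")]
        total = values[0]
        for v in values[1:]:
            total = total + v
        target[name.strip()] = total
    return target

# Source row: serial number, name, Chinese score, mathematics score (as text).
record = ["7", "Zhang San", "88", "92"]
converted = convert(RULE, record)
```

The serial number F0 is not referenced by any statement, so it does not appear in the converted record.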
In a specific implementation, if the producer-consumer pattern is used, that is, the data to be migrated was written to the data buffer in step 401, this step has to read the data to be migrated row by row from the data buffer. If the data to be migrated is stored in the data buffer as one string per row, each field also has to be extracted from the string that was read (for example, using a get(index)-like interface provided by the data buffer to extract from the string data the field corresponding to index). The data to be migrated, made up of these fields, is then used as the input, and the data conversion algorithm stored in the abstract syntax tree is used to calculate the data conforming to the data model of the data destination.
In a specific implementation, if the amount of data to be migrated is large, then even with the above producer-consumer pattern the data conversion operation of this step, which involves traversing the abstract syntax tree and the corresponding calculations, is relatively time-consuming and may become the bottleneck of the system. To improve the processing performance of the whole system, multi-threaded concurrent execution can therefore be used. Specifically, before this step is performed, a corresponding number of worker threads can be created according to the concurrency specified in the configuration file generated in advance; these worker threads concurrently perform the above operations of reading data from the data buffer and converting it, which raises CPU utilization and improves the efficiency of the whole system.
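The worker-thread arrangement can be sketched as below. The value CONCURRENCY = 4 stands in for the concurrency read from the configuration file, and the convert function, the None sentinel and the sample rows are hypothetical.

```python
import queue
import threading

CONCURRENCY = 4                      # stand-in for the configured concurrency
buffer = queue.Queue()               # data buffer filled by step 401
converted = []
lock = threading.Lock()              # protects the shared result list

def convert(row):
    """Placeholder for evaluating the abstract syntax trees on one row."""
    _, name, chinese, math = row
    return (name, chinese, math, chinese + math)

def worker():
    while True:
        row = buffer.get()
        if row is None:              # one sentinel per worker ends the loop
            break
        out = convert(row)
        with lock:
            converted.append(out)

threads = [threading.Thread(target=worker) for _ in range(CONCURRENCY)]
for t in threads:
    t.start()
for i in range(200):
    buffer.put((i, f"s{i}", i, 2 * i))
for _ in threads:
    buffer.put(None)
for t in threads:
    t.join()
```

With several workers the output order is no longer the input order, so a destination that depends on ordering would need extra handling. In CPython the global interpreter lock also limits the gain for purely CPU-bound conversion; the sketch shows the structure rather than a performance claim.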
Step 403: write the converted data to the data destination.
In a specific implementation, to write the converted data to the data destination it is usually necessary first to establish a data connection to the database of the data destination, or to open the relevant data file of the data destination. This is handled in a similar way to the corresponding part of step 401 and is not repeated here.
If the data destination is a data table in a database, the converted data can be written to the data table once the data connection has been established. For example, when the data destination is a relational database, the functions provided by JDBC can be called: on the basis of the established data connection, a Statement object is created and its execute(sql) method is called, where the parameter sql is the SQL insert statement that writes data to the data table. If the data destination is a NoSQL database, the Put() function or a similar function provided by the NoSQL database system can usually be called to write the converted data to the corresponding data table by specifying information such as the rowkey, column family and column name.
If the data destination is not a database as above but a file system, then after the data file corresponding to the data destination has been opened, fwrite() or a similar function provided by the system is called to write the converted data to the data file in the data format preset for the file system.
In a specific implementation, if the data destination is not deployed on the local device that implements the technical solution of this embodiment, writing data to the data destination generally involves several stages: sending the data, transmitting the data, receiving the data and writing the data. If a large amount of data is written to the data destination and every piece of data has to pass through all of these stages before its write completes, the time overhead is considerable and affects the overall performance of the data migration. For this reason, the technical solution of this embodiment provides a preferred implementation that writes in batches.
Specifically, before writing the converted data to the data destination, this step first determines whether the amount of that data is greater than or equal to the batch import threshold specified in the configuration file. If so, the data is written to the data destination by batch import; if not, the data is held temporarily in local memory or on another storage medium.
In a specific example of this embodiment, a fragment of the preset xml configuration file is as follows:
Here, the preconfigured batch-size is the batch import threshold described in this step. Its value is 512, meaning that a write to the data destination is performed once for every 512 rows of data converted. This batch processing reduces the time overhead of performing write operations over the network.
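The threshold check can be sketched as a small buffering writer. The class BatchWriter, its method names and the final flush of the remaining rows are assumptions; the text does not say how a last, incomplete batch is handled. The write_batch callback stands in for whatever bulk insert the destination offers.

```python
class BatchWriter:
    """Hold converted rows locally and write them out in batches."""

    def __init__(self, write_batch, batch_size=512):
        self.write_batch = write_batch   # callable that writes a list of rows
        self.batch_size = batch_size     # batch import threshold (batch-size)
        self.pending = []

    def add(self, row):
        self.pending.append(row)
        # Write only once the amount of held data reaches the threshold.
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.write_batch(self.pending)
            self.pending = []

batches = []                             # records the size of every batch written
writer = BatchWriter(lambda rows: batches.append(len(rows)), batch_size=512)
for i in range(1200):
    writer.add((i,))
writer.flush()                           # assumed: write the remainder at the end
```

With 1200 converted rows and a threshold of 512, the destination is contacted three times instead of 1200 times.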
In addition to batch-size, concurrentcy in the above xml file corresponds to the number of concurrent threads in step 402 above, and error-limit is used to control the writing of the error log, as described below. Note that the above xml file fragment is an excerpt from the complete configuration file and the specific values in it are merely illustrative; in a specific implementation, other parameters and values can be configured as needed.
In a specific implementation, the data to be migrated often comes from different business systems, is huge in volume and varied in type, and its integrity and consistency are hard to guarantee. Writes of converted data to the data destination may therefore inevitably fail. For example, when data is written to the data destination through the JDBC interface and one of the fields does not conform to the type the destination defines for that field, JDBC reports a data write error.
In this case, the errors that occur during writing can be recorded in a log for later reference when the data that failed to migrate is corrected and imported again, which helps in locating and maintaining problem data. For reasons of file storage and manageability, the size of a log file is usually limited. To keep the log file from growing too large because of these error records, an error logging threshold can be specified in the configuration file (see error-limit in the example xml file fragment above), and the data write errors that occur during the data migration are counted. While the number of errors that have occurred is less than or equal to this threshold, the error and its related information are written to the log file; otherwise the error and its related information are not recorded.
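A sketch of the error counting and the logging threshold follows. The class ErrorRecorder, the log line format and the use of an in-memory text buffer in place of a log file are assumptions; a failing int() conversion stands in for a write error reported by the destination.

```python
import io

class ErrorRecorder:
    """Count write errors and log only the first error_limit of them."""

    def __init__(self, log, error_limit):
        self.log = log                   # any object with a write() method
        self.error_limit = error_limit   # error logging threshold (error-limit)
        self.count = 0

    def record(self, row, error):
        self.count += 1
        # Log while the accumulated count is within the threshold.
        if self.count <= self.error_limit:
            self.log.write(f"{self.count}\t{row!r}\t{error}\n")

log = io.StringIO()                      # stands in for the log file
recorder = ErrorRecorder(log, error_limit=3)
for i in range(5):
    value = f"bad{i}"
    try:
        int(value)                       # simulated failing write
    except ValueError as exc:
        recorder.record(value, exc)
```

All five errors are counted, but only the first three reach the log, which keeps the log file size bounded.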
As can be seen from the above description, when the method provided by this embodiment is used to migrate data between data sources with different data models, the data conversion rule only needs to be described in the configuration file in a domain-specific language and the data conversion code does not need to be modified, so the data conversion part of the code is reused. Moreover, because the rule is described in a custom domain-specific language that is easy to understand and cheap to learn, even the business personnel responsible for data migration can configure or modify the data conversion rules in the configuration file as needed, which reduces the workload of developers.
The above embodiment provides a method for data migration; correspondingly, the present application also provides a device for data migration. Please refer to Fig. 5, which is a schematic diagram of an embodiment of a device for data migration of the present application. Because the device embodiment is substantially similar to the method embodiment, it is described relatively briefly; for relevant details, see the corresponding description of the method embodiment. The device embodiment described below is merely illustrative.
A device for data migration of this embodiment comprises: a data reading unit 501, for reading the data to be migrated from the data source; a data conversion unit 502, for converting the data to be migrated into data conforming to the data model of the data destination according to the generated data conversion algorithm, where the data conversion algorithm is generated from the data conversion rule described in a domain-specific language in the configuration file generated in advance; and a data write unit 503, for writing the converted data to the data destination.
Optionally, the device further comprises:
a rule reading unit, for reading, from the configuration file generated in advance, the data conversion rule described in a domain-specific language;
an algorithm generation unit, for parsing the data conversion rule and generating the data conversion algorithm corresponding to the data conversion rule.
Optionally, the data conversion algorithm generated by the algorithm generation unit is stored in an abstract syntax tree.
Optionally, the device further comprises:
a data buffer unit, for storing the data to be migrated that is read from the data source in a data buffer;
correspondingly, the data conversion unit is specifically configured to read the data to be migrated from the data buffer and convert it into data conforming to the data model of the data destination.
Optionally, the device further comprises:
a thread creation unit, for creating a corresponding number of worker threads according to the concurrency specified in the configuration file;
correspondingly, the operations of reading data and converting data performed by the data conversion unit are executed concurrently by the above worker threads.
Optionally, the device further comprises:
a judging unit, for determining whether the amount of converted data is greater than or equal to the batch import threshold specified in the configuration file; if so, triggering the data write unit; if not, caching the converted data.
Optionally, the device further comprises:
an error monitoring and recording unit, for monitoring whether an error occurs in the step of writing the converted data to the data destination; if an error occurs, recording the related information of the error and accumulating the error count, and, when the error count is less than or equal to the error logging threshold specified in the configuration file, writing the related information of the error to a log file.
Although the present application is disclosed above by way of preferred embodiments, these are not intended to limit the application. Any person skilled in the art may make possible variations and modifications without departing from the spirit and scope of the application, so the scope of protection of the application shall be that defined by the claims of the application.
In a typical configuration, a computing device includes one or more processors (CPU), an input/output interface, a network interface and memory.
Memory may include volatile memory in a computer-readable medium, in the form of random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
1. Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
2. Those skilled in the art will understand that embodiments of the present application may be provided as a method, a system or a computer program product. Accordingly, the application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage and the like) containing computer-usable program code.

Claims (24)

1. A method for data conversion, characterized by comprising:
Reading, from a pre-generated configuration file, a data conversion rule described in a domain-specific language, the data conversion rule specifying the conversion operations required to generate target data from data to be converted;
Parsing the data conversion rule and generating a data conversion algorithm corresponding to the data conversion rule;
Reading the data to be converted, and converting the data to be converted according to the data conversion algorithm.
2. The method for data conversion according to claim 1, characterized in that the generated data conversion algorithm corresponding to the data conversion rule is stored as an abstract syntax tree.
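As an illustrative sketch only, the following Python borrows Python's own expression grammar (the standard `ast` module) as a stand-in for the domain-specific language, whose concrete syntax the claims do not fully specify. The variable naming (`c` followed by a field index) and the function names `compile_expression` and `evaluate` are assumptions made for this example. It shows the idea of parsing a conversion expression once into an abstract syntax tree and then evaluating that stored tree for each record.

```python
import ast
import operator

# Binary operators of the kind the claims list, mapped to Python implementations.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}


def compile_expression(expression):
    """Parse the expression once; the returned tree is reused for every record."""
    return ast.parse(expression, mode="eval").body


def evaluate(node, record):
    """Walk the stored tree for one record (a list of source field values)."""
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        # Hypothetical naming scheme: 'c' followed by the source field index.
        if node.id[0] != "c" or not node.id[1:].isdigit():
            raise ValueError(f"unknown variable: {node.id}")
        return record[int(node.id[1:])]
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = evaluate(node.left, record)
        right = evaluate(node.right, record)
        return _OPS[type(node.op)](left, right)
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


tree = compile_expression("c0 * c1 + 10")
for record in ([2, 3], [4, 5]):
    print(evaluate(tree, record))
```

Storing the tree rather than the rule text means the parsing cost is paid once per rule instead of once per record.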
3. the method for data conversion according to claim 1 and 2, it is characterized in that, the syntax rule for the described Domain Specific Language of data of description transformation rule comprises:
Data conversion rule is made up of more than one or one data branch statement, adopts specific list separator to separate between adjacent statement; With,
Each data branch statement is made up of the destination field name connected with equal sign and transformed representation; With,
Destination field name can not comprise equal sign and described specific list separator; With,
Transformed representation comprises following form: constant, built-in variable or the constant connected by operational symbol and/or built-in variable; With,
The value that the title of built-in variable comprises this variable of expression preset is from the subscript of the specific fields in the letter of data to be converted and the data to be converted corresponding with this variable.
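The statement-level grammar above can be sketched as follows. This is a minimal illustration, assuming a semicolon as the separator and `c0`, `c1` as built-in variable names; neither choice is stated in the claims, and `parse_rule` is a hypothetical helper name. A real parser would also need to handle separators that appear inside string constants, which this sketch does not.

```python
def parse_rule(rule, separator=";"):
    """Split a conversion rule into {target_field_name: expression_text}."""
    statements = {}
    for raw in rule.split(separator):
        statement = raw.strip()
        if not statement:
            continue  # tolerate a trailing separator
        # Split at the first equal sign only: the target field name cannot
        # contain one, so everything after it belongs to the expression.
        name, equals, expression = statement.partition("=")
        if not equals or not name.strip() or not expression.strip():
            raise ValueError(f"malformed statement: {statement!r}")
        statements[name.strip()] = expression.strip()
    return statements


print(parse_rule("total = c0 * c1; label = 'order'"))
```

Because the target field name is forbidden from containing the equal sign or the separator, both splits are unambiguous on the left-hand side of each statement.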
4. The method for data conversion according to claim 3, characterized in that the operators in the syntax rules comprise one or more of the addition, subtraction, multiplication, division, and modulo operators.
5. The method for data conversion according to claim 3, characterized in that the syntax rules further comprise:
The supported data types include the long integer type long, the double-precision floating-point type double, the date type Date, and the string type String, and constants of these types are written in their corresponding literal forms;
The supported data types further include the large-number types BigInteger and BigDecimal, and constants of these types are written using explicit type conversion.
6. The method for data conversion according to claim 5, characterized in that the syntax rules further comprise:
Built-in variables may be written using explicit type conversion.
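A minimal sketch of how constants of these types might be recognized is given below. The literal forms (single-quoted strings, `YYYY-MM-DD` dates) and the cast syntax `(BigDecimal)12.50` are assumptions chosen for illustration, not syntax taken from the claims, and `parse_constant` is a hypothetical name. The type names in the claims match Java types; here Python's `int` stands in for both long and BigInteger, `float` for double, and `decimal.Decimal` for BigDecimal.

```python
import re
from datetime import date
from decimal import Decimal

# Hypothetical cast syntax: "(BigInteger)123" or "(BigDecimal)12.50".
_CAST = re.compile(r"^\((BigInteger|BigDecimal)\)\s*(.+)$")


def parse_constant(text):
    """Convert the text of one constant into a Python value."""
    text = text.strip()
    cast = _CAST.match(text)
    if cast:
        kind, literal = cast.groups()
        # Large-number types are only reachable through an explicit conversion.
        return int(literal) if kind == "BigInteger" else Decimal(literal)
    if len(text) >= 2 and text.startswith("'") and text.endswith("'"):
        return text[1:-1]  # string literal
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        return date.fromisoformat(text)  # date literal
    if re.fullmatch(r"-?\d+", text):
        return int(text)  # long literal
    return float(text)  # double literal; raises ValueError if not numeric


print(parse_constant("(BigDecimal)12.50"))
```

Requiring a cast for the large-number types keeps plain numeric literals unambiguous: a bare `12.50` is always a double, and only `(BigDecimal)12.50` keeps its exact decimal value.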
7. A device for data conversion, characterized by comprising:
A rule reading unit, configured to read, from a pre-generated configuration file, a data conversion rule described in a domain-specific language;
An algorithm generation unit, configured to parse the data conversion rule and generate a data conversion algorithm corresponding to the data conversion rule;
A data conversion unit, configured to read data to be converted and convert the data to be converted according to the data conversion algorithm.
8. The device for data conversion according to claim 7, characterized in that the data conversion algorithm generated by the algorithm generation unit is stored as an abstract syntax tree.
9. A method for data migration, characterized by comprising:
Reading data to be migrated from a data source;
Converting, according to a generated data conversion algorithm, the data to be migrated into data conforming to the data model of a data destination, the data conversion algorithm being generated according to a data conversion rule that is described in a domain-specific language and contained in a pre-generated configuration file;
Writing the converted data to the data destination.
10. The method for data migration according to claim 9, characterized in that, before the data to be migrated is converted into data conforming to the data model of the data destination, the following operations are performed:
Reading, from the pre-generated configuration file, the data conversion rule described in the domain-specific language;
Parsing the data conversion rule and generating the data conversion algorithm corresponding to the data conversion rule.
11. The method for data migration according to claim 10, characterized in that the generated data conversion algorithm corresponding to the data conversion rule is stored as an abstract syntax tree.
12. The method for data migration according to claim 10, characterized in that the syntax rules of the domain-specific language used to describe the data conversion rule comprise:
A data conversion rule consists of one or more data conversion statements, with adjacent statements separated by a specific separator; and
Each data conversion statement consists of a target field name and a conversion expression joined by an equal sign; and
The target field name cannot contain the equal sign or the specific separator; and
A conversion expression takes one of the following forms: a constant, a built-in variable, or constants and/or built-in variables joined by operators; and
The name of a built-in variable consists of a preset letter, indicating that the variable's value comes from the data to be migrated, and the index of the specific field in the data to be migrated that corresponds to the variable.
13. The method for data migration according to claim 12, characterized in that the syntax rules further comprise:
The supported operators include one or more of the addition, subtraction, multiplication, division, and modulo operators; and/or
The supported data types include the long integer type long, the double-precision floating-point type double, the date type Date, and the string type String, and constants of these types are written in their corresponding literal forms; and/or
The supported data types further include the large-number types BigInteger and BigDecimal, and constants of these types are written using explicit type conversion; and/or
Built-in variables may be written using explicit type conversion.
14. The method for data migration according to any one of claims 9 to 13, characterized by comprising:
Storing the data to be migrated read from the data source in a data buffer;
Correspondingly, converting the data to be migrated into data conforming to the data model of the data destination means reading the data to be migrated from the data buffer and converting it into data conforming to the data model of the data destination.
15. The method for data migration according to claim 14, characterized in that, before the data to be migrated is read from the data buffer and converted into data conforming to the data model of the data destination, the following step is performed:
Creating a corresponding number of worker threads according to the concurrency number specified in the configuration file;
Correspondingly, reading the data to be migrated from the data buffer and converting it into data conforming to the data model of the data destination means that the created worker threads perform the reading and conversion operations concurrently.
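The buffered, concurrent conversion described in claims 14 and 15 can be sketched as below. This is a minimal sketch, assuming an in-memory `queue.Queue` as the data buffer and a caller-supplied `convert` function; the function name `migrate`, the sentinel-based shutdown, and returning the converted rows as a list (instead of writing them to a destination) are all choices made for this example.

```python
import queue
import threading


def migrate(source, convert, concurrency):
    """Read records into a buffer and convert them with worker threads."""
    buffer = queue.Queue()  # plays the role of the data buffer
    results = []
    results_lock = threading.Lock()
    stop = object()  # sentinel telling a worker there is no more data

    def worker():
        while True:
            record = buffer.get()
            if record is stop:
                break
            converted = convert(record)
            with results_lock:
                results.append(converted)

    # One worker per unit of the configured concurrency number.
    workers = [threading.Thread(target=worker) for _ in range(concurrency)]
    for thread in workers:
        thread.start()
    for record in source:
        buffer.put(record)
    for _ in workers:
        buffer.put(stop)  # one sentinel per worker
    for thread in workers:
        thread.join()
    return results


# Output order is not guaranteed across threads, so sort before printing.
print(sorted(migrate(range(5), lambda value: value * 2, concurrency=3)))
```

The buffer decouples the reading speed of the data source from the conversion speed, and the concurrency number becomes a tuning knob in the configuration file rather than a code change.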
16. The method for data migration according to claim 15, characterized by comprising:
Determining whether the size of the converted data is greater than or equal to the batch import threshold specified in the configuration file;
If so, performing the step of writing the converted data to the data destination;
If not, caching the converted data.
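As a rough illustration of this batching step, the sketch below caches converted rows until their count reaches a threshold and then hands the whole batch to a writer. Interpreting the "size" of the data as a row count, the class name `BatchWriter`, and the explicit final `flush` for leftover rows are assumptions of this example and are not stated in the claim.

```python
class BatchWriter:
    """Cache converted rows and write them out once a threshold is reached."""

    def __init__(self, write_batch, threshold):
        self.write_batch = write_batch  # callable that writes a list of rows
        self.threshold = threshold      # batch import threshold from the config
        self.pending = []

    def add(self, row):
        self.pending.append(row)
        if len(self.pending) >= self.threshold:
            self.flush()

    def flush(self):
        """Write whatever is cached; call once more at the end of a run."""
        if self.pending:
            self.write_batch(self.pending)
            self.pending = []  # start a new list so written batches stay intact


batches = []
writer = BatchWriter(batches.append, threshold=3)
for row in range(7):
    writer.add(row)
writer.flush()
print(batches)
```

Writing in batches trades a small amount of memory for fewer round trips to the data destination.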
17. The method for data migration according to claim 16, characterized by comprising:
Monitoring whether an error occurs in the step of writing the converted data to the data destination; if an error occurs, performing the following steps:
Recording information about the error and incrementing the error count;
Determining whether the error count is less than or equal to the error logging threshold specified in the configuration file; if so, writing the information about the error to a log file.
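The error handling in this claim can be sketched as follows. This is only one possible reading, assuming Python's standard `logging` module in place of the log file and a per-row `write` callable; the names `WriteErrorRecorder` and `write_all` are hypothetical. Every error is counted, but only the first ones, up to the threshold, are written to the log.

```python
import logging


class WriteErrorRecorder:
    """Count write errors and log only those within the configured threshold."""

    def __init__(self, log_threshold, logger=None):
        self.log_threshold = log_threshold  # error logging threshold from the config
        self.error_count = 0
        self.logged = 0
        self.logger = logger if logger is not None else logging.getLogger("migration")

    def record(self, error, row):
        self.error_count += 1
        if self.error_count <= self.log_threshold:
            self.logger.error("write failed for %r: %s", row, error)
            self.logged += 1


def write_all(rows, write, recorder):
    """Write each row, passing any failure to the recorder instead of aborting."""
    for row in rows:
        try:
            write(row)
        except Exception as error:
            recorder.record(error, row)


def failing_write(row):
    raise IOError("destination unavailable")


recorder = WriteErrorRecorder(log_threshold=2)
write_all([1, 2, 3], failing_write, recorder)
print(recorder.error_count, recorder.logged)
```

Capping the number of logged errors keeps a systematic failure, such as an unreachable destination, from flooding the log file while the total count is still tracked.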
18. A device for data migration, characterized by comprising:
A data reading unit, configured to read data to be migrated from a data source;
A data conversion unit, configured to convert, according to a generated data conversion algorithm, the data to be migrated into data conforming to the data model of a data destination, the data conversion algorithm being generated according to a data conversion rule that is described in a domain-specific language and contained in a pre-generated configuration file;
A data writing unit, configured to write the converted data to the data destination.
19. The device for data migration according to claim 18, characterized in that the device further comprises:
A rule reading unit, configured to read, from the pre-generated configuration file, the data conversion rule described in the domain-specific language;
An algorithm generation unit, configured to parse the data conversion rule and generate the data conversion algorithm corresponding to the data conversion rule.
20. The device for data migration according to claim 19, characterized in that the data conversion algorithm generated by the algorithm generation unit is stored as an abstract syntax tree.
21. The device for data migration according to any one of claims 18 to 20, characterized in that the device further comprises:
A data buffer unit, configured to store the data to be migrated read from the data source in a data buffer;
Correspondingly, the data conversion unit is specifically configured to read the data to be migrated from the data buffer and convert it into data conforming to the data model of the data destination.
22. The device for data migration according to claim 21, characterized in that the device further comprises:
A thread creation unit, configured to create a corresponding number of worker threads according to the concurrency number specified in the configuration file;
Correspondingly, the data reading and data conversion operations performed by the data conversion unit are executed concurrently by the worker threads.
23. The device for data migration according to claim 22, characterized in that the device further comprises:
A judging unit, configured to determine whether the size of the converted data is greater than or equal to the batch import threshold specified in the configuration file; if so, to trigger the data writing unit; if not, to cache the converted data.
24. The device for data migration according to claim 23, characterized in that the device further comprises:
An error monitoring and recording unit, configured to monitor whether an error occurs in the step of writing the converted data to the data destination; if an error occurs, to record information about the error and increment the error count; and, when the error count is less than or equal to the error logging threshold specified in the configuration file, to write the information about the error to a log file.
CN201410375147.9A 2014-07-31 2014-07-31 Method and device for data conversion and data migration Active CN105335412B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410375147.9A CN105335412B (en) 2014-07-31 2014-07-31 Method and device for data conversion and data migration

Publications (2)

Publication Number Publication Date
CN105335412A true CN105335412A (en) 2016-02-17
CN105335412B CN105335412B (en) 2019-06-11

Family

ID=55285945

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410375147.9A Active CN105335412B (en) 2014-07-31 2014-07-31 Method and device for data conversion and data migration

Country Status (1)

Country Link
CN (1) CN105335412B (en)

Also Published As

Publication number Publication date
CN105335412B (en) 2019-06-11

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant