CN113656377A - Automatic matching method and device for data migration, computer equipment and storage medium - Google Patents


Info

Publication number
CN113656377A
Authority
CN
China
Prior art keywords
data
source
database system
knowledge base
mode
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110967081.2A
Other languages
Chinese (zh)
Inventor
高勇
陈煦文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Vanrui Intelligent Technology Co ltd
Original Assignee
Shenzhen Vanrui Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Vanrui Intelligent Technology Co ltd
Priority to CN202110967081.2A
Publication of CN113656377A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/214 Database migration support
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an automatic matching method and device for data migration, computer equipment and a storage medium. The method comprises: constructing a universal data processing professional knowledge base; constructing a personalized data processing auxiliary knowledge base; constructing a source data mode table to be migrated; and reading environment information from a target database system and, based on the universal data processing professional knowledge base and the personalized data processing auxiliary knowledge base, calculating a target data mode by a probability deduction algorithm. The method automatically maps field types and field names between heterogeneous source and target database systems; automatically generates a new target database system from a source database system; automatically maps a source database system onto an existing target database system; and automatically maps a plurality of source database systems onto a plurality of target database systems. By drawing on common industry experience and the practical working experience of an individual organization, it greatly improves the matching efficiency of database systems.

Description

Automatic matching method and device for data migration, computer equipment and storage medium
Technical Field
The present invention relates to the field of database migration, and in particular, to an automatic matching method and apparatus for data migration, a computer device, and a storage medium.
Background
In the information age, data is arguably the most important asset, and as technology develops and businesses expand, data migration work takes place all the time. Data is typically stored in files or in dedicated databases. File formats include binary structures, delimiter-separated text, XML, JSON and others. Databases are classified into relational, key-value, column, graph, document and other types, and each type is provided by different database development teams, for example Oracle, DB2, Sybase, Informix, Redis, Cassandra, Neo4j and MongoDB, whose mechanisms for storing data all differ; each database also exists in different versions, for example Oracle 9i and Oracle 10g. Mapping of data structures (pattern matching) is therefore indispensable when migrating data between different data stores.
Most pattern matching in existing data migration tools is performed manually. Some vendors provide automatic mapping between different versions of their own databases, for example Oracle GoldenGate, but these tools mainly maintain the field type mapping of database schemas; they do not map between different data schemas, and they cannot span databases produced by different companies.
Disclosure of Invention
The invention aims to provide an automatic matching method and device for data migration, computer equipment and a storage medium, so as to solve the problem that migrating an existing database system requires cumbersome manual operation.
In order to solve the technical problems, the invention aims to realize the following technical scheme: an automatic matching method for data migration is provided, which comprises the following steps:
summarizing an original database system, uniformly defining each data type in each original database system to obtain a plurality of standard data types, establishing mapping between the data types in the original database system and the standard data types, and performing migration verification according to the mapping relation to generate a universal data processing professional knowledge base;
counting a log table in the historical migration activity to obtain data pattern associated information, and constructing an auxiliary knowledge base for personalized data processing based on the data pattern associated information;
acquiring a source data mode and actual source data of a source database system, reading and recording a data type in the source data mode, and forming a source data mode table to be migrated, wherein the data type comprises a field type and a field name;
and reading environmental information from a target database system, and calculating a target data mode corresponding to the source data mode table to be migrated by adopting a probability deduction algorithm based on the universal data processing professional knowledge base and the personalized data processing auxiliary knowledge base.
In addition, another object of the present invention is to provide an automatic matching apparatus for data migration, which includes:
the universal data processing professional knowledge base unit is used for summarizing the original database systems, uniformly defining each data type in each original database system to obtain a plurality of standard data types, establishing mapping between the data types in the original database systems and the standard data types, and performing migration verification according to the mapping relation to generate a universal data processing professional knowledge base;
the personalized data processing auxiliary knowledge base unit is used for counting a log table in the historical migration activity to obtain data pattern associated information, and constructing a personalized data processing auxiliary knowledge base based on the data pattern associated information;
the source data mode table unit to be migrated is used for acquiring a source data mode and actual source data of a source database system, reading and recording a data type in the source data mode, and forming a source data mode table to be migrated, wherein the data type comprises a field type and a field name;
and the target data mode calculating unit is used for reading the environmental information from a target database system, and calculating a target data mode corresponding to the source data mode table to be migrated by adopting a probability deduction algorithm based on the universal data processing professional knowledge base and the personalized data processing auxiliary knowledge base.
In addition, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the automatic matching method for data migration according to the first aspect when executing the computer program.
In addition, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, causes the processor to execute the automatic matching method for data migration according to the first aspect.
The embodiment of the invention discloses an automatic matching method and device for data migration, computer equipment and a storage medium, wherein the method comprises the following steps: summarizing an original database system, uniformly defining each data type in each original database system to obtain a plurality of standard data types, establishing mapping between the data types in the original database system and the standard data types, and performing migration verification according to the mapping relation to generate a universal data processing professional knowledge base; counting a log table in the historical migration activity to obtain data pattern associated information, and constructing an auxiliary knowledge base for personalized data processing based on the data pattern associated information; acquiring a source data mode and actual source data of a source database system, reading and recording a data type in the source data mode, and forming a source data mode table to be migrated, wherein the data type comprises a field type and a field name; and reading environmental information from a target database system, and calculating a target data mode corresponding to the source data mode table to be migrated by adopting a probability deduction algorithm based on the universal data processing professional knowledge base and the personalized data processing auxiliary knowledge base. 
The method automatically maps field types and field names between heterogeneous source and target database systems; automatically generates a new target database system from a source database system; automatically maps a source database system onto an existing target database system; and automatically maps a plurality of source database systems onto a plurality of target database systems, reducing manual operation. By drawing on common industry experience and the personalized working experience of an individual organization, it greatly improves the matching efficiency of database systems. In addition, the results of users' daily data migration work are accumulated to build the personalized data processing auxiliary knowledge base, so that data migration work is completed efficiently and the service improves continuously with use.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of an automatic matching method for data migration according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of an automatic matching apparatus for data migration according to an embodiment of the present invention;
FIG. 3 is a schematic block diagram of a computer device provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Referring to FIG. 1, FIG. 1 is a schematic flow chart of an automatic matching method for data migration according to an embodiment of the present invention.
As shown in FIG. 1, the method includes steps S101 to S104.
S101, summarizing original database systems, uniformly defining each data type in each original database system to obtain a plurality of standard data types, establishing mapping between the data types in the original database systems and the standard data types, and performing migration verification according to the mapping relation to generate a universal data processing professional knowledge base;
S102, carrying out statistics on a log table in historical migration activities to obtain data pattern associated information, and constructing an auxiliary knowledge base for personalized data processing based on the data pattern associated information;
S103, acquiring a source data mode and actual source data of a source database system, reading and recording a data type in the source data mode, and forming a source data mode table to be migrated, wherein the data type comprises a field type and a field name;
and S104, reading the environmental information from a target database system, and calculating a target data mode corresponding to the source data mode table to be migrated by adopting a probability deduction algorithm based on the universal data processing professional knowledge base and the personalized data processing auxiliary knowledge base.
In this embodiment of the application, on the one hand, comprehensive industry knowledge of database models (the universal data processing professional knowledge base), sampled or full source data, and personalized pattern matching experience data (the personalized data processing auxiliary knowledge base) are combined, and a reasonable probability selection algorithm is applied to generate the target data mode automatically. This reduces manual operation, ensures the reliability of data migration, and helps the migration proceed as expected. On the other hand, by drawing on common industry experience and the personalized working experience of an individual organization, the efficiency of pattern matching is greatly improved.
In a specific embodiment, after the step S104, the method further includes:
S201, revising and applying the target data mode and generating a corresponding application log;
S202, updating the personalized data processing auxiliary knowledge base according to the revised application log.
In this embodiment, the output target data mode is revised by an expert in the business field so that it better meets actual requirements, that is, matches the habitual expressions of the individual organization. The personalized data processing auxiliary knowledge base is then updated according to the application log generated by the revision; this application log serves as a log table.
For example, after a supplier's database is updated, a manufacturer's database must update its copy of the supplier's data accordingly; that is, the updated supplier database is migrated into the manufacturer's database so that the manufacturer's staff can conveniently view the latest data.
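The feedback loop of steps S201 and S202 can be sketched as follows. This is a minimal illustration, assuming the application log is reduced to counts of confirmed field name mappings; the class `FieldNameLog` and its methods are hypothetical names introduced here, not part of the original disclosure.

```python
from collections import Counter


class FieldNameLog:
    """Hypothetical in-memory stand-in for a field name mapping log table."""

    def __init__(self):
        self.counts = Counter()

    def record(self, source_name: str, destination_name: str) -> None:
        """Log one confirmed mapping, e.g. after an expert revision is applied."""
        self.counts[(source_name, destination_name)] += 1

    def probability(self, source_name: str, destination_name: str) -> float:
        """Share of logged mappings of source_name that went to destination_name."""
        total = sum(n for (src, _), n in self.counts.items() if src == source_name)
        if total == 0:
            return 0.0
        return self.counts[(source_name, destination_name)] / total


log = FieldNameLog()
for _ in range(3):
    log.record("Uin", "Uid")   # suggestion accepted as-is
log.record("Uin", "UUID")      # suggestion changed by the expert
```

Each applied revision adds one row, so later suggestions drift toward the organization's own naming habits.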
In a specific embodiment, the step S101 includes:
automatically generating the corresponding data mode of a verification target database system by referring to the data mode in a verification source database system, verifying the conversion of each data mode through a series of migration activities in which the database systems serve as source and target for each other, correcting the problems found, and generating the universal data processing professional knowledge base.
Specifically, a unified data type system is established first, so that all data types are expressed consistently. Describing every field type of every database uniformly means that the data types are interpreted uniformly, that a common standard design of the data types is obtained, and that each type can be defined as a class in a specific programming language. For example, the maximum and minimum length, the maximum and minimum value, and normal and abnormal values of the field types of databases of different types and versions are described with concrete instances. For instance, the field type Varchar2 of Oracle 12.2 consists of characters with a minimum value of 0x01 and a maximum value of 0x127, has a minimum length of 0 and a maximum length of 255, and has "abcd 1234" as an example of a normal value; the example of an abnormal value is "abcd 1234" followed by content that appears only as an image in the original publication (Figure BDA0003224503980000061, not reproduced here).
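A unified type description of this kind could be expressed as a small class. The sketch below is only an illustration: the class `StandardType`, its field names and the numeric bounds are assumptions introduced here (the character bound in particular is an illustrative 7-bit limit), not values taken from the original or from vendor documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandardType:
    """Hypothetical unified description of one vendor field type."""
    system: str       # database type and version, e.g. "Oracle 12.2"
    name: str         # vendor type name, e.g. "VARCHAR2"
    min_char: int     # smallest allowed character code
    max_char: int     # largest allowed character code
    min_length: int   # smallest allowed length
    max_length: int   # largest allowed length

    def accepts(self, value: str) -> bool:
        """Return True if value respects the length and character bounds."""
        if not self.min_length <= len(value) <= self.max_length:
            return False
        return all(self.min_char <= ord(ch) <= self.max_char for ch in value)


# Illustrative bounds only (7-bit characters, length 0..255); not vendor limits.
VARCHAR2 = StandardType("Oracle 12.2", "VARCHAR2", 0x01, 0x7F, 0, 255)

print(VARCHAR2.accepts("abcd 1234"))   # a value within both bounds
print(VARCHAR2.accepts("x" * 256))     # a value longer than max_length
```

Keeping normal and abnormal sample values next to such a descriptor would let the same object drive the migration verification described later.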
Then, the data type mapping tables in common use across the industry are collated and mapped one by one to the unified data types; for example, Java classes are used to map the different data types of systems and formats such as Node.js, MySQL, Oracle, MS SQL, Sybase, Avro and JSON.
Finally, according to the mapping from the data types of different databases to the unified data types, the corresponding data mode of a target database is generated automatically by referring to the data mode in the source database system. Through a series of migration activities in which the database systems serve as source and target for each other, for example migrating a data mode from an Oracle database to a MySQL database and from a MySQL database back to an Oracle database, the conversion of each mode and data type is verified, confirming that the basic capability of automatically generating modes with one-to-one data mode mapping has been established. The problems found are corrected one by one, and the universal data processing professional knowledge base is finally obtained.
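The idea of mapping through unified types and verifying by migrating in both directions can be sketched as follows. The table contents, the standard type names `INTEGER` and `STRING`, and the function names are all hypothetical and chosen only for illustration; a real knowledge base would hold many more entries.

```python
# Vendor type -> standard type. Entries are illustrative, not a complete table.
TO_STANDARD = {
    ("MySQL", "INT"): "INTEGER",
    ("MySQL", "VARCHAR"): "STRING",
    ("Oracle", "NUMBER(10)"): "INTEGER",
    ("Oracle", "VARCHAR2"): "STRING",
}

# Standard type -> preferred vendor type on each system.
FROM_STANDARD = {
    ("INTEGER", "MySQL"): "INT",
    ("STRING", "MySQL"): "VARCHAR",
    ("INTEGER", "Oracle"): "NUMBER(10)",
    ("STRING", "Oracle"): "VARCHAR2",
}


def map_type(src_system: str, src_type: str, dst_system: str) -> str:
    """Map a vendor type to another system by way of the standard type."""
    standard = TO_STANDARD[(src_system, src_type)]
    return FROM_STANDARD[(standard, dst_system)]


def round_trip_failures(a: str, b: str) -> list:
    """List the types of system a that do not survive a -> b -> a unchanged."""
    failures = []
    for system, vendor_type in TO_STANDARD:
        if system != a:
            continue
        back = map_type(b, map_type(a, vendor_type, b), a)
        if back != vendor_type:
            failures.append((vendor_type, back))
    return failures
```

An empty result from `round_trip_failures` in both directions corresponds to the one-to-one mapping capability that the verification step is meant to confirm.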
In a specific embodiment, the step S102 includes:
S301, obtaining a source field type in a historical source database system and a target data type in a historical target database system, matching and recording the source field type and the target data type to form a field type mapping log table, and calculating to obtain a field type mapping probability table based on the field type mapping log table;
S302, acquiring a source field name in a historical source database system and a destination field name in a historical destination database system, matching and recording the source field name and the destination field name to form a field name mapping log table, and calculating to obtain a field name mapping probability table based on the field name mapping log table;
S303, acquiring a source data pattern name in a historical source database system and a target data pattern name in a historical target database system, associating and recording the source data pattern name, the target data pattern name, a field type and a field name to form a pattern field mapping log table, and calculating to obtain a pattern field mapping probability table based on the pattern field mapping log table;
S304, obtaining an associated field list with the relation from the source field name to the destination field name of the historical migration action, matching and recording the source data mode name, the destination data mode name and the associated field list to form a mode associated log list, and calculating to obtain the mode associated probability list based on the mode associated log list.
In step S301, the field type mapping log table is illustrated by the following table.
Table one:
[Table one appears only as an image in the original publication (Figure BDA0003224503980000071) and is not reproduced here.]
As can be seen from Table one, because the source field type Int and the destination data type Int hold the same field content, the source field type Int of the historical source database system MySQL 5.6 can be successfully matched with the destination data type Int in the historical destination database system Oracle 12.2.
In the present embodiment, the field type mapping probability table is illustrated in the following table two.
Table two:
[Table two appears only as an image in the original publication (Figure BDA0003224503980000072) and is not reproduced here.]
The probability table is derived from statistics of the log table. As Table two shows, if a source field type occurs 100 times and is matched to a given destination field type 97 times, the matching probability is 97%. Specifically, the source field type Int of the historical source database system MySQL 5.6 was successfully matched with the destination data type Int of the historical destination database system Oracle 12.2 97 times, so the source field type Int of a database to be migrated later is preferentially matched with the destination field type Int of the Oracle 12.2 database.
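Deriving a probability table from a log table amounts to counting. The sketch below assumes each log row is a tuple of source system, source type, destination system and destination type, and that the denominator is the number of rows sharing the first three values; both the row layout and the function name are assumptions made for illustration. The `Number` rows are invented sample data.

```python
from collections import Counter


def type_mapping_probabilities(log):
    """Turn log rows into {(src_system, src_type, dst_system, dst_type): p}."""
    rows = list(log)
    pair_counts = Counter(rows)
    # Total occurrences of each source type for a given destination system.
    source_counts = Counter(row[:3] for row in rows)
    return {row: n / source_counts[row[:3]] for row, n in pair_counts.items()}


# Hypothetical log: 97 rows map Int to Int, 3 rows map Int to Number.
log = (
    [("MySQL 5.6", "Int", "Oracle 12.2", "Int")] * 97
    + [("MySQL 5.6", "Int", "Oracle 12.2", "Number")] * 3
)
table = type_mapping_probabilities(log)
```

With this sample log, the Int to Int entry carries the 97% share used in the example above and would be suggested first.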
In step S302, the field name mapping log table is exemplified by the following table three.
Table three:
Source field name | Source field type | Destination field name | Destination field type
Uin | Int | Uid | Long
Uin | Int | UUID | Varchar
The mappings from a field name in the source database system to field names in the destination database are recorded to form a log table and a probability table. Each recorded mapping increases the statistical count, from which the probability that a given destination field name appears for a source field name is calculated.
In the present embodiment, the field name mapping probability table is illustrated by the following table four.
Table four:
[Table four appears only as an image in the original publication (Figure BDA0003224503980000081) and is not reproduced here.]
The field name mapping log table is counted as follows: records with the same source field name, source field type, destination field name and destination field type are counted to obtain the number of occurrences of each combination; records with the same source field name and source field type are then counted to obtain the total; the ratio of the combination count to the total is the combination probability.
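The ratio described above can be written compactly. The symbols are notation introduced here for illustration, not notation from the original: N(·) is a count of rows in the field name mapping log table, s_n and s_t are the source field name and type, and d_n and d_t are the destination field name and type.

```latex
P(d_n, d_t \mid s_n, s_t) = \frac{N(s_n, s_t, d_n, d_t)}{N(s_n, s_t)}
```

If the log held only the two rows of Table three, each combination would occur once out of a total of two, giving a combination probability of 1/2 for both (Uid, Long) and (UUID, Varchar).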
In step S303, the mode field mapping log table is illustrated by table five below.
Table five:
Mode name | Direction | Field name | Field type
Customer | Source | Uin | Int
User | Destination | Uin | Int
In the present embodiment, the mode field mapping probability table is illustrated in the following table six.
Table six:
[Table six appears only as an image in the original publication (Figure BDA0003224503980000082) and is not reproduced here.]
In the historical migration activity, the field name, field type and mode name of the source data mode in the source database system and of the destination data mode in the destination database system are associated to form the mode field mapping log table and the mode field mapping probability table.
In step S304, the pattern association log table is illustrated by the following table seven.
Table seven:
Source mode name | Destination mode name | Associated field list
Customer | User | Uin->Uin…
Customer | Order | Name->CustomerName…
In the present embodiment, the mode association probability table is illustrated by the following table eight.
Table eight:
[Table eight appears only as images in the original publication (Figures BDA0003224503980000083 and BDA0003224503980000091) and is not reproduced here.]
Here, the source mode name is the name of the mode that serves as the data source in a data migration; the destination mode name is the name of the mode that serves as the destination; and the associated field list records, for a particular migration action, the relationships from source mode field names to destination mode field names.
In a specific embodiment, the step S103 includes:
S401, acquiring a source data mode and actual source data of a source database system;
S402, reading and recording the data type in the source data mode to form a source data mode table to be migrated;
S403, based on the source data mode table to be migrated, sampling or fully traversing the actual source data of the source database system, recording and marking critical values, and describing the data types in the source data mode table to be migrated according to the critical values.
The data mode of the source database system is extracted, and the field data types and field names in it are read and recorded to form the source data mode table to be migrated. The actual data of the corresponding mode in the source database system is then sampled or fully traversed, and the critical values, namely the maximum and minimum values, are recorded and marked, further enriching the data type descriptions in the source data mode table. This makes it possible to state the actual data type requirements when the definition in the source data mode does not match actual use, for example when the definition is too broad. For example, if a variable-length varchar field is defined with length 256 but the actual data is at most 128 long, the destination field length may be output as 128.
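Recording critical values for a text column can be sketched as a simple profiling pass. The function names below are hypothetical, and the sketch assumes that the column values have already been read into memory as strings, with `None` standing for a null value.

```python
def profile_lengths(values):
    """Return (min_length, max_length) over non-null values, or None if empty."""
    lengths = [len(v) for v in values if v is not None]
    if not lengths:
        return None
    return min(lengths), max(lengths)


def suggest_length(declared_length, values):
    """Suggest a destination length no larger than the data actually needs."""
    observed = profile_lengths(values)
    if observed is None:
        return declared_length   # no data to learn from, keep the declaration
    return min(declared_length, observed[1])


# A column declared as varchar(256) whose longest actual value has 128 characters.
sample = ["a" * 128, "b" * 10, None]
```

With sampled rather than fully traversed data, the suggested length is only a lower bound on what the full data set may need, so a reviewer would still want to confirm it.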
In a specific embodiment, the step S104 includes:
S501, obtaining environment information of the target database system, wherein the environment information comprises a database type, a database version and a preset data mode;
S502, outputting a full specification list of data types allowed by the target database system, and recording a full capability model table of the target database system for the data mode, wherein the full capability model table comprises a data mode naming rule, a field naming rule and a data mode size constraint.
Some databases are instantiated at run time with specific configuration parameters, for example different character encodings, which can affect how the database interprets data types. Because different databases define data types with different ranges and capabilities, the full specification list of data types makes it possible to determine whether a source data type can be migrated directly.
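A check of a source field against the destination's capability model might look like the following sketch. The class `CapabilityModel`, its attributes and every limit shown are hypothetical; they describe an imaginary destination, not any real database product.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CapabilityModel:
    """Hypothetical full capability model of one destination database."""
    type_max_length: dict        # allowed type name -> maximum length
    field_name_pattern: str      # regular expression for legal field names
    max_field_name_length: int   # longest legal field name

    def problems(self, field_name: str, type_name: str, length: int) -> list:
        """Return the reasons a source field cannot be migrated as-is."""
        found = []
        if type_name not in self.type_max_length:
            found.append(f"type {type_name} is not supported")
        elif length > self.type_max_length[type_name]:
            found.append(f"length {length} exceeds the limit for {type_name}")
        if len(field_name) > self.max_field_name_length:
            found.append(f"field name {field_name} is too long")
        if not re.fullmatch(self.field_name_pattern, field_name):
            found.append(f"field name {field_name} breaks the naming rule")
        return found


# Illustrative limits for an imaginary destination; not real vendor values.
destination = CapabilityModel(
    {"VARCHAR": 255, "INT": 10}, r"[A-Za-z_][A-Za-z0-9_]*", 30
)
```

An empty list means the field can be migrated directly; a non-empty list gives the matching step something concrete to work around.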
In a specific embodiment, the step S104 further includes:
acquiring a source data mode from the source database system, extracting the related probability tables from the personalized data processing auxiliary knowledge base, calculating the corresponding posterior probability table from the extracted probability tables using a Bayesian algorithm, and determining the corresponding target data mode.
In this embodiment, the personalized data processing auxiliary knowledge base establishes prior probabilities for field types, field names, mode names and mode-to-field-name relationships. For a specific migration suggestion, a Bayesian algorithm calculates posterior probabilities from these priors to generate the mapping from the source data mode to the destination mode. For example, in practice the types, names, mode names and mode-to-type relationships are obtained from the source database system, the prior probabilities for the candidate destinations are extracted from the personalized data processing auxiliary knowledge base, the posterior probabilities are calculated with the Bayesian algorithm, and the candidates are ranked so that the best solution is presented.
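The original does not spell out the Bayesian calculation. One common way to realize it, shown here purely as an assumption, is a naive Bayes update: each candidate's prior is multiplied by the likelihood of every observed piece of evidence, treated as conditionally independent, and the scores are normalized. The function names and the sample numbers are hypothetical.

```python
def posterior(priors, likelihoods):
    """Naive Bayes: P(c | evidence) is proportional to P(c) * prod P(e_i | c).

    priors:      {candidate: P(candidate)}
    likelihoods: {candidate: [P(e_1 | candidate), P(e_2 | candidate), ...]}
    """
    scores = {}
    for candidate, prior in priors.items():
        score = prior
        for p in likelihoods[candidate]:
            score *= p
        scores[candidate] = score
    total = sum(scores.values())
    if total == 0:
        return {candidate: 0.0 for candidate in priors}
    return {candidate: s / total for candidate, s in scores.items()}


def rank(post):
    """Order candidates from most to least probable."""
    return sorted(post, key=post.get, reverse=True)


# Invented numbers: two destination names compete for the source field Uin.
priors = {"Uid": 0.6, "UUID": 0.4}          # e.g. from a field name probability table
likelihoods = {"Uid": [0.9], "UUID": [0.2]}  # e.g. how well each fits the source type
post = posterior(priors, likelihoods)
```

Here `rank(post)` places `Uid` first, which is the candidate that would be offered to the user as the preferred mapping.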
Specifically, in actual use, a target mode optimization algorithm is selected and the minimum and maximum matching are preset; matching rules (including AND/OR and preference rules) are applied to the universal data processing professional knowledge base and the personalized data processing auxiliary knowledge base; and a probability deduction algorithm including the Bayesian algorithm is applied. Based on a type probability optimization algorithm, a naming probability optimization algorithm and a mode probability optimization algorithm, the one-to-one, many-to-one and one-to-many target data modes with the highest probability are given for the source data mode.
When the target data mode is output, a recommended target data mode is first presented to the user, one that guarantees the data can be migrated. The user is then guided in adjusting the data mode: the user makes the adjustments, and the system provides the deduction capability that ensures the adjusted mode can actually be applied to the data. The method helps the customer produce the target data mode with little or no adjustment, deduces in real time to check that the adjusted mode is correct, flags possible conflicts, and ensures that the migration result is as expected. For confirmed adjustments, the modification and application logs are recorded to support building the personalized data processing auxiliary knowledge base.
The method in the embodiment of the application realizes the automatic mapping of the field types and names related to the heterogeneous source and target database systems; the method realizes the automatic generation of a new target database system based on a source database system; the automatic mapping to the existing target database system is realized based on the source database system; the automatic mapping to a plurality of target database systems is realized based on a plurality of source database systems; meanwhile, the matching efficiency of the database system is greatly improved by means of industry common experience and individualized experience of actual work of unit entities; meanwhile, the reliability of data migration is ensured, the efficiency of database system migration is improved, and the migration can be implemented as expected.
The embodiment of the invention also provides an automatic matching device for data migration, which is used for executing any embodiment of the automatic matching method for data migration. Specifically, referring to fig. 2, fig. 2 is a schematic block diagram of an automatic matching apparatus for data migration according to an embodiment of the present invention.
As shown in fig. 2, the automatic matching apparatus 600 for data migration includes:
the universal data processing professional knowledge base unit 601 is used for summarizing the original database systems, uniformly defining each data type in each original database system to obtain a plurality of standard data types, establishing mapping between the data types in the original database systems and the standard data types, and performing migration verification according to the mapping relation to generate a universal data processing professional knowledge base;
the personalized data processing auxiliary knowledge base unit 602 is configured to compile statistics on the log tables of historical migration activities to obtain data pattern association information, and to construct a personalized data processing auxiliary knowledge base based on the data pattern association information;
a source data schema table unit 603 to be migrated, configured to obtain a source data schema and actual source data of a source database system, read and record a data type in the source data schema, and form a source data schema table to be migrated, where the data type includes a field type and a field name;
and a destination data pattern calculating unit 604, configured to read environment information from a destination database system, and calculate a destination data pattern corresponding to the source data pattern table to be migrated by using a probability deduction algorithm based on the universal data processing professional knowledge base and the personalized data processing auxiliary knowledge base.
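To make the statistics performed by the personalized data processing auxiliary knowledge base unit 602 more concrete, the following Python sketch turns a mapping log table into a probability table by relative frequency. The row format (source, destination) and the function name `build_probability_table` are assumptions made for illustration; the application does not specify how the probabilities are estimated.

```python
from collections import Counter, defaultdict


def build_probability_table(log_rows):
    """Turn (source, destination) log rows into conditional probabilities.

    Returns {source: {destination: P(destination | source)}}, estimated
    as relative frequencies in the log (an assumed estimator).
    """
    counts = defaultdict(Counter)
    for source, destination in log_rows:
        counts[source][destination] += 1
    table = {}
    for source, dest_counts in counts.items():
        total = sum(dest_counts.values())
        table[source] = {d: n / total for d, n in dest_counts.items()}
    return table


# Hypothetical field type mapping log from earlier migrations.
type_log = [
    ("VARCHAR2", "VARCHAR"),
    ("VARCHAR2", "VARCHAR"),
    ("VARCHAR2", "TEXT"),
    ("NUMBER", "DECIMAL"),
]
type_probabilities = build_probability_table(type_log)
```

The same function could be applied to a field name mapping log, since it only assumes that each log row pairs one source item with one destination item.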
The device realizes automatic mapping of field types and field names between heterogeneous source and target database systems. It supports automatically generating a new target database system from a source database system, automatically mapping a source database system to an existing target database system, and automatically mapping a plurality of source database systems to a plurality of target database systems. By drawing on both common industry experience and the individualized experience accumulated in an organization's actual work, it greatly improves the matching efficiency of database systems. It also ensures the reliability of data migration, improves the efficiency of database system migration, and allows the migration to be carried out as expected.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The above-described automatic matching apparatus for data migration may be implemented in the form of a computer program that can run on a computer device as shown in fig. 3.
Referring to fig. 3, fig. 3 is a schematic block diagram of a computer device according to an embodiment of the present invention. The computer device 1100 is a server, and the server may be an independent server or a server cluster including a plurality of servers.
Referring to fig. 3, the computer device 1100 includes a processor 1102, a memory and a network interface 1105 connected by a system bus 1101, where the memory may include a non-volatile storage medium 1103 and an internal memory 1104.
The non-volatile storage medium 1103 may store an operating system 11031 and a computer program 11032. The computer program 11032, when executed, may cause the processor 1102 to perform the automatic matching method for data migration.
The processor 1102 provides computing and control capabilities that support the operation of the entire computer device 1100.
The internal memory 1104 provides an environment for running the computer program 11032 stored in the non-volatile storage medium 1103; when executed by the processor 1102, the computer program 11032 may cause the processor 1102 to perform the automatic matching method for data migration.
The network interface 1105 is used for network communication, such as the transmission of data information. Those skilled in the art will appreciate that the configuration shown in fig. 3 is a block diagram of only the part of the configuration relevant to aspects of the present invention and does not limit the computer device 1100 to which aspects of the present invention may be applied; a particular computer device 1100 may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
Those skilled in the art will appreciate that the embodiment of a computer device illustrated in fig. 3 does not constitute a limitation on the specific construction of the computer device, and in other embodiments a computer device may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components. For example, in some embodiments, the computer device may only include a memory and a processor, and in such embodiments, the structures and functions of the memory and the processor are consistent with those of the embodiment shown in fig. 3, and are not described herein again.
It should be appreciated that, in embodiments of the present invention, the processor 1102 may be a Central Processing Unit (CPU), or may be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor or any conventional processor.
In another embodiment of the invention, a computer-readable storage medium is provided. The computer readable storage medium may be a non-volatile computer readable storage medium. The computer readable storage medium stores a computer program, wherein the computer program, when executed by a processor, implements the automatic matching method for data migration of an embodiment of the present invention.
The storage medium is a physical, non-transitory storage medium, and may be any physical storage medium capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, or an optical disk.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses, devices and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. An automatic matching method for data migration is characterized by comprising the following steps:
summarizing an original database system, uniformly defining each data type in each original database system to obtain a plurality of standard data types, establishing mapping between the data types in the original database system and the standard data types, and performing migration verification according to the mapping relation to generate a universal data processing professional knowledge base;
counting a log table in the historical migration activity to obtain data pattern associated information, and constructing an auxiliary knowledge base for personalized data processing based on the data pattern associated information;
acquiring a source data mode and actual source data of a source database system, reading and recording a data type in the source data mode, and forming a source data mode table to be migrated, wherein the data type comprises a field type and a field name;
and reading environmental information from a target database system, and calculating a target data mode corresponding to the source data mode table to be migrated by adopting a probability deduction algorithm based on the universal data processing professional knowledge base and the personalized data processing auxiliary knowledge base.
2. The method for automatically matching data migration according to claim 1, wherein after reading the environmental information from the destination database system and calculating the destination data pattern by using a probability deduction algorithm based on the universal data processing professional knowledge base and the personalized data processing auxiliary knowledge base, the method comprises:
revising and applying the target data mode and generating a corresponding application log;
and updating the personalized data processing auxiliary knowledge base according to the revised application log.
3. The automatic matching method for data migration according to claim 1, wherein the performing migration verification according to the mapping relationship to generate a universal data processing professional knowledge base includes:
and automatically generating a corresponding data pattern for a verification target database system with reference to the data pattern in a verification source database system, verifying the conversion of each data pattern through a series of migration activities in which the database systems serve as source and target for one another, correcting any problems found, and generating the universal data processing professional knowledge base.
4. The automatic matching method for data migration according to claim 1, wherein the step of performing statistics on a log table in the historical migration activity to obtain data pattern association information, and constructing a personalized data processing auxiliary knowledge base based on the data pattern association information includes:
acquiring a source field type in a historical source database system and a target data type in a historical target database system, matching and recording the source field type and the target data type to form a field type mapping log table, and calculating to obtain a field type mapping probability table based on the field type mapping log table;
acquiring a source field name in a historical source database system and a destination field name in a historical destination database system, matching and recording the source field name and the destination field name to form a field name mapping log table, and calculating to obtain a field name mapping probability table based on the field name mapping log table;
acquiring a source data pattern name in a historical source database system and a target data pattern name in a historical target database system, associating and recording the source data pattern name, the target data pattern name, a field type and a field name to form a pattern field mapping log table, and calculating to obtain a pattern field mapping probability table based on the pattern field mapping log table;
acquiring an associated field list of the source-field-name to destination-field-name relations that appear in historical migration activities, matching and recording the source data pattern name, the target data pattern name and the associated field list to form a pattern association log table, and calculating a pattern association probability table based on the pattern association log table.
5. The method according to claim 1, wherein the acquiring a source data mode and actual source data of a source database system, reading and recording a data type in the source data mode, and forming a source data mode table to be migrated, the data type including a field type and a field name, comprises:
acquiring a source data mode and actual source data of a source database system;
reading and recording the data type in the source data mode to form a source data mode table to be migrated;
and sampling or fully traversing the actual source data of the source database system based on the source data mode table to be migrated, identifying and recording critical values, and describing the data types in the source data mode table to be migrated according to the critical values.
6. The method for automatic matching of data migration according to claim 1, wherein the reading of the environment information from the destination database system comprises:
acquiring environment information of the target database system, wherein the environment information comprises a database type, a database version and a preset data mode;
and outputting a full specification list of data types allowed by the target database system, and recording a full capability model table of the target database system for the data mode, wherein the full capability model table comprises a data mode naming rule, a field naming rule and a data mode size constraint.
7. The method according to claim 1, wherein the reading of the environmental information from the destination database system and the calculation of the destination data pattern corresponding to the source data pattern table to be migrated using a probability deduction algorithm based on the universal data processing professional knowledge base and the personalized data processing auxiliary knowledge base comprises:
and acquiring a source data mode based on the source database system, extracting a related probability table from the personalized data processing auxiliary knowledge base, calculating the extracted probability table based on a Bayesian algorithm to obtain a corresponding posterior probability table, and determining a corresponding target data mode.
8. An automatic matching device for data migration, comprising:
a universal data processing professional knowledge base unit, configured to summarize the original database systems, uniformly define each data type in each original database system to obtain a plurality of standard data types, establish mapping between the data types in the original database systems and the standard data types, and perform migration verification according to the mapping relation to generate a universal data processing professional knowledge base;
a personalized data processing auxiliary knowledge base unit, configured to count a log table in the historical migration activity to obtain data pattern association information, and to construct a personalized data processing auxiliary knowledge base based on the data pattern association information;
a source data mode table unit to be migrated, configured to acquire a source data mode and actual source data of a source database system, and to read and record a data type in the source data mode to form a source data mode table to be migrated, wherein the data type comprises a field type and a field name;
and a target data mode calculating unit, configured to read environmental information from a target database system, and to calculate a target data mode corresponding to the source data mode table to be migrated by adopting a probability deduction algorithm based on the universal data processing professional knowledge base and the personalized data processing auxiliary knowledge base.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the auto-matching method of data migration according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to execute the automatic matching method of data migration according to any one of claims 1 to 7.
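
By way of illustration only, and without limiting the claims, the recording of critical values recited in claim 5 might be sketched in Python as follows. The function `profile_fields`, the row format (a dict per record) and the particular statistics kept (maximum text length, numeric minimum and maximum, presence of nulls) are assumptions chosen for the sketch; the claim does not enumerate which critical values are recorded.

```python
def profile_fields(rows, field_names):
    """Record boundary values per field from an iterable of row dicts.

    Works the same way on a sample or on a full traversal, because it
    only needs a single pass over whatever rows it is given.
    """
    profile = {
        name: {"max_length": 0, "min": None, "max": None, "nullable": False}
        for name in field_names
    }
    for row in rows:
        for name in field_names:
            value = row.get(name)
            stats = profile[name]
            if value is None:
                stats["nullable"] = True
                continue
            stats["max_length"] = max(stats["max_length"], len(str(value)))
            # Track numeric range only for real numbers, not booleans.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                stats["min"] = value if stats["min"] is None else min(stats["min"], value)
                stats["max"] = value if stats["max"] is None else max(stats["max"], value)
    return profile
```

The resulting profile could then be compared against the size constraints of the target database system before a target field type is chosen.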
CN202110967081.2A 2021-08-23 2021-08-23 Automatic matching method and device for data migration, computer equipment and storage medium Pending CN113656377A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110967081.2A CN113656377A (en) 2021-08-23 2021-08-23 Automatic matching method and device for data migration, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110967081.2A CN113656377A (en) 2021-08-23 2021-08-23 Automatic matching method and device for data migration, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113656377A true CN113656377A (en) 2021-11-16

Family

ID=78480699

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110967081.2A Pending CN113656377A (en) 2021-08-23 2021-08-23 Automatic matching method and device for data migration, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113656377A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114238258A (en) * 2021-11-30 2022-03-25 企查查科技有限公司 Database data processing method and device, computer equipment and storage medium
CN114238258B (en) * 2021-11-30 2024-02-20 企查查科技股份有限公司 Database data processing method, device, computer equipment and storage medium
CN115543485A (en) * 2022-10-24 2022-12-30 清华大学 Data conversion configuration generation method and device, computer equipment and medium
CN115543485B (en) * 2022-10-24 2023-06-30 清华大学 Data conversion configuration generation method, device, computer equipment and medium
CN115952185A (en) * 2023-03-10 2023-04-11 布比(北京)网络技术有限公司 Data processing method and device, equipment and storage medium
CN115952185B (en) * 2023-03-10 2023-06-30 布比(北京)网络技术有限公司 Data processing method and device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN113656377A (en) Automatic matching method and device for data migration, computer equipment and storage medium
CN103177068B (en) According to the system and method for existence compatible rule merging source record
US11494688B2 (en) Learning ETL rules by example
US8782081B2 (en) Query template definition and transformation
US10192187B2 (en) Comparison of client and benchmark data
US20170154057A1 (en) Efficient consolidation of high-volume metrics
US20190155801A1 (en) Systems and methods for distributed data validation
US20100262625A1 (en) Method and system for fine-granularity access control for database entities
US8615526B2 (en) Markup language based query and file generation
US20080208918A1 (en) Efficient data handling representations
US7720831B2 (en) Handling multi-dimensional data including writeback data
US10394805B2 (en) Database management for mobile devices
US20060294159A1 (en) Method and process for co-existing versions of standards in an abstract and physical data environment
US11379466B2 (en) Data accuracy using natural language processing
US11960482B1 (en) Systems and methods for extracting data views from heterogeneous sources
US11775506B2 (en) Quality control test transactions for shared databases of a collaboration tool
US9501505B2 (en) System of and method for entity representation splitting without the need for human interaction
CN110569313B (en) Model table level judging method and device of data warehouse
CN114328759A (en) Data construction and management method and terminal of data warehouse
US9489423B1 (en) Query data acquisition and analysis
US20240169027A1 (en) Systems and methods for machine learning models for performance measurement
CN111581212B (en) Data storage method, system, server and storage medium of relational database
CN110737727A (en) data processing method and system
US8538792B1 (en) Method and system for determining total cost of ownership
US20210397745A1 (en) Data providing server device and data providing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination