CN110674220B - Data heterogeneous method, device and equipment - Google Patents

Info

Publication number
CN110674220B
Authority
CN
China
Prior art keywords
data
heterogeneous
rule
isomerism
input source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910911575.1A
Other languages
Chinese (zh)
Other versions
CN110674220A (en)
Inventor
杨森
王彭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Enyike Beijing Data Technology Co ltd
Original Assignee
Enyike Beijing Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Enyike Beijing Data Technology Co ltd filed Critical Enyike Beijing Data Technology Co ltd
Priority to CN201910911575.1A priority Critical patent/CN110674220B/en
Publication of CN110674220A publication Critical patent/CN110674220A/en
Application granted granted Critical
Publication of CN110674220B publication Critical patent/CN110674220B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • G06F16/2358Change logging, detection, and notification
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data heterogeneous method, apparatus, device, and computer-readable storage medium. The method comprises: setting a data heterogeneous rule according to data input source information and data output source information; performing heterogeneous processing on historical data according to the data heterogeneous rule, and performing heterogeneous processing on real-time data according to the same rule by monitoring the database update log (binLog); and outputting the resulting data to a search engine. The method improves the stability and real-time performance of data heterogeneous processing, supports configurable heterogeneous rules, and ensures data availability.

Description

Data heterogeneous method, device and equipment
Technical Field
The present disclosure relates to the field of data processing, and more particularly to a data heterogeneous method, apparatus, device, and computer-readable storage medium.
Background
In the current big-data era, data is distributed in a fragmented and decentralized way. Data heterogeneous processing is a prerequisite for data search and data analysis. At present, most data is stored in structured form in structured databases such as MySQL. Structured data has natural advantages for implementing most business logic, but its weaknesses are amplified in business scenarios that involve large volumes of data, such as search, recommendation, and data reporting, where query speed degrades in proportion to the growth of the data.
In related data heterogeneous techniques, periodic (timed) data synchronization is generally adopted: full and incremental heterogeneous processing is performed on a time-synchronized basis, that is, the data changed between the last synchronization time and the current synchronization time is synchronized, which guarantees data consistency. However, this approach is not real-time, and under large data volumes it drags down the structured database, which affects the stability of the normal business system.
Disclosure of Invention
The application provides a data heterogeneous method, apparatus, device, and computer-readable storage medium, so as to improve the stability and real-time performance of data heterogeneous processing.
An embodiment of the application provides a data heterogeneous method, which comprises the following steps:
setting a data heterogeneous rule according to data input source information and data output source information;
performing heterogeneous processing on historical data according to the data heterogeneous rule, and performing heterogeneous processing on real-time data according to the data heterogeneous rule by monitoring the database update log (binLog);
and outputting the data obtained by the heterogeneous processing to a search engine.
In an embodiment, the data input source information includes a data input source table structure, the data output source information includes a data output source index, and setting the data heterogeneous rule according to the data input source information and the data output source information includes:
obtaining the data input source table structure and the data output source index, mapping each field in the data output source index to a data input source field according to the data input source table structure, generating a mapping relationship rule model, and storing the mapping relationship rule model in an unstructured database.
In one embodiment, the mapping relationship rule model includes a plurality of rule groups, and the data input source table structure is in a many-to-many relationship with the rule groups.
In an embodiment, performing heterogeneous processing on the historical data according to the data heterogeneous rule includes:
creating a historical data initialization task (Job) according to the data heterogeneous rule, setting the last modification time of the initialized data, and executing the Job.
In an embodiment, the method further comprises:
after the Job finishes, setting the offset for real-time consumption according to the last modification time of the data and starting real-time data heterogeneous processing; the historical data heterogeneous processing is complete when the data consumption time equals the latest data time.
In an embodiment, performing heterogeneous processing on the real-time data according to the data heterogeneous rule by monitoring the binLog includes:
obtaining binLog entries by monitoring the binLog, setting a unique data ID for each binLog entry, sending the data ID and the corresponding real-time data to a Kafka data pipeline in a Kafka message, and performing heterogeneous processing on the real-time data through a plurality of parallel heterogeneous services according to the data heterogeneous rule.
In an embodiment, the method further comprises:
managing the plurality of parallel heterogeneous services through ZooKeeper.
In an embodiment, managing the plurality of parallel heterogeneous services through ZooKeeper includes:
when a heterogeneous service is added or removed, closing the data heterogeneous switch through ZooKeeper, re-hashing the Kafka messages, and reporting the service state of each heterogeneous service;
and after all the heterogeneous services are in the no-data-processing state, opening the data heterogeneous switch through ZooKeeper to resume data heterogeneous processing.
In an embodiment, the method further comprises:
replaying and reprocessing the real-time data according to the Kafka offset, which is managed through ZooKeeper.
An embodiment of the present application further provides a data heterogeneous apparatus, including:
a data model rule module, configured to set a data heterogeneous rule according to data input source information and data output source information;
a historical data heterogeneous module, configured to perform heterogeneous processing on historical data according to the data heterogeneous rule;
and a data distribution module, configured to perform heterogeneous processing on real-time data according to the data heterogeneous rule by monitoring the database update log (binLog), and to output the data obtained by the heterogeneous processing to the search engine.
An embodiment of the present application further provides a data heterogeneous device, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the data heterogeneous method when executing the program.
An embodiment of the application also provides a computer-readable storage medium storing computer-executable instructions, wherein the computer-executable instructions are used to execute the data heterogeneous method.
Compared with the related art, the application includes: setting a data heterogeneous rule according to data input source information and data output source information; performing heterogeneous processing on historical data according to the data heterogeneous rule, and performing heterogeneous processing on real-time data according to the data heterogeneous rule by monitoring the database update log (binLog); and outputting the data obtained by the heterogeneous processing to a search engine. This improves the stability and real-time performance of data heterogeneous processing, supports configurable heterogeneous rules, and ensures data availability.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. Other advantages of the application may be realized and attained by the instrumentalities and combinations particularly pointed out in the specification, claims, and drawings.
Drawings
The accompanying drawings are included to provide an understanding of the present disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the examples serve to explain the principles of the disclosure and not to limit the disclosure.
FIG. 1 is a flowchart of a data heterogeneous method according to an embodiment of the present application;
FIG. 2 is a flowchart of a data heterogeneous method according to an application example of the present application;
FIG. 3 is a data flow diagram of an application example of the present application;
FIG. 4 is a schematic diagram of a data heterogeneous apparatus according to an embodiment of the present application.
Detailed Description
The description herein describes embodiments, but is intended to be exemplary, rather than limiting and it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible that are within the scope of the embodiments described herein. Although many possible combinations of features are shown in the drawings and discussed in the detailed description, many other combinations of the disclosed features are possible. Any feature or element of any embodiment may be used in combination with or instead of any other feature or element in any other embodiment, unless expressly limited otherwise.
The present application includes and contemplates combinations of features and elements known to those of ordinary skill in the art. The embodiments, features and elements disclosed in this application may also be combined with any conventional features or elements to form a unique inventive concept as defined by the claims. Any feature or element of any embodiment may also be combined with features or elements from other inventive aspects to form yet another unique inventive aspect, as defined by the claims. Thus, it should be understood that any of the features shown and/or discussed in this application may be implemented alone or in any suitable combination. Accordingly, the embodiments are not to be restricted except in light of the attached claims and their equivalents. Furthermore, various modifications and changes may be made within the scope of the appended claims.
Further, in describing representative embodiments, the specification may have presented the method and/or process as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. Other orders of steps are possible as will be understood by those of ordinary skill in the art. Therefore, the particular order of the steps set forth in the specification should not be construed as limitations on the claims. Furthermore, the claims directed to the method and/or process should not be limited to the performance of their steps in the order written, and one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the embodiments of the present application.
At present, most heterogeneous strategies cannot simultaneously address real-time data heterogeneous processing, data consistency, configurable heterogeneous rules, switching of heterogeneous data, system disaster tolerance, data recovery, and similar problems, and they fall short in the availability and stability of data heterogeneous processing.
The embodiments of the application provide a new data heterogeneous method designed for stability and real-time performance, and on that basis provide a data heterogeneous strategy that improves the stability and real-time performance of data heterogeneous processing, supports configurable heterogeneous rules, optimizes the disaster recovery strategy, and ensures data recovery and data availability.
As shown in FIG. 1, a data heterogeneous method according to an embodiment of the present application includes:
Step 101, setting a data heterogeneous rule according to data input source information and data output source information.
In one embodiment, the data input source information includes a data input source table structure, the data output source information includes a data output source index, and step 101 includes:
obtaining the data input source table structure and the data output source index (Index), mapping each field in the data output source Index to a data input source field according to the data input source table structure, generating a mapping relationship rule model, and storing the mapping relationship rule model in an unstructured database.
The unstructured database may be Redis.
In this embodiment, based on the data input source information and the data output source information, the heterogeneous rules may be set on a page: an Index of the output source is selected, and each field of the Index is mapped to a data input source field. Finally, a mapping relationship rule model with a Map structure is generated and stored in the unstructured database Redis, which makes it easy to modify the rules while data heterogeneous processing is running. A record of the data rule creation is stored in the structured database MySQL, so that the user can conveniently view it and configure subsequent processes.
In one embodiment, the mapping relationship rule model includes a plurality of rule groups, and the data input source table structure is in a many-to-many relationship with the rule groups.
That is, one data table structure may belong to a plurality of rule groups, and one rule group may include a plurality of table structures; heterogeneous processing of one rule group produces a data structure that can include parent-child relationships and the like.
A rule group supports unifying multiple data input sources, multiple tables, and multiple fields into one data output source, which makes it flexible. That is, data from multiple data input sources may be mapped to an output source by one rule group.
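As an illustration only, a Map-structured rule group of the kind described above might look like the following Python sketch. The table names, column names, index name, and the `apply_rule_group` helper are all hypothetical; the patent does not specify the concrete layout of the rule model.

```python
from typing import Any, Dict

# Hypothetical rule group: each output Index field maps to (source table, source column).
RULE_GROUP: Dict[str, Any] = {
    "output_index": "order_index",
    "fields": {
        "order_id": ("orders", "id"),
        "amount": ("orders", "total_amount"),
        "buyer_name": ("users", "name"),
    },
}


def apply_rule_group(rule_group: Dict[str, Any], rows: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Build one output document from source rows keyed by table name."""
    document = {}
    for out_field, (table, column) in rule_group["fields"].items():
        row = rows.get(table)
        if row is not None and column in row:
            document[out_field] = row[column]
    return document


doc = apply_rule_group(
    RULE_GROUP,
    {"orders": {"id": 7, "total_amount": 19.5}, "users": {"name": "Li"}},
)
```

Because the rule group is plain data, it could be serialized and stored in a key-value store, and a changed mapping would take effect without redeploying the processing code.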
Step 102, performing heterogeneous processing on historical data according to the data heterogeneous rule, and performing heterogeneous processing on real-time data according to the data heterogeneous rule by monitoring the binLog (database update log).
In an embodiment, performing heterogeneous processing on the historical data according to the data heterogeneous rule includes:
creating a historical data initialization Job (task) according to the data heterogeneous rule, setting the last modification time of the initialized data, and executing the Job.
The Job is a database timed task.
The historical data Job performs heterogeneous processing on the historical data and guarantees that data existing before heterogeneous processing began is not lost.
The binLog is the database update log, a file in binary format that records information about the SQL statements with which users update the database.
In an embodiment, performing heterogeneous processing on the real-time data according to the data heterogeneous rule by monitoring the binLog includes:
obtaining binLog entries by monitoring the binLog, setting a unique data ID for each binLog entry, sending the data ID and the corresponding real-time data to a Kafka data pipeline in a Kafka message, and performing heterogeneous processing on the real-time data through a plurality of parallel heterogeneous services according to the data heterogeneous rule.
The rule for the unique ID is database link + database name + table name + primary key ID; this value is set as the ID of the Kafka message, which is then sent to the Kafka data pipeline.
In this embodiment, the binLog can be monitored on the basis of MySQL, and simple data filtering can be performed, mainly to filter out operations that do not affect the data, which reduces the amount of data to be processed. The data ID is generated in order to classify the data: to preserve data order, it is only necessary to preserve the order of operations on one row of one table in one database. To meet this requirement, an ID is set for each binLog entry, following the rule database link + database name + table name + primary key ID.
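The ID rule and the filtering step can be sketched as follows. This is a minimal illustration, not the patented implementation: the event dictionary layout, the `+` separator, and the use of CRC32 as the partition hash are assumptions made for the example (a real Kafka producer applies its own partitioner to the message key).

```python
import zlib
from typing import Optional


def build_data_id(db_link: str, db_name: str, table: str, primary_key: object) -> str:
    """Join the four parts named in the text; the '+' separator is an assumption."""
    return "+".join([db_link, db_name, table, str(primary_key)])


def partition_for(data_id: str, partitions: int) -> int:
    """Stable hash, so one row always maps to the same partition (CRC32 is a stand-in)."""
    return zlib.crc32(data_id.encode("utf-8")) % partitions


def to_message(event: dict) -> Optional[dict]:
    """Drop updates that leave the row unchanged; key the rest by data ID."""
    if event["type"] == "UPDATE" and event["before"] == event["after"]:
        return None
    data_id = build_data_id(event["db_link"], event["db"], event["table"], event["pk"])
    return {"key": data_id, "value": event}


changed = {"type": "UPDATE", "db_link": "mysql://host-a:3306", "db": "shop",
           "table": "orders", "pk": 7, "before": {"amount": 1}, "after": {"amount": 2}}
unchanged = dict(changed, after={"amount": 1})
message = to_message(changed)
```

Keying every message by the row-level ID means all changes to one row go through one partition, which is what lets a consumer see them in the order they were written.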
In one embodiment, after the Job finishes, the offset for real-time consumption is set according to the last modification time of the data and real-time data heterogeneous processing is started; the historical data heterogeneous processing is complete when the data consumption time equals the latest data time.
The offset for real-time consumption is the last modification time of the initialized data. When the data consumption time equals the latest data time, that is, when consumption has caught up with the latest data, the historical data heterogeneous processing is complete and the new data (the real-time heterogeneous data) runs normally.
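The hand-off from the historical Job to real-time consumption can be sketched as below. The row and event layouts, the integer timestamps, and the function names are hypothetical; the sketch only shows the idea of loading a snapshot up to a cutoff time and then applying change events from that cutoff onward until consumption catches up.

```python
from typing import Dict, List


def run_history_job(rows: List[dict], cutoff: int) -> Dict[int, dict]:
    """Load every row last modified at or before the cutoff time."""
    return {row["id"]: row for row in rows if row["modified"] <= cutoff}


def catch_up(index: Dict[int, dict], log: List[dict], cutoff: int) -> int:
    """Apply change events recorded at or after the cutoff, oldest first.

    Returns the timestamp of the last event consumed.
    """
    consumed_until = cutoff
    for event in sorted(log, key=lambda e: e["ts"]):
        if event["ts"] < cutoff:
            continue  # already covered by the historical snapshot
        index[event["row"]["id"]] = event["row"]
        consumed_until = event["ts"]
    return consumed_until


snapshot = [{"id": 1, "modified": 10, "v": "a"}, {"id": 2, "modified": 20, "v": "b"}]
change_log = [
    {"ts": 25, "row": {"id": 1, "modified": 25, "v": "a2"}},
    {"ts": 30, "row": {"id": 3, "modified": 30, "v": "c"}},
]
index = run_history_job(snapshot, cutoff=20)
consumed_until = catch_up(index, change_log, cutoff=20)
caught_up = consumed_until == max(e["ts"] for e in change_log)
```

Starting real-time consumption at the cutoff rather than at "now" is what closes the gap: changes made while the Job was running are replayed on top of the snapshot.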
The new data (the real-time heterogeneous data) is then given the same alias as the historical heterogeneous data, the alias is removed from the historical heterogeneous data, the new data is put into use, and the obsolete data is deleted.
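The alias switch can be pictured with a small in-memory model. This is a sketch under stated assumptions: the two dictionaries stand in for the search engine's alias table and index storage, and `switch_alias` is a hypothetical helper, not a search engine API.

```python
from typing import Dict, List, Optional


def switch_alias(aliases: Dict[str, str], indices: Dict[str, List[dict]],
                 alias: str, new_index: str) -> Optional[str]:
    """Point the alias at the new index, then drop the index it used to name."""
    old_index = aliases.get(alias)
    aliases[alias] = new_index  # readers using the alias now see the new data
    if old_index is not None and old_index != new_index:
        indices.pop(old_index, None)  # delete the obsolete data
    return old_index


aliases = {"orders": "orders_v1"}
indices = {"orders_v1": [{"id": 1}], "orders_v2": [{"id": 1}, {"id": 2}]}
replaced = switch_alias(aliases, indices, "orders", "orders_v2")
```

Readers that always query through the alias never need to know which underlying index is live, which is what makes the switch from old data to new data smooth.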
The Kafka messages are then pulled, and heterogeneous processing is performed on the data according to the data rule.
Step 103, outputting the data obtained by the heterogeneous processing to a search engine.
The search engine may be Elasticsearch.
In an embodiment, the method further comprises: managing the plurality of parallel heterogeneous services through ZooKeeper.
When a heterogeneous service is added or removed, the data heterogeneous switch is closed through ZooKeeper, the Kafka messages are re-hashed, and the service state of each heterogeneous service is reported; after all the heterogeneous services are in the no-data-processing state, the data heterogeneous switch is opened through ZooKeeper to resume data heterogeneous processing.
ZooKeeper is used to implement the heterogeneous registry and heterogeneous service management. Each time a heterogeneous service starts, it registers with the heterogeneous registry. The service management stores a switch indicating whether data heterogeneous processing is enabled, along with the current state of each service, which is one of: data heterogeneous processing in progress, or no data processing. When heterogeneous services are added or removed, data heterogeneous processing is switched off, the Kafka messages are re-hashed, and service state reporting is triggered; when all services are in the no-data-processing state, the data heterogeneous switch is opened and data heterogeneous processing starts. This solves the ordering problem that arises when re-hashing the Kafka messages during service scaling distributes messages with the same data ID to different machines, so that a later message may be processed before an earlier one.
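The switch-and-drain protocol can be sketched with an in-memory stand-in for the state that the text keeps in ZooKeeper. The class name, method names, and the round-robin partition assignment are assumptions for illustration; no ZooKeeper client calls are shown.

```python
IDLE = "idle"
PROCESSING = "processing"


class HeterogeneousRegistry:
    """In-memory stand-in for the registry and switch held in ZooKeeper."""

    def __init__(self, partitions: int) -> None:
        self.partitions = partitions
        self.switch_open = False
        self.states = {}      # service name -> IDLE or PROCESSING
        self.assignment = {}  # partition number -> service name

    def _membership_changed(self) -> None:
        # Close the switch first, then re-hash partitions over the current services.
        self.switch_open = False
        names = sorted(self.states)
        self.assignment = (
            {p: names[p % len(names)] for p in range(self.partitions)} if names else {}
        )

    def register(self, name: str) -> None:
        self.states[name] = IDLE
        self._membership_changed()

    def unregister(self, name: str) -> None:
        self.states.pop(name, None)
        self._membership_changed()

    def report(self, name: str, state: str) -> None:
        # Reopen the switch only once every registered service reports idle.
        self.states[name] = state
        if self.states and all(s == IDLE for s in self.states.values()):
            self.switch_open = True


registry = HeterogeneousRegistry(partitions=4)
registry.register("svc-a")
registry.report("svc-a", IDLE)            # single idle service: switch opens
registry.report("svc-a", PROCESSING)      # svc-a is busy with a message
registry.register("svc-b")                # scaling out closes the switch
closed_during_rebalance = not registry.switch_open
registry.report("svc-b", IDLE)
still_closed = not registry.switch_open   # svc-a has not drained yet
registry.report("svc-a", IDLE)            # everyone idle: switch reopens
```

Waiting for every service to drain before reopening the switch ensures that no service is still working on a message from a partition that has just been reassigned to another service.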
In an embodiment, the method further comprises:
replaying and reprocessing the real-time data according to the Kafka offset, which is managed through ZooKeeper.
In this embodiment, ZooKeeper manages the Kafka offset, so data can be restored to a given point in time and heterogeneous processing can be performed again.
The data can be replayed and reprocessed based on the Kafka offset: by setting the offset in Kafka in the form of a point in time, message consumption is rewound to that time, so that the data from that time onward is consumed again and refreshed.
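Time-based playback amounts to translating a timestamp into a position in an ordered log and consuming again from there. The sketch below models the log as a list of `(timestamp, payload)` pairs sorted by timestamp; the function names are hypothetical, and a real deployment would use the message broker's own timestamp-to-offset lookup rather than this in-memory search.

```python
import bisect
from typing import List, Tuple


def offset_for_time(timestamps: List[int], target_ts: int) -> int:
    """Return the first offset whose timestamp is at or after target_ts."""
    return bisect.bisect_left(timestamps, target_ts)


def replay(log: List[Tuple[int, str]], target_ts: int) -> List[str]:
    """Re-consume every payload recorded at or after target_ts."""
    timestamps = [ts for ts, _ in log]
    return [payload for _, payload in log[offset_for_time(timestamps, target_ts):]]


log = [(100, "a"), (200, "b"), (300, "c")]
replayed = replay(log, 150)
```

Because reprocessing a message overwrites the same output document, replaying from an earlier time refreshes the search engine contents instead of duplicating them, provided the writes are keyed by the data ID.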
The embodiments of the application can improve the stability and effectiveness of the heterogeneous services and effectively reduce the pressure that data heterogeneous processing places on the structured database. Configuring data rules allows the system to adapt to data changes, supports smooth switching between new and old data, and keeps the system compatible with data changes.
The following is a description of an application example.
In this application example, Kafka serves as the carrier of data flow, ZooKeeper manages the data heterogeneous processing services and the Kafka offset, and Elasticsearch serves as the carrier of the heterogeneous results.
The implementation uses the Java language. Data is distributed to Kafka, and the data heterogeneous processing services listen for Kafka messages, cleanse the data, and write the heterogeneous results to Elasticsearch.
As shown in FIG. 2 and FIG. 3, the method comprises the following steps:
Step 201, set the data input source information.
There may be a plurality of data input sources.
Step 202, obtain the data input source table structure, then execute step 205.
Step 203, set the data output source information.
Step 204, obtain the data output source Index structure, then execute step 205.
Steps 201 to 202 and steps 203 to 204 are executed in parallel.
Step 205, set the mapping rule.
The mapping rule is set according to the data input source table structure and the data output source Index structure, and is stored in the unstructured database Redis.
Step 206, create the historical data Job.
The historical data Job is created according to the mapping rule.
Step 207, create the real-time data heterogeneous task.
The real-time data heterogeneous task is created according to the mapping rule.
Step 208, start the real-time heterogeneous rule and verify the correctness of the real-time heterogeneous processing.
Step 209, start the historical data Job.
The last modification time of the initialized data (namely the Job start time) is set, and the Job is executed.
Step 210, the historical data Job completes.
Step 211, roll the real-time data back to the start time of the historical data Job and perform real-time heterogeneous processing again.
Step 212, the data mapped by the new rule takes effect.
The data mapped by the new rule refers to the current real-time heterogeneous data.
Step 213, stop the heterogeneous processing under the old rule.
The old-rule heterogeneous data refers to the real-time heterogeneous data produced previously.
Step 214, delete the old-rule data Index.
Step 215, the heterogeneous switchover is complete.
As shown in FIG. 4, the data heterogeneous apparatus according to the embodiment of the present application includes:
a data model rule module 41, configured to set a data heterogeneous rule according to data input source information and data output source information;
a historical data heterogeneous module 42, configured to perform heterogeneous processing on historical data according to the data heterogeneous rule;
and a data distribution module 43, configured to perform heterogeneous processing on real-time data according to the data heterogeneous rule by monitoring the database update log (binLog), and to output the data obtained by the heterogeneous processing to the search engine.
In an embodiment, the data input source information includes a data input source table structure, the data output source information includes a data output source index, and the data model rule module 41 is configured to:
obtain the data input source table structure and the data output source index, map each field in the data output source index to a data input source field according to the data input source table structure, generate a mapping relationship rule model, and store the mapping relationship rule model in an unstructured database.
In one embodiment, the mapping relationship rule model includes a plurality of rule groups, and the data input source table structure is in a many-to-many relationship with the rule groups.
In an embodiment, the historical data heterogeneous module 42 is configured to:
create a historical data initialization task (Job) according to the data heterogeneous rule, set the last modification time of the initialized data, and execute the Job.
In an embodiment, the data distribution module 43 is configured to:
obtain binLog entries by monitoring the binLog, set a unique data ID for each binLog entry, send the data ID and the corresponding real-time data to a Kafka data pipeline in a Kafka message, and perform heterogeneous processing on the real-time data through a plurality of parallel heterogeneous services according to the data heterogeneous rule.
In one embodiment, the apparatus further comprises:
a data heterogeneous processing service management module 44, configured to:
manage the plurality of parallel heterogeneous services through ZooKeeper.
In an embodiment, the data heterogeneous processing service management module 44 is configured to:
when a heterogeneous service is added or removed, close the data heterogeneous switch through ZooKeeper, re-hash the Kafka messages, and report the service state of each heterogeneous service;
and after all the heterogeneous services are in the no-data-processing state, open the data heterogeneous switch through ZooKeeper to resume data heterogeneous processing.
In one embodiment, the apparatus further comprises:
a data playback recovery module 45, configured to replay and reprocess the real-time data according to the Kafka offset, which is managed through ZooKeeper.
In an embodiment, the data playback recovery module 45 is further configured to, after the Job completes, set the offset for real-time consumption according to the last modification time of the data and start real-time data heterogeneous processing; the historical data heterogeneous processing is complete when the data consumption time equals the latest data time.
The apparatus improves the stability and real-time performance of data heterogeneous processing, supports configurable heterogeneous rules, and ensures data availability.
An embodiment of the present application further provides a data heterogeneous device, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the data heterogeneous method when executing the program.
An embodiment of the application also provides a computer-readable storage medium storing computer-executable instructions, wherein the computer-executable instructions are used to execute the data heterogeneous method.
In this embodiment, the storage medium may include, but is not limited to: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those skilled in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.

Claims (8)

1. A method for data heterogeneity, comprising:
setting a data heterogeneous rule according to data input source information and data output source information;
performing heterogeneous processing on historical data according to the data heterogeneous rule, and, by monitoring the database update log (binLog), performing heterogeneous processing on real-time data according to the data heterogeneous rule;
outputting the data obtained by the heterogeneous processing to a search engine;
wherein the data input source information includes a data input source table structure, the data output source information includes a data output source index, and setting the data heterogeneous rule according to the data input source information and the data output source information comprises:
acquiring the data input source table structure and the data output source index, mapping each field in the data output source index to a data input source field according to the data input source table structure, generating a mapping relation rule model, and storing the mapping relation rule model in an unstructured database;
wherein the mapping relation rule model comprises a plurality of rule groups, and the data input source table structure and the rule groups are in a many-to-many relation;
the method further comprising: managing a plurality of parallel heterogeneous services through Zookeeper, comprising:
when a heterogeneous service is added or removed, closing a data heterogeneous switch through Zookeeper, re-hashing the Kafka messages, and reporting the service state of each heterogeneous service;
and after all the heterogeneous services are in a non-data-processing state, opening the data heterogeneous switch through Zookeeper to resume the heterogeneous processing of data.
2. The method of claim 1, wherein performing heterogeneous processing on the historical data according to the data heterogeneous rule comprises:
creating a historical data initialization task (Job) according to the data heterogeneous rule, setting the last modification time of the initialized data, and executing the Job.
3. The method of claim 2, further comprising:
after the Job finishes, setting an offset for real-time consumption according to the last modification time of the data, and starting the heterogeneous processing of real-time data, the heterogeneous processing of historical data being complete when the data consumption time is equal to the latest data time.
4. The method of claim 1, wherein performing heterogeneous processing on the real-time data according to the data heterogeneous rule by monitoring the binLog comprises:
acquiring binLog entries by monitoring the binLog, setting a unique data ID for each binLog entry, sending the data ID and the corresponding real-time data to a Kafka data pipeline as Kafka messages, and performing heterogeneous processing on the real-time data through a plurality of parallel heterogeneous services according to the data heterogeneous rule.
5. The method of claim 1, further comprising:
replaying and reprocessing the real-time data through Zookeeper according to the Kafka offset.
6. An apparatus for data heterogeneity, comprising:
a data model rule module, configured to set a data heterogeneous rule according to data input source information and data output source information;
a historical data heterogeneous module, configured to perform heterogeneous processing on historical data according to the data heterogeneous rule;
a data distribution module, configured to perform, by monitoring the database update log (binLog), heterogeneous processing on real-time data according to the data heterogeneous rule, and to output the data obtained by the heterogeneous processing to a search engine;
wherein the data input source information includes a data input source table structure, the data output source information includes a data output source index, and setting the data heterogeneous rule according to the data input source information and the data output source information comprises:
acquiring the data input source table structure and the data output source index, mapping each field in the data output source index to a data input source field according to the data input source table structure, generating a mapping relation rule model, and storing the mapping relation rule model in an unstructured database;
wherein the mapping relation rule model comprises a plurality of rule groups, and the data input source table structure and the rule groups are in a many-to-many relation;
the data distribution module is further configured to manage a plurality of parallel heterogeneous services through Zookeeper, comprising:
when a heterogeneous service is added or removed, closing a data heterogeneous switch through Zookeeper, re-hashing the Kafka messages, and reporting the service state of each heterogeneous service;
and after all the heterogeneous services are in a non-data-processing state, opening the data heterogeneous switch through Zookeeper to resume the heterogeneous processing of data.
7. A device for data heterogeneity, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any one of claims 1 to 5 when executing the program.
8. A computer-readable storage medium storing computer-executable instructions for performing the method of any one of claims 1-5.
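
As a non-limiting illustration of the steps recited in claims 1 and 4, the following minimal Python sketch models three of them in memory: mapping output index fields to input source columns, routing a message to one of several parallel heterogeneous services by hashing its unique data ID, and closing and reopening the data heterogeneous switch when the set of services changes. All names (`apply_rule`, `route`, `Coordinator`) are hypothetical and do not appear in the specification. The `Coordinator` class is only an in-memory stand-in for state that the claims keep in Zookeeper, and the CRC32 hash is an assumption, since the claims do not name a hash function.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


def apply_rule(rule: Dict[str, str], row: Dict[str, object]) -> Dict[str, object]:
    """Build one index document: each output index field reads its mapped source column."""
    return {index_field: row.get(column) for index_field, column in rule.items()}


def route(data_id: str, services: List[str]) -> str:
    """Pick a service by hashing the unique data ID (CRC32 is an assumed choice)."""
    return services[zlib.crc32(data_id.encode("utf-8")) % len(services)]


@dataclass
class Coordinator:
    """Hypothetical in-memory stand-in for the switch and service states."""
    services: List[str]
    switch_open: bool = True
    busy: Set[str] = field(default_factory=set)

    def rescale(self, services: List[str]) -> None:
        # A service was added or removed: close the switch, then adopt the new
        # service list, which changes what route() returns (the re-hash).
        self.switch_open = False
        self.services = list(services)
        if not self.busy:
            self.switch_open = True  # nothing in flight, safe to resume at once

    def report_idle(self, service: str) -> None:
        # A service reports that it is in a non-data-processing state.
        self.busy.discard(service)
        if not self.busy:
            self.switch_open = True  # every service is idle: reopen the switch

    def dispatch(self, data_id: str) -> Optional[str]:
        # While the switch is closed the message is not handed to any service.
        if not self.switch_open:
            return None
        target = route(data_id, self.services)
        self.busy.add(target)
        return target
```

In this sketch, a call to `rescale` while any service is still busy leaves the switch closed, so `dispatch` returns `None` until the last busy service calls `report_idle`. A real deployment would rely on Kafka to retain the undelivered messages and on Zookeeper to hold the switch and the reported service states.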
CN201910911575.1A 2019-09-25 2019-09-25 Data heterogeneous method, device and equipment Active CN110674220B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910911575.1A CN110674220B (en) 2019-09-25 2019-09-25 Data heterogeneous method, device and equipment

Publications (2)

Publication Number Publication Date
CN110674220A CN110674220A (en) 2020-01-10
CN110674220B CN110674220B (en) 2022-09-09

Family

ID=69078949

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910911575.1A Active CN110674220B (en) 2019-09-25 2019-09-25 Data heterogeneous method, device and equipment

Country Status (1)

Country Link
CN (1) CN110674220B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111666344B (en) * 2020-06-19 2023-05-16 中信银行股份有限公司 Heterogeneous data synchronization method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105005618A (en) * 2015-07-21 2015-10-28 杭州合众数据技术有限公司 Data synchronization method and system among heterogeneous databases
CN107783975A (en) * 2016-08-24 2018-03-09 北京京东尚科信息技术有限公司 The method and apparatus of distributed data base synchronization process

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101693658B1 (en) * 2016-04-05 2017-01-06 주식회사 티맥스 소프트 Method, business processing server and data processing server for storing and searching transaction history data

Also Published As

Publication number Publication date
CN110674220A (en) 2020-01-10

Similar Documents

Publication Publication Date Title
CN108536761B (en) Report data query method and server
CN109493076B (en) Kafka message unique consumption method, system, server and storage medium
CN112019369A (en) Dynamic configuration management method and system under micro-service framework
US7360208B2 (en) Rolling upgrade of distributed software with automatic completion
US7734585B2 (en) Updateable fan-out replication with reconfigurable master association
US9104728B2 (en) Query language to traverse a path in a graph
CN102202087B (en) Method for identifying storage equipment and system thereof
US11481440B2 (en) System and method for processing metadata to determine an object sequence
CN107977396B (en) Method and device for updating data table of KeyValue database
JP2021518021A (en) Data processing methods, equipment and computer readable storage media
CN109522043B (en) Method and device for managing configuration data and storage medium
WO2021012868A1 (en) Transaction rollback method and apparatus, database, system, and computer storage medium
CN107040576A (en) Information-pushing method and device, communication system
CN112115012A (en) Transaction monitoring method, device and system for distributed database and storage medium
US10083070B2 (en) Log file reduction according to problem-space network topology
CN110008104A (en) A kind of management method of log information, system, equipment and storage medium
WO2016014333A1 (en) Distributing and processing streams over one or more networks for on-the-fly schema evolution
CN110674220B (en) Data heterogeneous method, device and equipment
CN113254271B (en) Data sequence recovery method, device, equipment and storage medium
CN110377298B (en) Distributed cluster upgrading method and distributed cluster
CN111078258B (en) Version upgrading method and device
CN106681914B (en) Television picture quality debugging method and device
CN112835887A (en) Database management method, database management device, computing equipment and storage medium
CN110287220B (en) Method and device for generating configuration reverse textualization
CN115426356A (en) Distributed timed task lock update control execution method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant