CN116089535A - Data synchronization method, device, equipment and storage medium - Google Patents

Data synchronization method, device, equipment and storage medium

Info

Publication number
CN116089535A
Authority
CN
China
Prior art keywords
data
field
hive
synchronized
scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310156089.XA
Other languages
Chinese (zh)
Inventor
冯洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202310156089.XA priority Critical patent/CN116089535A/en
Publication of CN116089535A publication Critical patent/CN116089535A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to data processing in the field of financial technology, and provides a data synchronization method, apparatus, device and storage medium. The method comprises: identifying a field to be synchronized based on a request scene; acquiring field information of the field to be synchronized from a Hive data warehouse; creating a local data table in a ClickHouse database based on the field information; creating a Hive engine table in the ClickHouse database based on the position information of the field to be synchronized in the Hive data warehouse; generating a query statement based on a traversal of the Hive engine table; and querying the Hive data warehouse based on the query statement to obtain query data, and writing the query data into the local data table, so that data synchronization can be realized efficiently and accurately. In addition, the invention relates to blockchain technology, and the query data may be stored in a blockchain.

Description

Data synchronization method, device, equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data synchronization method, apparatus, device, and storage medium.
Background
In the field of financial technology, as business becomes increasingly real-time, the requirements on data query performance keep rising, so that Presto in the current Hive data warehouse can no longer meet data analysis requirements.
To meet users' demand for data analysis, the ClickHouse database has emerged. However, how to connect the Hive data warehouse and the ClickHouse database more efficiently and accurately, so as to synchronize data in the Hive data warehouse to the ClickHouse database step by step, remains a technical problem to be solved.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a data synchronization method, apparatus, device, and storage medium that can solve the technical problem of efficiently and accurately synchronizing data in a Hive data warehouse to a ClickHouse database.
In one aspect, the present invention provides a data synchronization method applied to an electronic device, where the electronic device stores a Hive data warehouse and a ClickHouse database, and the data synchronization method includes:
identifying a field to be synchronized based on the request scene;
acquiring field information of the field to be synchronized from the Hive data warehouse;
creating a local data table in the ClickHouse database based on the field information;
creating a Hive engine table in the ClickHouse database based on the position information of the field to be synchronized in the Hive data warehouse;
generating a query statement based on a traversal of the Hive engine table;
and querying the Hive data warehouse based on the query statement to obtain query data, and writing the query data into the local data table.
According to a preferred embodiment of the present invention, the identifying the field to be synchronized based on the request scenario includes:
acquiring scene data quantity of a configuration scene and scene access frequency from the Hive data warehouse;
determining the configuration scene with the scene data volume larger than the preset data volume and the scene access frequency larger than the preset frequency as the request scene;
and determining a scene field corresponding to the request scene as the field to be synchronized.
According to a preferred embodiment of the present invention, the field information includes a field type, and the acquiring the field information of the field to be synchronized from the Hive data warehouse includes:
acquiring data formats of a plurality of preset data types;
acquiring any one piece of data information corresponding to the field to be synchronized from the Hive data warehouse;
identifying, from the plurality of data formats, a target format matching that piece of data information;
if a plurality of target formats exist, acquiring field data corresponding to the field to be synchronized from the Hive data warehouse based on a preset number threshold, and matching the field data against the plurality of target formats to obtain a matching degree for each target format;
and determining the preset data type corresponding to the target format with the highest matching degree as the field type.
According to a preferred embodiment of the present invention, the creating a local data table in the ClickHouse database based on the field information comprises:
creating an initial data table in the ClickHouse database based on the field number of the fields to be synchronized and the field type;
acquiring an identification code of the initial data table;
generating a target script according to the identification code and a preset local cache starting script;
and writing the target script into a configuration file of the ClickHouse database, and running the written configuration file to convert the initial data table into the local data table.
According to a preferred embodiment of the present invention, the creating a Hive engine table in the ClickHouse database based on the position information of the field to be synchronized in the Hive data warehouse comprises:
acquiring scene configuration information of the request scene;
acquiring a data table name and a database name corresponding to the field to be synchronized in the Hive data warehouse from the scene configuration information;
identifying the position information according to the data table name and the database name;
and filling the field to be synchronized, the position information, the data table name and the database name into a preset data table in the ClickHouse database to obtain the Hive engine table.
According to a preferred embodiment of the present invention, the querying the Hive data warehouse based on the query statement to obtain query data includes:
acquiring equipment residual resources of the electronic equipment and warehouse residual resources of the Hive data warehouse;
if the equipment residual resources and the warehouse residual resources are both larger than a preset resource threshold, counting the statement number of the query statement;
and calling the execution threads corresponding to the statement number, and running the query statement in the Hive data warehouse to obtain the query data.
According to a preferred embodiment of the present invention, said writing said query data into said local data table comprises:
locating a table path of the local data table in the ClickHouse database;
acquiring the pointer position of the field to be synchronized from the local data table;
and writing the query data on a space corresponding to the table path and the pointer position.
On the other hand, the invention also provides a data synchronization device which operates in an electronic device, wherein the electronic device stores a Hive data warehouse and a ClickHouse database, and the data synchronization device comprises:
the identifying unit is used for identifying a field to be synchronized based on the request scene;
an obtaining unit, configured to obtain field information of the field to be synchronized from the Hive data warehouse;
a creating unit, configured to create a local data table in the ClickHouse database based on the field information;
the creating unit is further configured to create a Hive engine table in the ClickHouse database based on the position information of the field to be synchronized in the Hive data warehouse;
a generating unit, configured to generate a query statement based on a traversal of the Hive engine table;
and a writing unit, configured to query the Hive data warehouse based on the query statement to obtain query data, and to write the query data into the local data table.
In another aspect, the present invention also proposes an electronic device, including:
a memory storing computer readable instructions; and
a processor executing the computer readable instructions stored in the memory to implement the data synchronization method.
In another aspect, the present invention also proposes a computer readable storage medium having stored therein computer readable instructions that are executed by a processor in an electronic device to implement the data synchronization method.
According to the above technical scheme, the field to be synchronized can be accurately identified through the request scene, and the local data table can be accurately created in the ClickHouse database based on the field information matched from the Hive data warehouse. Further, by generating the query statement from the Hive engine table constructed from the field to be synchronized, the corresponding query data can be accurately obtained and accurately stored in the local data table. Meanwhile, because the local data table is created in the ClickHouse database, the writing of the query data is accelerated and the synchronization efficiency is improved. By synchronizing the data in the Hive data warehouse to the ClickHouse database, the invention improves the timeliness and usage efficiency of the data, increases the value of the data, and reduces resource consumption. The invention is applicable to the field of financial technology; through this application, the efficiency with which relevant staff in the financial technology field retrieve and analyze data in the ClickHouse database can be improved, thereby promoting the development of smart cities.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of the data synchronization method of the present invention.
FIG. 2 is a schematic diagram of a local data table in the present invention.
FIG. 3 is a schematic diagram of the Hive engine table of the present invention.
FIG. 4 is a functional block diagram of a preferred embodiment of the data synchronization device of the present invention.
Fig. 5 is a schematic structural diagram of an electronic device according to a preferred embodiment of the present invention for implementing a data synchronization method.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
FIG. 1 is a flow chart of a preferred embodiment of the data synchronization method of the present invention. The order of the steps in the flowchart may be changed and some steps may be omitted according to various needs.
The data synchronization method can acquire and process related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
The data synchronization method is applied to one or more electronic devices, wherein the electronic devices are devices capable of automatically performing numerical calculation and/or information processing according to preset or stored computer readable instructions, and the hardware comprises, but is not limited to, microprocessors, application specific integrated circuits (Application Specific Integrated Circuit, ASICs), programmable gate arrays (Field-Programmable Gate Array, FPGAs), digital signal processors (Digital Signal Processor, DSPs), embedded devices and the like.
The electronic device may be any electronic product that can interact with a user in a human-computer manner, such as a personal computer, tablet computer, smart phone, personal digital assistant (Personal Digital Assistant, PDA), game console, interactive internet protocol television (Internet Protocol Television, IPTV), smart wearable device, etc.
The electronic device may comprise a network device and/or a user device. The network device includes, but is not limited to, a single network electronic device, a group of electronic devices made up of multiple network electronic devices, or a cloud based on cloud computing and made up of a large number of hosts or network electronic devices.
The network on which the electronic device is located includes, but is not limited to: the internet, wide area networks, metropolitan area networks, local area networks, virtual private networks (Virtual Private Network, VPN), etc.
The electronic device stores a Hive data warehouse and a ClickHouse database. The Hive data warehouse can be used for extracting, transforming and loading data. The Hive data warehouse provides SQL query functions, and can convert SQL statements into MapReduce tasks for execution. The ClickHouse database is an open-source column-oriented database mainly used for online analytical processing (OLAP) queries.
The data synchronization method is applied to the field of financial science and technology.
101, identifying a field to be synchronized based on a request scenario.
In at least one embodiment of the present invention, the request scenario refers to a configuration scenario in which the scenario data amount is greater than the preset data amount and the scenario access frequency is greater than the preset frequency, for example, the request scenario may be a man-hour statistics scenario.
Wherein the scene data amount refers to the amount of all data corresponding to the requested scene. The scene access frequency refers to the access condition of related staff in the financial science and technology field to the data of the request scene. The preset data amount and the preset frequency can be set according to actual requirements.
The field to be synchronized refers to a field that needs to be synchronized from the Hive data warehouse to the ClickHouse database, and includes all scene fields of the request scene. For example, the field to be synchronized includes, but is not limited to: user name, user working hours, etc.
In at least one embodiment of the present invention, the electronic device identifying the field to be synchronized based on the request scenario includes:
acquiring scene data quantity of a configuration scene and scene access frequency from the Hive data warehouse;
determining the configuration scene with the scene data volume larger than the preset data volume and the scene access frequency larger than the preset frequency as the request scene;
and determining a scene field corresponding to the request scene as the field to be synchronized.
The configuration scene refers to any of the scenes preconfigured in the Hive data warehouse; for example, the configuration scenes may further include sales statistics for commodity A, etc.
By combining the scene data amount and the scene access frequency, a configuration scene with more consumed resources can be determined as the request scene, so that data corresponding to the request scene is synchronized to the ClickHouse database, and the analysis efficiency of the data corresponding to the request scene can be improved.
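The selection in step 101 can be sketched as a simple threshold filter. The following Python sketch is illustrative only: the `Scene` record, the field names, the sample figures and the threshold values are all assumptions, and in practice the scene data amount and access frequency would be read from the Hive data warehouse.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Scene:
    """Hypothetical record describing one configuration scene."""
    name: str
    data_volume: int        # scene data amount, e.g. number of rows
    access_frequency: int   # scene access frequency, e.g. accesses per day
    fields: List[str]       # scene fields belonging to this scene


def select_request_scenes(scenes, preset_volume, preset_frequency):
    """Keep scenes whose data amount AND access frequency both exceed the presets."""
    return [
        s for s in scenes
        if s.data_volume > preset_volume and s.access_frequency > preset_frequency
    ]


def fields_to_synchronize(request_scenes):
    """Collect the scene fields of all request scenes, in order and without duplicates."""
    ordered = {}
    for scene in request_scenes:
        for field in scene.fields:
            ordered.setdefault(field, None)
    return list(ordered)


# Sample values are made up for illustration.
scenes = [
    Scene("working_hours_statistics", 5_000_000, 300, ["user_name", "user_working_hours"]),
    Scene("commodity_a_sales", 20_000, 5, ["commodity_id", "sales_amount"]),
]
request_scenes = select_request_scenes(scenes, preset_volume=1_000_000, preset_frequency=100)
sync_fields = fields_to_synchronize(request_scenes)
print(sync_fields)
```

Both conditions must hold, so a large but rarely accessed scene is not selected.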
102, acquiring field information of the field to be synchronized from the Hive data warehouse.
In at least one embodiment of the present invention, the field information includes a field type.
The field type refers to a type of data corresponding to the field to be synchronized, for example, the field type may include, but is not limited to: numerical value type, character string type, date and time type, and the like.
In at least one embodiment of the present invention, the electronic device acquiring the field information of the field to be synchronized from the Hive data warehouse includes:
acquiring data formats of a plurality of preset data types;
acquiring any one piece of data information corresponding to the field to be synchronized from the Hive data warehouse;
identifying, from the plurality of data formats, a target format matching that piece of data information;
if a plurality of target formats exist, acquiring field data corresponding to the field to be synchronized from the Hive data warehouse based on a preset number threshold, and matching the field data against the plurality of target formats to obtain a matching degree for each target format;
and determining the preset data type corresponding to the target format with the highest matching degree as the field type.
Wherein the plurality of preset data types may include, but are not limited to: numerical value type, character string type, date and time type, and the like.
The preset number threshold may be set according to actual requirements, for example, the preset number threshold may be 10.
The field data refers to data corresponding to the field to be synchronized in the Hive data warehouse, and the number of the field data is equal to the preset number threshold.
The matching degree refers to the ratio of the number of field data items that match a given target format to the preset number threshold.
Because only one piece of data information is matched against the plurality of data formats, rather than all data corresponding to the field to be synchronized, the target format can be identified quickly. When a plurality of target formats exist, matching the field data, whose number equals the preset number threshold, against those target formats allows the field type to be determined accurately while keeping the determination efficient.
In other embodiments, if the target format is only a single one, the electronic device directly determines the preset data type corresponding to the target format as the field type.
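The two-stage type identification of step 102 can be sketched as follows. This is a minimal Python sketch under stated assumptions: the regular expressions standing in for the "data formats", the type names, and the tie-breaking rule (formats listed from most to least specific, with ties going to the earlier one) are illustrative choices that the text does not specify.

```python
import re

# Assumed data formats for the preset data types, listed from most to least specific.
DATA_FORMATS = {
    "numeric": re.compile(r"[+-]?\d+(\.\d+)?"),
    "datetime": re.compile(r"\d{4}-\d{2}-\d{2}( \d{2}:\d{2}:\d{2})?"),
    "string": re.compile(r".*", re.DOTALL),
}


def infer_field_type(values, preset_number=10):
    """Infer a field type from sample values (values must be a non-empty list of str)."""
    # Stage 1: match a single piece of data against every format.
    first = values[0]
    targets = [t for t, fmt in DATA_FORMATS.items() if fmt.fullmatch(first)]
    if len(targets) == 1:
        return targets[0]
    # Stage 2: several target formats matched, so score each on a bounded sample.
    sample = values[:preset_number]
    degrees = {
        t: sum(1 for v in sample if DATA_FORMATS[t].fullmatch(v)) / len(sample)
        for t in targets
    }
    # max() returns the first key with the highest degree, so ties favour
    # the more specific format.
    return max(degrees, key=degrees.get)


print(infer_field_type(["12", "7.5", "3"]))
print(infer_field_type(["12", "7.5", "n/a", "3"]))
print(infer_field_type(["2023-02-10", "2023-02-11"]))
```

In this sketch a single non-numeric value such as "n/a" lowers the numeric matching degree below that of the string format, so the field falls back to the string type.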
In at least one embodiment of the present invention, the field information may further include, but is not limited to: field precision, field length, primary key, etc. The field precision, the field length and the primary key can be obtained from field configuration information corresponding to the field to be synchronized in the Hive data warehouse.
The field precision refers to the precision of the data corresponding to the field to be synchronized, for example, the field precision may be 0.001 or the like.
The field length refers to a byte length occupied by data corresponding to the field to be synchronized, and for example, the field length may be 4KB or the like.
The primary key can uniquely identify the field to be synchronized in the Hive data warehouse.
103, creating a local data table in the ClickHouse database based on the field information.
In at least one embodiment of the present invention, the local data table has a local caching function, which further improves synchronization efficiency. FIG. 2 is a schematic diagram of a local data table in the present invention. FIG. 2 includes N fields to be synchronized: user name, user identification code, user working time, field to be synchronized N, and so on. FIG. 2 also includes the field types of the N fields to be synchronized; for example, the field type of the user name is the character string type, the field type of the user identification code is the character string type, the field type of the user working time is the numerical value type, and the field type of field to be synchronized N is the date and time type.
In at least one embodiment of the invention, the electronic device creating a local data table in the ClickHouse database based on the field information includes:
creating an initial data table in the ClickHouse database based on the field number of the fields to be synchronized and the field type;
acquiring an identification code of the initial data table;
generating a target script according to the identification code and a preset local cache starting script;
and writing the target script into a configuration file of the ClickHouse database, and running the written configuration file to convert the initial data table into the local data table.
Wherein the number of rows or columns in the initial data table is equal to the number of fields. The initial data table does not have a local caching function.
The identification code can uniquely identify the initial data table.
The preset local cache starting script is used for starting the local cache function of the data table.
Relevant configuration information of the ClickHouse database is stored in the configuration file, for example, a local cache starting function is stored in the configuration file.
Through the number of the fields, an initial data table can be reasonably created in the ClickHouse database, and then the identification code and the preset local cache starting script are combined, so that a target script can be quickly generated, and further, the initial data table can be quickly converted into the local data table based on the target script.
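Creating the initial data table from the field number and field types can be sketched by generating a ClickHouse `CREATE TABLE` statement. In the Python sketch below, the table name, column names, the mapping from field types to ClickHouse column types, and the choice of the `MergeTree` engine with `user_id` as sorting key are all assumptions made for illustration; the local cache starting script and the configuration file step are not modelled, since the text does not give their contents.

```python
# Assumed mapping from the field types named in the text to ClickHouse column types.
TYPE_MAP = {"string": "String", "numeric": "Float64", "datetime": "DateTime"}


def build_create_table(table_name, fields, order_by):
    """Return a CREATE TABLE statement with one column per field to be synchronized."""
    columns = ",\n".join(
        f"    `{name}` {TYPE_MAP[field_type]}" for name, field_type in fields
    )
    return (
        f"CREATE TABLE IF NOT EXISTS {table_name}\n"
        f"(\n{columns}\n)\n"
        f"ENGINE = MergeTree()\n"
        f"ORDER BY `{order_by}`"
    )


# Hypothetical fields mirroring the example of FIG. 2.
fields = [
    ("user_name", "string"),
    ("user_id", "string"),
    ("user_working_hours", "numeric"),
    ("record_time", "datetime"),
]
ddl = build_create_table("working_hours_local", fields, order_by="user_id")
print(ddl)
```

The number of generated columns equals the number of fields, which matches the statement that the number of columns in the initial data table equals the field number.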
104, creating a Hive engine table in the ClickHouse database based on the position information of the fields to be synchronized in the Hive data warehouse.
In at least one embodiment of the present invention, the location information includes address information of a data table in which the field to be synchronized is located.
The Hive engine table stores the correspondence among the field to be synchronized, the position information, the data table name and the database name. FIG. 3 is a schematic diagram of the Hive engine table of the present invention. FIG. 3 includes N fields to be synchronized: user name, user identification code, user working time, field to be synchronized N, and so on. For the user name, the position information is path A, the data table name is table 1, and the database name is database X; for the user identification code, the position information is path B, the data table name is table 3, and the database name is database X; for the user working time, the position information is path C, the data table name is table 4, and the database name is database Y; for field to be synchronized N, the position information is path A, the data table name is table 2, and the database name is database X, and so on.
In at least one embodiment of the present invention, the electronic device creating a Hive engine table in the ClickHouse database based on the position information of the field to be synchronized in the Hive data warehouse comprises:
acquiring scene configuration information of the request scene;
acquiring a data table name and a database name corresponding to the field to be synchronized in the Hive data warehouse from the scene configuration information;
identifying the position information according to the data table name and the database name;
and filling the field to be synchronized, the position information, the data table name and the database name into a preset data table in the ClickHouse database to obtain the Hive engine table.
The scene configuration information includes all information related to the requested scene, for example, the scene configuration information includes a data table name and a database name of the field to be synchronized.
The preset data table refers to a data table which is pre-configured in the ClickHouse database.
The data table names and the database names can be obtained rapidly and comprehensively through the scene configuration information, and further the data table names and the database names can be used for identifying the position information, so that the query of the data in the Hive data warehouse can be assisted, and the field to be synchronized, the position information, the data table names and the database names are filled into the preset data table.
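Filling the preset data table in step 104 can be sketched as building one row per field to be synchronized. The Python sketch below is illustrative: the `EngineRow` record and the shape of the scene configuration are hypothetical, and deriving the position information as `<warehouse root>/<database>.db/<table>` assumes a common Hive warehouse directory layout, which the text does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineRow:
    """Hypothetical row of the Hive engine table described in the text."""
    field: str
    location: str
    table: str
    database: str


def build_engine_rows(scene_config, warehouse_root="/user/hive/warehouse"):
    """Build one row per field from a mapping of field -> (database name, table name)."""
    rows = []
    for field, (database, table) in scene_config.items():
        # Assumed layout: position information derived from database and table names.
        location = f"{warehouse_root}/{database}.db/{table}"
        rows.append(EngineRow(field, location, table, database))
    return rows


# Hypothetical scene configuration for two fields.
scene_config = {
    "user_name": ("database_x", "table_1"),
    "user_working_hours": ("database_y", "table_4"),
}
engine_rows = build_engine_rows(scene_config)
for row in engine_rows:
    print(row)
```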
105, generating a query statement based on the traversal of the Hive engine table.
In at least one embodiment of the present invention, the query statement includes the field to be synchronized, the location information, the data table name, and the database name stored in the Hive engine table. For example, the query statement may be: SELECT id, name FROM (table name and database name) WHERE.
In at least one embodiment of the present invention, the electronic device generating a query statement based on a traversal of the Hive engine table comprises:
traversing the Hive engine table in sequence, and generating a plurality of query statements according to the traversal information obtained in each traversal.
By traversing the Hive engine table, the corresponding query statements can be generated in sequence.
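The traversal of step 105 can be sketched as a loop that emits one statement per row of the Hive engine table. In this Python sketch, the row layout (a dict with `field`, `location`, `table` and `database` keys) and the sample names are assumptions; the filter condition of the example statement is not specified in the text, so the sketch generates statements without a WHERE clause.

```python
def generate_query_statements(engine_rows):
    """Traverse the rows in order and emit one SELECT statement per row."""
    statements = []
    for row in engine_rows:
        statements.append(
            f"SELECT `{row['field']}` FROM `{row['database']}`.`{row['table']}`"
        )
    return statements


# Hypothetical rows mirroring the example of FIG. 3.
engine_rows = [
    {"field": "user_name", "location": "path_a", "table": "table_1", "database": "database_x"},
    {"field": "user_working_hours", "location": "path_c", "table": "table_4", "database": "database_y"},
]
statements = generate_query_statements(engine_rows)
for statement in statements:
    print(statement)
```

With one statement per row, the number of statements equals the number of fields to be synchronized.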
106, querying the Hive data warehouse based on the query statement to obtain query data, and writing the query data into the local data table.
It is emphasized that to further ensure the privacy and security of the query data, the query data may also be stored in a blockchain node.
In at least one embodiment of the present invention, the query data refers to the data corresponding to the field to be synchronized in the Hive data warehouse.
In at least one embodiment of the present invention, the electronic device querying the Hive data warehouse based on the query statement to obtain query data includes:
acquiring equipment residual resources of the electronic equipment and warehouse residual resources of the Hive data warehouse;
if the equipment residual resources and the warehouse residual resources are both larger than a preset resource threshold, counting the statement number of the query statement;
and calling the execution threads corresponding to the statement number, and running the query statement in the Hive data warehouse to obtain the query data.
The device residual resources may refer to a device idle processing thread, a residual capacity, and the like of the electronic device.
The warehouse remaining resources may refer to the remaining capacity of the Hive data warehouse, etc.
The preset resource threshold can be set according to actual requirements.
The number of statements is typically equal to the number of fields.
With this embodiment, when both the device remaining resources and the warehouse remaining resources are greater than the preset resource threshold, a plurality of execution threads can be invoked to query in parallel, which improves the efficiency of acquiring the query data.
In other embodiments, if the device remaining resources or the warehouse remaining resources are less than or equal to the preset resource threshold, the electronic device invokes an idle processing thread and runs the query statement in the Hive data warehouse to obtain the query data.
With this embodiment, when the device remaining resources or the warehouse remaining resources are less than or equal to the preset resource threshold, the query data can still be obtained without over-occupying the resources of the electronic device and the Hive data warehouse.
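The resource check and thread selection described above can be sketched with a thread pool. In the Python sketch below, the resource figures, the threshold and the `execute` callable that stands in for running one statement in the Hive data warehouse are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def choose_worker_count(statement_count, device_free, warehouse_free, threshold):
    """One thread per statement when both resources exceed the threshold, else one thread."""
    if device_free > threshold and warehouse_free > threshold:
        return max(1, statement_count)
    return 1


def run_queries(statements, execute, device_free, warehouse_free, threshold):
    """Run every statement through `execute` and return results in statement order."""
    workers = choose_worker_count(len(statements), device_free, warehouse_free, threshold)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(execute, statements))


def fake_execute(statement):
    """Stand-in for a real Hive query; returns a label instead of rows."""
    return f"rows for: {statement}"


statements = ["SELECT a FROM t1", "SELECT b FROM t2", "SELECT c FROM t3"]
results = run_queries(statements, fake_execute, device_free=80, warehouse_free=60, threshold=50)
print(results)
```

Because `ThreadPoolExecutor.map` returns results in input order, each result can be paired with its statement regardless of which thread finished first.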
In at least one embodiment of the invention, the electronic device writing the query data to the local data table includes:
locating a table path of the local data table in the ClickHouse database;
acquiring the pointer position of the field to be synchronized from the local data table;
and writing the query data on a space corresponding to the table path and the pointer position.
Through the table path and the pointer position, the query data can be written to the corresponding space in sequence, which avoids discontinuous storage in the local data table.
According to the above technical solution, the field to be synchronized can be accurately identified through the request scene, and the local data table can be accurately created in the ClickHouse database based on the field information matched from the Hive data warehouse. Further, by generating the query statement from the Hive engine table constructed for the field to be synchronized, the corresponding query data can be accurately obtained and accurately stored in the local data table. Meanwhile, because the local data table is created in the ClickHouse database, writing of the query data is accelerated and the synchronization efficiency is improved. By synchronizing the data in the Hive data warehouse to the ClickHouse database, the present application can improve the timeliness and usage efficiency of the data, increase the value of the data, and reduce resource consumption. The present application is applicable to the financial technology field, where it can improve the efficiency with which relevant staff call and analyze data in the ClickHouse database, thereby promoting the development of smart cities.
FIG. 4 is a functional block diagram of a preferred embodiment of the data synchronization device of the present invention. The data synchronizing device 11 includes an identification unit 110, an acquisition unit 111, a creation unit 112, a generation unit 113, and a writing unit 114. The module/unit referred to herein is a series of computer readable instructions capable of being retrieved by the processor 13 and performing a fixed function and stored in the memory 12. In the present embodiment, the functions of the respective modules/units will be described in detail in the following embodiments.
The data synchronization device operates on an electronic device, and the electronic device stores a Hive data warehouse and a ClickHouse database.
An identifying unit 110, configured to identify a field to be synchronized based on a request scenario;
an obtaining unit 111, configured to obtain field information of the field to be synchronized from the Hive data warehouse;
a creating unit 112, configured to create a local data table in the ClickHouse database based on the field information;
the creating unit 112 is further configured to create a Hive engine table in the ClickHouse database based on the location information of the field to be synchronized in the Hive data warehouse;
a generating unit 113, configured to generate a query statement based on the traversal of the Hive engine table;
and the writing unit 114 is configured to query the Hive data warehouse based on the query statement, obtain query data, and write the query data into the local data table.
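As a rough illustration of how a generating unit might traverse the Hive engine table, the sketch below emits one SELECT statement per field entry. The row layout (keys `field`, `database`, `table`) and the function name are assumptions made for this example; the patent does not specify the statement text.

```python
from typing import Dict, Iterable, List


def build_query_statements(engine_rows: Iterable[Dict[str, str]]) -> List[str]:
    """Emit one query statement per field entry found during traversal."""
    statements = []
    for row in engine_rows:
        # Hypothetical row layout: field name plus its Hive database and table.
        statements.append(
            f"SELECT `{row['field']}` FROM `{row['database']}`.`{row['table']}`"
        )
    return statements
```

With one entry per field, the number of generated statements equals the number of fields to be synchronized.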
In at least one embodiment of the present invention, the identifying unit 110 is further configured to:
acquiring the scene data volume and the scene access frequency of each configuration scene from the Hive data warehouse;
determining a configuration scene whose scene data volume is greater than a preset data volume and whose scene access frequency is greater than a preset frequency as the request scene;
and determining a scene field corresponding to the request scene as the field to be synchronized.
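The selection rule above can be sketched in a few lines. The `Scene` record and its attribute names are hypothetical; the patent only states that both the data volume and the access frequency must exceed their preset values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Scene:
    # Hypothetical record for one configuration scene.
    name: str
    data_volume: int
    access_frequency: float
    fields: List[str]


def select_request_scenes(scenes: List[Scene], min_volume: int,
                          min_frequency: float) -> List[Scene]:
    """Keep scenes that strictly exceed both preset values."""
    return [s for s in scenes
            if s.data_volume > min_volume and s.access_frequency > min_frequency]


def fields_to_sync(scenes: List[Scene], min_volume: int,
                   min_frequency: float) -> List[str]:
    """Collect the fields of the selected scenes, without duplicates, in order."""
    seen = {}
    for scene in select_request_scenes(scenes, min_volume, min_frequency):
        for field in scene.fields:
            seen.setdefault(field, None)
    return list(seen)
```

Both comparisons are strict, matching the "larger than" wording, so a scene that exactly equals a preset value is not selected.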
In at least one embodiment of the present invention, the field information includes a field type, and the obtaining unit 111 is further configured to:
acquiring data formats of a plurality of preset data types;
acquiring any one piece of data information corresponding to the field to be synchronized from the Hive data warehouse;
identifying, from the plurality of data formats, a target format that matches the piece of data information;
if a plurality of target formats exist, acquiring field data corresponding to the field to be synchronized from the Hive data warehouse based on a preset quantity threshold, and matching the field data against the plurality of target formats to obtain a matching degree for each target format;
and determining the preset data type corresponding to the target format with the highest matching degree as the field type.
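The type-identification steps above might look like the following sketch. The four formats, their regular expressions, and the tie-breaking rule (the first, most specific type wins on equal matching degree) are assumptions chosen for illustration, not details given in the patent.

```python
import re
from typing import List

# Hypothetical preset data types and their formats, most specific first.
FORMATS = {
    "Int64": re.compile(r"-?\d+"),
    "Float64": re.compile(r"-?\d+(\.\d+)?"),
    "Date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "String": re.compile(r".*", re.S),
}


def infer_field_type(values: List[str], sample_size: int) -> str:
    """Pick a type from one value; on ambiguity, score a larger sample."""
    if not values:
        raise ValueError("at least one value is required")
    # Step 1: formats matched by any one piece of data (here, the first value).
    targets = [t for t, p in FORMATS.items() if p.fullmatch(values[0])]
    if len(targets) == 1:
        return targets[0]
    # Step 2: matching degree = share of sampled values that fit each format.
    sample = values[:max(1, sample_size)]
    degree = {t: sum(1 for v in sample if FORMATS[t].fullmatch(v)) / len(sample)
              for t in targets}
    # max() returns the first key with the highest degree, so ties favour
    # the more specific type listed earlier in FORMATS.
    return max(targets, key=lambda t: degree[t])
```

For example, a first value of `"1"` matches three formats, and sampling further values such as `"3.5"` is what separates an integer column from a floating-point one.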
In at least one embodiment of the present invention, the creating unit 112 creating a local data table in the ClickHouse database based on the field information includes:
creating an initial data table in the ClickHouse database based on the field number of the fields to be synchronized and the field type;
acquiring an identification code of the initial data table;
generating a target script according to the identification code and a preset local cache starting script;
and writing the target script into a configuration file of the ClickHouse database, and running the written configuration file to convert the initial data table into the local data table.
In at least one embodiment of the present invention, the creating unit 112 is further configured to:
acquiring scene configuration information of the request scene;
acquiring a data table name and a database name corresponding to the field to be synchronized in the Hive data warehouse from the scene configuration information;
identifying the position information according to the data table name and the database name;
and filling the field to be synchronized, the position information, the data table name and the database name into a preset data table in the ClickHouse database to obtain the Hive engine table.
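For orientation, ClickHouse documents a Hive table engine declared as `ENGINE = Hive('thrift://host:port', 'database', 'table')`. The sketch below builds such a statement from the database name, the data table name, and a metastore address. Treating the "position information" as the Hive metastore URI is an assumption of this example, as are the function and parameter names; column types and the URI are taken as trusted configuration and are not validated.

```python
import re
from typing import List, Tuple

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check(name: str) -> str:
    """Reject names that are not plain identifiers before embedding them in SQL."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def hive_engine_ddl(ch_table: str, columns: List[Tuple[str, str]],
                    metastore_uri: str, hive_db: str, hive_table: str) -> str:
    """Build a CREATE TABLE statement that uses the ClickHouse Hive engine."""
    cols = ",\n".join(f"    {_check(name)} {ctype}" for name, ctype in columns)
    return (
        f"CREATE TABLE IF NOT EXISTS {_check(ch_table)}\n"
        f"(\n{cols}\n)\n"
        f"ENGINE = Hive('{metastore_uri}', '{_check(hive_db)}', '{_check(hive_table)}')"
    )
```

A partitioned Hive table would, as far as the ClickHouse documentation indicates, also need a `PARTITION BY` clause, which this sketch leaves out.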
In at least one embodiment of the present invention, the writing unit 114 is further configured to:
acquiring device remaining resources of the electronic device and warehouse remaining resources of the Hive data warehouse;
if the device remaining resources and the warehouse remaining resources are both greater than a preset resource threshold, counting the statement number of the query statements;
and calling a number of execution threads corresponding to the statement number, and running the query statements in the Hive data warehouse to obtain the query data.
In at least one embodiment of the present invention, the writing unit 114 is further configured to:
locating a table path of the local data table in the ClickHouse database;
acquiring the pointer position of the field to be synchronized from the local data table;
and writing the query data on a space corresponding to the table path and the pointer position.
According to the above technical solution, the field to be synchronized can be accurately identified through the request scene, and the local data table can be accurately created in the ClickHouse database based on the field information matched from the Hive data warehouse. Further, by generating the query statement from the Hive engine table constructed for the field to be synchronized, the corresponding query data can be accurately obtained and accurately stored in the local data table. Meanwhile, because the local data table is created in the ClickHouse database, writing of the query data is accelerated and the synchronization efficiency is improved. By synchronizing the data in the Hive data warehouse to the ClickHouse database, the present application can improve the timeliness and usage efficiency of the data, increase the value of the data, and reduce resource consumption. The present application is applicable to the financial technology field, where it can improve the efficiency with which relevant staff call and analyze data in the ClickHouse database, thereby promoting the development of smart cities.
Fig. 5 is a schematic structural diagram of an electronic device according to a preferred embodiment of the present invention for implementing the data synchronization method.
In one embodiment of the invention, the electronic device 1 includes, but is not limited to, a memory 12, a processor 13, and computer readable instructions, such as a data synchronization program, stored in the memory 12 and executable on the processor 13.
It will be appreciated by those skilled in the art that the schematic diagram is merely an example of the electronic device 1 and does not constitute a limitation of the electronic device 1, which may include more or fewer components than illustrated, may combine certain components, or may have different components; for example, the electronic device 1 may further include input-output devices, network access devices, buses, etc.
The processor 13 may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. The general purpose processor may be a microprocessor or the processor may be any conventional processor, etc., and the processor 13 is an operation core and a control center of the electronic device 1, connects various parts of the entire electronic device 1 using various interfaces and lines, and executes an operating system of the electronic device 1 and various installed applications, program codes, etc.
Illustratively, the computer readable instructions may be partitioned into one or more modules/units that are stored in the memory 12 and executed by the processor 13 to complete the present invention. The one or more modules/units may be a series of computer readable instructions capable of performing a specific function, the computer readable instructions describing a process of executing the computer readable instructions in the electronic device 1. For example, the computer-readable instructions may be divided into an identification unit 110, an acquisition unit 111, a creation unit 112, a generation unit 113, and a writing unit 114.
The memory 12 may be used to store the computer readable instructions and/or modules, and the processor 13 implements various functions of the electronic device 1 by running or executing the computer readable instructions and/or modules stored in the memory 12 and invoking data stored in the memory 12. The memory 12 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to the use of the electronic device, etc. The memory 12 may include non-volatile and volatile memory, such as: a hard disk, internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another storage device.
The memory 12 may be an external memory and/or an internal memory of the electronic device 1. Further, the memory 12 may be a physical memory, such as a memory module, a TF card (Trans-flash Card), or the like.
The integrated modules/units of the electronic device 1 may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the present invention may implement all or part of the processes in the methods of the above embodiments by instructing the relevant hardware through computer readable instructions, which may be stored in a computer readable storage medium; when executed by a processor, the computer readable instructions implement the steps of the respective method embodiments described above.
The computer readable instructions comprise computer readable instruction code, which may be in the form of source code, object code, an executable file, some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying the computer readable instruction code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks generated in association using cryptographic methods, each of which contains information about a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
With reference to FIG. 1, the memory 12 in the electronic device 1 stores computer readable instructions for implementing a data synchronization method, and the processor 13 can execute the computer readable instructions to implement:
identifying a field to be synchronized based on the request scene;
acquiring field information of the field to be synchronized from a Hive data warehouse;
creating a local data table in a ClickHouse database based on the field information;
creating a Hive engine table in the ClickHouse database based on the position information of the fields to be synchronized in the Hive data warehouse;
generating a query statement based on the traversal of the Hive engine table;
and querying the Hive data warehouse based on the query statement to obtain query data, and writing the query data into the local data table.
Specifically, for how the processor 13 implements the computer readable instructions, refer to the description of the relevant steps in the embodiment corresponding to FIG. 1, which is not repeated here.
In the several embodiments provided in the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; the division of the modules is only a division by logical function, and other divisions are possible in actual implementation.
The computer readable storage medium has stored thereon computer readable instructions, wherein the computer readable instructions when executed by the processor 13 are configured to implement the steps of:
identifying a field to be synchronized based on the request scene;
acquiring field information of the field to be synchronized from a Hive data warehouse;
creating a local data table in a ClickHouse database based on the field information;
creating a Hive engine table in the ClickHouse database based on the position information of the fields to be synchronized in the Hive data warehouse;
generating a query statement based on the traversal of the Hive engine table;
and querying the Hive data warehouse based on the query statement to obtain query data, and writing the query data into the local data table.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated in one processing unit, each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. Multiple units or devices may also be implemented by a single unit or device through software or hardware. The terms first, second, etc. are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims (10)

1. A data synchronization method, applied to an electronic device that stores a Hive data warehouse and a ClickHouse database, characterized in that the data synchronization method comprises:
identifying a field to be synchronized based on the request scene;
acquiring field information of the field to be synchronized from the Hive data warehouse;
creating a local data table in the ClickHouse database based on the field information;
creating a Hive engine table in the ClickHouse database based on the position information of the fields to be synchronized in the Hive data warehouse;
generating a query statement based on the traversal of the Hive engine table;
and querying the Hive data warehouse based on the query statement to obtain query data, and writing the query data into the local data table.
2. The data synchronization method of claim 1, wherein the identifying a field to be synchronized based on a request scenario comprises:
acquiring scene data quantity of a configuration scene and scene access frequency from the Hive data warehouse;
determining the configuration scene with the scene data volume larger than the preset data volume and the scene access frequency larger than the preset frequency as the request scene;
and determining a scene field corresponding to the request scene as the field to be synchronized.
3. The data synchronization method of claim 1, wherein the field information comprises a field type, and wherein the obtaining the field information of the field to be synchronized from the Hive data warehouse comprises:
acquiring data formats of a plurality of preset data types;
acquiring any one piece of data information corresponding to the field to be synchronized from the Hive data warehouse;
identifying, from the plurality of data formats, a target format that matches the piece of data information;
if a plurality of target formats exist, acquiring field data corresponding to the field to be synchronized from the Hive data warehouse based on a preset quantity threshold, and matching the field data against the plurality of target formats to obtain a matching degree for each target format;
and determining the preset data type corresponding to the target format with the largest matching degree as the field type.
4. The data synchronization method of claim 3, wherein the creating a local data table in the ClickHouse database based on the field information comprises:
creating an initial data table in the ClickHouse database based on the field number of the fields to be synchronized and the field type;
acquiring an identification code of the initial data table;
generating a target script according to the identification code and a preset local cache starting script;
and writing the target script into a configuration file of the ClickHouse database, and running the written configuration file to convert the initial data table into the local data table.
5. The data synchronization method of claim 1, wherein the creating a Hive engine table in the ClickHouse database based on the location information of the field to be synchronized in the Hive data warehouse comprises:
acquiring scene configuration information of the request scene;
acquiring a data table name and a database name corresponding to the field to be synchronized in the Hive data warehouse from the scene configuration information;
identifying the position information according to the data table name and the database name;
and filling the field to be synchronized, the position information, the data table name and the database name into a preset data table in the ClickHouse database to obtain the Hive engine table.
6. The data synchronization method of claim 1, wherein the querying the Hive data warehouse based on the query statement to obtain query data comprises:
acquiring device remaining resources of the electronic device and warehouse remaining resources of the Hive data warehouse;
if the device remaining resources and the warehouse remaining resources are both greater than a preset resource threshold, counting the statement number of the query statements;
and calling a number of execution threads corresponding to the statement number, and running the query statements in the Hive data warehouse to obtain the query data.
7. The data synchronization method of claim 1, wherein the writing the query data to the local data table comprises:
locating a table path of the local data table in the ClickHouse database;
acquiring the pointer position of the field to be synchronized from the local data table;
and writing the query data on a space corresponding to the table path and the pointer position.
8. A data synchronization device operable on an electronic device, the electronic device storing a Hive data warehouse and a ClickHouse database, the data synchronization device comprising:
an identifying unit, configured to identify a field to be synchronized based on a request scene;
an obtaining unit, configured to obtain field information of the field to be synchronized from the Hive data warehouse;
a creating unit, configured to create a local data table in the ClickHouse database based on the field information;
the creating unit being further configured to create a Hive engine table in the ClickHouse database based on the location information of the field to be synchronized in the Hive data warehouse;
a generating unit, configured to generate a query statement based on the traversal of the Hive engine table;
and a writing unit, configured to query the Hive data warehouse based on the query statement to obtain query data, and write the query data into the local data table.
9. An electronic device, the electronic device comprising:
a memory storing computer readable instructions; and
a processor executing the computer readable instructions stored in the memory to implement the data synchronization method of any one of claims 1 to 7.
10. A computer-readable storage medium, characterized by: stored in the computer readable storage medium are computer readable instructions that are executed by a processor in an electronic device to implement the data synchronization method of any one of claims 1 to 7.
CN202310156089.XA 2023-02-15 2023-02-15 Data synchronization method, device, equipment and storage medium Pending CN116089535A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310156089.XA CN116089535A (en) 2023-02-15 2023-02-15 Data synchronization method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116089535A true CN116089535A (en) 2023-05-09

Family

ID=86202561

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310156089.XA Pending CN116089535A (en) 2023-02-15 2023-02-15 Data synchronization method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116089535A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117331513A (en) * 2023-12-01 2024-01-02 蒲惠智造科技股份有限公司 Data reduction method and system based on Hadoop architecture
CN117331513B (en) * 2023-12-01 2024-03-19 蒲惠智造科技股份有限公司 Data reduction method and system based on Hadoop architecture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination