WO2021253688A1 - Data synchronization method and apparatus, and data query method and apparatus - Google Patents

Data synchronization method and apparatus, and data query method and apparatus

Info

Publication number
WO2021253688A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
database
index
synchronized
synchronization
Prior art date
Application number
PCT/CN2020/119711
Other languages
French (fr)
Chinese (zh)
Inventor
杨飞
曹素杰
Original Assignee
北京旷视科技有限公司
Application filed by 北京旷视科技有限公司
Publication of WO2021253688A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution

Definitions

  • the present disclosure generally relates to the field of database storage, and specifically relates to a data synchronization method, a data synchronization device, a data query method, a data query device, an electronic device, and a computer-readable storage medium.
  • the base database data needs to be synchronized to the retrieval engine to perform attribute retrieval.
  • the retrieval engine is configured with a certain number of shards. Each shard corresponds to an index, and each index corresponds to one database table written for a database; that is, an independent index is established for the database table of each database. In this case, if the number of databases to be synchronized is large, a large amount of shard resources is required. If there are not enough shards, no new index can be built, so the data of some databases cannot be synchronized or searched.
  • the first aspect of the present disclosure provides a data synchronization method, wherein the method includes: acquiring data to be synchronized, the data to be synchronized including data in the database tables of one or more first databases; creating a first index in the target database, and synchronizing the data of all the database tables of the first databases corresponding to the data to be synchronized to the first index in turn; and, if the current first index satisfies the index rolling strategy, creating a new first index and continuing to synchronize the data that has not yet been synchronized to the new first index.
  • the current first index satisfies the index rolling strategy, including at least one of the following: the storage space occupied by the current first index reaches the storage threshold; and the amount of data synchronized to the current first index reaches the capacity threshold.
  • the current first index satisfies the index rolling strategy, which further includes: the time for data synchronization of the current first index reaches the time threshold.
  • the data to be synchronized further includes data in the database tables of one or more second databases, wherein the amount of data in a second database is greater than the amount of data in a first database; the method further includes: establishing, in the target database, one or more second indexes in one-to-one correspondence with the database tables of the second databases; and synchronizing the content of the database table of each second database to its corresponding second index.
  • the method further includes: determining the type of index to create based on the table name of the database table, wherein the table name of the database table of a first database includes a first identifier, and the table name of the database table of a second database includes a second identifier; if the table name of the database table contains the first identifier, the first index is created, and the data of all the database tables of the first databases corresponding to the data to be synchronized is synchronized to the first index in sequence; if the table name of the database table contains the second identifier, the step of establishing one or more second indexes in one-to-one correspondence with the database tables of the second databases is executed.
  • the method further includes: in response to a data deletion request for deleting any first database, determining the one or more first indexes corresponding to the first database to be deleted; and deleting, from the one or more first indexes, the data corresponding to the first database to be deleted.
  • the method further includes: in response to a data deletion request for deleting any second database, deleting a second index corresponding to the second database to be deleted.
  • the method further includes: when an operation on first data in the database table of any first database is detected, determining, based on the first data, the first index corresponding to the first data among all the first indexes; and, according to the operation, synchronously correcting the data in the first index corresponding to the first data; wherein the operation includes adding data, modifying data, or deleting data.
  • the method further includes: when an operation on second data in the database table of any second database is detected, synchronously revising, according to the operation, the data corresponding to the second data in the second index corresponding to the database table of that second database.
  • a second aspect of the present disclosure provides a data synchronization device, including: a data acquisition module for acquiring data to be synchronized, the data to be synchronized including data in the database tables of one or more first databases; and a data synchronization module for creating a first index in the target database and synchronizing the data of all the database tables of the first databases corresponding to the data to be synchronized to the first index in turn, and, if the current first index satisfies the index rolling strategy, creating a new first index and continuing to synchronize the unsynchronized data to the new first index.
  • a third aspect of the present disclosure provides a data query method.
  • the method includes: obtaining query information of the data to be queried; querying synchronization data corresponding to the data to be queried in the index of the target database based on the query information; and determining the location of the data to be queried in its corresponding database based on the synchronization data; wherein the data to be synchronized in the database is synchronized to the index of the target database by the data synchronization method according to any one of claims 1-8.
  • a fourth aspect of the present disclosure provides a data query device including: a receiving module for obtaining query information of the data to be queried; a search module for querying synchronization data corresponding to the data to be queried in the index of the target database based on the query information; and a query module for determining the location of the data to be queried in its corresponding database based on the synchronization data; wherein the data to be synchronized in the database is synchronized to the index of the target database through the data synchronization method of the first aspect.
  • a fifth aspect of the present disclosure provides an electronic device including: a memory for storing instructions; and a processor for calling the instructions stored in the memory to execute the data synchronization method according to the first aspect or the data query method according to the third aspect.
  • a sixth aspect of the present disclosure provides a computer-readable storage medium in which instructions are stored. When the instructions are executed by a processor, the data synchronization method according to the first aspect or the data query method according to the third aspect is executed.
  • the data synchronization method, data synchronization device, data query method, data query device, electronic device, and computer-readable storage medium write the data in the database tables of multiple databases into one or more indexes in sequence in a rolling manner, thereby saving the shard resources occupied.
  • Fig. 1 shows a schematic flowchart of a data synchronization method according to an embodiment of the present disclosure
  • FIG. 2 shows a schematic flowchart of a data synchronization method according to another embodiment of the present disclosure
  • FIG. 3 shows a schematic flowchart of a data synchronization method according to another embodiment of the present disclosure
  • FIG. 4 shows a schematic flowchart of a data synchronization method according to another embodiment of the present disclosure
  • FIG. 5 shows a schematic flowchart of a data synchronization method according to another embodiment of the present disclosure
  • Fig. 6 shows a schematic flowchart of a data query method according to an embodiment of the present disclosure
  • Fig. 7 shows a schematic diagram of a data synchronization device according to an embodiment of the present disclosure
  • FIG. 8 shows a schematic diagram of a data synchronization device according to another embodiment of the present disclosure.
  • Fig. 9 shows a schematic diagram of a data query device according to an embodiment of the present disclosure.
  • FIG. 10 is a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
  • the data to be synchronized can be a base library.
  • the base library includes static libraries (large data volume; a static library usually contains more than 80 million records) and dynamic libraries (small data volume; a dynamic library usually contains fewer than 1 million records). One record in the base library usually consists of a picture, a name, an ID number and other attribute information.
  • a distributed file storage database such as MongoDB
  • a distributed search engine such as Elasticsearch
  • a synchronization tool such as Monstache.
  • each database in MongoDB corresponds to an index in Elasticsearch.
  • the creation of an index occupies shard resources, and one index occupies one shard.
  • the amount of data contained in a dynamic library is small.
  • if each dynamic library corresponds to an index and occupies a shard, the data in the dynamic library actually uses only a small part of that shard; but because the dynamic library occupies the shard, data in other databases cannot be written to it.
  • if the number of dynamic libraries to be synchronized is large, then after synchronization is completed a large number of shards are occupied but not full, which causes a great waste of shard resources in Elasticsearch.
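  • The waste described above can be put in rough numbers. The following is a minimal sketch, not part of the disclosure: the function names, the workload of 500 dynamic libraries, and the per-index capacity of 100 million records are illustrative assumptions. It compares one index per database with letting small databases share indexes.

```python
import math
from typing import Sequence


def shards_one_index_per_database(record_counts: Sequence[int]) -> int:
    # One index (and therefore one shard) per database, however small it is.
    return len(record_counts)


def shards_with_shared_indexes(record_counts: Sequence[int], capacity_per_index: int) -> int:
    # Databases share indexes; another index is needed only when one fills up.
    if capacity_per_index <= 0:
        raise ValueError("capacity_per_index must be positive")
    return max(1, math.ceil(sum(record_counts) / capacity_per_index))


# Hypothetical workload: 500 dynamic libraries of 200,000 records each.
dynamic_libraries = [200_000] * 500
print(shards_one_index_per_database(dynamic_libraries))            # 500
print(shards_with_shared_indexes(dynamic_libraries, 100_000_000))  # 1
```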
  • the embodiment of the present disclosure provides a data synchronization method 10, which can be applied to a database of distributed file storage.
  • the data synchronization method 10 may include: step S11 to step S13. The following is a detailed description of the above steps:
  • Step S11 Obtain data to be synchronized.
  • the data to be synchronized includes data in one or more database tables of the first database.
  • the data to be synchronized in the present disclosure can be data in any form. For example, in the field of image recognition, the data to be synchronized can be data stored in a base database, such as pictures, names, ID numbers, personal attributes, and other information.
  • the first database may be a dynamic database, which is characterized by a relatively small amount of data.
  • a first database contains 0 to 1 million pieces of data, but the number of first databases to be synchronized is relatively large. If the data of the database table of each first database were put into its own index, each occupying one shard, a large number of shards would be occupied but not full, and a large amount of shard resources would be wasted.
  • when the data of a database is written to MongoDB, it is written into one MongoDB database table, so each database corresponds to one database table.
  • In step S12, a first index is established in the target database, and the data of all the database tables of the first databases corresponding to the data to be synchronized are sequentially synchronized to the first index.
  • step S13 if the current first index satisfies the index rolling strategy, a new first index is created, and the unsynchronized data is continuously synchronized to the new first index.
  • a first index may store the data of the database tables of multiple first databases, and the data of the database table of one first database may be stored in different first indexes (because the amount of data in a first database is small, the probability that part of the data of its database table lies in the previous first index and part in the next first index is relatively small).
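  • Steps S11 to S13 can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the implementation of the disclosure: the class `RollingSynchronizer`, the index naming scheme, the `source_table` field, and the use of a simple record-count threshold are all hypothetical, and a Python dictionary stands in for the target database.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

Record = dict  # one record of a source database table


@dataclass
class RollingSynchronizer:
    """In-memory stand-in for the target database (hypothetical)."""
    max_docs_per_index: int                     # assumed capacity threshold
    prefix: str = "first-index"                 # assumed naming scheme
    indexes: Dict[str, List[Record]] = field(default_factory=dict)
    current: str = ""

    def __post_init__(self) -> None:
        if self.max_docs_per_index < 1:
            raise ValueError("max_docs_per_index must be at least 1")

    def _create_index(self) -> None:
        # Steps S12/S13: create the next first index and make it current.
        self.current = f"{self.prefix}-{len(self.indexes) + 1:06d}"
        self.indexes[self.current] = []

    def sync(self, tables: Iterable[Tuple[str, List[Record]]]) -> None:
        # Step S11: `tables` is the data to be synchronized, given as
        # (table name, records) pairs, one pair per first database.
        if not self.current:
            self._create_index()
        for table_name, records in tables:
            for record in records:
                # Step S13: roll over once the current index is full.
                if len(self.indexes[self.current]) >= self.max_docs_per_index:
                    self._create_index()
                # Tag each document with its source table so that it can be
                # located again for later updates, deletions, and queries.
                self.indexes[self.current].append(
                    {"source_table": table_name, **record}
                )


syncer = RollingSynchronizer(max_docs_per_index=5)
syncer.sync([
    ("monitor_a", [{"id": i} for i in range(3)]),
    ("monitor_b", [{"id": i} for i in range(4)]),
])
print({name: len(docs) for name, docs in syncer.indexes.items()})
```

  • In this sketch the second table straddles two indexes, which is the case the parenthetical above describes as possible but uncommon.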
  • the current first index satisfies the index rolling strategy, including at least one of the following: the storage space occupied by the current first index reaches the storage threshold; and the amount of data synchronized to the current first index reaches the capacity threshold.
  • the conditions for satisfying the rolling strategy can be set according to the storage space occupied by the index. For example, a first index A that stores the contents of the database tables of 10 first databases occupies only one shard. If, when the first half of the database table of the eleventh first database has been written, the first index A has used up the shard resources allotted to it, a new first index B is created on a new shard.
  • the rolling strategy can also be set according to the amount of data synchronized to the index; that is, after a certain amount of data has been written to a first index, a new first index is created to continue writing data, which keeps the amount of data in each first index bounded.
  • for example, the data volume threshold is set to 100 million; if, while the database table of the 10th first database is being synchronized to the first index C, the data volume in the first index C reaches 100 million, a new first index D is created.
  • either of the above two conditions may be used; meeting either one satisfies the index rolling strategy.
  • the time for data synchronization of the current first index reaches the time threshold, that is, the index rolling strategy is satisfied.
  • a time threshold is set. In the process of data synchronization for a first index, if the elapsed time exceeds the time threshold and there is new data that needs to be synchronized to the first index, a new first index can be created.
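  • The three rolling conditions (storage threshold, capacity threshold, time threshold) can be combined into a single check. The sketch below is illustrative only: `IndexStats`, `RolloverPolicy`, and their field names are assumptions, and how the statistics are obtained from the target database is left out.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexStats:
    size_bytes: int     # storage space occupied by the current first index
    doc_count: int      # amount of data synchronized to it
    created_at: float   # when synchronization to it began (epoch seconds)


@dataclass
class RolloverPolicy:
    # A threshold left as None is not checked.
    max_size_bytes: Optional[int] = None      # storage threshold
    max_docs: Optional[int] = None            # capacity threshold
    max_age_seconds: Optional[float] = None   # time threshold

    def should_roll(self, stats: IndexStats, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        checks = [
            self.max_size_bytes is not None
            and stats.size_bytes >= self.max_size_bytes,
            self.max_docs is not None
            and stats.doc_count >= self.max_docs,
            self.max_age_seconds is not None
            and now - stats.created_at >= self.max_age_seconds,
        ]
        # Meeting any one configured condition satisfies the strategy.
        return any(checks)


policy = RolloverPolicy(max_docs=100_000_000, max_age_seconds=3600)
full = IndexStats(size_bytes=0, doc_count=100_000_000, created_at=0.0)
print(policy.should_roll(full, now=10.0))  # True
```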
  • the data to be synchronized further includes data in one or more database tables of the second database, wherein the amount of data in the second database is greater than the amount of data in the first database.
  • Each second database also corresponds to a database table in MongoDB.
  • a data volume threshold can be set. When the data volume of the database in the data to be synchronized is greater than the data volume threshold, the database is considered to be the second database, otherwise the database is the first database.
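  • The threshold test can be written in a few lines. The default of 1,000,000 records below is an assumption borrowed from the size quoted for dynamic libraries; the disclosure does not fix a value, and the function name is hypothetical.

```python
def classify_database(record_count: int, volume_threshold: int = 1_000_000) -> str:
    # Above the threshold the database is treated as a second database;
    # otherwise it is a first database.
    return "second" if record_count > volume_threshold else "first"


print(classify_database(80_000_000))  # second
print(classify_database(200_000))     # first
```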
  • the data synchronization method 20 may include: step S14, establishing in the target database one or more second indexes corresponding to the database tables of each second database.
  • Step S15 synchronize the contents of the database table of each second database to its corresponding second index respectively.
  • the data to be synchronized may also include a second database, and the amount of data in the second database is greater than that of the first database.
  • the second database such as a static database, is characterized by a relatively large amount of data, usually more than 80 million. Therefore, even if a shard corresponds to the database table of a static library, the shard corresponding to the static library will be in a relatively full state; in addition, the number of static libraries is relatively small, even if the shards corresponding to the static library are not full , It will not cause a lot of fragmentation waste.
  • a second index corresponding to the database table of each second database is established in the target database, that is, the database table of each second database has a corresponding second index, and then each second index
  • the contents of the database tables of the database are respectively synchronized to their corresponding second indexes. That is, for the first database with a large number and a small amount of data, the rolling storage strategy of "multiple database tables are written into one index, and another index is automatically created when the index capacity is exceeded" is used for the first database.
  • the database adopts a storage strategy of one-to-one correspondence between database tables and indexes. By adopting different strategies for different databases, storage resources are saved, and retrieval efficiency is also taken into account.
  • the data synchronization method 30 may include: step S16, determining the type of index to create based on the table name of the database table, where the table name of the database table of a first database contains the first identification, and the table name of the database table of a second database contains the second identification; if the table name of the database table contains the first identification, step S12 is executed; if the table name of the database table contains the second identification, step S14 is executed.
  • the naming rules of database tables corresponding to different database types can be different.
  • the identifier contained in the table name can be used to determine the database type corresponding to the database table, so that the index can conveniently be created with the strategy appropriate to that type of database.
  • for example, the data of one dynamic library is written to a corresponding database table of MongoDB whose table name carries monitor or another identifier. The operation log (oplog) generated on writing is monitored in real time by the synchronization tool Monstache, so that the written data is synchronized to the search engine Elasticsearch in real time. Step S12 is adopted: using the synchronization template for dynamic libraries set on Elasticsearch, all the contents of this database table are written into the current index, and storage rolls over to a new index when the current index is full.
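  • Step S16 amounts to a dispatch on the table name. A minimal sketch follows; the identifier `monitor` is the one mentioned in the text, while `static`, the enum, and the function name are hypothetical choices made for illustration.

```python
from enum import Enum


class IndexStrategy(Enum):
    ROLLING_FIRST_INDEX = "rolling"           # continue with step S12
    ONE_TO_ONE_SECOND_INDEX = "one_to_one"    # continue with step S14


FIRST_IDENTIFIER = "monitor"   # identifier for dynamic libraries, per the text
SECOND_IDENTIFIER = "static"   # hypothetical identifier for static libraries


def strategy_for_table(table_name: str) -> IndexStrategy:
    # Step S16: the table name alone decides which kind of index is created.
    if FIRST_IDENTIFIER in table_name:
        return IndexStrategy.ROLLING_FIRST_INDEX
    if SECOND_IDENTIFIER in table_name:
        return IndexStrategy.ONE_TO_ONE_SECOND_INDEX
    raise ValueError(f"table name {table_name!r} carries no known identifier")


print(strategy_for_table("monitor_gate_3"))
print(strategy_for_table("static_city_a"))
```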
  • the data synchronization method 40 may include: step S171, in response to a data deletion request for deleting any first database, determining the one or more first indexes corresponding to the first database to be deleted; step S172, deleting the data corresponding to the first database to be deleted from the one or more first indexes.
  • the first database to be deleted may correspond to a single first index or, because synchronization follows the rolling strategy, to multiple first indexes. After the corresponding one or more first indexes are determined, the corresponding data in them is deleted, which completes the synchronization between the index and the database.
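  • Steps S171 and S172 can be sketched on in-memory data as follows. The sketch assumes that every document in a first index carries a `source_table` field naming the database table it came from; that field, the index names, and the function name are illustrative assumptions rather than details given in the disclosure.

```python
from typing import Dict, List


def delete_first_database(
    first_indexes: Dict[str, List[dict]], source_table: str
) -> Dict[str, int]:
    """Remove one first database's documents from every first index holding them.

    Returns a mapping of index name to the number of documents removed.
    """
    removed: Dict[str, int] = {}
    for name, docs in first_indexes.items():
        # Step S171: an index is affected if it holds documents of this table.
        kept = [d for d in docs if d.get("source_table") != source_table]
        if len(kept) != len(docs):
            # Step S172: delete only this database's data; the index itself
            # stays, because other first databases share it.
            removed[name] = len(docs) - len(kept)
            docs[:] = kept
    return removed


indexes = {
    "first-index-000001": [
        {"source_table": "monitor_a", "id": 1},
        {"source_table": "monitor_b", "id": 2},
    ],
    "first-index-000002": [{"source_table": "monitor_b", "id": 3}],
}
print(delete_first_database(indexes, "monitor_b"))
```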
  • the data synchronization method 50 may include: step S173, in response to a data deletion request for deleting any second database, deleting the second index corresponding to the second database to be deleted.
  • when the data in a certain second database needs to be deleted according to actual needs, since second databases correspond one-to-one with second indexes, the second index corresponding to the second database to be deleted can be determined in advance or in real time and deleted directly to complete the data deletion. In this way, the efficiency of data deletion can be improved.
  • the index name of the second index can be set to include the name of the second database or of its database table, or can be set to be the same as that name. In this way, the corresponding second index can be quickly determined according to the name of the second database to be deleted.
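  • Because the index name is derived from the table name, step S173 reduces to dropping one index. The sketch below uses a dictionary in place of the target database; the `second-` prefix and both function names are hypothetical. The name is lowercased because Elasticsearch index names must be lowercase.

```python
from typing import Dict, List


def second_index_name(table_name: str) -> str:
    # Assumed convention: the index name embeds the table name.
    return f"second-{table_name.lower()}"


def delete_second_database(
    second_indexes: Dict[str, List[dict]], table_name: str
) -> bool:
    # Step S173: drop the whole index; no per-document search is needed.
    return second_indexes.pop(second_index_name(table_name), None) is not None


second_indexes = {"second-static_city_a": [{"id": 1}, {"id": 2}]}
print(delete_second_database(second_indexes, "Static_City_A"))  # True
print(second_indexes)                                           # {}
```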
  • the data synchronization method may further include: when an operation on first data in the database table of any first database is detected, determining, based on the first data, the position of the corresponding data among all the first indexes; and, according to the operation, synchronously correcting the data in the first index corresponding to the first data; wherein the operation includes adding data, modifying data, or deleting data.
  • synchronization can then be carried out: the position of the first data can be determined by searching all the first indexes, and the content in the first index is then synchronized according to the actual operation type, such as adding data, modifying data, or deleting data.
  • the data synchronization method may further include: when an operation on second data in the database table of any second database is detected, synchronously revising, according to the operation, the data corresponding to the second data in the second index corresponding to the database table of that second database; wherein the operation includes adding data, modifying data, or deleting data.
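  • For first databases, the position of a record must be found among all the first indexes before it can be corrected. A minimal in-memory sketch follows; the operation names `insert`, `update`, and `delete`, the document-id keys, and the function names are assumptions made for illustration. For a second database the lookup step is unnecessary, since the target index is known from the table.

```python
from typing import Dict, Optional

Index = Dict[str, dict]  # document id -> document


def locate(first_indexes: Dict[str, Index], doc_id: str) -> Optional[str]:
    # Search every first index for the document; rolling means it may be in any.
    for name, index in first_indexes.items():
        if doc_id in index:
            return name
    return None


def apply_operation(
    first_indexes: Dict[str, Index],
    current_index: str,
    op: str,
    doc_id: str,
    doc: Optional[dict] = None,
) -> str:
    """Mirror one source operation into the first indexes; return the index touched."""
    if op not in ("insert", "update", "delete"):
        raise ValueError(f"unsupported operation: {op}")
    if op == "insert":
        # New data goes to the index currently being written.
        first_indexes[current_index][doc_id] = dict(doc or {})
        return current_index
    target = locate(first_indexes, doc_id)
    if target is None:
        raise KeyError(doc_id)
    if op == "update":
        first_indexes[target][doc_id].update(doc or {})
    else:
        del first_indexes[target][doc_id]
    return target


indexes = {"first-index-000001": {"p1": {"name": "A"}}, "first-index-000002": {}}
apply_operation(indexes, "first-index-000002", "insert", "p2", {"name": "B"})
apply_operation(indexes, "first-index-000002", "update", "p1", {"name": "A2"})
apply_operation(indexes, "first-index-000002", "delete", "p2")
print(indexes)
```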
  • the present disclosure also provides a data query method 60.
  • the data query method 60 may include: step S21, obtaining query information of the data to be queried; step S22, querying, based on the query information, the synchronization data corresponding to the data to be queried in the index of the target database; step S23, determining, based on the synchronization data, the location of the data to be queried in its corresponding database; wherein the data to be synchronized in the database is synchronized to the index of the target database by the data synchronization method 10 to 50 of any of the foregoing embodiments.
  • the index established through data synchronization methods 10 to 50 can reduce the occupation of fragmentation resources, improve efficiency, and can conveniently perform queries based on the data synchronized to the index.
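  • The query flow (obtain query information, search the index, resolve the location in the source database) can be sketched on in-memory data. The sketch assumes each synchronized document carries `source_table` and `id` fields that identify where the original record lives; these fields and the function name are illustrative assumptions, and exact-match comparison stands in for whatever attribute retrieval the search engine provides.

```python
from typing import Dict, List, Tuple


def query_locations(
    indexes: Dict[str, List[dict]], query_info: Dict[str, object]
) -> List[Tuple[str, object]]:
    """Return (source table, record id) pairs for documents matching query_info."""
    locations: List[Tuple[str, object]] = []
    for docs in indexes.values():
        for doc in docs:
            # Search: keep documents whose attributes all match the query.
            if all(doc.get(key) == value for key, value in query_info.items()):
                # Resolve: the synchronized data tells where the record lives.
                locations.append((doc["source_table"], doc["id"]))
    return locations


indexes = {
    "first-index-000001": [
        {"source_table": "monitor_a", "id": 1, "name": "Li"},
        {"source_table": "monitor_b", "id": 7, "name": "Wang"},
    ],
    "first-index-000002": [{"source_table": "monitor_b", "id": 9, "name": "Li"}],
}
print(query_locations(indexes, {"name": "Li"}))  # [('monitor_a', 1), ('monitor_b', 9)]
```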
  • the present disclosure also provides a data synchronization device 100.
  • the data synchronization device 100 includes: a data acquisition module 110 for acquiring data to be synchronized, the data to be synchronized including data in the database tables of one or more first databases; and a data synchronization module 120 for establishing a first index in the target database and sequentially synchronizing the data of all the database tables of the first databases corresponding to the data to be synchronized to the first index, and, if the current first index satisfies the index rolling strategy, creating a new first index and continuing to synchronize the unsynchronized data to the new first index.
  • the current first index satisfies the index rolling strategy, including at least one of the following: the storage space occupied by the current first index reaches the storage threshold; and the amount of data synchronized to the current first index reaches the capacity threshold.
  • the current first index satisfies the index rolling strategy, which further includes: the time for data synchronization of the current first index reaches the time threshold.
  • the data to be synchronized further includes data in the database tables of one or more second databases, wherein the data volume of a second database is greater than that of a first database; the data synchronization module 120 is also used to: establish, in the target database, one or more second indexes in one-to-one correspondence with the database tables of the second databases; and synchronize the contents of the database table of each second database to its corresponding second index.
  • the data synchronization module 120 is further configured to determine the type of index to create based on the table name of the database table, where the table name of the database table of a first database contains the first identification, and the table name of the database table of a second database contains the second identification; if the table name of the database table contains the first identification, the first index is established in the target database, and the data of all the database tables of the first databases corresponding to the data to be synchronized are sequentially synchronized to the first index; if the table name of the database table contains the second identification, one or more second indexes in one-to-one correspondence with the database tables of the second databases are established in the target database.
  • the data synchronization device 200 may further include: a synchronization correction module 130, configured to, in response to a data deletion request for deleting any first database, determine the one or more first indexes corresponding to the first database to be deleted, and delete the data corresponding to that first database from the one or more first indexes.
  • the synchronization correction module 130 is further configured to delete the second index corresponding to the second database to be deleted in response to a data deletion request for deleting any second database.
  • the synchronization correction module 130 is further configured to: when an operation on first data in the database table of any first database is detected, determine, based on the first data, the position of the corresponding data among all the first indexes; and, according to the operation, synchronously revise the data in the first index corresponding to the first data; wherein the operation includes adding data, modifying data, or deleting data.
  • the synchronization correction module 130 is further configured to: when an operation on second data in the database table of any second database is detected, synchronously revise, according to the operation, the data corresponding to the second data in the second index corresponding to the database table of that second database; wherein the operation includes adding data, modifying data, or deleting data.
  • the present disclosure also provides a data query device 300.
  • the data query device 300 includes: a receiving module 210 for obtaining query information of the data to be queried; a search module 220 for querying, based on the query information, the synchronization data corresponding to the data to be queried in the index of the target database; and a query module 230 for determining the position of the data to be queried in its corresponding database based on the synchronization data; wherein the data to be synchronized in the database is synchronized to the index of the target database by the data synchronization method 10.
  • an embodiment of the present disclosure provides an electronic device 400.
  • the electronic device 400 includes a memory 401, a processor 402, and an input/output (Input/Output, I/O) interface 403.
  • the memory 401 is used to store instructions.
  • the processor 402 is configured to call the instructions stored in the memory 401 to execute the data synchronization method or the data query method of the embodiment of the present disclosure.
  • the processor 402 is respectively connected to the memory 401 and the I/O interface 403, for example, through a bus system and/or other forms of connection mechanisms (not shown).
  • the memory 401 can be used to store programs and data, including programs of the data synchronization method or data query method involved in the embodiments of the present disclosure.
  • the processor 402 executes various functional applications and data processing of the electronic device 400 by running the programs stored in the memory 401.
  • the processor 402 may be implemented in at least one hardware form among a digital signal processor (DSP), a field-programmable gate array (FPGA), and a programmable logic array (PLA). The processor 402 may be one or a combination of several of a central processing unit (CPU) or other forms of processing units with data processing capabilities and/or instruction execution capabilities.
  • the memory 401 in the embodiment of the present disclosure may include one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
  • the volatile memory may include, for example, random access memory (Random Access Memory, RAM) and/or cache memory (cache).
  • the non-volatile memory may include, for example, read-only memory (ROM), flash memory, a hard disk drive (HDD), a solid-state drive (SSD), etc.
  • the I/O interface 403 can be used to receive input commands (for example, numeric or character information, and key signal inputs related to the user settings and function control of the electronic device 400), and can also output various kinds of information (for example, images or sound) to the outside.
  • the I/O interface 403 in the embodiment of the present disclosure may include one or more of a physical keyboard, function buttons (such as volume control buttons and switch buttons), a mouse, a joystick, a trackball, a microphone, a speaker, and a touch panel.
  • any steps, operations, or programs described herein can be executed or implemented using one or more hardware or software modules alone or in combination with other devices.
  • the software module is implemented using a computer program product including a computer readable medium containing computer program code, which can be executed by a computer processor for executing any or all of the described steps, operations, or programs.

Abstract

Provided are a data synchronization method, a data synchronization apparatus, a data query method, a data query apparatus, an electronic device and a computer-readable storage medium. The data synchronization method comprises: acquiring data to be synchronized, wherein said data comprises data in database tables of one or more first databases; establishing a first index in a target database, and successively synchronizing, to the first index, data in database tables of all the first databases corresponding to the data to be synchronized; and if the current first index satisfies an index rollover policy, establishing a new first index, and continuing to synchronize data, which has not yet been synchronized, to the new first index. By means of successively writing data in database tables of a plurality of databases into one or more indexes in a rollover manner, occupied shard resources are saved.

Description

Data synchronization method and device, data query method and device
This application is based on, and claims priority to, Chinese patent application No. 202010561213.7, filed on June 18, 2020, the entire content of which is hereby incorporated into this application by reference.
Technical field
The present disclosure relates generally to the field of database storage, and specifically to a data synchronization method, a data synchronization device, a data query method, a data query device, an electronic device, and a computer-readable storage medium.
Background art
In data storage and retrieval, especially for data held in distributed storage, base library data must be synchronized to a retrieval engine before attribute retrieval can be performed, and the data stored in the retrieval engine must be indexed to improve retrieval efficiency. In some related technologies for distributed storage and distributed retrieval, the retrieval engine is configured with a certain number of shards. Each shard corresponds to one index, and one index is written with the database table of one database; that is, an independent index is established for the database table of each database. In this case, if the data to be synchronized contains a large number of databases, a large amount of shard resources is required. If the number of shards is insufficient, new indexes cannot be established, so the data of some databases can be neither synchronized nor retrieved.
Summary of the invention
To solve the above problems in the prior art, a first aspect of the present disclosure provides a data synchronization method, the method including: acquiring data to be synchronized, where the data to be synchronized includes data in the database tables of one or more first databases; establishing a first index in a target database, and sequentially synchronizing the data of the database tables of all first databases corresponding to the data to be synchronized to the first index; and, if the current first index satisfies an index rollover policy, establishing a new first index and continuing to synchronize the data that has not yet been synchronized to the new first index.
In an embodiment, the current first index satisfying the index rollover policy includes at least one of the following: the storage space occupied by the current first index reaches a storage threshold; the amount of data synchronized to the current first index reaches a capacity threshold.
In an embodiment, the current first index satisfying the index rollover policy further includes: the time for which data has been synchronized to the current first index reaches a time threshold.
In an embodiment, the data to be synchronized further includes data in the database tables of one or more second databases, where the data volume of a second database is greater than that of a first database; the method further includes: establishing, in the target database, one or more second indexes in one-to-one correspondence with the database tables of the second databases; and synchronizing the content of the database table of each second database to its corresponding second index.
In an embodiment, the method further includes: determining the type of index to establish based on the table name of a database table, where the table name of the database table of a first database contains a first identifier and the table name of the database table of a second database contains a second identifier; if the table name of the database table contains the first identifier, performing the step of establishing the first index and sequentially synchronizing the data of the database tables of all first databases corresponding to the data to be synchronized to the first index; and if the table name of the database table contains the second identifier, performing the step of establishing one or more second indexes in one-to-one correspondence with the database tables of the second databases.
In an embodiment, the method further includes: in response to a data deletion request for deleting any first database, determining the one or more first indexes corresponding to the first database to be deleted; and deleting, from the one or more first indexes, the data corresponding to the deleted first database.
In an embodiment, the method further includes: in response to a data deletion request for deleting any second database, deleting the second index corresponding to the second database to be deleted.
In an embodiment, the method further includes: when an operation on first data in the database table of any first database is detected, determining, based on the first data, the position among all first indexes of the index data corresponding to the first data; and synchronously updating, according to the operation, the data in the first index corresponding to the first data; where the operation includes adding data, modifying data, or deleting data.
In an embodiment, the method further includes: when an operation on second data in the database table of any second database is detected, synchronously updating, according to the operation, the data corresponding to the second data in the second index corresponding to the database table of that second database; where the operation includes adding data, modifying data, or deleting data.
A second aspect of the present disclosure provides a data synchronization device, including: a data acquisition module for acquiring data to be synchronized, where the data to be synchronized includes data in the database tables of one or more first databases; and a data synchronization module for establishing a first index in a target database and sequentially synchronizing the data of the database tables of all first databases corresponding to the data to be synchronized to the first index, and, if the current first index satisfies an index rollover policy, establishing a new first index and continuing to synchronize the data that has not yet been synchronized to the new first index.
A third aspect of the present disclosure provides a data query method, the method including: obtaining query information of data to be queried; querying, based on the query information, the synchronization data corresponding to the data to be queried in the index of a target database; and determining, based on the synchronization data, the position of the data to be queried in its corresponding database; where the data to be synchronized in the database is synchronized to the index of the target database by the data synchronization method according to any one of claims 1-8.
A fourth aspect of the present disclosure provides a data query device, including: a receiving module for obtaining query information of data to be queried; a search module for querying, based on the query information, the synchronization data corresponding to the data to be queried in the index of a target database; and a query module for determining, based on the synchronization data, the position of the data to be queried in its corresponding database; where the data to be synchronized in the database is synchronized to the index of the target database by the data synchronization method of the first aspect.
A fifth aspect of the present disclosure provides an electronic device, including: a memory for storing instructions; and a processor for calling the instructions stored in the memory to execute the data synchronization method of the first aspect or the data query method of the third aspect.
A sixth aspect of the present disclosure provides a computer-readable storage medium storing instructions which, when executed by a processor, perform the data synchronization method of the first aspect or the data query method of the third aspect.
The data synchronization method, data synchronization device, data query method, data query device, electronic device, and computer-readable storage medium provided by the present disclosure write the data in the database tables of multiple databases sequentially, in a rollover manner, into one or more indexes, thereby saving the shard resources occupied.
Description of the drawings
The above and other objectives, features, and advantages of the embodiments of the present disclosure will become easier to understand by reading the following detailed description with reference to the accompanying drawings. In the drawings, several embodiments of the present disclosure are shown in an exemplary and non-limiting manner, in which:
FIG. 1 is a schematic flowchart of a data synchronization method according to an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of a data synchronization method according to another embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of a data synchronization method according to another embodiment of the present disclosure;
FIG. 4 is a schematic flowchart of a data synchronization method according to another embodiment of the present disclosure;
FIG. 5 is a schematic flowchart of a data synchronization method according to another embodiment of the present disclosure;
FIG. 6 is a schematic flowchart of a data query method according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a data synchronization device according to an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a data synchronization device according to another embodiment of the present disclosure;
FIG. 9 is a schematic diagram of a data query device according to an embodiment of the present disclosure;
FIG. 10 is a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed description
The principle and spirit of the present disclosure are described below with reference to several exemplary embodiments. It should be understood that these embodiments are given only to enable those skilled in the art to better understand and then implement the present disclosure, and are not intended to limit the scope of the present disclosure in any way.
It should be noted that although expressions such as "first" and "second" are used herein to describe different modules, steps, data, and so on of the embodiments of the present disclosure, these expressions serve only to distinguish between different modules, steps, and data, and do not indicate a specific order or degree of importance. In fact, expressions such as "first" and "second" can be used interchangeably.
The data to be synchronized may be a base library. Base libraries include static libraries (large data volume; a static library usually contains more than 80 million records) and dynamic libraries (small data volume; a dynamic library usually contains fewer than 1 million records). A record in a base library typically consists of attribute information such as a picture, a name, and an ID number. In some related technologies, retrieving data in the base library requires a synchronization tool such as Monstache to synchronize the base library data stored in a database based on distributed file storage (such as MongoDB) to a distributed retrieval engine (such as Elasticsearch), where retrieval functions such as fuzzy search on attributes are then implemented. During synchronization, Monstache, a synchronization tool for MongoDB, reads the operation log (oplog) and synchronizes the base library data in MongoDB, that is, the static library and dynamic library data, to the search engine Elasticsearch. In the related technology, each static library is placed in one Elasticsearch index, and each dynamic library is likewise placed in one Elasticsearch index.
When base library data is synchronized from MongoDB to Elasticsearch, each database in MongoDB corresponds to one index in Elasticsearch. Creating an index consumes shard resources, and one index occupies one shard. A dynamic library contains little data: when one dynamic library corresponds to one index and occupies one shard, its data actually uses only a small part of that shard, yet because the shard is occupied by that dynamic library, data from other databases cannot be written to it. When there are many dynamic libraries to synchronize, a large number of shards end up occupied but far from full after synchronization, which wastes a great deal of shard resources in Elasticsearch. In addition, a single Elasticsearch machine currently has only 1,000 shards available. If one database corresponds to one shard, a single Elasticsearch machine can store at most 1,000 dynamic or static libraries; once the number of libraries to be synchronized exceeds this, the single Elasticsearch machine must be expanded into a cluster.
To solve the above problem, an embodiment of the present disclosure provides a data synchronization method 10, which can be applied to a database based on distributed file storage. As shown in FIG. 1, the data synchronization method 10 may include steps S11 to S13, which are described in detail below.
Step S11: acquire data to be synchronized, where the data to be synchronized includes data in the database tables of one or more first databases.
The data to be synchronized in the present disclosure may be data in any form. For example, in the field of image recognition, the data to be synchronized may be data stored in a base library, which may include information such as pictures, names, ID numbers, and personal attributes. The first database may be a dynamic library, which is characterized by a relatively small data volume (a first database generally contains 0 to 1 million records) while the number of first databases to be synchronized is relatively large. If the data of each first database's table were placed in its own index, occupying one shard each, a large number of shards would be occupied but not full, wasting a large amount of shard resources. The data of each database is written into one MongoDB database table when it is written to MongoDB, so each database corresponds to one database table.
Step S12: establish a first index in the target database, and sequentially synchronize the data of the database tables of all first databases corresponding to the data to be synchronized to the first index.
Step S13: if the current first index satisfies the index rollover policy, establish a new first index and continue to synchronize the data that has not yet been synchronized to the new first index.
In the embodiments of the present disclosure, the contents of the database tables of the first databases are written into the first index in sequence, so that the database tables of multiple first databases are written into one index. After an index satisfies the index rollover policy, a new index can be created to continue storing the content that has not yet been stored, which reduces the waste of shard resources. In this embodiment, one first index may store data from the database tables of multiple first databases, and the data of one first database's table may be spread over different indexes (because the data volume of a first database is small, the probability that part of its table is stored in the previous first index and part in the next first index is low).
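Steps S11 to S13 can be illustrated with a minimal in-memory sketch. It is not the implementation of the disclosure: the class `RollingIndexWriter`, the index naming scheme, the `source_db` field, and the record-count limit are all assumptions made for illustration, and a plain dictionary stands in for the target database.

```python
from dataclasses import dataclass, field


@dataclass
class RollingIndexWriter:
    """In-memory stand-in for the target database (not a real search-engine client)."""

    max_docs_per_index: int            # hypothetical capacity threshold
    prefix: str = "first-index"        # hypothetical index naming scheme
    indexes: dict = field(default_factory=dict)
    current: str = ""

    def _new_index(self) -> None:
        name = f"{self.prefix}-{len(self.indexes) + 1:06d}"
        self.indexes[name] = []
        self.current = name

    def write(self, source_db: str, doc: dict) -> str:
        if not self.current:
            # Step S12: establish the first index on the first write.
            self._new_index()
        elif len(self.indexes[self.current]) >= self.max_docs_per_index:
            # Step S13: the current index meets the rollover policy, so roll over.
            self._new_index()
        # Tag each record with its source database so it can be located later.
        self.indexes[self.current].append({"source_db": source_db, **doc})
        return self.current


def synchronize(tables: dict, writer: RollingIndexWriter) -> None:
    # Step S11: `tables` maps each first database to the rows of its database table.
    for source_db, rows in tables.items():
        for row in rows:
            writer.write(source_db, row)


tables = {
    "monitor_a": [{"name": "p1"}, {"name": "p2"}],
    "monitor_b": [{"name": "p3"}, {"name": "p4"}],
    "monitor_c": [{"name": "p5"}],
}
writer = RollingIndexWriter(max_docs_per_index=3)
synchronize(tables, writer)
for index_name, docs in writer.indexes.items():
    print(index_name, [d["source_db"] for d in docs])
```

With a limit of three records per index, the five records from three first databases end up in two indexes instead of three, and the table of `monitor_b` is split across both, which is the split case the paragraph above describes as possible but unlikely.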
In an embodiment, the current first index satisfying the index rollover policy includes at least one of the following: the storage space occupied by the current first index reaches a storage threshold; the amount of data synchronized to the current first index reaches a capacity threshold. In this embodiment, the condition for the rollover policy can be set according to the storage space occupied by the index. For example, suppose a first index A holds the contents of the database tables of 10 first databases and occupies only one shard. If first index A fills the shard it occupies when the first half of the 11th first database's table has been written, a new first index B is created on a new shard, and the not-yet-written second half of the 11th first database's table is written to first index B. This realizes the rolling storage of "multiple database tables are written into one index, and a new index is created automatically when the index capacity is exceeded", which saves a large amount of storage resources. Alternatively, the rollover policy can be set according to the amount of data synchronized to the index: after a certain number of records has been written into one first index, a new first index is created to continue writing data, which bounds the amount of data in each first index. For example, with the data volume set to 100 million records, if the amount of data in first index C reaches 100 million while the database table of the 10th first database is being synchronized to it, a new first index D is created, and the contents of the 10th first database's table not yet written to first index C are synchronized to first index D. Either of the two conditions may be used alone, or both may be configured so that the index rollover policy is satisfied as soon as either one is met.
In an embodiment, in addition to the foregoing conditions, the index rollover policy may also be satisfied when the time for which data has been synchronized to the current first index reaches a time threshold. In this embodiment, a time threshold is set: while data is being synchronized to a first index, if the elapsed time exceeds the time threshold and there is still new data to be synchronized, a new first index can be created.
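The three rollover conditions (storage threshold, capacity threshold, time threshold) can be expressed as one predicate. The following is a sketch under assumptions: the class `RolloverPolicy`, its field names, and the threshold values are hypothetical, and the source does not say how the thresholds are configured in practice. Elasticsearch's rollover API accepts comparable conditions (`max_size`, `max_docs`, `max_age`), but the source does not state that the disclosed method relies on that API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RolloverPolicy:
    # All thresholds are optional; unset thresholds are simply not checked.
    max_bytes: Optional[int] = None          # storage threshold
    max_docs: Optional[int] = None           # capacity threshold (number of records)
    max_age_seconds: Optional[float] = None  # time threshold

    def is_met(self, size_bytes: int, doc_count: int, age_seconds: float) -> bool:
        """Return True as soon as any configured threshold is reached."""
        checks = [
            self.max_bytes is not None and size_bytes >= self.max_bytes,
            self.max_docs is not None and doc_count >= self.max_docs,
            self.max_age_seconds is not None and age_seconds >= self.max_age_seconds,
        ]
        return any(checks)


# Hypothetical values: 50 GiB, 100 million records, 7 days.
policy = RolloverPolicy(
    max_bytes=50 * 1024**3,
    max_docs=100_000_000,
    max_age_seconds=7 * 24 * 3600,
)
full_by_count = policy.is_met(size_bytes=10 * 1024**3, doc_count=100_000_000, age_seconds=3600)
still_open = policy.is_met(size_bytes=1024, doc_count=10, age_seconds=60)
print(full_by_count, still_open)
```

Evaluating the conditions with "any" matches the statement that meeting either one of the configured conditions satisfies the policy.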
In an embodiment, the data to be synchronized further includes data in the database tables of one or more second databases, where the data volume of a second database is greater than that of a first database. Each second database likewise corresponds to one database table in MongoDB. A data volume threshold can be set: when the data volume of a database in the data to be synchronized is greater than the threshold, the database is treated as a second database; otherwise it is treated as a first database. As shown in FIG. 2, on the basis of the data synchronization method 10, the data synchronization method 20 may include: step S14, establishing, in the target database, one or more second indexes in one-to-one correspondence with the database tables of the second databases; and step S15, synchronizing the content of the database table of each second database to its corresponding second index. In this embodiment, the second database is, for example, a static library, which is characterized by a relatively large data volume, usually more than 80 million records. Therefore, even if one shard corresponds to the database table of one static library, that shard will be fairly full. In addition, the number of static libraries is relatively small, so even if the shards corresponding to static libraries are not full, few shards are wasted. Moreover, when an index stores the database table of only one static library, data in that static library can be retrieved directly in the corresponding index despite its large volume, which improves retrieval efficiency. When a static library is deleted, its index can be deleted directly, which improves deletion efficiency. By contrast, if static libraries were also stored in a rolling manner, the data of one static library would very likely be spread over different indexes; retrieving data from the static library would then require first determining which indexes hold its data and then searching those indexes, which would reduce retrieval efficiency. For these reasons, a second index is established in the target database in one-to-one correspondence with the database table of each second database, and the content of each second database's table is synchronized to its corresponding second index. In other words, the first databases, which are numerous and small, use the rolling storage policy of "multiple database tables are written into one index, and a new index is created automatically when the index capacity is exceeded", while the second databases, which are few and large, use a storage policy with one-to-one correspondence between database tables and indexes. Applying different policies to different databases saves storage resources while preserving retrieval efficiency.
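The effect of combining the two policies can be checked with a short calculation. The sketch below is illustrative only: the function `plan_indexes`, the volume threshold of 1 million records, the limit of 100 million records per first index, and the library sizes are assumptions, and the count ignores any rollover triggered by storage or time thresholds.

```python
def plan_indexes(db_sizes: dict, volume_threshold: int, docs_per_first_index: int) -> dict:
    """Classify databases by data volume and count the indexes each approach needs."""
    # A database above the threshold is a second database; otherwise it is a first database.
    first = {name: n for name, n in db_sizes.items() if n <= volume_threshold}
    second = {name: n for name, n in db_sizes.items() if n > volume_threshold}
    total_first = sum(first.values())
    # Ceiling division: rolling first indexes needed to hold all first databases.
    rolling_indexes = (total_first + docs_per_first_index - 1) // docs_per_first_index
    return {
        "first_databases": len(first),
        "second_databases": len(second),
        "indexes_with_rollover": rolling_indexes + len(second),
        "indexes_one_per_database": len(db_sizes),
    }


# Hypothetical workload: 500 dynamic libraries and 2 static libraries.
db_sizes = {f"monitor_{i}": 200_000 for i in range(500)}
db_sizes.update({"static_0": 80_000_000, "static_1": 90_000_000})

plan = plan_indexes(db_sizes, volume_threshold=1_000_000, docs_per_first_index=100_000_000)
print(plan)
```

Under these assumed numbers the 500 dynamic libraries fit into a single rolling first index, so 3 indexes are needed in total instead of 502 with one index per database.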
In an embodiment, as shown in FIG. 3, on the basis of the data synchronization method 20, the data synchronization method 30 may include: step S16, determining the type of index to establish based on the table name of the database table, where the table name of the database table of a first database contains a first identifier and the table name of the database table of a second database contains a second identifier; if the table name contains the first identifier, step S12 is executed; if the table name contains the second identifier, step S14 is executed. In this embodiment, the database tables of different database types can follow different naming rules, and the identifier contained in a table name determines the database type of that table, so that the index-creation policy matching each type of database can be applied conveniently.
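Step S16 amounts to a dispatch on the table name. The sketch below assumes the identifiers `monitor` and `static`, which this description uses as example markers for dynamic and static libraries; the function name `choose_step`, the substring match, and the error handling for unmarked tables are hypothetical choices.

```python
FIRST_IDENTIFIER = "monitor"   # assumed marker of a first (dynamic) database table
SECOND_IDENTIFIER = "static"   # assumed marker of a second (static) database table


def choose_step(table_name: str) -> str:
    """Return the step to execute for a table: S12 (rolling first index) or S14 (dedicated second index)."""
    if FIRST_IDENTIFIER in table_name:
        return "S12"
    if SECOND_IDENTIFIER in table_name:
        return "S14"
    # The source does not say what happens to tables without an identifier;
    # raising an error here is an assumption of this sketch.
    raise ValueError(f"table name {table_name!r} carries no known identifier")


for name in ("monitor_gate_3", "static_residents"):
    print(name, "->", choose_step(name))
```

Keeping the decision in the table name means the synchronization side needs no separate registry of database types, at the cost of depending on a naming convention.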
For example, when the data of a dynamic library is written to MongoDB from outside, the data of that dynamic library is written into one corresponding MongoDB database table whose table name carries the identifier monitor or another identifier. The operation log (oplog) generated by the write is monitored in real time by the synchronization tool Monstache, so the written data is synchronized to the search engine Elasticsearch in real time. During synchronization to Elasticsearch, once the table name is recognized as carrying the monitor identifier, the approach of step S12 is used: the synchronization template for dynamic libraries configured on Elasticsearch writes the entire contents of this database table into the current index, and rolls over to a new index when the current index is full.
When data of a static library is written to MongoDB from outside, the data of one static library is written to one corresponding MongoDB database table whose table name carries "static" or another identifier. Likewise, the write generates an oplog that Monstache picks up, so the written data is synchronized to Elasticsearch in real time. During synchronization to Elasticsearch, because the table name is recognized as carrying the static identifier, the approach of step S14 is used: the synchronization template configured on Elasticsearch for static libraries writes the entire content of the table into a single, separate index.
In one embodiment, as shown in FIG. 4, the data synchronization method 40 builds on the data synchronization method 30 and may further include: step S171, in response to a data deletion request to delete any first database, determining one or more first indexes corresponding to the first database to be deleted; and step S172, deleting, from the one or more first indexes, the data corresponding to the first database to be deleted. In this embodiment, when the data of a particular first database needs to be deleted, the first databases and the indexes are not in one-to-one correspondence, so the first indexes corresponding to that database must first be identified among all first indexes. There may be a single such first index, or several if the data was spread across indexes by the rolling strategy. Once the corresponding first index or indexes are determined, the corresponding data in them is deleted, which completes the synchronization between the index data and the database.
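Steps S171 and S172 can be sketched on in-memory indexes as follows. This is a simplified illustration under stated assumptions: each synchronized document is assumed to carry a hypothetical "source" field naming the first database it came from, and the function name is invented for this example.

```python
from typing import Dict, List


def delete_database_from_first_indexes(
    indexes: Dict[str, List[dict]], source: str
) -> List[str]:
    """Steps S171/S172 on in-memory indexes.

    `indexes` maps index name -> documents. Returns the names of the
    first indexes that held data for `source`.
    """
    # S171: find every first index that holds data from the database.
    affected = [
        name
        for name, docs in indexes.items()
        if any(doc.get("source") == source for doc in docs)
    ]
    # S172: remove only that database's documents; other data stays.
    for name in affected:
        indexes[name] = [d for d in indexes[name] if d.get("source") != source]
    return affected
```

The indexes themselves are kept, since they may still hold data synchronized from other first databases.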
In another embodiment, as shown in FIG. 5, the data synchronization method 50 builds on the data synchronization method 40 and may further include step S173: in response to a data deletion request to delete any second database, deleting the second index corresponding to the second database to be deleted. In this embodiment, when the data of a particular second database needs to be deleted, the second database and its second index are in one-to-one correspondence, so the second index corresponding to the second database to be deleted can be determined in advance or in real time, and deleting that second index directly completes the data deletion. This improves the efficiency of data deletion. In one example, the index name of the second index may be set to contain the name of the second database or of its database table, or may be set to be identical to that name, so that the corresponding second index can be found quickly from the name of the second database to be deleted.
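Because the second index name can be derived from the table name, step S173 reduces to a single lookup and drop. The sketch below uses an in-memory dictionary in place of the target database; the lowercasing rule and both function names are assumptions for illustration.

```python
from typing import Dict, List


def second_index_name(table_name: str) -> str:
    """Derive the second index name from the table name (assumed scheme)."""
    return table_name.lower()


def delete_second_database(indexes: Dict[str, List[dict]], table_name: str) -> bool:
    """Step S173: drop the whole second index that mirrors `table_name`.

    Returns True if an index was removed, False if none existed.
    """
    return indexes.pop(second_index_name(table_name), None) is not None
```

No scan over other indexes is needed, which is the efficiency gain the paragraph above describes.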
In one embodiment, the data synchronization method may further include: when an operation on first data in a database table of any first database is detected, determining, based on the first data, the position among all first indexes of the data in the first index that corresponds to the first data; and, according to the operation, synchronously correcting the data in the first index that corresponds to the first data, where the operation includes adding data, modifying data, or deleting data. In this embodiment, because the database tables of a first database are synchronized to the first indexes through the rolling strategy, it is first necessary to determine which first index holds the first data being operated on, and where in that index it sits, before synchronization can proceed. The position of the first data can be determined by searching all first indexes; the content of the first index is then synchronized according to the actual operation type, such as adding, modifying, or deleting data.
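The "locate first, then correct" sequence for first indexes can be sketched as below. This is an in-memory illustration only: the "_id" field, the operation names, and the rule that new data is appended to the current index are assumptions, not details given in the source.

```python
from typing import Dict, List, Optional, Tuple


def locate(indexes: Dict[str, List[dict]], doc_id: str) -> Optional[Tuple[str, int]]:
    """Search every first index; return (index name, position) of the document."""
    for name, docs in indexes.items():
        for pos, doc in enumerate(docs):
            if doc["_id"] == doc_id:
                return name, pos
    return None


def apply_to_first_indexes(
    indexes: Dict[str, List[dict]], current: str, op: str, doc: dict
) -> Optional[str]:
    """Mirror one source-table operation into the rolling first indexes.

    Returns the name of the index that was changed, or None if an
    update/delete target was not found in any index.
    """
    if op not in ("insert", "update", "delete"):
        raise ValueError(f"unsupported operation: {op!r}")
    if op == "insert":
        indexes[current].append(doc)  # assumed: new data goes to the current index
        return current
    found = locate(indexes, doc["_id"])
    if found is None:
        return None
    name, pos = found
    if op == "update":
        indexes[name][pos] = doc
    else:
        del indexes[name][pos]
    return name
```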
In another embodiment, the data synchronization method may further include: when an operation on second data in a database table of any second database is detected, synchronously correcting, according to the operation, the data corresponding to the second data in the second index that corresponds to that database table, where the operation includes adding data, modifying data, or deleting data. Unlike synchronization after an operation on first data in a database table of a first database, each database table of a second database has its own one-to-one corresponding second index, so after an operation on second data the corresponding data can be corrected directly in that second index in accordance with the specific operation.
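For second indexes the locate step disappears, because the table name selects the index directly. A minimal in-memory sketch follows, assuming each second index is keyed by a hypothetical "_id" field and that the index shares the table's name; the function name is also an assumption.

```python
from typing import Dict


def apply_to_second_index(
    indexes: Dict[str, Dict[str, dict]], table_name: str, op: str, doc: dict
) -> None:
    """Mirror one source-table operation into the table's own second index."""
    index = indexes[table_name]  # one-to-one mapping: no search across indexes
    if op in ("insert", "update"):
        index[doc["_id"]] = doc
    elif op == "delete":
        index.pop(doc["_id"], None)
    else:
        raise ValueError(f"unsupported operation: {op!r}")
```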
Based on the same inventive concept, the present disclosure further provides a data query method 60. As shown in FIG. 6, the data query method 60 may include: step S21, obtaining query information for the data to be queried; step S22, based on the query information, querying the index of the target database for the synchronized data corresponding to the data to be queried; and step S23, based on the synchronized data, determining the location of the data to be queried in its corresponding database. The data to be synchronized in the database is synchronized to the index of the target database by the data synchronization methods 10 to 50 of any of the foregoing embodiments. Indexes built by the data synchronization methods 10 to 50 reduce the consumption of shard resources, improve efficiency, and allow convenient queries against the data synchronized into the index.
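Steps S21 to S23 can be sketched as a lookup that goes through the index and returns a pointer back to the source record. The sketch assumes each synchronized document carries hypothetical "table" and "_id" fields identifying where it lives in the source database; the function name and the exact-match query are also assumptions.

```python
from typing import Dict, List, Optional, Tuple


def find_source_location(
    indexes: Dict[str, List[dict]], field: str, value: object
) -> Optional[Tuple[str, str]]:
    """Steps S21 to S23 on in-memory indexes.

    S21: the query information is the (field, value) pair.
    Returns (source table, record id), or None if nothing matches.
    """
    for docs in indexes.values():  # S22: search the synchronized data
        for doc in docs:
            if doc.get(field) == value:
                # S23: the synced copy points back to the source record.
                return doc["table"], doc["_id"]
    return None
```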
Based on the same inventive concept, the present disclosure further provides a data synchronization device 100. As shown in FIG. 7, the data synchronization device 100 includes: a data acquisition module 110, configured to acquire data to be synchronized, the data to be synchronized including data in database tables of one or more first databases; and a data synchronization module 120, configured to create a first index in a target database and sequentially synchronize the data of the database tables of all the first databases corresponding to the data to be synchronized to the first index, and, if the current first index satisfies an index rolling strategy, to create a new first index and continue synchronizing the not-yet-synchronized data to the new first index.
In one example, the current first index satisfying the index rolling strategy includes at least one of the following: the storage space occupied by the current first index reaches a storage threshold; the amount of data synchronized to the current first index reaches a capacity threshold.
In one example, the current first index satisfying the index rolling strategy further includes: the time spent synchronizing data to the current first index reaches a time threshold.
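The three rollover conditions can be combined into a single predicate. The sketch below treats them as alternatives (any one condition triggers a rollover), which is one reading of "at least one of the following"; the class name, field names, and units are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolloverPolicy:
    max_bytes: int    # storage threshold (bytes occupied by the index)
    max_docs: int     # capacity threshold (documents synchronized)
    max_age_s: float  # time threshold (seconds spent syncing to the index)

    def should_roll(self, size_bytes: int, doc_count: int, age_s: float) -> bool:
        """Return True as soon as any one threshold is reached."""
        return (
            size_bytes >= self.max_bytes
            or doc_count >= self.max_docs
            or age_s >= self.max_age_s
        )
```

A writer would evaluate `should_roll` before each write and create a new first index when it returns True.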
In one example, the data to be synchronized further includes data in database tables of one or more second databases, where the data volume of a second database is greater than that of a first database. The data synchronization module 120 is further configured to: create, in the target database, one or more second indexes in one-to-one correspondence with the database tables of each second database; and synchronize the content of the database table of each second database to its corresponding second index.
In one example, the data synchronization module 120 is further configured to determine the type of index to create based on the table name of a database table, where the table name of a database table of a first database contains a first identifier and the table name of a database table of a second database contains a second identifier. If the table name of the database table contains the first identifier, the module creates the first index in the target database and sequentially synchronizes the data of the database tables of all the first databases corresponding to the data to be synchronized to the first index; if the table name contains the second identifier, the module creates, in the target database, one or more second indexes in one-to-one correspondence with the database tables of each second database.
In one example, as shown in FIG. 8, the data synchronization device 200 may, in addition to the components of the data synchronization device 100, further include a synchronization correction module 130, configured to: in response to a data deletion request to delete any first database, determine one or more first indexes corresponding to the first database to be deleted; and delete, from the one or more first indexes, the data corresponding to the first database to be deleted.
In one example, the synchronization correction module 130 is further configured to: in response to a data deletion request to delete any second database, delete the second index corresponding to the second database to be deleted.
In one example, the synchronization correction module 130 is further configured to: when an operation on first data in a database table of any first database is detected, determine, based on the first data, the position among all first indexes of the data in the first index that corresponds to the first data; and, according to the operation, synchronously correct the data in the first index that corresponds to the first data, where the operation includes adding data, modifying data, or deleting data.
In one example, the synchronization correction module 130 is further configured to: when an operation on second data in a database table of any second database is detected, synchronously correct, according to the operation, the data corresponding to the second data in the second index that corresponds to that database table, where the operation includes adding data, modifying data, or deleting data.
As for the modules of the data synchronization device 100 in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not repeated here.
Based on the same inventive concept, the present disclosure further provides a data query device 300. As shown in FIG. 9, the data query device 300 includes: a receiving module 210, configured to obtain query information for the data to be queried; a search module 220, configured to query, based on the query information, the index of the target database for the synchronized data corresponding to the data to be queried; and a query module 230, configured to determine, based on the synchronized data, the location of the data to be queried in its corresponding database. The data to be synchronized in the database is synchronized to the index of the target database by the data synchronization method 10 of any of the foregoing embodiments.
As for the data query device 300 in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not repeated here.
As shown in FIG. 10, one embodiment of the present disclosure provides an electronic device 400. The electronic device 400 includes a memory 401, a processor 402, and an input/output (I/O) interface 403. The memory 401 is configured to store instructions. The processor 402 is configured to invoke the instructions stored in the memory 401 to execute the data synchronization method or the data query method of the embodiments of the present disclosure. The processor 402 is connected to the memory 401 and to the I/O interface 403, for example through a bus system and/or another form of connection mechanism (not shown). The memory 401 may be used to store programs and data, including the programs of the data synchronization method or data query method involved in the embodiments of the present disclosure. By running the programs stored in the memory 401, the processor 402 performs the various functional applications and data processing of the electronic device 400.
In the embodiments of the present disclosure, the processor 402 may be implemented in at least one hardware form among a digital signal processor (DSP), a field-programmable gate array (FPGA), and a programmable logic array (PLA). The processor 402 may be a central processing unit (CPU), or one or a combination of several other forms of processing unit having data processing capability and/or instruction execution capability.
The memory 401 in the embodiments of the present disclosure may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD).
In the embodiments of the present disclosure, the I/O interface 403 may be used to receive input instructions (for example, numeric or character information, and key signal inputs related to user settings and function control of the electronic device 400), and may also output various information (for example, images or sounds) to the outside. The I/O interface 403 may include one or more of a physical keyboard, function keys (such as volume control keys and power keys), a mouse, a joystick, a trackball, a microphone, a speaker, and a touch panel.
It should be understood that although the drawings of the embodiments of the present disclosure depict operations in a particular order, this should not be read as requiring that the operations be performed in the particular order shown or in sequential order, or that all of the operations shown be performed, in order to achieve the desired result. In certain circumstances, multitasking and parallel processing may be advantageous.
The methods and devices involved in the embodiments of the present disclosure can be implemented using standard programming techniques, with rule-based logic or other logic implementing the various method steps. It should also be noted that the words "device" and "module", as used here and in the claims, are intended to cover implementations using one or more lines of software code, and/or hardware implementations, and/or equipment for receiving input.
Any of the steps, operations, or procedures described here may be performed or implemented using one or more hardware or software modules, alone or in combination with other devices. In one embodiment, a software module is implemented using a computer program product that includes a computer-readable medium containing computer program code, which can be executed by a computer processor to perform any or all of the described steps, operations, or procedures.
The foregoing description of implementations of the present disclosure has been presented for purposes of illustration and description. It is not exhaustive and is not intended to limit the present disclosure to the precise form disclosed; variations and modifications are possible in light of the above teachings or may be acquired from practice of the present disclosure. The embodiments were chosen and described in order to explain the principles of the present disclosure and its practical application, so that those skilled in the art can utilize the present disclosure in various implementations and with various modifications suited to the particular use contemplated.

Claims (14)

  1. A data synchronization method, wherein the method comprises:
    acquiring data to be synchronized, the data to be synchronized comprising data in database tables of one or more first databases;
    creating a first index in a target database, and sequentially synchronizing the data of the database tables of all the first databases corresponding to the data to be synchronized to the first index;
    if the current first index satisfies an index rolling strategy, creating a new first index and continuing to synchronize the not-yet-synchronized data to the new first index.
  2. The method according to claim 1, wherein the current first index satisfying the index rolling strategy comprises at least one of the following:
    the storage space occupied by the current first index reaches a storage threshold;
    the amount of data synchronized to the current first index reaches a capacity threshold.
  3. The method according to claim 2, wherein the current first index satisfying the index rolling strategy further comprises:
    the time spent synchronizing data to the current first index reaches a time threshold.
  4. The method according to claim 1, wherein the data to be synchronized further comprises data in database tables of one or more second databases, wherein the data volume of the second database is greater than the data volume of the first database;
    the method further comprises:
    creating, in the target database, one or more second indexes in one-to-one correspondence with the database tables of each second database;
    synchronizing the content of the database table of each second database to its corresponding second index.
  5. The method according to claim 4, wherein the method further comprises:
    determining the type of index to create based on the table name of a database table, wherein the table name of the database table of the first database contains a first identifier and the table name of the database table of the second database contains a second identifier;
    if the table name of the database table contains the first identifier, performing the step of creating the first index and sequentially synchronizing the data of the database tables of all the first databases corresponding to the data to be synchronized to the first index;
    if the table name of the database table contains the second identifier, performing the step of creating one or more second indexes in one-to-one correspondence with the database tables of each second database.
  6. The method according to claim 4, wherein the method further comprises:
    in response to a data deletion request to delete any first database, determining one or more first indexes corresponding to the first database to be deleted;
    deleting, from the one or more first indexes, the data corresponding to the first database to be deleted.
  7. The method according to any one of claims 4-6, wherein the method further comprises:
    in response to a data deletion request to delete any second database, deleting the second index corresponding to the second database to be deleted.
  8. The method according to claim 4, wherein the method further comprises:
    when an operation on first data in a database table of any first database is detected, determining, based on the first data, the position among all the first indexes of the data in the first index that corresponds to the first data;
    and, according to the operation, synchronously correcting the data in the first index that corresponds to the first data;
    wherein the operation comprises adding data, modifying data, or deleting data.
  9. The method according to any one of claims 4-6, wherein the method further comprises:
    when an operation on second data in a database table of any second database is detected, synchronously correcting, according to the operation, the data corresponding to the second data in the second index that corresponds to the database table of the second database;
    wherein the operation comprises adding data, modifying data, or deleting data.
  10. A data synchronization device, wherein the device comprises:
    a data acquisition module, configured to acquire data to be synchronized, the data to be synchronized comprising data in database tables of one or more first databases;
    a data synchronization module, configured to create a first index in a target database and sequentially synchronize the data of the database tables of all the first databases corresponding to the data to be synchronized to the first index; and, if the current first index satisfies an index rolling strategy, to create a new first index and continue synchronizing the not-yet-synchronized data to the new first index.
  11. A data query method, wherein the method comprises:
    obtaining query information for data to be queried;
    based on the query information, querying an index of a target database for synchronized data corresponding to the data to be queried;
    based on the synchronized data, determining the location of the data to be queried in its corresponding database;
    wherein the data to be synchronized in the database is synchronized to the index of the target database by the data synchronization method according to any one of claims 1-9.
  12. A data query device, wherein the device comprises:
    a receiving module, configured to obtain query information for data to be queried;
    a search module, configured to query, based on the query information, an index of a target database for synchronized data corresponding to the data to be queried;
    a query module, configured to determine, based on the synchronized data, the location of the data to be queried in its corresponding database;
    wherein the data to be synchronized in the database is synchronized to the index of the target database by the data synchronization method according to any one of claims 1-9.
  13. An electronic device, wherein the electronic device comprises:
    a memory, configured to store instructions; and
    a processor, configured to invoke the instructions stored in the memory to execute the data synchronization method according to any one of claims 1-9 or the data query method according to claim 11.
  14. A computer-readable storage medium storing instructions which, when executed by a processor, perform the data synchronization method according to any one of claims 1-9 or the data query method according to claim 11.
PCT/CN2020/119711 2020-06-18 2020-09-30 Data synchronization method and apparatus, and data query method and apparatus WO2021253688A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010561213.7A CN111914020A (en) 2020-06-18 2020-06-18 Data synchronization method and device and data query method and device
CN202010561213.7 2020-06-18

Publications (1)

Publication Number Publication Date
WO2021253688A1 true WO2021253688A1 (en) 2021-12-23

Family

ID=73237947

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/119711 WO2021253688A1 (en) 2020-06-18 2020-09-30 Data synchronization method and apparatus, and data query method and apparatus

Country Status (2)

Country Link
CN (1) CN111914020A (en)
WO (1) WO2021253688A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112866763B (en) * 2020-12-28 2023-05-26 网宿科技股份有限公司 Sequence number generation method, server and storage medium of HLS multi-code rate stream slice
CN113407785B (en) * 2021-06-11 2023-02-28 西北工业大学 Data processing method and system based on distributed storage system
CN113342832B (en) * 2021-08-04 2021-11-02 北京快立方科技有限公司 Database indexing method

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102339315A (en) * 2011-09-30 2012-02-01 亿赞普(北京)科技有限公司 Index updating method and system of advertisement data
CN104199881A (en) * 2014-08-21 2014-12-10 广州华多网络科技有限公司 Database cluster, data query method and data synchronism method and device
CN106294860A (en) * 2016-08-23 2017-01-04 浪潮电子信息产业股份有限公司 The system of a kind of real time indexing data syn-chronization and its implementation
CN106469158A (en) * 2015-08-17 2017-03-01 杭州海康威视系统技术有限公司 Method of data synchronization and device
US20190179910A1 (en) * 2017-12-13 2019-06-13 International Business Machines Corporation Fast filtering for similarity searches on indexed data
CN110110007A (en) * 2019-04-15 2019-08-09 平安普惠企业管理有限公司 Data managing method and Related product
CN110427364A (en) * 2019-06-21 2019-11-08 北京奇艺世纪科技有限公司 A kind of data processing method, device, electronic equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10614070B2 (en) * 2015-10-27 2020-04-07 International Business Machines Corporation Preventing staleness in query results when using asynchronously updated indexes
CN109885589B (en) * 2017-12-06 2022-09-16 腾讯科技(深圳)有限公司 Data query method and device, computer equipment and storage medium
CN110119427A (en) * 2019-04-15 2019-08-13 平安普惠企业管理有限公司 Data management method and related product
CN110532272A (en) * 2019-08-30 2019-12-03 北京东软望海科技有限公司 Data query method, apparatus, electronic equipment and computer readable storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115840788A (en) * 2023-02-21 2023-03-24 创意信息技术股份有限公司 Method, device, terminal and storage medium for synchronizing MySql data to ES
CN115840788B (en) * 2023-02-21 2023-04-28 创意信息技术股份有限公司 Method, device, terminal and storage medium for synchronizing MySql data to ES

Also Published As

Publication number Publication date
CN111914020A (en) 2020-11-10

Similar Documents

Publication Publication Date Title
WO2021253688A1 (en) Data synchronization method and apparatus, and data query method and apparatus
US9830109B2 (en) Materializing data from an in-memory array to an on-disk page structure
US10817258B2 (en) Clustering storage method and apparatus
US9891831B2 (en) Dual data storage using an in-memory array and an on-disk page structure
US8095726B1 (en) Associating an identifier with a content unit
US10606865B2 (en) Database scale-out
US9418094B2 (en) Method and apparatus for performing multi-stage table updates
US20150154254A1 (en) Intelligently utilizing non-matching weighted indexes
WO2021258853A1 (en) Vocabulary error correction method and apparatus, computer device, and storage medium
WO2023202394A1 (en) Partition table creation method and apparatus, data writing method and apparatus for partition table, and data reading method and apparatus for partition table
WO2019165763A1 (en) Method for use in querying data
JP7006013B2 (en) Data provision program, data provision method, and data provision device
JP2020123320A (en) Method, apparatus, device and storage medium for managing index
US20200409915A1 (en) Database key compression
WO2024078122A1 (en) Database table scanning method and apparatus, and device
US20200151020A1 (en) Decentralized data processing architecture
WO2022083211A1 (en) Data management method and system for security protection terminal, device and storage medium
CN116302376A (en) Process creation method, process creation device, electronic equipment and computer readable medium
JP2016194826A (en) Database processing control method, processing control program and database server
US11609909B2 (en) Zero copy optimization for select * queries
CN113505134B (en) Multithreading data processing method, multithreading base database data storage method and device
US11113296B1 (en) Metadata management for a transactional storage system
US20220365905A1 (en) Metadata processing method and apparatus, and a computer-readable storage medium
US20120330983A1 (en) Data processing system
CN115544149A (en) Small file storage method and system based on HBase multi-terminal fusion

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20940545

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN EP: public notification in the EP bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 24.03.2023)

122 EP: PCT application non-entry in European phase

Ref document number: 20940545

Country of ref document: EP

Kind code of ref document: A1