WO2020215799A1 - MongoDB data migration monitoring method and device based on log analysis - Google Patents

MongoDB data migration monitoring method and device based on log analysis

Info

Publication number
WO2020215799A1
WO2020215799A1 PCT/CN2019/130542 CN2019130542W WO2020215799A1 WO 2020215799 A1 WO2020215799 A1 WO 2020215799A1 CN 2019130542 W CN2019130542 W CN 2019130542W WO 2020215799 A1 WO2020215799 A1 WO 2020215799A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
migration
mongodb
server
information
Prior art date
Application number
PCT/CN2019/130542
Other languages
English (en)
French (fr)
Inventor
石婧文
须成忠
叶可江
王洋
Original Assignee
深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳先进技术研究院
Publication of WO2020215799A1 (zh)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/1734 Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/214 Database migration support

Definitions

  • the present invention relates to the technical field of electronic information, in particular to a MongoDB data migration monitoring method and device based on log analysis.
  • MongoDB supports two ways of storing data in a cluster: sharded storage and replica set (Replica Set) storage.
  • the main purpose of replica set storage is automatic failure recovery using a master-slave mode, while sharded storage assigns non-overlapping key-value intervals to different servers in order to improve read/write throughput.
  • MongoDB will start the data migration module to migrate data blocks so that the amount of data stored on each server is approximately the same.
  • the sharding and migration process may bring a lot of redundant overhead.
  • the embodiment of the present invention provides a MongoDB data migration monitoring method and device based on log analysis, so as to at least solve the technical problem of redundant overhead in the existing MongoDB data sharding and migration process.
  • a MongoDB data migration monitoring method based on log analysis includes the following steps:
  • the MongoDB sharded cluster contains three components: Shard, Mongos, and Config server;
  • the data migration route is divided into different stages, and the key-value intervals of the data blocks in each stage are drawn in order and in proportion.
  • the MongoDB data migration monitoring method also includes:
  • the cumulative sum of the data volume of the secondary data migration in the MongoDB sharded cluster data is the transfer size, calculated as: transfer size = ∑ clonedBytes.
  • Mongos can obtain the changelog collection data on the Config server, and the transfer size can be obtained by traversing the changelog collection data.
  • the changelog collection data is stored in the form of a dictionary; clonedBytes represents the accumulated bytes of the data volume.
  • moveChunks.commit: this log record is obtained from the server the data block is moved out of, and contains data block key-value information, move-out server, move-in server, owning collection name, and copied data volume information;
  • moveChunks.from: this log record is obtained from the server that receives the migrated data block, and contains data block key-value information, move-out server, move-in server, owning collection name, and success-or-failure information.
  • shardCollection.start: this log record is created by mongos and specifies the shard server that holds the initial data block [MinKey, MaxKey];
  • multi-split: this log record is obtained from the shard server that performs the split, and contains the data block information before the split, the data block information after the split, the collection name, and the shard server where the data block is located.
  • the key-value interval of the initial data block and its shard server information are obtained from shardCollection.start. After that, all data blocks are split from existing data blocks and are obtained from multi-split, and data block migration information is obtained from moveChunks.from.
  • a MongoDB data migration monitoring device based on log analysis including:
  • the cluster building unit is used to build a MongoDB sharded cluster.
  • the MongoDB sharded cluster includes three components: Shard, Mongos and Config server;
  • the threshold unit is used to keep the cumulative sum of the data volume of the secondary data migration in the MongoDB sharded cluster data within the preset threshold range;
  • the information acquisition unit is used to acquire the dynamic split and migration information of historical data blocks in the MongoDB sharded cluster
  • the key value interval dividing unit is used to divide the data migration route into different stages based on the successful migration of historical data blocks, and draw the data block key value interval of each stage in a proportional order.
  • a storage medium storing a program file that can implement any of the above-mentioned log analysis-based MongoDB data migration monitoring methods.
  • a processor which is used to run a program, where any one of the above-mentioned MongoDB data migration monitoring methods based on log analysis is executed when the program is running.
  • the MongoDB data migration monitoring method and device based on log analysis in the embodiment of the present invention use the log data in the MongoDB config server, observe the current distribution and past migration of data blocks between different servers, and define a write-amplification estimation formula.
  • Figure 1 is a flow chart of the MongoDB data migration monitoring method based on log analysis of the present invention
  • Figure 2 is a preferred flow chart of the MongoDB data migration monitoring method based on log analysis of the present invention
  • FIG. 3 is a schematic diagram of the data block splitting and migration process in the MongoDB data migration monitoring method based on log analysis of the present invention
  • Figure 4 is a block diagram of the MongoDB data migration monitoring device based on log analysis of the present invention
  • Figure 5 is a preferred module diagram of the MongoDB data migration monitoring device based on log analysis of the present invention.
  • the present invention proposes a scheme for accurately extracting data block migration information from log files, which can be used to measure whether the data migration strategy, split mechanism, and key value design are reasonable.
  • the MongoDB sharded cluster consists of three components: Shard, Mongos, and Config server:
  • Mongos is responsible for providing cluster access interfaces, ensuring cluster consistency, and correctly routing user requests to the corresponding Shard. At the same time, Mongos provides the user command line tool mongos shell, through which we can obtain a small amount of statistical information about the database and data collection. Part of the data in the database comes from shell commands.
  • the Shard is responsible for storing data, and the data is stored and migrated in the Shard cluster in the form of chunks.
  • Config server saves all metadata of the Shard cluster, and Mongos connects to Config server to obtain metadata information.
  • the metadata information includes the changelog collection (the log) and the chunks collection.
  • the changelog collection stores changes to the database, and the chunks collection stores the information of all current data blocks.
  • MongoDB's built-in monitoring tool mongostat can display the time taken to perform operations and cache hits; the web monitoring tool MMS (MongoDB Monitoring Service) provided on the MongoDB official website can detect hardware events.
  • Most of the existing technologies aimed at improving the performance of NoSQL databases such as MongoDB take insertion and query time cost and storage cost as indicators, and perform no further analysis of data migration.
  • the technical scheme of the present invention can measure whether the current database migration and configuration are reasonable, and can visually observe the historical key-value interval distribution, data block splitting, and data block migration between different servers in the sharded cluster.
  • a MongoDB data migration monitoring method based on log analysis includes the following steps:
  • the MongoDB sharded cluster includes three components: Shard, Mongos, and Config server;
  • S102: the cumulative sum of the data volume of the secondary data migration in the MongoDB sharded cluster data is kept within a preset threshold range, that is, the smaller the cumulative sum, the better;
  • the MongoDB data migration monitoring method based on log analysis of the present invention uses the log data in the MongoDB config server, observes the current distribution and past migration of data blocks between different servers, and defines a write-amplification estimation formula to evaluate how good the split and migration strategies are, helping the MongoDB database to perform pre-splitting and resource allocation better. Compared with traditional observation methods, it is not disturbed by other factors, and the results are more accurate because historical log data is used.
  • the results are intuitive, showing the performance of the sharded database through formula indicators or visual evaluation, and can intuitively reflect whether the data migration strategy, split mechanism, and key-value design are reasonable.
  • the MongoDB data migration monitoring method further includes:
  • S105: fill the data blocks in the key-value intervals of each stage with different colors that represent different servers, visualizing the splitting and migration process of the data blocks of the entire data collection.
  • Balancing overhead calculation method: the transfer size denotes the cumulative sum of the data volume of the secondary data migration carried out under the direction of the balancer component. While the data blocks are distributed as evenly as possible in the sharded cluster, the smaller the network transmission overhead of data migration, the better. Define the following formula: transfer size = ∑ clonedBytes.
  • the transfer size can be obtained by traversing the changelog collection; Mongos can obtain the changelog data on the Config server, and clonedBytes denotes the accumulated bytes of data.
  • the data is saved in dictionary form.
  • the "what" attribute represents the type of operation. Two types of operations are mainly used in the calculation of the write amplification ratio:
  • "moveChunks.commit": this log record is obtained from the server the data block is moved out of, and contains data block key-value information, move-out server, move-in server, owning collection name, copied data volume, and other information.
  • "moveChunks.from": this log record is obtained from the server that receives the migrated data block, and contains data block key-value information, move-out server, move-in server, owning collection name, success-or-failure information, and other information.
  • the transfer size is the cumulative sum of the copied data volume of the migrations in the history that moveChunks.from confirms as successful.
  • Visualization method for historical data block splitting and migration: use the chunks collection on the Config server to depict the current distribution of data blocks in the cluster, and obtain the dynamic splitting and migration of historical data blocks from the Changelog.
  • "moveChunks.from": this log record is obtained from the server that receives the migrated data block, and contains data block key-value information, move-out server, move-in server, owning collection name, success-or-failure information, and other information.
  • the transfer size is the cumulative sum of the copied data volume of the migrations in the history that moveChunks.from confirms as successful.
  • "multi-split": this log record is obtained from the shard server performing the split, and contains the data block information before the split, the data block information after the split, the collection name, the shard server where the data block is located, and other information.
  • the data migration route is divided into different stages, the key-value intervals of the data blocks in each stage are drawn in order and in proportion, and the data blocks are filled with different colors that represent different shard servers, which visualizes the splitting and migration process of the data blocks of the entire data collection. The key-value interval of the initial data block and its shard server information are obtained from "shardCollection.start". After that, all data blocks are split from existing data blocks, so they are all obtained from "multi-split", and data block migration information is obtained from "moveChunks.from".
  • in FIG. 3, there are gaps between different data blocks.
  • the length of a data block is proportional to the key-value interval it is responsible for storing.
  • Green, purple, and blue represent the different servers where the data blocks are located (shard000 is blue, shard001 is green, shard002 is purple). Except for the transition from stage0 to stage1, which is caused by the first split of the data block, every later stage is caused by a data migration.
  • a MongoDB data migration monitoring device based on log analysis including:
  • the cluster building unit 201 is used to build a MongoDB sharded cluster.
  • the MongoDB sharded cluster includes three components: Shard, Mongos, and Config server;
  • the threshold unit 202 is configured to keep the cumulative sum of the data volume of the secondary data migration in the MongoDB sharded cluster data within a preset threshold range;
  • the information acquiring unit 203 is configured to acquire dynamic split and migration information of historical data blocks in the MongoDB sharded cluster;
  • the key value interval dividing unit 204 is configured to divide the data migration route into different stages based on the successful migration of historical data blocks, and draw the data block key value interval of each stage in a proportional order.
  • the MongoDB data migration monitoring device based on log analysis in the embodiment of the present invention uses the log data in the MongoDB config server, observes the current distribution and past migration of data blocks between different servers, and defines a write-amplification estimation formula to evaluate how good the split and migration strategies are, helping the MongoDB database to perform pre-splitting and resource allocation better. Compared with traditional observation methods, it is not disturbed by other factors, and the results are more accurate because historical log data is used.
  • the results are intuitive, showing the performance of the sharded database through formula indicators or visual evaluation, and can intuitively reflect whether the data migration strategy, split mechanism, and key-value design are reasonable.
  • the device further includes:
  • the color filling unit 205 is used to fill the data block with different colors for different servers in the key value interval of the data block at each stage, and visualize the splitting and migration process of the data block of the entire data set.
  • a storage medium storing a program file that can implement any of the above-mentioned log analysis-based MongoDB data migration monitoring methods.
  • a processor which is used to run a program, where any one of the above-mentioned MongoDB data migration monitoring methods based on log analysis is executed when the program is running.
  • the disclosed technical content can be implemented in other ways.
  • the system embodiment described above is only illustrative.
  • the division of units may be a logical function division, and there may be other divisions in actual implementation.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of units or modules, and may be in electrical or other forms.
  • the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed on multiple units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to make a computer device (which can be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods in the various embodiments of the present invention.
  • the aforementioned storage media include: USB flash drive, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), removable hard disk, magnetic disk, optical disk, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

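A MongoDB data migration monitoring method and device based on log analysis, relating to the technical field of electronic information. The method comprises: building a MongoDB sharded cluster (S101); keeping the cumulative sum of the data volume of secondary data migration in the MongoDB sharded cluster data within a preset threshold range (S102); acquiring dynamic split and migration information of historical data blocks in the MongoDB sharded cluster (S103); and, using each successful migration of a historical data block as a boundary, dividing the data migration route into different stages and drawing the data block key-value intervals of each stage in order and in proportion (S104). The method and device use the log data in the MongoDB config server to observe the current distribution and past migration of data blocks between different servers, and define a write-amplification estimation formula to evaluate how good the split and migration strategies are, helping the MongoDB database to perform pre-splitting and resource allocation better. Compared with traditional observation methods, the approach is not disturbed by other factors and uses historical log data, so the results are more accurate.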

Description

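MongoDB data migration monitoring method and device based on log analysis
Technical Field
The present invention relates to the technical field of electronic information, and in particular to a MongoDB data migration monitoring method and device based on log analysis.
Background
As massive amounts of unstructured data (spatial data collected by sensors, road network data) are continuously generated, distributed NoSQL databases such as MongoDB and HBase are becoming increasingly important. MongoDB supports two ways of storing data in a cluster: sharded (shard) storage and replica set (Replica Set) storage. The main purpose of replica set storage is automatic failure recovery using a master-slave mode, whereas sharded storage assigns non-overlapping key-value intervals to different servers in order to improve read/write throughput. In addition, when data blocks are stored unevenly across servers, MongoDB starts the data migration module to migrate data blocks so that each server stores roughly the same amount of data. However, because the data may exhibit severe and unpredictable skew, the sharding and migration process may introduce a lot of redundant overhead.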
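Summary of the Invention
The embodiments of the present invention provide a MongoDB data migration monitoring method and device based on log analysis, so as to at least solve the technical problem of redundant overhead in the existing MongoDB data sharding and migration process.
According to an embodiment of the present invention, a MongoDB data migration monitoring method based on log analysis is provided, comprising the following steps:
building a MongoDB sharded cluster, the MongoDB sharded cluster comprising three components: Shard, Mongos, and Config server;
keeping the cumulative sum of the data volume of secondary data migration in the MongoDB sharded cluster data within a preset threshold range;
acquiring dynamic split and migration information of historical data blocks in the MongoDB sharded cluster;
using each successful migration of a historical data block as a boundary, dividing the data migration route into different stages, and drawing the data block key-value intervals of each stage in order and in proportion.
Further, the MongoDB data migration monitoring method also comprises:
filling the data blocks in the key-value intervals of each stage with different colors that represent different servers.
Further, the cumulative sum of the data volume of secondary data migration in the MongoDB sharded cluster data is the transfer size, calculated as:
transfer size=∑clonedBytes;
Mongos can obtain the changelog collection data on the Config server; the transfer size can be obtained by traversing the changelog collection data, which is stored in dictionary form; clonedBytes denotes the accumulated bytes of data.
Further, two operation types are used when accumulating the data volume of secondary data migration in the MongoDB sharded cluster data:
moveChunks.commit: this log record is obtained from the server the data block is moved out of, and contains data block key-value information, move-out server, move-in server, owning collection name, and copied data volume information;
moveChunks.from: this log record is obtained from the server that receives the migrated data block, and contains data block key-value information, move-out server, move-in server, owning collection name, and success-or-failure information.
Further, the chunks collection on the Config server is used to depict the current distribution of data blocks in the cluster, and the dynamic split and migration information of historical data blocks is obtained from the Changelog collection data of the MongoDB sharded cluster.
Further, three operation types are used when obtaining the dynamic split and migration information of historical data blocks from the Changelog collection data of the MongoDB sharded cluster:
moveChunks.from: this log record is obtained from the server that receives the migrated data block, and contains data block key-value information, move-out server, move-in server, owning collection name, and success-or-failure information;
shardCollection.start: this log record is created by mongos and specifies the shard server that holds the initial data block [MinKey, MaxKey];
multi-split: this log record is obtained from the shard server that performs the split, and contains the data block information before the split, the data block information after the split, the collection name, and the shard server where the data block is located.
Further, the key-value interval of the initial data block and its shard server information are obtained from shardCollection.start; after that, all data blocks are split from existing data blocks and are obtained from multi-split, and data block migration information is obtained from moveChunks.from.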
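According to another embodiment of the present invention, a MongoDB data migration monitoring device based on log analysis is provided, comprising:
a cluster building unit, configured to build a MongoDB sharded cluster, the MongoDB sharded cluster comprising three components: Shard, Mongos, and Config server;
a threshold unit, configured to keep the cumulative sum of the data volume of secondary data migration in the MongoDB sharded cluster data within a preset threshold range;
an information acquisition unit, configured to acquire dynamic split and migration information of historical data blocks in the MongoDB sharded cluster;
a key-value interval dividing unit, configured to use each successful migration of a historical data block as a boundary, divide the data migration route into different stages, and draw the data block key-value intervals of each stage in order and in proportion.
A storage medium, storing a program file capable of implementing any one of the above MongoDB data migration monitoring methods based on log analysis.
A processor, configured to run a program, wherein any one of the above MongoDB data migration monitoring methods based on log analysis is executed when the program runs.
The MongoDB data migration monitoring method and device based on log analysis in the embodiments of the present invention use the log data in the MongoDB config server to observe the current distribution and past migration of data blocks between different servers, and define a write-amplification estimation formula to evaluate how good the split and migration strategies are, helping the MongoDB database to perform pre-splitting and resource allocation better. Compared with traditional observation methods, the approach is not disturbed by other factors and uses historical log data, so the results are more accurate. The results are intuitive: the performance of the sharded database is presented through formula indicators or visual evaluation, which directly shows whether the data migration strategy, split mechanism, and key-value design are reasonable.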
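Brief Description of the Drawings
The drawings described here are used to provide a further understanding of the present invention and constitute a part of this application. The exemplary embodiments of the present invention and their descriptions are used to explain the present invention and do not constitute an improper limitation of it. In the drawings:
Figure 1 is a flow chart of the MongoDB data migration monitoring method based on log analysis of the present invention;
Figure 2 is a preferred flow chart of the MongoDB data migration monitoring method based on log analysis of the present invention;
Figure 3 is a schematic diagram of the data block splitting and migration process in the MongoDB data migration monitoring method based on log analysis of the present invention;
Figure 4 is a block diagram of the MongoDB data migration monitoring device based on log analysis of the present invention;
Figure 5 is a preferred block diagram of the MongoDB data migration monitoring device based on log analysis of the present invention.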
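Detailed Description
To enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative work shall fall within the protection scope of the present invention.
It should be noted that the terms "first", "second", and the like in the specification, the claims, and the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data used in this way can be interchanged where appropriate, so that the embodiments of the present invention described here can be implemented in an order other than those illustrated or described here. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units clearly listed, and may include other steps or units that are not clearly listed or that are inherent to the process, method, product, or device.
Existing tools and methods have good real-time performance and can draw long-term tracking curves by measuring network or disk I/O, but they can hardly reflect the relationship between resource consumption and upper-layer mechanisms (such as the migration strategy). Measurement results are easily disturbed: for example, observed I/O often mixes in other I/O of the database or I/O from other applications, and it is difficult to separate the resource consumption of interest from a mixed metric. This makes it harder to locate performance problems, evaluate upper-layer strategies, and improve database mechanisms. The present invention proposes a scheme for accurately extracting data block migration information from log files, which can be used to measure whether the data migration strategy, split mechanism, and key-value design are reasonable.
Compared with single-machine databases, distributed databases introduce many new problems, such as the distribution and migration of data between servers. The overhead and impact of these new processes are often overlooked, and visualization and quantitative formulas can help database administrators better judge the effect of pre-splitting. However, the splitting and migration of data blocks is a long-running, continuous process accompanied by dynamic splitting of data blocks, and some data may be transmitted redundantly over the network several times during migration. These factors increase the complexity of observation, and the prior art offers no concrete method for directly observing and quantifying the write amplification and redundant network transmission in the migration and splitting process. For this reason, we propose a new monitoring and analysis method for distributed MongoDB database clusters.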
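The MongoDB sharded cluster consists of three components: Shard, Mongos, and Config server:
(1) Mongos is responsible for providing the cluster access interface, ensuring cluster consistency, and correctly routing user requests to the corresponding Shard. Mongos also provides the user command-line tool mongos shell, through which we can obtain a small amount of statistical information about databases and collections. Part of the data in the database comes from shell commands.
(2) Shard is responsible for storing data; data is stored and migrated in the Shard cluster in the form of chunks.
(3) Config server stores all metadata of the Shard cluster, and Mongos connects to Config server to obtain the metadata. The metadata includes the changelog collection (the log) and the chunks collection: the changelog collection stores changes to the database, and the chunks collection stores the information of all current data blocks.
Previous database monitoring schemes and tools mostly measure resource utilization directly. For example, MongoDB's built-in monitoring tool mongostat can display the time taken to perform operations and cache hits, and the web monitoring tool MMS (MongoDB Monitoring Service) provided on the MongoDB official website can detect hardware events. Most existing techniques for improving the performance of NoSQL databases such as MongoDB use insertion and query time cost and storage cost as indicators, and perform no further analysis of data migration.
The technical solution of the present invention can measure whether the current database migration and configuration are reasonable, and can visualize the historical key-value interval distribution, data block splitting, and data block migration between different servers in the sharded cluster.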
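Embodiment 1
According to an embodiment of the present invention, a MongoDB data migration monitoring method based on log analysis is provided. Referring to Figure 1, it comprises the following steps:
S101: build a MongoDB sharded cluster, the MongoDB sharded cluster comprising three components: Shard, Mongos, and Config server;
S102: keep the cumulative sum of the data volume of secondary data migration in the MongoDB sharded cluster data within a preset threshold range, that is, the smaller the cumulative sum, the better;
S103: acquire dynamic split and migration information of historical data blocks in the MongoDB sharded cluster;
S104: using each successful migration of a historical data block as a boundary, divide the data migration route into different stages, and draw the data block key-value intervals of each stage in order and in proportion.
The MongoDB data migration monitoring method based on log analysis of the present invention uses the log data in the MongoDB config server to observe the current distribution and past migration of data blocks between different servers, and defines a write-amplification estimation formula to evaluate how good the split and migration strategies are, helping the MongoDB database to perform pre-splitting and resource allocation better. Compared with traditional observation methods, the approach is not disturbed by other factors and uses historical log data, so the results are more accurate. The results are intuitive: the performance of the sharded database is presented through formula indicators or visual evaluation, which directly shows whether the data migration strategy, split mechanism, and key-value design are reasonable.
As a preferred technical solution, referring to Figure 2, the MongoDB data migration monitoring method further comprises:
S105: fill the data blocks in the key-value intervals of each stage with different colors that represent different servers, visualizing the splitting and migration process of the data blocks of the entire data collection.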
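The method is described in detail below with a specific embodiment. In the MongoDB data migration monitoring method based on log analysis of the present invention:
First, build a MongoDB sharded cluster comprising three components (Shard, Mongos, and Config server), create a sharded collection, and perform data operations on the sharded collection.
Balancing overhead calculation method: the transfer size denotes the cumulative sum of the data volume of the secondary data migration carried out under the direction of the balancer component. While the data blocks are distributed as evenly as possible in the sharded cluster, the smaller the network transmission overhead of data migration, the better. The following formula is defined:
transfer size=∑clonedBytes;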
The transfer size can be obtained by traversing the changelog collection; Mongos can obtain the changelog data on the Config server, and clonedBytes denotes the accumulated bytes of data. The data is stored in dictionary form:
{"_id":"silverdew-2018-10-06T20:42:02.820+0800-5bb8ad9a11fa6074beda8f4b","server":"silverdew","clientAddr":"127.0.0.1:33058","time":ISODate("2018-10-06T12:42:02.820Z"),"what":"moveChunk.commit","ns":"two_zero_one_seven.Jan_sh_hil_fourlogic","details":{"min":{"key":{"$minKey":1}},"max":{"key":"03100002001021021033022023100231"},"from":"shard0000","to":"shard0001","counts":{"cloned":NumberLong(1),"clonedBytes":NumberLong(310),"catchup":NumberLong(0),"steady":NumberLong(0)}}}
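The "what" attribute represents the operation type. Two operation types are mainly used when calculating the write amplification ratio:
"moveChunks.commit": this log record is obtained from the server the data block is moved out of, and contains data block key-value information, move-out server, move-in server, owning collection name, copied data volume, and other information.
"moveChunks.from": this log record is obtained from the server that receives the migrated data block, and contains data block key-value information, move-out server, move-in server, owning collection name, success-or-failure information, and other information. For this operation type, the transfer size is the cumulative sum of the copied data volume of the migrations in the history that moveChunks.from confirms as successful.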
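The transfer size calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the patent's implementation, and it rests on several assumptions that are not stated in the text: the changelog has already been loaded as a list of dictionaries; a successful migration is flagged by `details["note"] == "success"` in the "from" record; and a commit record is paired with a "from" record by namespace, key range, move-out shard, and move-in shard. The sample record spells the operation `moveChunk.commit` while the prose writes `moveChunks.commit`, so the sketch accepts both spellings. The helper names `transfer_size` and `_migration_key` are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Tuple

# Both spellings are accepted: the prose says "moveChunks.*",
# the sample record says "moveChunk.*".
COMMIT_TYPES = {"moveChunk.commit", "moveChunks.commit"}
FROM_TYPES = {"moveChunk.from", "moveChunks.from"}


def _migration_key(record: Dict[str, Any]) -> Tuple[str, str, str, str, str]:
    """Identify a migration by namespace, key range, move-out and move-in shard."""
    d = record["details"]
    return (record["ns"], repr(d["min"]), repr(d["max"]), d["from"], d["to"])


def transfer_size(changelog: Iterable[Dict[str, Any]]) -> int:
    """Sum clonedBytes over commit records confirmed by a successful 'from' record."""
    records = list(changelog)
    # Assumption: success is recorded as details["note"] == "success".
    confirmed = Counter(
        _migration_key(r)
        for r in records
        if r["what"] in FROM_TYPES and r["details"].get("note") == "success"
    )
    total = 0
    for r in records:
        if r["what"] not in COMMIT_TYPES:
            continue
        key = _migration_key(r)
        if confirmed[key] > 0:
            confirmed[key] -= 1  # each confirmation covers one commit record
            total += int(r["details"]["counts"]["clonedBytes"])
    return total
```

The resulting number can then be compared against whatever preset threshold step S102 uses; the threshold value itself is not given in the text.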
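Visualization method for historical data block splitting and migration: use the chunks collection on the Config server to depict the current distribution of data blocks in the cluster, and obtain the dynamic splitting and migration of historical data blocks from the Changelog.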
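As a small illustration of depicting the current distribution from the chunks collection, the sketch below counts chunks per shard for one collection. It assumes the chunk documents have already been fetched as dictionaries and that each carries an `ns` (namespace) field and a `shard` field; these field names are an assumption about the metadata layout and are not given in the text. The function name `chunks_per_shard` is hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def chunks_per_shard(chunks: Iterable[Dict[str, Any]], ns: str) -> Dict[str, int]:
    """Count how many chunks of collection `ns` each shard currently holds."""
    # Assumption: every chunk document has "ns" and "shard" fields.
    return dict(Counter(c["shard"] for c in chunks if c.get("ns") == ns))
```

A strongly uneven count across shards would suggest that further balancing, and therefore further migration traffic, is likely.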
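The main "what" operation types used in the visualization process are:
"moveChunks.from": this log record is obtained from the server that receives the migrated data block, and contains data block key-value information, move-out server, move-in server, owning collection name, success-or-failure information, and other information. For this operation type, the transfer size is the cumulative sum of the copied data volume of the migrations in the history that moveChunks.from confirms as successful.
"shardCollection.start": this log record is created by mongos and specifies the shard server that holds the initial data block [MinKey, MaxKey].
"multi-split": this log record is obtained from the shard server that performs the split, and contains the data block information before the split, the data block information after the split, the collection name, the shard server where the data block is located, and other information.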
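Using each successful migration of a data block as a boundary, the data migration route is divided into different stages, the data block key-value intervals of each stage are drawn in order and in proportion, and the data blocks are filled with different colors that represent different shard servers, which visualizes the splitting and migration process of the data blocks of the entire data collection. The key-value interval of the initial data block and its shard server information are obtained from "shardCollection.start"; after that, all data blocks are split from existing data blocks and are therefore obtained from "multi-split", and data block migration information is obtained from "moveChunks.from".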
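The stage division described above can be sketched as a replay of changelog events. The event layout used here is a simplified, hypothetical one chosen for illustration (integer key bounds, and the fields `min`, `max`, `shard`, `before`, `after`, `chunk`, `to`, and `success`); real changelog records are richer and would need to be mapped onto it first. Following the description of FIG. 3, the sketch assumes that the first split opens stage 1, that later splits only refine the stage in progress, and that every successful migration opens a new stage.

```python
from typing import Dict, List, Tuple

Chunk = Tuple[int, int]   # key interval [lo, hi), hypothetical integer keys
Stage = Dict[Chunk, str]  # chunk -> shard server that holds it

MOVE_TYPES = {"moveChunk.from", "moveChunks.from"}


def build_stages(events: List[dict]) -> List[Stage]:
    """Replay simplified changelog events into one chunk layout per stage."""
    current: Stage = {}
    stages: List[Stage] = []
    for ev in events:
        kind = ev["what"]
        if kind == "shardCollection.start":
            # Stage 0: a single initial chunk on one shard.
            current = {(ev["min"], ev["max"]): ev["shard"]}
            stages.append(dict(current))
        elif kind == "multi-split":
            # The pieces stay on the shard that held the chunk being split.
            shard = current.pop(ev["before"])
            for piece in ev["after"]:
                current[piece] = shard
            if len(stages) == 1:
                stages.append(dict(current))  # the first split opens stage 1
            else:
                stages[-1] = dict(current)    # later splits refine the stage
        elif kind in MOVE_TYPES and ev.get("success"):
            # A successful migration is the boundary that opens a new stage.
            current[ev["chunk"]] = ev["to"]
            stages.append(dict(current))
    return stages
```

Each returned stage is a snapshot of which shard holds which key interval, which is the information a drawing routine needs to render one row of the figure.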
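Referring to Figure 3, there are gaps between different data blocks, and the length of a data block is proportional to the key-value interval it is responsible for storing. Green, purple, and blue represent the different servers where the data blocks are located (shard000 is blue, shard001 is green, shard002 is purple). Except for the transition from stage0 to stage1, which is caused by the first split of the data block, every later stage is caused by a data migration.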
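The proportional layout described for Figure 3 can be illustrated with a small computation that turns one stage into colored bars. This is a sketch under assumptions: key bounds are numeric, the drawing width and gap size are arbitrary parameters, and the function name `layout_stage` is hypothetical. Only the shard-to-color mapping is taken from the text; shards outside that mapping fall back to gray.

```python
from typing import Dict, List, Tuple

Chunk = Tuple[int, int]  # key interval [lo, hi), hypothetical integer keys

# Colors as stated for Figure 3 in the text.
SHARD_COLORS = {"shard000": "blue", "shard001": "green", "shard002": "purple"}


def layout_stage(
    stage: Dict[Chunk, str], width: float = 100.0, gap: float = 1.0
) -> List[Tuple[float, float, str]]:
    """Return (x, length, color) bars with lengths proportional to key ranges."""
    chunks = sorted(stage)
    total_keys = sum(hi - lo for lo, hi in chunks)
    usable = width - gap * (len(chunks) - 1)  # width left after the gaps
    bars: List[Tuple[float, float, str]] = []
    x = 0.0
    for lo, hi in chunks:
        length = usable * (hi - lo) / total_keys
        bars.append((x, length, SHARD_COLORS.get(stage[(lo, hi)], "gray")))
        x += length + gap
    return bars
```

The bars for each stage can be handed to any plotting library, one row per stage, to reproduce a picture in the style of Figure 3.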
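Embodiment 2
According to another embodiment of the present invention, a MongoDB data migration monitoring device based on log analysis is provided. Referring to Figure 4, it comprises:
a cluster building unit 201, configured to build a MongoDB sharded cluster, the MongoDB sharded cluster comprising three components: Shard, Mongos, and Config server;
a threshold unit 202, configured to keep the cumulative sum of the data volume of secondary data migration in the MongoDB sharded cluster data within a preset threshold range;
an information acquisition unit 203, configured to acquire dynamic split and migration information of historical data blocks in the MongoDB sharded cluster;
a key-value interval dividing unit 204, configured to use each successful migration of a historical data block as a boundary, divide the data migration route into different stages, and draw the data block key-value intervals of each stage in order and in proportion.
The MongoDB data migration monitoring device based on log analysis in the embodiment of the present invention uses the log data in the MongoDB config server to observe the current distribution and past migration of data blocks between different servers, and defines a write-amplification estimation formula to evaluate how good the split and migration strategies are, helping the MongoDB database to perform pre-splitting and resource allocation better. Compared with traditional observation methods, the approach is not disturbed by other factors and uses historical log data, so the results are more accurate. The results are intuitive: the performance of the sharded database is presented through formula indicators or visual evaluation, which directly shows whether the data migration strategy, split mechanism, and key-value design are reasonable.
As a preferred technical solution, referring to Figure 5, the device further comprises:
a color filling unit 205, configured to fill the data blocks in the key-value intervals of each stage with different colors that represent different servers, visualizing the splitting and migration process of the data blocks of the entire data collection.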
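Embodiment 3
A storage medium, storing a program file capable of implementing any one of the above MongoDB data migration monitoring methods based on log analysis.
Embodiment 4
A processor, configured to run a program, wherein any one of the above MongoDB data migration monitoring methods based on log analysis is executed when the program runs.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis. For parts that are not described in detail in one embodiment, refer to the related descriptions of other embodiments.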
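In the several embodiments provided in this application, it should be understood that the disclosed technical content can be implemented in other ways. The system embodiments described above are only illustrative. For example, the division of units may be a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection of units or modules through some interfaces, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to make a computer device (which may be a personal computer, a server, a network device, or the like) execute all or part of the steps of the methods of the various embodiments of the present invention. The aforementioned storage media include: USB flash drive, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), removable hard disk, magnetic disk, optical disk, and other media that can store program code.
The above are only preferred embodiments of the present invention. It should be pointed out that those of ordinary skill in the art can make several improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.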

Claims (10)

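  1. A MongoDB data migration monitoring method based on log analysis, characterized by comprising the following steps:
    building a MongoDB sharded cluster, the MongoDB sharded cluster comprising three components: Shard, Mongos, and Config server;
    keeping the cumulative sum of the data volume of secondary data migration in the MongoDB sharded cluster data within a preset threshold range;
    acquiring dynamic split and migration information of historical data blocks in the MongoDB sharded cluster;
    using each successful migration of a historical data block as a boundary, dividing the data migration route into different stages, and drawing the data block key-value intervals of each stage in order and in proportion.
  2. The MongoDB data migration monitoring method according to claim 1, characterized in that the MongoDB data migration monitoring method further comprises:
    filling the data blocks in the key-value intervals of each stage with different colors that represent different servers.
  3. The MongoDB data migration monitoring method according to claim 1, characterized in that the cumulative sum of the data volume of secondary data migration in the MongoDB sharded cluster data is the transfer size, calculated as:
    transfer size=∑clonedBytes;
    Mongos obtains the changelog collection data on the Config server; the transfer size is obtained by traversing the changelog collection data, which is stored in dictionary form; clonedBytes denotes the accumulated bytes of data.
  4. The MongoDB data migration monitoring method according to claim 1, characterized in that two operation types are used when accumulating the data volume of secondary data migration in the MongoDB sharded cluster data:
    moveChunks.commit: this log record is obtained from the server the data block is moved out of, and contains data block key-value information, move-out server, move-in server, owning collection name, and copied data volume information;
    moveChunks.from: this log record is obtained from the server that receives the migrated data block, and contains data block key-value information, move-out server, move-in server, owning collection name, and success-or-failure information.
  5. The MongoDB data migration monitoring method according to claim 1, characterized in that the chunks collection on the Config server is used to depict the current distribution of data blocks in the cluster, and the dynamic split and migration information of historical data blocks is obtained from the Changelog collection data of the MongoDB sharded cluster.
  6. The MongoDB data migration monitoring method according to claim 5, characterized in that three operation types are used when obtaining the dynamic split and migration information of historical data blocks from the Changelog collection data of the MongoDB sharded cluster:
    moveChunks.from: this log record is obtained from the server that receives the migrated data block, and contains data block key-value information, move-out server, move-in server, owning collection name, and success-or-failure information;
    shardCollection.start: this log record is created by mongos and specifies the shard server that holds the initial data block [MinKey, MaxKey];
    multi-split: this log record is obtained from the shard server that performs the split, and contains the data block information before the split, the data block information after the split, the collection name, and the shard server where the data block is located.
  7. The MongoDB data migration monitoring method according to claim 6, characterized in that the key-value interval of the initial data block and its shard server information are obtained from shardCollection.start; after that, all data blocks are split from existing data blocks and are obtained from multi-split, and data block migration information is obtained from moveChunks.from.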
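  8. A MongoDB data migration monitoring device based on log analysis, characterized by comprising:
    a cluster building unit, configured to build a MongoDB sharded cluster, the MongoDB sharded cluster comprising three components: Shard, Mongos, and Config server;
    a threshold unit, configured to keep the cumulative sum of the data volume of secondary data migration in the MongoDB sharded cluster data within a preset threshold range;
    an information acquisition unit, configured to acquire dynamic split and migration information of historical data blocks in the MongoDB sharded cluster;
    a key-value interval dividing unit, configured to use each successful migration of a historical data block as a boundary, divide the data migration route into different stages, and draw the data block key-value intervals of each stage in order and in proportion.
  9. A storage medium, characterized in that the storage medium stores a program file capable of implementing the MongoDB data migration monitoring method based on log analysis according to any one of claims 1 to 7.
  10. A processor, characterized in that the processor is configured to run a program, wherein the MongoDB data migration monitoring method based on log analysis according to any one of claims 1 to 7 is executed when the program runs.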
PCT/CN2019/130542 2019-04-24 2019-12-31 MongoDB data migration monitoring method and device based on log analysis WO2020215799A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910331821.6A CN110147353B (zh) 2019-04-24 2019-04-24 MongoDB data migration monitoring method and device based on log analysis
CN201910331821.6 2019-04-24

Publications (1)

Publication Number Publication Date
WO2020215799A1 (zh) 2020-10-29

Family

ID=67594373

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/130542 WO2020215799A1 (zh) 2019-04-24 2019-12-31 MongoDB data migration monitoring method and device based on log analysis

Country Status (2)

Country Link
CN (1) CN110147353B (zh)
WO (1) WO2020215799A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113806046A (zh) * 2021-09-15 2021-12-17 武汉虹信技术服务有限责任公司 Task scheduling system based on a thread pool
CN114202365A (zh) * 2021-12-15 2022-03-18 广东电力信息科技有限公司 Monitoring method based on real-time data of a power industry marketing system

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110147353B (zh) * 2019-04-24 2022-04-26 深圳先进技术研究院 MongoDB data migration monitoring method and device based on log analysis

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100161565A1 (en) * 2008-12-18 2010-06-24 Electronics And Telecommunications Research Institute Cluster data management system and method for data restoration using shared redo log in cluster data management system
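CN102917072A (zh) * 2012-10-31 2013-02-06 北京奇虎科技有限公司 Device, system and method for data migration between data server clusters
CN102930062A (zh) * 2012-11-30 2013-02-13 南京富士通南大软件技术有限公司 Method for rapid horizontal scaling of a database
CN106126543A (zh) * 2016-06-15 2016-11-16 清华大学 Model conversion and data migration method from a relational database to MongoDB
CN109145121A (zh) * 2018-07-16 2019-01-04 浙江大学 Fast storage and query method for time-varying graph data
CN110147353A (zh) * 2019-04-24 2019-08-20 深圳先进技术研究院 MongoDB data migration monitoring method and device based on log analysis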

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2624146A4 (en) * 2012-03-29 2013-12-18 Huawei Tech Co Ltd DATA BLOCK PROCESSING SYSTEM AND SYSTEM, FRONT END DISPLAY DEVICE AND BACK END PROCESSING DEVICE
CN103259843B (zh) * 2013-03-22 2018-06-05 嘉兴安尚云信软件有限公司 An intelligent PaaS cloud computing platform system
CN106777225B (zh) * 2016-12-26 2021-04-06 腾讯科技(深圳)有限公司 Data migration method and system
CN108241555B (zh) * 2016-12-26 2022-03-01 阿里巴巴集团控股有限公司 Backup and recovery method, apparatus and server for a distributed database
CN107343021A (zh) * 2017-05-22 2017-11-10 国网安徽省电力公司信息通信分公司 Big-data-based log management system applied in the State Grid cloud
US11269822B2 (en) * 2017-10-09 2022-03-08 Sap Se Generation of automated data migration model
CN108664580A (zh) * 2018-05-04 2018-10-16 西安邮电大学 Fine-grained load balancing method and system in a MongoDB database
CN108959525A (zh) * 2018-06-28 2018-12-07 郑州云海信息技术有限公司 Cold and hot data visualization method, system, device and computer storage medium
CN109344153B (zh) * 2018-08-22 2023-12-05 中国平安人寿保险股份有限公司 Service data processing method and terminal device


Also Published As

Publication number Publication date
CN110147353A (zh) 2019-08-20
CN110147353B (zh) 2022-04-26


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19925867

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19925867

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 180322)
