WO2015135370A1 - Data update method and system - Google Patents

Data update method and system

Info

Publication number
WO2015135370A1
WO2015135370A1 (application PCT/CN2014/095957; CN2014095957W)
Authority
WO
WIPO (PCT)
Prior art keywords
server
data
fragment
update
replica
Prior art date
Application number
PCT/CN2014/095957
Other languages
English (en)
French (fr)
Inventor
陈华清
Original Assignee
北京奇虎科技有限公司
奇智软件(北京)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京奇虎科技有限公司, 奇智软件(北京)有限公司 filed Critical 北京奇虎科技有限公司
Publication of WO2015135370A1 publication Critical patent/WO2015135370A1/zh

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1095Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2094Redundant storage or storage space
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2097Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements maintaining the standby controller/processing unit updated
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/065Replication mechanisms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/81Threshold

Definitions

  • the present invention relates to the field of data updating, and more particularly to a data update method and system.
  • the index dictionary is a very important data file for search engines. Its main features are: 1) the data size is very large; 2) access and update are very frequent.
  • data updates to search engines refer to updates to the index dictionary.
  • updates to the data, such as updates to the index dictionary, often require a module restart to take effect and involve considerable manual work. Therefore, this way of updating data puts great pressure on the operation and maintenance of online data processing.
  • the present invention proposes an innovative way to update the data, thereby greatly reducing the operation and maintenance difficulty of online data processing and improving its robustness.
  • the present invention provides a data update technique for updating data without stopping the online service.
  • a data updating method in which the data of a data file is divided into data chunks stored in one or more fragment server clusters; within each fragment server cluster, the data chunk is replicated and stored on more than one fragment replica server; and a summary server aggregates the data processing results of the fragment server clusters and, by accessing the one or more fragment server clusters, obtains the fragment replica servers that are currently providing the data processing service.
  • the method comprises: step 110: receiving a data update request; step 120: determining whether there is a fragment replica server in the fragment server cluster that can update data; step 130: when it is determined that there is such a server, updating the data of at least one updatable fragment replica server in the cluster; and step 140: after the at least one fragment replica server has finished updating, performing steps 110 to 130 for the remaining not-yet-updated fragment replica servers until all of them have finished updating.
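  • as an illustration (not part of the patent text), the per-cluster update loop of steps 110 to 140 can be sketched in Python as follows; the `cluster` handle and its methods are hypothetical names invented for this sketch:

```python
# Hedged sketch of the per-cluster update loop (steps 110-140).
# All names below are illustrative; the patent prescribes no API.

def handle_update_request(cluster, max_parallel: int) -> None:
    """Run steps 110-140 for one fragment server cluster."""
    while cluster.pending_replicas():              # step 140: repeat until all updated
        if len(cluster.replicas_in_update()) >= max_parallel:
            cluster.wait_for_state_change()        # step 120: no free slot; wait for
            continue                               # the Zookeeper notification
        replica = cluster.pending_replicas()[0]    # step 130: pick an updatable replica
        replica.leave_service()                    # an updating replica serves no traffic
        replica.load_new_data()
        replica.rejoin_service()                   # resume serving; notify summary server
```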
  • a computer program comprising computer readable code, which when executed on a computer, performs the aforementioned data update method.
  • a computer readable medium storing the aforementioned computer program is provided.
  • a data update system comprising: one or more fragment server clusters, more than one fragment replica server, and a summary server; each fragment server cluster includes one or more fragment replica servers; the fragment replica server includes: a receiving module, configured to receive a data update request; a determining module, configured to determine whether there is a fragment replica server in the fragment server cluster that can update data; and an update module, configured to update the data of the at least one updatable fragment replica server in the cluster when it is determined that the fragment server cluster has a fragment replica server that can update data.
  • the invention adopts a distributed multi-machine cooperative hot-switching scheme, so that the data update switchover is completed fully automatically by the program, i.e., data files such as configuration files are switched over and take effect without stopping the online service. Compared with the prior art, since the update is fully automated, manual intervention is reduced, which lowers labor cost and the error rate of manual operation. Moreover, because the invention uses distributed multi-machine cooperative hot switching rather than a single-machine hot-switching solution, a single machine only needs to load one copy of the data in memory, which greatly reduces memory usage and saves resources. In addition, distributed multi-machine cooperative hot switching greatly reduces the operation and maintenance difficulty of online data processing services and improves the robustness of online services.
  • FIG. 1 is a flow chart of a data update method in accordance with one embodiment of the present invention.
  • FIGS. 2a and 2b are Zookeeper file system diagrams of a data update method according to an embodiment of the present invention;
  • FIG. 3 is a Zookeeper file system diagram of a data update method according to another embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a data update system in accordance with one embodiment of the present invention.
  • FIG. 5 is a block diagram of a slice replica server in accordance with one embodiment of the present invention.
  • FIG. 6 schematically shows a block diagram of a server for carrying out the method according to the invention; and
  • FIG. 7 schematically shows a storage unit for holding or carrying program code implementing the method according to the invention.
  • the main idea of the present invention is that, by adopting a distributed multi-machine cooperative hot-switching scheme, the data update is completed fully automatically by the program; that is, data files such as configuration files and retrieval dictionaries are switched over automatically and take effect without stopping the online service.
  • in one prior implementation, the data update switchover is performed by professional system operation and maintenance staff.
  • the specific method is as follows: 1) provide a standby cluster for the online service, which is equivalent to a mirror of the online service cluster; 2) at each data update, first update the standby cluster online; once the standby cluster is updated and ready, shift the online request traffic to the standby cluster, so that the standby cluster becomes the online service cluster; 3) the original online service cluster becomes the standby cluster.
  • the disadvantages of this implementation are: 1) it requires considerable manual participation, the update is slow, it easily causes frequent online faults, and system stability is hard to guarantee; 2) because a standby cluster must be provided, more machines are needed to serve the daily online traffic, yet in most cases the standby cluster sits idle and resources are wasted.
  • in another prior implementation, data update switching is performed using a double buffer in memory.
  • the disadvantage of this implementation is that the system must load the data as two copies, one for updating and the other for currently serving online, and switch between the two copies at update time, so the system consumes considerably more memory.
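  • for illustration, the description below details this with a cur_idx index over two in-memory copies; a minimal sketch of that double-buffer switch (illustrative code, not taken from the patent):

```python
class DoubleBuffer:
    """Two in-memory copies of the data: cur_idx serves, 1 - cur_idx reloads."""

    def __init__(self, data):
        self.copies = [data, data]   # both copies held in memory: the 2x memory cost
        self.cur_idx = 0             # index of the copy currently serving online

    def serve(self):
        return self.copies[self.cur_idx]

    def apply_update(self, new_data):
        self.copies[1 - self.cur_idx] = new_data   # load the update into the idle copy
        self.cur_idx = 1 - self.cur_idx            # switch over; old copy becomes idle
```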
  • Data hot switching refers to the update of data without stopping the online service.
  • fragment server cluster (SHARD SERVER): the data of the data file is divided into several parts, and each part is stored in one fragment server cluster. The sum of the data of all fragment server clusters is the complete data.
  • fragment replica server (REPLICA SERVER): within each fragment server cluster, the data is completely copied into several (one or more) replicas, and the data of each replica is stored on one fragment replica server.
  • Zookeeper is a reliable coordination system for large distributed systems, including: configuration maintenance, name service, distributed synchronization, group services, and so on.
  • the goal of Zookeeper is to package complex and error-prone critical services, providing easy-to-use interfaces and efficient, stable systems to users.
  • in order to coordinate and manage the distributed, cooperating fragment server clusters, the Zookeeper system provides the fragment replica servers in each fragment server cluster with services such as configuration maintenance, naming service, distributed synchronization, and group services, thereby achieving efficient and reliable cooperative work.
  • the core of the Zookeeper system is a streamlined file system.
  • Each file node is called a node (NODE) in the Zookeeper system.
  • a listener (Watcher) can be set on each NODE to monitor the state changes of the NODE and its child nodes; a Watcher can monitor the state changes of a directory node as well as of its subdirectories. Once the state of a watched node changes, the Zookeeper server notifies all Watchers set on that directory node, so the corresponding clients quickly learn that the state of the directory node they care about has changed and can react accordingly.
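  • as an illustration, such a watch could be set with the kazoo Python client (kazoo is an assumption of this sketch; the patent does not name a client library):

```python
from kazoo.client import KazooClient

# Connect to the Zookeeper ensemble (address is a placeholder).
zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

zk.ensure_path("/BASIC_SERVERS/BS_SHARD0/LOADING")

# ChildrenWatch re-fires whenever the children of the watched node change;
# this is how a replica would learn that an update slot has freed up.
@zk.ChildrenWatch("/BASIC_SERVERS/BS_SHARD0/LOADING")
def on_loading_change(children):
    print("replicas currently updating:", children)
```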
  • Figure 1 is a flow diagram of a data update method 100 in accordance with one embodiment of the present invention. As shown in FIG. 1, method 100 begins at step 110.
  • a data update request is received.
  • the data of the data file is first divided into data chunks and stored in one or more fragment server clusters, and then, within each fragment server cluster, the data chunk is replicated and stored on more than one fragment replica server.
  • when a data processing service is provided, the summary server can aggregate the data processing results of the fragment server clusters to provide a complete result. In addition, the summary server can obtain the fragment replica servers currently providing the data processing service by accessing the one or more fragment server clusters.
  • the invention uses a push method to notify data updates; that is, after finishing an update, the system responsible for updating the data sends a data-update message to the system that uses the data, and the latter updates its data upon notification.
  • after receiving the data-update message sent by the system responsible for updating the data, the Zookeeper server notifies all fragment replica servers in the corresponding fragment server cluster to update their data.
  • in step 120, it is determined whether there is a fragment replica server in the fragment server cluster that can update data.
  • when the fragment replica servers in a cluster learn that new data needs to be updated, each may access, through a specified path, the update flag file (LOADING) corresponding to the cluster to determine which fragment replica servers are currently updating data; it can then decide, according to the preset number of fragment replica servers allowed to update in parallel in the cluster, whether there is currently a fragment replica server in the cluster that can update data.
  • if there is a fragment replica server in the cluster that needs a data update and the number of fragment replica servers currently updating is smaller than the preset threshold, then there is currently a fragment replica server that can update data; if the number currently updating equals the threshold, or if no fragment replica server in the cluster needs a data update, then there is currently no fragment replica server in the cluster that can update data.
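  • in code form, this determination reduces to a count against the threshold; a sketch with illustrative names (not from the patent):

```python
def can_start_update(needs_update: list[str],
                     currently_loading: list[str],
                     max_parallel: int) -> bool:
    """Step 120: may one more replica in this cluster begin updating?"""
    if not needs_update:                      # no replica needs an update
        return False
    return len(currently_loading) < max_parallel
```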
  • in step 130, when it is determined that there is a fragment replica server in the fragment server cluster that can update data, the data of at least one such fragment replica server in the cluster is updated.
  • at least one updatable fragment replica server in the fragment server cluster may be updated according to a preset threshold. For example, if the number of fragment replica servers allowed to update in parallel in the cluster is set to 2 (that is, two fragment replica servers may update in parallel) and the number currently updating is 1, then one more not-yet-updated fragment replica server in the cluster may also be updated.
  • when a fragment replica server determines that there is no fragment replica server that can update data, the not-yet-updated fragment replica servers are not updated for the time being and continue to provide the data processing service.
  • when a fragment replica server finishes updating, the Zookeeper server observes its state change and sends a data-update notification to the fragment replica servers in the cluster to which it belongs; upon receiving the notification, the remaining not-yet-updated fragment replica servers perform steps 120 and 130, until all not-yet-updated fragment replica servers in the cluster have finished updating.
  • to guarantee consistency of retrieval results, a fragment replica server that is being updated does not provide the data processing service.
  • the status of the fragment replica server being updated may be defined as the update state and notified to the summary server, so that the summary server knows which fragment replica servers are updating data, i.e., not providing the data processing service.
  • the updated fragment replica server resumes providing the data processing service and notifies the summary server of its service status.
  • each fragment server cluster can perform the above steps in parallel, and such parallel updating can greatly improve data update efficiency.
  • At least one fragment replica server in each fragment server cluster always provides data processing services, so as to ensure the integrity of the data processing results.
  • the index dictionary data is generally stored in two parts in two server clusters, namely a basic information server cluster (BASIC_SERVERS) and a presentation information server cluster (DETAIL_SERVERS).
  • the basic information server cluster is used to perform specific search tasks, and the operations of loading, querying, and updating the inverted index dictionary are all completed in the cluster.
  • the display information server cluster is used to query display information of search results, such as title, summary and description.
  • the summary server (MERGE_SERVER) is used to merge, sort, and filter the search result information sent by the basic information server cluster and the presentation information server cluster.
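  • as a loose illustration of that role (field names are hypothetical; the patent does not define a record format):

```python
def merge_results(basic_hits, detail_info, limit=10):
    """Join basic hits with their presentation info, then rank and filter."""
    merged = []
    for hit in basic_hits:
        detail = detail_info.get(hit["doc_id"])
        if detail is None:                    # filter out hits lacking presentation info
            continue
        merged.append({**hit, **detail})
    merged.sort(key=lambda r: r["score"], reverse=True)   # rank by relevance score
    return merged[:limit]
```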
  • the fragment server cluster may be a basic information server cluster or a presentation information server cluster.
  • the description of the search dictionary update process for each cluster is the same as that of the above method 100, and details are not described herein again.
  • the present invention adopts distributed multi-machine cooperative hot switching, instead of a single-machine hot-switching solution, so a single machine only needs to load one copy of the data in memory, which greatly reduces the amount of memory used, thereby saving resources.
  • the use of distributed multi-machine cooperative hot-swapping can also greatly reduce the difficulty of operation and maintenance of online data processing services, and also improve the robustness of online services.
  • below, with reference to Zookeeper file system diagrams, a distributed cooperative application that uses the distributed coordination system Zookeeper to implement data updates is illustrated.
  • the data of the data file is stored in two parts in a basic information server cluster (BASIC_SERVERS) and a presentation information server cluster (DETAIL_SERVERS).
  • in the Zookeeper file system diagram, BASIC_SERVERS, DETAIL_SERVERS, and MERGE_SERVER can each create their own node and the child nodes under it.
  • FIGS. 2a and 2b are Zookeeper file system diagrams of a data update method according to an embodiment of the present invention, in which it is preset that only one REPLICA node per SHARD node may be updating data at a time.
  • the BASIC_SERVERS node has two SHARD nodes, BS_SHARD0 and BS_SHARD1, of which BS_SHARD0 has two REPLICA nodes, BS_SHARD0_NODE0 and BS_SHARD0_NODE1, and BS_SHARD1 has two REPLICA nodes, BS_SHARD1_NODE0 and BS_SHARD1_NODE1.
  • MERGE_SERVER obtains all the fragment server clusters (BS_SHARD0, BS_SHARD1) in the current system by accessing the child nodes under BS_SERVERS, and obtains the REPLICA nodes that are providing service by accessing the child nodes under BS_SHARD0 and BS_SHARD1.
  • for BS_SHARD0, for example, the child nodes under it are BS_SHARD0_NODE0 and BS_SHARD0_NODE1; all child nodes under the LOADING node (the nodes being updated) are removed, finally yielding the REPLICA nodes that are providing service.
  • as shown in FIG. 2a, the serving nodes MERGE_SERVER obtains are: BS_SHARD0_NODE0, BS_SHARD0_NODE1, BS_SHARD1_NODE0, BS_SHARD1_NODE1; there is no node updating data.
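  • a minimal sketch of how MERGE_SERVER could compute this serving view with kazoo (an assumed client library; the SERVER/LOADING layout is one plausible reading of the figures):

```python
from kazoo.client import KazooClient

def serving_replicas(zk: KazooClient, shard: str) -> set:
    """Serving REPLICA nodes: those under .../SERVER minus those under .../LOADING."""
    server = set(zk.get_children(shard + "/SERVER"))
    loading = set(zk.get_children(shard + "/LOADING"))
    return server - loading

# e.g. serving_replicas(zk, "/BASIC_SERVERS/BS_SHARD0")
```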
  • in FIG. 2b, when a SHARD senses that there is new data to update, since it is preset that only one REPLICA node may be updating under the LOADING child node, BS_SHARD0 is taken as an example.
  • BS_SHARD0_NODE0 accesses, through the specified path, the LOADING child node under the SHARD (BS_SHARD0) to which it belongs; finding it empty, it creates under the LOADING node a child node BS_SHARD0_NODE0 identifying itself, i.e., it proceeds to update its data; at the same time it removes itself from the SERVER child node and creates a node identifying itself under the MERGE_SERVER/LOADING node.
  • as shown in FIG. 2b, the serving nodes MERGE_SERVER obtains are: BS_SHARD0_NODE1, BS_SHARD1_NODE0, BS_SHARD1_NODE1; the node updating data is: BS_SHARD0_NODE0.
  • at this point, the child node BS_SHARD0_NODE0 under the LOADING node has gone offline and is updating its data.
  • after BS_SHARD0_NODE0 finishes updating, it re-creates the child node identifying itself under the SERVER node.
  • MERGE_SERVER, watching the SERVER node, receives the change notification and removes that node from the children of MERGE_SERVER/LOADING.
  • the BS_SHARD0_NODE0 node watching MERGE_SERVER/LOADING receives the deletion notification, knows that MERGE_SERVER is now aware that it is providing service again, and then removes itself from the BS_SHARD0/LOADING node.
  • when the update notification arrives, the remaining not-yet-updated REPLICA nodes under the BS_SHARD0_SERVER node again access the LOADING child node of the BS_SHARD0 to which they belong; if a node finds that child node empty, it creates a child node identifying itself under the LOADING node, i.e., proceeds to update the index dictionary, and the remaining steps are the same as described above. This repeats until the data of all REPLICA nodes under BS_SHARD0 has been updated.
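  • the replica-side protocol above can be sketched with kazoo roughly as follows; this is a simplification under stated assumptions (kazoo as client, paths per the figures) and it ignores the race between checking and claiming the LOADING slot, which a real system would close with a lock or sequence node:

```python
from kazoo.client import KazooClient

SHARD = "/BASIC_SERVERS/BS_SHARD0"
MERGE_LOADING = "/MERGE_SERVER/LOADING"
NODE = "BS_SHARD0_NODE0"

def try_update(zk: KazooClient, load_new_data) -> bool:
    """Claim the LOADING slot, go offline, reload, then re-register."""
    if zk.get_children(SHARD + "/LOADING"):     # step 120: slot already taken
        return False
    zk.create(SHARD + "/LOADING/" + NODE)       # claim the update slot
    zk.delete(SHARD + "/SERVER/" + NODE)        # leave the serving set
    zk.create(MERGE_LOADING + "/" + NODE)       # tell MERGE_SERVER we are updating
    load_new_data()                             # reload the index dictionary
    zk.create(SHARD + "/SERVER/" + NODE)        # rejoin the serving set
    # In the patent's flow, MERGE_SERVER now deletes our MERGE_SERVER/LOADING
    # node and we remove ourselves from SHARD/LOADING on that notification;
    # here the cleanup is done inline for brevity.
    zk.delete(SHARD + "/LOADING/" + NODE)
    return True
```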
  • FIG. 3 is a diagram of a Zookeeper file system for data update in accordance with another embodiment of the present invention.
  • the capacity of the children under LOADING can be set according to the actual situation (the number of REPLICA nodes, the current online load), that is, several nodes (but not all) are allowed to perform data updates at the same time. Also, in an actual system, each SHARD node can execute the update procedure in parallel.
  • Figure 3 depicts a situation in which two REPLICA nodes can be allowed to update data in parallel in a SHARD node:
  • in FIG. 3, the serving nodes MERGE_SERVER obtains are: BS_SHARD0_NODE1, BS_SHARD0_NODE3, BS_SHARD1_NODE0, BS_SHARD1_NODE2;
  • the nodes updating data are BS_SHARD0_NODE0, BS_SHARD0_NODE2, BS_SHARD1_NODE1, BS_SHARD1_NODE3.
  • FIG. 4 is a schematic diagram of a data update switching system 400 in accordance with one embodiment of the present invention.
  • the system 400 includes: n fragment server clusters and a summary server; each fragment server cluster includes m (m>1) fragment replica servers.
  • the aggregation server can aggregate the data processing results of all fragmented server clusters to provide complete data processing results.
  • the summary server can obtain the fragment replica servers that are providing the data processing service by accessing the one or more fragment server clusters.
  • FIG. 5 is a block diagram of a slice replica server in accordance with one embodiment of the present invention.
  • as shown in FIG. 5, the fragment replica server 500 includes: a receiving module 510, configured to receive a data update request; a determining module 520, configured to determine whether there is a fragment replica server in the fragment server cluster that can update data; and an update module 530, configured to update the data of the at least one updatable fragment replica server in the cluster when the fragment server cluster has a fragment replica server that can update data.
  • the update module 530 further includes (not shown): a service-status notification sub-module, configured to, after the at least one fragment replica server that can update data finishes updating, define the state of the updated fragment replica server as the service state and notify the summary server of that state; the updated fragment replica server then resumes providing the data processing service.
  • the update module 530 further includes (not shown): an update-status notification sub-module, configured to define the status of the fragment replica server being updated as the update status and notify the summary server of that status.
  • fragment replica servers that can update data and belong to different fragment server clusters are configured to update data in parallel.
  • each fragment server cluster is configured so that at least one fragment replica server provides the data processing service.
  • each fragment server cluster is configured so that no more than a preset threshold of fragment replica servers update data in parallel.
  • each fragment server cluster is configured so that only one fragment replica server is updating data at a time.
  • the fragment server cluster is a basic information server cluster or a presentation information server cluster.
  • the functions of the device in this embodiment basically correspond to the method embodiment shown in FIG. 1; for details not covered here, refer to the related description in the foregoing embodiment, which is not repeated.
  • the various component embodiments of the present invention may be implemented in hardware, or in a software module running on one or more processors, or in a combination thereof.
  • a microprocessor or digital signal processor may be used in practice to implement some or all of the functionality of some or all of the components in a client or server in accordance with embodiments of the present invention.
  • the invention can also be implemented as a device or device program (e.g., a computer program and a computer program product) for performing some or all of the methods described herein.
  • a program implementing the invention may be stored on a computer readable medium or may be in the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
  • Figure 6 illustrates a server, such as a search engine server or a fragment replica server, that can implement the above method in accordance with the present invention.
  • the server conventionally includes a processor 610 and a computer program product or computer readable medium in the form of a memory 630.
  • the memory 630 may be an electronic memory such as a flash memory, an EEPROM (Electrically Erasable Programmable Read Only Memory), an EPROM, a hard disk, or a ROM.
  • Memory 630 has a memory space 650 for program code 651 for performing any of the method steps described above.
  • storage space 650 for program code may include various program code 651 for implementing various steps in the above methods, respectively.
  • the program code can be read from or written to one or more computer program products.
  • These computer program products include program code carriers such as hard disks, compact disks (CDs), memory cards or floppy disks.
  • Such a computer program product is typically a portable or fixed storage unit as described with reference to FIG. 7.
  • the storage unit may have a storage section, a storage space, and the like arranged similarly to the memory 630 in the server of FIG. 6.
  • the program code can be compressed, for example, in an appropriate form.
  • the storage unit includes computer readable code 651', code that can be read by a processor, such as 610, which when executed by the server causes the server to perform various steps in the methods described above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a data update method and system. The method comprises: step 110: receiving a data update request; step 120: determining whether there is a fragment replica server in the fragment server cluster that can update data; step 130: when it is determined that there is a fragment replica server in the fragment server cluster that can update data, updating the data of at least one fragment replica server in the cluster that can update data; and step 140: after the at least one fragment replica server that can update data has finished updating, performing steps 110 to 130 for the remaining not-yet-updated fragment replica servers, until all not-yet-updated fragment replica servers have finished updating. According to the method of the present invention, data files such as configuration files and retrieval dictionaries can be switched over automatically and take effect without stopping the online service.

Description

Data update method and system
Technical Field
The present invention relates to the field of data updating, and more particularly to a data update method and system.
Background Art
For online services, online configuration and data frequently need to be updated. For example, the index dictionary is a very important data file for a search engine; its main characteristics are: 1) the data volume is very large; 2) it is accessed and updated very frequently. Generally speaking, a data update for a search engine means an update of the index dictionary. With the development of modern search technology, the amount of data to be processed keeps growing, and data updates become ever more frequent. However, data updates, such as updates of the index dictionary, often require a module restart to take effect and involve considerable manual participation. This way of updating data therefore puts great pressure on both the operation and the maintenance of online data processing.
Hence, what is needed in the art is a way to update data without stopping the online service. The present invention proposes an innovative way to update data, which greatly reduces the operation and maintenance difficulty of online data processing and improves the robustness of online data processing.
Summary of the Invention
In view of the above problems, the present invention provides a data update technique for updating data without stopping the online service.
According to one aspect of the present invention, a data update method is provided, in which the data of a data file is divided into data chunks that are stored respectively in one or more fragment server clusters; within each fragment server cluster, the data chunk is replicated and stored on more than one fragment replica server; and a summary server aggregates the data processing results of the fragment server clusters and obtains, by accessing the one or more fragment server clusters, the fragment replica servers that are currently providing the data processing service. The method comprises: step 110: receiving a data update request; step 120: determining whether there is a fragment replica server in the fragment server cluster that can update data; step 130: when it is determined that there is a fragment replica server in the fragment server cluster that can update data, updating the data of at least one fragment replica server in the cluster that can update data; and step 140: after the at least one fragment replica server that can update data has finished updating, performing steps 110 to 130 for the remaining not-yet-updated fragment replica servers until all of them have finished updating.
According to another aspect of the present invention, a computer program is provided, comprising computer readable code which, when run on a computer, performs the aforementioned data update method.
According to yet another aspect of the present invention, a computer readable medium is provided, in which the aforementioned computer program is stored.
According to a further aspect of the present invention, a data update system is provided, comprising: one or more fragment server clusters, more than one fragment replica server, and a summary server; each fragment server cluster comprises one or more fragment replica servers; the fragment replica server comprises: a receiving module for receiving a data update request; a determining module for determining whether there is a fragment replica server in the fragment server cluster that can update data; and an update module for updating, when it is determined that there is a fragment replica server in the fragment server cluster that can update data, the data of the at least one updatable fragment replica server in the cluster.
By adopting a distributed multi-machine cooperative hot-switching scheme, the present invention lets the data update switchover be completed fully automatically by the program; that is, data files such as configuration files and data updates are switched over automatically and take effect without stopping the online service. Compared with the prior art, since the data update is fully automated by the program, manual intervention is reduced, which lowers labor cost and the error rate of manual operation. Moreover, because the present invention adopts distributed multi-machine cooperative hot switching rather than a single-machine hot-switching solution, a single machine only needs to load one copy of the data in memory, which greatly reduces memory usage and thus saves resources. In addition, distributed multi-machine cooperative hot switching can also greatly reduce the operation and maintenance difficulty of online data processing services and improve the robustness of the online service.
The above description is only an overview of the technical solution of the present invention. In order that the technical means of the present invention may be understood more clearly and implemented in accordance with the contents of the specification, and in order to make the above and other objects, features and advantages of the present invention more apparent, specific embodiments of the present invention are set forth below.
Brief Description of the Drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art from reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
FIG. 1 is a flow chart of a data update method according to an embodiment of the present invention;
FIG. 2a and FIG. 2b are Zookeeper file system diagrams of a data update method according to an embodiment of the present invention;
FIG. 3 is a Zookeeper file system diagram of a data update method according to another embodiment of the present invention;
FIG. 4 is a schematic diagram of a data update system according to an embodiment of the present invention;
FIG. 5 is a block diagram of a fragment replica server according to an embodiment of the present invention;
FIG. 6 schematically shows a block diagram of a server for performing the method according to the invention; and
FIG. 7 schematically shows a storage unit for holding or carrying program code implementing the method according to the invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope conveyed completely to those skilled in the art.
The main idea of the present invention is that, by adopting a distributed multi-machine cooperative hot-switching scheme, the data update is completed fully automatically by the program; that is, data files such as configuration files and retrieval dictionaries are switched over automatically and take effect without stopping the online service.
To make the objects, technical solutions and advantages of the present invention clearer, the technical solution of the present invention will be described clearly and completely below with reference to specific embodiments of the present invention and the corresponding drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
In one implementation, the data update switchover is performed by professional system operation and maintenance staff, specifically: 1) a standby cluster is provided for the online service, which is equivalent to a mirror of the online service cluster; 2) at each data update, the standby cluster is updated online first; once the standby cluster is updated and ready, the online request traffic is shifted to the standby cluster, so that the standby cluster becomes the online service cluster; 3) the original online service cluster becomes the standby cluster. The disadvantages of this implementation are: 1) it requires considerable manual participation, the update is slow, it easily causes frequent online faults, and the stability of the system is hard to guarantee; 2) because a standby cluster must be provided, more machines are needed to serve the daily online traffic, yet in most cases the standby cluster sits idle, which wastes resources severely.
In another implementation, the data update switchover is performed with a double buffer (in memory), specifically: 1) the data processing system loads the data into memory as two copies, whose sequence numbers are set to 0 and 1 respectively; 2) one of the copies is used to provide the online service; for example, cur_idx denotes the copy currently serving online, and a current value of 0 means that copy 0 is currently serving; 3) when the system finds that a new data update has been generated and pushed, it uses copy 1-cur_idx in memory to load the updated data, and after the update is finished sets cur_idx = 1-cur_idx, so that the cur_idx copy provides the online service. The disadvantage of this implementation is that the system must load the data as two copies, one for updating and the other for currently serving online, and switch between the two copies at update time, so the system consumes considerably more memory.
For convenience of the following description, the terms involved in the various embodiments of the present application are first explained.
Data hot switching: updating data without stopping the online service.
Distributed cooperation: in a distributed system, the modules or node machines cooperate to accomplish a task together.
Fragment server cluster (SHARD SERVER): the data of the data file is divided into several parts, and each part is stored in one fragment server cluster. The sum of the data of all fragment server clusters is the complete data.
Fragment replica server (REPLICA SERVER): within each fragment server cluster, the data is completely copied into several (one or more) replicas, and the data of each replica is stored on one fragment replica server.
Zookeeper is a reliable coordination system for large distributed systems; the functions it provides include configuration maintenance, naming service, distributed synchronization, group services, and so on. The goal of Zookeeper is to encapsulate complex and error-prone key services and to provide users with an easy-to-use interface and an efficient, stable system.
In the present invention, in order to coordinate and manage the distributed, cooperating fragment server clusters, the Zookeeper system may be used to provide the fragment replica servers in each fragment server cluster with coordination and management services such as configuration maintenance, naming service, distributed synchronization and group services, thereby achieving efficient and reliable cooperative work.
The core of the Zookeeper system is a streamlined file system. Each file node is called a node (NODE) in the Zookeeper system, and a listener (Watcher) can be set on each NODE to monitor the state changes of the NODE and its child nodes; a Watcher can monitor the state changes of a directory node as well as of its subdirectories. Once the state of a watched node changes, the Zookeeper server notifies all Watchers set on that directory node, so that the corresponding clients quickly learn that the state of the directory node they care about has changed and react accordingly.
The improved technical solution of the present invention will now be described in detail with reference to the drawings.
Referring to FIG. 1, FIG. 1 is a flow chart of a data update method 100 according to an embodiment of the present invention. As shown in FIG. 1, the method 100 begins at step 110.
At step 110, a data update request is received.
Specifically, the data of the data file is first divided into data chunks and stored respectively in one or more fragment server clusters; then, within each fragment server cluster, the data chunk is replicated and stored on more than one fragment replica server. When the data processing service is provided, the summary server can aggregate the data processing results of the fragment server clusters so as to provide a complete data processing result. In addition, the summary server can obtain the fragment replica servers currently providing the data processing service by accessing the one or more fragment server clusters.
The present invention notifies data updates in a push manner; that is, after finishing the update, the system responsible for updating the data sends a data-update message to the system that uses the data, and the system that uses the data updates its data upon notification.
In the present invention, after receiving the data-update message sent by the system responsible for updating the data, the Zookeeper server notifies all fragment replica servers in the corresponding fragment server cluster to update their data.
At step 120, it is determined whether there is a fragment replica server in the fragment server cluster that can update data.
Specifically, when the fragment replica servers in a fragment server cluster learn that new data needs to be updated, each can access, through a specified path, the update flag file (LOADING) corresponding to that fragment server cluster to determine which fragment replica servers are currently updating data, and can then determine, according to the preset number of fragment replica servers allowed to update data in parallel in that cluster, whether there is currently a fragment replica server in the cluster that can update data. If there is a fragment replica server in the cluster that needs a data update and the number of fragment replica servers currently updating data in the cluster is smaller than the preset threshold, then there is currently a fragment replica server in the cluster that can update data; if there is a fragment replica server that needs a data update but the number currently updating equals the threshold, or if there is no fragment replica server in the cluster that needs a data update, then there is currently no fragment replica server in the cluster that can update data.
At step 130, when it is determined that there is a fragment replica server in the fragment server cluster that can update data, the data of at least one fragment replica server in the cluster that can update data is updated.
Specifically, when it is determined that there is a fragment replica server in the cluster that can update data, at least one updatable fragment replica server in the cluster may be updated according to the preset threshold. For example, if the threshold of fragment replica servers allowed to update in parallel in a cluster is set to 2 (i.e., two fragment replica servers are allowed to update in parallel) and the number of fragment replica servers currently updating data is 1, then one more not-yet-updated fragment replica server in that cluster may also be updated.
When a fragment replica server determines that there is no fragment replica server that can update data, the not-yet-updated fragment replica servers are not updated for the time being and continue to provide the data processing service. When a fragment replica server finishes updating, the Zookeeper server, having observed the state change of the updated server, sends a data-update notification to the fragment replica servers in the fragment server cluster to which the updated server belongs; upon receiving the notification, the remaining not-yet-updated fragment replica servers perform steps 120 and 130, until all not-yet-updated fragment replica servers in the cluster have finished updating their data.
To guarantee the consistency of retrieval results, a fragment replica server that is being updated does not provide the data processing service. According to an embodiment of the present invention, when at least one updatable fragment replica server in a cluster is being updated, the state of the fragment replica server being updated may be defined as the update state and notified to the summary server, so that the summary server knows which fragment replica servers are updating data, i.e., not providing the data processing service.
After at least one not-yet-updated fragment replica server finishes updating its data, the updated fragment replica server resumes providing the data processing service and notifies the summary server of its service state.
For convenience of description, only the data update process of one fragment server cluster has been described above; in an actual system, the fragment server clusters may perform the above steps in parallel, and such parallel updating can greatly improve the efficiency of data updates.
It should be pointed out that, during the update process of the fragment server clusters, at least one fragment replica server in each fragment server cluster always provides the data processing service, which guarantees the completeness of the data processing result.
In addition, according to an embodiment of the data update method of the present invention, in a search engine retrieval cluster, the index dictionary data is usually divided into two parts stored in two server clusters, namely a basic information server cluster (BASIC_SERVERS) and a presentation information server cluster (DETAIL_SERVERS). The basic information server cluster is used to perform the actual search tasks; the loading, querying and updating of the inverted index dictionary are all completed inside this cluster. The presentation information server cluster is used to query the presentation information of search results, such as title, abstract and description. The summary server (MERGE_SERVER) is used to merge the search result information sent by the basic information server cluster and the presentation information server cluster, and to sort and filter it.
According to an embodiment of the present invention, the fragment server cluster may be a basic information server cluster or a presentation information server cluster. The description of the retrieval dictionary update process of each cluster is the same as that of the method 100 above and is not repeated here.
So far, the flow of the data update method 100 according to an embodiment of the present invention has been described. By adopting a distributed multi-machine cooperative hot-switching scheme, the data update switchover is completed fully automatically by the program; that is, data files such as configuration files and data updates are switched over automatically and take effect without stopping the online service. Compared with the prior art, since the data update is fully automated by the program, manual intervention is reduced, which lowers labor cost and the error rate of manual operation. Moreover, because the present invention adopts distributed multi-machine cooperative hot switching rather than a single-machine hot-switching solution, a single machine only needs to load one copy of the data in memory, which greatly reduces memory usage and thus saves resources. In addition, distributed multi-machine cooperative hot switching can also greatly reduce the operation and maintenance difficulty of online data processing services and improve the robustness of the online service.
Below, with reference to Zookeeper file system diagrams, a distributed cooperative application that uses the distributed coordination system Zookeeper to implement data updates is illustrated by example.
In the following example, the data of the data file is divided into two parts stored respectively in the basic information server cluster (BASIC_SERVERS) and the presentation information server cluster (DETAIL_SERVERS). In the Zookeeper file system diagram, BASIC_SERVERS, DETAIL_SERVERS and MERGE_SERVER can each create their own node and the child nodes under it.
FIG. 2a and FIG. 2b are Zookeeper file system diagrams of a data update method according to an embodiment of the present invention, in which it is preset that only one REPLICA node per SHARD node may be updating data at a time.
In FIG. 2a, the BASIC_SERVERS node has two SHARD nodes, BS_SHARD0 and BS_SHARD1; BS_SHARD0 has two REPLICA nodes, BS_SHARD0_NODE0 and BS_SHARD0_NODE1, and BS_SHARD1 has two REPLICA nodes, BS_SHARD1_NODE0 and BS_SHARD1_NODE1.
MERGE_SERVER obtains all fragment server clusters in the current system (BS_SHARD0, BS_SHARD1) by accessing the child nodes under BS_SERVERS, and obtains the REPLICA nodes currently providing service by accessing the child nodes under BS_SHARD0 and BS_SHARD1. For BS_SHARD0, for example, the child nodes under it are BS_SHARD0_NODE0 and BS_SHARD0_NODE1, from which all child nodes under the LOADING node (the nodes being updated) are removed, finally yielding the REPLICA nodes that are providing service.
As can be seen from FIG. 2a, the serving nodes MERGE_SERVER obtains are: BS_SHARD0_NODE0, BS_SHARD0_NODE1, BS_SHARD1_NODE0, BS_SHARD1_NODE1; there is no node updating data.
In FIG. 2b, when a SHARD learns that new data needs to be updated, since it is preset that only one REPLICA node may be updating under the LOADING child node, taking BS_SHARD0 as an example, BS_SHARD0_NODE0 accesses, through the specified path, the LOADING child node under the SHARD (BS_SHARD0) to which it belongs; finding it empty, it creates under the LOADING node a child node BS_SHARD0_NODE0 identifying itself, i.e., it proceeds to update its data; at the same time it removes itself from the SERVER child node and creates a node identifying itself under the MERGE_SERVER/LOADING node.
As can be seen from FIG. 2b, the serving nodes MERGE_SERVER obtains are: BS_SHARD0_NODE1, BS_SHARD1_NODE0, BS_SHARD1_NODE1; the node updating data is BS_SHARD0_NODE0.
At this point, the child node BS_SHARD0_NODE0 under the LOADING node has gone offline and is updating its data. After BS_SHARD0_NODE0 finishes updating, it re-creates the child node identifying itself under the SERVER node. MERGE_SERVER, watching the SERVER node, receives the change notification and deletes that node from the children of MERGE_SERVER/LOADING. The BS_SHARD0_NODE0 node watching MERGE_SERVER/LOADING receives the deletion notification, knows that MERGE_SERVER is now aware that it is providing service again, and then removes itself from the BS_SHARD0/LOADING node. When the state of BS_SHARD0/LOADING changes and the remaining not-yet-updated REPLICA nodes under the BS_SHARD0_SERVER node receive the update notification, each again accesses the LOADING child node under the BS_SHARD0 to which it belongs; if it finds that child node empty, it creates a child node identifying itself under the LOADING node, i.e., proceeds to update the index dictionary, and the remaining steps are the same as described above. This repeats until the data of all REPLICA nodes under BS_SHARD0 has been updated.
For the DETAIL_SERVERS cluster, its data is updated similarly to the BASIC_SERVERS cluster and is not described again here.
FIG. 3 is a Zookeeper file system diagram of a data update according to another embodiment of the present invention.
Specifically, if a SHARD has many REPLICAs, queueing the REPLICAs for update would make the update cycle of the whole SHARD too long. In that case, the capacity of the children under LOADING can be set according to the actual situation (the number of REPLICA nodes, the current online load), i.e., several nodes (but not all) may be allowed to update data at the same time. Moreover, in an actual system, the SHARD nodes may execute the update procedure in parallel.
FIG. 3 depicts the case where it is preset that two REPLICA nodes per SHARD node may update data in parallel:
In FIG. 3, the serving nodes MERGE_SERVER obtains are: BS_SHARD0_NODE1, BS_SHARD0_NODE3, BS_SHARD1_NODE0, BS_SHARD1_NODE2; the nodes updating data are BS_SHARD0_NODE0, BS_SHARD0_NODE2, BS_SHARD1_NODE1, BS_SHARD1_NODE3.
FIG. 4 is a schematic diagram of a data update switching system 400 according to an embodiment of the present invention. As shown in FIG. 4, the system 400 comprises n fragment server clusters and a summary server; each fragment server cluster comprises m (m>1) fragment replica servers.
When the data processing service is provided, the summary server can aggregate the data processing results of all fragment server clusters so as to provide a complete data processing result. In addition, the summary server can obtain the fragment replica servers currently providing the data processing service by accessing the one or more fragment server clusters.
FIG. 5 is a block diagram of a fragment replica server according to an embodiment of the present invention.
As shown in FIG. 5, the fragment replica server 500 comprises: a receiving module 510 for receiving a data update request; a determining module 520 for determining whether there is a fragment replica server in the fragment server cluster that can update data; and an update module 530 for updating, when it is determined that there is a fragment replica server in the fragment server cluster that can update data, the data of the at least one updatable fragment replica server in the cluster.
According to an embodiment of the present invention, the update module 530 further comprises (not shown): a service-state notification sub-module for, after the at least one fragment replica server that can update data has finished updating, defining the state of the updated fragment replica server as the service state and notifying the summary server of the service state; the updated fragment replica server then resumes providing the data processing service.
According to an embodiment of the present invention, the update module 530 further comprises (not shown): an update-state notification sub-module for defining the state of the fragment replica server being updated as the update state and notifying the summary server of the update state.
According to an embodiment of the present invention, fragment replica servers that can update data and belong to different fragment server clusters are configured to update data in parallel.
According to an embodiment of the present invention, during the data update process of the fragment server clusters, each fragment server cluster is configured so that at least one fragment replica server provides the data processing service.
According to an embodiment of the present invention, during the data update process of the fragment server clusters, each fragment server cluster is configured so that no more than a preset threshold of fragment replica servers update data in parallel.
According to an embodiment of the present invention, during the data update process of the fragment server clusters, each fragment server cluster is configured so that only one fragment replica server is updating data at a time.
According to an embodiment of the present invention, the fragment server cluster is a basic information server cluster or a presentation information server cluster.
Since the functions implemented by the apparatus of this embodiment basically correspond to the method embodiment shown in FIG. 1, details not covered in this description can be found in the relevant description of the foregoing embodiment and are not repeated here.
The component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of a client or server according to embodiments of the present invention. The present invention may also be implemented as an apparatus or device program (e.g., a computer program and a computer program product) for performing part or all of the methods described herein. Such a program implementing the present invention may be stored on a computer readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
For example, FIG. 6 shows a server that can implement the above method according to the present invention, such as a search engine server or a fragment replica server. The server conventionally comprises a processor 610 and a computer program product or computer readable medium in the form of a memory 630. The memory 630 may be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, hard disk or ROM. The memory 630 has a storage space 650 for program code 651 for performing any of the method steps described above. For example, the storage space 650 for program code may comprise individual program codes 651 for implementing the various steps of the above methods. These program codes may be read out of or written into one or more computer program products. These computer program products comprise program code carriers such as hard disks, compact discs (CDs), memory cards or floppy disks. Such a computer program product is usually a portable or fixed storage unit as described with reference to FIG. 7. The storage unit may have storage segments, storage space, etc. arranged similarly to the memory 630 in the server of FIG. 6. The program code may, for example, be compressed in an appropriate form. Usually, the storage unit comprises computer readable code 651', i.e., code readable by a processor such as the processor 610, which, when run by the server, causes the server to perform the steps of the methods described above.
"One embodiment", "an embodiment" or "one or more embodiments" as referred to herein means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. In addition, note that instances of the phrase "in one embodiment" herein do not necessarily all refer to the same embodiment.
Numerous specific details are set forth in the specification provided here. It will be understood, however, that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this specification.
It should be noted that the above embodiments illustrate rather than limit the present invention, and that those skilled in the art may devise alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The present invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third and so on does not indicate any order; these words may be interpreted as names.
Furthermore, it should also be noted that the language used in this specification has been chosen primarily for purposes of readability and instruction, rather than to explain or define the subject matter of the present invention. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. With respect to the scope of the present invention, the disclosure made herein is illustrative rather than restrictive, and the scope of the present invention is defined by the appended claims.

Claims (19)

  1. A data update method, characterized in that the data of a data file is divided into data chunks that are stored respectively in one or more fragment server clusters; within each fragment server cluster, the data chunk is replicated and stored on more than one fragment replica server; a summary server aggregates the data processing results of the fragment server clusters and obtains, by accessing the one or more fragment server clusters, the fragment replica servers that are currently providing the data processing service; the method comprising:
    step 110: receiving a data update request;
    step 120: determining whether there is a fragment replica server in the fragment server cluster that can update data;
    step 130: when it is determined that there is a fragment replica server in the fragment server cluster that can update data, updating the data of at least one fragment replica server in the cluster that can update data; and
    step 140: after the at least one fragment replica server that can update data has finished updating, performing steps 110 to 130 for the remaining not-yet-updated fragment replica servers, until all not-yet-updated fragment replica servers have finished updating.
  2. The method according to claim 1, characterized by further comprising:
    step 150: when it is determined that there is no fragment replica server that can update data, returning to step 101.
  3. The method according to claim 1, characterized in that step 140 further comprises: after the at least one fragment replica server that can update data has finished updating, defining the state of the updated fragment replica server as the service state and notifying the summary server of the service state; and the updated fragment replica server resumes providing the data processing service.
  4. The method according to claim 1, characterized in that step 130 further comprises: defining the state of the fragment replica server being updated as the update state and notifying the summary server of the update state.
  5. The method according to any one of claims 1 to 4, characterized in that fragment replica servers that can update data and belong to different fragment server clusters update data in parallel.
  6. The method according to any one of claims 1 to 4, characterized in that, during the data update process of the fragment server clusters, at least one fragment replica server in each fragment server cluster provides the data processing service.
  7. The method according to any one of claims 1 to 4, characterized in that, during the data update process of the fragment server clusters, no more than a preset threshold of fragment replica servers in each fragment server cluster update data in parallel.
  8. The method according to any one of claims 1 to 4, characterized in that, during the data update process of the fragment server clusters, only one fragment replica server in each fragment server cluster is updating data at a time.
  9. The method according to any one of claims 1 to 4, characterized in that the fragment server cluster is a basic information server cluster or a presentation information server cluster.
  10. A computer program, comprising computer readable code which, when run on a computer, performs the data update method according to any one of claims 1 to 9.
  11. A computer readable medium, in which the computer program according to claim 10 is stored.
  12. A data update system, characterized by comprising: one or more fragment server clusters, more than one fragment replica server, and a summary server;
    each fragment server cluster comprises one or more fragment replica servers;
    the fragment replica server comprises: a receiving module for receiving a data update request;
    a determining module for determining whether there is a fragment replica server in the fragment server cluster that can update data; and
    an update module for updating, when it is determined that there is a fragment replica server in the fragment server cluster that can update data, the data of the at least one updatable fragment replica server in the cluster.
  13. The system according to claim 12, characterized in that the update module further comprises: a service-state notification sub-module for, after the at least one fragment replica server that can update data has finished updating, defining the state of the updated fragment replica server as the service state and notifying the summary server of the service state; and the updated fragment replica server resumes providing the data processing service.
  14. The system according to claim 12, characterized in that the update module further comprises: an update-state notification sub-module for defining the state of the fragment replica server being updated as the update state and notifying the summary server of the update state.
  15. The system according to any one of claims 12 to 14, characterized in that fragment replica servers that can update data and belong to different fragment server clusters are configured to update data in parallel.
  16. The system according to any one of claims 12 to 14, characterized in that, during the data update process of the fragment server clusters, each fragment server cluster is configured so that at least one fragment replica server provides the data processing service.
  17. The system according to any one of claims 12 to 14, characterized in that, during the data update process of the fragment server clusters, each fragment server cluster is configured so that no more than a preset threshold of fragment replica servers update data in parallel.
  18. The system according to any one of claims 12 to 14, characterized in that, during the data update process of the fragment server clusters, each fragment server cluster is configured so that only one fragment replica server is updating data at a time.
  19. The system according to any one of claims 12 to 14, characterized in that the fragment server cluster is a basic information server cluster or a presentation information server cluster.
PCT/CN2014/095957 2014-03-13 2014-12-31 Data update method and system WO2015135370A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410093941.4A CN104917798A (zh) 2014-03-13 2014-03-13 Data update method and system
CN201410093941.4 2014-03-13

Publications (1)

Publication Number Publication Date
WO2015135370A1 true WO2015135370A1 (zh) 2015-09-17

Family

ID=54070901

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/095957 WO2015135370A1 (zh) 2014-03-13 2014-12-31 Data update method and system

Country Status (2)

Country Link
CN (1) CN104917798A (zh)
WO (1) WO2015135370A1 (zh)


Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105871649A (zh) * 2016-06-21 2016-08-17 上海帝联信息科技股份有限公司 Node server, server side, and configuration file updating and update control methods therefor
CN106067886B (zh) * 2016-08-03 2019-06-14 广州品唯软件有限公司 Security policy updating method and system
CN106790549B (zh) * 2016-12-23 2021-01-15 北京奇虎科技有限公司 Data updating method and apparatus
CN108614713B (zh) * 2018-03-14 2021-07-27 挖财网络技术有限公司 Method, system and apparatus for automated application release

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102436355A (zh) * 2011-11-15 2012-05-02 华为技术有限公司 Data transmission method, device and system
CN102857554A (zh) * 2012-07-26 2013-01-02 福建网龙计算机网络信息技术有限公司 Data redundancy processing method based on a distributed storage system
CN102902746A (zh) * 2012-09-18 2013-01-30 杭州勒卡斯广告策划有限公司 Mass data processing method, apparatus and system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8768895B2 (en) * 2007-04-11 2014-07-01 Emc Corporation Subsegmenting for efficient storage, resemblance determination, and transmission
CN101753349B (zh) * 2008-12-09 2012-08-15 中国移动通信集团公司 Data node upgrading method, upgrade scheduling node and upgrade system
US8171202B2 (en) * 2009-04-21 2012-05-01 Google Inc. Asynchronous distributed object uploading for replicated content addressable storage clusters


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105516343A (zh) * 2015-12-31 2016-04-20 中国电子科技集团公司第五十四研究所 Network dynamic self-organizing file sharing system and implementation method
CN105516343B (zh) * 2015-12-31 2018-07-17 中国电子科技集团公司第五十四研究所 Network dynamic self-organizing file sharing implementation method
CN112671905A (zh) * 2020-12-23 2021-04-16 广州三七互娱科技有限公司 Service scheduling method, apparatus and system
CN113760933A (zh) * 2021-08-25 2021-12-07 福建天泉教育科技有限公司 Data updating method and terminal
CN113760933B (zh) * 2021-08-25 2023-11-03 福建天泉教育科技有限公司 Data updating method and terminal

Also Published As

Publication number Publication date
CN104917798A (zh) 2015-09-16


Legal Events

121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 14885372; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 14885372; Country of ref document: EP; Kind code of ref document: A1)