WO2021103499A1 - 一种基于多活数据中心的流量切换方法及装置 - Google Patents

一种基于多活数据中心的流量切换方法及装置

Info

Publication number
WO2021103499A1
WO2021103499A1 (PCT/CN2020/097003, CN2020097003W)
Authority
WO
WIPO (PCT)
Prior art keywords
task
traffic
data center
configuration information
application server
Prior art date
Application number
PCT/CN2020/097003
Other languages
English (en)
French (fr)
Inventor
葛耀
杨涛
葛伟
王鑫
林仁山
Original Assignee
苏宁易购集团股份有限公司
苏宁云计算有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏宁易购集团股份有限公司, 苏宁云计算有限公司 filed Critical 苏宁易购集团股份有限公司
Priority to CA3162740A priority Critical patent/CA3162740A1/en
Publication of WO2021103499A1 publication Critical patent/WO2021103499A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2035Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant without idle spare hardware
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2023Failover techniques
    • G06F11/203Failover techniques using migration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24552Database cache management

Definitions

  • This application relates to the field of data processing, and in particular to a traffic switching method and device based on a multi-active data center.
  • A disaster recovery system is a system built for computer information systems to cope with various data disasters.
  • When the computer system suffers irresistible natural disasters such as fire, flood, earthquake or war, or man-made disasters such as computer crime, computer viruses, power failure, network/communication failure, hardware/software errors and human operation errors, which interrupt data transmission or cause data loss,
  • the disaster recovery system ensures the security of the user's data.
  • Current disaster recovery mostly adopts an active-standby mode, in which a disaster recovery backup center is established far away from where the computer system runs.
  • The disaster recovery backup center does not carry any online business traffic; it only backs up the data of the computer system periodically and stores it in the disaster recovery backup center. When a disaster occurs and the system is paralyzed, the backup data is used to restore the operation of the system in the disaster recovery backup center.
  • Because the disaster recovery backup center does not carry real online business traffic, there is no guarantee that it is actually usable when a disaster occurs. Moreover, since the backup system must be started manually, the demands on maintenance personnel are high, and manual start-up does not respond to the disaster quickly enough; during the delay, the various data generated during the downtime cannot be recorded.
  • In a so-called multi-active deployment, multiple sites (located in geographically distant computer rooms) host the same database and carry business traffic at the same time. How traffic is shared between sites can be decided according to business attributes such as user ID and region; for example, the data processing requests of users with ID1-ID49 are allocated to the first site,
  • and the data processing requests of users with ID50-ID99 are allocated to the second site.
  • If the first site fails, traffic can be switched to the second site quickly (within minutes) and smoothly.
  • Ideally, the damage to the business is very small.
  • Compared with the active-standby mode, each site in the multi-active strategy carries business traffic in real time, so its stability is proven and reliable.
  • Of course, traffic switching does not only occur when a data center fails; sometimes it is performed for other reasons, for example when the task load of a certain data center surges during a special period and part of it needs to be allocated elsewhere.
  • At present, under the multi-active strategy, when traffic needs to be switched, for example when one of the sites fails, the fault is reported to maintenance personnel, who then configure the traffic switching information, start the traffic switching process, and switch the site traffic. Manual configuration takes time; although this is faster than the active-standby mode, the delay is still long enough that systems such as e-commerce platforms generate a large amount of data that cannot be saved or recovered.
  • This application provides a traffic switching method and device based on a multi-active data center, to solve the prior-art problem that traffic switching of a multi-active data center is still delayed, causing data loss within the delay time.
  • A first aspect provides a traffic switching method based on a multi-active data center, the method including:
  • after receiving a task scheduling instruction, the application server performs an operation of obtaining traffic configuration information;
  • the traffic configuration information is generated according to preset rules by the multi-active switching platform when it judges, from the data transmission status information of each data center, that a data center requires traffic switching; the multi-active data center has at least two data centers; the traffic configuration information is used to indicate the traffic distribution corresponding to each data center;
  • the application server parses the traffic configuration information to obtain the traffic distribution corresponding to the data center where it is located;
  • the application server judges, according to the traffic distribution and the type information of the task currently to be processed, whether it has the processing authority for that task;
  • if so, the application server loads the task for task processing.
  • the application server obtains the traffic configuration information through the following steps:
  • the application server reads the cache and determines whether the traffic configuration information exists in the cache;
  • if it does not exist, the application server reads the traffic configuration information from the multi-active switching platform.
  • Preferably, the method further includes:
  • when the application server detects that the traffic configuration information on the multi-active switching platform has changed, it reads the changed traffic configuration information and synchronizes it to the cache.
  • the step of the application server judging, according to the traffic distribution and the type information of the task currently to be processed, whether it has the processing authority for that task includes:
  • if the application server determines that the current task to be processed is an exclusive task, judging whether the traffic distribution corresponding to the data center where the application server is located is empty;
  • if it is not empty, the application server has the processing authority for the currently pending task.
  • Preferably, the traffic distribution includes, for each data center, a set of sub-database numbers with read and write permissions;
  • the judging whether the traffic distribution corresponding to the data center where the application server is located is empty includes: judging whether the set of sub-database numbers with read and write permissions corresponding to the data center where the application server is located is empty.
  • Preferably, the multi-active data center has a main data center, and the traffic configuration information further includes a main data center identifier;
  • the step of the application server judging, according to the traffic distribution and the type information of the task currently to be processed, whether it has the processing authority for that task includes:
  • if the application server determines that the current task to be processed is a competitive task, judging whether the data center identifier corresponding to the application server is the same as the main data center identifier;
  • if they are the same, the application server has the processing authority for the task currently to be processed.
  • Preferably, the traffic distribution includes, for each data center, a set of sub-database numbers with read and write permissions;
  • the step of the application server loading a task for task processing includes:
  • the application server searches the cached task queue for the current task to be processed, and if it is found, judges, according to the sub-database number corresponding to the task and the sub-database numbers for which the application server has read and write permissions based on the data center where it is located, whether the application server has the authority to process the task;
  • if it has the authority, the application server sets the status of the sub-database number corresponding to the current task to processing and saves it in the task configuration information;
  • when the task processing is completed, the application server changes the status of the sub-database number corresponding to the task back to pending and saves it in the task configuration information.
  • A second aspect provides a traffic switching method based on a multi-active data center, the method including:
  • the multi-active switching platform obtains the data transmission status information of each data center; the multi-active data center has at least two data centers;
  • the multi-active switching platform makes a judgment based on the status information and preset conditions; when it judges that traffic switching is required, it generates traffic configuration information according to the preset rules, so that after receiving a task scheduling instruction the application server obtains the traffic configuration information and, in combination with the obtained task configuration information, loads tasks for task processing; the traffic configuration information is used to indicate the traffic distribution corresponding to each data center.
  • the step of the multi-active switching platform making a judgment based on the status information and preset conditions and, when traffic switching is required, generating traffic configuration information according to the preset rules includes:
  • when the multi-active switching platform determines from the status information that a data center has a data transmission failure, it performs traffic distribution according to the current traffic of the data centers that have not failed, the traffic thresholds, and the rule that traffic corresponding to competitive tasks is allocated to the same data center, so as to generate traffic configuration information that includes the traffic distribution corresponding to each data center and the identifier of the primary data center that carries the competitive tasks.
  • Preferably, the method further includes:
  • the multi-active switching platform synchronizes the traffic configuration information to a cache, so that the application server obtains the traffic configuration information from the cache;
  • when the multi-active switching platform receives a traffic configuration information acquisition request from the application server, it sends the latest traffic configuration information to the application server.
  • A third aspect provides a traffic switching device based on a multi-active data center, the device including:
  • a traffic configuration information obtaining unit, configured to perform the operation of obtaining traffic configuration information after receiving a task scheduling instruction;
  • the traffic configuration information is generated according to preset rules by the multi-active switching platform when it determines, from the data transmission status information of each data center, that a data center requires traffic switching;
  • the multi-active data center has at least two data centers;
  • the traffic configuration information is used to indicate the traffic distribution corresponding to each data center;
  • a parsing unit, configured to parse the traffic configuration information to obtain the traffic distribution corresponding to the data center where the device is located;
  • an authority judging unit, configured to judge, according to the traffic distribution and the type information of the current pending task, whether the device has the processing authority for that task;
  • a task processing unit, configured to obtain task configuration information when it is determined that the processing authority exists, and to load tasks in combination with the traffic distribution for task processing.
  • A fourth aspect provides a traffic switching device based on a multi-active data center, the device including:
  • a data transmission status information acquisition unit, configured to acquire the data transmission status information of each data center; the multi-active data center has at least two data centers;
  • a traffic configuration information unit, configured to make a judgment based on the status information and preset conditions and, when traffic switching is required, generate traffic configuration information according to the preset rules, so that after receiving a task scheduling instruction the application server obtains the traffic configuration information and, in combination with the obtained task configuration information, loads tasks for task processing; the traffic configuration information is used to indicate the traffic distribution corresponding to each data center.
  • The technical solution of the present application can automatically generate and obtain the multi-active traffic configuration information in real time in a multi-active data center scenario, and can actively compensate by fetching the multi-active traffic configuration information when the configuration is missing.
  • The scheduled tasks in this application can identify and parse the multi-active traffic configuration information, and both exclusive tasks and competitive tasks can automatically switch computer rooms to perform business operations.
  • the task configuration and anti-concurrency operations in this application are based on distributed caching, which reduces the performance consumption of the database.
  • Figure 1 is a system scene diagram provided by this application.
  • FIG. 2 is a flowchart of exclusive task processing provided by this application.
  • FIG. 3 is a flowchart of competitive task processing provided by this application.
  • FIG. 4 is a flowchart of the method in Embodiment 1 of the present application.
  • FIG. 5 is a flowchart of the method in Embodiment 2 of the present application.
  • Multi-active switching platform: a management platform developed to configure, manage and execute traffic switching of a multi-active data center. By maintaining the information of each application system and component in the platform and configuring switching steps and multi-scenario switching tasks, such as main-data-center-level switching or non-main-data-center-level switching, the platform implements the execution and management of single-data-center traffic switching and multi-active data center traffic switching, and undertakes the traffic switching task after a preset failure of the multi-active data center, ensuring that switching is timely, comprehensive, visible and controllable.
  • Cell: after segmentation according to a specified data dimension, the combination of the data of the minimum segmentation dimension and the data center. At the logical level, a Cell can complete all services on the data shards within the Cell. Once a user request has been assigned to a Cell according to the data segmentation dimension, the user's subsequent services are completely enclosed within that Cell.
  • A Cell can be a sub-database.
  • Data center (LDC): a collection unit composed of multiple Cells within which services can be closed.
  • For disaster recovery, the data centers of a multi-active data center, also called computer rooms, are usually geographically far away from each other.
  • Exclusive task: a task whose business data exists only in a certain Cell and is neither crossed with nor shared by other Cells.
  • Competitive task: a task whose business data is competed for by multiple Cells; to avoid inconsistencies caused by the same business data being processed separately in different data centers, the business data of competitive tasks must be controlled uniformly in a single data center. In this application, the data center that can process competitive tasks is called the main data center.
  • Traffic configuration information: information used to indicate the traffic distribution of each data center, including the identifier of each data center and its corresponding Cell set, for example the set of sub-database numbers corresponding to each data center, which indicates the set of sub-databases that each data center may operate on.
  • For example, the cache key is LdcInfo
  • and the value holds each data center's LDC identifier and the Cell set it is responsible for, e.g. [{"effectiveLdc":"NJYH","cellList":"0,2,4,6,8,10,12,14"},{"effectiveLdc":"NJGX_YG","cellList":"1,3,5,7,9,11,13,15"}].
  • If a system's business database has 16 sub-databases, the full traffic can be divided into 16 Cells. If all traffic is assigned to the main data center, the cellList value of the main data center is 0-15 and that of the sub data center is empty.
  • In a dual-data-center deployment where traffic is split in half, the cellList value of the main data center is the set of even numbers in 0-15,
  • and the cellList value of the sub data center is the set of odd numbers in 0-15, and so on; the numbers in the cellList configuration represent the sub-database numbers with write permission.
  • The traffic configuration information also includes the main data center configuration: the cache key is MasterLdc (master data center) and the value is the English abbreviation of the main data center; for example, if the main data center is the Nanjing Yuhua computer room, the value is NJYH.
  • This configuration is used by competitive tasks to determine whether the current server belongs to the main data center.
  • Environment variable: a variable named ldc is configured in the server environment variables;
  • its value is the English abbreviation of the data center where the current server is deployed, e.g. NJYH for a server deployed in the Nanjing Yuhua computer room.
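To make the configuration format above concrete, the following is a minimal sketch, in Python with redis-py, of how an application server could parse the LdcInfo and MasterLdc entries and decide whether it may process a task. It assumes the cache keys, cellList format and ldc environment variable described above; the connection parameters and the helper signature are illustrative assumptions, not part of the patent.

```python
import json
import os

import redis

# Connection parameters are illustrative assumptions.
r = redis.Redis(host="localhost", port=6379, decode_responses=True)


def can_process(task_is_competitive, task_cell=None):
    """Decide whether this server may process a task, per the traffic configuration."""
    local_ldc = os.environ.get("ldc")            # e.g. "NJYH", set at deployment time

    if task_is_competitive:
        # Competitive tasks are only processed in the main data center.
        return local_ldc == r.get("MasterLdc")

    # Exclusive tasks: look up the cellList of this server's own data center.
    for entry in json.loads(r.get("LdcInfo") or "[]"):
        if entry["effectiveLdc"] == local_ldc:
            cells = {int(c) for c in entry["cellList"].split(",") if c}
            if not cells:
                return False                     # empty traffic distribution: no permission
            # Without a specific sub-database, any non-empty cellList grants permission.
            return task_cell is None or task_cell in cells
    return False
```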
  • As shown in Figure 1, the system of the present application includes a multi-active data center (three data centers, i.e. computer rooms, are shown in Figure 1); each data center includes a multi-active switching platform, a task scheduling platform, an application cluster and a Redis distributed cache cluster. According to whether they handle competitive tasks, the data centers are divided into a main data center (host room) and sub data centers (sub computer rooms).
  • The multi-active switching platform is used to generate the traffic configuration information.
  • In this system, the multi-active switching platform of the main data center can generate the traffic configuration information and then synchronize it to the multi-active switching platforms of the sub data centers.
  • When an application server of the application cluster detects new traffic configuration information, it reads the traffic configuration information and synchronizes it to the Redis distributed cache cluster.
  • The task scheduling platform performs task scheduling and sends task scheduling instructions to each application server of the application cluster for task processing.
  • According to the task scheduling instruction, the application server reads the traffic configuration information from the Redis distributed cache cluster. If it cannot be read, the application server fetches the traffic configuration information directly from the multi-active switching platform and saves it to the Redis distributed cache cluster so that it can be read quickly next time.
  • The application server subsequently also reads the task configuration information from the Redis distributed cache cluster, and performs the related task processing based on the traffic configuration information and the task configuration information. This step is described in detail later.
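The fall-back read described above is a standard cache-aside pattern. A minimal sketch is given below; the HTTP endpoint of the multi-active switching platform and its response format are hypothetical, since the publication does not specify how the platform exposes the configuration.

```python
import json

import redis
import requests

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Hypothetical endpoint of the multi-active switching platform (assumption).
SWITCHING_PLATFORM_URL = "http://multi-active-switching-platform/api/traffic-config"


def get_traffic_config():
    """Read the traffic configuration from the Redis cache, falling back to the platform."""
    cached = r.get("LdcInfo")
    if cached is not None:
        return json.loads(cached)

    # Cache miss: ask the multi-active switching platform directly ...
    config = requests.get(SWITCHING_PLATFORM_URL, timeout=3).json()
    # ... and write it back so the next read is served from the cache.
    r.set("LdcInfo", json.dumps(config))
    return config
```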
  • As mentioned above, the multi-active switching platform automatically generates the traffic configuration information; this is the first problem to be solved in this application.
  • In this application, the multi-active switching platform monitors the data transmission status of each data center, such as the data transmission rate. When the monitoring indicates that a data center has a data transmission failure, or another event that triggers traffic switching occurs, the platform automatically generates traffic configuration information according to the preset rules.
  • Taking traffic configuration information that consists of the sets of sub-databases each data center may read and write as an example, the preset rule can be to allocate the sub-databases of the failed data center to the other data centers as evenly as possible while keeping their original traffic distribution unchanged, or to allocate the sub-databases of the failed data center to the data center with the least current traffic.
  • The preset rule can also be to redistribute all traffic among the remaining data centers.
  • Further, the allocation can also take the current status of the remaining non-faulty data centers into account.
  • For example, if the business volume of certain data centers surges during certain events, traffic can be kept away from such data centers as much as possible.
  • In addition, because of the existence of competitive tasks, if the failed data center is the data center responsible for competitive tasks, i.e. the main data center,
  • a new main data center for competitive tasks also needs to be specified in the traffic configuration information.
  • In short, the traffic configuration rules can be set in the multi-active switching platform in advance, so that the platform automatically generates the traffic configuration information according to these rules and the monitored data transmission status of each data center.
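As one possible, non-authoritative realization of the preset rules above, the sketch below spreads the failed data center's sub-databases over the surviving data centers as evenly as possible and re-elects a main data center for competitive tasks when necessary. The data structures and the tie-breaking rule are assumptions for illustration; a real platform would additionally apply the traffic thresholds and load conditions mentioned above.

```python
def redistribute(allocation, failed, master):
    """Spread the failed data center's cells over the survivors, one at a time.

    allocation maps a data center identifier to its list of sub-database (cell) numbers.
    Returns the new allocation and the (possibly re-elected) main data center identifier.
    """
    orphaned = allocation.get(failed, [])
    survivors = {dc: list(cells) for dc, cells in allocation.items() if dc != failed}

    # Give each orphaned cell to the currently least-loaded survivor,
    # which keeps the resulting distribution as even as possible.
    for cell in orphaned:
        target = min(survivors, key=lambda dc: len(survivors[dc]))
        survivors[target].append(cell)

    # If the failed data center was the main one, a new main data center must be
    # named so that competitive tasks are still handled in a single place.
    new_master = master if master != failed else min(survivors)  # tie-break is arbitrary
    return survivors, new_master


# Example: main data center NJYH fails; its cells are spread over the other two.
new_alloc, new_master = redistribute(
    {"NJYH": [0, 2, 4, 6], "NJGX_YG": [1, 3, 5, 7], "BJHD": []},
    failed="NJYH",
    master="NJYH",
)
```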
  • When each application server is set up, the data center it belongs to is configured in the application server's environment variables; for example, if the value of an application server's environment variable is "Beijing Haidian", that application server belongs to the data center named "Beijing Haidian".
  • The application server parses the traffic configuration information to obtain the traffic distribution corresponding to each data center, such as the set of sub-database numbers with read and write permissions corresponding to each data center.
  • For example, the database has 16 sub-databases, numbered 1-16. Parsing the traffic configuration information determines that the main data center corresponds to sub-databases 1-7, the first sub data center to sub-databases 8-12, and the second sub data center to sub-databases 13-16. If the application server belongs to the first sub data center, it has read and write permissions for sub-databases 8-12, that is, it can carry traffic tasks related to sub-databases 8-12.
  • If parsing the traffic configuration information determines that the traffic distribution of the data center where the application server is located is empty, i.e. it does not correspond to any sub-database, the application server has no read or write permission for any sub-database and cannot perform any tasks; in this case the process exits directly.
  • As mentioned above, tasks are divided into exclusive tasks and competitive tasks.
  • For an exclusive task, as shown in Figure 2, the traffic configuration of the current data center in the multi-active traffic configuration information is parsed to determine whether the current task is operable: if the traffic configuration, e.g. the set of sub-database numbers, has values, the task can be operated; if it is an empty set, the task cannot be operated.
  • A competitive task is handled by the designated main data center, so when the current task is competitive it is also necessary to judge whether the application server belongs to the main data center. As shown in Figure 3, the traffic configuration information therefore also carries the main data center identifier; the application server obtains the identifier of its own data center from the environment variable and compares it with the main data center identifier. If they match, the application server belongs to the main data center and may execute the competitive task; if not, it may not, and the process exits directly.
  • Based on the task type, the traffic distribution of its data center and the main data center identifier, the application server can thus pre-determine whether it can perform the current task, and exits if it cannot.
  • For a task that the preliminary judgment allows, the application server further obtains the task configuration and loads and executes the specific task, specifically:
  • the application server uses "JOB_QUEUE:" plus the task name as the KEY to obtain a task from the head of the Redis cache task queue. If no task is obtained, it uses "JOB_TASKPENDING:" plus the task name as the KEY to obtain, from the Redis full scheduling task cache, the task configuration information of the sub-databases for which it has write permission, and loads these entries one by one to the tail of the Redis cache task queue. If the task configuration information cannot be found in the Redis full cache according to the sub-database number and task name, the database is read: the task is queried from the public library, loaded into the Redis full scheduling task cache, and further synchronized to the Redis cache task queue.
  • The task configuration information contains the sub-database number corresponding to the task. When obtaining the task configuration information of the writable sub-databases from the Redis full scheduling task cache, the application server can take the intersection of its permitted sub-database numbers and the sub-database numbers of the tasks, and load only the corresponding tasks.
  • If a task is obtained from the head of the Redis queue with "JOB_QUEUE:" plus the task name as the KEY, the application server judges whether the obtained task is currently within the operable range, to guard against a computer-room traffic switch that happened after the task was loaded into the queue. If it is within the operable range, the task status in the task configuration cache is updated from pending to processing; if the update succeeds, the business data of that sub-database is processed and the status is then updated back to pending. If the update fails, or the status is already processing, or the obtained task is not within the operable range, the application server continues to fetch tasks from the head of the Redis queue until the queue is exhausted.
  • An exclusive task judges whether it is within the operable range according to the sub-database number of the task and the cellList configuration,
  • and a competitive task judges whether it is within the operable range according to the host-room LDC and the LDC in the current server's environment variables.
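A condensed sketch of this queue-driven loading and re-checking loop is given below, again in Python with redis-py. The JOB_QUEUE: and JOB_TASKPENDING: key prefixes follow the description; the shape of the cached task entries (a JSON object carrying a cell field), the single-key model of the full scheduling task cache, and the process() placeholder are assumptions made only for illustration.

```python
import json
import os

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)


def my_cells():
    """Sub-database numbers this server's data center may read and write."""
    local_ldc = os.environ.get("ldc")
    for entry in json.loads(r.get("LdcInfo") or "[]"):
        if entry["effectiveLdc"] == local_ldc:
            return {int(c) for c in entry["cellList"].split(",") if c}
    return set()


def run(task_name):
    queue_key = f"JOB_QUEUE:{task_name}"
    # Full scheduling task cache, modelled here as one JSON list under one key (assumption).
    pending_key = f"JOB_TASKPENDING:{task_name}"

    task_json = r.lpop(queue_key)
    if task_json is None:
        # Queue empty: load the pending task configurations whose sub-database
        # intersects this server's write permissions into the queue tail.
        pending = json.loads(r.get(pending_key) or "[]")
        to_load = [t for t in pending if t["cell"] in my_cells()]
        if to_load:
            r.rpush(queue_key, *[json.dumps(t) for t in to_load])
        return

    task = json.loads(task_json)
    # Re-check operability, in case traffic was switched after the task was queued.
    if task["cell"] in my_cells():
        process(task)


def process(task):
    print("processing sub-database", task["cell"])   # placeholder for the business logic
```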
  • Concurrent operations can also occur in the above process. To address this, the present application provides the following Redis-cache-based method of preventing concurrency, which specifically includes:
  • the application server uses "JOB_QUEUE:" plus the task name as the KEY to obtain a task configuration from the head of the Redis task queue, and judges whether a task configuration was obtained.
  • If a task configuration was obtained, it first judges whether the task configuration is currently operable, to avoid problems caused by a computer-room switch after the task was loaded into the queue: an exclusive task is judged by parsing the cellList configuration of the current computer room's LDC in the multi-active configuration, and a competitive task is judged by comparing the host-room LDC in the multi-active configuration with the LDC in the current server's environment variables.
  • If the task is operable, a Redis shared lock is set with the task name plus the sub-database number as the KEY and a timeout based on the current system time plus a timeout constant (in milliseconds); if setting the lock fails, processing of the current task configuration ends and the next task configuration is obtained from the task queue. If the lock is set successfully, the task status in the full task cache is checked: if the status is pending, it is updated to processing. If the update fails, the shared lock is released, processing of the current task configuration ends, and the next task configuration is obtained from the task queue. If the update succeeds, the shared lock is released and the specific business logic of the task is executed; when the business logic finishes, the task status is changed back to pending, processing of the current task configuration ends, and the next task configuration is obtained from the task queue.
  • If no task configuration was obtained from the queue and the task configuration needs to be loaded, the application server uses "JOB_TASK_LOAD_LOCK:" plus the task name as the KEY and the current system time plus an expiry constant (in milliseconds) as the value to perform a SETNX operation on Redis, adding a shared lock that prevents concurrent scheduling from repeatedly loading pending tasks into the Redis task queue.
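The load lock can be sketched as follows. The SETNX call with a timestamp-plus-expiry value follows the description above, and the stale-lock recovery via GETSET follows the detailed description later in this publication; the lock lifetime constant and the function names are assumptions for illustration.

```python
import time

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

INVALID_MS = 60_000  # lock lifetime; the concrete constant is an assumption


def try_acquire_load_lock(task_name):
    """SETNX-based shared lock guarding the 'load pending tasks into the queue' step."""
    key = f"JOB_TASK_LOAD_LOCK:{task_name}"
    deadline = str(int(time.time() * 1000) + INVALID_MS)

    if r.setnx(key, deadline):
        return True                       # lock acquired

    # Lock already held: check whether it has expired (its holder may have died).
    current = r.get(key)
    if current is not None and int(current) > int(time.time() * 1000):
        return False                      # still valid, another worker is loading

    # Expired: try to take it over via GETSET; if another worker slipped in between
    # the GET and the GETSET, the returned value differs and we back off.
    previous = r.getset(key, deadline)
    return previous == current


def release_load_lock(task_name):
    r.delete(f"JOB_TASK_LOAD_LOCK:{task_name}")
```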
  • It can be seen that in this application, multi-active data center traffic switching is changed from manual modification of configuration files to real-time switching in which the system automatically recognizes the switching platform's instructions, which improves system availability and reduces both the business blocking time and the huge economic losses caused by traffic switching after a failure.
  • Task configuration read-write and anti-concurrency operations are based on Redis cache, which greatly reduces the performance consumption of the database, increases the upper limit of the concurrency of tasks, and improves the execution speed of tasks.
  • Querying pending tasks is based on the Redis queue, which greatly reduces the number of times the system traverses the full scheduling task cache configuration, greatly reduces the number of Redis accesses, and improves task execution speed.
  • Embodiment 1 of the present application provides a method for traffic switching based on a multi-active data center. As shown in FIG. 4, the method includes:
  • S41. After receiving a task scheduling instruction, the application server performs the operation of obtaining traffic configuration information;
  • the traffic configuration information is generated according to preset rules by the multi-active switching platform when it judges, from the data transmission status information of each data center, that a data center has a data transmission failure;
  • the multi-active data center has at least two data centers;
  • the traffic configuration information is used to indicate the traffic distribution corresponding to each data center;
  • the application server obtains the traffic configuration information through the following steps:
  • the application server reads the cache and determines whether the traffic configuration information exists in the cache;
  • if it does not exist, the application server reads the traffic configuration information from the multi-active switching platform.
  • In addition, when the application server detects that the traffic configuration information on the multi-active switching platform has changed, it reads the changed traffic configuration information and synchronizes it to the cache.
  • S42. The application server parses the traffic configuration information to obtain the traffic distribution corresponding to the data center where it is located.
  • S43. The application server judges, according to the traffic distribution and the type information of the currently pending task, whether it has the processing authority for that task.
  • This step specifically includes: if the application server determines that the current task to be processed is an exclusive task, judging whether the traffic distribution corresponding to the data center where the application server is located is empty;
  • if it is not empty, the application server has the processing authority for the currently pending task.
  • The traffic distribution may include, for each data center, a set of sub-database numbers with read and write permissions;
  • the judging whether the traffic distribution corresponding to the data center where the application server is located is empty then includes: judging whether the set of sub-database numbers with read and write permissions corresponding to the data center where the application server is located is empty.
  • S44. If the application server has the processing authority, it loads the task for task processing.
  • In a preferred embodiment, the multi-active data center has a main data center, and the traffic configuration information further includes the main data center identifier;
  • the step of the application server judging, according to the traffic distribution and the type information of the task currently to be processed, whether it has the processing authority for that task then includes:
  • if the application server determines that the current task to be processed is a competitive task, judging whether the data center identifier corresponding to the application server is the same as the main data center identifier;
  • if they are the same, the application server has the processing authority for the task currently to be processed.
  • In a preferred embodiment, the traffic distribution includes, for each data center, a set of sub-database numbers with read and write permissions;
  • the step of the application server loading a task for task processing then includes:
  • the application server searches the cached task queue for the current task to be processed, and if it is found, judges, according to the sub-database number corresponding to the task and the sub-database numbers for which the application server has read and write permissions based on the data center where it is located, whether the application server has the authority to process the task;
  • if it has the authority, the application server sets the status of the sub-database number corresponding to the current task to processing and saves it in the task configuration information;
  • when the task processing is completed, the application server changes the status of the sub-database number corresponding to the task back to pending and saves it in the task configuration information.
  • Embodiment 2 of the present application provides a traffic switching method based on a multi-active data center, which is applied to a multi-active switching platform. As shown in FIG. 5, the method includes:
  • S51. The multi-active switching platform acquires the data transmission status information of each data center; the multi-active data center has at least two data centers.
  • S52. The multi-active switching platform makes a judgment based on the status information and preset conditions; when it judges that traffic switching is required, it generates traffic configuration information according to the preset rules, so that after receiving a task scheduling instruction the application server obtains the traffic configuration information and, in combination with the obtained task configuration information, loads tasks for task processing; the traffic configuration information is used to indicate the traffic distribution corresponding to each data center.
  • The step of the multi-active switching platform making a judgment based on the status information and preset conditions and, when traffic switching is required, generating traffic configuration information according to the preset rules includes:
  • when the multi-active switching platform determines from the status information that a data center has a data transmission failure, it performs traffic distribution according to the current traffic of the data centers that have not failed, the traffic thresholds, and the rule that traffic corresponding to competitive tasks is allocated to the same data center, so as to generate traffic configuration information that includes the traffic distribution corresponding to each data center and the identifier of the primary data center that carries the competitive tasks.
  • the method further includes:
  • the multi-active switching platform synchronizes the traffic configuration information to a cache, so that the application server obtains the traffic configuration information from the cache;
  • when the multi-active switching platform receives a traffic configuration information acquisition request from the application server, it sends the latest traffic configuration information to the application server.
  • Embodiment 3 of the present application provides a traffic switching device based on a multi-active data center, and the device includes:
  • a traffic configuration information obtaining unit, configured to perform the operation of obtaining traffic configuration information after receiving a task scheduling instruction;
  • the traffic configuration information is generated according to preset rules by the multi-active switching platform when it determines, from the data transmission status information of each data center, that a data center has a data transmission failure;
  • the multi-active data center has at least two data centers;
  • the traffic configuration information is used to indicate the traffic distribution corresponding to each data center.
  • The traffic configuration information obtaining unit is specifically configured to read the cache and determine whether the traffic configuration information exists in the cache, and, if it does not exist, to read the traffic configuration information from the multi-active switching platform.
  • The parsing unit is configured to parse the traffic configuration information to obtain the traffic distribution corresponding to the data center where the device is located.
  • The authority judging unit is configured to judge, according to the traffic distribution and the type information of the current pending task, whether the device has the processing authority for that task.
  • The authority judging unit is specifically configured to, when it determines that the current task to be processed is an exclusive task, judge whether the traffic distribution corresponding to the data center where the application server is located is empty, and, if it is not empty, determine that it has the processing authority for the currently pending task.
  • The task processing unit is configured to obtain task configuration information when it is determined that the processing authority exists, and to load tasks in combination with the traffic distribution for task processing.
  • Embodiment 4 of the present application provides a traffic switching device based on a multi-active data center, the device including:
  • a data transmission status information acquisition unit, configured to acquire the data transmission status information of each data center; the multi-active data center has at least two data centers;
  • a traffic configuration information unit, configured to make a judgment based on the status information and preset conditions and, when traffic switching is required, generate traffic configuration information according to the preset rules, so that after receiving a task scheduling instruction the application server obtains the traffic configuration information and, in combination with the obtained task configuration information, loads tasks for task processing; the traffic configuration information is used to indicate the traffic distribution corresponding to each data center.
  • The traffic configuration information unit is specifically configured to, when it determines from the status information that a data center has a data transmission failure, perform traffic distribution according to the current traffic of the non-failed data centers, the traffic thresholds, and the rule that traffic corresponding to competitive tasks is allocated to the same data center, so as to generate traffic configuration information that includes the traffic distribution corresponding to each data center and the identifier of the primary data center that carries the competitive tasks.
  • the device further includes:
  • a traffic configuration information synchronization unit configured to synchronize the traffic configuration information to a cache, so that the application server can obtain the traffic configuration information from the cache;
  • a traffic configuration information sending unit, configured to send the latest traffic configuration information to the application server when a traffic configuration information acquisition request from the application server is received.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Telephonic Communication Services (AREA)

Abstract

A traffic switching method and device based on a multi-active data center. The method includes: after receiving a task scheduling instruction, an application server performs an operation of obtaining traffic configuration information (S41); the traffic configuration information is generated according to preset rules by a multi-active switching platform when it judges, from the data transmission status information of each data center, that a data center has a data transmission failure; the multi-active data center has at least two data centers; the traffic configuration information is used to indicate the traffic distribution corresponding to each data center; the application server parses the traffic configuration information to obtain the traffic distribution corresponding to the data center where it is located (S42); the application server judges, according to the traffic distribution and the type information of the task currently to be processed, whether it has the processing authority for that task (S43); if so, the application server loads the task for task processing (S44). The method achieves automatic traffic switching when a multi-active data center fails.

Description

一种基于多活数据中心的流量切换方法及装置 技术领域
本申请涉及数据处理领域,特别是涉及一种基于多活数据中心的流量方法、装置。
背景技术
容灾系统是指为计算机信息系统提供的一种能应付各种数据灾难的系统。当计算机系统在遭受如火灾、水灾、地震、战争等不可抗拒的自然灾难以及计算机犯罪、计算机病毒、掉电、网络/通信失败、硬件/软件错误和人为操作错误等人为灾难,导致数据传输中断、数据丢失等各类问题时,容灾系统将保证用户数据的安全性。
目前的容灾多采用主备模式,即在远离计算机系统运行的地方建立一个容灾备份中心,该容灾备份中心不承担任何线上业务流量,只是定期将计算机系统中的数据备份出来存放到容灾备份中心,当灾难发生导致系统瘫痪后,再通过这些备份的数据在容灾备份中心恢复系统的运行。
由于容灾备份中心不承载真实的线上业务流量,灾难发生时,我们无法断言该备份中心是可用的,而且由于需要人工启动备份系统,因此对系统维护人员的要求较高,且人工启动对灾难的响应不够迅速。延迟期间还会导致其无法记录停机期间的各种数据。
为应对主备模式的缺点,多活策略应运而生成为解决容灾问题的新技术。
所谓多活其实就是多个站点(位于较远距离的机房)设置相同的数据库,同时承载业务流量,可以根据业务属性如用户ID、地域等决定站点之间怎么分担流量,如将ID1-ID49的用户的数据处理请求分配至第一个站点处理,将ID50-ID99的用户的数据处理请求分配至第二个站点处理。当第一个站点故障时,可以较快(分钟级)且平滑的切换到第二个站点,理想情况下,对业务的损害是非常小的。相对主备模式,多活策略中的每个站点实时具备承载业务流量的能力,其稳定性是可靠的。
当然上述流量切换不一定只出现在数据中心故障时,有时候也会基于其他情况进行流量切换,比如某一特殊时期某个数据中心的任务量大增,则需要将其中一部分分配出去等。
目前,在多活策略下,当需要切换流量,比如其中一个站点出现故障时,故障消息通知到维护人员,则维护人员进行流量切换信息配置,启动流量切换流程,进行站点流量切换。
人工配置需要耗费一定时间,尽管相比主备模式更加迅速,但这段时间的延迟也足以使得很多场景下如电商平台等系统产生大量的数据,这些数据将无法被保存和恢复。
发明内容
本申请提供了一种基于多活数据中心的流量方法、装置,以解决现有技术中多活数据中心流量切换仍存在延迟,造成延时时间内的数据丢失的问题。
本申请提供了如下方案:
第一方面提供一种基于多活数据中心的流量切换方法,所述方法包括:
应用服务器在接收到任务调度指令后,执行获取流量配置信息操作;所述流量配置信息为多活切换平台在根据各数据中心的数据传输状态信息判断到有数据中心需要流量切换时按照预置的规则而生成;所述多活数据中心具有至少两个数据中心;所述流量配置信息用以指示每个数据中心对应的流量分配;
所述应用服务器解析所述流量配置信息,获得所在数据中心对应的流量分配;
所述应用服务器根据所述流量分配和当前待处理任务的类型信息判断所述应用服务器是否具有所述当前待处理任务的处理权限;
若有,则所述应用服务器加载任务进行任务处理。
优选的,所述应用服务器通过如下步骤获取所述流量配置信息:
所述应用服务器读取缓存并判断所述缓存中是否存在所述流量配置信息;
若不存在,则所述应用服务器从所述多活切换平台读取所述流量配置信息。
优选的,所述方法还包括:
所述应用服务器在监听到所述多活切换平台的所述流量配置信息发生变化时,读取变化后的流量配置信息并将所述变化后的流量配置信息同步到所述缓存中。
优选的,所述应用服务器根据所述流量分配和当前待处理的任务的类型信息判断所述应用服务器是否具有所述当前待处理的任务的处理权限包括:
若所述应用服务器判断到当前待处理任务为独占型任务,则判断所述应用服务器所在的数据中心对应的流量分配是否为空;
若不为空,则所述应用服务器具有所述当前待处理任务的处理权限。
优选的,所述流量分配包括每一数据中心对应的具有读写权限的分库号的集合;
所述判断所述应用服务器所在的数据中心对应的流量分配是否为空包括:
判断所述应用服务器所在的数据中心对应的具有读写权限的分库号的集合是否为空。
优选的,所述多活数据中心具有一主数据中心,所述流量配置信息还包括所述主数据中心标识;
所述应用服务器根据所述流量分配和当前待处理的任务的类型信息判断所述应用服务器是否具有所述当前待处理的任务的处理权限包括:
若所述应用服务器判断到所述当前待处理任务为竞争型任务,则判断所述应用服务器对应的数据中心标识是否与所述主数据中心标识相同;
若相同,则所述应用服务器具有所述当前待处理的任务的处理权限。
优选的,
所述流量分配包括每一数据中心对应的具有读写权限的分库号的集合;
所述应用服务器加载任务进行任务处理包括:
所述应用服务器从所述缓存的任务队列查找所述当前待处理任务,若查询到,则根据所述当前待处理任务对应的分库号和所述应用服务器根据所在的数据中心具有读写权限的分库号判断所述应用服务器是否具有处理所述当前待处理任务的权限;
若有权限,则所述应用服务器将所述当前待处理任务对应的分库号的状态确定为处理中并保存在任务配置信息中;
若任务处理完成,则所述应用服务器将所述当前待处理任务对应的分库号的状态更改为待处理并保存在所述任务配置信息中。
第二方面提供一种基于多活数据中心的流量切换方法,所述方法包括:
多活切换平台获取各数据中心的数据传输状态信息;所述多活数据中心具有至少两个数据中心;
所述多活切换平台根据所述状态信息与预置的条件进行判断,当判断到需要进行流量切换时,则按照预置的规则生成流量配置信息以便应用服务器在接收到任务调度指令后,获取所述流量配置信息并结合获得的任务配置信息加载任务进行任务处理;所述流量配置信息用以指示每个数据中心对应的流量分配。
优选的,
所述多活切换平台根据所述状态信息与预置的条件进行判断,当判断到需要进行流量切换时,则按照预置的规则生成流量配置信息包括:
所述多活切换平台根据所述状态信息判断到有数据中心出现数据传输故障时按照未出现故障的数据中心的当前流量、流量阈值以及将竞争型任务对应的流量分配至同一个数据中心的规则进行流量分配生成包括所述各数据中心对应的流量分配以及承载所述竞争型任务的主 数据中心的标识的流量配置信息。
优选的,所述方法还包括:
所述多活切换平台将所述流量配置信息同步至缓存中,以便应用服务器从所述缓存中获取所述流量配置信息;
所述多活切换平台在接收到所述应用服务器的流量配置信息获取请求时,将最新的流量配置信息发送至所述应用服务器。
第三方面提供一种基于多活数据中心的流量切换装置,所述装置包括:
获取流量配置信息单元,用于在接收到任务调度指令后,执行获取流量配置信息操作;所述流量配置信息为多活切换平台在根据各数据中心的数据传输状态信息判断到有数据中心需要流量切换时按照预置的规则而生成;所述多活数据中心具有至少两个数据中心;所述流量配置信息用以指示每个数据中心对应的流量分配;
解析单元,用于解析所述流量配置信息,获得所在数据中心对应的流量分配;
权限判断单元,用于根据所述流量分配和当前待处理任务的类型信息判断是否具有所述当前待处理任务的处理权限;
任务处理单元,用于在判断到有处理权限时,获取任务配置信息,并结合所述流量分配加载任务进行任务处理。
第四方面提供一种基于多活数据中心的流量切换装置,所述装置包括:
数据传输状态信息获取单元,用于获取各数据中心的数据传输状态信息;所述多活数据中心具有至少两个数据中心;
流量配置信息单元,用于根据所述状态信息与预置的条件进行判断,当判断到需要进行流量切换时,则按照预置的规则生成流量配置信息以便应用服务器在接收到任务调度指令后,获取所述流量配置信息并结合获得的任务配置信息加载任务进行任务处理;所述流量配置信息用以指示每个数据中心对应的流量分配。
根据本申请提供的具体实施例,本申请公开了以下技术效果:
本申请的技术方案能够在多活数据中心场景下,实时自动生成、获取多活流量配置信息,并且在配置缺失场景下能够主动补偿获取多活流量配置信息。
本申请中调度任务能够识别解析多活流量配置信息,支持独占任务和竞争任务自动切换机房执行业务操作。
本申请中任务配置与防并发操作基于分布式缓存,降低数据库的性能消耗。
当然本申请产品只需具有其中一种效果即可。
附图说明
为了更清楚地说明本申请实施例或现有技术中的技术方案,下面将对实施例中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是本申请提供的系统场景图;
图2是本申请提供的独占型任务处理流程图;
图3是本申请提供的竞争型任务处理流程图;
图4是本申请实施例1方法流程图;
图5是本申请实施例2方法流程图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员所获得的所有其他实施例,都属于本申请保护的范围。
为使本申请更加容易理解,首先对本申请中出现的名词进行解释。
多活切换平台:是为配置、管理、执行多活数据中心流量切换开发的管理平台,通过将各应用系统和组件信息维护到平台内,配置切换步骤与多场景切换任务如主数据中心级别切换或非主数据中心级别切换等,实现单数据中心流量切换和多活数据中心流量切换的执行与管理,承担多活数据中心发生预设故障后的流量切换任务,保障切换的及时、全面、可视可控。
Cell:按照指定的数据维度进行切分之后,最小切分维度的数据与数据中心的集合,在逻辑层面上,一个Cell可以完成本Cell内数据分片上的所有业务。当一个用户请求按照数据切分的维度被确定所属Cell之后,该用户的后续业务被完全封闭在一个Cell之内。一个Cell可以是一个分库。
数据中心LDC,是由多个业务可封闭的Cell组成的集合单元。为实现容灾,多活数据中心的各数据中心也称机房,其相互之间的地理位置通常相聚较远。
独占型任务:处理的业务数据仅在某个Cell存在,其他Cell不与交叉和共享。
竞争型任务:处理的业务数据存在各个Cell相互竞争,为避免该业务数据在不同数据中心被分别处理,造成数据的不一致,竞争型任务的业务数据需要在同一个数据中心统一控制。本申请中称可处理竞争型任务的数据中心为主数据中心。
流量配置信息:用于指示各数据中心流量分配的信息,包括各数据中心的标识及其对应的Cell集合如每个数据中心对应的分库号集合,表示每个数据中心可操作的分库数据集合。
如缓存key为Ldclnfo,值为各个数据中心LDC的值以及负责的Cell集合。示例:[{″effectiveLdc″:″NJYH″,″cellList″:″0,2,4,6,8,10,12,14″},{″effectiveLdc″:″NJGX_YG″,″cellList″:″1,3,5,7,9,11,13,15″}],如果一个系统业务库有16个分库,则全流量可划为16份Cell,如果所有流量都划分在主数据中心的话,则主数据中心的cellList配置的值为0-15,子数据中心为空,如果双数据中心情况下,划分1/2流量,则主数据中心的cellList配置的值为0-15的偶数集合,子数据中心的cellList配置的值则为0-15的奇数集合,以此类推,cellList配置中的数值就代表了有写权限的分库号。
流量配置信息还包括主数据中心所在数据中心配置:缓存key为MasterLdc(主数据中心),值为主数据中心英文简写,如主数据中心为南京雨花机房,则值为NJYH,该配置用于竞争型任务判断当前服务器是否属于主数据中心。
环境变量:变量名称ldc,配置在服务器环境变量中,值为当前服务器部署所在数据中心的英文简写,如部署在南京雨花机房,则配置NJYH。
如图1所示,本申请的系统包括多活数据中心(图1中示出了三个数据中心即机房)、每个数据中心包括多活切换平台、任务调度平台、应用集群以及redis分布式缓存集群。对应是否处理竞争型任务,数据中心有主数据中心(主机房)和子数据中心(子机房)之分。
多活切换平台用于生成流量配置信息,本系统中可由主数据中心的多活切换平台生成流量配置信息后同步至子数据中心的多活切换平台。应用集群的应用服务器在监测到有新的流量配置信息时会读取流量配置信息并将流量配置信息同步至redis分布式缓存集群。任务调度平台用于进行任务调度,将任务调度指令下发至应用集群的各个应用服务器进行任务处理。应用服务器根据任务调度指令,从redis分布式缓存集群读取流量配置信息,若读取不到,则应用服务器直接去多活切换平台读取流量配置信息并保存至redis分布式缓存集群,方便下次快速读取。应用服务器后续还会从redis分布式缓存集群从读取任务配置信息,并根据流量配置信息以及任务配置信息执行相关任务处理。后续将详细介绍该步骤。
上边提到多活切换平台自动生成流量配置信息,这是本申请要解决的第一个问题。本申请中利用多活切换平台,对各个数据中心的数据传输状态进行监控,比如数据传输速率等,当根据监控判断到有数据中心的数据传输出现故障时或者其他触发流量切换的事件时,按照预置的规则自动生成流量配置信息。
以流量配置信息为对各数据中心的可读写操作的分库集合为例,预置的规则可以是在其 他数据中心原流量分配不变的情况下,将故障数据中心的分库尽量均匀的分配至其他数据中心,也可以是将故障数据中心的分库分配至目前流量最小的数据中心。预置的规则也可以是对所有流量在剩余的数据中心中重新进行分配。
进一步的,还可以结合非故障剩余数据中心的当前状态进行操作,比如某些数据中心在某些事件业务量会激增,则可尽量不再分配流量至此类数据中心。
另外,基于前述的竞争型任务的存在,若出现故障的数据中心为负责竞争型任务的数据中心即主数据中心,则还需要在流量配置信息中为竞争性任务指定一个新的主数据中心。
总之,流量配置的规则可以预先设置在流量切换平台,使得平台根据该规则和监控的各数据中心的数据传输状态自动生成流量配置信息。
之后涉及各数据中心的应用服务器如何识别该流量配置信息并基于识别结果执行任务处理。
首先我们可以基于应用服务器所属的流量分配是否为空来初步判断应用服务器是否可以执行当前任务。
每个应用服务器在设置之初,就会配置其属于哪个数据中心的信息,该信息被配置在应用服务器的环境变量值中,比如一应用服务器的环境变量的值为“北京海淀”那么该应用服务器就属于名为“北京海淀”的数据中心。
应用服务器对流量配置信息进行解析会得到各个数据中心对应的流量分配,如各个数据中心对应的具有读写权限的分库号集合。
比如数据库一共有16个分库,编号分别为1-16。解析流量配置信息确定主数据中心对应分库1-7,第一个子数据中心对应分库8-12,第二个子数据中心对应分库13-16。若应用服务器属于第一个子数据中心,则该应用服务器对分库8-12具有读写权限,即其可以承载有关分库8-12的流量任务。
若解析流量配置信息确定应用服务器所述的数据中心流量分配为空,即不对应任何分库,则表明应用服务器不具有任何分库的读写权限,无法执行任何任务,此时直接退出流程。
前述提及任务分为独立性任务和竞争型任务,对于独立性任务,如图2所示,通过解析多活流量配置信息中当前数据中心的流量配置来判断当前任务是否可操作,如果流量配置如分库号有值则可操作,如果流量配置如分库号为空集合则不可操作。
竞争型任务由特殊的主数据中心进行处理。因此在当前待处理任务的类型为竞争型任务时,还需判断应用服务器是否是主数据中心的服务器。如图3所示,对应该需求,流量配置信息中还设置了主数据中心的标识。应用服务器基于环境变量获得所述数据中心的标识,与 该主数据中心的标识进行对比,若一致,则说明应用服务器为主数据中心的服务器,可用于执行竞争型任务,若不一致,则说明应用服务器不是主数据中心的服务器,不可用于执行竞争型任务。当前任务为竞争型任务时,可直接退出流程。
应用服务器通过上述任务类型以及所述数据中心的流量分配、主数据中心标识的信息可预先判断是否可以执行当前的任务,若不可以则退出。
对于初步判断有权限执行当前任务的情况,应用服务器进一步获取任务配置,加载执行具体的任务,具体的:
应用服务器将JOB_QUEUE:任务名称作为KEY从Redis缓存任务队列的队头获取任务,如果未获取到任务,则将JOB_TASKPENDING:任务名称作为KEY从Redis全量调度任务缓存中获取有写权限分库的任务配置信息并一一加载至Redis缓存任务队列的队尾,如果根据分库号以及任务名称从Redis全量缓存中未能查到任务配置信息,则读数据库,从公共库中查询任务并加载至Redis全量调度任务缓存中并进一步同步至Redis缓存任务队列。任务配置信息中包含任务对应的分库号,从Redis全量调度任务缓存中获取有写权限分库的任务配置信息时可结合应用服务器的权限分库号,取与任务对应的分库号的交集对应的任务进行加载。
如果根据JOB_QUEUE:任务名称作为KEY从Redis队列的队头获取到任务,则判断获取到的任务当前是否在可操作范围内,防止将任务加载至队列后发生机房流量切换。如果在可操作范围内,则将任务配置缓存中的任务状态从待处理更新为处理中,更新成功则处理该分库的业务数据,处理完则将任务状态更新为待处理,如果更新失败或者任务状态已为处理中或者当前获取的任务不在可操作范围,则继续从Redis队列的队头获取任务,直至Redis队列中的消息消耗完毕。独占型任务根据任务的分库号以及CellList配置判断是否在可操作范围,竞争型任务根据主机房LDC与当前服务器环境变量中的LDC判断是否在可操作范围内。
上述过程中,同样会出现并发操作的问题,对此,本申请提供以下方法,基于Redis缓存防止并发操作,具体包括:
应用服务器将JOB_QUEUE:任务名称作为KEY从Redis任务队列的队头获取任务配置,并判断是否获取到任务配置。
如果获取到任务配置:
判断该任务配置当前是否可操作,避免加载至任务队列后机房切换带来问题,独占型任务通过解析多活配置中当前机房LDC的CellList配置判断当前任务是否可操作,竞争型任务通过解析多活配置中主机房的LDC配置和当前服务器环境变量中的LDC配置判断当前任务是否可操作。
1、如果不可操作,则结束当前任务处理,从任务队列获取下一条任务配置继续执行。
2、如果可操作,将任务名称+分库号作为KEY设置Redis共享锁,超时时间为当前系统时间+超时定值(毫秒),具体包括:
2.1、如果设置共享锁失败,则结束当前任务处理,从任务队列获取下一条任务配置继续执行。
2.2、如果设置共享锁成功,则从全量调度任务缓存中获取该任务配置对应缓存:
2.21、如果从全量调度任务缓存未能获取到该任务配置对应的缓存,则查询公共库该分库的任务配置,如果查到则加载至全量任务缓存。
2.22、判断全量任务缓存中的任务状态:
2.221、如果状态为待处理,则更新状态为处理中。如果更新失败,则释放共享锁结束当前任务配置处理,从任务队列获取下一条任务配置继续执行。如果更新成功,则释放共享锁,并且执行任务对应的具体业务逻辑,业务逻辑执行结束,将任务状态改为待处理,结束当前任务配置处理,从任务队列获取下一条任务配置继续执行。
2.222、如果状态为处理中,则释放共享锁,结束当前任务配置处理,从任务队列获取下一条任务配置继续执行。
如果未能获取到任务配置
判断是否需要加载任务配置(如果该任务是第一次从队列获取任务配置为空则加载任务配置,如果该任务之前获取过任务配置,最后一次获取任务配置为空则不加载,防止任务一直调度执行无法结束),如果不需要加载任务配置则退出,如果需要加载任务配置,执行如下步骤:
1、将JOB_TASK_LOAD_LOCK:任务名称作为KEY,将当前系统时间+失效时间定值(毫秒)作为Value对Redis进行setnx操作加共享锁,防止并发调度导致重复加载待处理任务至redis任务队列。
1.1、如果设置共享锁失败,则检查共享锁是否失效,防止解锁异常导致任务一直处于锁定状态,如果共享锁的值大于当前系统时间则未失效,如果共享锁的值小于当前系统时间则失效:
1.11、如果共享锁未失效,则退出;
1.12、如果共享锁失效,则获取共享锁的值①,然后对该共享锁进行Redis的GetSet操作②,新Value为系统当前时间+失效时间定值(毫秒),并比较①和②的返回值。如果①和②的返回值不相等,则有并发操作直接退出。如果①和②的返回值相等,则可以加载任务配置, 将JOB_TASKPENDING:任务名称作为KEY从Redis中获取任务全量配置缓存:
1.2、判断从缓存中是否获取到任务配置:如果未能获取到任务配置,则根据任务名称查询公共库任务调度表,并将任务配置加载至全量任务配置缓存中。
1.3、筛选出状态为待处理的任务配置。
1.4、判断当前任务是竞争型任务还是独占型任务(根据功能业务区分,编写代码之前任务类型就可确定并写死在代码中):
1.41如果是独占型任务,则将CellList与待处理任务的库号取交集,将取出的交集推送至KEY为JOB_QUEUE:任务名称的Redis队列,并释放JOB_TASK_LOAD_LOCK:任务名称作为KEY的共享锁;
1.42如果是竞争型任务,则将待处理的任务配置都推送至KEY为JOB_QUEUE:任务名称的Redis队列,并释放JOB_TASK_LOAD_LOCK:任务名称作为KEY的共享锁。
可见本申请中,多活数据中心流量切换从原先的人工修改配置文件改为系统自动识别切换平台指令进行实时切换,提高了系统的可用性,减少因故障进行流量切换时造成的业务阻塞时间与巨大经济损失。
任务配置读写与防并发操作基于Redis缓存,极大的降低了数据库的性能消耗,加大了任务的并发量上限,提高了任务的执行速度。
查询待处理任务基于Redis队列,大量减少系统遍历全量调度任务缓存配置次数,大量减少访问Redis次数,提高任务的执行速度。
实施例1
综上,本申请实施例1提供了一种基于多活数据中心的流量切换方法,如图4所示,所述方法包括:
S41、应用服务器在接收到任务调度指令后,执行获取流量配置信息操作;所述流量配置信息为多活切换平台在根据各数据中心的数据传输状态信息判断到有数据中心出现数据传输故障时按照预置的规则而生成;所述多活数据中心具有至少两个数据中心;所述流量配置信息用以指示每个数据中心对应的流量分配;。
其中,所述应用服务器通过如下步骤获取所述流量配置信息:
所述应用服务器读取缓存并判断所述缓存中是否存在所述流量配置信息;
若不存在,则所述应用服务器从所述多活切换平台读取所述流量配置信息。
另外,当所述应用服务器在监听到所述多活切换平台的所述流量配置信息发生变化时,会读取变化后的流量配置信息并将所述变化后的流量配置信息同步到所述缓存中。
S42、所述应用服务器解析所述流量配置信息,获得所在数据中心对应的流量分配。
S43、所述应用服务器根据所述流量分配和当前待处理任务的类型信息判断所述应用服务器是否具有所述当前待处理任务的处理权限。
该步骤具体包括:若所述应用服务器判断到当前待处理任务为独占型任务,则判断所述应用服务器所在的数据中心对应的流量分配是否为空;
若不为空,则所述应用服务器具有所述当前待处理任务的处理权限。
其中,所述流量分配可以包括每一数据中心对应的具有读写权限的分库号的集合;
所述判断所述应用服务器所在的数据中心对应的流量分配是否为空包括:
判断所述应用服务器所在的数据中心对应的具有读写权限的分库号的集合是否为空。
S44、若有,则所述应用服务器加载任务进行任务处理。
优选实施例中,所述多活数据中心具有一主数据中心,所述流量配置信息还包括所述主数据中心标识;
所述应用服务器根据所述流量分配和当前待处理的任务的类型信息判断所述应用服务器是否具有所述当前待处理的任务的处理权限包括:
若所述应用服务器判断到所述当前待处理任务为竞争型任务,则判断所述应用服务器对应的数据中心标识是否与所述主数据中心标识相同;
若相同,则所述应用服务器具有所述当前待处理的任务的处理权限。
优选实施例中,
所述流量分配包括每一数据中心对应的具有读写权限的分库号的集合;
所述应用服务器加载任务进行任务处理包括:
所述应用服务器从所述缓存的任务队列查找所述当前待处理任务,若查询到,则根据所述当前待处理任务对应的分库号和所述应用服务器根据所在的数据中心具有读写权限的分库号判断所述应用服务器是否具有处理所述当前待处理任务的权限;
若有权限,则所述应用服务器将所述当前待处理任务对应的分库号的状态确定为处理中并保存在任务配置信息中;
若任务处理完成,则所述应用服务器将所述当前待处理任务对应的分库号的状态更改为待处理并保存在所述任务配置信息中。
实施例2
对应上述应用服务器,本申请实施例2提供一种基于多活数据中心的流量切换方法,应用于多活切换平台,如图5所示,所述方法包括:
S51、多活切换平台获取各数据中心的数据传输状态信息;所述多活数据中心具有至少两个数据中心;
S52、所述多活切换平台根据所述状态信息与预置的条件进行判断,当判断到需要进行流量切换时,则按照预置的规则生成流量配置信息以便应用服务器在接收到任务调度指令后,获取所述流量配置信息并结合获得的任务配置信息加载任务进行任务处理;所述流量配置信息用以指示每个数据中心对应的流量分配。
所述多活切换平台根据所述状态信息与预置的条件进行判断,当判断到需要进行流量切换时,则按照预置的规则生成流量配置信息包括:
所述多活切换平台根据所述状态信息判断到有数据中心出现数据传输故障时按照未出现故障的数据中心的当前流量、流量阈值以及将竞争型任务对应的流量分配至同一个数据中心的规则进行流量分配生成包括所述各数据中心对应的流量分配以及承载所述竞争型任务的主数据中心的标识的流量配置信息。
优选的,所述方法还包括:
所述多活切换平台将所述流量配置信息同步至缓存中,以便应用服务器从所述缓存中获取所述流量配置信息;
所述多活切换平台在接收到所述应用服务器的流量配置信息获取请求时,将最新的流量配置信息发送至所述应用服务器。
实施例3
对应上述实施例1,本申请实施例3提供一种基于多活数据中心的流量切换装置,所述装置包括:
获取流量配置信息单元,用于在接收到任务调度指令后,执行获取流量配置信息操作;所述流量配置信息为多活切换平台在根据各数据中心的数据传输状态信息判断到有数据中心出现数据传输故障时按照预置的规则而生成;所述多活数据中心具有至少两个数据中心;所述流量配置信息用以指示每个数据中心对应的流量分配。
优选的,获取流量配置信息单元,具体用于读取缓存并判断所述缓存中是否存在所述流量配置信息,在不存在时,从所述多活切换平台读取所述流量配置信息。
解析单元,用于解析所述流量配置信息,获得所在数据中心对应的流量分配。
权限判断单元,用于根据所述流量分配和当前待处理任务的类型信息判断是否具有所述当前待处理任务的处理权限。
优选的,权限判断单元,具体用于判断到当前待处理任务为独占型任务,则判断所述应 用服务器所在的数据中心对应的流量分配是否为空并在不为空,时确定具有所述当前待处理任务的处理权限。
任务处理单元,用于在判断到有处理权限时,获取任务配置信息,并结合所述流量分配加载任务进行任务处理。
实施例4
对应上述实施例2,本申请实施例4提供一种基于多活数据中心的流量切换装置,所述装置包括:
数据传输状态信息获取单元,用于获取各数据中心的数据传输状态信息;所述多活数据中心具有至少两个数据中心;
流量配置信息单元,用于根据所述状态信息与预置的条件进行判断,当判断到需要进行流量切换时,则按照预置的规则生成流量配置信息以便应用服务器在接收到任务调度指令后,获取所述流量配置信息并结合获得的任务配置信息加载任务进行任务处理;所述流量配置信息用以指示每个数据中心对应的流量分配。
优选的,所述流量配置信息单元,具体用于根据所述状态信息判断到有数据中心出现数据传输故障时按照未出现故障的数据中心的当前流量、流量阈值以及将竞争型任务对应的流量分配至同一个数据中心的规则进行流量分配生成包括所述各数据中心对应的流量分配以及承载所述竞争型任务的主数据中心的标识的流量配置信息。
优选的,所述装置还包括:
流量配置信息同步单元,用于将所述流量配置信息同步至缓存中,以便应用服务器从所述缓存中获取所述流量配置信息;
流量配置信息发送单元,用于在接收到所述应用服务器的流量配置信息获取请求时,将最新的流量配置信息发送至所述应用服务器。
通过以上的实施方式的描述可知,本领域的技术人员可以清楚地了解到本申请可借助软件加必需的通用硬件平台的方式来实现。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品可以存储在存储介质中,如ROM/RAM、磁碟、光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,云服务器,或者网络设备等)执行本申请各个实施例或者实施例的某些部分所述的方法。
本说明书中的各个实施例均采用递进的方式描述,各个实施例之间相同相似的部分互相参见即可,每个实施例重点说明的都是与其他实施例的不同之处。尤其,对于系统或系统实 施例而言,由于其基本相似于方法实施例,所以描述得比较简单,相关之处参见方法实施例的部分说明即可。以上所描述的系统及系统实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。本领域普通技术人员在不付出创造性劳动的情况下,即可以理解并实施。
以上对本申请所提供的流量切换方法、装置,进行了详细介绍,本文中应用了具体个例对本申请的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本申请的方法及其核心思想;同时,对于本领域的一般技术人员,依据本申请的思想,在具体实施方式及应用范围上均会有改变之处。综上所述,本说明书内容不应理解为对本申请的限制。

Claims (10)

  1. 一种基于多活数据中心的流量切换方法,其特征在于,所述方法包括:
    应用服务器接收到任务调度指令后,执行获取流量配置信息操作;所述流量配置信息为多活切换平台在根据各数据中心的数据传输状态信息判断到有数据中心需要流量切换时按照预置的规则而生成;所述多活数据中心具有至少两个数据中心;所述流量配置信息用以指示每个数据中心流量分配;
    所述应用服务器解析所述流量配置信息,获得所在数据中心对应的流量分配;
    所述应用服务器根据所述流量分配和当前待处理任务的类型信息判断所述应用服务器是否具有所述当前待处理任务的处理权限;
    若有,则所述应用服务器加载任务进行任务处理。
  2. 如权利要求1所述的方法,其特征在于,所述应用服务器通过如下步骤获取所述流量配置信息:
    所述应用服务器读取缓存并判断所述缓存中是否存在所述流量配置信息;
    若不存在,则所述应用服务器从所述多活切换平台读取所述流量配置信息;
    所述应用服务器在监听到所述多活切换平台的所述流量配置信息发生变化时,读取变化后的流量配置信息并将所述变化后的流量配置信息同步到所述缓存中。
  3. 如权利要求1所述的方法,其特征在于,所述应用服务器根据所述流量分配和当前待处理的任务的类型信息判断所述应用服务器是否具有所述当前待处理的任务的处理权限包括:
    若所述应用服务器判断到当前待处理任务为独占型任务,则判断所述应用服务器所在的数据中心对应的流量分配是否为空;
    若不为空,则所述应用服务器具有所述当前待处理任务的处理权限。
  4. 如权利要求1所述的方法,其特征在于,所述多活数据中心具有一主数据中心,所述流量配置信息还包括所述主数据中心标识;
    所述应用服务器根据所述流量分配和当前待处理的任务的类型信息判断所述应用服务器是否具有所述当前待处理的任务的处理权限包括:
    若所述应用服务器判断到所述当前待处理任务为竞争型任务,则判断所述应用服务器对应的数据中心标识是否与所述主数据中心标识相同;
    若相同,则所述应用服务器具有所述当前待处理的任务的处理权限。
  5. 如权利要求1所述的方法,其特征在于,
    所述流量分配包括每一数据中心对应的具有读写权限的分库号的集合;
    所述应用服务器加载任务进行任务处理包括:
    所述应用服务器从所述缓存的任务队列查找所述当前待处理任务,若查询到,则根据所述当前待处理任务对应的分库号和所述应用服务器根据所在的数据中心具有读写权限的分库号判断所述应用服务器是否具有处理所述当前待处理任务的权限;
    若有权限,则所述应用服务器将所述当前待处理任务对应的分库号的状态确定为处理中并保存在任务配置信息中;
    若任务处理完成,则所述应用服务器将所述当前待处理任务对应的分库号的状态更改为待处理并保存在所述任务配置信息中。
  6. 一种基于多活数据中心的流量切换方法,其特征在于,所述方法包括:
    多活切换平台获取各数据中心的数据传输状态信息;所述多活数据中心具有至少两个数据中心;
    所述多活切换平台根据所述状态信息与预置的条件进行判断,当判断到需要进行流量切换时,则按照预置的规则生成流量配置信息以便应用服务器在接收到任务调度指令后,获取所述流量配置信息并结合获得的任务配置信息加载任务进行任务处理;所述流量配置信息用以指示每个数据中心对应的流量分配。
  7. 如权利要求6所示的方法,其特征在于,
    所述多活切换平台根据所述状态信息与预置的条件进行判断,当判断到需要进行流量切换时,则按照预置的规则生成流量配置信息包括:
    所述多活切换平台根据所述状态信息判断到有数据中心出现数据传输故障时按照未出现故障的数据中心的当前流量、流量阈值以及将竞争型任务对应的流量分配至同一个数据中心的规则进行流量分配生成包括所述各数据中心对应的流量分配以及承载所述竞争型任务的主数据中心的标识的流量配置信息。
  8. 如权利要求6所示的方法,其特征在于,所述方法还包括:
    所述多活切换平台将所述流量配置信息同步至缓存中,以便应用服务器从所述缓存中获取所述流量配置信息;
    所述多活切换平台在接收到所述应用服务器的流量配置信息获取请求时,将最新的流量配置信息发送至所述应用服务器。
  9. 一种基于多活数据中心的流量切换装置,其特征在于,所述装置包括:
    获取流量配置信息单元,用于接收到任务调度指令后,执行获取流量配置信息操作;所述流量配置信息为多活切换平台根据各数据中心的数据传输状态信息判断到有数据中心需要流量切换时按照预置的规则而生成;所述多活数据中心具有至少两个数据中心;所述流量配置信息用以指示每个数据中心对应的流量分配;
    解析单元,用于解析所述流量配置信息,获得所在数据中心的流量分配;
    权限判断单元,用于根据所述流量分配和当前待处理任务的类型信息判断是否具有所述当前待处理任务的处理权限;
    任务处理单元,用于在判断到有处理权限时,获取任务配置信息,并结合所述流量分配加载任务进行任务处理。
  10. 一种基于多活数据中心的流量切换装置,其特征在于,所述装置包括:
    数据传输状态信息获取单元,用于获取各数据中心的数据传输状态信息;所述多活数据中心具有至少两个数据中心;
    流量配置信息单元,用于根据所述状态信息与预置的条件进行判断,当判断到需要进行流量切换时,则按照预置的规则生成流量配置信息以便应用服务器在接收到任务调度指令后,获取所述流量配置信息并结合获得的任务配置信息加载任务进行任务处理;所述流量配置信息用以指示每个数据中心对应的流量分配。
PCT/CN2020/097003 2019-11-26 2020-06-19 一种基于多活数据中心的流量切换方法及装置 WO2021103499A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA3162740A CA3162740A1 (en) 2019-11-26 2020-06-19 Traffic switching methods and devices based on multiple active data centers

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911174942.0 2019-11-26
CN201911174942.0A CN110990200B (zh) 2019-11-26 2019-11-26 一种基于多活数据中心的流量切换方法及装置

Publications (1)

Publication Number Publication Date
WO2021103499A1 true WO2021103499A1 (zh) 2021-06-03

Family

ID=70086988

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/097003 WO2021103499A1 (zh) 2019-11-26 2020-06-19 一种基于多活数据中心的流量切换方法及装置

Country Status (3)

Country Link
CN (1) CN110990200B (zh)
CA (1) CA3162740A1 (zh)
WO (1) WO2021103499A1 (zh)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110990200B (zh) * 2019-11-26 2022-07-05 苏宁云计算有限公司 一种基于多活数据中心的流量切换方法及装置
CN113300966B (zh) * 2020-07-27 2024-05-28 阿里巴巴集团控股有限公司 流量控制方法、装置、系统以及电子设备
CN112751782B (zh) * 2020-12-29 2022-09-30 微医云(杭州)控股有限公司 基于多活数据中心的流量切换方法、装置、设备及介质
CN113590314A (zh) * 2021-07-13 2021-11-02 上海一谈网络科技有限公司 网络请求数据处理方法和系统
CN114331576A (zh) * 2021-12-30 2022-04-12 福建博思软件股份有限公司 基于高并发场景下的电子票号快速取票方法及存储介质
CN114465960A (zh) * 2022-02-07 2022-05-10 北京沃东天骏信息技术有限公司 流量切换方法、装置和存储介质
CN117453150B (zh) * 2023-12-25 2024-04-05 杭州阿启视科技有限公司 录像存储调度服务多实例的实现方法

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108089923A (zh) * 2017-12-15 2018-05-29 中国民航信息网络股份有限公司 基于加权Voronoi图的用户接入区域划分方法和装置
EP3407527A1 (en) * 2016-01-18 2018-11-28 Alibaba Group Holding Limited Method, device, and system for data synchronization
CN109819004A (zh) * 2017-11-22 2019-05-28 中国人寿保险股份有限公司 用于部署多活数据中心的方法和系统
CN110225138A (zh) * 2019-06-25 2019-09-10 深圳前海微众银行股份有限公司 一种分布式架构
CN110990200A (zh) * 2019-11-26 2020-04-10 苏宁云计算有限公司 一种基于多活数据中心的流量切换方法及装置

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7870277B2 (en) * 2007-03-12 2011-01-11 Citrix Systems, Inc. Systems and methods for using object oriented expressions to configure application security policies
US8370835B2 (en) * 2009-03-12 2013-02-05 Arend Erich Dittmer Method for dynamically generating a configuration for a virtual machine with a virtual hard disk in an external storage device
US9185166B2 (en) * 2012-02-28 2015-11-10 International Business Machines Corporation Disjoint multi-pathing for a data center network
CN103888378B (zh) * 2014-04-09 2017-08-25 北京京东尚科信息技术有限公司 一种基于缓存机制的数据交换系统和方法
US9565129B2 (en) * 2014-09-30 2017-02-07 International Business Machines Corporation Resource provisioning planning for enterprise migration and automated application discovery
CN104407964B (zh) * 2014-12-08 2017-10-27 国家电网公司 一种基于数据中心的集中监控系统及方法
CN104506614B (zh) * 2014-12-22 2018-07-31 国家电网公司 一种基于云计算的分布式多活数据中心的设计方法
CN107231221B (zh) * 2016-03-25 2020-10-23 阿里巴巴集团控股有限公司 数据中心间的业务流量控制方法、装置及系统
CN106506588A (zh) * 2016-09-23 2017-03-15 北京许继电气有限公司 多地多中心的数据中心双活方法和系统
CN109542659A (zh) * 2018-11-14 2019-03-29 深圳前海微众银行股份有限公司 应用多活方法、设备、数据中心集群及可读存储介质
CN109660466A (zh) * 2019-02-26 2019-04-19 浪潮软件集团有限公司 一种面向云数据中心租户的多活负载均衡实现方法
CN110166524B (zh) * 2019-04-12 2023-04-07 未鲲(上海)科技服务有限公司 数据中心的切换方法、装置、设备及存储介质

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3407527A1 (en) * 2016-01-18 2018-11-28 Alibaba Group Holding Limited Method, device, and system for data synchronization
CN109819004A (zh) * 2017-11-22 2019-05-28 中国人寿保险股份有限公司 用于部署多活数据中心的方法和系统
CN108089923A (zh) * 2017-12-15 2018-05-29 中国民航信息网络股份有限公司 基于加权Voronoi图的用户接入区域划分方法和装置
CN110225138A (zh) * 2019-06-25 2019-09-10 深圳前海微众银行股份有限公司 一种分布式架构
CN110990200A (zh) * 2019-11-26 2020-04-10 苏宁云计算有限公司 一种基于多活数据中心的流量切换方法及装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YI TIAN BU, ET AL: "Solution for multiple live databases in different places", NON-OFFICIAL TRANSLATION: MULTI-DATA CENTERS HIGH AVAILABILITY SOLUTION, 5 September 2018 (2018-09-05), pages 1 - 6, XP055809060, Retrieved from the Internet <URL:https://help.aliyun.com/knowledge_detail/72721.html> [retrieved on 20210531] *

Also Published As

Publication number Publication date
CN110990200B (zh) 2022-07-05
CA3162740A1 (en) 2021-06-03
CN110990200A (zh) 2020-04-10

Similar Documents

Publication Publication Date Title
WO2021103499A1 (zh) 一种基于多活数据中心的流量切换方法及装置
US11360854B2 (en) Storage cluster configuration change method, storage cluster, and computer system
US10713135B2 (en) Data disaster recovery method, device and system
CN106341454B (zh) 跨机房多活分布式数据库管理系统和方法
WO2019154394A1 (zh) 分布式数据库集群系统、数据同步方法及存储介质
KR101547719B1 (ko) 데이터 센터들에 걸쳐 데이터 서버들내 데이터 무결정의 유지
US9785691B2 (en) Method and apparatus for sequencing transactions globally in a distributed database cluster
US8856091B2 (en) Method and apparatus for sequencing transactions globally in distributed database cluster
CN106776121B (zh) 一种数据灾备装置、系统及方法
US20150261784A1 (en) Dynamically Varying the Number of Database Replicas
CN113515499B (zh) 一种数据库服务方法及系统
CN110224871A (zh) 一种Redis集群的高可用方法及装置
CN106487486B (zh) 业务处理方法和数据中心系统
CN106339278A (zh) 一种网络文件系统的数据备份及恢复方法
CN104243195A (zh) 异地灾备处理方法及装置
US20080082630A1 (en) System and method of fault tolerant reconciliation for control card redundancy
CN107357800A (zh) 一种数据库高可用零丢失解决方法
CN105323271B (zh) 一种云计算系统以及云计算系统的处理方法和装置
CN116185697B (zh) 容器集群管理方法、装置、系统、电子设备及存储介质
CN111404737B (zh) 一种容灾处理方法以及相关装置
CN116389233B (zh) 容器云管理平台主备切换系统、方法、装置和计算机设备
JP2023505879A (ja) 分散型データベースシステム及びデータ災害バックアップ訓練方法
CN116302716A (zh) 一种集群部署方法、装置、电子设备及计算机可读介质
CA2619778C (en) Method and apparatus for sequencing transactions globally in a distributed database cluster with collision monitoring
US20220121510A1 (en) Access Consistency in High-Availability Databases

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20892216

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 3162740

Country of ref document: CA

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20892216

Country of ref document: EP

Kind code of ref document: A1