CN114003580A

CN114003580A - Database construction method and device applied to distributed scheduling system

Info

Publication number: CN114003580A
Application number: CN202111087977.8A
Authority: CN
Inventors: 韩朔; 肖骁
Original assignee: Traffic Control Technology TCT Co Ltd
Current assignee: Traffic Control Technology TCT Co Ltd
Priority date: 2021-09-16
Filing date: 2021-09-16
Publication date: 2022-02-01

Abstract

The invention provides a database construction method and a database construction device applied to a distributed scheduling system, wherein the method comprises the following steps: according to the service scene of the distributed scheduling system, a relational database, a time sequence database, a distributed database and a memory database of the distributed scheduling system are established; the relational database, the time sequence database and the distributed database are established based on an open-source full-stack database; the in-memory database is created based on an open source Redis database. The method and the device for constructing the database applied to the distributed scheduling system are used for analyzing from the aspect of the service scene of the rail transit distributed scheduling system, solving the problem of data storage of the distributed scheduling system by adopting the open-source full-stack database technology and the Redis database technology, and simultaneously dynamically adjusting the database scheme by combining the actual scale of the scheduling system without changing the access mode of a database, thereby not influencing the architectural design of the system and reducing the design complexity and the development cost of the system.

Description

Database construction method and device applied to distributed scheduling system

Technical Field

The invention relates to the technical field of computers, in particular to a database construction method and a database construction device applied to a distributed scheduling system.

Background

Conventional scheduling management systems are mostly single-line systems. Most are Client/Server software built on a traditional technology stack in C++ and C#, which are no longer mainstream development technologies. Their databases are mainly commercial databases (Oracle, SQL Server, etc.) or real-time databases (as in some industrial SCADA systems, such as GE, Wonderware, PI, etc.). Commercial databases are costly and cannot meet the requirements of the current rail transit distributed scheduling system.

The distributed scheduling system of the current rail transit is oriented to a scheduling management system at a multi-line and network level, and the whole system is a distributed system realized by adopting micro-service and internet mainstream open source technology.

Because the design and technical implementation of the distributed scheduling system are greatly different from those of the traditional scheduling system, the design and technical model selection needs to be carried out by combining the characteristics of the distributed scheduling system.

Disclosure of Invention

The invention provides a database construction method and a database construction device applied to a distributed scheduling system, which address the high cost and complex system design of prior-art data storage built on commercial databases.

In a first aspect, the present invention provides a database construction method applied to a distributed scheduling system, including: according to the service scene of the distributed scheduling system, a relational database, a time sequence database, a distributed database and a memory database of the distributed scheduling system are established; the relational database, the time sequence database and the distributed database are established based on an open-source full-stack database; the in-memory database is created based on an open source Redis database.

Wherein, the creating of the relational database, the time sequence database, the distributed database and the memory database of the distributed scheduling system comprises: vertically splitting the distributed scheduling system into a plurality of micro-services, and constructing an independent database for each micro-service; each database is one of a relational database, a time sequence database, a distributed database and a memory database.

According to the database construction method of the distributed scheduling system provided by the invention, the method for vertically splitting the distributed scheduling system into a plurality of micro-services according to the service scene of the distributed scheduling system and constructing an independent database for each micro-service comprises the following steps:

for the part of the service scene of the distributed scheduling system that relates to conventional service requirements, vertically splitting that part into at least one first-type micro-service, and constructing a relational database for each first-type micro-service;

for the part of the service scene that relates to monitoring service requirements, vertically splitting that part into at least one second-type micro-service, and constructing a time sequence database for each second-type micro-service;

for the part of the service scene that relates to single-table horizontal capacity expansion service requirements, vertically splitting that part into at least one third-type micro-service, and constructing a distributed database for each third-type micro-service;

and for the part of the service scene that relates to real-time response service requirements, vertically splitting that part into at least one fourth-type micro-service, and constructing an in-memory database for each fourth-type micro-service.

According to the database construction method of the distributed scheduling system, the relational database is specifically established based on an open-source PostgreSQL database; the time sequence database is established based on an open-source TimescaleDB database; the distributed database is created based on the Citus database.

According to the database construction method of the distributed scheduling system provided by the invention, the first-type, second-type and third-type micro-services all use Java Database Connectivity (JDBC) to access the first-type, second-type and third-type sub-databases respectively; and the fourth-type micro-services use a Redis access tool to access the fourth-type sub-databases.

According to the database construction method of the distributed scheduling system provided by the invention, the first-type and second-type sub-databases both adopt a high-availability cluster mode of vertical service splitting, one master with multiple slaves, asynchronous master-slave streaming replication, read-write separation and failover; the third-type sub-databases adopt a high-availability cluster mode of one master with multiple slaves, asynchronous master-slave streaming replication, read-write separation and failover, wherein a single coordination node carries the metadata.

According to the database construction method of the distributed scheduling system provided by the invention, the third-type sub-databases adopt a high-availability cluster mode of one master with multiple slaves, asynchronous master-slave streaming replication, read-write separation and failover, wherein every working node carries the metadata.

According to the database construction method of the distributed scheduling system provided by the invention, the fourth-type sub-databases adopt a high-availability cluster mode in either sentinel mode or cluster mode.

In a second aspect, the present invention further provides a database construction device applied to a distributed scheduling system, where the database construction device is configured to create a relational database, a time sequence database, a distributed database, and a memory database of the distributed scheduling system according to a service scenario of the distributed scheduling system; the relational database, the time sequence database and the distributed database are established based on an open-source full-stack database; the in-memory database is created based on an open source Redis database. Wherein, the creating of the relational database, the time sequence database, the distributed database and the memory database of the distributed scheduling system comprises: vertically splitting the distributed scheduling system into a plurality of micro-services, and constructing an independent database for each micro-service; each database is one of a relational database, a time sequence database, a distributed database and a memory database.

In a third aspect, the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the steps of any of the database construction methods applied to the distributed scheduling system.

In a fourth aspect, the present invention also provides a non-transitory computer readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the database construction method applied to the distributed scheduling system as described in any one of the above.

The method and the device for constructing the database applied to the distributed scheduling system are used for analyzing from the aspect of the service scene of the rail transit distributed scheduling system, solving the problem of data storage of the distributed scheduling system by adopting the open-source full-stack database technology and the Redis database technology, and simultaneously dynamically adjusting the database scheme by combining the actual scale of the scheduling system without changing the access mode of a database, thereby not influencing the architectural design of the system and reducing the design complexity and the development cost of the system.

Drawings

In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

FIG. 1 is a block diagram of a distributed scheduling system according to the present invention;

FIG. 2 is a schematic diagram of the high availability of a PostgreSQL cluster provided by the present invention;

FIG. 3 is a schematic diagram of the high availability of a Citus single CN cluster provided by the present invention;

FIG. 4 is a schematic diagram of the high availability of a Citus MX cluster provided by the present invention;

FIG. 5 is a schematic diagram of the high availability of a Redis database cluster in sentinel mode provided by the present invention;

FIG. 6 is a schematic diagram of the high availability of a Redis database cluster in Cluster mode provided by the present invention;

FIG. 7 is a schematic structural diagram of an electronic device provided by the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that in the description of the embodiments of the present invention, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element. The terms "upper", "lower", and the like, indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience in describing the present invention and simplifying the description, but do not indicate or imply that the referred devices or elements must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention. Unless expressly stated or limited otherwise, the terms "mounted," "connected," and "connected" are intended to be inclusive and mean, for example, that they may be fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.

The terms first, second and the like in the description and in the claims of the present application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that embodiments of the application may be practiced in sequences other than those illustrated or described herein, and that the terms "first," "second," and the like are generally used herein in a generic sense and do not limit the number of terms, e.g., the first term can be one or more than one. In addition, "and/or" in the specification and claims means at least one of connected objects, a character "/" generally means that a preceding and succeeding related objects are in an "or" relationship.

With the rise of Web 2.0 websites, traditional relational databases have struggled to cope, especially with very large-scale, highly concurrent dynamic websites, and have exposed many problems that are difficult to overcome, whereas non-relational (NoSQL) databases have developed very rapidly owing to their own characteristics.

NoSQL databases emerged to address the challenges posed by large-scale data sets that combine multiple data types, in particular the problems of big-data applications, including the storage of very large volumes of data.

The following describes a database construction method and apparatus applied to a distributed scheduling system according to an embodiment of the present invention with reference to fig. 1 to 7.

The invention provides a database construction method applied to a distributed scheduling system, which mainly comprises the following step: according to the service scene of the distributed scheduling system, creating a relational database, a time sequence database, a distributed database and a memory database of the distributed scheduling system.

The relational database, the time sequence database and the distributed database are established based on an open-source full stack database, and the memory database is established based on an open-source Redis database. Wherein, the creating of the relational database, the time sequence database, the distributed database and the memory database of the distributed scheduling system comprises: vertically splitting the distributed scheduling system into a plurality of micro-services, and constructing an independent database for each micro-service; each database is one of a relational database, a time sequence database, a distributed database and a memory database.

The database construction method applied to the distributed scheduling system provided by the invention is combined with the characteristics of the self service scene of the distributed scheduling system, and the optimal solution with low cost is realized by reasonably selecting the Internet open-source relational database and the NoSQL database.

The invention utilizes an open-source full-stack database, and particularly can simultaneously solve the creation of a relational database, a time sequence database and a distributed database by utilizing an ecosystem (PG ecosystem for short) of PostgreSQL.

The PG ecosystem is a versatile full-stack database. Its supported functional components include: Online Transaction Processing (OLTP), Online Analytical Processing (OLAP), stream processing (e.g., the PipelineDB extension), time series data processing (e.g., the TimescaleDB extension), spatial data processing (e.g., the PostGIS extension), search indexing, NoSQL (e.g., native support for JSON, JSONB, XML, HStore, etc.), data warehousing (Greenplum), etc.

The several open-source databases employed by the present invention are briefly described below:

A Relational Database Management System (RDBMS), such as PostgreSQL, is mainly used to meet the conventional service requirements of the distributed scheduling system. A relational database is built on the relational data model and processes data by means of mathematical concepts and methods such as set algebra. PostgreSQL handles the online transaction processing (OLTP) workload of the distributed scheduling system, which consists of basic, day-to-day transactions: creating, deleting, updating and querying basic information.

A Time Series Database (TSDB), such as TimescaleDB, is mainly used to meet the comprehensive monitoring service requirements of the distributed scheduling system. A time series database processes time-stamped data (data that changes in time order), which is also called time series data. The distributed scheduling system includes a comprehensive monitoring service, and most comprehensive monitoring data is time series data, that is, a sequence of data recorded in time order, which needs to be stored in TimescaleDB.

A Distributed Database (DDB), such as Citus, is mainly used to meet the service requirement of horizontal capacity expansion for large single tables in the distributed scheduling system. Citus is an open-source distributed database based on PostgreSQL that mainly addresses horizontal capacity expansion and real-time data analysis for large single tables. The operation diagram service of the distributed scheduling system involves large single tables and therefore needs Citus.

An In-Memory Database (IMDB), such as Redis, is mainly used to meet the real-time response service requirements of the distributed scheduling system. Redis is an open-source (BSD licensed) in-memory data structure store that can be used as a database, cache and message middleware. The distributed scheduling system has high real-time requirements; storing the data of services with high real-time requirements in Redis can markedly reduce access time and improve response time.

In addition, the PostgreSQL database is a fully featured, free-software object-relational database management system, supports most SQL standards and provides many other features such as complex queries, foreign keys, triggers, views, transaction integrity, multi-version concurrency control, and the like.

It should be noted that much open-source software works with the PostgreSQL database, including many distributed cluster tools, such as pgpool, pgcluster, slony, PL/Proxy, etc., which make it easy to implement schemes such as read-write separation, load balancing and horizontal data splitting.

In view of the above, the invention relies on the full-stack nature of the PG ecosystem to build the relational database, the time sequence database and the distributed database. Because a single database component selection meets all of these requirements of the distributed scheduling system, there is no need to split the design across multiple heterogeneous technologies and then integrate them, which greatly reduces the design complexity of the distributed scheduling system and also greatly reduces development cost and operation and maintenance cost.

Meanwhile, the Redis database is used to meet the requirement for a memory database with real-time response. Redis (Remote Dictionary Server) is a key-value storage system that retrieves values by key. It is similar to the distributed cache system Memcached but supports more value types, including string, list (linked list), set, sorted set (zset) and hash. These data types support push/pop, add/remove, intersection, union, difference and richer operations, all of which are atomic.

On this basis, the Redis database supports sorting in various ways. As with Memcached, all data is held in memory to ensure efficiency. The difference is that Redis periodically writes updated data to disk or appends each modification operation to an append-only log file, and implements master-slave synchronization on this basis.
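The second persistence option mentioned above, recording each modification operation in a log file and rebuilding state from it, can be illustrated with a minimal sketch. This is not Redis code or the Redis file format: the class `AppendOnlyStore`, the JSON-lines log layout and the key names are all assumptions made for illustration.

```python
import json
import os
import tempfile


class AppendOnlyStore:
    """Toy key-value store that logs every write and replays the log on start-up."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        if os.path.exists(path):
            # Rebuild the in-memory state by replaying the logged operations in order.
            with open(path, encoding="utf-8") as log:
                for line in log:
                    self._apply(json.loads(line))

    def _apply(self, op):
        if op["cmd"] == "set":
            self.data[op["key"]] = op["value"]
        elif op["cmd"] == "del":
            self.data.pop(op["key"], None)

    def _append(self, op):
        # Record the operation on disk first, then apply it in memory.
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(json.dumps(op) + "\n")
        self._apply(op)

    def set(self, key, value):
        self._append({"cmd": "set", "key": key, "value": value})

    def delete(self, key):
        self._append({"cmd": "del", "key": key})

    def get(self, key):
        return self.data.get(key)


# Hypothetical keys and values, for illustration only.
log_path = os.path.join(tempfile.mkdtemp(), "store.log")
store = AppendOnlyStore(log_path)
store.set("train:1", "platform 3")
store.set("train:2", "depot")
store.delete("train:2")

recovered = AppendOnlyStore(log_path)  # simulates a restart
print(recovered.get("train:1"), recovered.get("train:2"))
```

Reads are served entirely from memory; the log is touched only on writes and at start-up, which is why this style of persistence does not slow down lookups.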

A database organizes, stores and manages data according to a data structure, and each database has one or more Application Programming Interfaces (APIs) for creating, accessing, managing, searching and copying the stored data. Data can also be stored in files, but reading and writing data in files is relatively slow. Therefore, a relational database management system is now commonly used to store and manage large amounts of data. In a relational database, all data is stored in tables: each row is a record, each column is a data field of that record, the rows and columns together form a table, and multiple tables form the database.

A time series database stores time series data and indexes it by time (a point or an interval). Time series data has the following characteristics: the data structure is simple, that is, a given metric has only one value at a given point in time, and there are no complex structures (nesting, hierarchy, etc.) or relationships (associations, primary and foreign keys, etc.); and the data volume is large, because time series data is typically generated, collected and sent by a large number of monitored data sources (such as hosts, IoT devices, terminals or apps).
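To make indexing by time point or interval concrete, the following is a minimal in-memory sketch. It is not how TimescaleDB stores data; the class `TimeSeries`, its method names and the sample readings are assumptions for illustration. It keeps one value per timestamp for a single metric, matching the simple structure described above.

```python
import bisect


class TimeSeries:
    """Single-metric series kept sorted by timestamp, one value per timestamp."""

    def __init__(self):
        self._times = []
        self._values = []

    def insert(self, timestamp, value):
        i = bisect.bisect_left(self._times, timestamp)
        if i < len(self._times) and self._times[i] == timestamp:
            self._values[i] = value  # a metric has only one value per time point
        else:
            self._times.insert(i, timestamp)
            self._values.insert(i, value)

    def at(self, timestamp):
        """Point query: the value recorded exactly at `timestamp`, or None."""
        i = bisect.bisect_left(self._times, timestamp)
        if i < len(self._times) and self._times[i] == timestamp:
            return self._values[i]
        return None

    def between(self, start, end):
        """Interval query: (timestamp, value) pairs with start <= timestamp < end."""
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_left(self._times, end)
        return list(zip(self._times[lo:hi], self._values[lo:hi]))


# Hypothetical temperature readings that arrive out of order.
series = TimeSeries()
for ts, reading in [(30, 22.4), (10, 21.5), (20, 22.0)]:
    series.insert(ts, reading)

print(series.at(20))
print(series.between(10, 30))
```

Because the timestamps stay sorted, both query types locate their bounds by binary search rather than by scanning every sample.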

A distributed database presents a virtual database to the front end in the form of middleware, while the middleware actually manages multiple database nodes in the back end, so that a single table can be distributed across different database nodes to balance performance. The overall architecture comprises a management server (Master), Database Node servers (DBN), a Database access Interface (DBI), and the like, wherein: the Master provides overall functions such as resource allocation and load statistics; the DBI is generally deployed on the client, is provided to the application as a jar package, provides the query processing function, and can hide the complexity of data access to a certain extent.
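Distributing the rows of one table across several database nodes can be sketched as follows. This is a simplified hash-modulo placement, not the placement algorithm of Citus or of any particular middleware; the function names, node names and the `line_id` column are hypothetical.

```python
import hashlib


def node_for_key(key, nodes):
    """Pick a node from a stable hash of the distribution key."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def distribute(rows, key_column, nodes):
    """Group the rows of a single table by the node that would store them."""
    placement = {node: [] for node in nodes}
    for row in rows:
        placement[node_for_key(row[key_column], nodes)].append(row)
    return placement


# Hypothetical node names and rows, for illustration only.
nodes = ["dbn-1", "dbn-2", "dbn-3"]
rows = [{"line_id": line, "trip": trip} for line in range(1, 7) for trip in range(3)]
placement = distribute(rows, "line_id", nodes)

for node, stored in placement.items():
    print(node, sorted({row["line_id"] for row in stored}))
```

All rows that share a distribution key land on the same node, so a query filtered on that key touches a single node. With plain modulo placement, adding a node moves most keys; production systems typically add a layer of fixed shards or consistent hashing to limit that movement.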

A memory database can be far faster than traditional database products (such as Oracle, SQL Server, etc.). Embedding the database library into the application eliminates context switching and unnecessary network operations, which greatly improves data access performance. Compared with a traditional disk-based relational database, even one whose accessed data has been fully cached from disk into memory (for example, the data buffer of the Oracle SGA), an IMDB achieves markedly better responsiveness and throughput by managing data in memory and optimizing its data structures and access algorithms accordingly.

In summary, the database construction method applied to the distributed scheduling system provided by the invention analyzes from the service scene level of the rail transit distributed scheduling system, selects the open-source full-stack database technology and the Redis database technology to solve the data storage problem of the distributed scheduling system, and simultaneously dynamically adjusts the database scheme by combining the actual scale of the scheduling system, without changing the access mode of the database, without affecting the architecture design of the system, and reduces the design complexity and development cost of the system.

As an optional embodiment, the creating a relational database, a timing database, a distributed database, and a memory database of the distributed scheduling system according to the service scenario of the distributed scheduling system includes:

according to the service scene of the distributed scheduling system, vertically splitting the distributed scheduling system into a plurality of micro-services, and constructing an independent database for each micro-service;

each database is one of a relational database, a time sequence database, a distributed database and a memory database.

The distributed scheduling system is a scheduling system facing to a network level, faces huge challenges in concurrency and data storage, and requires a database to have horizontal expansion capability. The distributed scheduling system mentioned in the database construction method applied to the distributed scheduling system provided by the invention is realized by micro-services, can be vertically split into different micro-services according to the services, and each micro-service is provided with an independent database, so that the requirement of the concurrency and storage of a single micro-service database is reduced, and the requirement of most databases can be met as long as the number of servers is increased.

As an optional embodiment, the vertically splitting the distributed scheduling system into a plurality of micro-services according to the service scenario of the distributed scheduling system, and constructing an independent database for each micro-service specifically includes:

for the part of the service scene of the distributed scheduling system that relates to conventional service requirements (namely, conventional services), vertically splitting that part into at least one first-type micro-service, and constructing a relational database for each first-type micro-service;

for the part of the service scene that relates to monitoring service requirements (namely, the comprehensive monitoring service), vertically splitting that part into at least one second-type micro-service, and constructing a time sequence database for each second-type micro-service;

for the part of the service scene that relates to single-table horizontal capacity expansion service requirements (namely, large single-table horizontal capacity expansion services), vertically splitting that part into at least one third-type micro-service, and constructing a distributed database for each third-type micro-service;

and for the part of the service scene that relates to real-time response service requirements (namely, real-time response services), vertically splitting that part into at least one fourth-type micro-service, and constructing an in-memory database for each fourth-type micro-service.
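The selection rule in the four steps above amounts to a lookup from service scenario to database type. A minimal sketch follows; the enum `Scenario`, the function `plan_databases` and the micro-service names are hypothetical, while the products are those named elsewhere in this description.

```python
from enum import Enum


class Scenario(Enum):
    CONVENTIONAL = "conventional"
    MONITORING = "monitoring"
    SINGLE_TABLE_EXPANSION = "single-table horizontal capacity expansion"
    REAL_TIME = "real-time response"


# Scenario -> (database type, open-source product named in the description)
DATABASE_FOR_SCENARIO = {
    Scenario.CONVENTIONAL: ("relational", "PostgreSQL"),
    Scenario.MONITORING: ("time sequence", "TimescaleDB"),
    Scenario.SINGLE_TABLE_EXPANSION: ("distributed", "Citus"),
    Scenario.REAL_TIME: ("in-memory", "Redis"),
}


def plan_databases(services):
    """Return {micro-service name: (database type, product)}, one database each."""
    return {name: DATABASE_FOR_SCENARIO[scenario] for name, scenario in services.items()}


# Hypothetical micro-service names, for illustration only.
plan = plan_databases({
    "user-management": Scenario.CONVENTIONAL,
    "comprehensive-monitoring": Scenario.MONITORING,
    "operation-diagram": Scenario.SINGLE_TABLE_EXPANSION,
    "train-position": Scenario.REAL_TIME,
})

for service, (db_type, product) in plan.items():
    print(f"{service}: {db_type} database ({product})")
```

Reclassifying a micro-service, for example from conventional to single-table expansion as a deployment grows, changes only its entry in the plan.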

The database construction method applied to the distributed scheduling system provided by the invention utilizes the characteristics of the full-stack database of the PG ecosystem to dynamically adjust the database scheme according to the service scene requirements of actual projects, thereby reducing the complexity and the development requirements of system design.

Fig. 1 is a schematic diagram of a framework of a distributed scheduling system provided by the present invention, and as shown in fig. 1, the relational database is specifically created based on an open-source PostgreSQL database; the time sequence database is established based on an open-source TimescaleDB database; the distributed database is created based on the Citus database.

Specifically, the method uses the full-stack nature of the PG ecosystem to dynamically adjust the database scheme according to the requirements of the actual service scene, as follows: the open-source PostgreSQL, TimescaleDB and Citus databases are adopted to build the relational database, the time sequence database and the distributed database respectively; the PG ecosystem's support for standard SQL provides a common access technology for each of these databases; and the Redis database is added to meet the requirement for a memory database with real-time response.
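One way to see why the access mode need not change is that, within the PG ecosystem, the three database kinds differ mainly in a one-off setup step issued through ordinary SQL. The sketch below only builds those statements as strings and does not connect to a database. The helper `setup_statements` and the table and column names are hypothetical; `create_hypertable` and `create_distributed_table` are the setup functions provided by the TimescaleDB and Citus extensions. Identifiers are interpolated directly and are assumed to be trusted.

```python
def setup_statements(kind, table=None, column=None):
    """Return the one-off SQL issued before a table is used as the given kind."""
    if kind == "relational":
        return []  # a plain PostgreSQL table needs no extra step
    if kind == "time sequence":
        return [
            "CREATE EXTENSION IF NOT EXISTS timescaledb;",
            f"SELECT create_hypertable('{table}', '{column}');",
        ]
    if kind == "distributed":
        return [
            "CREATE EXTENSION IF NOT EXISTS citus;",
            f"SELECT create_distributed_table('{table}', '{column}');",
        ]
    raise ValueError(f"unknown database kind: {kind}")


# Hypothetical tables: monitoring samples keyed by time, a large table keyed by line.
for kind, table, column in [
    ("relational", "station", None),
    ("time sequence", "monitoring_sample", "time"),
    ("distributed", "operation_diagram", "line_id"),
]:
    print(kind, setup_statements(kind, table, column))
```

After the setup step, the application continues to read and write the table with the same SQL statements in all three cases.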

The invention adopts the database solution method of the distributed scheduling system realized by the open source database technology, solves the relational database, the time sequence database and the distributed database of the rail transit distributed scheduling system by using the PG ecosystem, and can effectively reduce the complexity of the system design.

Based on the content of the above embodiment, as an optional embodiment, the first-type, second-type and third-type micro-services all use Java Database Connectivity (JDBC) to access the first-type, second-type and third-type sub-databases respectively; and the fourth-type micro-services use a Redis access tool to access the fourth-type sub-databases.

Specifically, the database construction method applied to the distributed scheduling system provided by the invention relates to micro-service development, which mainly comprises the following steps:

1) PostgreSQL, TimescaleDB and Citus all belong to the PG ecosystem, which supports standard SQL data access, and the micro-services access their respective databases by using Java Database Connectivity (JDBC).

JDBC is an application programming interface in the Java language that specifies how a client program accesses a database, and it provides methods for, among other things, querying and updating data in the database.

2) The application layer of each microservice implements read-write separation so that the load is more evenly balanced.

3) Each microservice may use a database connection pool to improve the read-write performance of the database.

The database connection pool is responsible for allocating, managing and releasing database connections. It allows an application to reuse an existing database connection instead of establishing a new one, and it releases connections whose idle time exceeds the maximum idle time, which avoids the connection leaks caused by connections that are never released.

4) The framework of the fourth-type microservice provides tools for accessing the Redis database that work out of the box.
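
The connection-pool behaviour described in point 3) can be sketched in plain Java as follows. This is only an illustrative sketch, not the pool used by the invention: the class name `SimplePool`, its methods and the explicit `nowMillis` parameter are assumptions made for the example, and a production microservice would more likely rely on an existing JDBC connection pool library.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/** Hypothetical minimal pool: reuses idle resources and drops those idle for too long. */
class SimplePool<T> {
    /** An idle resource together with the time at which it was returned. */
    private static final class Idle<T> {
        final T resource;
        final long idleSinceMillis;

        Idle(T resource, long idleSinceMillis) {
            this.resource = resource;
            this.idleSinceMillis = idleSinceMillis;
        }
    }

    private final Deque<Idle<T>> idle = new ArrayDeque<>();
    private final Supplier<T> factory;   // creates a new resource, e.g. opens a connection
    private final long maxIdleMillis;    // maximum idle time before a resource is dropped
    private int created = 0;

    SimplePool(Supplier<T> factory, long maxIdleMillis) {
        this.factory = factory;
        this.maxIdleMillis = maxIdleMillis;
    }

    /** Hands out a still-valid idle resource if one exists, otherwise creates a new one. */
    synchronized T acquire(long nowMillis) {
        evictExpired(nowMillis);
        Idle<T> entry = idle.pollFirst();
        if (entry != null) {
            return entry.resource;
        }
        created++;
        return factory.get();
    }

    /** Returns a resource to the pool so that a later caller can reuse it. */
    synchronized void release(T resource, long nowMillis) {
        idle.addFirst(new Idle<>(resource, nowMillis));
    }

    /** Drops resources idle for longer than maxIdleMillis (a real pool would also close them). */
    synchronized void evictExpired(long nowMillis) {
        idle.removeIf(e -> nowMillis - e.idleSinceMillis > maxIdleMillis);
    }

    synchronized int createdCount() {
        return created;
    }

    synchronized int idleCount() {
        return idle.size();
    }
}
```

Passing the current time explicitly keeps the sketch deterministic; a real pool would read the system clock and run eviction on a background timer.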

Based on the content of the above embodiment, as an optional embodiment, the first-type sub-database and the second-type sub-database both adopt a high-availability cluster mode of service vertical splitting, one master with multiple slaves, master-slave asynchronous streaming replication, read-write separation and failover;

the third-type sub-database adopts a high-availability cluster mode of one master with multiple slaves, master-slave asynchronous streaming replication, read-write separation and failover, in which a single coordinator node carries the metadata;

or, the third-type sub-database adopts a high-availability cluster mode of one master with multiple slaves, master-slave asynchronous streaming replication, read-write separation and failover, in which every worker node carries the metadata.

In addition, the fourth-type sub-database may adopt either the Sentinel mode or the Cluster mode as its high-availability cluster mode.

In order to illustrate the scheme of the present invention more clearly, the following specifically illustrates the database design concept related to the database construction method applied to the distributed scheduling system provided by the present invention:

(I) Building a relational database using the PostgreSQL database:

Among open-source databases, PostgreSQL is the closest to Oracle in features such as complex Structured Query Language (SQL) execution, stored procedures, triggers, indexes and a multi-process architecture. Reliability is PostgreSQL's highest priority: it supports high-transaction, mission-critical applications, and data consistency and integrity are likewise high priorities. It is also easy to maintain and, compared with MySQL, better suited to enterprise-level business support systems.

In addition, PostgreSQL is an open-source project driven entirely by its community and released under a permissive BSD/MIT-style license. Its license and ecosystem are completely open and are not controlled by any single individual, company or country, so users need have no concerns on that account. The design principles for the relational database are mainly as follows:

the rail transit distributed scheduling system is implemented with microservices, that is, the whole distributed scheduling system is split vertically by business into different microservices, each of which has its own database. According to the analysis of the service scenarios, a single database server can host the database instances of several microservices, and the number of database servers grows as microservices are added, which ensures that the scheduling system can scale horizontally.
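
The vertical split can be pictured as a mapping from each microservice's dominant data requirement to one of the four database kinds. The sketch below is an illustration under assumptions: the class `DatabasePlanner`, the enum names and any service names passed to it are hypothetical and are not taken from the invention.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical planner: assigns each microservice one independent database of a suitable kind. */
class DatabasePlanner {
    /** Dominant data requirement of a microservice (illustrative categories). */
    enum Requirement { RELATIONAL, TIME_SERIES, SINGLE_TABLE_SCALE_OUT, REAL_TIME_RESPONSE }

    /** Database engines named in the description. */
    enum Engine { POSTGRESQL, TIMESCALEDB, CITUS, REDIS }

    /** Chooses the engine that matches one requirement. */
    static Engine engineFor(Requirement requirement) {
        switch (requirement) {
            case RELATIONAL:
                return Engine.POSTGRESQL;
            case TIME_SERIES:
                return Engine.TIMESCALEDB;
            case SINGLE_TABLE_SCALE_OUT:
                return Engine.CITUS;
            case REAL_TIME_RESPONSE:
                return Engine.REDIS;
            default:
                throw new IllegalArgumentException("unknown requirement: " + requirement);
        }
    }

    /** Builds one database assignment per microservice, preserving the input order. */
    static Map<String, Engine> plan(Map<String, Requirement> services) {
        Map<String, Engine> result = new LinkedHashMap<>();
        services.forEach((name, requirement) -> result.put(name, engineFor(requirement)));
        return result;
    }
}
```

Because the three PG-ecosystem engines are all reached through standard SQL, moving a microservice from one of them to another in such a plan would not change how the microservice accesses its data.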

Fig. 2 is a schematic diagram of the high availability of the PostgreSQL cluster provided by the present invention. As shown in fig. 2, in the database construction method of the distributed scheduling system provided by the present invention, the High Availability (HA) scheme of the relational database, that is, of the PostgreSQL cluster, adopts one master with multiple slaves, master-slave asynchronous streaming replication, read-write separation and failover, as detailed below:

(1) Replication mode of the database:

the replication modes available for the PostgreSQL database include: synchronous streaming replication, asynchronous streaming replication, cascading streaming replication, logical replication, single-master replication and multi-master replication.

Multi-master replication is complex to implement and difficult to maintain, and because the distributed scheduling system is split vertically by business into different microservices, single-master replication is sufficient. Taking the service characteristics of the distributed scheduling system into account, the invention therefore replicates data by asynchronous streaming replication.

In addition, combining asynchronous streaming replication with one master and multiple slaves effectively reduces the complexity of the system and makes it easy to maintain.
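
The essential property of asynchronous replication with one master and several slaves is that a commit on the master returns without waiting for any slave, so a slave may briefly lag behind. The following toy model illustrates only that property; it is not PostgreSQL's replication protocol, and the class `AsyncReplicationSketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of one master and several asynchronously replicated slaves. */
class AsyncReplicationSketch {
    /** A slave that applies the master's log records in order. */
    static final class Replica {
        final List<String> applied = new ArrayList<>();
        int appliedUpTo = 0; // number of log records already applied
    }

    private final List<String> log = new ArrayList<>();       // the master's change log
    private final List<Replica> replicas = new ArrayList<>();

    Replica addReplica() {
        Replica replica = new Replica();
        replicas.add(replica);
        return replica;
    }

    /** Commits on the master and returns immediately, without waiting for any slave. */
    int commit(String record) {
        log.add(record);
        return log.size();
    }

    /** Later, out of band, a slave streams the records it has not applied yet. */
    void streamTo(Replica replica) {
        while (replica.appliedUpTo < log.size()) {
            replica.applied.add(log.get(replica.appliedUpTo));
            replica.appliedUpTo++;
        }
    }

    /** Number of committed records the slave has not applied yet. */
    int lag(Replica replica) {
        return log.size() - replica.appliedUpTo;
    }
}
```

The non-zero lag between `commit` and `streamTo` is the reason an application that separates reads from writes may observe slightly stale data on a slave.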

(2) Selection of cluster high-availability tools:

many open-source high-availability tools exist for the PostgreSQL database, and the present invention does not specifically limit which of them is used. They mainly include the connection-pooling middleware Pgpool-II, PostgreSQL Automatic Failover (PAF), the replication manager repmgr, Patroni, and the like.

(3) Development of microservices:

the PostgreSQL database supports data access of standard SQL, and the micro-service uses JDBC to access the database; the micro service realizes read-write separation and load balance in an application layer; the microservice uses a database connection pool to improve the read-write performance of the database.
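
Read-write separation in the application layer can be sketched as a small router that sends writes to the master and spreads reads over the slaves in round-robin order. This is a simplified sketch under assumptions: the class `ReadWriteRouter` and the node names are hypothetical, and classifying a statement by its leading keyword is a deliberately naive rule that a real service would replace with explicit read-only annotations or transactions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical application-layer router for read-write separation. */
class ReadWriteRouter {
    private final String primary;
    private final List<String> replicas;
    private final AtomicInteger next = new AtomicInteger();

    ReadWriteRouter(String primary, List<String> replicas) {
        this.primary = primary;
        this.replicas = new ArrayList<>(replicas);
    }

    /** Returns the node that should execute the given SQL statement. */
    String route(String sql) {
        String head = sql.trim().toUpperCase(Locale.ROOT);
        // Naive classification: plain SELECTs are reads, everything else is a write.
        boolean readOnly = head.startsWith("SELECT") && !head.contains("FOR UPDATE");
        if (!readOnly || replicas.isEmpty()) {
            return primary;
        }
        // Round-robin over the slaves to balance the read load.
        int index = Math.floorMod(next.getAndIncrement(), replicas.size());
        return replicas.get(index);
    }
}
```

In a real microservice each node name would stand for a connection pool, so the router would hand back a pooled JDBC connection rather than a string.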

(4) Micro-service access configuration:

because the master node of a PostgreSQL high-availability cluster can change dynamically, a client's database connections must be redirected to the new master node whenever the master and a standby node switch roles. There are several common ways to achieve this:

Mode 1, multi-host Uniform Resource Locator (URL): the PostgreSQL JDBC driver allows several Internet Protocol (IP) addresses to be configured in one connection string; the driver identifies the master and standby roles of the databases, connects to an appropriate node, and supports failover, read-write separation and load balancing.

Mode 2, virtual IP (VIP): with keepalived, several stateless single points can be built into a high-availability service by letting the VIP float between them.

Mode 3, HAProxy: HAProxy is used as a service proxy together with a high-availability tool, and supports failover, read-write separation and load balancing.
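
Mode 1 can be illustrated by building the multi-host connection strings themselves. The sketch below assumes the `targetServerType` and `loadBalanceHosts` connection parameters of the PostgreSQL JDBC driver; the accepted values depend on the driver version (older releases use `master` and `preferSlave` where newer ones use `primary` and `preferSecondary`), so they should be checked against the driver documentation. The class `PgMultiHostUrl`, the addresses and the database name are hypothetical.

```java
import java.util.List;

/** Hypothetical helper that builds multi-host PostgreSQL JDBC URLs. */
class PgMultiHostUrl {
    /** Joins all host:port pairs into one URL and appends the routing parameters. */
    static String build(List<String> hostPorts, String database,
                        String targetServerType, boolean loadBalanceHosts) {
        if (hostPorts.isEmpty()) {
            throw new IllegalArgumentException("at least one host is required");
        }
        return "jdbc:postgresql://" + String.join(",", hostPorts) + "/" + database
                + "?targetServerType=" + targetServerType
                + "&loadBalanceHosts=" + loadBalanceHosts;
    }

    /** URL for writes: the driver should connect only to the current master. */
    static String forWrites(List<String> hostPorts, String database) {
        return build(hostPorts, database, "primary", false);
    }

    /** URL for reads: prefer standby nodes and spread connections across them. */
    static String forReads(List<String> hostPorts, String database) {
        return build(hostPorts, database, "preferSecondary", true);
    }
}
```

A microservice would pass the resulting string to `java.sql.DriverManager.getConnection` or to its connection pool; after a failover the same URL remains valid because the driver probes the listed hosts for their current role.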

(II) Building a time-series database using the TimescaleDB database:

the TimescaleDB database is a time-series database built as an extension (plug-in) on top of the PostgreSQL database. It is upgraded along with PostgreSQL version upgrades, which avoids the trouble of maintaining a separate fork. Its aim is to combine the natural scalability of NoSQL databases with the reliability and query support of traditional relational databases, so that TimescaleDB supports the key features expected of current time-series databases.

It should be noted that the design principle of the TimescaleDB database is substantially the same as the principle of designing the relational database (PostgreSQL database) in the above embodiment, and the cluster high availability scheme of the system also adopts service vertical splitting and one master and multiple slaves, so the design principle about the PostgreSQL database can be referred to.

In addition, for the development of the microservice of the TimescaleDB database, the development scheme of accessing the TimescaleDB database by the microservice is basically the same as the development scheme for the relational database (PostgreSQL database) in the above embodiment, so the development scheme of the microservice database of PostgreSQL may also be referred to, and details are not described herein.
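
Because TimescaleDB is a PostgreSQL extension, a microservice can set up a time-series table with ordinary SQL sent over JDBC. The sketch below only generates the statements; it assumes TimescaleDB's `create_hypertable(table, time_column)` function, and the class `HypertableDdl`, the table layout and the column names are hypothetical examples rather than part of the invention.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical generator of the SQL needed to create a TimescaleDB hypertable. */
class HypertableDdl {
    /** Accepts only simple identifiers so that the generated SQL cannot be injected into. */
    private static String identifier(String name) {
        if (!name.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("not a simple SQL identifier: " + name);
        }
        return name;
    }

    /** Returns the statements in execution order; the columns are illustrative. */
    static List<String> statements(String table, String timeColumn) {
        String t = identifier(table);
        String c = identifier(timeColumn);
        return Arrays.asList(
                "CREATE EXTENSION IF NOT EXISTS timescaledb",
                "CREATE TABLE " + t + " (" + c + " TIMESTAMPTZ NOT NULL, "
                        + "train_id TEXT NOT NULL, speed_kmh DOUBLE PRECISION)",
                "SELECT create_hypertable('" + t + "', '" + c + "')");
    }
}
```

Each string could be executed with `java.sql.Statement.execute` on a connection obtained in the same way as for a plain PostgreSQL database, which is why the access scheme does not change.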

(III) Building a distributed database using the Citus database:

the Citus database is an open-source distributed database based on the PostgreSQL database, and it automatically inherits PostgreSQL's strong SQL support and application ecosystem (it is compatible not only with the client protocol but also fully compatible with server extensions and management tools).

Compared with other similar PostgreSQL-based distributed schemes (such as Greenplum, Postgres-XL and Postgres-XC), the biggest difference of the Citus database is that it is a PostgreSQL extension rather than a separate code branch. Citus can therefore follow the version evolution of PostgreSQL closely, at low cost and high speed, while preserving the stability and compatibility of the database to the greatest extent.

A Citus cluster is composed of a central Coordinator Node (CN) and a number of worker nodes (Workers). The CN stores only the metadata related to data distribution, while the actual table data is divided into M shards that are spread across N Workers. Such a table is called a distributed (sharded) table, and multiple replicas can be created for each of its shards to achieve high availability and load balancing. The Citus database can also achieve high availability by using PostgreSQL's native streaming replication.
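
The idea of dividing a table into M shards (fragments) placed on N Workers, with several copies per shard, can be sketched as follows. This is a conceptual illustration only: it uses Java's `String.hashCode` and round-robin placement, which are assumptions made for the example and not the hash function or placement policy that the distributed database actually uses, and the class `ShardPlacement` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Conceptual sketch: rows are hashed to M shards, and shard copies are placed on N workers. */
class ShardPlacement {
    private final int shardCount;   // M
    private final int workerCount;  // N

    ShardPlacement(int shardCount, int workerCount) {
        if (shardCount <= 0 || workerCount <= 0) {
            throw new IllegalArgumentException("shardCount and workerCount must be positive");
        }
        this.shardCount = shardCount;
        this.workerCount = workerCount;
    }

    /** Maps the distribution-column value of a row to a shard index in [0, M). */
    int shardOf(String distributionKey) {
        return Math.floorMod(distributionKey.hashCode(), shardCount);
    }

    /** Returns the worker indexes holding copies of a shard (round-robin placement). */
    List<Integer> workersOf(int shard, int copies) {
        int effectiveCopies = Math.min(copies, workerCount);
        List<Integer> workers = new ArrayList<>();
        for (int i = 0; i < effectiveCopies; i++) {
            workers.add((shard + i) % workerCount);
        }
        return workers;
    }
}
```

The coordinator only needs the shard-to-worker mapping, which corresponds to the distribution metadata, to route a query; the rows themselves live on the workers.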

(1) Design principles for the Citus database:

the Citus database mainly addresses the horizontal scale-out of single tables holding large volumes of data (a problem that the PostgreSQL and TimescaleDB databases cannot solve) and real-time data analysis. According to the concurrency of the distributed scheduling system, the invention mainly selects between two high-availability cluster schemes: the single-CN cluster and the MX cluster (Citus MX).

Fig. 3 is a schematic diagram of the high availability of a Citus single-CN cluster provided by the present invention. As shown in fig. 3, a single-CN cluster consists of one CN and a plurality of Workers, and is the cluster scheme for scenarios with little concurrent reading and writing.

Fig. 4 is a schematic diagram of the high availability of a Citus MX cluster provided by the present invention. As shown in fig. 4, in an MX cluster every worker node carries metadata, and every worker node that carries metadata can serve both reads and writes, which suits scenarios with highly concurrent reading and writing.

(2) High-availability schemes for the Citus database:

in the high-availability scheme of both the single-CN cluster and the Citus MX cluster, the CN and each Worker are each deployed with one master and multiple slaves, master-slave asynchronous streaming replication, read-write separation and failover.

In addition, the selection of the highly available tools is substantially consistent with the relational database (i.e., PostgreSQL database), and will not be described herein.

(3) Development of microservices on the Citus database:

the Citus cluster is accessed through the CN and, where workers carry metadata (e.g., in an MX cluster), through those workers as well. The development scheme by which a microservice accesses the Citus database is basically the same as that for accessing the PostgreSQL database in the above embodiment, so the PostgreSQL microservice development scheme may be referred to; details are not repeated here.
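
The difference between the two cluster schemes, as seen by a microservice, is which nodes accept client connections. A minimal sketch, assuming hypothetical node names and a hypothetical `CitusEntryPoints` helper:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper listing the nodes a microservice may connect to. */
class CitusEntryPoints {
    /** The two cluster schemes discussed in the description. */
    enum Mode { SINGLE_CN, MX }

    /**
     * In a single-CN cluster only the coordinator is an entry point; in an MX cluster
     * the workers carry metadata, so they are entry points as well.
     */
    static List<String> clientEndpoints(Mode mode, String coordinator, List<String> workers) {
        List<String> endpoints = new ArrayList<>();
        endpoints.add(coordinator);
        if (mode == Mode.MX) {
            endpoints.addAll(workers);
        }
        return endpoints;
    }
}
```

The returned list could be fed directly into a multi-host JDBC URL, so switching a deployment from the single-CN scheme to the MX scheme changes configuration rather than code.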

(IV) Building an in-memory database using the Redis database:

the Redis database is a fully open-source, high-performance key-value database distributed under the Berkeley Software Distribution (BSD) license. A key-value database stores data as key-value pairs, much like a Map in Java: the entire database can be understood as one large map in which each key corresponds to a unique value.

The Redis database is a key-value-based cache database and has the following three characteristics:

(1) the Redis database supports data persistence: it can save the data held in memory to disk and load it again on restart.

(2) The Redis database not only supports simple key-value type data, but also provides data storage for data structures such as list, set, zset, hash, and the like.

(3) Redis supports backup of data, i.e., data backup in a master-slave mode.

In view of this, the design principle of the Redis database provided by the present invention is as follows: as a memory-based key-value database, Redis is used to meet the real-time data response requirements of the distributed scheduling system (the specific design must be worked out against the actual service scenario).

As an optional embodiment, in the database construction method applied to the distributed scheduling system, the fourth-type sub-database (i.e., the Redis database) may adopt either the Sentinel mode or the Cluster mode as its high-availability cluster mode.

Fig. 5 is a schematic diagram of the cluster high availability of the Redis database in the Sentinel mode. With plain master-slave replication, once the master server (Master) fails and can no longer provide service, one of the slave servers (Slaves) has to be promoted to Master manually and the applications have to be told the new Master address, which is unacceptable in many application scenarios (this is the drawback of Redis master-slave replication: the Master cannot be elected dynamically).

In view of this, the database construction method of the distributed scheduling system provided by the invention adopts the Sentinel mode, as shown in fig. 5, to perform dynamic election and overcome this drawback. Specifically, the Sentinel processes monitor the working state of the Master in the Redis cluster. When the Master fails, they can notify an administrator or other applications through an API, and they elect one node from among the Slaves to serve as the new Master, thereby switching between master and slave servers and ensuring the high availability of the system.
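
The two decisions a Sentinel deployment has to make, whether the Master is really down and which Slave to promote, can be sketched as below. This is a simplified model and not the Redis Sentinel algorithm itself: the class `SentinelSketch` is hypothetical, and the election shown here considers only the replication offset, whereas real Sentinel also takes other criteria such as a configured replica priority into account.

```java
import java.util.List;

/** Simplified model of failure detection and election in the Sentinel mode. */
class SentinelSketch {
    /** A slave with the amount of replicated data it has received so far. */
    static final class Replica {
        final String name;
        final long replicationOffset;

        Replica(String name, long replicationOffset) {
            this.name = name;
            this.replicationOffset = replicationOffset;
        }
    }

    /** The master counts as down once at least `quorum` sentinels cannot reach it. */
    static boolean masterDown(List<Boolean> sentinelSeesMasterDown, int quorum) {
        int votes = 0;
        for (boolean down : sentinelSeesMasterDown) {
            if (down) {
                votes++;
            }
        }
        return votes >= quorum;
    }

    /** Simplified election: promote the slave that has replicated the most data. */
    static Replica electNewMaster(List<Replica> replicas) {
        Replica best = null;
        for (Replica candidate : replicas) {
            if (best == null || candidate.replicationOffset > best.replicationOffset) {
                best = candidate;
            }
        }
        return best; // null when there is no slave to promote
    }
}
```

Requiring a quorum rather than a single observation keeps one Sentinel with a broken network link from triggering an unnecessary failover.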

Fig. 6 is a schematic diagram of the cluster high availability of the Redis database in the Cluster mode. As shown in fig. 6, the Cluster mode is the distributed cluster solution provided by the Redis community edition; it mainly addresses the need to distribute Redis data and offers high availability, scalability, distribution and fault tolerance.

The Cluster mode provides a degree of availability through partitioning: it automatically divides the data among different nodes, and it can continue to process commands when some nodes of the cluster have failed or are unreachable.
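
The automatic division of data among nodes in the Cluster mode rests on hash slots: Redis Cluster maps every key to one of 16384 slots using a CRC16 checksum (the XMODEM variant), and each master node serves a subset of the slots. The sketch below reproduces that key-to-slot calculation, including the hash-tag rule under which only the text between the first `{` and the following `}` is hashed; the class name `RedisKeySlot` is hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Sketch of the key-to-slot mapping used by Redis Cluster. */
class RedisKeySlot {
    static final int SLOT_COUNT = 16384;

    /** CRC16, XMODEM variant: polynomial 0x1021, initial value 0, no reflection. */
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                if ((crc & 0x8000) != 0) {
                    crc = (crc << 1) ^ 0x1021;
                } else {
                    crc = crc << 1;
                }
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    /** Returns the hash slot of a key, honouring a non-empty {hash tag} if present. */
    static int slot(String key) {
        String hashed = key;
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) {
                hashed = key.substring(open + 1, close);
            }
        }
        return crc16(hashed.getBytes(StandardCharsets.UTF_8)) % SLOT_COUNT;
    }
}
```

Keys that share a hash tag, for example `{train42}.position` and `{train42}.speed`, land in the same slot and therefore on the same node, which is what allows multi-key operations on related data.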

In summary, the database construction method applied to the distributed scheduling system provided by the invention uses the cluster technologies of the PostgreSQL database and the Redis database to achieve high availability, load balancing and read-write separation, thereby ensuring the stability of the system.

The invention also provides a database construction device applied to the distributed scheduling system, and the database construction device is used for creating the relational database, the time sequence database, the distributed database and the memory database of the distributed scheduling system according to the service scene of the distributed scheduling system.

The relational database, the time sequence database and the distributed database are established based on an open-source full-stack database; the in-memory database is created based on an open source Redis database. Wherein, the creating of the relational database, the time sequence database, the distributed database and the memory database of the distributed scheduling system comprises: vertically splitting the distributed scheduling system into a plurality of micro-services, and constructing an independent database for each micro-service; each database is one of a relational database, a time sequence database, a distributed database and a memory database.

The database construction device applied to the distributed scheduling system analyzes from the aspect of the service scene of the rail transit distributed scheduling system, selects the open-source full-stack database technology and the Redis database technology to solve the data storage problem of the distributed scheduling system, and simultaneously dynamically adjusts the database scheme by combining the actual scale of the scheduling system without changing the access mode of the database, so that the architecture design of the system is not influenced, and the design complexity and the development cost of the system are reduced.

It should be noted that, during specific operation, the database construction apparatus applied to the distributed scheduling system according to the embodiment of the present invention may execute the database construction method applied to the distributed scheduling system according to any of the above embodiments, which is not described in detail in this embodiment.

Fig. 7 is a schematic structural diagram of an electronic device provided in the present invention, and as shown in fig. 7, the electronic device may include: a processor (processor)710, a communication Interface (Communications Interface)720, a memory (memory)730, and a communication bus 740, wherein the processor 710, the communication Interface 720, and the memory 730 communicate with each other via the communication bus 740. Processor 710 may invoke logic instructions in memory 730 to perform a database building method for use with a distributed scheduling system, the method comprising: according to the service scene of the distributed scheduling system, a relational database, a time sequence database, a distributed database and a memory database of the distributed scheduling system are established; the relational database, the time sequence database and the distributed database are established based on an open-source full-stack database; the in-memory database is created based on an open source Redis database. Wherein, the creating of the relational database, the time sequence database, the distributed database and the memory database of the distributed scheduling system comprises: vertically splitting the distributed scheduling system into a plurality of micro-services, and constructing an independent database for each micro-service; each database is one of a relational database, a time sequence database, a distributed database and a memory database.

In addition, the logic instructions in the memory 730 can be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

In another aspect, the present invention also provides a computer program product, which includes a computer program stored on a non-transitory computer-readable storage medium, the computer program including program instructions, when the program instructions are executed by a computer, the computer being capable of executing the database construction method applied to the distributed scheduling system provided by the above methods, the method including: according to the service scene of the distributed scheduling system, a relational database, a time sequence database, a distributed database and a memory database of the distributed scheduling system are established; the relational database, the time sequence database and the distributed database are established based on an open-source full-stack database; the in-memory database is created based on an open source Redis database. Wherein, the creating of the relational database, the time sequence database, the distributed database and the memory database of the distributed scheduling system comprises: vertically splitting the distributed scheduling system into a plurality of micro-services, and constructing an independent database for each micro-service; each database is one of a relational database, a time sequence database, a distributed database and a memory database.

In still another aspect, the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, the computer program being implemented by a processor to execute the database construction method applied to the distributed scheduling system provided in the foregoing embodiments, the method including: according to the service scene of the distributed scheduling system, a relational database, a time sequence database, a distributed database and a memory database of the distributed scheduling system are established; the relational database, the time sequence database and the distributed database are established based on an open-source full-stack database; the in-memory database is created based on an open source Redis database. Wherein, the creating of the relational database, the time sequence database, the distributed database and the memory database of the distributed scheduling system comprises: vertically splitting the distributed scheduling system into a plurality of micro-services, and constructing an independent database for each micro-service; each database is one of a relational database, a time sequence database, a distributed database and a memory database.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A database construction method applied to a distributed scheduling system is characterized by comprising the following steps:

according to the service scene of the distributed scheduling system, a relational database, a time sequence database, a distributed database and a memory database of the distributed scheduling system are established; the relational database, the time sequence database and the distributed database are established based on an open-source full-stack database; the memory database is established based on an open source Redis database;

wherein, the creating of the relational database, the time sequence database, the distributed database and the memory database of the distributed scheduling system comprises:

vertically splitting the distributed scheduling system into a plurality of micro-services, and constructing an independent database for each micro-service; each database is one of a relational database, a time sequence database, a distributed database and a memory database.

2. The method as claimed in claim 1, wherein the vertically splitting the distributed scheduling system into multiple microservices according to the service scenario of the distributed scheduling system, and constructing an independent database for each microservice comprises:

vertically splitting the part of the service scenario of the distributed scheduling system that involves single-table horizontal scale-out requirements into at least one third-type microservice, and constructing a distributed database for each third-type microservice;

and vertically splitting the part of the service scenario of the distributed scheduling system that involves real-time response requirements into at least one fourth-type microservice, and constructing an in-memory database for each fourth-type microservice.

3. The database construction method applied to the distributed scheduling system according to claim 1, wherein the relational database is specifically created based on an open-source PostgreSQL database;

the time sequence database is established based on an open source TimescaleDB database;

the distributed database is created based on the Citus database.

4. The database construction method applied to the distributed scheduling system according to claim 2, wherein the first class of micro-service, the second class of micro-service and the third class of micro-service all adopt a Java database connection mode to respectively realize data access to the first class of sub-database, the second class of sub-database and the third class of sub-database;

and the fourth-type microservice adopts a Redis access tool to realize data access to the fourth-type sub-database.

5. The database construction method applied to the distributed scheduling system according to claim 4, wherein the first-class sub-database and the second-class sub-database both adopt a high-availability cluster mode of service vertical splitting, one-master-multiple-slave, master-slave asynchronous stream replication, read-write separation and failover;

the third-type sub-database adopts a high-availability cluster mode of one master with multiple slaves, master-slave asynchronous streaming replication, read-write separation and failover, in which a single coordinator node carries the metadata.

6. The database construction method applied to the distributed scheduling system according to claim 4, wherein the third-type sub-database adopts a high-availability cluster mode of one master with multiple slaves, master-slave asynchronous streaming replication, read-write separation and failover, in which each worker node carries the metadata.

7. The database construction method applied to the distributed scheduling system according to claim 4, wherein the fourth-type sub-database adopts either a Sentinel mode or a Cluster mode as its high-availability cluster mode.

8. The database construction device is used for creating a relational database, a time sequence database, a distributed database and a memory database of the distributed scheduling system according to a service scene of the distributed scheduling system;

the relational database, the time sequence database and the distributed database are established based on an open-source full-stack database;

the memory database is established based on an open source Redis database;

9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the database construction method steps applied to the distributed scheduling system according to any one of claims 1 to 7 when executing the computer program.

10. A non-transitory computer readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the database construction method steps for a distributed scheduling system according to any one of claims 1 to 7.