CN108763572B

CN108763572B - Method and device for realizing Apache Solr read-write separation

Info

Publication number: CN108763572B
Application number: CN201810573076.1A
Authority: CN
Inventors: 何小成; 王晓斌; 黄三伟
Original assignee: Hunan Eefung Software Co ltd
Current assignee: Hunan Eefung Software Co ltd
Priority date: 2018-06-06
Filing date: 2018-06-06
Publication date: 2021-06-22
Anticipated expiration: 2038-06-06
Also published as: CN108763572A

Abstract

The application relates to a method and a device for realizing Apache Solr read-write separation, a computer device, and a storage medium. The method comprises the following steps: receiving a data write request from a persistence program, and writing data into a write cluster and a snapshot cluster; receiving a search request from an Apache Solr client, and searching data in a read cluster and the snapshot cluster; receiving a segment merging instruction sent by the persistence program, and executing a segment merging operation on the data in the write cluster; receiving a synchronization instruction sent by the persistence program, and incrementally loading index files from the data directory of the write cluster into the off-heap memory of the read cluster; and receiving a data cleaning instruction sent by the persistence program, and cleaning the expired data in the snapshot cluster that has already been synchronized. With this method, read operations and write operations are separated, contention for system resources is avoided, and normal operation of the server is ensured; because the segment merging operation is completed before data synchronization, a system crash caused by segment merging during synchronization is avoided.

Description

Method and device for realizing Apache Solr read-write separation

Technical Field

The present application relates to the field of computing technologies, and in particular, to a method and an apparatus for implementing Apache Solr read-write separation, a computer device, and a storage medium.

Background

With the development of computer technology, Apache Solr has emerged as a powerful enterprise-level search engine. It is implemented on top of Lucene and supports near-real-time full-text search. Apache Solr has a 'segment merging' mechanism: when the number of written data segments reaches a set threshold, segment merging is triggered and several segments with smaller data volume are merged into one segment. Under highly concurrent writes at the scale of hundreds of millions of records, Apache Solr triggers segment merging frequently. A write lock is held during segment merging, which slows real-time writes and generates a large number of disk IO operations. Search requests generate a large number of read operations at the same time, so read operations and write operations compete for system resources and the server may crash.

Disclosure of Invention

Therefore, to solve the problem of server crashes caused by contention between read and write operations, it is necessary to provide a method, an apparatus, a computer device, and a storage medium for Apache Solr read-write separation that ensure normal operation of the server by isolating read operations and write operations from each other.

A method for implementing Apache Solr read-write separation, the method comprising:

receiving a data write request from a persistence program, and writing data into a write cluster and a snapshot cluster according to the data write request;

receiving a search request from an Apache Solr client, and searching data in a read cluster and the snapshot cluster according to the search request;

receiving a segment merging instruction sent by the persistence program, and executing a segment merging operation on the data in the write cluster according to the segment merging instruction;

after receiving the segment merging instruction and executing the segment merging operation according to it, receiving a synchronization instruction sent by the persistence program, and incrementally loading index files from the data directory of the write cluster into the off-heap memory of the read cluster according to the synchronization instruction;

and after receiving the synchronization instruction and executing the synchronization operation according to it, receiving a data cleaning instruction sent by the persistence program, and cleaning out the expired data in the snapshot cluster that has already been synchronized according to the data cleaning instruction.

In one embodiment, before the receiving a data write request from a persistence program and writing data into a write cluster and a snapshot cluster according to the data write request, the method further includes:

deploying the Apache Solr system as the write cluster and as the read cluster, respectively, using Docker containerized deployment;

and deploying the Apache Solr system as the snapshot cluster in a separate deployment.

In one embodiment, the write cluster receives written data and stores the data persistently;

the read cluster stores its index in off-heap memory and provides search service externally;

the snapshot cluster stores its index in off-heap memory, receives written data, and provides search service externally.

In one embodiment, the receiving a synchronization instruction sent by the persistence program, and incrementally loading index files from the data directory of the write cluster into the off-heap memory of the read cluster according to the synchronization instruction includes:

receiving a synchronization instruction sent by the persistence program;

presetting a synchronization time according to the synchronization instruction;

and performing data synchronization according to the synchronization time, and incrementally loading the data written during that period from the write cluster into the off-heap memory of the read cluster.

In one embodiment, the receiving a data write request from a persistence program, and writing data into a write cluster and a snapshot cluster according to the data write request further includes:

when the data is written, writing the index data into both the write cluster and the snapshot cluster.

An Apache Solr read-write separation apparatus, the apparatus comprising:

the data writing module is used for receiving a data write request from the persistence program and writing data into the write cluster and the snapshot cluster according to the data write request;

the data searching module is used for receiving a search request from an Apache Solr client and searching data in the read cluster and the snapshot cluster according to the search request;

the segment merging module is used for receiving a segment merging instruction sent by the persistence program and executing a segment merging operation on the data in the write cluster according to the segment merging instruction;

the data synchronization module is used for, after the segment merging instruction has been received and the segment merging operation executed according to it, receiving a synchronization instruction sent by the persistence program, and incrementally loading index files from the data directory of the write cluster into the off-heap memory of the read cluster according to the synchronization instruction;

and the data cleaning module is used for, after the synchronization instruction has been received and the synchronization operation executed according to it, receiving a data cleaning instruction sent by the persistence program, and cleaning out the expired data in the snapshot cluster that has already been synchronized according to the data cleaning instruction.

In one embodiment, the apparatus further comprises:

the write cluster deployment module is used for deploying the Apache Solr system as the write cluster using Docker containerized deployment;

the read cluster deployment module is used for deploying the Apache Solr system as the read cluster using Docker containerized deployment;

and the snapshot cluster deployment module is used for deploying the Apache Solr system as the snapshot cluster in a separate deployment.

In one embodiment, the data synchronization module comprises:

the synchronization instruction receiving unit is used for receiving a synchronization instruction sent by the persistence program;

the presetting unit is used for presetting a synchronization time according to the synchronization instruction;

and the write data loading unit is used for performing data synchronization according to the synchronization time and incrementally loading the data written during that period from the write cluster into the off-heap memory of the read cluster.

A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:

A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:

With the above Apache Solr read-write separation method, apparatus, computer device, and storage medium, data is written into the write cluster and the snapshot cluster and searched from the read cluster and the snapshot cluster. Read operations and write operations are thereby separated, contention for system resources is avoided, and normal operation of the server is ensured. In addition, the persistence program sends a segment merging instruction to the write cluster before data synchronization, so the segment merging operation is completed before synchronization, which avoids the system resource bottleneck or system crash that segment merging during synchronization could cause. The read cluster is synchronized only once per time interval, so its cache is not emptied repeatedly by frequent writes; this greatly improves the cache utilization of the read cluster and the search efficiency.

Drawings

FIG. 1 is a schematic flow chart illustrating an embodiment of an Apache Solr read-write separation method;

FIG. 2 is a schematic flow chart illustrating an Apache Solr read-write separation method in accordance with another embodiment;

FIG. 3 is a schematic flow chart illustrating, in one embodiment, the step of receiving a synchronization instruction sent by the persistence program and incrementally loading index files from the data directory of the write cluster into the off-heap memory of the read cluster according to the synchronization instruction;

FIG. 4 is a block diagram of an embodiment of an Apache Solr read-write separation apparatus;

FIG. 5 is a block diagram of an alternative embodiment of an Apache Solr read/write separation apparatus;

FIG. 6 is a block diagram of an Apache Solr read/write separation apparatus in accordance with yet another embodiment;

FIG. 7 is a diagram illustrating an internal structure of a computer device according to an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

In one embodiment, as shown in FIG. 1, there is provided an Apache Solr read-write separation method, comprising the steps of:

S101, receiving a data write request from a persistence program, and writing data into a write cluster and a snapshot cluster according to the data write request.

The "persistence program" refers to a program that implements data writing, that is, one that achieves persistent storage by writing data into the Apache Solr cluster.

Apache Solr is an open-source search server developed in Java and built mainly on HTTP and Apache Lucene. Resources are stored in Apache Solr as Document objects; each Document consists of a series of Fields, and each Field represents one attribute of the resource. Each Document in Solr must have an attribute that uniquely identifies it. By default this attribute is named id, and it is declared in the Schema configuration file as: <uniqueKey>id</uniqueKey>.

Specifically, the write cluster receives written data and stores it persistently, while the snapshot cluster receives the same written data and provides search service externally at the same time.
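As an illustrative, non-limiting sketch of step S101, the dual write can be modeled with two in-memory stand-ins for the clusters. The class `InMemoryCluster`, the function `handle_write_request`, and the sample documents are hypothetical names introduced only for this illustration; a real deployment would send the documents to Solr rather than to Python dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryCluster:
    """Hypothetical stand-in for a Solr cluster, keyed by each document's unique id."""
    name: str
    docs: dict = field(default_factory=dict)

    def add(self, doc: dict) -> None:
        # A later write with the same id replaces the earlier document.
        self.docs[doc["id"]] = doc


def handle_write_request(docs, write_cluster, snapshot_cluster):
    """Write every document to both the write cluster and the snapshot cluster."""
    for doc in docs:
        write_cluster.add(doc)     # copy kept for persistent storage
        snapshot_cluster.add(doc)  # copy that is searchable right away
    return len(docs)


write_cluster = InMemoryCluster("write")
snapshot_cluster = InMemoryCluster("snapshot")
written = handle_write_request(
    [{"id": "1", "title": "first"}, {"id": "2", "title": "second"}],
    write_cluster,
    snapshot_cluster,
)
```

Because both clusters receive every document, newly written data is searchable through the snapshot cluster before it has been synchronized to the read cluster.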

S103, receiving a search request from the Apache Solr client, and searching data in the read cluster and the snapshot cluster according to the search request.

The main features of Apache Solr include: an efficient and flexible caching function; vertical search; highlighting of search results; improved availability through index replication; a powerful data schema for defining fields and types and for configuring text analysis; and a web-based administration interface.

Specifically, the read cluster stores its index in off-heap memory and provides search service externally.

The snapshot cluster is deployed separately and stores its index in off-heap memory. Data is written and read entirely in memory, so there is no disk IO bottleneck and read-write performance is ensured. In addition, the snapshot cluster receives written data and provides search service externally at the same time, which ensures consistency between what is written and what can be read.
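As an illustrative sketch of step S103, a search can consult both the read cluster and the snapshot cluster and return one result per document id. The function `search`, the `title` field, and the rule that the snapshot copy takes precedence when an id appears in both clusters are assumptions made for this example; the embodiments do not specify how results from the two clusters are combined.

```python
def search(term, read_docs, snapshot_docs):
    """Search both clusters and return matching documents, one per id.

    Assumption: when an id is present in both clusters, the snapshot copy
    is treated as the more recent one and replaces the read-cluster copy.
    """
    latest = {}
    for source in (read_docs, snapshot_docs):  # snapshot last, so it overrides
        for doc in source:
            latest[doc["id"]] = doc
    needle = term.lower()
    hits = [doc for doc in latest.values() if needle in doc["title"].lower()]
    return sorted(hits, key=lambda doc: doc["id"])


read_docs = [
    {"id": "1", "title": "Solr basics"},
    {"id": "2", "title": "Lucene segments"},
]
snapshot_docs = [
    {"id": "2", "title": "Lucene segments, revised"},
    {"id": "3", "title": "Solr snapshots"},
]
results = search("solr", read_docs, snapshot_docs)
```

Here document 3 exists only in the snapshot cluster, yet it is still found, which is the purpose of searching both clusters.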

S105, receiving a segment merging instruction sent by the persistence program, and executing a segment merging operation on the data in the write cluster according to the segment merging instruction.

Apache Solr has a 'segment merging' mechanism: when the number of written data segments reaches a set threshold, segment merging is triggered and several segments with smaller data volume are merged into one segment. Under highly concurrent writes at the scale of hundreds of millions of records, Apache Solr triggers segment merging frequently, and a write lock is held during segment merging, which slows real-time writes and generates a large number of disk IO operations.

Specifically, the persistence program sends the segment merging instruction to the write cluster before data synchronization, so the segment merging operation is completed before synchronization begins. This avoids the system resource bottleneck or system crash that segment merging during synchronization could cause.
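The embodiments do not state how the segment merging instruction is delivered. One possible realization, given here only as an assumption, is Solr's update handler, which accepts the request parameters `optimize`, `maxSegments`, and `waitSearcher` to force a merge. The sketch below only builds such a URL and sends nothing; the host name and the collection name `articles` are hypothetical.

```python
from urllib.parse import urlencode


def build_merge_request_url(base_url, collection, max_segments=1):
    """Build the URL of a Solr update request that forces a segment merge.

    Assumption: the segment merging instruction is mapped to Solr's
    optimize command; the patent text does not confirm this mapping.
    """
    params = urlencode({
        "optimize": "true",           # ask Solr to merge segments
        "maxSegments": max_segments,  # merge down to at most this many segments
        "waitSearcher": "true",       # return only after the new searcher is open
    })
    return f"{base_url.rstrip('/')}/solr/{collection}/update?{params}"


# Hypothetical host and collection names, used only for illustration.
url = build_merge_request_url("http://write-cluster.example:8983", "articles")
```

Issuing such a request against the write cluster only, and waiting for it to return before synchronization starts, keeps the merge IO away from the read cluster.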

S107, after receiving the segment merging instruction and executing the segment merging operation according to it, receiving a synchronization instruction sent by the persistence program, and incrementally loading index files from the data directory of the write cluster into the off-heap memory of the read cluster according to the synchronization instruction.

Specifically, the synchronization is incremental, so index files of the write cluster that have already been synchronized to the read cluster are not loaded again: if a file exists in the write cluster but not in the read cluster, the read cluster loads the file into its off-heap memory; if a file does not exist in the write cluster but exists in the read cluster, the read cluster deletes the file from its off-heap memory; and if a file exists in both the write cluster and the read cluster, the file is not processed.
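The three cases above amount to a set comparison between the file names in the write cluster's data directory and the file names already held by the read cluster. The following is a minimal sketch under that reading; the function `plan_incremental_sync` and the sample file names (chosen to resemble Lucene index file names) are illustrative assumptions.

```python
def plan_incremental_sync(write_files, read_files):
    """Decide, per index file, what the read cluster must do during one sync."""
    write_files, read_files = set(write_files), set(read_files)
    return {
        # In the write cluster only: load into the read cluster's off-heap memory.
        "load": sorted(write_files - read_files),
        # In the read cluster only: delete from the read cluster's off-heap memory.
        "delete": sorted(read_files - write_files),
        # In both clusters: already synchronized, so nothing is done.
        "keep": sorted(write_files & read_files),
    }


plan = plan_incremental_sync(
    write_files=["_3.cfs", "_4.cfs", "segments_5"],
    read_files=["_1.cfs", "_2.cfs", "_3.cfs"],
)
```

In this example the merged-away files `_1.cfs` and `_2.cfs` are dropped, `_4.cfs` and `segments_5` are loaded, and `_3.cfs` is left untouched, so only the difference between the two clusters is transferred.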

S109, after receiving the synchronization instruction and executing the synchronization operation according to it, receiving a data cleaning instruction sent by the persistence program, and cleaning the expired data in the snapshot cluster that has already been synchronized according to the data cleaning instruction.

Specifically, data synchronization is performed at regular intervals: the data newly written during the interval is incrementally loaded from the write cluster into the read cluster, and the corresponding data in the snapshot cluster is then cleaned.
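As an illustrative sketch of the cleaning in step S109, the snapshot cluster can drop every document that was written no later than the point up to which the read cluster has been synchronized. The `written_at` timestamp on each document, the parameter `synced_up_to`, and the function name are assumptions for this example; the embodiments do not specify how synchronized data is identified.

```python
def clean_synced_snapshot(snapshot_docs, synced_up_to):
    """Remove snapshot documents that the read cluster already covers.

    Assumption: every document carries a numeric `written_at` timestamp and
    the read cluster holds everything written up to `synced_up_to`.
    """
    expired = [
        doc_id
        for doc_id, doc in snapshot_docs.items()
        if doc["written_at"] <= synced_up_to
    ]
    for doc_id in expired:
        del snapshot_docs[doc_id]
    return sorted(expired)


snapshot = {
    "a": {"written_at": 100},
    "b": {"written_at": 180},
    "c": {"written_at": 260},
}
removed = clean_synced_snapshot(snapshot, synced_up_to=200)
```

After the call, only document `c` remains in the snapshot, so the snapshot cluster holds just the data written since the last synchronization.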

With the above Apache Solr read-write separation method, data is written into the write cluster and the snapshot cluster and searched from the read cluster and the snapshot cluster. Read operations and write operations are thereby separated, contention for system resources is avoided, and normal operation of the server is ensured. In addition, the persistence program sends a segment merging instruction to the write cluster before data synchronization, so the segment merging operation is completed before synchronization, which avoids the system resource bottleneck or system crash that segment merging during synchronization could cause. The read cluster is synchronized only once per time interval, so its cache is not emptied repeatedly by frequent writes; this greatly improves the cache utilization of the read cluster and the search efficiency.

In another embodiment, as shown in fig. 2, there is provided an Apache Solr read-write separation method, further comprising:

S201, deploying the Apache Solr system as the write cluster and as the read cluster, respectively, using Docker containerized deployment.

Docker is an open-source application container engine. Developers can package an application and its dependencies into a portable container and then distribute it to any popular Linux machine; virtualization can also be achieved. Containers are fully sandboxed and have no interfaces to one another.

Docker uses a client-server (C/S) architecture and manages and creates Docker containers through a remote API. Containers are created from Docker images, and the relationship between a container and an image is similar to that between an object and a class in object-oriented programming.

Specifically, the Apache Solr system is deployed as the write cluster and as the read cluster using Docker containerized deployment, so that the two clusters are isolated from each other. This avoids the sharp drop in read performance that would otherwise occur when IO waits caused by segment merging make the server load surge.

S203, deploying the Apache Solr system as the snapshot cluster in a separate deployment.

Specifically, the snapshot cluster is deployed independently and stores indexes by using an off-heap memory, and data writing and reading are completed in the memory.

Off-heap memory refers to memory allocated outside the Java heap, for example when a java.nio.DirectByteBuffer is created. Its advantages are improved IO efficiency, because copying data between user space and kernel space is avoided, and reduced GC activity, because a large amount of in-heap memory is saved. In a relational database, an index is a separate physical storage structure that sorts the values of one or more columns of a table; it is a collection of the values of one or more columns together with a corresponding list of logical pointers to the data pages that physically hold those values. An index is comparable to the table of contents of a book: the required content can be found quickly from the page number listed there.

With the above Apache Solr read-write separation method, the write cluster and the read cluster are deployed in Docker containers, so that they are isolated from each other in terms of resources and do not compete for system resources. The snapshot cluster is deployed separately and stores its index in off-heap memory. Data is written and read entirely in memory, so there is no disk IO bottleneck and read-write performance is ensured. In addition, the snapshot cluster receives written data and provides search service externally at the same time, which ensures consistency between what is written and what can be read.

In one embodiment, as shown in FIG. 3, the step of receiving a synchronization instruction sent by the persistence program and incrementally loading index files from the data directory of the write cluster into the off-heap memory of the read cluster according to the synchronization instruction includes:

S301, receiving a synchronization instruction sent by the persistence program.

S303, presetting a synchronization time according to the synchronization instruction.

Specifically, the server receives a synchronization instruction sent by the persistence program, analyzes the synchronization instruction, and presets synchronization time according to the synchronization instruction.

S305, performing data synchronization according to the synchronization time, and incrementally loading the data written during that period from the write cluster into the off-heap memory of the read cluster.

Before the synchronization operation is executed according to the synchronization instruction, the persistence program sends a segment merging instruction to the write cluster, so the segment merging operation is completed before synchronization. This avoids the system resource bottleneck or system crash that segment merging during synchronization could cause.
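The ordering constraint described above (segment merging first, then synchronization, then cleaning, repeated at a preset interval) can be sketched as follows. The functions `is_sync_due` and `run_sync_cycle`, the 600-second interval, and the simulated clock values are hypothetical; the three callables stand in for the instructions that the persistence program sends.

```python
def is_sync_due(now, last_sync, interval_seconds):
    """Return True once the preset synchronization interval has elapsed."""
    return now - last_sync >= interval_seconds


def run_sync_cycle(merge_segments, sync_read_cluster, clean_snapshot):
    """Run one cycle; each step starts only after the previous one has returned."""
    merge_segments()     # segment merging on the write cluster
    sync_read_cluster()  # incremental load into the read cluster
    clean_snapshot()     # cleaning of synchronized snapshot data


events = []
last_sync = 0
interval_seconds = 600  # hypothetical preset synchronization time

# Simulated clock ticks instead of real waiting, to keep the sketch deterministic.
for now in (300, 600, 900, 1200):
    if is_sync_due(now, last_sync, interval_seconds):
        run_sync_cycle(
            merge_segments=lambda: events.append(("merge", now)),
            sync_read_cluster=lambda: events.append(("sync", now)),
            clean_snapshot=lambda: events.append(("clean", now)),
        )
        last_sync = now
```

Running the steps strictly in sequence is what guarantees that no segment merge is in progress while index files are being loaded into the read cluster.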

It should be understood that although the steps in the flow charts of FIGS. 1-3 are shown in sequence as indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in FIGS. 1-3 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In one embodiment, as shown in fig. 4, there is provided an Apache Solr read-write separation apparatus, including: a data writing module 401, a data searching module 403, a segment merging module 405, a data synchronizing module 407 and a data cleaning module 409, wherein:

the data writing module 401 is configured to receive a data write request from a persistence program, and write data into a write cluster and a snapshot cluster according to the data write request.

The data searching module 403 is configured to receive a search request from an Apache Solr client, and search data in the read cluster and the snapshot cluster according to the search request.

The segment merging module 405 is configured to receive a segment merging instruction sent by the persistence program, and perform a segment merging operation on the data in the write cluster according to the segment merging instruction.

The data synchronization module 407 is configured to, after the segment merging instruction has been received and the segment merging operation executed according to it, receive a synchronization instruction sent by the persistence program, and incrementally load index files from the data directory of the write cluster into the off-heap memory of the read cluster according to the synchronization instruction.

The data cleaning module 409 is configured to, after the synchronization instruction has been received and the synchronization operation executed according to it, receive a data cleaning instruction sent by the persistence program, and clean out the expired data in the snapshot cluster that has already been synchronized according to the data cleaning instruction.

With the above Apache Solr read-write separation apparatus, data is written into the write cluster and the snapshot cluster and searched from the read cluster and the snapshot cluster. Read operations and write operations are thereby separated, contention for system resources is avoided, and normal operation of the server is ensured. In addition, the persistence program sends a segment merging instruction to the write cluster before data synchronization, so the segment merging operation is completed before synchronization, which avoids the system resource bottleneck or system crash that segment merging during synchronization could cause. The read cluster is synchronized only once per time interval, so its cache is not emptied repeatedly by frequent writes; this greatly improves the cache utilization of the read cluster and the search efficiency.

In another embodiment, as shown in fig. 5, there is provided an Apache Solr read-write separation apparatus, further comprising:

a write cluster deployment module 501, configured to deploy the Apache Solr system as the write cluster using Docker containerized deployment.

A read cluster deployment module 503, configured to deploy the Apache Solr system as the read cluster using Docker containerized deployment.

A snapshot cluster deployment module 505, configured to deploy the Apache Solr system as the snapshot cluster in a separate deployment.

In one embodiment, as shown in fig. 6, a data synchronization module of an Apache Solr read-write separation apparatus is provided, and the module includes:

a synchronization instruction receiving unit 601, configured to receive a synchronization instruction sent by the persistence program.

A presetting unit 603, configured to preset a synchronization time according to the synchronization instruction.

A write data loading unit 605, configured to perform data synchronization according to the synchronization time, and incrementally load the data written during that period from the write cluster into the off-heap memory of the read cluster.

For the specific limitations of the Apache Solr read-write separation apparatus, reference may be made to the limitations of the Apache Solr read-write separation method above; details are not repeated here. All or some of the modules in the Apache Solr read-write separation apparatus can be implemented in software, hardware, or a combination thereof. The modules can be embedded in, or be independent of, a processor in the computer device in hardware form, or can be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.

In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement an Apache Solr read-write separation method.

Those skilled in the art will appreciate that the architecture shown in FIG. 7 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.

In one embodiment, there is provided a computer device comprising a memory storing a computer program and a processor implementing the following steps when the processor executes the computer program:

In one embodiment, the processor, when executing the computer program, further performs the steps of:

receiving a synchronization instruction sent by a persistence program;

presetting synchronization time according to the synchronization instruction;

In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:

In one embodiment, the computer program when executed by the processor further performs the steps of:

receiving a synchronization instruction sent by a persistence program;

presetting synchronization time according to the synchronization instruction;

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above embodiments express only several implementations of the present application, and although their description is specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A method for realizing Apache Solr read-write separation is characterized by comprising the following steps:

2. The method of claim 1, wherein before said receiving a data write request from a persistence program, writing data into a write cluster and a snapshot cluster according to said data write request, further comprises:

3. The method for implementing Apache Solr read-write separation according to claim 1 or 2, wherein the write cluster receives a write of data, implementing persistent storage of the data;

4. The method of claim 1 or 2, wherein the receiving a synchronization instruction sent by the persistence program, and incrementally loading index files from the data directory of the write cluster into the off-heap memory of the read cluster according to the synchronization instruction comprises:

receiving a synchronization instruction sent by a persistence program;

presetting synchronization time according to the synchronization instruction;

and performing data synchronization according to the synchronization time, and incrementally loading the data written within the synchronization time from the write cluster into the off-heap memory of the read cluster.

5. The method for implementing Apache Solr read-write separation according to claim 1 or 2, wherein the receiving a data write request of a persistence program, and writing data into a write cluster and a snapshot cluster according to the data write request further comprises:

6. An Apache Solr read-write separation apparatus, the apparatus comprising:

7. The Apache Solr read-write separation apparatus of claim 6, further comprising:

8. The Apache Solr read-write separation apparatus of claim 6 or 7, wherein the data synchronization module comprises:

and the write data loading unit is used for performing data synchronization according to the synchronization time and incrementally loading the data written within the synchronization time from the write cluster into the off-heap memory of the read cluster.

9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 5 when executing the computer program.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 5.