CN108763572B - Method and device for realizing Apache Solr read-write separation - Google Patents

Info

Publication number
CN108763572B
Authority
CN
China
Prior art keywords
cluster
data
receiving
write
synchronization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810573076.1A
Other languages
Chinese (zh)
Other versions
CN108763572A (en)
Inventor
何小成
王晓斌
黄三伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Eefung Software Co ltd
Original Assignee
Hunan Eefung Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Eefung Software Co ltd
Priority to CN201810573076.1A
Publication of CN108763572A
Application granted
Publication of CN108763572B
Current legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/45595Network integration; Enabling network access in virtual machine instances

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Retry When Errors Occur (AREA)

Abstract

The application relates to a method and an apparatus for implementing Apache Solr read-write separation, a computer device, and a storage medium. The method comprises: receiving a data write request from a persistence program, and writing data into a write cluster and a snapshot cluster; receiving a search request from an Apache Solr client, and searching data from a read cluster and the snapshot cluster; receiving a segment merging instruction sent by the persistence program, and performing a segment merging operation on the data in the write cluster; receiving a synchronization instruction sent by the persistence program, and incrementally loading index files from the data directory of the write cluster into the off-heap memory of the read cluster; and receiving a data cleaning instruction sent by the persistence program, and cleaning up the expired data in the snapshot cluster that has already been synchronized. With this method, read operations and write operations are separated, contention for system resources is avoided, and normal operation of the server is ensured; segment merging is completed before data synchronization, which avoids system crashes caused by segment merging during synchronization.

Description

Method and device for realizing Apache Solr read-write separation
Technical Field
The present application relates to the field of computing technologies, and in particular, to a method and an apparatus for implementing Apache Solr read-write separation, a computer device, and a storage medium.
Background
With the development of computer technology, Apache Solr has emerged. It is a powerful enterprise-level search engine built on Lucene that supports near-real-time full-text search. Apache Solr has a "segment merging" mechanism: when the number of written data segments reaches a set threshold, segment merging is triggered and multiple smaller segments are merged into one. Under highly concurrent writes at the scale of hundreds of millions of records, Apache Solr triggers segment merging frequently. Because Apache Solr holds a write lock during segment merging, real-time write speed suffers and a large number of disk IO operations occur. At the same time, search requests generate a large number of read operations. The read and write operations compete for system resources, which can crash the server.
Disclosure of Invention
Therefore, to address the server crashes caused by contention between read and write operations, it is necessary to provide a method, an apparatus, a computer device, and a storage medium for Apache Solr read-write separation that ensure normal operation of the server by isolating read operations and write operations from each other.
A method for implementing Apache Solr read-write separation, the method comprising:
receiving a data write request from a persistence program, and writing data into a write cluster and a snapshot cluster according to the data write request;
receiving a search request from an Apache Solr client, and searching data from a read cluster and the snapshot cluster according to the search request;
receiving a segment merging instruction sent by the persistence program, and performing a segment merging operation on the data in the write cluster according to the segment merging instruction;
after receiving the segment merging instruction and performing the segment merging operation according to it, receiving a synchronization instruction sent by the persistence program, and incrementally loading index files from the data directory of the write cluster into the off-heap memory of the read cluster according to the synchronization instruction;
and after receiving the synchronization instruction and performing the synchronization operation according to it, receiving a data cleaning instruction sent by the persistence program, and cleaning up the synchronized expired data in the snapshot cluster according to the data cleaning instruction.
In one embodiment, before receiving the data write request from the persistence program and writing data into the write cluster and the snapshot cluster according to the data write request, the method further includes:
deploying the Apache Solr system to the write cluster and the read cluster respectively by means of Docker containerized deployment;
and deploying the Apache Solr system to the snapshot cluster as a separate deployment.
In one embodiment, the write cluster receives data writes and stores the data persistently;
the read cluster stores its index in off-heap memory and provides search service externally;
the snapshot cluster stores its index in off-heap memory, receives data writes, and provides search service externally.
In one embodiment, receiving the synchronization instruction sent by the persistence program, and incrementally loading index files from the data directory of the write cluster into the off-heap memory of the read cluster according to the synchronization instruction, includes:
receiving the synchronization instruction sent by the persistence program;
presetting a synchronization time according to the synchronization instruction;
and performing data synchronization according to the synchronization time, incrementally loading the data written during the period from the write cluster into the off-heap memory of the read cluster.
In one embodiment, receiving the data write request from the persistence program and writing data into the write cluster and the snapshot cluster according to the data write request further includes:
when data is written, writing the index data into both the write cluster and the snapshot cluster.
An Apache Solr read-write separation apparatus, the apparatus comprising:
a data writing module, configured to receive a data write request from a persistence program and write data into a write cluster and a snapshot cluster according to the data write request;
a data searching module, configured to receive a search request from an Apache Solr client and search data from a read cluster and the snapshot cluster according to the search request;
a segment merging module, configured to receive a segment merging instruction sent by the persistence program and perform a segment merging operation on the data in the write cluster according to the segment merging instruction;
a data synchronization module, configured to, after the segment merging instruction has been received and the segment merging operation performed according to it, receive a synchronization instruction sent by the persistence program and incrementally load index files from the data directory of the write cluster into the off-heap memory of the read cluster according to the synchronization instruction;
and a data cleaning module, configured to, after the synchronization instruction has been received and the synchronization operation performed according to it, receive a data cleaning instruction sent by the persistence program and clean up the synchronized expired data in the snapshot cluster according to the data cleaning instruction.
In one embodiment, the apparatus further comprises:
a write cluster deployment module, configured to deploy the Apache Solr system to the write cluster by means of Docker containerized deployment;
a read cluster deployment module, configured to deploy the Apache Solr system to the read cluster by means of Docker containerized deployment;
and a snapshot cluster deployment module, configured to deploy the Apache Solr system to the snapshot cluster as a separate deployment.
In one embodiment, the data synchronization module comprises:
a synchronization instruction receiving unit, configured to receive the synchronization instruction sent by the persistence program;
a presetting unit, configured to preset a synchronization time according to the synchronization instruction;
and a write data loading unit, configured to perform data synchronization according to the synchronization time and incrementally load the data written during the period from the write cluster into the off-heap memory of the read cluster.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
receiving a data write request from a persistence program, and writing data into a write cluster and a snapshot cluster according to the data write request;
receiving a search request from an Apache Solr client, and searching data from a read cluster and the snapshot cluster according to the search request;
receiving a segment merging instruction sent by the persistence program, and performing a segment merging operation on the data in the write cluster according to the segment merging instruction;
after receiving the segment merging instruction and performing the segment merging operation according to it, receiving a synchronization instruction sent by the persistence program, and incrementally loading index files from the data directory of the write cluster into the off-heap memory of the read cluster according to the synchronization instruction;
and after receiving the synchronization instruction and performing the synchronization operation according to it, receiving a data cleaning instruction sent by the persistence program, and cleaning up the synchronized expired data in the snapshot cluster according to the data cleaning instruction.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
receiving a data write request from a persistence program, and writing data into a write cluster and a snapshot cluster according to the data write request;
receiving a search request from an Apache Solr client, and searching data from a read cluster and the snapshot cluster according to the search request;
receiving a segment merging instruction sent by the persistence program, and performing a segment merging operation on the data in the write cluster according to the segment merging instruction;
after receiving the segment merging instruction and performing the segment merging operation according to it, receiving a synchronization instruction sent by the persistence program, and incrementally loading index files from the data directory of the write cluster into the off-heap memory of the read cluster according to the synchronization instruction;
and after receiving the synchronization instruction and performing the synchronization operation according to it, receiving a data cleaning instruction sent by the persistence program, and cleaning up the synchronized expired data in the snapshot cluster according to the data cleaning instruction.
With the above Apache Solr read-write separation method, apparatus, computer device, and storage medium, data is written into the write cluster and the snapshot cluster and searched from the read cluster and the snapshot cluster. Read operations and write operations are thereby separated, contention for system resources is avoided, and normal operation of the server is ensured. In addition, the persistence program sends a segment merging instruction to the write cluster before data synchronization, so that segment merging completes before synchronization, avoiding a system resource bottleneck or system crash caused by segment merging during synchronization. The read cluster is synchronized only once per set interval, which avoids the frequent cache flushing caused by frequent writes, greatly improves the cache utilization of the read cluster, and improves search efficiency.
Drawings
FIG. 1 is a schematic flow chart illustrating an embodiment of an Apache Solr read-write separation method;
FIG. 2 is a schematic flow chart illustrating an Apache Solr read-write separation method in accordance with another embodiment;
FIG. 3 is a flowchart illustrating the step of receiving a synchronization instruction sent by the persistence program and incrementally loading index files from the data directory of the write cluster into the off-heap memory of the read cluster according to the synchronization instruction in one embodiment;
FIG. 4 is a block diagram of an embodiment of an Apache Solr read-write separation apparatus;
FIG. 5 is a block diagram of an alternative embodiment of an Apache Solr read/write separation apparatus;
FIG. 6 is a block diagram of an Apache Solr read/write separation apparatus in accordance with yet another embodiment;
FIG. 7 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In one embodiment, as shown in FIG. 1, there is provided an Apache Solr read-write separation method, comprising the steps of:
S101, receiving a data write request from a persistence program, and writing data into a write cluster and a snapshot cluster according to the data write request.
The "persistence program" is the program that performs data writing; that is, data becomes persistently stored once the program has written it into the Apache Solr cluster.
Apache Solr is an open-source search server developed in Java and built mainly on HTTP and Apache Lucene. Resources in Apache Solr are stored as Document objects; each Document consists of a series of Fields, and each Field represents one attribute of the resource. Each Document in Solr must have an attribute that uniquely identifies it. By default this attribute is named id, and it is declared in the Schema configuration file as: <uniqueKey>id</uniqueKey>.
Specifically, the write cluster receives data writes and stores the data persistently, while the snapshot cluster receives the same writes and simultaneously provides search service externally.
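The dual-write behavior of step S101 can be sketched as follows. This is a minimal illustration only: `InMemoryCluster` and `handle_write_request` are hypothetical names that do not appear in the application, and plain in-memory dictionaries stand in for real Apache Solr clusters.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InMemoryCluster:
    """Hypothetical stand-in for a Solr cluster, keyed by the document's unique id."""
    name: str
    docs: Dict[str, dict] = field(default_factory=dict)

    def add(self, doc: dict) -> None:
        # A document with an existing id overwrites the previous version.
        self.docs[doc["id"]] = doc


def handle_write_request(docs: List[dict],
                         write_cluster: InMemoryCluster,
                         snapshot_cluster: InMemoryCluster) -> int:
    """Write every document to both clusters and return the number written."""
    for doc in docs:
        write_cluster.add(doc)      # persistent copy, synchronized to the read cluster later
        snapshot_cluster.add(doc)   # copy that is searchable right away
    return len(docs)
```

Under this assumption, a document is searchable through the snapshot cluster as soon as the request returns, even though the read cluster has not yet been synchronized.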
S103, receiving a search request from an Apache Solr client, and searching data from the read cluster and the snapshot cluster according to the search request.
The main features of Apache Solr include: efficient and flexible caching, vertical search, highlighting of search results, improved availability through index replication, a powerful data schema for defining fields and types and configuring text analysis, a web-based administration interface, and so on.
Specifically, the read cluster stores its index in off-heap memory and provides search service externally.
The snapshot cluster is deployed separately and stores its index in off-heap memory. Data writes and reads are completed in memory, so there is no disk IO performance bottleneck and read and write performance is ensured. In addition, the snapshot cluster receives data writes and provides search service externally at the same time, which ensures consistency between data reads and writes.
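One way the results of the two clusters might be combined in step S103 is sketched below. The application does not describe how hits are merged, so the rule used here (deduplicate by `id`, prefer the snapshot copy, order by a `score` field) is an assumption, and `merge_hits` is a hypothetical helper.

```python
from typing import Dict, Iterable, List


def merge_hits(read_hits: Iterable[dict], snapshot_hits: Iterable[dict]) -> List[dict]:
    """Combine hits from the read cluster and the snapshot cluster.

    Assumption: when the same id appears in both result sets, the snapshot
    copy is kept because it reflects the most recent write.
    """
    merged: Dict[str, dict] = {}
    for hit in read_hits:
        merged[hit["id"]] = hit
    for hit in snapshot_hits:
        merged[hit["id"]] = hit  # overrides the read-cluster copy, if any
    # Highest score first; hits without a score sort last.
    return sorted(merged.values(), key=lambda h: h.get("score", 0.0), reverse=True)
```

Deduplication matters because a document written shortly before a synchronization may exist in both clusters until the snapshot cluster is cleaned.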
S105, receiving a segment merging instruction sent by the persistence program, and performing a segment merging operation on the data in the write cluster according to the segment merging instruction.
Apache Solr has a "segment merging" mechanism: when the number of written data segments reaches a set threshold, segment merging is triggered and multiple smaller segments are merged into one. Under highly concurrent writes at the scale of hundreds of millions of records, Apache Solr triggers segment merging frequently, and because it holds a write lock during segment merging, real-time write speed suffers and a large number of disk IO operations occur.
Specifically, before data synchronization, the persistence program sends a segment merging instruction to the write cluster, so that the segment merging operation completes before synchronization and a system resource bottleneck or system crash caused by segment merging during synchronization is avoided.
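The application does not say how the segment merging instruction is transmitted. One plausible realization, assumed here, is a request to Solr's update handler with the `optimize`, `maxSegments`, and `waitSearcher` parameters. The sketch below only builds such a URL; the host name, collection name, and the helper `build_merge_url` are hypothetical.

```python
from urllib.parse import urlencode


def build_merge_url(base_url: str, collection: str, max_segments: int = 1) -> str:
    """Build an update-handler URL that asks Solr to merge segments.

    Assumption: the persistence program triggers merging over HTTP; the
    application itself does not specify the transport.
    """
    params = urlencode({
        "optimize": "true",            # request a forced merge
        "maxSegments": max_segments,   # merge down to at most this many segments
        "waitSearcher": "true",        # return only after the new searcher is open
    })
    return f"{base_url.rstrip('/')}/{collection}/update?{params}"


# Example with hypothetical host and collection names.
url = build_merge_url("http://write-solr:8983/solr/", "docs")
```

Waiting for the request to complete before issuing the synchronization instruction is what keeps merging and synchronization from overlapping.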
S107, after receiving the segment merging instruction and performing the segment merging operation according to it, receiving a synchronization instruction sent by the persistence program, and incrementally loading index files from the data directory of the write cluster into the off-heap memory of the read cluster according to the synchronization instruction.
Specifically, the loading is incremental because some files in the write cluster may already have been synchronized to the read cluster: if a file exists in the write cluster but not in the read cluster, the read cluster loads it into off-heap memory; if a file does not exist in the write cluster but exists in the read cluster, the read cluster deletes it from off-heap memory; and if a file exists in both clusters, it is not processed.
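The three cases above amount to a set comparison of file names. A minimal sketch follows, assuming that index files are identified by name alone; `SyncPlan` and `plan_incremental_sync` are hypothetical names, and the file names in the example merely imitate Lucene-style segment files.

```python
from typing import Iterable, List, NamedTuple


class SyncPlan(NamedTuple):
    to_load: List[str]     # in the write cluster only: load into off-heap memory
    to_delete: List[str]   # in the read cluster only: delete from off-heap memory
    unchanged: List[str]   # in both clusters: not processed


def plan_incremental_sync(write_files: Iterable[str], read_files: Iterable[str]) -> SyncPlan:
    """Decide what the read cluster must load, delete, or leave untouched."""
    write_set, read_set = set(write_files), set(read_files)
    return SyncPlan(
        to_load=sorted(write_set - read_set),
        to_delete=sorted(read_set - write_set),
        unchanged=sorted(write_set & read_set),
    )


plan = plan_incremental_sync(
    write_files=["_0.cfs", "_1.cfs", "segments_2"],
    read_files=["_0.cfs", "_old.cfs"],
)
```

Because segment merging has finished before this comparison is made, the set of files in the write cluster is stable while the plan is computed and applied.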
S109, after receiving the synchronization instruction and performing the synchronization operation according to it, receiving a data cleaning instruction sent by the persistence program, and cleaning up the synchronized expired data in the snapshot cluster according to the data cleaning instruction.
Specifically, data synchronization is performed at regular intervals: the data newly written during the interval is incrementally loaded from the write cluster into the read cluster, and the corresponding data in the snapshot cluster is then cleaned up.
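The cleaning step can be sketched as below. The application does not state how synchronized data is recognized; this sketch assumes each snapshot document carries a hypothetical `written_at` timestamp and that everything written up to the last completed synchronization is already served by the read cluster.

```python
from typing import Dict


def clean_synced_snapshot_docs(snapshot_docs: Dict[str, dict], synced_up_to: float) -> int:
    """Remove snapshot documents already covered by the last synchronization.

    Assumption: `written_at` is a numeric timestamp recorded at write time.
    Returns the number of documents removed.
    """
    expired_ids = [
        doc_id for doc_id, doc in snapshot_docs.items()
        if doc["written_at"] <= synced_up_to
    ]
    for doc_id in expired_ids:
        del snapshot_docs[doc_id]
    return len(expired_ids)
```

Documents written after the synchronization point stay in the snapshot cluster, so they remain searchable until the next cycle.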
With the above Apache Solr read-write separation method, data is written into the write cluster and the snapshot cluster and searched from the read cluster and the snapshot cluster. Read operations and write operations are thereby separated, contention for system resources is avoided, and normal operation of the server is ensured. In addition, the persistence program sends a segment merging instruction to the write cluster before data synchronization, so that segment merging completes before synchronization, avoiding a system resource bottleneck or system crash caused by segment merging during synchronization. The read cluster is synchronized only once per set interval, which avoids the frequent cache flushing caused by frequent writes, greatly improves the cache utilization of the read cluster, and improves search efficiency.
In another embodiment, as shown in fig. 2, there is provided an Apache Solr read-write separation method, further comprising:
S201, deploying the Apache Solr system to the write cluster and the read cluster respectively by means of Docker containerized deployment.
Docker is an open-source application container engine. Developers can package an application and its dependencies into a portable container and distribute it to any mainstream Linux machine; it can also be used for virtualization. Containers are fully sandboxed and have no interfaces to one another.
Docker uses a client-server (C/S) architecture and manages and creates Docker containers through a remote API. Containers are created from Docker images; the relationship between a container and an image is similar to that between an object and a class in object-oriented programming.
Specifically, Docker containerized deployment is used to deploy the Apache Solr system to the write cluster and the read cluster separately, isolating the two clusters from each other. This avoids the sharp drop in read performance that occurs when IO waits caused by segment merging make the server load surge.
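How the two clusters might be started as separately resource-limited containers is sketched below. The container names, CPU and memory limits, host ports, and the use of the public `solr` image are all assumptions for illustration; the application specifies none of them. The sketch only builds `docker run` argument lists and does not execute them.

```python
from typing import List


def docker_run_args(name: str, cpus: float, memory: str, host_port: int,
                    image: str = "solr") -> List[str]:
    """Build a `docker run` command line with CPU and memory limits."""
    return [
        "docker", "run", "-d",
        "--name", name,
        "--cpus", str(cpus),        # CPU limit for this container
        "--memory", memory,         # memory limit for this container
        "-p", f"{host_port}:8983",  # Solr listens on 8983 inside the container
        image,
    ]


# Hypothetical sizing: each cluster gets its own resource budget.
write_node = docker_run_args("solr-write", 4, "16g", 8983)
read_node = docker_run_args("solr-read", 8, "32g", 8984)
```

Giving each container its own limits is one way to keep a merge-heavy write node from starving a read node on the same host.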
S203, deploying the Apache Solr system to the snapshot cluster as a separate deployment.
Specifically, the snapshot cluster is deployed separately and stores its index in off-heap memory, so data writes and reads are completed in memory.
Off-heap memory refers to memory allocated when a java.nio.DirectByteBuffer is created. Its advantages are improved IO efficiency, since copying data from user space to kernel space is avoided, fewer GC runs, and large savings in on-heap memory. In a relational database, an index is a separate, physical storage structure that sorts the values of one or more columns of a database table; it is a collection of the values of one or more columns in a table together with a corresponding list of logical pointers to the data pages that physically identify those values. An index is like the table of contents of a book: the required content can be found quickly from the page numbers in the table of contents.
With the above Apache Solr read-write separation method, the write cluster and the read cluster are deployed in Docker containers, so that their resources are isolated from each other and they no longer compete for system resources. The snapshot cluster is deployed separately and stores its index in off-heap memory. Data writes and reads are completed in memory, so there is no disk IO performance bottleneck and read and write performance is ensured. In addition, the snapshot cluster receives data writes and provides search service externally at the same time, which ensures consistency between data reads and writes.
In one embodiment, as shown in FIG. 3, the step of receiving a synchronization instruction sent by the persistence program and incrementally loading index files from the data directory of the write cluster into the off-heap memory of the read cluster according to the synchronization instruction includes:
S301, receiving the synchronization instruction sent by the persistence program.
S303, presetting a synchronization time according to the synchronization instruction.
Specifically, the server receives the synchronization instruction sent by the persistence program, parses it, and presets the synchronization time accordingly.
S305, performing data synchronization according to the synchronization time, and incrementally loading the data written during the period from the write cluster into the off-heap memory of the read cluster.
Before the synchronization operation is performed, the persistence program sends a segment merging instruction to the write cluster, so that the segment merging operation completes before synchronization and a system resource bottleneck or system crash caused by segment merging during synchronization is avoided.
Specifically, the loading is incremental because some files in the write cluster may already have been synchronized to the read cluster: if a file exists in the write cluster but not in the read cluster, the read cluster loads it into off-heap memory; if a file does not exist in the write cluster but exists in the read cluster, the read cluster deletes it from off-heap memory; and if a file exists in both clusters, it is not processed.
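The ordering constraint of one synchronization cycle (merge first, then synchronize, then clean) can be sketched as follows. The three callables and the interval check are hypothetical placeholders for whatever the persistence program actually sends; only the ordering is taken from the description.

```python
from typing import Callable, List


def is_sync_due(now: float, last_sync: float, interval_seconds: float) -> bool:
    """Return True once the preset synchronization interval has elapsed."""
    return now - last_sync >= interval_seconds


def run_sync_cycle(merge_segments: Callable[[], None],
                   sync_read_cluster: Callable[[], None],
                   clean_snapshot: Callable[[], None]) -> List[str]:
    """Run one cycle in the required order and report the completed steps.

    Each step starts only after the previous call has returned, so segment
    merging can never overlap with synchronization.
    """
    completed: List[str] = []
    merge_segments()        # segment merging on the write cluster
    completed.append("merge")
    sync_read_cluster()     # incremental load into the read cluster
    completed.append("sync")
    clean_snapshot()        # remove synchronized data from the snapshot cluster
    completed.append("clean")
    return completed
```

If any step raises an exception, the later steps are skipped, so the snapshot cluster is never cleaned of data that the read cluster has not yet loaded.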
It should be understood that although the steps in the flowcharts of FIGS. 1-3 are shown in the sequence indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in FIGS. 1-3 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 4, there is provided an Apache Solr read-write separation apparatus, including: a data writing module 401, a data searching module 403, a segment merging module 405, a data synchronizing module 407 and a data cleaning module 409, wherein:
The data writing module 401 is configured to receive a data write request from a persistence program, and write data into a write cluster and a snapshot cluster according to the data write request.
The "persistence program" is the program that performs data writing; that is, data becomes persistently stored once the program has written it into the Apache Solr cluster.
Specifically, the write cluster receives data writes and stores the data persistently, while the snapshot cluster receives the same writes and simultaneously provides search service externally.
The data searching module 403 is configured to receive a search request from an Apache Solr client, and search data from the read cluster and the snapshot cluster according to the search request.
Specifically, the read cluster stores its index in off-heap memory and provides search service externally.
The snapshot cluster is deployed separately and stores its index in off-heap memory. Data writes and reads are completed in memory, so there is no disk IO performance bottleneck and read and write performance is ensured. In addition, the snapshot cluster receives data writes and provides search service externally at the same time, which ensures consistency between data reads and writes.
The segment merging module 405 is configured to receive a segment merging instruction sent by the persistent program, and perform a segment merging operation on data written in the cluster according to the segment merging instruction.
The Apache Solr has a 'segment merging' mechanism, and when the number of written data segments reaches a set threshold, segment merging is triggered, and a plurality of segments with smaller data volume are merged into one segment. During high concurrent writing of hundred million levels of data volume, Apache Solr frequently triggers segment merging, and the Apache Solr has a write lock during segment merging, so that the real-time writing speed is influenced, and a large number of disk IO operations exist.
Specifically, before data synchronization, the persistent program sends a segment merging instruction to the write cluster, so that segment merging operation can be completed before synchronization, and system resource bottleneck or system crash caused by segment merging during synchronization is avoided.
The data synchronization module 407 is configured to, after the segment merging instruction has been received and the segment merging operation executed, receive a synchronization instruction sent by the persistence program, and incrementally load index files from the data directory of the write cluster into the off-heap memory of the read cluster according to the synchronization instruction.
Specifically, some files in the write cluster may already have been synchronized to the read cluster, so each file is handled as follows: if a file exists in the write cluster but not in the read cluster, it is loaded into the off-heap memory; if a file does not exist in the write cluster but exists in the read cluster, the read cluster deletes it from the off-heap memory; and if a file exists in both clusters, it is not processed.
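The three file-handling rules above amount to a set comparison between the two file listings. The following sketch assumes files can be compared by name alone and models the off-heap memory as a plain dictionary; the function names and the file names in the usage note are hypothetical.

```python
def plan_incremental_sync(write_files, read_files):
    """Compare file names in the write cluster with those held by the read cluster."""
    write_files, read_files = set(write_files), set(read_files)
    return {
        "load": sorted(write_files - read_files),    # only in the write cluster
        "delete": sorted(read_files - write_files),  # only in the read cluster
        "skip": sorted(write_files & read_files),    # already synchronized
    }


def apply_sync(plan, write_dir, off_heap):
    """Apply a plan; both arguments map file name -> file content."""
    for name in plan["load"]:
        off_heap[name] = write_dir[name]
    for name in plan["delete"]:
        del off_heap[name]
```

For example, with `_0.cfs` and `_2.cfs` in the write cluster and `_0.cfs` and `_1.cfs` in the read cluster, the plan loads `_2.cfs`, deletes `_1.cfs`, and leaves `_0.cfs` untouched.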
The data cleaning module 409 is configured to, after the synchronization instruction has been received and the synchronization operation executed, receive a data cleaning instruction sent by the persistence program, and clean out the expired data in the snapshot cluster that has already been synchronized, according to the data cleaning instruction.
Specifically, data synchronization is performed at regular intervals: data newly written during the interval is incrementally loaded from the write cluster into the read cluster, and the corresponding data in the snapshot cluster is then cleaned out.
With the Apache Solr read-write separation device described above, data is written to the write cluster and the snapshot cluster and searched from the read cluster and the snapshot cluster. Separating read operations from write operations avoids contention for system resources and keeps the server running normally. In addition, before data synchronization the persistence program sends a segment merging instruction to the write cluster, so segment merging completes before synchronization and does not cause a system resource bottleneck or system crash during synchronization. Finally, the read cluster is synchronized only once per time interval, which avoids the frequent cache flushes that frequent writes would otherwise cause, greatly improves the cache utilization of the read cluster, and improves search efficiency.
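The order of operations summarized above (merge, then synchronize, then clean) can be sketched as a single cycle. The `SyncCycle` class, the integer timestamps, and the `cutoff` parameter are assumptions introduced for illustration; the source does not specify how synchronized data is identified.

```python
class SyncCycle:
    """Toy model of one merge -> sync -> clean cycle driven by the persistence program."""

    def __init__(self):
        self.write_index = {}  # doc id -> write timestamp (persistent)
        self.read_index = {}   # doc id -> write timestamp (off-heap copy)
        self.snapshot = {}     # doc id -> write timestamp (recent writes)
        self.steps = []        # records the order in which steps ran

    def write(self, doc_id, ts):
        # Writes go to both the write cluster and the snapshot cluster.
        self.write_index[doc_id] = ts
        self.snapshot[doc_id] = ts

    def run(self, cutoff):
        # 1. Segment merging happens first (modeled only as a recorded step).
        self.steps.append("merge")
        # 2. Synchronize everything written up to the cutoff into the read side.
        self.steps.append("sync")
        for doc_id, ts in self.write_index.items():
            if ts <= cutoff:
                self.read_index[doc_id] = ts
        # 3. Clean synchronized (now expired) entries out of the snapshot.
        self.steps.append("clean")
        for doc_id in [d for d, ts in self.snapshot.items() if ts <= cutoff]:
            del self.snapshot[doc_id]
```

After a cycle, every document is still searchable from either the read side or the snapshot, and entries written after the cutoff remain in the snapshot until the next cycle.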
In another embodiment, as shown in fig. 5, there is provided an Apache Solr read-write separation apparatus, further comprising:
a write cluster deployment module 501, configured to deploy the Apache Solr system to the write cluster using Docker containerized deployment.
A read cluster deployment module 503, configured to deploy the Apache Solr system to the read cluster using Docker containerized deployment.
Specifically, the Apache Solr system is deployed to the write cluster and to the read cluster separately using Docker containerized deployment. Isolating the write cluster from the read cluster prevents the IO waits caused by segment merging from driving up server load and sharply degrading read performance.
A snapshot cluster deployment module 505, configured to deploy the Apache Solr system to the snapshot cluster as a separate deployment.
Specifically, the snapshot cluster is deployed independently and stores indexes by using an off-heap memory, and data writing and reading are completed in the memory.
In one embodiment, as shown in fig. 6, a data synchronization module of an Apache Solr read-write separation apparatus is provided, and the module includes:
a synchronization instruction receiving unit 601, configured to receive a synchronization instruction sent by the persistence program.
A presetting unit 603, configured to preset a synchronization time according to the synchronization instruction.
Specifically, the server receives a synchronization instruction sent by the persistence program, analyzes the synchronization instruction, and presets synchronization time according to the synchronization instruction.
A write data loading unit 605, configured to perform data synchronization according to the synchronization time, and to incrementally load the data written during that period from the write cluster into the off-heap memory of the read cluster.
Before the synchronization operation is executed according to the synchronization instruction, the persistence program sends a segment merging instruction to the write cluster so that segment merging completes before synchronization, which avoids the system resource bottleneck or system crash that segment merging during synchronization could cause.
Specifically, some files in the write cluster may already have been synchronized to the read cluster, so each file is handled as follows: if a file exists in the write cluster but not in the read cluster, it is loaded into the off-heap memory; if a file does not exist in the write cluster but exists in the read cluster, the read cluster deletes it from the off-heap memory; and if a file exists in both clusters, it is not processed.
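One way to picture the preset synchronization time used by units 603 and 605 is as a series of fixed-length windows, each covering the data written during one period. The helper names, the half-open window convention, and the ten-minute interval in the usage note are assumptions; the source does not state an interval.

```python
from datetime import datetime, timedelta


def sync_windows(start, interval, count):
    """Yield (window_start, window_end) pairs for successive synchronizations."""
    for i in range(count):
        yield start + i * interval, start + (i + 1) * interval


def writes_in_window(writes, window):
    """Return ids of writes whose timestamp falls in the half-open window [begin, end)."""
    begin, end = window
    return [doc_id for doc_id, ts in writes if begin <= ts < end]
```

With a start of 2018-06-06 00:00 and a ten-minute interval, a write at 00:03 belongs to the first window, while writes at 00:10 and 00:19 belong to the second. The half-open convention ensures that a write landing exactly on a boundary is synchronized once rather than twice.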
For the specific limitations of the Apache Solr read-write separation apparatus, reference may be made to the limitations of the Apache Solr read-write separation method described above; details are not repeated here. Each module in the Apache Solr read-write separation apparatus can be implemented wholly or partly in software, hardware, or a combination thereof. The modules can be embedded in, or independent of, a processor in the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements an Apache Solr read-write separation method.
Those skilled in the art will appreciate that the architecture shown in fig. 7 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
In one embodiment, there is provided a computer device comprising a memory storing a computer program and a processor implementing the following steps when the processor executes the computer program:
receiving a data write request of a persistence program, and writing data into a write cluster and a snapshot cluster according to the data write request;
receiving a search request of an Apache Solr client, and searching data from a read cluster and the snapshot cluster according to the search request;
receiving a segment merging instruction sent by the persistence program, and executing a segment merging operation on the data in the write cluster according to the segment merging instruction;
after the segment merging instruction has been received and the segment merging operation executed, receiving a synchronization instruction sent by the persistence program, and incrementally loading index files from the data directory of the write cluster into the off-heap memory of the read cluster according to the synchronization instruction;
and after the synchronization instruction has been received and the synchronization operation executed, receiving a data cleaning instruction sent by the persistence program, and cleaning out the synchronized expired data in the snapshot cluster according to the data cleaning instruction.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
deploying the Apache Solr system to the write cluster and to the read cluster separately using Docker containerized deployment;
and deploying the Apache Solr system to the snapshot cluster as a separate deployment.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
receiving a synchronization instruction sent by a persistence program;
presetting synchronization time according to the synchronization instruction;
and performing data synchronization according to the synchronization time, and incrementally loading the data written during that period from the write cluster into the off-heap memory of the read cluster.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
receiving a data write request of a persistence program, and writing data into a write cluster and a snapshot cluster according to the data write request;
receiving a search request of an Apache Solr client, and searching data from a read cluster and the snapshot cluster according to the search request;
receiving a segment merging instruction sent by the persistence program, and executing a segment merging operation on the data in the write cluster according to the segment merging instruction;
after the segment merging instruction has been received and the segment merging operation executed, receiving a synchronization instruction sent by the persistence program, and incrementally loading index files from the data directory of the write cluster into the off-heap memory of the read cluster according to the synchronization instruction;
and after the synchronization instruction has been received and the synchronization operation executed, receiving a data cleaning instruction sent by the persistence program, and cleaning out the synchronized expired data in the snapshot cluster according to the data cleaning instruction.
In one embodiment, the computer program when executed by the processor further performs the steps of:
deploying the Apache Solr system to the write cluster and to the read cluster separately using Docker containerized deployment;
and deploying the Apache Solr system to the snapshot cluster as a separate deployment.
In one embodiment, the computer program when executed by the processor further performs the steps of:
receiving a synchronization instruction sent by a persistence program;
presetting synchronization time according to the synchronization instruction;
and performing data synchronization according to the synchronization time, and incrementally loading the data written during that period from the write cluster into the off-heap memory of the read cluster.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not every possible combination of these technical features is described; however, any combination that involves no contradiction should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A method for realizing Apache Solr read-write separation, characterized by comprising the following steps:
receiving a data write request of a persistence program, and writing data into a write cluster and a snapshot cluster according to the data write request;
receiving a search request of an Apache Solr client, and searching data from a read cluster and the snapshot cluster according to the search request;
receiving a segment merging instruction sent by the persistence program, and executing a segment merging operation on the data in the write cluster according to the segment merging instruction;
after the segment merging instruction has been received and the segment merging operation executed, receiving a synchronization instruction sent by the persistence program, and incrementally loading index files from the data directory of the write cluster into the off-heap memory of the read cluster according to the synchronization instruction;
and after the synchronization instruction has been received and the synchronization operation executed, receiving a data cleaning instruction sent by the persistence program, and cleaning out the synchronized expired data in the snapshot cluster according to the data cleaning instruction.
2. The method for realizing Apache Solr read-write separation according to claim 1, wherein, before the receiving a data write request of a persistence program and writing data into a write cluster and a snapshot cluster according to the data write request, the method further comprises:
deploying the Apache Solr system to the write cluster and to the read cluster separately using Docker containerized deployment;
and deploying the Apache Solr system to the snapshot cluster as a separate deployment.
3. The method for realizing Apache Solr read-write separation according to claim 1 or 2, wherein the write cluster receives data writes and provides persistent storage of the data;
the read cluster stores its index in off-heap memory and provides search services externally;
the snapshot cluster stores its index in off-heap memory, receives data writes, and provides search services externally.
4. The method for realizing Apache Solr read-write separation according to claim 1 or 2, wherein the receiving a synchronization instruction sent by the persistence program and incrementally loading index files from the data directory of the write cluster into the off-heap memory of the read cluster according to the synchronization instruction comprises:
receiving the synchronization instruction sent by the persistence program;
presetting a synchronization time according to the synchronization instruction;
and performing data synchronization according to the synchronization time, and incrementally loading the data written within the synchronization time from the write cluster into the off-heap memory of the read cluster.
5. The method for realizing Apache Solr read-write separation according to claim 1 or 2, wherein the receiving a data write request of a persistence program and writing data into a write cluster and a snapshot cluster according to the data write request further comprises:
when data is written, writing the index data into both the write cluster and the snapshot cluster.
6. An Apache Solr read-write separation apparatus, the apparatus comprising:
a data writing module, configured to receive a data write request of a persistence program, and write data into a write cluster and a snapshot cluster according to the data write request;
a data searching module, configured to receive a search request of an Apache Solr client, and search data from a read cluster and the snapshot cluster according to the search request;
a segment merging module, configured to receive a segment merging instruction sent by the persistence program, and execute a segment merging operation on the data in the write cluster according to the segment merging instruction;
a data synchronization module, configured to, after the segment merging instruction has been received and the segment merging operation executed, receive a synchronization instruction sent by the persistence program, and incrementally load index files from the data directory of the write cluster into the off-heap memory of the read cluster according to the synchronization instruction;
and a data cleaning module, configured to, after the synchronization instruction has been received and the synchronization operation executed, receive a data cleaning instruction sent by the persistence program, and clean out the synchronized expired data in the snapshot cluster according to the data cleaning instruction.
7. The Apache Solr read-write separation apparatus of claim 6, further comprising:
a write cluster deployment module, configured to deploy the Apache Solr system to the write cluster using Docker containerized deployment;
a read cluster deployment module, configured to deploy the Apache Solr system to the read cluster using Docker containerized deployment;
and a snapshot cluster deployment module, configured to deploy the Apache Solr system to the snapshot cluster as a separate deployment.
8. The Apache Solr read-write separation apparatus of claim 6 or 7, wherein the data synchronization module comprises:
a synchronization instruction receiving unit, configured to receive a synchronization instruction sent by the persistence program;
a presetting unit, configured to preset a synchronization time according to the synchronization instruction;
and a write data loading unit, configured to perform data synchronization according to the synchronization time, and incrementally load the data written within the synchronization time from the write cluster into the off-heap memory of the read cluster.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 5 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 5.
CN201810573076.1A 2018-06-06 2018-06-06 Method and device for realizing Apache Solr read-write separation Active CN108763572B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810573076.1A CN108763572B (en) 2018-06-06 2018-06-06 Method and device for realizing Apache Solr read-write separation

Publications (2)

Publication Number Publication Date
CN108763572A CN108763572A (en) 2018-11-06
CN108763572B true CN108763572B (en) 2021-06-22

Family

ID=63999034

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810573076.1A Active CN108763572B (en) 2018-06-06 2018-06-06 Method and device for realizing Apache Solr read-write separation

Country Status (1)

Country Link
CN (1) CN108763572B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111211993B (en) * 2018-11-21 2023-08-11 百度在线网络技术(北京)有限公司 Incremental persistence method, device and storage medium for stream computation
CN109614390A (en) * 2018-12-06 2019-04-12 无锡华云数据技术服务有限公司 Data base read-write separation method, device, service system, equipment and medium
CN112235332B (en) * 2019-07-15 2024-01-16 北京京东尚科信息技术有限公司 Method and device for switching reading and writing of clusters
CN111966529A (en) * 2020-07-14 2020-11-20 上海浩霖汇信息科技有限公司 Method and system for real-time incremental synchronous backup of database files
CN112231148B (en) * 2020-10-23 2022-07-05 北京思特奇信息技术股份有限公司 Distributed cache data offline transmission method and device and readable storage medium
CN112307008B (en) * 2020-12-14 2023-12-08 湖南蚁坊软件股份有限公司 Druid compacting method
CN113626446B (en) * 2021-10-09 2022-09-20 阿里云计算有限公司 Data storage and search method, device, electronic equipment and medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102722449A (en) * 2012-05-24 2012-10-10 中国科学院计算技术研究所 Key-Value local storage method and system based on solid state disk (SSD)
US20130042140A1 (en) * 2011-08-12 2013-02-14 International Business Machines Corporation Technique for improving replication persistance in a caching applicance structure
CN107066527A (en) * 2017-02-24 2017-08-18 湖南蚁坊软件股份有限公司 A kind of method and system of the caching index based on out-pile internal memory
CN107357680A (en) * 2013-12-02 2017-11-17 华为技术有限公司 Data processing equipment and the method for data processing

Also Published As

Publication number Publication date
CN108763572A (en) 2018-11-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant