CN106648897A - SOLR cluster extension method and system supporting resource balancing

Info

Publication number: CN106648897A
Application number: CN201611234696.XA
Authority: CN
Inventors: 曾超; 温若辉; 赵庸; 林艺滨; 江汉祥
Original assignee: Xiamen Meiya Pico Information Co Ltd
Current assignee: Xiamen Meiya Pico Information Co Ltd
Priority date: 2016-12-28
Filing date: 2016-12-28
Publication date: 2017-05-10
Anticipated expiration: 2036-12-28
Also published as: CN106648897B

Abstract

The invention provides a SOLR cluster expansion method and system supporting resource balancing. When a new server is added, new shards are created automatically according to the number of nodes on the servers and the growth in data volume, which ensures that new data is ingested onto the corresponding servers in a balanced way. With this method and system, servers can be added to a SOLR cluster automatically and flexibly according to system performance and data volume; the system then creates shards and replica sets automatically according to server performance and balances the data volume across the corresponding SOLR Cores. No manual shard creation is needed, and local hot spots are avoided.

Description

A SOLR cluster expansion method and system supporting resource balancing

Technical field

The present invention relates to the field of computer technology, and in particular to a SOLR cluster expansion method and system supporting resource balancing.

Background

As society enters the era of big data, the storage and retrieval of massive data sets have been applied in every field. Full-text search is one of the most common functions, giving a query experience similar to that of Baidu or Taobao. SOLR is one of the most widely used enterprise-level search servers for full-text search. It is feature-rich, offers near-real-time retrieval, and supports clustering; it is an Apache project, free and open source.

SOLR's own clustering mechanism is fairly complete: the SOLRCloud component supports shards and replica sets (replication). When the volume of data grows so large that server resources become insufficient, the usual response is to add new servers to the cluster to share the load. SOLRCloud supports two sharding modes: compositeId and implicit. In compositeId mode, the shard in which a document lands is determined by a hash value computed from its ID; the number of shards must be fixed when the collection is created, so shards cannot be added later and the mode is unsuitable for dynamic horizontal scaling of the cluster. The implicit mode, by contrast, supports specifying a shard key when a shard is created; setting the value of the shard key when data is inserted determines which shard stores the data. Shards can therefore be added to new servers through the operation interface of the implicit mode, either manually or automatically by a simple system.
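As a rough illustration of the shard-adding interface mentioned above: Solr's Collections API exposes a CREATESHARD action that applies only to collections using the implicit router. The sketch below merely builds such a request URL and sends nothing. The helper name `create_shard_url`, the base URL, the collection name, the shard name, and the node names are all placeholders chosen for illustration, not values from this document.

```python
from typing import Sequence
from urllib.parse import urlencode


def create_shard_url(base_url: str, collection: str, shard: str,
                     nodes: Sequence[str]) -> str:
    """Build a Collections API CREATESHARD URL (implicit router only)."""
    params = {
        "action": "CREATESHARD",
        "collection": collection,
        "shard": shard,
        # Restrict replica placement to the listed nodes.
        "createNodeSet": ",".join(nodes),
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


# Placeholder values, for illustration only.
url = create_shard_url(
    "http://localhost:8983/solr", "docs", "shard5",
    ["host1:8983_solr", "host2:8983_solr"],
)
print(url)
```

The URL is only constructed here; issuing the request and handling errors are left to whatever HTTP client the deployment already uses.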

In the prior art, the implicit sharding mode of SOLRCloud provides an interface for adding shards dynamically. One class of methods therefore simply creates shards automatically by time period, for example monthly, with one shard per month: on the first day of each month, a shard is automatically added on a few servers holding the least data, to store the following month's data. Although this achieves automatic sharding, it produces local hot spots. Moreover, the resources of the servers in a cluster may differ. In a cluster that mixes old and new servers, for example, disk speed, disk space, and memory size may all differ between machines, in which case shards cannot simply be allocated evenly.

Summary of the invention

To this end, the present invention proposes a SOLR cluster expansion method and system supporting resource balancing. When a new server joins, new shards are created automatically according to the number of nodes on the servers and the growth in data volume, ensuring that new data is ingested onto the corresponding servers in a balanced way. The invention first installs a corresponding number of SOLR nodes according to each server's hardware resources. The system then uses a thread to monitor the number of documents in the cluster and creates shards dynamically as needed. The ingestion thread, following the sharding parameters adjusted by the monitoring thread, inserts newly ingested data into the new shards in a balanced way.

The specific scheme is as follows:

A SOLR cluster expansion method supporting resource balancing, comprising the steps of:

S10, installing SOLR nodes according to the hardware resources of the server;

S20, setting the parameters of the SOLR cluster;

S30, dynamically creating shards according to the parameters of the SOLR cluster, the current number of documents in the SOLR cluster, and the current state values of the SOLR cluster, and updating the state values of the SOLR cluster;

S40, inserting new documents to be written into the corresponding shards according to the state values of the SOLR cluster updated in step S30, and updating the number of documents;

executing steps S10 to S40 in a loop.

Further, step S10 specifically comprises: obtaining the maximum number of SOLR nodes that the server's CPU, memory, disk space, and network bandwidth can each support, taking the minimum of these values as the number of SOLR nodes the server can support, and installing that number of SOLR nodes.

Further, the parameters of the SOLR cluster include: name, the name of the collection; configName, the configuration name of the collection; serverNodes, the nodes on which the collection is allowed to create Cores; and replicationFactor, the replication factor. The state values (dynamic values) of the SOLR cluster include: liveShardMinIndex, the minimum index in the list of shards that store ingested data; liveShardMaxIndex, the maximum index in the list of shards that store ingested data; liveCoreNumPerNode, the average number of Cores allocated to each node; nextShardedDocNum, the number of documents at which the next sharding takes place; and sumCoreNum, the number of Cores on all nodes.

Further, step S30 specifically comprises:

S300, creating a first thread, which monitors the number of documents in the SOLR cluster and performs steps S301 and S303 to S311 to create shards dynamically;

S301, reading the total number of documents of the collection in the SOLR cluster and comparing it with nextShardedDocNum; if it is greater than or equal to nextShardedDocNum, a new shard is needed and the flow proceeds to step S302; if it is smaller, the flow proceeds to step S313;

S302, acquiring a mutex for the first thread;

S303, calculating the number of Cores planned to be added in this round according to the formula addCoreNum = liveCoreNumPerNode * Nodes - sumCoreNum;

S304, judging whether addCoreNum is smaller than the replication factor replicationFactor; if so, proceeding to step S305, otherwise proceeding to step S307;

S305, increasing the value of liveCoreNumPerNode by 1;

S306, recalculating the number of Cores planned to be added in this round according to the formula addCoreNum = liveCoreNumPerNode * Nodes - sumCoreNum;

S307, calculating the number of shards planned to be added according to the formula addShard = addCoreNum / replicationFactor, rounded to an integer;

S308, reading the state values of the SOLR cluster to obtain the number of SOLR Cores installed on each node, and comparing it with liveCoreNumPerNode; the difference is the maximum number of Cores that may still be installed on that node;

S309, according to the number of Cores that may be installed on each node, creating one shard on replicationFactor Cores, the replicationFactor Cores being distributed over different nodes; the shard is named shardX, where X is an incrementing, non-repeating integer; for the i-th shard added in the current round, X is liveCoreNumPerNode+i;

S310, judging whether addShard shards have been created; if not, returning to step S309 to continue creating shards; if so, proceeding to step S311;

S311, updating the state values of the SOLR cluster and persisting the update to disk, the state values being updated as follows:

liveShardMinIndex = liveShardMaxIndex + 1;

liveShardMaxIndex = liveShardMaxIndex + addShard;

sumCoreNum = sumCoreNum + addCoreNum;

nextShardedDocNum = sumCoreNum * docNumPerCore;

S312, releasing the mutex;

S313, the first thread sleeps for a certain time and, after waking, returns to step S301 to check again.
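As a worked illustration of steps S303 to S311, assume (hypothetically; none of these values come from the source) Nodes = 4, liveCoreNumPerNode = 2, sumCoreNum = 8, replicationFactor = 2, liveShardMaxIndex = 4, and docNumPerCore = 10^6. Nodes is read here as the total number of SOLR nodes in the cluster, which the text does not define explicitly.

```latex
\[
\begin{aligned}
\text{S303:}\quad & \mathit{addCoreNum} = 2 \times 4 - 8 = 0 \\
\text{S304:}\quad & 0 < \mathit{replicationFactor} = 2 \;\Rightarrow\; \text{go to S305} \\
\text{S305:}\quad & \mathit{liveCoreNumPerNode} = 2 + 1 = 3 \\
\text{S306:}\quad & \mathit{addCoreNum} = 3 \times 4 - 8 = 4 \\
\text{S307:}\quad & \mathit{addShard} = 4 / 2 = 2 \\
\text{S311:}\quad & \mathit{liveShardMinIndex} = 4 + 1 = 5 \\
                  & \mathit{liveShardMaxIndex} = 4 + 2 = 6 \\
                  & \mathit{sumCoreNum} = 8 + 4 = 12 \\
                  & \mathit{nextShardedDocNum} = 12 \times 10^{6} = 1.2 \times 10^{7}
\end{aligned}
\]
```

Under these assumed values every node is already full, so the per-node allowance is raised by one Core, two new shards are created, and the next round of sharding is triggered once the collection reaches 12 million documents.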

Further, step S40 specifically comprises:

S401, creating a second thread and acquiring a mutex for the second thread, the second thread performing the following steps S402 to S404 to insert documents into the corresponding shards, the mutex being the same mutex as that used by the first thread;

S402, reading the parameters liveShardMinIndex and liveShardMaxIndex of the SOLR cluster;

S403, randomly selecting a shard from shardI to shardJ in the shard list (where I is the value of liveShardMinIndex and J is the value of liveShardMaxIndex) as the ingestion target shard for the new document data, and setting the shard key field to the name of the selected shard;

S404, submitting the document, so that new document data is inserted into the corresponding shards in a balanced way.

A SOLR cluster expansion system supporting resource balancing, comprising:

a node installation module, configured to install SOLR nodes according to the hardware resources of the server;

a setup module, configured to set the parameters of the SOLR cluster;

a shard creation module, configured to dynamically create shards according to the parameters of the SOLR cluster, the current number of documents in the SOLR cluster, and the current state values of the SOLR cluster, and to update the state values of the SOLR cluster;

a data insertion module, configured to insert new documents to be written into the corresponding shards according to the updated state values of the SOLR cluster, and to update the number of documents;

a loop module, configured to cycle through the node installation module, the setup module, the shard creation module, and the data insertion module.

Further, the node installation module is further configured to: obtain the maximum number of SOLR nodes that the server's CPU, memory, disk space, and network bandwidth can each support, take the minimum of these values as the number of SOLR nodes the server can support, and install that number of SOLR nodes.

Further, the parameters of the SOLR cluster in the setup module include: name, the name of the collection; configName, the configuration name of the collection; serverNodes, the nodes on which the collection is allowed to create Cores; and replicationFactor, the replication factor. The state values (dynamic values) of the SOLR cluster include: liveShardMinIndex, the minimum index in the list of shards that store ingested data; liveShardMaxIndex, the maximum index in the list of shards that store ingested data; liveCoreNumPerNode, the average number of Cores allocated to each node; nextShardedDocNum, the number of documents at which the next sharding takes place; and sumCoreNum, the number of Cores on all nodes.

Further, the shard creation module is further configured to perform the following steps:

S302, acquiring a mutex for the first thread;

S305, increasing the value of liveCoreNumPerNode by 1;

liveShardMinIndex = liveShardMaxIndex + 1;

liveShardMaxIndex = liveShardMaxIndex + addShard;

sumCoreNum = sumCoreNum + addCoreNum;

nextShardedDocNum = sumCoreNum * docNumPerCore;

S312, releasing the mutex;

Further, the data insertion module is further configured to perform the following steps:

Beneficial effects of the present invention: 1) Data volumes can be stored in proportion to each server's performance so that load is balanced across the SOLR cluster, and shards are created automatically according to a target data volume to extend the cluster. This prevents any single shard from growing too large and avoids local hot spots when new data is inserted. It also accommodates new servers joining the cluster: they are recognized automatically, and the number of SOLR Cores and the number of documents are load-balanced across them;

2) A dynamic model is proposed in which a thread monitors the growth of document data in SOLR and, under certain conditions, dynamically creates new Cores according to the number of nodes on the servers, so that subsequently inserted data is distributed to the new Cores, achieving dynamic sharding and load balancing of the SOLR cluster.

Description of the drawings

Fig. 1 is a flow chart of dynamic shard creation according to one embodiment of the invention;

Fig. 2 is a flow chart of inserting documents into the corresponding shards according to one embodiment of the invention.

Detailed description

To further illustrate the embodiments, the present invention is provided with accompanying drawings. These drawings form part of the disclosure of the invention; they mainly illustrate the embodiments and, together with the relevant description, explain the operating principles of the embodiments. With reference to these contents, those of ordinary skill in the art will understand other possible embodiments and the advantages of the present invention. The invention is now further described with reference to the drawings and specific embodiments.

The SOLR cluster expansion method supporting resource balancing according to one embodiment of the invention specifically comprises the following steps:

1. Install SOLR nodes according to server hardware resources:

In a SOLR cluster, data is physically stored on nodes. To make full use of the resources of each server, the maximum number of nodes that can be installed must first be estimated. For a given server, the maximum number of supported nodes depends mainly on the performance of hardware such as CPU, memory, disk space, and network bandwidth. For each of these dependencies, an estimation function is first provided, based on experience, as the calculation formula. Taking memory as an example, first obtain the total memory size Sall of the server, then subtract the memory Sother required by the operating system and other non-SOLR applications, which gives the maximum memory that can be allocated to SOLR nodes, SSOLR = Sall - Sother. Dividing by the memory Snode required by each node gives the maximum number of SOLR nodes that memory allows. The estimation function fmem for the number of nodes supported by memory is calculated as follows:

fmem = (Sall - Sother) / Snode

Similarly, according to the specific system and business application, estimation functions fcpu, fdisk, and fnet are provided for the number of SOLR nodes that the CPU, disk, and network can support. The number of SOLR nodes the server can finally support is the minimum, fserver = Min(fdisk, fcpu, fmem, fnet). SOLR nodes are installed according to this estimate and added to SOLRCloud; by default, no shards are created at first.
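The estimate above can be sketched in a few lines. This is only an illustration: the function names, the gigabyte units, the sample figures, and the choice to round down to whole nodes are assumptions, and the CPU, disk, and network estimates are passed in as ready-made numbers because the text leaves their formulas to the concrete system.

```python
import math


def max_nodes_by_memory(s_all_gb: float, s_other_gb: float,
                        s_node_gb: float) -> int:
    """fmem = (Sall - Sother) / Snode, rounded down (rounding is an assumption)."""
    return max(0, math.floor((s_all_gb - s_other_gb) / s_node_gb))


def max_nodes_for_server(f_disk: int, f_cpu: int, f_mem: int, f_net: int) -> int:
    """fserver = Min(fdisk, fcpu, fmem, fnet)."""
    return min(f_disk, f_cpu, f_mem, f_net)


# Hypothetical server: 64 GB RAM, 8 GB reserved for the OS, 16 GB per node.
f_mem = max_nodes_by_memory(64, 8, 16)            # (64 - 8) / 16 = 3.5 -> 3
f_server = max_nodes_for_server(f_disk=6, f_cpu=4, f_mem=f_mem, f_net=5)
print(f_mem, f_server)                            # prints "3 3"
```

In this made-up case memory is the binding resource, so three SOLR nodes would be installed even though disk, CPU, and network could each carry more.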

2. Set the basic parameters of the dynamic cluster:

In a dynamic SOLR cluster, some basic parameters must first be set: for which collection (name), on which nodes (serverNodes), and with which configuration (configName) shards are created, and how many nodes each shard is created on (replicationFactor). Next, some state values of the dynamic cluster are recorded: the index range of the current shards (liveShardMinIndex to liveShardMaxIndex), the current average number of Cores allocated to each node, the number of documents at which the next shard will be added, and the current number of Cores on all relevant nodes. These parameters and state values are persisted to disk in a form such as a database or XML.
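One possible way to persist the state values as XML is sketched below using only the Python standard library. The field names follow the state values listed in the text; the class name `ClusterState`, the root element name `clusterState`, and the one-element-per-value layout are assumptions made for illustration, since the source does not specify a storage format beyond "database or XML".

```python
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass, fields


@dataclass
class ClusterState:
    liveShardMinIndex: int
    liveShardMaxIndex: int
    liveCoreNumPerNode: int
    nextShardedDocNum: int
    sumCoreNum: int


def state_to_xml(state: ClusterState) -> str:
    """Serialize each state value as a child element (layout is hypothetical)."""
    root = ET.Element("clusterState")
    for name, value in asdict(state).items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def state_from_xml(text: str) -> ClusterState:
    """Rebuild the state from the XML produced by state_to_xml."""
    root = ET.fromstring(text)
    values = {f.name: int(root.findtext(f.name)) for f in fields(ClusterState)}
    return ClusterState(**values)


# Illustrative values only.
saved = ClusterState(1, 4, 2, 8_000_000, 8)
restored = state_from_xml(state_to_xml(saved))
print(restored == saved)
```

Writing the resulting string to a file, or to whatever store the deployment prefers, is left out of the sketch.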

3. Create shards dynamically according to the parameters and adjust the ingestion strategy, to support a dynamic cluster:

First create a thread that periodically checks the number of documents and creates shards dynamically according to the number of documents and the system parameters. As shown in Fig. 1, the main processing flow is as follows:

Step 1: Read the total number of documents of the collection from SOLRCloud and compare it with nextShardedDocNum. If it is greater than or equal to nextShardedDocNum, a new shard is needed; go to Step 2. If it is smaller, go to Step 13;

Step 2: To prevent the ingestion thread from reading incomplete data, acquire the mutex;

Step 3: Calculate the number of Cores planned to be added in this round according to the formula addCoreNum = liveCoreNumPerNode * Nodes - sumCoreNum;

Step 4: Judge whether addCoreNum is smaller than the replication factor replicationFactor. If so, go to Step 5; otherwise go to Step 7;

Step 5: Increase the value of liveCoreNumPerNode by 1;

Step 6: Recalculate the number of Cores planned to be added in this round according to the formula addCoreNum = liveCoreNumPerNode * Nodes - sumCoreNum;

Step 7: Calculate the number of shards planned to be added according to the formula addShard = addCoreNum / replicationFactor, rounded to an integer;

Step 8: Read the cluster state from ZooKeeper to obtain the number of SOLR Cores installed on each node, and compare it with liveCoreNumPerNode; the difference is the maximum number of Cores that may still be installed on that node;

Step 9: According to the number of Cores that may be installed on each node, create one shard on replicationFactor Cores. To provide disaster tolerance, these replicationFactor Cores are distributed over different nodes as far as possible. The shard is named shardX, where X is an incrementing, non-repeating integer; for the i-th shard added in the current round, X is liveCoreNumPerNode+i;

Step 10: Judge whether addShard shards have been created. If not, return to Step 9 to continue creating shards; if so, go to Step 11;

Step 11: Update the state values of dynamic sharding and persist the update to disk:

liveShardMinIndex = liveShardMaxIndex + 1;

liveShardMaxIndex = liveShardMaxIndex + addShard;

sumCoreNum = sumCoreNum + addCoreNum;

nextShardedDocNum = sumCoreNum * docNumPerCore;

Step 12: Release the mutex so that ingestion can continue;

Step 13: The thread sleeps for a certain time and, after waking, returns to Step 1 to check again;
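The arithmetic of the third to seventh steps and the state update of the eleventh step can be sketched as a single function. This is a minimal sketch under stated assumptions: Nodes is taken to be the total number of SOLR nodes, "rounded to an integer" is implemented as floor division, and the class name `State`, the function name `plan_and_update`, and the sample values are hypothetical. Locking, shard creation, and persistence are omitted.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class State:
    liveShardMinIndex: int
    liveShardMaxIndex: int
    liveCoreNumPerNode: int
    nextShardedDocNum: int
    sumCoreNum: int


def plan_and_update(state: State, nodes: int, replication_factor: int,
                    doc_num_per_core: int) -> Tuple[int, int]:
    """Return (addCoreNum, addShard) and apply the state update in place."""
    # Third step: Cores planned for this round.
    add_core_num = state.liveCoreNumPerNode * nodes - state.sumCoreNum
    # Fourth to sixth steps: too few free slots, so raise the per-node
    # allowance by one and recompute.
    if add_core_num < replication_factor:
        state.liveCoreNumPerNode += 1
        add_core_num = state.liveCoreNumPerNode * nodes - state.sumCoreNum
    # Seventh step: shards planned (floor division is an assumption).
    add_shard = add_core_num // replication_factor
    # Eleventh step: state update, following the formulas in the text literally.
    state.liveShardMinIndex = state.liveShardMaxIndex + 1
    state.liveShardMaxIndex = state.liveShardMaxIndex + add_shard
    state.sumCoreNum = state.sumCoreNum + add_core_num
    state.nextShardedDocNum = state.sumCoreNum * doc_num_per_core
    return add_core_num, add_shard


# Hypothetical round: two new nodes have joined a four-node cluster.
state = State(1, 4, 2, 8_000_000, 8)
plan = plan_and_update(state, nodes=6, replication_factor=2,
                       doc_num_per_core=1_000_000)
print(plan, state)
```

With these sample values the new nodes already provide four free Core slots, so the per-node allowance stays at 2 and two shards (indexes 5 and 6) are planned.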

Note: when a new server joins the cluster, SOLR nodes are first installed on it manually and configured into the cluster, and the serverNodes parameter value is modified accordingly. To keep the data distribution balanced, Cores are not created on the new server immediately after it joins; creation is triggered only in the next round in which shards need to be created. Because the nodes on the new server do not yet hold any Cores, most of the Cores created in that round are placed on the new server, so that Cores end up evenly distributed over the corresponding nodes. If a new server has joined the cluster, the check in the fourth step jumps directly to the seventh step; otherwise the flow proceeds in order from the fourth step to the seventh step.
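The placement behaviour described in this note can be sketched as follows. The greedy policy (always pick the nodes with the most remaining Core slots, ties broken by node name) is an assumption chosen to reproduce the described effect, not a policy stated in the source. The function name `place_shards`, the node names, and the `first_index` parameter, through which the caller supplies the starting value of X in shardX, are likewise hypothetical.

```python
from typing import Dict, List


def place_shards(cores_per_node: Dict[str, int], live_core_num_per_node: int,
                 add_shard: int, replication_factor: int,
                 first_index: int) -> Dict[str, List[str]]:
    """Assign each new shard to replication_factor distinct nodes."""
    # Remaining slots per node: allowance minus Cores already installed.
    allowed = {node: live_core_num_per_node - count
               for node, count in cores_per_node.items()}
    placement: Dict[str, List[str]] = {}
    for i in range(add_shard):
        # Nodes with free slots, most free slots first (assumed greedy policy).
        candidates = sorted((n for n, free in allowed.items() if free > 0),
                            key=lambda n: (-allowed[n], n))
        chosen = candidates[:replication_factor]
        if len(chosen) < replication_factor:
            raise RuntimeError("not enough nodes with free Core slots")
        for node in chosen:
            allowed[node] -= 1
        placement[f"shard{first_index + i}"] = chosen
    return placement


# Four full old nodes plus two empty nodes on a newly added server.
cores = {"old1": 2, "old2": 2, "old3": 2, "old4": 2, "new1": 0, "new2": 0}
placement = place_shards(cores, live_core_num_per_node=2, add_shard=2,
                         replication_factor=2, first_index=5)
print(placement)
```

In this example every new Core lands on the empty nodes, which matches the note's observation that most Cores created in the round after a server joins are placed on the new server.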

4. The data parsing and ingestion thread inserts new data into the corresponding Cores in a balanced way according to the state values of dynamic sharding. As shown in Fig. 2, the main processing flow is as follows:

Step 1: To avoid reading incomplete data, first acquire the mutex, which is the same mutex used by the dynamic sharding thread;

Step 2: Read the dynamic sharding parameters liveShardMinIndex and liveShardMaxIndex;

Step 3: Randomly select a shard from shardI to shardJ in the shard list (where I is the value of liveShardMinIndex and J is the value of liveShardMaxIndex) as the ingestion target shard for the new data, and set the shard key field to the name of the selected shard;

Step 4: Submit the document. New data is thus inserted into the corresponding shards in a balanced way, and local hot spots are eliminated.
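The target selection in this flow can be sketched with a shared lock and a uniform random choice. The in-memory `state` dictionary, the function names, and the field name `shard_key` are hypothetical stand-ins; in a real deployment the state would be the persisted values maintained by the shard-creation thread, and the final submission of the document to SOLR is omitted here.

```python
import random
import threading
from typing import Dict

# The same lock object would be shared with the shard-creation thread.
state_lock = threading.Lock()
# Illustrative state values only.
state = {"liveShardMinIndex": 5, "liveShardMaxIndex": 6}


def choose_target_shard() -> str:
    """Pick a shard name uniformly from shardI..shardJ (both inclusive)."""
    with state_lock:                      # avoid reading a half-updated range
        low = state["liveShardMinIndex"]
        high = state["liveShardMaxIndex"]
    return f"shard{random.randint(low, high)}"


def route_document(doc: Dict[str, str],
                   shard_key_field: str = "shard_key") -> Dict[str, str]:
    """Return a copy of doc with the shard key field set to the chosen shard."""
    routed = dict(doc)
    routed[shard_key_field] = choose_target_shard()
    return routed


routed = route_document({"id": "doc-1", "text": "hello"})
print(routed["shard_key"])
```

Because the index range is read under the lock and the lock is released before the random choice, the ingestion thread holds the lock only briefly; whether the actual submit should also happen under the lock is a design choice the text leaves open.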

In another embodiment, the present invention proposes a SOLR cluster expansion system supporting resource balancing, comprising:

a setup module, configured to set the parameters of the SOLR cluster;

S302, acquiring a mutex for the first thread;

S305, increasing the value of liveCoreNumPerNode by 1;

liveShardMinIndex = liveShardMaxIndex + 1;

liveShardMaxIndex = liveShardMaxIndex + addShard;

sumCoreNum = sumCoreNum + addCoreNum;

nextShardedDocNum = sumCoreNum * docNumPerCore;

S312, releasing the mutex;

With the above method and system, servers can be added to a SOLR cluster automatically and flexibly according to system performance and data volume; the system then creates shards and replica sets automatically according to server performance and balances the data volume across the corresponding SOLR Cores, without manual shard creation and without causing local hot spots.

Although the present invention has been specifically shown and described with reference to preferred embodiments, those skilled in the art should understand that various changes in form and detail may be made to the invention without departing from the spirit and scope of the invention as defined by the appended claims, and that such changes fall within the protection scope of the invention.

Claims

1. A SOLR cluster expansion method supporting resource balancing, characterized by comprising the steps of:

S10, installing SOLR nodes according to the hardware resources of the server;

S20, setting the parameters of the SOLR cluster;

S30, dynamically creating shards according to the parameters of the SOLR cluster, the current number of documents in the SOLR cluster, and the current state values of the SOLR cluster, and updating the state values of the SOLR cluster;

S40, inserting new documents to be written into the corresponding shards according to the state values of the SOLR cluster updated in step S30, and updating the number of documents;

executing steps S10 to S40 in a loop.

2. The SOLR cluster expansion method supporting resource balancing according to claim 1, characterized in that step S10 specifically comprises: obtaining the maximum number of SOLR nodes that the server's CPU, memory, disk space, and network bandwidth can each support, taking the minimum of these values as the number of SOLR nodes the server can support, and installing that number of SOLR nodes.

3. The SOLR cluster expansion method supporting resource balancing according to claim 1, characterized in that the parameters of the SOLR cluster include: name, the name of the collection; configName, the configuration name of the collection; serverNodes, the nodes on which the collection is allowed to create Cores; and replicationFactor, the replication factor; and the state values of the SOLR cluster include: liveShardMinIndex, the minimum index in the list of shards that store ingested data; liveShardMaxIndex, the maximum index in the list of shards that store ingested data; liveCoreNumPerNode, the average number of Cores allocated to each node; nextShardedDocNum, the number of documents at which the next sharding takes place; and sumCoreNum, the number of Cores on all nodes.

4. The SOLR cluster expansion method supporting resource balancing according to claim 3, characterized in that step S30 specifically comprises:

S300, creating a first thread, which monitors the number of documents in the SOLR cluster and performs steps S301 and S303 to S311 to create shards dynamically;

S302, acquiring a mutex for the first thread;

S303, calculating the number of Cores planned to be added in this round according to the formula addCoreNum = liveCoreNumPerNode * Nodes - sumCoreNum;

S305, increasing the value of liveCoreNumPerNode by 1;

S306, recalculating the number of Cores planned to be added in this round according to the formula addCoreNum = liveCoreNumPerNode * Nodes - sumCoreNum;

S307, calculating the number of shards planned to be added according to the formula addShard = addCoreNum / replicationFactor, rounded to an integer;

S309, according to the number of Cores that may be installed on each node, creating one shard on replicationFactor Cores, the replicationFactor Cores being distributed over different nodes; the shard is named shardX, where X is an incrementing, non-repeating integer; for the i-th shard added in the current round, X is liveCoreNumPerNode+i;

S310, judging whether addShard shards have been created; if not, returning to step S309 to continue creating shards; if so, proceeding to step S311;

liveShardMinIndex = liveShardMaxIndex + 1;

liveShardMaxIndex = liveShardMaxIndex + addShard;

sumCoreNum = sumCoreNum + addCoreNum;

nextShardedDocNum = sumCoreNum * docNumPerCore;

S312, releasing the mutex;

5. The SOLR cluster expansion method supporting resource balancing according to claim 4, characterized in that step S40 specifically comprises:

S401, creating a second thread and acquiring a mutex for the second thread, the second thread performing the following steps S402 to S404 to insert documents into the corresponding shards, the mutex being the same mutex as that used by the first thread;

6. A SOLR cluster expansion system supporting resource balancing, characterized by comprising:

a setup module, configured to set the parameters of the SOLR cluster;

a shard creation module, configured to dynamically create shards according to the parameters of the SOLR cluster, the current number of documents in the SOLR cluster, and the current state values of the SOLR cluster, and to update the state values of the SOLR cluster;

a data insertion module, configured to insert new documents to be written into the corresponding shards according to the updated state values of the SOLR cluster, and to update the number of documents;

a loop module, configured to cycle through the node installation module, the setup module, the shard creation module, and the data insertion module.

7. The SOLR cluster expansion system supporting resource balancing according to claim 6, characterized in that the node installation module is further configured to: obtain the maximum number of SOLR nodes that the server's CPU, memory, disk space, and network bandwidth can each support, take the minimum of these values as the number of SOLR nodes the server can support, and install that number of SOLR nodes.

8. The SOLR cluster expansion system supporting resource balancing according to claim 6, characterized in that the parameters of the SOLR cluster in the setup module include: name, the name of the collection; configName, the configuration name of the collection; serverNodes, the nodes on which the collection is allowed to create Cores; and replicationFactor, the replication factor; and the state values of the SOLR cluster include: liveShardMinIndex, the minimum index in the list of shards that store ingested data; liveShardMaxIndex, the maximum index in the list of shards that store ingested data; liveCoreNumPerNode, the average number of Cores allocated to each node; nextShardedDocNum, the number of documents at which the next sharding takes place; and sumCoreNum, the number of Cores on all nodes.

9. The SOLR cluster expansion system supporting resource balancing according to claim 8, characterized in that the shard creation module is further configured to perform the following steps:

S302, acquiring a mutex for the first thread;

S305, increasing the value of liveCoreNumPerNode by 1;

liveShardMinIndex = liveShardMaxIndex + 1;

liveShardMaxIndex = liveShardMaxIndex + addShard;

sumCoreNum = sumCoreNum + addCoreNum;

nextShardedDocNum = sumCoreNum * docNumPerCore;

S312, releasing the mutex;

10. The SOLR cluster expansion system supporting resource balancing according to claim 9, characterized in that the data insertion module is further configured to perform the following steps: