Summary of the invention
The invention provides a practical data distribution method and system. It addresses a common problem in large-scale application systems: the data produced by data producers must be distributed to every data consumer in near real time and in a balanced way.
A data distribution system comprises a data balancer, a data distributor deployed on each data producer, and a data sink deployed on each data consumer. The data balancer sends liveness confirmations to the data producers and data consumers to confirm their current state, and periodically sends each confirmed data producer the information of the data consumers it should distribute to next. After receiving this data consumer information from the data balancer, the data distributor establishes a new data transmission channel and transmits data over it. The data sink establishes the new data transmission channel with the data distributor and receives the data.
The data balancer, data distributor, and data sink can run on general-purpose PC servers with more than 2 GB of memory. Each PC server needs a Linux operating system, CentOS 5.5 or later.
The data balancer is the core of the data distribution system and is responsible for planning and managing data distribution. Its key functions are as follows.
Data producer information and data consumer information are stored on the data balancer as two files: produce.txt, which mainly contains the IP address and sending port number of each data producer, and consume.txt, which mainly contains the IP address and receiving port number of each data consumer. When the data balancer starts, it loads both files into memory and sends a liveness confirmation to every data producer and data consumer server node. The confirmation also carries configuration parameters, including the heartbeat timing parameter, the data fragmentation timing parameter, the disk free space parameter, and the NUMS_OF_CONSUME parameter. Data producers and data consumers that respond are saved in the current alive list for producers or for consumers, respectively; those that do not respond are saved in the corresponding current unreachable list. The data balancer periodically resends liveness confirmations to the nodes in the unreachable lists, so that a server that later becomes reachable is discovered promptly and added to the corresponding current alive list.
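The startup procedure above (load the two node files, probe every node, sort responders and non-responders, then periodically re-probe the unreachable ones) can be sketched as follows. This is a minimal sketch under stated assumptions: the one-entry-per-line "IP PORT" file format, the `probe` callback standing in for the real liveness-confirmation transport, and all names (`LivenessParams`, `NodeLists`, `parse_node_file`, `initial_probe`, `retry_unreachable`) are hypothetical. The parameter defaults are the values used in Embodiment 1.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple

Node = Tuple[str, int]  # (IP address, port) as listed in produce.txt / consume.txt

@dataclass
class LivenessParams:
    """Parameters carried in the liveness confirmation (defaults from Embodiment 1)."""
    seconds_of_heartbeat: int = 10   # SECONDS_OF_HEARTBEAT
    seconds_of_sending: int = 300    # SECONDS_OF_SENDING (data fragmentation timing)
    min_free_disk_gb: int = 30       # disk free space parameter
    nums_of_consume: int = 1         # NUMS_OF_CONSUME

@dataclass
class NodeLists:
    alive: List[Node] = field(default_factory=list)
    unreachable: List[Node] = field(default_factory=list)
    exception: List[Node] = field(default_factory=list)

# Hypothetical transport: returns True if the node acknowledged the liveness confirmation.
Probe = Callable[[Node, LivenessParams], bool]

def parse_node_file(lines: Iterable[str]) -> List[Node]:
    """Parse lines of the assumed form "IP PORT"; blank lines and '#' comments are skipped."""
    nodes = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if line:
            ip, port = line.split()
            nodes.append((ip, int(port)))
    return nodes

def initial_probe(nodes: List[Node], params: LivenessParams, probe: Probe) -> NodeLists:
    """Startup: responders go to the alive list, the rest to the unreachable list."""
    lists = NodeLists()
    for node in nodes:
        (lists.alive if probe(node, params) else lists.unreachable).append(node)
    return lists

def retry_unreachable(lists: NodeLists, params: LivenessParams, probe: Probe) -> None:
    """Periodic re-probe: unreachable nodes that now respond move to the alive list."""
    still_down = []
    for node in lists.unreachable:
        if probe(node, params):
            lists.alive.append(node)
        else:
            still_down.append(node)
    lists.unreachable = still_down
```

The balancer would keep one `NodeLists` for producers and one for consumers, and call `retry_unreachable` on a timer.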
In addition, the data balancer maintains a current exception list for data producers and another for data consumers. The exceptions mainly include: the specified storage space has reached its upper limit, the file system is read-only, or the server node is currently in non-cluster mode. These two exception lists do not include unreachable servers that failed the liveness confirmation check, because those servers must instead be sent liveness confirmations periodically.
While running, the data balancer monitors produce.txt and consume.txt in real time. Once it detects that a file has been modified, it rereads the file and reconciles it against the three kinds of lists (current alive, current unreachable, and current exception). Entries deleted from the file are removed from these lists. Newly added entries are sent a liveness confirmation; each is saved in the corresponding current alive list if it responds, or in the corresponding current unreachable list if it does not.
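The reconciliation step can be sketched as a set difference between the reread file and the nodes already known. This is a minimal sketch: the function name `reconcile`, the dictionary keyed by list name, and the `probe` callback (standing in for sending a liveness confirmation) are hypothetical, and detecting the file modification itself is left out.

```python
from typing import Callable, Dict, List, Tuple

Node = Tuple[str, int]  # (IP address, port)

def reconcile(new_nodes: List[Node],
              lists: Dict[str, List[Node]],
              probe: Callable[[Node], bool]) -> None:
    """Apply a modified produce.txt or consume.txt to the three lists in place.

    `lists` has the keys "alive", "unreachable" and "exception".
    """
    wanted = set(new_nodes)
    known = set()
    # Entries deleted from the file are dropped from every list.
    for name in ("alive", "unreachable", "exception"):
        lists[name] = [n for n in lists[name] if n in wanted]
        known.update(lists[name])
    # Only entries that are new in the file are probed.
    for node in new_nodes:
        if node not in known:
            lists["alive" if probe(node) else "unreachable"].append(node)
            known.add(node)
```

Nodes that were already known keep their current list, so a node in the exception list is not re-probed merely because the file changed.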
After receiving the liveness confirmation, the data distributor and data sink periodically send heartbeat messages to the data balancer, which uses them to monitor the data producers and data consumers at runtime. If no heartbeat message arrives within the heartbeat interval, the server is considered unreachable and is moved to the current unreachable list. If a heartbeat message carries exception information (for example, disk free space below the configured threshold, a read-only disk, or the server currently in non-cluster mode), the server is moved to the current exception list. If the heartbeat message reports no exception, the server stays in the current alive list.
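The three-way decision above can be written as a small pure function evaluated for each node. This is a sketch under assumptions: the `Heartbeat` fields mirror the status items the text lists, `received_at` is the time the balancer received the last heartbeat, and the names are hypothetical. A deployment might allow a grace multiple of the interval before declaring a node unreachable; the text does not say.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Heartbeat:
    """Hypothetical heartbeat payload as seen by the data balancer."""
    received_at: float     # when the balancer received it, in seconds
    free_disk_gb: float    # free space on the specified storage path
    read_only: bool        # file system is read-only
    non_cluster: bool      # node is currently in non-cluster (manual) mode

def classify(last: Optional[Heartbeat], now: float,
             seconds_of_heartbeat: float, min_free_disk_gb: float) -> str:
    """Return the list a node belongs in: "unreachable", "exception" or "alive"."""
    # No heartbeat within the heartbeat interval: treat the server as unreachable.
    if last is None or now - last.received_at > seconds_of_heartbeat:
        return "unreachable"
    # Any reported abnormal condition: move it to the exception list.
    if last.free_disk_gb < min_free_disk_gb or last.read_only or last.non_cluster:
        return "exception"
    return "alive"
```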
At a fixed interval set by the data fragmentation timing parameter SECONDS_OF_SENDING (in seconds), the data balancer sends new data consumer information to every data producer in the current alive list, and each data producer distributes data according to that information. The data balancer manages and maintains the data consumer resources assigned to each data producer through a rolling round-robin distribution mechanism controlled by the NUMS_OF_CONSUME parameter. This parameter sets how many data consumers a single data producer distributes to at the same time during the rolling cycle: if NUMS_OF_CONSUME is 1, a data producer corresponds to only one data consumer at any moment; if it is n, a data producer corresponds to n data consumers at the same moment.
The rolling round-robin mechanism works as follows. Once the data balancer has confirmed the current alive lists, and both the data producer alive list and the data consumer alive list are non-empty, it generates a circular list that maps data producers to data consumers. Each time the data distribution timer fires, the data balancer advances the circular list and sends the updated data consumer information to the data producers. When the circular list passes the tail of the data consumer alive list, it wraps around to the head.
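The text does not give an explicit rule for advancing the circular list. The sketch below uses one rule, chosen because it reproduces the assignments listed in Embodiments 1 and 2: in time slice r (0-based), the producer at position i (0-based) in the producer alive list gets NUMS_OF_CONSUME consecutive consumers starting at position (r + i) * NUMS_OF_CONSUME of the consumer alive list, wrapping past the tail. The function name and signature are hypothetical.

```python
def rolling_assignment(producers, consumers, nums_of_consume, round_no):
    """Consumers assigned to each producer in time slice `round_no` (0-based)."""
    m = len(consumers)
    if not producers or m == 0:
        return {}  # both alive lists must be non-empty before a plan is generated
    k = min(nums_of_consume, m)  # a producer cannot use more consumers than exist
    plan = {}
    for i, producer in enumerate(producers):
        start = (round_no + i) * k  # offset advances by k per slice and by k per producer
        # Take k consecutive consumers, wrapping from the tail back to the head.
        plan[producer] = [consumers[(start + j) % m] for j in range(k)]
    return plan
```

For example, `rolling_assignment(["A", "B", "C", "D"], [1, 2, 3, 4, 5, 6], 2, 0)` gives A [1, 2], B [3, 4], C [5, 6], D [1, 2], matching the first time slice of Embodiment 2. Because the plan is recomputed from the current alive lists at each SECONDS_OF_SENDING tick, a consumer moved to the unreachable or exception list simply drops out of the next slice.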
The data distributor deployed on each data producer is mainly responsible for the following tasks:
After the data distributor on a data producer starts, it launches a dedicated thread that waits for the liveness confirmation from the data balancer. On receipt, it saves the data balancer's information and replies with an acknowledgement reporting its current state as normal or abnormal.
After receiving the liveness confirmation, the data distributor periodically sends heartbeat messages to the data balancer. A heartbeat message carries the same content as the acknowledgement; the difference is that the acknowledgement is sent immediately in reply to the liveness confirmation, whereas heartbeat messages are sent proactively and periodically by the data distributor. The heartbeat interval is set by the heartbeat timing parameter (SECONDS_OF_HEARTBEAT, in seconds) carried in the data balancer's liveness confirmation. A heartbeat message can include the usage of the specified storage space, whether the file system is read-only, and whether the server is currently set to manual mode.
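On the producer side (and identically on the data sink), the status payload and the periodic sender might look like the sketch below. Assumptions: the dictionary keys are hypothetical; free space is read with `shutil.disk_usage`; the read-only check uses `os.statvfs` with `os.ST_RDONLY`, which is POSIX-only (matching the Linux deployment described above); and `send` stands in for the real transport to the data balancer.

```python
import os
import shutil
import threading

def collect_status(storage_path: str, min_free_gb: float, manual_mode: bool) -> dict:
    """Status fields shared by the acknowledgement and the periodic heartbeat."""
    usage = shutil.disk_usage(storage_path)  # named tuple (total, used, free), in bytes
    free_gb = usage.free / 1024 ** 3
    read_only = bool(os.statvfs(storage_path).f_flag & os.ST_RDONLY)  # POSIX only
    abnormal = free_gb < min_free_gb or read_only or manual_mode
    return {
        "free_disk_gb": round(free_gb, 2),
        "read_only": read_only,
        "manual_mode": manual_mode,
        "state": "abnormal" if abnormal else "normal",
    }

def heartbeat_loop(send, collect, seconds_of_heartbeat: float, stop: threading.Event) -> None:
    """Send collect() through send() every SECONDS_OF_HEARTBEAT seconds until stop is set."""
    while not stop.wait(seconds_of_heartbeat):  # wait() returns False on timeout
        send(collect())
```

In practice the loop would run on its own daemon thread, for example `threading.Thread(target=heartbeat_loop, args=(send, collect, 10, stop), daemon=True).start()`, with `seconds_of_heartbeat` taken from the liveness confirmation.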
In addition, the data distributor launches a dedicated thread that waits for new data consumer information from the data balancer. On receiving it, the data distributor closes the existing data transmission channel; because a partial data transfer is atomic, the channel is closed only after the in-progress transfer completes. It then sends a channel-establishment request to the data sink named in the new data consumer information and transmits data once the new data transmission channel is established.
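One way to honour the atomicity requirement is to hold a lock for the whole of each chunk, so a channel switch can only take effect between chunks. This is a sketch: `Distributor`, `open_channel`, and `RecordingChannel` are hypothetical, and spreading chunks round-robin over several channels when NUMS_OF_CONSUME is greater than 1 is an assumption, since the text does not say how a producer divides data among its n consumers.

```python
import threading

class Distributor:
    """A chunk is the atomic unit; a switch waits for any in-flight chunk to finish."""

    def __init__(self, open_channel):
        self._open_channel = open_channel  # hypothetical factory: consumer -> object with send()/close()
        self._lock = threading.Lock()      # held for a whole chunk, so switches land between chunks
        self._channels = []
        self._next = 0

    def send_chunk(self, chunk):
        with self._lock:
            if not self._channels:
                raise RuntimeError("no data consumer assigned yet")
            channel = self._channels[self._next % len(self._channels)]  # round-robin over n channels
            self._next += 1
            channel.send(chunk)

    def switch_to(self, consumers):
        """Apply new data consumer information received from the data balancer."""
        with self._lock:  # blocks until any in-flight chunk has been sent
            for channel in self._channels:
                channel.close()
            self._channels = [self._open_channel(c) for c in consumers]
            self._next = 0

class RecordingChannel:
    """Test double that records calls; a real channel would wrap a socket."""

    def __init__(self, consumer, log):
        self.consumer, self.log = consumer, log

    def send(self, chunk):
        self.log.append((self.consumer, chunk))

    def close(self):
        self.log.append(("close", self.consumer))
```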
The data sink deployed on each data consumer is mainly responsible for the following tasks:
After the data sink starts, it launches a dedicated thread that waits for the liveness confirmation from the data balancer. On receipt, it saves the data balancer's information and replies with an acknowledgement.
After receiving the liveness confirmation, the data sink periodically sends heartbeat messages to the data balancer, at the interval given by the heartbeat timing information in the liveness confirmation (the SECONDS_OF_HEARTBEAT parameter, in seconds). A heartbeat message can include the usage of the specified storage space, whether the file system is read-only, and whether the server is currently set to manual mode.
The data sink also launches a dedicated thread that waits for channel-establishment requests from the data distributors on the data producers. On receiving such a request, it starts a new dedicated thread to receive the data.
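The thread-per-channel pattern maps directly onto Python's `socketserver.ThreadingTCPServer`, which serves each incoming connection on a new thread. The sketch below assumes TCP as the transport, treats one connection as one channel, and uses a two-byte "OK" acknowledgement and an in-memory `RECEIVED` list; these, `start_sink`, and `send_to_sink` are hypothetical illustrations, not the system's protocol.

```python
import socket
import socketserver
import threading

RECEIVED = []                     # payloads; a real sink would write to its storage path
RECEIVED_LOCK = threading.Lock()  # handler threads run concurrently

class ChannelHandler(socketserver.StreamRequestHandler):
    """Runs on its own thread for each channel-establishment request (one TCP connection)."""

    def handle(self):
        data = self.rfile.read()  # read until the distributor half-closes the channel
        with RECEIVED_LOCK:
            RECEIVED.append(data)
        self.wfile.write(b"OK")   # acknowledge the completed transfer

class DataSinkServer(socketserver.ThreadingTCPServer):
    daemon_threads = True         # per-channel threads do not block interpreter exit
    allow_reuse_address = True

def start_sink(host="127.0.0.1", port=0):
    """Start the listener on a dedicated thread; port 0 lets the OS pick a free port."""
    server = DataSinkServer((host, port), ChannelHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def send_to_sink(address, payload):
    """Distributor side, for illustration: open a channel, send, wait for the acknowledgement."""
    with socket.create_connection(address) as sock:
        sock.sendall(payload)
        sock.shutdown(socket.SHUT_WR)  # signal the end of this transfer
        ack = b""
        while chunk := sock.recv(16):
            ack += chunk
        return ack
```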
The data distribution system can also make the data balancer redundant by adding an auxiliary data balancer, a standby that provides the data balancer's services when the data balancer fails. While the data balancer is operating normally, the auxiliary data balancer only synchronizes information with it periodically and does not take part in balancing. When it diagnoses a data balancer failure, it sends liveness confirmations to the data producers and data consumers and takes over the data balancer's services, while continuing to diagnose the data balancer's state in real time. Once it finds that the data balancer has recovered, it first synchronizes its information to the data balancer and then sends the data balancer a request to switch back to standby. On receiving this switch request, the data balancer replies to the auxiliary data balancer with a switch-approval confirmation, sends liveness confirmations to the data producers and data consumers, and resumes the master role. After receiving the approval, the auxiliary data balancer switches from master state to standby state.
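The failover and switch-back sequence can be modelled as a two-state machine on the auxiliary side. This is a sketch only: how the primary's health is diagnosed is outside its scope (it arrives as a boolean), `announce` is a hypothetical callback that sends liveness confirmations to every data producer and data consumer, and `state` stands in for whatever lists and parameters are synchronized.

```python
import copy
from enum import Enum

class Role(Enum):
    STANDBY = "standby"
    MASTER = "master"

class Balancer:
    """Minimal stand-in for a data balancer."""

    def __init__(self, name, announce, role):
        self.name, self.announce, self.role = name, announce, role
        self.state = {}  # lists and parameters kept in sync

    def sync_from(self, other):
        self.state = copy.deepcopy(other.state)  # full copy, no shared list objects

    def handle_switch_request(self):
        """Primary side: re-announce itself to all nodes, resume master, approve."""
        self.announce(self.name)
        self.role = Role.MASTER
        return True  # the switch-approval confirmation

class AuxiliaryBalancer(Balancer):
    def check_primary(self, primary, primary_healthy):
        """One round of the auxiliary's periodic diagnosis of the primary."""
        if self.role is Role.STANDBY:
            if primary_healthy:
                self.sync_from(primary)   # normal operation: synchronize only
            else:
                self.announce(self.name)  # failover: take over the balancer's services
                self.role = Role.MASTER
        elif primary_healthy:             # acting as master and the primary is back
            primary.sync_from(self)       # push the current state back first
            if primary.handle_switch_request():
                self.role = Role.STANDBY
```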
In addition, so that some data producers and data consumers can operate outside the data balancer's distribution management, the data distribution system provides a non-cluster mode. The data balancer moves data producers and data consumers configured for non-cluster mode from the current alive list to the current exception list and no longer allocates resources to them. These data producers and data consumers then distribute data according to their own one-to-one or one-to-many configuration.
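Non-cluster mode has two sides: the balancer removes such nodes from the rolling pool, and a non-cluster producer uses its own fixed targets instead of the balancer's plan. The sketch below shows both; the function names and the `static_targets` mapping (for example `{"A": [1]}`, the one-to-one case in Embodiment 2) are hypothetical.

```python
def exclude_non_cluster(alive, non_cluster):
    """Balancer side: move non-cluster nodes from the alive list to the exception list."""
    remaining = [n for n in alive if n not in non_cluster]
    moved = [n for n in alive if n in non_cluster]
    return remaining, moved

def targets_for(producer, static_targets, rolling_plan):
    """Producer side: a non-cluster producer uses its own one-to-one or one-to-many
    configuration; every other producer follows the balancer's current rolling plan."""
    if producer in static_targets:
        return list(static_targets[producer])
    return list(rolling_plan.get(producer, []))
```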
A method of distributing data using the data distribution system described above comprises the following steps:
1) The data balancer sends liveness confirmations to the data producers and data consumers, performing the first state confirmation after they start;
2) The data balancer sends new data consumer information to the data distributor on each confirmed data producer;
3) After receiving the new data consumer information, the data distributor sends a channel-establishment request to the data sink on each specified data consumer and transmits data once the new data transmission channel is established;
4) After receiving the channel-establishment request from the data distributor, the data sink on the data consumer establishes the new data transmission channel with the data distributor and receives the data.
By introducing a rolling round-robin distribution mechanism, the data distribution method and system of the invention resolve the uneven distribution that large-scale application systems often encounter when data producers distribute data to data consumers. This is achieved without changing machine configuration or network topology and without affecting the existing data distribution speed. In addition, the non-cluster mode meets the need of individual data producers to distribute data to a few specific data consumers.
Embodiment
The invention is further described below with reference to two embodiments. The specific embodiments described here only explain the invention and do not limit it.
Figure 1 shows the overall architecture of the data distribution system of the invention, in which data producers 1 to n distribute data to data consumers 1 to m. A data distributor is deployed on each data producer and a data sink on each data consumer, and the data balancer and the auxiliary data balancer are deployed on two further PC servers; together these carry out data distribution. The data channel is the data transmission link between data producers and data consumers. The distribution management channel is the control and management channel between the data producers and data consumers on one side and the data balancer and auxiliary data balancer on the other. Both channels may share the same physical link.
Embodiment 1
Figure 2 shows the data distribution process with 4 data producers (labelled A, B, C, D) and 6 data consumers (labelled 1 to 6), where NUMS_OF_CONSUME is 1, SECONDS_OF_HEARTBEAT is 10 seconds, SECONDS_OF_SENDING is 300 seconds, and disk free space is more than 30 GB.
To deploy the system, a data distributor is installed on each of the 4 data producers and a data sink on each of the 6 data consumers, and one data balancer is also required (optionally paired with an auxiliary data balancer). Apart from deploying their respective programs, the data producers and data consumers need no other configuration (the default is automatic mode); all configuration is done on the data balancer.
On the data balancer, configure the IP addresses and sending port numbers of the 4 data producers, the IP addresses and receiving port numbers of the 6 data consumers, the heartbeat timing, the data fragmentation transmission timing, NUMS_OF_CONSUME, and so on. Once the programs are deployed and the data balancer is configured, the system can be started.
Components can be started in any order. Starting the data producers and data consumers first and the data balancer last is recommended, because it reduces the interaction between devices.
After data balancing device, data producer, data consumer all start, data balancing device can send distribution ground resource for the first time to data producer.In first Data dissemination timeslice, A is to 1 transmission data like this, and B is to 2 transmission data, and C is to 3 transmission data, and D is to 4 transmission data.Data producer and data consumer can regularly be reported the situation of oneself to data balancing device generation heartbeat message during this time.
When the first data distribution time slice expires, the data balancer sends the second assignment, based on the health status of each node collected during the first time slice. In the second time slice, A sends data to 2, B to 3, C to 4, and D to 5. Within a single time slice, the amount of data each data producer sends to its data consumer is not guaranteed to be balanced; balance is achieved over the long term by rolling the assignments.
After second Data dissemination timeslice arrives, data balancing device can send distribution ground resource for the third time.In the 3rd Data dissemination timeslice, A is just to 3 transmission data like this, and B is just to 4 transmission data, and C is just to 5 transmission data, and D is just to 6 transmission data.So circulation, the distribution procedure of whole data just continues to have gone on.
If data consumer 4 fails during this period, the data balancer stops receiving its heartbeat messages, moves it from the current alive list to the current unreachable list, and excludes it from the next round of distribution. Once the fault on node 4 has been fixed, node 4 receives the liveness confirmation that the data balancer sends periodically, and the data balancer uses the health status it reports to decide whether to add it back to the current alive list.
Embodiment 2
Figure 3 shows the data distribution process with 4 data producers (labelled A, B, C, D) and 6 data consumers (labelled 1 to 6), where NUMS_OF_CONSUME is 2, SECONDS_OF_HEARTBEAT is 10 seconds, SECONDS_OF_SENDING is 300 seconds, and disk free space is more than 30 GB.
System deployment and start-up order are exactly the same as when NUMS_OF_CONSUME is 1; the only difference is that NUMS_OF_CONSUME is set to 2.
After data balancing device, data producer, data consumer all start, data balancing device can send distribution ground resource for the first time to data producer.Because NUMS_OF_CONSUME arranges for 2, in first Data dissemination timeslice, A is to 1 and 2 transmission data like this, and B is to 3 and 4 transmission data, and C is to 5 and 6 transmission data, and D sends data to 1 and 2.Here the sending mode of one-to-many is a kind of concurrent transmit mechanism when meeting under sending mode one to one that transmission speed can not be caught up with data speed of production.The Data dissemination that the transmit mechanism of the one-to-many here gathers is more balanced.
When the first time slice expires, the data balancer sends the second assignment. In the second time slice, A sends data to 3 and 4, B to 5 and 6, C to 1 and 2, and D to 3 and 4.
When the second time slice expires, the data balancer sends the third assignment. In the third time slice, A sends data to 5 and 6, B to 1 and 2, C to 3 and 4, and D to 5 and 6. The cycle repeats in this way, and data distribution continues.
Suppose a special requirement arises during this period: A must send data exclusively to 1, while the other nodes keep distributing data in the established mode. In that case, stop A's data distributor, switch it from automatic mode to manual mode, and configure its own data storage path and the IP address of its target data consumer. Stop the data sink on 1, switch it from automatic mode to manual mode, and configure its own data storage path. Then restart A and 1.
The invention is not limited to the two distribution modes given in the embodiments; in particular, it can support up to 3,000 data producers and data consumers.