CN108009019A

CN108009019A - Method, client, and distributed computing system for locating an instance for distributed data

Info

Publication number: CN108009019A
Application number: CN201610964886.0A
Authority: CN
Inventors: 刘成彦; 李瑜婷; 刘华明
Original assignee: Wangsu Science and Technology Co Ltd
Current assignee: Wangsu Science and Technology Co Ltd
Priority date: 2016-10-29
Filing date: 2016-10-29
Publication date: 2018-05-08
Anticipated expiration: 2036-10-29
Also published as: CN108009019B

Abstract

The invention discloses a method, a client, and a distributed computing system for locating an instance for distributed data. The distributed computing system includes several clients and at least one server. The method includes: a target client obtains data to be processed; the data to be processed is split into several parts; the serialization result of each part is obtained; the serialization results of all parts are combined into the serialization result of the data to be processed; and a hash calculation is performed on that serialization result to designate the processing instance for the data to be processed. The invention ensures that data is delivered accurately to the server in stream computing, greatly reduces the time spent on data serialization, and further reduces the time spent locating Redis instances, thereby improving the running speed of the system.

Description

Method, client, and distributed computing system for locating an instance for distributed data

Technical field

Embodiments of the present invention relate to the field of Internet technology, and in particular to a method, a client, and a distributed computing system for locating an instance for distributed data.

Background art

The rapid development of the Internet industry has brought explosive growth in data scale and has also given big data an increasingly distinct streaming character. Traditional batch processing can hardly meet the real-time computation requirements of streaming big data processing, so more efficient distributed computing systems are being used more and more widely.

In the business processing of stream computing, the system uses Redis (a high-performance key-value database) to perform computations on data and temporarily stores the data in Redis. Because massive amounts of data enter the computation, multiple Redis instances must be used at the same time. During computation, the client generates a Redis key according to business requirements, calculates the Redis instance corresponding to that key, and finally sends the corresponding command to the Redis instance where the key resides for processing. This process must ensure that identical Redis keys generated during stream computing enter the same Redis instance; otherwise the result data will be inaccurate. The same Redis key may be generated many times during computation, so the Redis instance corresponding to the key has to be located repeatedly.

However, the prior art provides no feasible way to determine the Redis instance that handles identical Redis keys. Moreover, when locating Redis instances during computation, the system has to spend substantial resources (such as CPU and time) serializing identical Redis keys over and over, which lowers the efficiency of locating Redis instances and affects the running speed of the system.

Summary of the invention

The technical problem to be solved by the present invention is to overcome the defect that prior-art distributed computing systems provide no method for handling identical Redis keys and therefore consume substantial system resources. To that end, the invention provides a method, a client, and a distributed computing system for locating an instance for distributed data that optimize the generation of Redis keys so that the Redis instance where a Redis key resides can be located quickly and accurately.

The present invention solves the above technical problem through the following technical solutions:

A method for locating an instance for distributed data, used in a distributed computing system that includes several clients and at least one server, characterized in that the method includes:

a target client obtaining data to be processed;

splitting the data to be processed into several parts;

obtaining the serialization result of each part;

combining the serialization results of all parts into the serialization result of the data to be processed; and

performing a hash calculation on the serialization result of the data to be processed, thereby designating the processing instance for the data to be processed.

In existing stream computing, many identical Redis keys are generated, and the keys may differ from one another in only a few character strings. If every identical Redis key is handled in the same way, the system does a great deal of repeated work: it cannot handle identical Redis keys quickly, and it ties up substantial system resources.

The present invention divides a Redis key into several parts and determines whether each part exists in the cache. If it does, the computed result in the cache is used directly; if it does not, the part is serialized and the result is stored in the cache, which makes the next handling of that part easier. The invention can thus effectively improve the running speed of the system.

Preferably, the step of obtaining the serialization result of each part includes:

for each part, determining whether the data size of the part is greater than a preset value;

if so,

determining whether the serialization result of the part exists in an in-memory cache region;

if it exists in the in-memory cache region, obtaining the serialization result from the in-memory cache region;

if it does not exist in the in-memory cache region, serializing the part to obtain its serialization result and storing the serialization result in the in-memory cache region;

if not,

serializing the part to obtain its serialization result.

Further, the present invention makes better use of the cache: only sufficiently large parts are cached, which saves system resources. For each part, when the size of the part is greater than a preset value, the part is stored in the cache, and later computation on that part can fetch the data directly from the cache, which speeds up serialization of the part. When the size of the part is less than the preset value, computing the serialization result directly is faster than reading it from the cache, so there is no need to cache parts smaller than the preset value. The invention thus uses the cache sensibly and further improves the speed of data serialization.
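The size-gated caching described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the class name `PartSerializer`, UTF-8 encoding as the serializer, the character count as the size measure, and a plain dictionary as the in-memory cache region are all assumptions made for the example.

```python
from typing import Dict


class PartSerializer:
    """Hypothetical helper: serializes key parts and caches only large ones."""

    def __init__(self, preset_bytes: int = 256) -> None:
        # The text allows a preset value between 128 and 512 bytes.
        self.preset_bytes = preset_bytes
        self.cache: Dict[str, bytes] = {}   # stands in for the in-memory cache region
        self.serialize_calls = 0            # counts real serializations, for illustration

    def _serialize(self, part: str) -> bytes:
        # Assumption: UTF-8 encoding stands in for the real serializer.
        self.serialize_calls += 1
        return part.encode("utf-8")

    def get(self, part: str) -> bytes:
        """Return the serialization result of one part."""
        if len(part) > self.preset_bytes:       # large part: consult the cache
            cached = self.cache.get(part)
            if cached is not None:              # cache hit: reuse the stored result
                return cached
            result = self._serialize(part)      # cache miss: serialize and store
            self.cache[part] = result
            return result
        return self._serialize(part)            # small part: serialize directly


serializer = PartSerializer()
large_part = "x" * 300
first = serializer.get(large_part)
second = serializer.get(large_part)   # served from the cache, no second serialization
```

Small parts bypass the dictionary entirely, which mirrors the stated rationale that recomputing them is cheaper than a cache lookup.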

Alternatively, the step of obtaining the serialization result of each part includes serializing each part to obtain its serialization result.

Preferably, splitting the data to be processed into several parts includes:

splitting the data to be processed by field and/or field combination.

The record structure of a Redis key can be divided into several fields. For example, a Redis key recorded with the structure "name_time_address" is divided at the underscores, according to the different meanings, into three parts representing the name, the time, and the address. This way of dividing is convenient and fast, and it further improves the running speed of the distributed computing system.
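As a rough illustration of splitting by field or by field combination, the sketch below cuts a key at underscores and optionally regroups fields into combined parts. The function name `split_key`, the `groups` parameter, and the sample key are hypothetical; the source only states that the key is divided by field and/or field combination.

```python
from typing import List, Optional, Sequence


def split_key(key: str, sep: str = "_",
              groups: Optional[Sequence[Sequence[int]]] = None) -> List[str]:
    """Split a key into parts by field, or by combinations of fields.

    groups lists, for each output part, the indexes of the fields it contains.
    With groups=None every field becomes its own part.
    """
    fields = key.split(sep)
    if groups is None:
        return fields
    return [sep.join(fields[i] for i in group) for group in groups]


by_field = split_key("name_time_address")
by_combination = split_key("name_time_address", groups=[(0,), (1, 2)])
```

Here `by_field` holds one part per field, while `by_combination` keeps the name on its own and treats time and address as a single part.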

Preferably, the preset value ranges from 128 bytes to 512 bytes.

Preferably, the method determines whether the size of the serialization result of the data to be processed is greater than or equal to a preset threshold;

if so, the murmur_128 hash strategy is selected;

if not, the fnv1a hash strategy is selected.

Preferably, the preset threshold is 432 bytes.
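The threshold-based choice of hash strategy might look like the sketch below. Two points are assumptions, not statements from the source: the FNV-1a variant is taken to be the 64-bit one, and because MurmurHash3 is not in the Python standard library, a 128-bit BLAKE2b digest is used purely as a stand-in for murmur_128. The function names are hypothetical.

```python
import hashlib

HASH_THRESHOLD = 432               # bytes, the preferred threshold in the text

FNV64_OFFSET = 0xCBF29CE484222325  # standard 64-bit FNV offset basis
FNV64_PRIME = 0x100000001B3        # standard 64-bit FNV prime


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte in, then multiply by the prime."""
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
    return h


def hash_128_stand_in(data: bytes) -> int:
    """Stand-in for murmur_128 (assumption): a 128-bit BLAKE2b digest."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=16).digest(), "big")


def select_strategy(serialized: bytes) -> str:
    """Name the strategy the text prescribes for this input size."""
    return "murmur_128" if len(serialized) >= HASH_THRESHOLD else "fnv1a"


def hash_serialized(serialized: bytes) -> int:
    """Hash a serialization result with the strategy chosen by its size."""
    if select_strategy(serialized) == "murmur_128":
        return hash_128_stand_in(serialized)
    return fnv1a_64(serialized)
```

A real deployment would swap `hash_128_stand_in` for an actual 128-bit MurmurHash3 implementation; only the size-based dispatch is taken from the text.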

Preferably, the server is a Redis server, the data to be processed is a Redis key, and the method includes:

the client dividing the Redis key into several parts according to a preset condition and recording the order of the fields in the Redis key;

combining the serialization results of all parts, in that order, into the serialization result of the Redis key.

The field order is recorded when the Redis key is divided, and after the serialization results are obtained the parts are recombined in that order. This ensures that the order of the parts does not change and that they are transmitted to the server in the correct order, so data transmission is more accurate.

Preferably, the method includes:

calculating the hash value of the obtained serialization result of the Redis key;

obtaining the correspondence between the Redis key and a Redis instance according to the hash value;

sending the Redis command corresponding to the Redis key to the Redis instance according to the correspondence.
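One simple way to turn a hash value into a key-to-instance correspondence is a modulo over the list of instances, sketched below. The source does not say how the correspondence is derived, so the modulo mapping, the instance addresses, and the `outbox` dictionary that stands in for actually sending commands are all assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical instance addresses, for illustration only.
INSTANCES = ["redis-0:6379", "redis-1:6379", "redis-2:6379"]

Command = Tuple[str, ...]


def locate_instance(hash_value: int, instances: List[str]) -> str:
    """Map a hash value to one instance (assumed modulo mapping)."""
    return instances[hash_value % len(instances)]


def dispatch(hash_value: int, command: Command,
             outbox: Dict[str, List[Command]],
             instances: List[str]) -> str:
    """Queue a command for the instance that owns the key's hash value."""
    target = locate_instance(hash_value, instances)
    outbox.setdefault(target, []).append(command)
    return target


outbox: Dict[str, List[Command]] = {}
first_target = dispatch(7, ("INCR", "name_time_address"), outbox, INSTANCES)
second_target = dispatch(7, ("INCR", "name_time_address"), outbox, INSTANCES)
```

Because the mapping depends only on the hash value, identical keys always reach the same instance, which is the property the method relies on.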

The present invention also provides a client for a distributed computing system that includes several clients and at least one server, characterized in that the client includes an acquisition module, a splitting module, a computing module, a combination module, and a second computing module, wherein:

the acquisition module is used to obtain data to be processed;

the splitting module is used to split the data to be processed into several parts;

the computing module is used to obtain the serialization result of each part;

the combination module is used to combine the serialization results of all parts into the serialization result of the data to be processed;

the second computing module is used to perform a hash calculation on the serialization result of the data to be processed, thereby designating the processing instance for the data to be processed.

Preferably, the client further includes a first judgment module, a second judgment module, a reading module, a storage module, and a processing module, wherein:

the first judgment module is used to determine whether the data size of a part is greater than the preset value, calling the second judgment module if so and the processing module if not;

the second judgment module is used to determine whether the serialization result of the part exists in the in-memory cache region, calling the reading module if so and the storage module if not;

the processing module is used to serialize the part;

the reading module is used to obtain the serialization result from the in-memory cache region;

the storage module is used to serialize the part, obtain its serialization result, and store the serialization result in the in-memory cache region.

The present invention further provides a distributed computing system, characterized in that the distributed computing system includes several clients as described above and at least one server.

On the basis of common knowledge in the art, the above preferred conditions can be combined arbitrarily to obtain the preferred embodiments of the present invention.

The positive effects of the present invention are as follows:

It ensures a one-to-one correspondence between Redis keys and Redis instances. A hash value is generated from the combined serialization results of the Redis key, and the hash value accurately locates the Redis instance where the key resides.

It improves the running speed of the system. When a Redis key is generated, the parts that make up the key are defined. During serialization, the parts are handled separately and the larger serialization results are cached. When the same key arrives again, the cached results can be used to locate the Redis instance. This greatly reduces the time spent locating Redis instances and thereby improves the running speed of the system.

Brief description of the drawings

To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a structural diagram of the distributed computing system of Embodiment 1 of the present invention.

Fig. 2 is a flow chart of the method for locating an instance for distributed data of Embodiment 1 of the present invention.

Detailed description of the embodiments

To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.

Embodiment 1

Referring to Fig. 1, this embodiment provides a distributed computing system 1 that includes 5 clients 11, 1 server 12, and 1 database 13. The server is a Redis server that includes several Redis instances, and the database is HBase.

Each of the 5 clients 11 includes a computing module 110, an acquisition module 111, a splitting module 112, a first judgment module 113, a second judgment module 114, a computing module 115, a reading module 116, a storage module 117, a processing module 118, and a combination module 119.

The acquisition module is used to obtain data to be processed. The data to be processed is a Redis key.

The splitting module is used to divide the data to be processed into several parts.

The first judgment module is used to determine whether the size of a part is greater than 256 bytes, calling the second judgment module if so and the processing module if not.

Here 256 bytes is the preset value, which may range from 128 bytes to 512 bytes.

To further improve computation speed, only sufficiently large parts are cached, which saves system resources. For each part, when the size of the part is greater than the preset value, the part is stored in the cache, and later computation on that part can fetch the data directly from the cache, which speeds up serialization of the part. When the size of the part is less than the preset value, computing the serialization result directly is faster than reading it from the cache, so there is no need to cache parts smaller than the preset value. The invention thus uses the cache sensibly and further improves the speed of data serialization.

The processing module is used to serialize the part.

The second judgment module is used to determine whether the serialization result of the part exists in the cache, calling the reading module if so and the storage module if not.

The reading module is used to obtain the serialization result from the cache.

The computing module is used to obtain the serialization result of each part.

The combination module is used to combine the serialization results of all parts into the serialization result of the data to be processed.

Referring to Fig. 2, with the above distributed computing system this embodiment can implement a method for locating an instance for distributed data:

Step 100: the client obtains data to be processed; the data to be processed is a Redis key.

Step 101: the client splits the Redis key into several parts by field or field combination and records the order of the fields in the Redis key.

Step 102: for each part, determine whether the size of the part is greater than 256 bytes; if so, perform step 103; if not, perform step 106.

The preset value in this embodiment takes the optimal value of 256 bytes; it may also be chosen anywhere in the range of 128 bytes to 512 bytes. Calculation and testing show that reading serialization results from the cache for parts larger than 256 bytes effectively saves system resources. When the size of the part is less than the preset value, computing the serialization result directly is faster than reading it from the cache, so there is no need to cache parts smaller than the preset value. This embodiment thus uses the cache sensibly and further improves the speed of data serialization.

Step 103: determine whether the serialization result of the part exists in the cache; if so, perform step 104; if not, perform step 105.

Step 104: obtain the serialization result from the cache, then perform step 107.

Step 105: serialize the part to obtain its serialization result, store the serialization result in the cache, then perform step 107.

Step 106: serialize the part.

Step 107: combine the serialization results of all parts, in the recorded order, into the serialization result of the Redis key.

Step 108: determine whether the size of the serialization result of the data to be processed is greater than or equal to 432 bytes; if so, perform step 109; if not, perform step 110.

Step 109: select the murmur_128 hash strategy and perform a hash calculation on the serialization result of the data to be processed, thereby designating the processing instance for the processed data; the flow then ends.

Step 110: select the fnv1a hash strategy and perform a hash calculation on the serialization result of the data to be processed, thereby designating the processing instance for the processed data.
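Steps 100 to 110 can be strung together in a short sketch. Several details are assumptions that the source does not specify and are marked in the comments: underscore-separated fields, UTF-8 encoding as the serializer, a dictionary as the cache, the 64-bit FNV-1a variant, a BLAKE2b digest standing in for murmur_128, and a modulo mapping from hash value to instance. The function name `locate` is hypothetical.

```python
import hashlib
from typing import Dict, List

PRESET_BYTES = 256      # step 102 threshold from the embodiment
HASH_THRESHOLD = 432    # step 108 threshold from the embodiment
SEP = "_"               # assumed field separator

cache: Dict[str, bytes] = {}   # assumed in-memory cache of serialization results


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a (the bit width is an assumption)."""
    h = 0xCBF29CE484222325
    for byte in data:
        h = ((h ^ byte) * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h


def hash_128_stand_in(data: bytes) -> int:
    """Stand-in for murmur_128: a 128-bit BLAKE2b digest, not MurmurHash."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=16).digest(), "big")


def locate(key: str, instances: List[str]) -> str:
    """Return the instance that should process the given key (step 100 input)."""
    parts = key.split(SEP)                            # step 101: list order records field order
    results = []
    for part in parts:
        if len(part) > PRESET_BYTES:                  # step 102: large part
            if part not in cache:                     # step 103: cache lookup
                cache[part] = part.encode("utf-8")    # step 105: serialize and store
            results.append(cache[part])               # step 104: read from the cache
        else:
            results.append(part.encode("utf-8"))      # step 106: serialize directly
    serialized = SEP.encode("utf-8").join(results)    # step 107: combine in recorded order
    if len(serialized) >= HASH_THRESHOLD:             # step 108: choose the hash strategy
        hash_value = hash_128_stand_in(serialized)    # step 109
    else:
        hash_value = fnv1a_64(serialized)             # step 110
    return instances[hash_value % len(instances)]     # assumed modulo mapping
```

Calling `locate` twice with the same key returns the same instance, and the second call reuses any cached serialization results for large parts.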

When this embodiment processes the Redis key "name_address_time", the key is divided by the meaning of its fields into three parts: the name part, the address part, and the time part. The order of the fields is fixed; after the three parts have been processed separately, the resulting serialization results are arranged in that order.

The serialization result of the Redis key is consistent with the Redis key, which ensures the accuracy of the data when the serialization result of the Redis key is sent.
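The role of the recorded field order can be shown with a small sketch: each part carries its position, the parts are serialized in an arbitrary order, and the recorded positions restore the original layout on recombination. The helper names, UTF-8 encoding as the serializer, and the underscore separator are assumptions for illustration.

```python
from typing import List, Tuple

SEP = "_"  # assumed field separator


def split_with_order(key: str) -> List[Tuple[int, str]]:
    """Split the key and record each field's position in the key."""
    return list(enumerate(key.split(SEP)))


def serialize(part: str) -> bytes:
    """Assumption: UTF-8 encoding stands in for the real serializer."""
    return part.encode("utf-8")


def combine(indexed_results: List[Tuple[int, bytes]]) -> bytes:
    """Reassemble the serialized parts in their recorded order."""
    ordered = sorted(indexed_results, key=lambda pair: pair[0])
    return SEP.encode("utf-8").join(result for _, result in ordered)


key = "name_address_time"
indexed_parts = split_with_order(key)
# Serialize the parts in reverse to show that the recorded positions restore the layout.
results = [(index, serialize(part)) for index, part in reversed(indexed_parts)]
combined = combine(results)
```

Under these assumptions `combined` equals the direct serialization of the whole key, which is the consistency property the paragraph above describes.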

The processed data is sent to the corresponding Redis instance of the Redis server, an intermediate result is obtained from the Redis server, and the stream computing system stores the intermediate result in the HBase database.

The method, client, and distributed computing system for locating an instance for distributed data of this embodiment ensure a one-to-one correspondence between Redis keys and Redis instances. A hash value is generated from the combined serialization results of the Redis key, and the hash value accurately locates the Redis instance where the key resides. In particular, the running speed of the system is improved: when a Redis key is generated, the parts that make up the key are defined; during serialization, the parts are handled separately and the serialization results of parts longer than 256 bytes are cached. When the same key arrives again, the cached results can be used to locate the Redis instance. This greatly reduces the time spent locating Redis instances and thereby improves the running speed of the system.

The device embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.

From the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware. On that understanding, the essence of the above technical solutions, or the part that contributes over the prior art, can be embodied as a software product. The computer software product can be stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk, or an optical disc, and includes instructions that cause a computer device (a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments or in parts of the embodiments.

Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present invention and do not limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments can still be modified, or some of their technical features replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for locating an instance for distributed data, used in a distributed computing system that includes several clients and at least one server, characterized in that the method includes:

a target client obtaining data to be processed;

splitting the data to be processed into several parts;

obtaining the serialization result of each part;

performing a hash calculation on the serialization result of the data to be processed, thereby designating the processing instance for the data to be processed.

2. the method for distributed data located instance as claimed in claim 1, it is characterised in that each part of acquisition Serializing result step includes：

If so,

If not serialized there are internal memory cache region to the part, obtain it and serialize result and by the serializing As a result store to the internal memory cache region；

If it is not,

Then the part is serialized, it is obtained and serializes result.

3. The method for locating an instance for distributed data as claimed in claim 1, characterized in that the step of obtaining the serialization result of each part includes:

serializing each part to obtain its serialization result.

4. The method for locating an instance for distributed data as claimed in claim 1, characterized in that

splitting the data to be processed into several parts includes:

5. The method for locating an instance for distributed data as claimed in claim 2, characterized in that the preset value ranges from 128 bytes to 512 bytes.

6. The method for locating an instance for distributed data as claimed in claim 1, characterized in that

it is determined whether the size of the serialization result of the data to be processed is greater than or equal to a preset threshold;

if so, the murmur_128 hash strategy is selected;

if not, the fnv1a hash strategy is selected.

7. The method for locating an instance for distributed data as claimed in claim 6, characterized in that the preset threshold is 432 bytes.

8. The method for locating an instance for distributed data as claimed in claim 1, characterized in that the server is a Redis server, the data to be processed is a Redis key, and the method includes:

the client dividing the Redis key into several parts according to a preset condition and recording the order of the fields in the Redis key;

9. The method for locating an instance for distributed data as claimed in claim 8, characterized in that the method includes:

10. A client for a distributed computing system that includes several clients and at least one server, characterized in that the client includes an acquisition module, a splitting module, a computing module, a combination module, and a second computing module, wherein:

the acquisition module is used to obtain data to be processed;

the combination module is used to combine the serialization results of all parts into the serialization result of the data to be processed;

the second computing module is used to perform a hash calculation on the serialization result of the data to be processed, thereby designating the processing instance for the data to be processed.

11. The client as claimed in claim 10, characterized in that the client further includes a first judgment module, a second judgment module, a reading module, a storage module, and a processing module, wherein:

the first judgment module is used to determine whether the data size of a part is greater than a preset value, calling the second judgment module if so and the processing module if not;

the second judgment module is used to determine whether the serialization result of the part exists in the in-memory cache region, calling the reading module if so and the storage module if not;

the processing module is used to serialize the part;

the storage module is used to serialize the part, obtain its serialization result, and store the serialization result in the in-memory cache region.

12. A distributed computing system, characterized in that the distributed computing system includes several clients as claimed in claim 10 or 11 and at least one server.