CN107633001A - Hash partition optimization method and device - Google Patents
Hash partition optimization method and device
- Publication number
- CN107633001A CN201710656815.9A
- Authority
- CN
- China
- Prior art keywords
- hash
- hash partition
- result
- partition
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a hash partition optimization method and device. The method includes: obtaining a data set, where the data set includes one or more data items; performing a first hash partition on the data set using a data skew optimization algorithm to obtain a first hash partition result; and performing a second hash partition on the first hash partition result to obtain a second hash partition result. The invention solves the prior-art technical problem of the short-board effect in hash partitioning caused by data skew, in which the slowest, most heavily loaded partition limits overall performance.
Description
Technical field
The present invention relates to the field of computer data processing, and in particular to a hash partition optimization method and device.
Background
Hash partitioning is a common strategy for data deployment and dynamic query processing. It achieves a high degree of parallelism when processing data units and shortens response time. In query processing methods such as hash join algorithms and aggregation operations, hash partitioning can produce intermediate results efficiently. The main goal of hash partitioning can be summarized as replacing one large parent task with several smaller subtasks. The advantage is that cache and memory are used more efficiently, which shortens the time needed to process the parent task. In database query processing, hash partitioning is a very popular operation; in join processing and clustering processing, it can improve performance; in sort processing, it is a very important step. Liu et al. designed a hash partition strategy based on distributed query processing that effectively shortens query time. Shin et al. proposed an optimized hash partition method for solid-state drives; the method disregards main memory size and input/output block support and achieves better results than traditional hash partition methods.
Hash partitioning, also called hashed partitioning, is a partitioning method that distributes data evenly by specifying partition numbers. When hash partitioning is performed on input/output devices and the data reaches a certain scale, the partitions become approximately equal in size, which improves the efficiency of the whole query processing. When no partitions need to be added or deleted, hash partitioning effectively improves query efficiency. When partitions do need to be added or deleted, however, traditional hash partitioning runs into problems. Suppose there were originally 7 conventional hash partitions and one of them now has to be merged or deleted: the modulus changes from mod 7 to mod 6, and the data in the original 7 partitions must be recomputed and repartitioned.
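The cost of changing the modulus can be illustrated with a short sketch. It is not part of the patent; the function name `moved_fraction` and the choice of integer keys 0 to 41 are assumptions made only for illustration.

```python
def moved_fraction(keys, old_n, new_n):
    """Fraction of keys whose partition index changes when the modulus changes."""
    keys = list(keys)
    moved = sum(1 for k in keys if k % old_n != k % new_n)
    return moved / len(keys)


# 42 is the least common multiple of 7 and 6, so keys 0..41 cover one full cycle.
fraction = moved_fraction(range(42), old_n=7, new_n=6)
print(f"{fraction:.1%} of the keys land in a different partition")
```

Only the keys 0 to 5 keep their partition index in each cycle of 42, so under these assumptions 36 of every 42 keys have to be moved.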
Some data, moreover, are highly skewed. Take aerospace data as an example: although China, Russia, Europe, India and others are spacefaring nations just like the United States, differences in national strength and in scientific and technological level mean that the United States has far more spacecraft of every type, manufacturers and so on than the other countries. Such data skew often reduces the efficiency of methods such as hash partitioning and prevents the resources of multi-core parallel processors from being used efficiently.
No effective solution has yet been proposed for the above prior-art problem of the short-board effect in hash partitioning caused by data skew.
Summary of the Invention
Embodiments of the present invention provide a hash partition optimization method and device, to at least solve the prior-art technical problem of the short-board effect in hash partitioning caused by data skew.
According to one aspect of the embodiments of the present invention, a hash partition optimization method is provided, including: obtaining a data set, where the data set includes one or more data items; performing a first hash partition on the data set using a data skew optimization algorithm to obtain a first hash partition result; and performing a second hash partition on the first hash partition result to obtain a second hash partition result.
Further, the data set is stored in the form of key-value pairs, and each key-value pair contains at least the number corresponding to that key-value pair.
Further, performing the first hash partition on the data set using the data skew optimization algorithm to obtain the first hash partition result includes: performing the first hash partition on the data set using mapping threads to obtain a first intermediate hash partition result, where a mapping thread performs a hash calculation on the number corresponding to each key-value pair, obtains a hash calculation result, and assigns key-value pairs with the same hash calculation result to the same partition, and there are one or more mapping threads; and optimizing the first intermediate hash partition result using the data skew optimization algorithm to obtain the first hash partition result.
Further, optimizing the first intermediate hash partition result using the data skew optimization algorithm to obtain the first hash partition result includes: calculating the average partition size of the first intermediate hash partition result; and splitting, according to the average partition size, every partition in the first intermediate hash partition result whose size exceeds the average partition size.
Further, there are multiple mapping threads and each mapping thread has an independent storage space into which it writes key-value pairs. Performing the first hash partition on the data set using the mapping threads includes: monitoring the usage of each independent storage space and, when the usage exceeds a preset threshold, allocating storage space to the independent storage space whose usage exceeds the preset threshold.
According to another aspect of the embodiments of the present invention, a hash partition optimization device is also provided, including: an acquisition module for obtaining a data set, where the data set includes one or more data items; a first partition module for performing a first hash partition on the data set using a data skew optimization algorithm to obtain a first hash partition result; and a second partition module for performing a second hash partition on the first hash partition result to obtain a second hash partition result.
According to another aspect of the embodiments of the present invention, a storage medium is also provided. The storage medium includes a stored program, and when the program runs, the device on which the storage medium resides is controlled to execute the above hash partition optimization method.
According to another aspect of the embodiments of the present invention, a processor is also provided. The processor is used to run a program, and the program executes the above hash partition optimization method when it runs.
According to another aspect of the embodiments of the present invention, a terminal is also provided, including: an acquisition module for obtaining a data set, where the data set includes one or more data items; a first partition module for performing a first hash partition on the data set using a data skew optimization algorithm to obtain a first hash partition result; a second partition module for performing a second hash partition on the first hash partition result to obtain a second hash partition result; and a processor that runs a program, where the program, when running, executes the above hash partition optimization method on the data output by the acquisition module, the first partition module and the second partition module.
According to another aspect of the embodiments of the present invention, a terminal is also provided, including: an acquisition module for obtaining a data set, where the data set includes one or more data items; a first partition module for performing a first hash partition on the data set using a data skew optimization algorithm to obtain a first hash partition result; a second partition module for performing a second hash partition on the first hash partition result to obtain a second hash partition result; and a storage medium for storing a program, where the program, when running, executes the above hash partition optimization method on the data output by the acquisition module, the first partition module and the second partition module.
In the embodiments of the present invention, a data set including one or more data items is obtained; a first hash partition is performed on the data set using a data skew optimization algorithm to obtain a first hash partition result; and a second hash partition is performed on the first hash partition result to obtain a second hash partition result. This achieves the purpose of partitioning data efficiently by hash. The technical effects are that the influence of skewed data on hash partitioning is reduced, the workload handled by each partition thread is evened out, the short-board effect caused by data skew is alleviated, the partition return time is shortened and partition efficiency is improved, which in turn solves the prior-art technical problem of the short-board effect in hash partitioning caused by data skew.
Brief description of the drawings
The drawings described here provide a further understanding of the present invention and form part of this application. The schematic embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of a hash partition optimization method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of an optional hash partition optimization method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of an optional hash partition optimization method according to an embodiment of the present invention; and
Fig. 4 is a schematic diagram of a hash partition optimization device according to an embodiment of the present invention.
Detailed Description
It should be noted that, where no conflict arises, the embodiments in this application and the features in those embodiments may be combined with one another. The present invention is described in detail below with reference to the drawings and in conjunction with the embodiments.
To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings of the embodiments. The described embodiments are obviously only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort shall fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second" and so on in the description, the claims and the above drawings are used to distinguish similar objects and not to describe a specific order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments of the present invention described here can be implemented in orders other than those illustrated or described. In addition, the terms "include" and "have", and any variations of them, are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.
Embodiment 1
According to an embodiment of the present invention, a method embodiment of a hash partition optimization method is provided. It should be noted that the steps shown in the flowchart of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in a different order.
Fig. 1 shows a hash partition optimization method according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
Step S102: obtain a data set, where the data set includes one or more data items.
Specifically, before the data set is obtained, the method also includes a step of obtaining data. After the data are obtained, they can be divided into blocks, and each block is one data set. The data may include, but are not limited to, aerospace information data. The data set may be read from a file or memory in which it is stored; for example, the data set may be stored in a txt text file.
Optionally, the data set is stored in the form of key-value pairs, and each key-value pair contains at least the number corresponding to that key-value pair.
Specifically, a key-value pair can be represented as (Key, Value), where Key is the number corresponding to the key-value pair and Value is the value stored for it. When the data set is stored in a txt text file, each key-value pair (Key, Value) occupies one line, and the input key-value pairs can be read from the txt file line by line. Each key-value pair may be 16 bytes in size, with the number Key occupying 8 bytes and the corresponding Value occupying 8 bytes.
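A minimal sketch of such a record follows. The patent fixes only the sizes (8 bytes for Key, 8 bytes for Value); the little-endian signed 64-bit encoding, the text-line syntax and the helper names `parse_text_line`, `pack_pair` and `unpack_pair` are assumptions introduced here for illustration.

```python
import struct

# ASSUMPTION: Key and Value are little-endian signed 64-bit integers.
RECORD = struct.Struct("<qq")


def parse_text_line(line):
    """Parse one text line such as '(1024, 77)' or '1024 77' into (key, value)."""
    key_text, value_text = line.strip().strip("()").replace(",", " ").split()
    return int(key_text), int(value_text)


def pack_pair(key, value):
    """Encode one key-value pair as a 16-byte record."""
    return RECORD.pack(key, value)


def unpack_pair(record):
    """Decode a 16-byte record back into (key, value)."""
    return RECORD.unpack(record)


pair = parse_text_line("(1024, 77)")
record = pack_pair(*pair)
```

Fixed-size records of this kind let a reader locate the i-th pair at byte offset 16 * i without scanning the file.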
Step S104: perform a first hash partition on the data set using a data skew optimization algorithm to obtain a first hash partition result.
Optionally, performing the first hash partition on the data set using the data skew optimization algorithm in step S104 to obtain the first hash partition result includes: step S202, performing the first hash partition on the data set using mapping threads to obtain a first intermediate hash partition result, where a mapping thread performs a hash calculation on the number corresponding to each key-value pair, obtains a hash calculation result, and assigns key-value pairs with the same hash calculation result to the same partition, and there are one or more mapping threads; and step S204, optimizing the first intermediate hash partition result using the data skew optimization algorithm to obtain the first hash partition result.
Specifically, the mapping hash function of a mapping thread may be $f_m(\mathrm{Key}) = \mathrm{Key} \bmod 2^{\mathrm{HashValue}}$, where HashValue is a predefined positive-integer hash parameter with value range $[1, +\infty)$. When there are multiple mapping threads, each mapping thread applies the mapping hash function $f_m(\mathrm{Key})$ to the Key of every key-value pair (Key, Value) in the data set and assigns key-value pairs with the same result to the same partition. The first intermediate hash partition result may contain $t$ hash partitions in total, whose sizes are denoted $R_1, R_2, \ldots, R_j, \ldots, R_t$, where $t \ge 2$ and $R_1 \le R_2 \le \cdots \le R_j \le \cdots \le R_t$.
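The mapping step can be sketched as follows, under the assumption that the garbled formula reads Key mod 2^HashValue. The function names `map_hash` and `first_partition` and the sample keys are hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def map_hash(key, hash_value):
    """Mapping hash function, read here as f_m(Key) = Key mod 2**HashValue."""
    return key % (2 ** hash_value)


def first_partition(pairs, hash_value):
    """Group (key, value) pairs so that equal hash results share one partition."""
    partitions = defaultdict(list)
    for key, value in pairs:
        partitions[map_hash(key, hash_value)].append((key, value))
    return dict(partitions)


# A small skewed input: the key 4 occurs three times.
pairs = [(k, k * 10) for k in (0, 1, 4, 4, 4, 5, 8, 9)]
partitions = first_partition(pairs, hash_value=2)
sizes = sorted(len(p) for p in partitions.values())  # R_1 <= ... <= R_t
```

With this input there are t = 2 partitions of sizes 3 and 5, which already shows how repeated keys make the partitions uneven.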
Optionally, optimizing the first intermediate hash partition result using the data skew optimization algorithm in step S204 to obtain the first hash partition result includes: step S302, calculating the average partition size of the first intermediate hash partition result; and step S304, splitting, according to the average partition size, every partition in the first intermediate hash partition result whose size exceeds the average partition size.
Specifically, the first intermediate hash partition result contains $t$ hash partitions in total, and the first hash partition result may contain $a$ uniformly sized hash partitions in total. The average partition size of the first intermediate hash partition result may be calculated as follows:
First, calculate the cumulative sum of the sizes of the first $k$ partitions, ranked by partition size, among the $t$ hash partitions:
Second, calculate the average partition size of the $t$ hash partitions from that cumulative sum:
If a partition among the $t$ hash partitions satisfies $R_j \le R_m$, where $R_m$ is the average partition size, the partition is left unprocessed and may be placed in the queue awaiting the second hash partition. If $R_j > R_m$, the partition is split according to the average partition size $R_m$, and the pieces obtained by splitting are placed in the queue awaiting the second hash partition.
Through step S204 and steps S302 to S304 above, the uniformity of the first hash partition result can be improved, which makes the method more widely applicable.
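The splitting rule can be sketched as follows. The formula for the average partition size is not reproduced in this text, so the sketch ASSUMES the arithmetic mean of the partition sizes, rounded up; the function name `split_oversized` and the sample sizes are likewise hypothetical.

```python
import math


def split_oversized(partitions):
    """Split each partition larger than the average into pieces of at most that size.

    ASSUMPTION: the average is the arithmetic mean of the partition sizes,
    rounded up. The source's own formula is not available in the text.
    """
    average = math.ceil(sum(len(p) for p in partitions) / len(partitions))
    queue = []  # partitions waiting for the second hash partition
    for part in partitions:
        if len(part) <= average:
            queue.append(part)  # small enough: passed through unchanged
        else:
            for start in range(0, len(part), average):
                queue.append(part[start:start + average])
    return average, queue


skewed = [list(range(2)), list(range(3)), list(range(13))]
average, balanced = split_oversized(skewed)
```

Here the partition of 13 items is cut into pieces of 6, 6 and 1, so no queued partition is larger than the assumed average of 6 and the largest task a single thread receives is bounded.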
Optionally, there are multiple mapping threads and each mapping thread has an independent storage space into which it writes key-value pairs. Performing the first hash partition on the data set using the mapping threads in step S202 includes: step S402, monitoring the usage of each independent storage space and, when the usage exceeds a preset threshold, allocating storage space to the independent storage space whose usage exceeds the preset threshold.
Specifically, after the data set is read, it can be stored in a hash storage structure. The hash storage structure in this application can consist of one contiguous array in which each element represents a hash bucket, and each hash bucket stores the key-value pairs of one partition. Each hash bucket consists of a free pointer, a contiguous block of storage space and a next pointer: the free pointer points to the next free position in the contiguous storage space, the contiguous storage space stores the key-value pairs, and the next pointer points to a new hash bucket.
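A minimal sketch of such a bucket structure is given below. The class names `HashBucket` and `HashTable`, the bucket capacity and the use of a list index in place of a raw pointer are assumptions made for illustration; the patent does not specify them.

```python
class HashBucket:
    """One bucket: a free index, a fixed block of slots and a link to the next bucket."""

    def __init__(self, capacity):
        self.slots = [None] * capacity  # contiguous storage for key-value pairs
        self.free = 0                   # next free slot (the "free pointer")
        self.next = None                # overflow bucket (the "next pointer")

    def insert(self, pair):
        bucket = self
        while bucket.free == len(bucket.slots):  # this block is full
            if bucket.next is None:
                bucket.next = HashBucket(len(bucket.slots))
            bucket = bucket.next
        bucket.slots[bucket.free] = pair
        bucket.free += 1

    def __iter__(self):
        bucket = self
        while bucket is not None:
            yield from bucket.slots[:bucket.free]
            bucket = bucket.next


class HashTable:
    """A contiguous array of buckets, one bucket per partition."""

    def __init__(self, hash_value, capacity=4):
        self.hash_value = hash_value
        self.buckets = [HashBucket(capacity) for _ in range(2 ** hash_value)]

    def insert(self, key, value):
        self.buckets[key % (2 ** self.hash_value)].insert((key, value))


table = HashTable(hash_value=1, capacity=2)
for key in (0, 2, 4, 6, 1):
    table.insert(key, key * 10)
```

Chaining a new bucket when a block fills up avoids copying the pairs that were already written, at the price of one extra pointer hop per overflow bucket when the partition is read back.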
Specifically, given that each mapping thread has an independent storage space, the following monitoring strategy can be used to keep the mapping threads running in parallel and to avoid write conflicts. Each mapping thread writes key-value pairs, in parallel, only into its own independent storage space, which contains the corresponding partition areas. The independent storage spaces of all mapping threads are finally merged to obtain the first intermediate hash partition result. During this process, the workload of each mapping thread, or the usage of each independent storage space, can be monitored; when the usage exceeds a preset threshold, storage space can be allocated to the independent storage space whose usage exceeds the preset threshold, until all threads have finished.
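The write-conflict-free scheme can be sketched with one private buffer per worker and a merge at the end. This is an illustration under assumptions: the names `map_worker` and `parallel_first_partition`, the round-robin chunking and the worker count are hypothetical, and the usage-threshold monitoring described above is not modeled. In CPython the global interpreter lock also limits real parallel speed-up, so the sketch shows only the structure.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_worker(chunk, hash_value):
    """Partition one chunk into a buffer that only this worker writes to."""
    private = defaultdict(list)
    for key, value in chunk:
        private[key % (2 ** hash_value)].append((key, value))
    return private


def parallel_first_partition(pairs, hash_value, workers=4):
    """Run the mapping step in several threads, then merge their private buffers."""
    chunks = [pairs[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        buffers = list(pool.map(lambda c: map_worker(c, hash_value), chunks))
    merged = defaultdict(list)
    for private in buffers:  # merge step: runs after every worker has finished
        for index, items in private.items():
            merged[index].extend(items)
    return dict(merged)


pairs = [(k, k) for k in range(100)]
result = parallel_first_partition(pairs, hash_value=3)
```

Because no two workers ever write to the same buffer, no locks are needed during the mapping step; synchronization is confined to the final merge.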
Step S106: perform a second hash partition on the first hash partition result to obtain a second hash partition result.
Specifically, the first hash partition result contains $a$ uniformly sized hash partitions in total, and the partition calculation can be carried out by reduce threads, of which there may be one or more. The reduce hash function of a reduce thread may be $f_r(\mathrm{Key}) = \mathrm{Key} \bmod 2^{\mathrm{HashValue}+1}$. The $a$ partitions are handed to the reduce threads for partition calculation; that is, a reduce thread applies the reduce hash function to the Key of every key-value pair (Key, Value) in each partition and assigns key-value pairs with the same result to the same partition. Each partition thus yields $b$ partition results, where $b \ge 2$, giving $a \times b$ second-level hash partition results in total. The final output is therefore $a \times b$ partition results.
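The second pass can be sketched as follows, assuming the reduce hash function reads Key mod 2^(HashValue+1). The names `reduce_hash` and `second_partition` and the sample input are hypothetical.

```python
def reduce_hash(key, hash_value):
    """Reduce hash function, read here as f_r(Key) = Key mod 2**(HashValue + 1)."""
    return key % (2 ** (hash_value + 1))


def second_partition(first_result, hash_value):
    """Re-partition each first-pass partition by f_r and return all sub-partitions."""
    output = []
    for part in first_result:
        sub = {}
        for key, value in part:
            sub.setdefault(reduce_hash(key, hash_value), []).append((key, value))
        output.extend(sub[index] for index in sorted(sub))
    return output


# Two first-pass partitions produced with HashValue = 1 (Key mod 2).
first_result = [[(0, "a"), (2, "b"), (4, "c")], [(1, "d"), (3, "e")]]
final = second_partition(first_result, hash_value=1)
```

In this example a = 2 first-pass partitions each yield b = 2 sub-partitions, so the output holds a * b = 4 partitions. Under this reading, keys that agree modulo 2^HashValue can differ in only one further bit, which is why each partition splits into two.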
In the embodiments of the present invention, a data set including one or more data items is obtained; a first hash partition is performed on the data set using a data skew optimization algorithm to obtain a first hash partition result; and a second hash partition is performed on the first hash partition result to obtain a second hash partition result. This achieves the purpose of partitioning data efficiently by hash. The technical effects are that the influence of skewed data on hash partitioning is reduced, the workload handled by each partition thread is evened out, the short-board effect caused by data skew is alleviated, the partition return time is shortened and partition efficiency is improved, which in turn solves the prior-art technical problem of the short-board effect in hash partitioning caused by data skew.
In a specific embodiment, as shown in Fig. 2, after the data set is obtained, the first hash partition can be performed on it by the mapping threads to obtain $t$ intermediate partitions. Data skew optimization is applied to the $t$ intermediate partitions to obtain $a$ partitions, and hash partitioning by the reduce threads is applied to the $a$ partitions to finally obtain $a \times b$ partitions.
In a specific embodiment, the hash partition optimization method of the present invention can be applied in the aerospace field. After aerospace information data are obtained, the data are divided evenly and processed in parallel on the basis of multi-CPU, multi-core parallel computing, using mapping threads, reduce threads and the data skew optimization algorithm. This realizes multi-step hash partitioning, improves cache efficiency and raises the overall performance of multi-CPU, multi-core processors.
The application of the hash partition optimization method of the present invention in the aerospace field is further illustrated by the following simulation experiment:
1. Simulation conditions: the simulation was implemented in the C++ programming language on a Linux system.
2. Simulation content: in this experiment, the input aerospace information data set is 32M, with 32768 key-value pairs in total. The aerospace information data are skewed, with a Zipf skew value of 1.25, and are stored in a traditional hash storage structure. The number of mapping threads is 16. Several values of the hash function parameter HashValue are taken, and the partition efficiency with the data skew optimization method is compared with the efficiency without it. The results are shown in Fig. 3.
3. Simulation results: Fig. 3 shows that, when processing skewed aerospace information data, the data skew optimization method proposed by the present invention performs significantly better than partitioning without it. The reason is that the proposed data skew optimization method evens out partition sizes as far as possible, which keeps the workload of each parallel thread roughly equal, shortens waiting time and improves hash partition efficiency.
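How skewed keys translate into uneven partitions can be illustrated with a small sketch. It does not reproduce the experiment above or its results: the number of distinct keys, the sampling method, the seed and the function names `zipf_keys` and `partition_sizes` are all assumptions. Only the pair count of 32768 and the Zipf value of 1.25 are taken from the text.

```python
import random


def zipf_keys(n_pairs, n_distinct, exponent, seed=0):
    """Draw keys 1..n_distinct with probability proportional to rank**(-exponent)."""
    rng = random.Random(seed)
    ranks = range(1, n_distinct + 1)
    weights = [rank ** -exponent for rank in ranks]
    return rng.choices(ranks, weights=weights, k=n_pairs)


def partition_sizes(keys, hash_value):
    """Count how many keys fall into each of the 2**hash_value partitions."""
    sizes = [0] * (2 ** hash_value)
    for key in keys:
        sizes[key % (2 ** hash_value)] += 1
    return sizes


keys = zipf_keys(n_pairs=32768, n_distinct=1000, exponent=1.25)
sizes = partition_sizes(keys, hash_value=4)
mean_size = sum(sizes) / len(sizes)
print(f"largest partition is {max(sizes) / mean_size:.1f}x the mean size")
```

Under these assumptions the most frequent key alone accounts for roughly a quarter of all pairs, so the partition that receives it is several times larger than the mean. Whichever thread handles that partition finishes last, which is the short-board effect the splitting step is meant to alleviate.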
Embodiment 2
According to an embodiment of the present invention, a product embodiment of a hash partition optimization device is provided. Fig. 4 shows a hash partition optimization device according to an embodiment of the present invention. As shown in Fig. 4, the device includes an acquisition module, a first partition module and a second partition module. The acquisition module is used to obtain a data set, where the data set includes one or more data items. The first partition module is used to perform a first hash partition on the data set using a data skew optimization algorithm to obtain a first hash partition result. The second partition module is used to perform a second hash partition on the first hash partition result to obtain a second hash partition result.
In the embodiments of the present invention, the acquisition module obtains a data set including one or more data items; the first partition module performs a first hash partition on the data set using a data skew optimization algorithm to obtain a first hash partition result; and the second partition module performs a second hash partition on the first hash partition result to obtain a second hash partition result. This achieves the purpose of partitioning data efficiently by hash. The technical effects are that the influence of skewed data on hash partitioning is reduced, the workload handled by each partition thread is evened out, the short-board effect caused by data skew is alleviated, the partition return time is shortened and partition efficiency is improved, which in turn solves the prior-art technical problem of the short-board effect in hash partitioning caused by data skew.
It should be noted here that the acquisition module, the first partition module and the second partition module correspond to steps S102 to S106 in Embodiment 1. The examples and application scenarios realized by these modules are the same as those of the corresponding steps, but are not limited to what is disclosed in Embodiment 1. It should also be noted that, as part of the device, these modules can be executed in a computer system, for example as a set of computer-executable instructions.
Optionally, the data set is stored in the form of key-value pairs, and each key-value pair contains at least the number corresponding to that key-value pair.
Optionally, the first partition module includes a third partition module and an optimization module. The third partition module is used to perform the first hash partition on the data set using mapping threads to obtain a first intermediate hash partition result, where a mapping thread performs a hash calculation on the number corresponding to each key-value pair, obtains a hash calculation result, and assigns key-value pairs with the same hash calculation result to the same partition, and there are one or more mapping threads. The optimization module is used to optimize the first intermediate hash partition result using the data skew optimization algorithm to obtain the first hash partition result.
It should be noted here that the third partition module and the optimization module correspond to steps S202 to S204 in Embodiment 1. The examples and application scenarios realized by these modules are the same as those of the corresponding steps, but are not limited to what is disclosed in Embodiment 1. It should also be noted that, as part of the device, these modules can be executed in a computer system, for example as a set of computer-executable instructions.
Optionally, the optimization module includes a calculation module and a splitting module. The calculation module is used to calculate the average partition size of the first intermediate hash partition result. The splitting module is used to split, according to the average partition size, every partition in the first intermediate hash partition result whose size exceeds the average partition size.
It should be noted here that the calculation module and the splitting module correspond to steps S302 to S304 in Embodiment 1. The examples and application scenarios realized by these modules are the same as those of the corresponding steps, but are not limited to what is disclosed in Embodiment 1. It should also be noted that, as part of the device, these modules can be executed in a computer system, for example as a set of computer-executable instructions.
Optionally, there are multiple mapping threads and each mapping thread has an independent storage space into which it writes key-value pairs. The third partition module includes a monitoring module used to monitor the usage of each independent storage space and, when the usage exceeds a preset threshold, to allocate storage space to the independent storage space whose usage exceeds the preset threshold.
It should be noted here that the monitoring module corresponds to step S402 in Embodiment 1. The examples and application scenarios realized by this module are the same as those of the corresponding step, but are not limited to what is disclosed in Embodiment 1. It should also be noted that, as part of the device, this module can be executed in a computer system, for example as a set of computer-executable instructions.
Embodiment 3
According to an embodiment of the present invention, a product embodiment of a storage medium is provided. The storage medium includes a stored program, and when the program runs, the device on which the storage medium resides is controlled to execute the above hash partition optimization method.
Embodiment 4
According to an embodiment of the present invention, a product embodiment of a processor is provided. The processor is used to run a program, and the program executes the above hash partition optimization method when it runs.
Embodiment 5
According to an embodiment of the present invention, a product embodiment of a terminal is provided. The terminal includes an acquisition module, a first partition module, a second partition module and a processor. The acquisition module is used to obtain a data set, where the data set includes one or more data items. The first partition module is used to perform a first hash partition on the data set using a data skew optimization algorithm to obtain a first hash partition result. The second partition module is used to perform a second hash partition on the first hash partition result to obtain a second hash partition result. The processor runs a program, and the program, when running, executes the above hash partition optimization method on the data output by the acquisition module, the first partition module and the second partition module.
Embodiment 6
According to an embodiment of the present invention, a product embodiment of a terminal is provided. The terminal includes an acquisition module, a first partition module, a second partition module and a storage medium. The acquisition module is used to obtain a data set, where the data set includes one or more data items. The first partition module is used to perform a first hash partition on the data set using a data skew optimization algorithm to obtain a first hash partition result. The second partition module is used to perform a second hash partition on the first hash partition result to obtain a second hash partition result. The storage medium is used to store a program, and the program, when running, executes the above hash partition optimization method on the data output by the acquisition module, the first partition module and the second partition module.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed technical content may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units may be a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or take other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or as a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make improvements and refinements without departing from the principles of the present invention, and such improvements and refinements shall also be regarded as falling within the protection scope of the present invention.
Claims (10)
- 1. A hash partition optimization method, characterized by comprising: obtaining a data set, wherein the data set includes one or more data items; performing a first hash partition on the data set using a data skew optimization algorithm to obtain a first hash partition result; and performing a second hash partition on the first hash partition result to obtain a second hash partition result.
- 2. The method according to claim 1, characterized in that the data set is stored in the form of key-value pairs, and each key-value pair includes at least a number corresponding to that key-value pair.
- 3. The method according to claim 2, characterized in that performing the first hash partition on the data set using the data skew optimization algorithm to obtain the first hash partition result comprises: performing a first hash partition on the data set using mapping threads to obtain a first intermediate hash partition result, wherein each mapping thread performs a hash calculation on the number corresponding to a key-value pair to obtain a hash calculation result and assigns key-value pairs having the same hash calculation result to the same partition, and there are one or more mapping threads; and optimizing the first intermediate hash partition result using the data skew optimization algorithm to obtain the first hash partition result.
- 4. The method according to claim 3, characterized in that optimizing the first intermediate hash partition result using the data skew optimization algorithm to obtain the first hash partition result comprises: calculating the average partition size of the first intermediate hash partition result; and splitting, according to the average partition size, each partition in the first intermediate hash partition result whose size is greater than the average partition size.
- 5. The method according to claim 3 or 4, characterized in that there are multiple mapping threads, each mapping thread has an independent storage space into which the key-value pairs are written, and performing the first hash partition on the data set using the mapping threads comprises: monitoring the usage of each independent storage space, and, when the usage exceeds a preset threshold, allocating additional storage space to the independent storage space whose usage exceeds the preset threshold.
- 6. A hash partition optimization device, characterized by comprising: an acquisition module, configured to obtain a data set, wherein the data set includes one or more data items; a first partition module, configured to perform a first hash partition on the data set using a data skew optimization algorithm to obtain a first hash partition result; and a second partition module, configured to perform a second hash partition on the first hash partition result to obtain a second hash partition result.
- 7. A storage medium, characterized in that the storage medium includes a stored program, wherein, when the program runs, a device on which the storage medium is located is controlled to perform the hash partition optimization method according to any one of claims 1 to 5.
- 8. A processor, characterized in that the processor is configured to run a program, wherein the program, when run, performs the hash partition optimization method according to any one of claims 1 to 5.
- 9. A terminal, characterized by comprising: an acquisition module, configured to obtain a data set, wherein the data set includes one or more data items; a first partition module, configured to perform a first hash partition on the data set using a data skew optimization algorithm to obtain a first hash partition result; a second partition module, configured to perform a second hash partition on the first hash partition result to obtain a second hash partition result; and a processor that runs a program, wherein the program, when run, performs the hash partition optimization method according to any one of claims 1 to 5 on the data output by the acquisition module, the first partition module, and the second partition module.
- 10. A terminal, characterized by comprising: an acquisition module, configured to obtain a data set, wherein the data set includes one or more data items; a first partition module, configured to perform a first hash partition on the data set using a data skew optimization algorithm to obtain a first hash partition result; a second partition module, configured to perform a second hash partition on the first hash partition result to obtain a second hash partition result; and a storage medium configured to store a program, wherein the program, when run, performs the hash partition optimization method according to any one of claims 1 to 5 on the data output by the acquisition module, the first partition module, and the second partition module.
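The skew handling recited in claims 4 and 5 can be illustrated with a short sketch. The names `split_oversized` and `MapperBuffer`, the rounding of the average up to a whole number, and the doubling growth policy are assumptions made for illustration; the claims only state that oversized partitions are split according to the average partition size and that a buffer is given more storage once its usage exceeds a preset threshold.

```python
from typing import Any, List, Sequence


def split_oversized(partitions: Sequence[List[Any]]) -> List[List[Any]]:
    """Split every partition larger than the average into average-sized chunks."""
    non_empty = [p for p in partitions if p]
    if not non_empty:
        return []
    # Average partition size, rounded up (an assumption) so chunks are never empty.
    total = sum(len(p) for p in non_empty)
    average = -(-total // len(non_empty))
    result: List[List[Any]] = []
    for part in non_empty:
        if len(part) > average:
            # Oversized partition: cut it into pieces of at most `average` items.
            result.extend(part[i:i + average] for i in range(0, len(part), average))
        else:
            result.append(part)
    return result


class MapperBuffer:
    """Hypothetical independent storage space of one mapping thread."""

    def __init__(self, capacity: int, threshold: float = 0.8, growth: int = 2):
        self.capacity = capacity      # number of pairs the buffer may hold
        self.threshold = threshold    # preset usage threshold (fraction of capacity)
        self.growth = growth          # assumed growth factor when the threshold is exceeded
        self.items: List[Any] = []

    def usage(self) -> float:
        """Fraction of the buffer currently in use."""
        return len(self.items) / self.capacity

    def write(self, pair: Any) -> None:
        """Write a key-value pair, then grow the buffer if usage exceeds the threshold."""
        self.items.append(pair)
        if self.usage() > self.threshold:
            self.capacity *= self.growth
```

Splitting at the average keeps the largest partition from dominating the running time of the second pass, which is the short-board effect the abstract refers to. Growing a buffer before it is full avoids stalling a mapping thread on a skewed key range.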
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710656815.9A CN107633001A (en) | 2017-08-03 | 2017-08-03 | Hash partition optimization method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710656815.9A CN107633001A (en) | 2017-08-03 | 2017-08-03 | Hash partition optimization method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107633001A true CN107633001A (en) | 2018-01-26 |
Family
ID=61099515
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710656815.9A Pending CN107633001A (en) | Hash partition optimization method and device | 2017-08-03 | 2017-08-03 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107633001A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104133661A (en) * | 2014-07-30 | 2014-11-05 | 西安电子科技大学 | Multi-core parallel hash partitioning optimizing method based on column storage |
US20150234846A1 (en) * | 2014-02-17 | 2015-08-20 | Netapp, Inc. | Partitioning file system namespace |
CN105183880A (en) * | 2015-09-22 | 2015-12-23 | 浪潮集团有限公司 | Hash join method and device |
CN106156159A (en) * | 2015-04-16 | 2016-11-23 | 阿里巴巴集团控股有限公司 | A kind of table connection processing method, device and cloud computing system |
-
2017
- 2017-08-03 CN CN201710656815.9A patent/CN107633001A/en active Pending
Non-Patent Citations (2)
Title |
---|
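Yuan Tong (袁通) et al.: "Hash partition optimization based on MapReduce on multi-core processors" (多核处理器中基于MapReduce的哈希划分优化), Journal of Xi'an Jiaotong University (《西安交通大学学报》) * |
Zhao Yulan (赵宇兰): "Optimization algorithm for skewed two-table joins based on MapReduce" (基于MapReduce的两表数据倾斜连接的优化算法), Journal of Jilin University (Science Edition) (《吉林大学学报(理学版)》) * |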
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109492657A (en) * | 2018-09-18 | 2019-03-19 | 平安科技(深圳)有限公司 | Handwriting sample digitization method, device, computer equipment and storage medium |
CN111694693A (en) * | 2019-03-12 | 2020-09-22 | 上海晶赞融宣科技有限公司 | Data stream storage method and device and computer storage medium |
CN110532425A (en) * | 2019-08-19 | 2019-12-03 | 深圳市网心科技有限公司 | Video data distributed storage method, device, computer equipment and storage medium |
CN110532425B (en) * | 2019-08-19 | 2022-04-01 | 深圳市网心科技有限公司 | Video data distributed storage method and device, computer equipment and storage medium |
CN112286917A (en) * | 2020-10-22 | 2021-01-29 | 北京锐安科技有限公司 | Data processing method and device, electronic equipment and storage medium |
CN113516506A (en) * | 2021-06-10 | 2021-10-19 | 深圳市云网万店科技有限公司 | Data processing method and device and electronic equipment |
CN113516506B (en) * | 2021-06-10 | 2024-04-26 | 深圳市云网万店科技有限公司 | Data processing method and device and electronic equipment |
CN116467354A (en) * | 2023-06-15 | 2023-07-21 | 本原数据(北京)信息技术有限公司 | Database query method and device, computer equipment and storage medium |
CN116467354B (en) * | 2023-06-15 | 2023-09-12 | 本原数据(北京)信息技术有限公司 | Database query method and device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180126 |