CN105740332A - Data sorting method and device - Google Patents

Data sorting method and device

Info

Publication number
CN105740332A
Authority
CN
China
Prior art keywords
data
subinterval
point value
value
data identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610045738.9A
Other languages
Chinese (zh)
Inventor
魏国建
王春明
周涛
韦永剑
叶华
张思进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201610045738.9A
Publication of CN105740332A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90348 Query processing by searching ordered data, e.g. alpha-numerically ordered data

Abstract

The invention discloses a data sorting method and device. A specific embodiment of the method comprises the following steps: obtaining data to be sorted and data identifiers of the data to be sorted; carrying out distribution operations, that is, determining a maximum identifier value and a minimum identifier value of the data identifiers, dividing an interval between a right endpoint value and a left endpoint value, which respectively serve as the maximum value and the minimum value, into a plurality of subintervals, and generating a plurality of data sets to be sorted, each of which corresponds to one of the subintervals; and carrying out sorting operations. The data sorting method and device provided by the invention have the advantages that the data to be sorted can be uniformly distributed during the sorting process, and differences among the identifier values of the data identifiers of the data to be sorted in each generated data set to be sorted are small, so that when the data to be sorted in each data set to be sorted is sorted according to the data identifier values, the expenses can be approximately equal, and the sorting efficiency of a system can be further improved.

Description

Data sorting method and device
Technical field
The present application relates to the field of computers, specifically to the field of big data technology, and in particular to a data sorting method and device.
Background
The Map-Reduce model of the distributed computing framework Hadoop is widely used in big data processing. When data is processed with the Map-Reduce model, Map tasks distribute the data to different Reduce tasks, and the data is then sorted by the identifier values of its data identifiers so that the processed data is globally ordered. At present, the commonly adopted distribution approach is to sample the identifier values of the data identifiers, predict the distribution of the identifier values over the whole data set, and then distribute the data to the different Reduce tasks according to that predicted distribution.
However, this approach has the following problems: 1) when only part of the data is sampled, the identifier values of the sampled data may be uncorrelated with those of the data that was not sampled, so the overall distribution of identifier values cannot be predicted accurately, the data cannot be distributed evenly to the Reduce tasks, and the sorting efficiency of the system decreases; 2) when all of the data is sampled, the system overhead increases sharply, which also reduces the sorting efficiency of the system.
Summary of the invention
The present application provides a data sorting method and device to solve the technical problems described in the Background section above.
In a first aspect, the present application provides a data sorting method, the method comprising: obtaining data to be sorted and the data identifiers of the data to be sorted; performing a distribution operation: determining the maximum and minimum of the identifier values of the data identifiers; dividing the interval whose right endpoint value and left endpoint value are the maximum and the minimum, respectively, into a plurality of subintervals, wherein each subinterval satisfies the following condition: its left endpoint value is the right endpoint value of the preceding subinterval, and its right endpoint value is the left endpoint value of the following subinterval; determining the subinterval to which the identifier value of the data identifier of each piece of data to be sorted belongs; and generating a plurality of data sets to be sorted, each corresponding to one subinterval; and performing a sorting operation: sorting the data to be sorted in each data set to be sorted by the identifier values of the data identifiers.
In a second aspect, the present application provides a data sorting device, the device comprising: an acquisition unit configured to obtain data to be sorted and the data identifiers of the data to be sorted; a distribution unit configured to perform a distribution operation: determining the maximum and minimum of the identifier values of the data identifiers; dividing the interval whose right endpoint value and left endpoint value are the maximum and the minimum, respectively, into a plurality of subintervals, wherein each subinterval satisfies the following condition: its left endpoint value is the right endpoint value of the preceding subinterval, and its right endpoint value is the left endpoint value of the following subinterval; determining the subinterval to which the identifier value of the data identifier of each piece of data to be sorted belongs; and generating a plurality of data sets to be sorted, each corresponding to one subinterval; and a sorting unit configured to perform a sorting operation: sorting the data to be sorted in each data set to be sorted by the identifier values of the data identifiers.
With the data sorting method and device provided by the present application, data to be sorted and its data identifiers are obtained; a distribution operation is performed: the maximum and minimum of the identifier values of the data identifiers are determined, the interval whose right endpoint value and left endpoint value are the maximum and the minimum, respectively, is divided into a plurality of subintervals, and a plurality of data sets to be sorted are generated, each corresponding to one subinterval; and a sorting operation is performed: the data to be sorted in each data set to be sorted is sorted by the identifier values of the data identifiers. As a result, the data to be sorted is distributed evenly during sorting, and the differences between the identifier values within each generated data set to be sorted are small, so that the overhead of sorting the data in each data set by identifier value is approximately equal, which improves the sorting efficiency of the system.
Brief description of the drawings
Other features, objects, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings:
Fig. 1 is a diagram of an exemplary system architecture to which the present application can be applied;
Fig. 2 is a flow chart of one embodiment of the data sorting method according to the present application;
Fig. 3 is a schematic diagram of generating a plurality of data sets to be sorted;
Fig. 4 is an exemplary architecture diagram applicable to the data sorting method of the present application;
Fig. 5 is a schematic structural diagram of one embodiment of the data sorting device according to the present application;
Fig. 6 is a schematic structural diagram of a computer system suitable for implementing a terminal device or server of the embodiments of the present application.
Detailed description
The present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the relevant invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the invention.
It should be noted that, where no conflict arises, the embodiments of the present application and the features of the embodiments may be combined with one another. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the data sorting method or data sorting device of the present application can be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is the medium that provides transmission links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless transmission links or fiber optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104 in order to receive or send messages. Various communication applications, such as browser applications and search applications, may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be any of various electronic devices that have a display screen and support network communication, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, and desktop computers.
The server 105 may obtain, from the browser applications on the terminal devices 101, 102, 103, data associated with the user's network behavior (for example, cookies) as the data to be sorted, together with the data identifiers of the data to be sorted, and may then sort the data to be sorted by the identifier values of those data identifiers.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. Any number of terminal devices, networks, and servers may be provided as the implementation requires.
Referring to Fig. 2, a flow 200 of one embodiment of the data sorting method according to the present application is shown. It should be noted that the data sorting method provided by the embodiments of the present application is generally executed by the server 105 in Fig. 1, and accordingly the data sorting device is generally provided in the server 105. The method includes the following steps:
Step 201: obtain data to be sorted and the data identifiers of the data to be sorted.
In this embodiment, the data to be sorted may be organized as individual records. For example, the data to be sorted may be cookies that record information associated with a user's network behavior. One cookie may contain the URL of a web page the user browsed and the browsing time. Accordingly, each cookie may correspond to one data identifier, namely the data identifier (key) of the data to be sorted, and the identifier value of this data identifier may be of integer (int) type.
Step 202: perform the distribution operation.
In this embodiment, the distribution operation divides the data to be sorted into a plurality of data sets to be sorted. The distribution operation includes: determining the maximum and minimum of the identifier values of the data identifiers of the data to be sorted; dividing the interval whose right endpoint value and left endpoint value are the maximum and the minimum, respectively, into a plurality of subintervals, wherein each subinterval satisfies the following condition: its left endpoint value is the right endpoint value of the preceding subinterval, and its right endpoint value is the left endpoint value of the following subinterval; determining the subinterval to which the identifier value of the data identifier of each piece of data to be sorted belongs; and generating a plurality of data sets to be sorted, each corresponding to one subinterval.
In some optional implementations of this embodiment, the method further includes: performing the distribution operation with Map tasks of the Map-Reduce model of the distributed computing framework Hadoop, and performing the sorting operation with Reduce tasks of the Map-Reduce model.
In this embodiment, the maximum and minimum identifier values may first be found among the data identifiers of all the data to be sorted. Once they are found, an interval can be determined whose left endpoint value is the minimum and whose right endpoint value is the maximum. This interval may then be divided into a plurality of consecutive subintervals, that is, the left endpoint value of each subinterval is the right endpoint value of the preceding subinterval, and its right endpoint value is the left endpoint value of the following subinterval. After the consecutive subintervals have been marked off, the subinterval to which the identifier value of the data identifier of each piece of data to be sorted belongs can be determined, so that the data to be sorted belonging to the same subinterval forms one data set to be sorted.
In some optional implementations of this embodiment, dividing the interval whose right endpoint value and left endpoint value are the maximum and the minimum, respectively, into a plurality of subintervals includes calculating the right endpoint value of each subinterval with the following formulas: Nmaxkey=Minkey+Average*N; Average=(Maxkey-Minkey)/Rnumber; where Nmaxkey denotes the right endpoint value of the N-th subinterval, Minkey denotes the minimum identifier value of the data identifiers, Average denotes the average value, Maxkey denotes the maximum identifier value of the data identifiers, and Rnumber denotes the number of Reduce tasks.
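The two formulas can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function name `right_endpoints` is hypothetical, and pinning the last endpoint to Maxkey is an added assumption to avoid floating-point drift.

```python
from typing import List


def right_endpoints(min_key: int, max_key: int, r_number: int) -> List[float]:
    """Return the right endpoint value of each of the r_number subintervals."""
    if r_number <= 0:
        raise ValueError("r_number must be positive")
    if max_key < min_key:
        raise ValueError("max_key must not be smaller than min_key")
    # Average = (Maxkey - Minkey) / Rnumber
    average = (max_key - min_key) / r_number
    # Nmaxkey = Minkey + Average * N, for N = 1 .. Rnumber
    endpoints = [min_key + average * n for n in range(1, r_number + 1)]
    # Assumption: pin the last endpoint to Maxkey so rounding cannot move it.
    endpoints[-1] = max_key
    return endpoints


print(right_endpoints(0, 100, 4))  # [25.0, 50.0, 75.0, 100]
```

With Minkey = 0, Maxkey = 100, and four Reduce tasks, Average is 25, so the four subintervals end at 25, 50, 75, and 100.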
Taking the Map-Reduce model as an example, the division of the interval whose right endpoint value and left endpoint value are the maximum and the minimum, respectively, into a plurality of subintervals is described below. In this embodiment, the number of subintervals may be determined from the number of Reduce tasks. Based on the subintervals, the Map tasks generate the data sets to be sorted and then send them to the Reduce tasks, which completes the distribution of the data to be sorted.
Referring to Fig. 3, a schematic diagram of generating a plurality of data sets to be sorted is shown.
First, the maximum (Maxkey) and minimum (Minkey) identifier values are determined among the data identifiers of all the data to be sorted, which determines an interval whose left endpoint value is Minkey and whose right endpoint value is Maxkey.
Then, the value of the right endpoint of each subinterval (also called the interval scale) is calculated, so that the interval from Minkey to Maxkey can be divided into a plurality of subintervals. The interval scale of each subinterval may be calculated with the following formulas:
Each_dregion=(Maxkey-Minkey)/Rnumber
Dregion_0=Minkey
Dregion_1=Minkey+Each_dregion*1
Dregion_2=Minkey+Each_dregion*2
……
Dregion_N-2=Minkey+Each_dregion*(N-2)
Dregion_N-1=Minkey+Each_dregion*(N-1)
Dregion_N=Maxkey
Here, Rnumber denotes the number of subintervals, which may be the same as the number of Reduce tasks. Each_dregion denotes the average value, that is, the span of identifier values covered by each subinterval. Dregion_0 denotes the initial value of the interval scale, which may be Minkey. Dregion_N-1 denotes the interval scale of the (N-1)-th subinterval, and Dregion_N denotes the interval scale of the N-th subinterval, that is, the maximum identifier value Maxkey among the data identifiers of all the data to be sorted.
In this embodiment, after the interval from Minkey to Maxkey has been divided into a plurality of subintervals, the data to be sorted can be distributed based on the interval scales of the subintervals (for example, by Map tasks). That is, the subinterval to which each piece of data to be sorted belongs is determined, so that the data to be sorted belonging to the same subinterval forms one data set to be sorted (also called a file to be sorted), for example FileN shown in Fig. 3. Once the files to be sorted have been sent to the Reduce tasks, the distribution of the data to be sorted is complete.
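The assignment of records to subintervals can be sketched in Python as below. The sketch assumes records are (key, value) pairs with numeric keys. Because adjacent subintervals share an endpoint and the text does not say which one owns it, the sketch assumes a key equal to an interior scale goes to the later subinterval. The names `interval_scales` and `distribute` are hypothetical.

```python
from bisect import bisect_right


def interval_scales(min_key, max_key, r_number):
    """Return [Dregion_0, ..., Dregion_N], with N equal to r_number."""
    each_dregion = (max_key - min_key) / r_number
    scales = [min_key + each_dregion * i for i in range(r_number)]
    scales.append(max_key)  # Dregion_N = Maxkey
    return scales


def distribute(records, min_key, max_key, r_number):
    """Group (key, value) records into one list per subinterval."""
    scales = interval_scales(min_key, max_key, r_number)
    interior = scales[1:-1]  # Dregion_1 .. Dregion_N-1 separate the subintervals
    files = [[] for _ in range(r_number)]
    for key, value in records:
        # Assumption: a key equal to an interior scale goes to the later subinterval.
        files[bisect_right(interior, key)].append((key, value))
    return files


records = [(5, "a"), (99, "b"), (50, "c"), (0, "d"), (100, "e"), (26, "f")]
for index, file_n in enumerate(distribute(records, 0, 100, 4)):
    print(index, file_n)
```

Each returned list plays the role of one file to be sorted. The records inside a list are still unordered at this point, and ordering them is left to the sorting operation.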
In this embodiment, because the identifier values of the data identifiers of the data in each data set to be sorted all fall in the same subinterval, the differences between those identifier values are small, so the overhead of sorting the data in each data set by identifier value is approximately equal.
In some optional implementations of this embodiment, the maximum and minimum identifier values of the data identifiers of the data to be sorted are set in advance.
In this embodiment, the maximum Maxkey and minimum Minkey of the identifier values of the data identifiers of the data to be sorted may be set in advance, so that during sorting the time complexity of finding Maxkey and Minkey is O(1), which further improves the sorting efficiency of the system.
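The difference between a preset range and a scanned range can be sketched in Python as follows. The function name `key_range` and its `preset` parameter are hypothetical, and records are assumed to be (key, value) pairs.

```python
def key_range(records, preset=None):
    """Return (Minkey, Maxkey) for a list of (key, value) records."""
    if preset is not None:
        # Preset bounds: no pass over the data is needed, so the lookup is O(1).
        return preset
    # Without preset bounds, every key must be inspected once, which is O(n).
    keys = [key for key, _ in records]
    return min(keys), max(keys)


records = [(42, "a"), (7, "b"), (93, "c")]
print(key_range(records))                   # scans the keys
print(key_range(records, preset=(0, 100)))  # uses the preset bounds
```

A preset range may be wider than the range actually present in the data, in which case some subintervals can stay empty.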
Step 203: perform the sorting operation.
In this embodiment, the sorting operation includes sorting the data to be sorted in each data set to be sorted by the identifier values of the data identifiers. After the data in each data set has been sorted, the overall sorting of all the data to be sorted can then be completed. Take two data sets to be sorted, a first set and a second set, as an example. If, after each of the two sets has been sorted, the maximum identifier value in the first set is less than the minimum identifier value in the second set, it can be determined that the first set comes before the second set. The position of each sorted set is determined in this way, which completes the overall sorting of all the data to be sorted.
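The step from locally sorted sets to an overall order can be sketched in Python as below. The sketch assumes the sets arrive in subinterval order and hold (key, value) records. The function name `global_sort` is hypothetical. Tolerating equal keys at a shared boundary is an assumption, since the text only covers the strictly smaller case.

```python
def global_sort(sets_in_interval_order):
    """Sort each set locally, then concatenate the sets in subinterval order."""
    ordered = []
    for records in sets_in_interval_order:
        local = sorted(records, key=lambda record: record[0])
        # The largest key so far must not exceed the smallest key of the next set.
        if ordered and local and ordered[-1][0] > local[0][0]:
            raise ValueError("sets overlap, so concatenation would not be ordered")
        ordered.extend(local)
    return ordered


sets = [[(5, "a"), (0, "d")], [(26, "f")], [], [(99, "b"), (100, "e"), (75, "g")]]
print(global_sort(sets))
```

No merge step across sets is needed, because the subintervals already fix the relative position of every set.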
Referring to Fig. 4, an exemplary architecture diagram applicable to the data sorting method of the present application is shown.
Fig. 4 shows a PartA part and a PartB part. PartA supplies PartB with massive unordered data, that is, the data to be sorted, in which the identifier value of the data identifier of each record may be of int type. PartA may also obtain the maximum and minimum identifier values of all the data to be sorted in advance, for example by traversing all the data to be sorted, and then provide them to PartB. PartB sorts the massive data to be sorted based on the maximum and minimum identifier values. PartB contains multiple Map tasks and Reduce tasks of the Map-Reduce model of the distributed computing framework Hadoop and carries out the global sort.
The Map tasks perform the distribution operation: they determine the subinterval to which each piece of data to be sorted belongs, so that the data belonging to the same subinterval forms one data set to be sorted (also called a file to be sorted). Once the files to be sorted have been sent to the Reduce tasks, the distribution of the data to be sorted is complete.
Each Reduce task may sort the data it receives by the identifier values of the data identifiers, that is, sort the data belonging to one subinterval, and thereby produce a locally ordered sorted file (also called a locally ordered small file). A Reduce task may also process the received data further, producing a processed, locally ordered sorted file (also called a Reduce output file). In this embodiment, each locally ordered sorted file corresponds to one subinterval, and the largest identifier value that the data in a locally ordered sorted file can take is already determined, namely the right endpoint value of the corresponding subinterval. The order among the locally ordered sorted files can therefore also be determined, and a globally ordered sorted file can be formed according to the order of those largest possible identifier values.
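Ordering the Reduce outputs by the right endpoint of their subintervals can be sketched in Python as below. The sketch simulates Reduce tasks with plain functions and does not use the Hadoop API. The names `reduce_task` and `merge_outputs` are hypothetical, and each output is assumed to be tagged with its subinterval's right endpoint value.

```python
def reduce_task(right_endpoint, records):
    """Sort one subinterval's (key, value) records and tag the result."""
    return right_endpoint, sorted(records, key=lambda record: record[0])


def merge_outputs(outputs):
    """Concatenate locally ordered outputs in order of their right endpoints."""
    merged = []
    for _, local in sorted(outputs, key=lambda output: output[0]):
        merged.extend(local)
    return merged


# The outputs may finish in any order; the endpoint tag restores the global order.
outputs = [
    reduce_task(100, [(99, "b"), (80, "x")]),
    reduce_task(50, [(30, "c"), (10, "a")]),
]
print(merge_outputs(outputs))
```

Because the right endpoints are known before any Reduce task runs, the position of each output file is fixed in advance and does not depend on the data inside it.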
In some optional implementations of this embodiment, the method further includes: performing the distribution operation and the sorting operation in the Hadoop Streaming working mode.
In this embodiment, the partitioner data distribution interface of Hadoop may be defined in the Hadoop Streaming working mode, so that the distribution operation and the sorting operation can be implemented in any of several programming languages and their code can run in Hadoop.
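Hadoop Streaming passes records to a mapper program as text lines on standard input and reads its output lines from standard output. Under that model, the mapper side of the distribution operation could be sketched in Python as below. Several parts are assumptions: the function name `map_lines`, the tab-separated key and value layout of the input lines, and the choice to emit the subinterval index as a leading field. How that index is routed to a specific Reduce task depends on the job's partitioner configuration, which is not shown here.

```python
from typing import Iterable, Iterator


def map_lines(lines: Iterable[str], min_key: int, max_key: int,
              r_number: int) -> Iterator[str]:
    """Prefix each 'key<TAB>value' line with the index of its subinterval."""
    span = max_key - min_key
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key_text, _, value = line.partition("\t")
        key = int(key_text)
        # Integer form of the equal-width division; Maxkey is clamped into the
        # last subinterval.
        index = 0 if span == 0 else min((key - min_key) * r_number // span,
                                        r_number - 1)
        yield f"{index}\t{key}\t{value}"


# In a streaming job, the first argument would be sys.stdin instead of a list.
for out in map_lines(["5\ta\n", "100\te\n", "50\tc\n"], 0, 100, 4):
    print(out)
```

The integer arithmetic avoids floating-point comparisons for int keys and places a key that sits exactly on an interior boundary in the later subinterval.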
Taking the Map-Reduce model as an example, the difference between the data sorting approach of this embodiment and that of the prior art is described below. In the prior art, when Map tasks distribute data to be sorted to Reduce tasks according to the predicted distribution of identifier values, the differences between identifier values may be small in some intervals and large in others. As a result, when the data in an interval with small differences has finished sorting, it must wait for the data in an interval with large differences to finish; the thread that sorts the data in the former interval is suspended, which reduces the sorting efficiency of the whole system.
In this embodiment, by contrast, the identifier values of all the data that the Map tasks distribute to the same Reduce task fall in the same subinterval, so the differences between those identifier values are small. When each Reduce task sorts the data it receives by identifier value, the overhead is approximately equal, which improves the sorting efficiency of the whole system.
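One way to inspect how evenly a given key set spreads over the Reduce tasks is to count the records that fall in each subinterval. The Python sketch below does this for keys spread evenly over the interval. The function name `set_sizes` is hypothetical, and the boundary handling (a key on an interior boundary goes to the later subinterval) is an assumption.

```python
from collections import Counter


def set_sizes(keys, min_key, max_key, r_number):
    """Count how many keys fall into each of the r_number subintervals."""
    span = max_key - min_key
    counts = Counter()
    for key in keys:
        index = 0 if span == 0 else min((key - min_key) * r_number // span,
                                        r_number - 1)
        counts[index] += 1
    return [counts[i] for i in range(r_number)]


# Keys 0..999 spread evenly over [0, 999], divided among four Reduce tasks.
print(set_sizes(range(1000), 0, 999, 4))  # [250, 250, 250, 250]
```

Running the same count on real identifier values shows how many records each Reduce task would receive for that data.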
With further reference to Fig. 5, as an implementation of the method shown in the figures above, the present application provides one embodiment of a data sorting device. This device embodiment corresponds to the data sorting method embodiment shown in Fig. 2, and the device can be applied in various electronic devices.
As shown in Fig. 5, the data sorting device 500 of this embodiment includes an acquisition unit 501, a distribution unit 502, and a sorting unit 503. The acquisition unit 501 is configured to obtain data to be sorted and the data identifiers of the data to be sorted. The distribution unit 502 is configured to perform the distribution operation: determining the maximum and minimum of the identifier values of the data identifiers; dividing the interval whose right endpoint value and left endpoint value are the maximum and the minimum, respectively, into a plurality of subintervals, wherein each subinterval satisfies the following condition: its left endpoint value is the right endpoint value of the preceding subinterval, and its right endpoint value is the left endpoint value of the following subinterval; determining the subinterval to which the identifier value of the data identifier of each piece of data to be sorted belongs; and generating a plurality of data sets to be sorted, each corresponding to one subinterval. The sorting unit 503 is configured to perform the sorting operation: sorting the data to be sorted in each data set to be sorted by the identifier values of the data identifiers.
In this embodiment, the acquisition unit 501 may obtain data to be sorted and the data identifiers of the data to be sorted. The data to be sorted may be organized as individual records. For example, the data to be sorted may be cookies that record information associated with a user's network behavior. One cookie may contain the URL of a web page the user browsed and the browsing time. Accordingly, each cookie may correspond to one data identifier.
In this embodiment, the distribution unit 502 may first find the maximum and minimum identifier values among the data identifiers of all the data to be sorted. Once they are found, an interval can be determined whose left endpoint value is the minimum and whose right endpoint value is the maximum. This interval may then be divided into a plurality of consecutive subintervals, that is, the left endpoint value of each subinterval is the right endpoint value of the preceding subinterval, and its right endpoint value is the left endpoint value of the following subinterval. After the consecutive subintervals have been marked off, the subinterval to which the identifier value of the data identifier of each piece of data to be sorted belongs can be determined, so that the data to be sorted belonging to the same subinterval forms one data set to be sorted.
In this embodiment, the sorting unit 503 may sort the data to be sorted in each data set to be sorted by the identifier values of the data identifiers. After the data in each data set has been sorted, the overall sorting of all the data to be sorted can then be completed.
In some optional implementations of this embodiment, the device 500 further includes: a distribution operation execution unit (not shown) configured to perform the distribution operation with Map tasks of the Map-Reduce model of the distributed computing framework Hadoop; and a sorting operation execution unit (not shown) configured to perform the sorting operation with Reduce tasks of the Map-Reduce model.
In some optional implementations of this embodiment, the distribution unit 502 includes a computation subunit (not shown) configured to calculate the right endpoint value of each subinterval with the following formulas: Nmaxkey=Minkey+Average*N; Average=(Maxkey-Minkey)/Rnumber; where Nmaxkey denotes the right endpoint value of the N-th subinterval, Minkey denotes the minimum identifier value of the data identifiers, Average denotes the average value, Maxkey denotes the maximum identifier value of the data identifiers, and Rnumber denotes the number of Reduce tasks.
In some optional implementations of this embodiment, the device 500 further includes a setting unit (not shown) configured to set in advance the maximum and minimum identifier values of the data identifiers of the data to be sorted.
In some optional implementations of this embodiment, the device 500 further includes an execution unit (not shown) configured to perform the distribution operation and the sorting operation in the Hadoop Streaming working mode.
Those skilled in the art will understand that the data sorting device 500 also includes other well-known structures, such as a processor and a memory. In order not to obscure the embodiments of the present disclosure unnecessarily, these well-known structures are not shown in Fig. 5.
Fig. 6 illustrates the structural representation being suitable to the computer system for the terminal unit or server realizing the embodiment of the present application.
As shown in Figure 6, computer system 600 includes CPU (CPU) 601, its can according to the program being stored in read only memory (ROM) 602 or from storage part 608 be loaded into the program random access storage device (RAM) 603 and perform various suitable action and process.In RAM603, also storage has system 600 to operate required various programs and data.CPU601, ROM602 and RAM603 are connected with each other by bus 604.Input/output (I/O) interface 605 is also connected to bus 604.
It is connected to I/O interface 605: include the importation 606 of keyboard, mouse etc. with lower component;Output part 607 including such as cathode ray tube (CRT), liquid crystal display (LCD) etc. and speaker etc.;Storage part 608 including hard disk etc.;And include the communications portion 609 of the NIC of such as LAN card, modem etc..Communications portion 609 performs communication process via the network of such as the Internet.Driver 610 is connected to I/O interface 605 also according to needs.Detachable media 611, such as disk, CD, magneto-optic disk, semiconductor memory etc., be arranged in driver 610 as required, in order to the computer program read from it is mounted into storage part 608 as required.
In particular, according to the embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, the embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 609, and/or installed from the removable medium 611.
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architecture, functions, and operations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and any combination of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
In another aspect, the present application also provides a non-volatile computer storage medium. This medium may be the one included in the device described in the above embodiments, or it may exist separately without being assembled into a terminal. The non-volatile computer storage medium stores one or more programs which, when executed by a device, cause the device to: obtain data to be sorted and the data identifiers of the data to be sorted; perform a distribution operation: determine the maximum and minimum of the identifier values of the data identifiers; divide the interval whose right endpoint value and left endpoint value are the maximum and the minimum, respectively, into multiple subintervals, where each subinterval satisfies the following condition: its left endpoint value is the right endpoint value of the preceding subinterval, and its right endpoint value is the left endpoint value of the following subinterval; determine the subinterval to which the identifier value of the data identifier of each item of data to be sorted belongs; generate multiple sets of data to be sorted, each set corresponding to one subinterval; and perform a sorting operation: sort the data to be sorted in each set according to the size of the identifier values of the data identifiers.
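The distribution and sorting operations described above can be sketched in a few lines of single-process Python. This is only a minimal illustration under stated assumptions: the function name `range_partition_sort`, the `(key, payload)` record layout, the use of equal-width subintervals, and the convention that a key on a shared endpoint falls into the higher subinterval are hypothetical choices, not details confirmed by the application.

```python
def range_partition_sort(records, num_subintervals):
    """Sort (key, payload) records by distributing them into equal-width
    subintervals of [min_key, max_key] and sorting each subinterval's set."""
    if not records:
        return []
    keys = [key for key, _ in records]
    min_key, max_key = min(keys), max(keys)
    width = (max_key - min_key) / num_subintervals

    # Distribution operation: build one to-be-sorted set per subinterval.
    buckets = [[] for _ in range(num_subintervals)]
    for key, payload in records:
        if width == 0:
            idx = 0  # all keys are equal, so a single set suffices
        else:
            # Clamp so that key == max_key lands in the last subinterval.
            idx = min(int((key - min_key) / width), num_subintervals - 1)
        buckets[idx].append((key, payload))

    # Sorting operation: sort each set independently, then concatenate
    # the sets in subinterval order to obtain a globally sorted result.
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket, key=lambda rec: rec[0]))
    return result
```

Because every key in one subinterval is no larger than any key in the next, the per-set sorts are independent of one another, which is what would allow them to run in parallel.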
The above description covers only the preferred embodiments of the present application and explains the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the particular combination of the above technical features. It should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.

Claims (10)

1. A data sorting method, characterized in that the method comprises:
obtaining data to be sorted and the data identifiers of the data to be sorted;
performing a distribution operation: determining the maximum and minimum of the identifier values of the data identifiers; dividing the interval whose right endpoint value and left endpoint value are the maximum and the minimum, respectively, into multiple subintervals, wherein each subinterval satisfies the following condition: its left endpoint value is the right endpoint value of the preceding subinterval, and its right endpoint value is the left endpoint value of the following subinterval; determining the subinterval to which the identifier value of the data identifier of each item of data to be sorted belongs; and generating multiple sets of data to be sorted, each set corresponding to one subinterval; and
performing a sorting operation: sorting the data to be sorted in each set of data to be sorted according to the size of the identifier values of the data identifiers.
2. The method according to claim 1, characterized in that the method further comprises:
performing the distribution operation using Map tasks in the Map-Reduce model of the distributed computing framework Hadoop, and performing the sorting operation using Reduce tasks in the Map-Reduce model.
3. The method according to claim 2, characterized in that dividing the interval whose right endpoint value and left endpoint value are the maximum and the minimum, respectively, into multiple subintervals comprises:
calculating the right endpoint value of each subinterval using the following formulas:
Nmaxkey = Minkey + Average * N; Average = (Maxkey - Minkey) / Rnumber;
where Nmaxkey denotes the right endpoint value of the N-th subinterval, Minkey denotes the minimum of the identifier values of the data identifiers, Average denotes the average value, Maxkey denotes the maximum of the identifier values of the data identifiers, and Rnumber denotes the number of Reduce tasks.
4. The method according to claim 3, characterized in that the method further comprises:
presetting the maximum and minimum of the identifier values of the data identifiers of the data to be sorted.
5. The method according to claim 4, characterized in that the method further comprises: performing the distribution operation and the sorting operation in the Hadoop Streaming working mode.
6. A data sorting device, characterized in that the device comprises:
an acquiring unit, configured to obtain data to be sorted and the data identifiers of the data to be sorted;
a distribution unit, configured to perform a distribution operation: determining the maximum and minimum of the identifier values of the data identifiers; dividing the interval whose right endpoint value and left endpoint value are the maximum and the minimum, respectively, into multiple subintervals, wherein each subinterval satisfies the following condition: its left endpoint value is the right endpoint value of the preceding subinterval, and its right endpoint value is the left endpoint value of the following subinterval; determining the subinterval to which the identifier value of the data identifier of each item of data to be sorted belongs; and generating multiple sets of data to be sorted, each set corresponding to one subinterval; and
a sorting unit, configured to perform a sorting operation: sorting the data to be sorted in each set of data to be sorted according to the size of the identifier values of the data identifiers.
7. The device according to claim 6, characterized in that the device further comprises:
a distribution operation execution unit, configured to perform the distribution operation using Map tasks in the Map-Reduce model of the distributed computing framework Hadoop; and
a sorting operation execution unit, configured to perform the sorting operation using Reduce tasks in the Map-Reduce model.
8. The device according to claim 7, characterized in that the distribution unit comprises:
a calculation subunit, configured to calculate the right endpoint value of each subinterval using the following formulas:
Nmaxkey = Minkey + Average * N; Average = (Maxkey - Minkey) / Rnumber;
where Nmaxkey denotes the right endpoint value of the N-th subinterval, Minkey denotes the minimum of the identifier values of the data identifiers, Average denotes the average value, Maxkey denotes the maximum of the identifier values of the data identifiers, and Rnumber denotes the number of Reduce tasks.
9. The device according to claim 8, characterized in that the device further comprises:
a setting unit, configured to preset the maximum and minimum of the identifier values of the data identifiers of the data to be sorted.
10. The device according to claim 9, characterized in that the device further comprises:
an execution unit, configured to perform the distribution operation and the sorting operation in the Hadoop Streaming working mode.
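The endpoint formula recited in claims 3 and 8 can be checked with a small worked example. Taking Minkey = 0, Maxkey = 100 and Rnumber = 4 (values chosen here purely for illustration), Average = (100 - 0) / 4 = 25, so the right endpoint values of the first through fourth subintervals are 25, 50, 75 and 100. The Python sketch below computes the same values; the function name `right_endpoints` and its parameter names are hypothetical.

```python
def right_endpoints(min_key, max_key, r_number):
    """Return the right endpoint values Nmaxkey for N = 1 .. r_number."""
    # Average = (Maxkey - Minkey) / Rnumber
    average = (max_key - min_key) / r_number
    # Nmaxkey = Minkey + Average * N
    return [min_key + average * n for n in range(1, r_number + 1)]


endpoints = right_endpoints(0, 100, 4)
print(endpoints)  # [25.0, 50.0, 75.0, 100.0]
```

The last endpoint coincides with Maxkey, and each endpoint also serves as the left endpoint of the following subinterval, consistent with the condition stated in claims 1 and 6.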
CN201610045738.9A 2016-01-22 2016-01-22 Data sorting method and device Pending CN105740332A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610045738.9A CN105740332A (en) 2016-01-22 2016-01-22 Data sorting method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610045738.9A CN105740332A (en) 2016-01-22 2016-01-22 Data sorting method and device

Publications (1)

Publication Number Publication Date
CN105740332A true CN105740332A (en) 2016-07-06

Family

ID=56246505

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610045738.9A Pending CN105740332A (en) 2016-01-22 2016-01-22 Data sorting method and device

Country Status (1)

Country Link
CN (1) CN105740332A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250956A (en) * 2016-08-02 2016-12-21 立德高科(昆山)数码科技有限责任公司 According to the interval method and system generating Quick Response Code with the information of appointment of selected code value
CN106484662A (en) * 2016-10-12 2017-03-08 天闻数媒科技(湖南)有限公司 A kind of blended data sort method and device
CN106682085A (en) * 2016-11-24 2017-05-17 努比亚技术有限公司 Data processing method and terminal
CN106775586A (en) * 2016-11-11 2017-05-31 珠海市杰理科技股份有限公司 Data reordering method and device
CN107506399A (en) * 2017-08-02 2017-12-22 携程旅游网络技术(上海)有限公司 Method, system, equipment and the storage medium of data cell quick segmentation
CN108733790A (en) * 2018-05-11 2018-11-02 广州虎牙信息科技有限公司 Data reordering method, device, server and storage medium
CN108874798A (en) * 2017-05-09 2018-11-23 北京京东尚科信息技术有限公司 A kind of big data sort method and system
CN110070911A (en) * 2019-04-12 2019-07-30 内蒙古农业大学 A kind of parallel comparison method of gene order based on Hadoop
WO2019214303A1 (en) * 2018-05-07 2019-11-14 华为技术有限公司 Method and device for batch selection of data
CN110618866A (en) * 2019-09-20 2019-12-27 北京中科寒武纪科技有限公司 Data processing method and device and related product
CN112540985A (en) * 2020-12-07 2021-03-23 江苏赛融科技股份有限公司 Global sequencing output system and method based on distributed computing framework
CN113254488A (en) * 2020-08-05 2021-08-13 深圳市汉云科技有限公司 Data sorting method and system of distributed database

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101551814A (en) * 2009-05-13 2009-10-07 广东威创视讯科技股份有限公司 Method for data management and data search
CN102662639A (en) * 2012-04-10 2012-09-12 南京航空航天大学 Mapreduce-based multi-GPU (Graphic Processing Unit) cooperative computing method
CN103631940A (en) * 2013-12-09 2014-03-12 中国联合网络通信集团有限公司 Data writing method and data writing system applied to HBASE database

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ding Yucheng: "Performance Analysis of Sorting Algorithms in a Cloud Computing Environment", Journal of Chongqing University *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250956A (en) * 2016-08-02 2016-12-21 立德高科(昆山)数码科技有限责任公司 According to the interval method and system generating Quick Response Code with the information of appointment of selected code value
CN106484662A (en) * 2016-10-12 2017-03-08 天闻数媒科技(湖南)有限公司 A kind of blended data sort method and device
CN106775586A (en) * 2016-11-11 2017-05-31 珠海市杰理科技股份有限公司 Data reordering method and device
CN106775586B (en) * 2016-11-11 2019-06-11 珠海市杰理科技股份有限公司 Data reordering method and device
CN106682085A (en) * 2016-11-24 2017-05-17 努比亚技术有限公司 Data processing method and terminal
CN108874798A (en) * 2017-05-09 2018-11-23 北京京东尚科信息技术有限公司 A kind of big data sort method and system
CN107506399B (en) * 2017-08-02 2020-06-19 携程旅游网络技术(上海)有限公司 Method, system, device and storage medium for fast segmentation of data unit
CN107506399A (en) * 2017-08-02 2017-12-22 携程旅游网络技术(上海)有限公司 Method, system, equipment and the storage medium of data cell quick segmentation
WO2019214303A1 (en) * 2018-05-07 2019-11-14 华为技术有限公司 Method and device for batch selection of data
CN110457649A (en) * 2018-05-07 2019-11-15 华为技术有限公司 The method and apparatus of batch data selection
CN110457649B (en) * 2018-05-07 2021-05-04 华为技术有限公司 Method and device for selecting data in batches and computer storage medium
CN108733790A (en) * 2018-05-11 2018-11-02 广州虎牙信息科技有限公司 Data reordering method, device, server and storage medium
CN108733790B (en) * 2018-05-11 2021-07-02 广州虎牙信息科技有限公司 Data sorting method, device, server and storage medium
CN110070911A (en) * 2019-04-12 2019-07-30 内蒙古农业大学 A kind of parallel comparison method of gene order based on Hadoop
CN110618866A (en) * 2019-09-20 2019-12-27 北京中科寒武纪科技有限公司 Data processing method and device and related product
CN113254488A (en) * 2020-08-05 2021-08-13 深圳市汉云科技有限公司 Data sorting method and system of distributed database
CN112540985A (en) * 2020-12-07 2021-03-23 江苏赛融科技股份有限公司 Global sequencing output system and method based on distributed computing framework
CN112540985B (en) * 2020-12-07 2023-09-26 江苏赛融科技股份有限公司 Global ordering output system and method based on distributed computing framework

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160706