CN109189732A - Median analysis method and device - Google Patents

Median analysis method and device

Info

Publication number
CN109189732A
Authority
CN
China
Prior art keywords
file
median
merges
data
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810883746.XA
Other languages
Chinese (zh)
Inventor
杨星
赖文
王建洪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Sefon Software Co Ltd
Original Assignee
Chengdu Sefon Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Sefon Software Co Ltd
Priority to CN201810883746.XA
Publication of CN109189732A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a median analysis method and device, and relates to the technical field of big data analysis. The median analysis method includes: splitting the task dataset corresponding to a user query operation into multiple sub-datasets, and sending each sub-dataset to a corresponding compute node; each compute node processes its sub-dataset with a median selection algorithm to obtain a node calculation result, and the node calculation results are merged to obtain a median analysis result. By distributing the work across nodes and using a median selection algorithm for the actual median computation, the method reduces the computing resources required for median calculation and improves its efficiency.

Description

Median analysis method and device
Technical field
The present invention relates to the technical field of big data analysis, and in particular to a median analysis method and device.
Background
With the rapid development of data acquisition, management, and storage technologies, data increasingly exhibit new characteristics such as massive scale, rapid circulation, diverse types, and low value density. Data have penetrated every industry and business function of today's society and have become an important factor of production for enterprises.
At present, the arithmetic mean is the usual way to compute the average of a measure. However, when the data are not normally distributed, the mean is affected by extreme values and often fails to reflect the true average level (for example, average house prices or personal income), so the role of the median in data analysis is receiving increasing attention. Because computing and processing the median is more complex than computing the mean, especially for massive data, how to perform median-based data analysis more efficiently is a problem in urgent need of a solution.
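The effect of extreme values on the mean can be seen in a small illustration. The income figures below are invented purely for this example and do not come from the patent.

```python
from statistics import mean, median

# Hypothetical monthly incomes; the last value is an extreme outlier.
incomes = [3000, 3200, 3500, 3800, 4000, 4200, 100000]

avg = mean(incomes)    # pulled far upward by the single outlier
mid = median(incomes)  # middle value of the sorted list

print(f"mean   = {avg:.2f}")
print(f"median = {mid}")
```

Here the mean lands well above every value except the outlier, while the median stays at the middle of the typical incomes.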
Summary of the invention
In view of this, the purpose of the embodiments of the present invention is to provide a median analysis method and device, to solve the problem in the prior art that, when the data volume is huge, computing and processing the median is complex and consumes excessive time and computing resources.
In a first aspect, an embodiment of the present invention provides a median analysis method. The median analysis method includes: splitting the task dataset corresponding to a user query operation into multiple sub-datasets, and sending each sub-dataset to a corresponding compute node; each compute node processes its sub-dataset with a median selection algorithm to obtain a node calculation result, and the node calculation results are merged to obtain a median analysis result.
With reference to the first aspect, before splitting the task dataset corresponding to the user query operation into multiple sub-datasets, the method further includes: converting the user query operation into a Structured Query Language (SQL) script; determining, based on the SQL script, the task dataset corresponding to the user query operation; and determining that no single compute node is able to quicksort the task dataset.
With reference to the first aspect, each compute node processing its sub-dataset with the median selection algorithm to obtain a node calculation result includes: each compute node represents the numbers in its sub-dataset in binary; each compute node writes the data whose highest not-yet-compared binary bit is 1 to a first file, and writes the data whose highest not-yet-compared binary bit is 0 to a second file.
With reference to the first aspect, merging the node calculation results to obtain the median analysis result includes: merging the first files generated by all compute nodes into a first merged file, and merging the second files generated by all compute nodes into a second merged file; when the data volume of the first merged file is greater than that of the second merged file, storing the first merged file in a cache table; when the data volume of the first merged file is less than that of the second merged file, storing the second merged file in the cache table; when the data volume of the first merged file is equal to that of the second merged file, storing both the first merged file and the second merged file in the cache table. When the cache table contains only the first file or only the second file, that file is split into multiple sub-datasets, each resulting sub-dataset is sent to a corresponding compute node, and the steps from representing the numbers in binary through storing the merged file or files in the cache table are repeated until it is determined that a single compute node is able to quicksort the data in the first file or the second file; the data in that file are then quicksorted to determine the median, and that median is taken as the median analysis result. When the cache table contains both the first file and the second file, the average of the maximum value in the first file and the minimum value in the second file is taken as the median analysis result.
With reference to the first aspect, after merging the node calculation results to obtain the median analysis result, the median analysis method further includes: packaging the median analysis result into a dataset and returning it, and displaying a data table and visual charts in a front-end interface based on that dataset.
In a second aspect, an embodiment of the present invention provides a median analysis device. The median analysis device includes: a sub-dataset determining module, configured to split the task dataset corresponding to a user query operation into multiple sub-datasets and send each sub-dataset to a corresponding compute node; and an analysis result obtaining module, configured to have each compute node process its sub-dataset with a median selection algorithm to obtain a node calculation result, and to merge the node calculation results to obtain a median analysis result.
With reference to the second aspect, the median analysis device further includes a pre-judgment module, and the pre-judgment module includes: a converting unit, configured to convert the user query operation into a Structured Query Language (SQL) script; a task dataset determining unit, configured to determine, based on the SQL script, the task dataset corresponding to the user query operation; and a quicksort unit, configured to determine that no single compute node is able to quicksort the task dataset.
With reference to the second aspect, the analysis result obtaining module includes: a binary conversion unit, configured to have each compute node represent the numbers in its sub-dataset in binary; and a classification unit, configured to have each compute node write the data whose highest not-yet-compared binary bit is 1 to a first file and write the data whose highest not-yet-compared binary bit is 0 to a second file.
With reference to the second aspect, the analysis result obtaining module further includes: a merging unit, configured to merge the first files generated by all compute nodes into a first merged file and merge the second files generated by all compute nodes into a second merged file; a cache unit, configured to store the first merged file in a cache table when the data volume of the first merged file is greater than that of the second merged file, to store the second merged file in the cache table when the data volume of the first merged file is less than that of the second merged file, and to store both merged files in the cache table when their data volumes are equal; and a median analysis result obtaining unit, configured to, when the cache table contains only the first file or only the second file, split that file into multiple sub-datasets, send each resulting sub-dataset to a corresponding compute node, and repeat the steps from representing the numbers in binary through storing the merged file or files in the cache table until it is determined that a single compute node is able to quicksort the data in the first file or the second file, then quicksort the data in that file to determine the median and take that median as the median analysis result; and, when the cache table contains both the first file and the second file, to take the average of the maximum value in the first file and the minimum value in the second file as the median analysis result.
In a third aspect, an embodiment of the present invention further provides a computer-readable storage medium in which computer program instructions are stored; when the computer program instructions are read and run by a processor, the steps of the method of any of the above aspects are executed.
The beneficial effects provided by the present invention are as follows:
The present invention provides a median analysis method and device. The median analysis method splits the task dataset corresponding to a user query operation into multiple sub-datasets that are processed separately, which balances the computational load across the compute nodes and greatly increases the speed of median calculation. In addition, each sub-dataset is processed with a median selection algorithm to obtain a node calculation result, and the node calculation results are merged to obtain the median analysis result, so that the traditional serial computation is turned into a parallel, distributed computation, which lowers the resource usage of the median computation and increases its speed.
Other features and advantages of the present invention will be set forth in the description that follows and will in part become apparent from the description or be understood by practicing the embodiments of the present invention. The objectives and other advantages of the present invention can be realized and obtained by the structures particularly pointed out in the written description, the claims, and the drawings.
Brief description of the drawings
In order to explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present invention and should therefore not be regarded as limiting its scope; those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 is a schematic flowchart of a median analysis method provided by the first embodiment of the present invention;
Fig. 2 is a schematic flowchart of the steps of a median selection algorithm provided by the first embodiment of the present invention;
Fig. 3 is a schematic module diagram of a median analysis device provided by the second embodiment of the present invention;
Fig. 4 is a structural block diagram of an electronic device, provided by the third embodiment of the present invention, that can be applied in the embodiments of the present application.
Reference numerals: 100 - median analysis device; 105 - pre-judgment module; 110 - sub-dataset determining module; 120 - analysis result obtaining module; 200 - electronic device; 201 - memory; 202 - storage controller; 203 - processor; 204 - peripheral interface; 205 - input/output unit; 206 - audio unit; 207 - display unit.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. The components of the embodiments of the present invention, as generally described and illustrated in the drawings, can be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
It should also be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further defined or explained in subsequent drawings. In the description of the present invention, the terms "first", "second", and the like are used only to distinguish descriptions and are not to be understood as indicating or implying relative importance.
First embodiment
The applicant has found through research that the traditional way to compute a median is to sort the full dataset. With massive data, computer memory is limited, so it is usually impossible to read all the data at once and sort it, while reading the data in multiple passes consumes a large amount of computing resources and is inefficient. To solve this problem, the first embodiment of the present invention provides a median analysis method. It should be noted that the steps of the median analysis method in the embodiments of the present invention may be executed by a computer, a cloud server, or any other electronic device capable of executing each step of the method.
Please refer to Fig. 1, which is a schematic flowchart of a median analysis method provided by the first embodiment of the present invention. The specific steps of the median analysis method may be as follows:
Step S20: split the task dataset corresponding to a user query operation into multiple sub-datasets, and send each sub-dataset to a corresponding compute node.
For step S20, the user query operation may be one in which the user, by drag-and-drop in a web front-end interface, builds a median algorithm model over the business data of a data warehouse specified in the system, and the resulting logical model is mapped onto the data tables of that data warehouse.
Step S40: each compute node processes its sub-dataset with a median selection algorithm to obtain a node calculation result, and the node calculation results are merged to obtain the median analysis result.
In the embodiments of the present invention, the median analysis method splits the task dataset corresponding to the user query operation into multiple sub-datasets that are processed separately, which balances the computational load across the compute nodes and greatly increases the speed of median calculation. In addition, each sub-dataset is processed with the median selection algorithm to obtain a node calculation result, and the node calculation results are merged to obtain the median analysis result, so that the traditional serial computation is turned into a parallel, distributed computation, which lowers the resource usage of the median computation and increases its speed.
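The splitting in step S20 can be sketched as follows. This is a minimal single-process illustration, assuming the task dataset fits in a Python list and that sub-datasets should be of near-equal size; the function name `split_dataset` and the even-sized policy are assumptions made here, since the text does not specify how the cut is made.

```python
from typing import List


def split_dataset(values: List[int], num_nodes: int) -> List[List[int]]:
    """Cut a task dataset into num_nodes sub-datasets of near-equal size.

    Hypothetical helper: the even-sized cut is an assumption, not a rule
    stated in the source text.
    """
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    base, extra = divmod(len(values), num_nodes)
    subsets: List[List[int]] = []
    start = 0
    for i in range(num_nodes):
        # The first `extra` sub-datasets each take one additional element.
        size = base + (1 if i < extra else 0)
        subsets.append(values[start:start + size])
        start += size
    return subsets


sub_datasets = split_dataset(list(range(10)), 3)
print(sub_datasets)
```

Keeping the sub-datasets close in size is one simple way to aim for the balanced load that the text describes.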
As one implementation, in this embodiment the following sub-steps are performed before the step of "splitting the task dataset corresponding to the user query operation into multiple sub-datasets" in step S20:
Step S11: convert the user query operation into a Structured Query Language (SQL) script.
Specifically, before the user query operation is converted into an SQL script, the dimensions and metrics in each median algorithm model are distinguished according to the logical model that was created, and the user query operation is then automatically converted into an SQL script. Structured Query Language (SQL) is a special-purpose programming language, a database query and programming language used to access data and to query, update, and manage relational database systems; .sql is also the file extension of database script files.
Step S12: determine, based on the SQL script, the task dataset corresponding to the user query operation.
Step S13: determine that no single compute node is able to quicksort the task dataset.
Quicksort is faster than ordinary bubble sort because its exchanges are long-range. In each pass a pivot is chosen, all numbers less than or equal to the pivot are placed to its left, and all numbers greater than or equal to the pivot are placed to its right. Unlike bubble sort, which can only swap adjacent numbers, each exchange can therefore cover a much larger distance, so the total number of comparisons and exchanges is smaller and sorting is faster.
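The pivot-based partitioning described above, followed by reading off the median, can be sketched as below. This is a list-building formulation chosen for readability rather than the in-place swapping the text describes, and the middle-element pivot and the helper names are assumptions of this sketch.

```python
from typing import List


def quicksort(values: List[int]) -> List[int]:
    """Return a sorted copy of values using pivot-based partitioning."""
    if len(values) <= 1:
        return list(values)
    pivot = values[len(values) // 2]  # assumed pivot choice: middle element
    left = [v for v in values if v < pivot]     # goes to the left of the pivot
    middle = [v for v in values if v == pivot]  # values equal to the pivot
    right = [v for v in values if v > pivot]    # goes to the right of the pivot
    return quicksort(left) + middle + quicksort(right)


def median_by_sorting(values: List[int]) -> float:
    """Median of a dataset small enough for a single node to sort."""
    if not values:
        raise ValueError("median of an empty dataset is undefined")
    ordered = quicksort(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return float(ordered[mid])
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_by_sorting([7, 1, 5, 3, 9]))
```

This is the path taken when a single node can hold and sort the data; the distributed steps are only needed when step S13 determines that it cannot.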
For step S20, the splitting into sub-datasets and the computation on multiple compute nodes may be carried out on Presto. Presto is an open-source distributed SQL query engine for big data, developed in Java and suited to interactive analytical queries. It can run interactive queries over data ranging from several gigabytes to several petabytes at a speed comparable to commercial data warehouses, and its performance is reported to be more than 10 times that of Hive. Presto can query data stores including Hive, Cassandra, and even some commercial products, and a single Presto query can combine data from multiple data sources for unified analysis. This improves the overall efficiency of median analysis and lowers the barrier to data analysis.
For step S40, the step in which "each compute node processes its sub-dataset with a median selection algorithm to obtain a node calculation result" may further include the following sub-steps, as shown in Fig. 2:
Step S41: each compute node represents the numbers in its sub-dataset in binary.
Step S42: each compute node writes the data whose highest not-yet-compared binary bit is 1 to a first file, and writes the data whose highest not-yet-compared binary bit is 0 to a second file.
For step S42, in other embodiments the assignment of data whose highest bit is 1 or 0 to the first file or the second file may be different.
With continued reference to Fig. 2, which is a schematic flowchart of the steps of a median selection algorithm provided by the first embodiment of the present invention, Fig. 2 also shows the following sub-steps of the step of "merging the node calculation results to obtain the median analysis result":
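Steps S41 and S42 amount to testing one binary bit of each number and routing the number accordingly. The sketch below assumes non-negative integers and uses in-memory lists in place of the first and second files; the function name `partition_by_bit` is hypothetical.

```python
from typing import List, Tuple


def partition_by_bit(values: List[int], bit: int) -> Tuple[List[int], List[int]]:
    """Route each value by the given bit position (0 = least significant).

    Returns (first_file, second_file): values whose bit is 1 go to the
    first list, values whose bit is 0 go to the second list. Lists stand
    in for files; non-negative integers are assumed.
    """
    first_file: List[int] = []
    second_file: List[int] = []
    for v in values:
        if (v >> bit) & 1:
            first_file.append(v)
        else:
            second_file.append(v)
    return first_file, second_file


# Bit 2 has weight 4: 5 (101) and 7 (111) have it set, 2 (010) and 1 (001) do not.
first, second = partition_by_bit([5, 2, 7, 1], bit=2)
print(first, second)
```

For fixed-width non-negative integers, every value in the list for bit 1 is larger than every value in the list for bit 0, provided all higher bits have already been compared and found equal.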
Step S43: merge the first files generated by all compute nodes into a first merged file, and merge the second files generated by all compute nodes into a second merged file.
Step S44: when the data volume of the first merged file is greater than that of the second merged file, store the first merged file in the cache table; when the data volume of the first merged file is less than that of the second merged file, store the second merged file in the cache table; when the data volume of the first merged file is equal to that of the second merged file, store both the first merged file and the second merged file in the cache table.
Step S45: when the cache table contains only the first file or only the second file, split that file into multiple sub-datasets, send each resulting sub-dataset to a corresponding compute node, and repeat steps S41 to S44 until it is determined that a single compute node is able to quicksort the data in the first file or the second file; then quicksort the data in that file to determine the median, and take that median as the median analysis result. When the cache table contains both the first file and the second file, take the average of the maximum value in the first file and the minimum value in the second file as the median analysis result.
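The loop formed by steps S41 to S45 can be simulated in a single process as sketched below. Two points are assumptions of this sketch rather than statements from the text: values are non-negative integers of at most `bits` bits, and, instead of always keeping the larger merged file, the sketch tracks the rank `k` of the wanted element inside the kept file, which is a standard selection refinement added here so that the result is exact for any input. The names `kth_smallest`, `distributed_median`, `num_nodes`, and `sort_limit` are hypothetical, and Python's built-in `sorted` stands in for the final quicksort on a single node.

```python
from typing import List, Tuple


def partition_by_bit(values: List[int], bit: int) -> Tuple[List[int], List[int]]:
    """Split values into (bit is 1, bit is 0) lists; lists stand in for files."""
    ones = [v for v in values if (v >> bit) & 1]
    zeros = [v for v in values if ((v >> bit) & 1) == 0]
    return ones, zeros


def kth_smallest(values: List[int], k: int, num_nodes: int = 3,
                 bits: int = 32, sort_limit: int = 4) -> int:
    """Return the element of 0-based rank k by repeated bitwise partitioning."""
    current = list(values)
    for bit in range(bits - 1, -1, -1):
        if len(current) <= sort_limit:
            break  # small enough for one node to sort directly
        # Re-split the kept data across the simulated compute nodes.
        subsets = [current[i::num_nodes] for i in range(num_nodes)]
        parts = [partition_by_bit(s, bit) for s in subsets]
        # Merge the per-node files into the two merged files.
        first_merged = [v for ones, _ in parts for v in ones]
        second_merged = [v for _, zeros in parts for v in zeros]
        if k < len(second_merged):
            current = second_merged   # target lies among the smaller values
        else:
            k -= len(second_merged)   # skip past all smaller values
            current = first_merged
    return sorted(current)[k]


def distributed_median(values: List[int], **kwargs) -> float:
    """Median as the mean of the two middle ranks (equal when count is odd)."""
    if not values:
        raise ValueError("median of an empty dataset is undefined")
    n = len(values)
    low = kth_smallest(values, (n - 1) // 2, **kwargs)
    high = kth_smallest(values, n // 2, **kwargs)
    return (low + high) / 2


print(distributed_median([9, 1, 8, 2, 7, 3, 6, 4, 5]))
```

Each round discards every value that cannot be at the wanted rank, so the data passed to the next round shrinks until one node can sort it, which mirrors the stopping condition in step S45.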
Further, after the median analysis result is obtained, in order to let the user obtain the useful information more clearly and to lower the barrier to data analysis, this embodiment may also execute the following step: package the median analysis result into a dataset and return it, and display a data table and visual charts in the front-end interface based on that dataset.
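Packaging the result for a front end might look like the sketch below. The JSON layout with `columns` and `rows` keys, the function name `package_result`, and the metric name are all hypothetical; the text does not specify the format of the returned dataset.

```python
import json


def package_result(metric: str, median_value: float) -> str:
    """Wrap a median analysis result in a small tabular JSON dataset.

    Hypothetical format: one header row description plus one data row,
    which a front end could render as a table or feed to a chart.
    """
    payload = {
        "columns": ["metric", "median"],
        "rows": [[metric, median_value]],
    }
    return json.dumps(payload)


print(package_result("house_price", 3800.0))
```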
Second embodiment
To complement the median analysis method provided by the first embodiment of the present invention, the second embodiment of the present invention further provides a median analysis device 100.
Please refer to Fig. 3, which is a schematic module diagram of a median analysis device provided by the second embodiment of the present invention.
The median analysis device 100 includes a sub-dataset determining module 110 and an analysis result obtaining module 120.
The sub-dataset determining module 110 is configured to split the task dataset corresponding to a user query operation into multiple sub-datasets and send each sub-dataset to a corresponding compute node.
The analysis result obtaining module 120 is configured to have each compute node process its sub-dataset with a median selection algorithm to obtain a node calculation result, and to merge the node calculation results to obtain the median analysis result.
Optionally, the analysis result obtaining module 120 includes a binary conversion unit and a classification unit.
The binary conversion unit is configured to have each compute node represent the numbers in its sub-dataset in binary.
The classification unit is configured to have each compute node write the data whose highest not-yet-compared binary bit is 1 to a first file and write the data whose highest not-yet-compared binary bit is 0 to a second file.
Further, the analysis result obtaining module 120 also includes a merging unit, a cache unit, and a median analysis result obtaining unit.
The merging unit is configured to merge the first files generated by all compute nodes into a first merged file and merge the second files generated by all compute nodes into a second merged file.
The cache unit is configured to store the first merged file in the cache table when the data volume of the first merged file is greater than that of the second merged file; to store the second merged file in the cache table when the data volume of the first merged file is less than that of the second merged file; and to store both the first merged file and the second merged file in the cache table when their data volumes are equal.
The median analysis result obtaining unit is configured to, when the cache table contains only the first file or only the second file, split that file into multiple sub-datasets, send each resulting sub-dataset to a corresponding compute node, and repeat the steps from representing the numbers in binary through storing the merged file or files in the cache table until it is determined that a single compute node is able to quicksort the data in the first file or the second file, then quicksort the data in that file to determine the median and take that median as the median analysis result; and, when the cache table contains both the first file and the second file, to take the average of the maximum value in the first file and the minimum value in the second file as the median analysis result.
As an optional implementation, the median analysis device 100 in this embodiment further includes a pre-judgment module 105, which includes a converting unit, a task dataset determining unit, and a quicksort unit.
The converting unit is configured to convert the user query operation into a Structured Query Language (SQL) script.
The task dataset determining unit is configured to determine, based on the SQL script, the task dataset corresponding to the user query operation.
The quicksort unit is configured to determine that no single compute node is able to quicksort the task dataset.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working process of the device described above may refer to the corresponding process in the foregoing method and is not repeated here.
3rd embodiment
Referring to FIG. 4, Fig. 4 is a kind of electronics that can be applied in the embodiment of the present application that third embodiment of the invention provides The structural block diagram of equipment.Electronic equipment 200 provided in this embodiment may include median analytical equipment 100, memory 201, Storage control 202, processor 203, Peripheral Interface 204, input-output unit 205, audio unit 206, display unit 207.
The memory 201, storage control 202, processor 203, Peripheral Interface 204, input-output unit 205, sound Frequency unit 206, each element of display unit 207 are directly or indirectly electrically connected between each other, to realize the transmission or friendship of data Mutually.It is electrically connected for example, these elements can be realized between each other by one or more communication bus or signal wire.The middle position Number analytical equipment 100 includes that at least one can be stored in the memory 201 in the form of software or firmware (firmware) Or it is solidificated in the software function module in the operating system (operating system, OS) of median analytical equipment 100.It is described Processor 203 is for executing the executable module stored in memory 201, such as the software that median analytical equipment 100 includes Functional module or computer program.
Wherein, memory 201 may be, but not limited to, random access memory (Random Access Memory, RAM), read-only memory (Read Only Memory, ROM), programmable read only memory (Programmable Read-Only Memory, PROM), erasable read-only memory (Erasable Programmable Read-Only Memory, EPROM), Electricallyerasable ROM (EEROM) (Electric Erasable Programmable Read-Only Memory, EEPROM) etc.. Wherein, memory 201 is for storing program, and the processor 203 executes described program after receiving and executing instruction, aforementioned Method performed by the server that the stream process that any embodiment of the embodiment of the present invention discloses defines can be applied to processor 203 In, or realized by processor 203.
The processor 203 may be an integrated circuit chip with signal processing capability. The processor 203 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor, or the processor 203 may be any conventional processor.
The peripheral interface 204 couples various input/output devices to the processor 203 and the memory 201. In some embodiments, the peripheral interface 204, the processor 203, and the storage controller 202 may be implemented in a single chip. In other examples, they may each be implemented by a separate chip.
The input/output unit 205 lets the user input data, enabling interaction between the user and the server (or local terminal). The input/output unit 205 may be, but is not limited to, a device such as a mouse or a keyboard.
The audio unit 206 provides an audio interface to the user and may include one or more microphones, one or more speakers, and an audio circuit.
The display unit 207 provides an interactive interface (for example, a user operation interface) between the electronic device 200 and the user, or displays image data for the user's reference. In this embodiment, the display unit 207 may be a liquid crystal display or a touch display. If it is a touch display, it may be a capacitive or resistive touch screen that supports single-point and multi-point touch operations. Supporting single-point and multi-point touch operations means that the touch display can sense touch operations generated simultaneously at one or more positions on it, and passes the sensed touch operations to the processor 203 for calculation and processing.
It can be understood that the structure shown in FIG. 4 is only illustrative; the electronic device 200 may include more or fewer components than shown in FIG. 4, or have a configuration different from that shown in FIG. 4. Each component shown in FIG. 4 may be implemented in hardware, software, or a combination thereof.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working process of the device described above may be found in the corresponding process of the foregoing method and is not repeated here.
In conclusion the embodiment of the invention provides a kind of median analysis method and device, the median analysis side Method by by user query operate corresponding task data collection be cut into multiple Sub Data Sets carry out respectively operation, make each operation Computational load between node is more balanced, while greatly improving the speed of median calculating;Meanwhile it being selected by median It selects and obtains node calculated result after algorithm calculates each Sub Data Set and obtained by merging the node calculated result Median analysis reduces median as a result, to which traditional serial arithmetic is optimized for concurrent operation by distributed arithmetic The resources occupation rate of operation, the speed for improving median operation.
In the several embodiments provided in this application, it should be understood that the disclosed device and method may also be implemented in other ways. The device embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the drawings show the possible architecture, functions, and operations of devices, methods, and computer program products according to multiple embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, each module may exist separately, or two or more modules may be integrated to form an independent part.
If the functions are implemented in the form of software function modules and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the invention; for those skilled in the art, the invention may have various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope. It should also be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further defined or explained in subsequent drawings.
The above is merely a specific embodiment of the present invention, but the protection scope of the present invention is not limited to it. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.

Claims (10)

1. A median analysis method, characterized in that the median analysis method comprises:
splitting the task data set corresponding to a user query operation into multiple sub-data sets, and sending each sub-data set to a corresponding operation node;
each operation node computing on its sub-data set using a median selection algorithm to obtain a node calculation result, and merging the node calculation results to obtain a median analysis result.
2. The median analysis method according to claim 1, characterized in that, before splitting the task data set corresponding to the user query operation into multiple sub-data sets, the method further comprises:
converting the user query operation into a Structured Query Language (SQL) script;
determining, based on the SQL script, the task data set corresponding to the user query operation;
determining that no single operation node is able to quicksort the task data set.
3. The median analysis method according to claim 2, characterized in that each operation node computing on each sub-data set using the median selection algorithm to obtain a node calculation result comprises:
each operation node representing the numbers in its sub-data set in binary;
each operation node writing the data on that node whose most significant not-yet-compared binary bit is 1 to a first file, and writing the data on that node whose most significant not-yet-compared binary bit is 0 to a second file.
4. The median analysis method according to claim 3, characterized in that merging the node calculation results to obtain the median analysis result comprises:
merging the first files generated by all operation nodes to obtain a first merged file, and merging the second files generated by all operation nodes to obtain a second merged file;
when the data volume of the first merged file is greater than that of the second merged file, storing the first merged file in a cache table; when the data volume of the first merged file is less than that of the second merged file, storing the second merged file in the cache table; when the data volume of the first merged file is equal to that of the second merged file, storing both the first merged file and the second merged file in the cache table;
when the cache table contains only the first file or the second file, splitting the first file or the second file into multiple sub-data sets, sending each resulting sub-data set to a corresponding operation node, and repeating the steps from the binary representation of the numbers in each sub-data set by each operation node through the storing of the merged file or files in the cache table, until it is determined that a single operation node is able to quicksort the data in the first file or the second file, then quicksorting the data in the first file or the second file to determine the median, and taking that median as the median analysis result; when the cache table contains both the first file and the second file, taking the average of the maximum value in the first file and the minimum value in the second file as the median analysis result.
5. The median analysis method according to claim 1, characterized in that, after merging the node calculation results to obtain the median analysis result, the median analysis method further comprises:
packaging the median analysis result into a data set and returning it, and displaying data tables and visual charts in a front-end interface based on the data set.
6. A median analysis device, characterized in that the median analysis device comprises:
a sub-data set determining module, configured to split the task data set corresponding to a user query operation into multiple sub-data sets and to send each sub-data set to a corresponding operation node;
an analysis result obtaining module, configured for each operation node to compute on its sub-data set using a median selection algorithm to obtain a node calculation result, and to merge the node calculation results to obtain a median analysis result.
7. The median analysis device according to claim 6, characterized in that the median analysis device further comprises a pre-judgment module, the pre-judgment module comprising:
a converting unit, configured to convert the user query operation into a Structured Query Language (SQL) script;
a task data set determining unit, configured to determine, based on the SQL script, the task data set corresponding to the user query operation;
a quicksort unit, configured to determine that no single operation node is able to quicksort the task data set.
8. The median analysis device according to claim 6, characterized in that the analysis result obtaining module comprises:
a binary conversion unit, configured for each operation node to represent the numbers in its sub-data set in binary;
a classification unit, configured for each operation node to write the data on that node whose most significant not-yet-compared binary bit is 1 to a first file, and to write the data on that node whose most significant not-yet-compared binary bit is 0 to a second file.
9. The median analysis device according to claim 7, characterized in that the analysis result obtaining module further comprises:
a merging unit, configured to merge the first files generated by all operation nodes to obtain a first merged file, and to merge the second files generated by all operation nodes to obtain a second merged file;
a cache unit, configured to store the first merged file in a cache table when the data volume of the first merged file is greater than that of the second merged file; to store the second merged file in the cache table when the data volume of the first merged file is less than that of the second merged file; and to store both the first merged file and the second merged file in the cache table when the data volume of the first merged file is equal to that of the second merged file;
a median analysis result acquiring unit, configured to, when the cache table contains only the first file or the second file, split the first file or the second file into multiple sub-data sets, send each resulting sub-data set to a corresponding operation node, and repeat the steps from the binary representation of the numbers in each sub-data set by each operation node through the storing of the merged file or files in the cache table, until it is determined that a single operation node is able to quicksort the data in the first file or the second file, then quicksort the data in the first file or the second file to determine the median, and take that median as the median analysis result; and, when the cache table contains both the first file and the second file, to take the average of the maximum value in the first file and the minimum value in the second file as the median analysis result.
10. A computer-readable storage medium, characterized in that computer program instructions are stored in the computer-readable storage medium, and when the computer program instructions are read and run by a processor, the steps of the method of any one of claims 1-5 are performed.
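The bitwise narrowing recited in claims 3 and 4 can be sketched in Python as follows. This is a hedged, single-process sketch rather than the claimed implementation. It assumes non-negative integers below `2**bits`, uses in-memory lists in place of the first and second files, and uses a hypothetical `node_capacity` threshold to stand in for "a single operation node is able to quicksort the data". Python's built-in `sorted` replaces quicksort. The claims describe keeping the larger merged file; the sketch instead tracks the target rank `k` explicitly so that the result stays correct after the first round, and that bookkeeping is an assumption added here. The names `kth_smallest` and `median` are illustrative.

```python
from typing import List


def kth_smallest(data: List[int], k: int, bits: int = 32,
                 node_capacity: int = 4) -> int:
    """Return the k-th smallest value (0-based) of non-negative ints < 2**bits."""
    candidates = data
    bit = bits - 1  # most significant bit not yet compared
    while len(candidates) > node_capacity and bit >= 0:
        ones = [x for x in candidates if ((x >> bit) & 1) == 1]   # "first file"
        zeros = [x for x in candidates if ((x >> bit) & 1) == 0]  # "second file"
        if k < len(zeros):
            candidates = zeros   # target lies among the smaller values
        else:
            k -= len(zeros)      # skip every value in the 0-bit group
            candidates = ones
        bit -= 1
    # Small enough for one node (or all bits compared): sort directly.
    return sorted(candidates)[k]


def median(data: List[int], bits: int = 32, node_capacity: int = 4) -> float:
    """Median via bitwise selection; averages the two middle values if n is even."""
    n = len(data)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    upper = kth_smallest(data, n // 2, bits, node_capacity)
    if n % 2 == 1:
        return float(upper)
    lower = kth_smallest(data, n // 2 - 1, bits, node_capacity)
    return (lower + upper) / 2


print(median([7, 1, 9, 4, 4, 12, 0, 15, 3, 8], bits=4))  # 5.5
print(median([5, 12, 3, 9, 8, 1, 14], bits=4))           # 8.0
```

Each pass discards one of the two groups, so no step ever needs to sort more than `node_capacity` values at once, which is the point of deferring the sort until the remaining data fits on one node.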
CN201810883746.XA 2018-08-03 2018-08-03 A kind of median analysis method and device Pending CN109189732A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810883746.XA CN109189732A (en) 2018-08-03 2018-08-03 A kind of median analysis method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810883746.XA CN109189732A (en) 2018-08-03 2018-08-03 A kind of median analysis method and device

Publications (1)

Publication Number Publication Date
CN109189732A true CN109189732A (en) 2019-01-11

Family

ID=64920206

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810883746.XA Pending CN109189732A (en) 2018-08-03 2018-08-03 A kind of median analysis method and device

Country Status (1)

Country Link
CN (1) CN109189732A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101996102A (en) * 2009-08-31 2011-03-30 中国移动通信集团公司 Method and system for mining data association rule
CN104038543A (en) * 2013-05-27 2014-09-10 沈阳东软医疗系统有限公司 Method, cloud platform and system for cloud reconstruction of medical imaging devices
CN106611037A (en) * 2016-09-12 2017-05-03 星环信息科技(上海)有限公司 Method and device for distributed diagram calculation
CN106845536A (en) * 2017-01-09 2017-06-13 西北工业大学 A kind of parallel clustering method based on image scaling
CN107181682A (en) * 2016-03-11 2017-09-19 中国电信股份有限公司 The method and apparatus of calculating network access capability end to end
CN107273339A (en) * 2017-06-21 2017-10-20 郑州云海信息技术有限公司 A kind of task processing method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HAPJIN: "海量数据查找中位数" (Finding the Median in Massive Data), 《博客园》 (cnblogs) *

Similar Documents

Publication Publication Date Title
CN102915347B (en) A kind of distributed traffic clustering method and system
CN108376143B (en) Novel OLAP pre-calculation system and method for generating pre-calculation result
CN104281701B (en) Multiscale Distributed Spatial data query method and system
US20180165347A1 (en) Multi-dimensional analysis using named filters
CN111159184B (en) Metadata tracing method and device and server
US8566308B2 (en) Intelligent adaptive index density in a database management system
CN110309110A (en) A kind of big data log monitoring method and device, storage medium and computer equipment
CN108255897A (en) Visual Chart data conversion treatment method and apparatus
CN107273519A (en) Data analysing method, device, terminal and storage medium
CN111460011A (en) Page data display method and device, server and storage medium
Gupta et al. Faster as well as early measurements from big data predictive analytics model
US11550762B2 (en) Implementation of data access metrics for automated physical database design
CN109408502A (en) A kind of data standard processing method, device and its storage medium
CN113535788A (en) Retrieval method, system, equipment and medium for marine environment data
CN108829804A (en) Based on the high dimensional data similarity join querying method and device apart from partition tree
CN103699534A (en) Display method and device for data object in system directory
Fekete et al. Managing data for visual analytics: Opportunities and challenges.
CN112818013A (en) Time sequence database query optimization method, device, equipment and storage medium
CN102024067A (en) Method for technology transplant of analog circuit
Wang et al. TreeRank: a similarity measure for nearest neighbor searching in phylogenetic databases
CN108920516A (en) Real-time analysis method, system, device and computer readable storage medium
CN110874366A (en) Data processing and query method and device
CN113570464B (en) Digital currency transaction community identification method, system, equipment and storage medium
CN110489732A (en) Method for processing report data and equipment
CN109189732A (en) A kind of median analysis method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190111
