CN109189732A - Median analysis method and device - Google Patents
- Publication number
- CN109189732A (application CN201810883746.XA)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a median analysis method and device, relating to the technical field of big data analysis. The median analysis method comprises: cutting the task data set corresponding to a user query operation into multiple sub-data sets and sending each sub-data set to a corresponding computing node; each computing node processing its sub-data set with a median selection algorithm to obtain a node calculation result; and merging the node calculation results to obtain a median analysis result. By distributing the work across nodes and computing the median with a median selection algorithm, the method reduces the computing resources that median computation requires and improves its efficiency.
Description
Technical field
The present invention relates to the technical field of big data analysis, and in particular to a median analysis method and device.
Background
With the rapid development of data acquisition, management, and storage technologies, data increasingly exhibits massive scale, rapid circulation, diverse types, and low value density. Data has also penetrated every industry and business function in today's society and has become an important factor of production for enterprises.
At present, the typical level of a measure is usually computed with the arithmetic mean. When the data is not normally distributed, however, the mean is affected by extreme values and often fails to reflect the true typical level (for example, average house prices or personal income), so the median is receiving more and more attention in data analysis. Because computing and processing the median is more complex than computing the mean, especially with massive data, how to use the median for data analysis more efficiently is an urgent problem to be solved.
Summary of the invention
In view of this, the embodiments of the present invention aim to provide a median analysis method and device, to solve the problem in the prior art that, when the data volume is huge, computing and processing the median is complex and consumes excessive time and computing resources.
In a first aspect, an embodiment of the present invention provides a median analysis method, comprising: cutting the task data set corresponding to a user query operation into multiple sub-data sets and sending each sub-data set to a corresponding computing node; each computing node processing its sub-data set with a median selection algorithm to obtain a node calculation result; and merging the node calculation results to obtain a median analysis result.
With reference to the first aspect, before cutting the task data set corresponding to the user query operation into multiple sub-data sets, the method further comprises: converting the user query operation into a Structured Query Language (SQL) script; determining, based on the SQL script, the task data set corresponding to the user query operation; and determining that no single computing node is able to quicksort the task data set.
With reference to the first aspect, each computing node processing its sub-data set with the median selection algorithm to obtain a node calculation result comprises: each computing node representing the numbers in its sub-data set in binary; and each computing node writing the data whose highest not-yet-compared binary bit is 1 to a first file, and the data whose highest not-yet-compared binary bit is 0 to a second file.
With reference to the first aspect, merging the node calculation results to obtain the median analysis result comprises: merging the first files generated by all computing nodes into a first merged file, and merging the second files generated by all computing nodes into a second merged file; when the data volume of the first merged file is greater than that of the second merged file, storing the first merged file in a cache table; when the data volume of the first merged file is less than that of the second merged file, storing the second merged file in the cache table; when the two data volumes are equal, storing both the first merged file and the second merged file in the cache table; when the cache table holds only the first file or only the second file, cutting that file into multiple sub-data sets, sending each resulting sub-data set to a corresponding computing node, and repeating the steps from "each computing node representing the numbers in its sub-data set in binary" through the storing of the merged file or files in the cache table, until it is determined that a single computing node is able to quicksort the data in the first file or the second file, then quicksorting that data, determining the median, and taking that median as the median analysis result; and when the cache table holds both the first file and the second file, taking the average of the maximum value in the first file and the minimum value in the second file as the median analysis result.
With reference to the first aspect, after merging the node calculation results to obtain the median analysis result, the median analysis method further comprises: packaging the median analysis result into a data set and returning it, and displaying data tables and visual charts in a front-end interface based on the data set.
In a second aspect, an embodiment of the present invention provides a median analysis device, comprising: a sub-data set determination module, configured to cut the task data set corresponding to a user query operation into multiple sub-data sets and send each sub-data set to a corresponding computing node; and an analysis result acquisition module, configured to have each computing node process its sub-data set with a median selection algorithm to obtain a node calculation result, and to merge the node calculation results to obtain a median analysis result.
With reference to the second aspect, the median analysis device further comprises a pre-judgment module, which comprises: a conversion unit, configured to convert the user query operation into a Structured Query Language (SQL) script; a task data set determination unit, configured to determine, based on the SQL script, the task data set corresponding to the user query operation; and a quicksort unit, configured to determine that no single computing node is able to quicksort the task data set.
With reference to the second aspect, the analysis result acquisition module comprises: a binary conversion unit, configured to have each computing node represent the numbers in its sub-data set in binary; and a classification unit, configured to have each computing node write the data whose highest not-yet-compared binary bit is 1 to a first file, and the data whose highest not-yet-compared binary bit is 0 to a second file.
With reference to the second aspect, the analysis result acquisition module further comprises: a merging unit, configured to merge the first files generated by all computing nodes into a first merged file and the second files generated by all computing nodes into a second merged file; a cache unit, configured to store the first merged file in a cache table when its data volume is greater than that of the second merged file, to store the second merged file in the cache table when the data volume of the first merged file is less than that of the second merged file, and to store both merged files in the cache table when the two data volumes are equal; and a median analysis result acquisition unit, configured to, when the cache table holds only the first file or only the second file, cut that file into multiple sub-data sets, send each resulting sub-data set to a corresponding computing node, and repeat the steps from "each computing node representing the numbers in its sub-data set in binary" through the storing of the merged file or files in the cache table, until it is determined that a single computing node is able to quicksort the data in the first file or the second file, then quicksort that data, determine the median, and take that median as the median analysis result; and, when the cache table holds both the first file and the second file, to take the average of the maximum value in the first file and the minimum value in the second file as the median analysis result.
In a third aspect, an embodiment of the present invention further provides a computer-readable storage medium storing computer program instructions which, when read and run by a processor, perform the steps of the method of any of the above aspects.
The beneficial effects of the present invention are as follows:
The present invention provides a median analysis method and device. The method cuts the task data set corresponding to a user query operation into multiple sub-data sets that are processed separately, which balances the computing load across the computing nodes and greatly improves the speed of median computation. In addition, each sub-data set is processed with a median selection algorithm to obtain a node calculation result, and the node calculation results are merged to obtain the median analysis result, so that distributed computation turns the traditional serial computation into a parallel one, reducing the resource usage of median computation and improving its speed.
Other features and advantages of the present invention will be set forth in the description that follows, and in part will become apparent from the description or be understood by practicing the embodiments of the present invention. The objectives and other advantages of the present invention can be realized and obtained by the structures particularly pointed out in the written description, the claims, and the drawings.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present invention and therefore should not be regarded as limiting its scope; a person of ordinary skill in the art can derive other related drawings from them without creative effort.
Fig. 1 is a flow diagram of a median analysis method provided by the first embodiment of the present invention;
Fig. 2 is a flow diagram of the steps of a median selection algorithm provided by the first embodiment of the present invention;
Fig. 3 is a module diagram of a median analysis device provided by the second embodiment of the present invention;
Fig. 4 is a structural block diagram of an electronic device applicable to the embodiments of the present application, provided by the third embodiment of the present invention.
Reference numerals: 100-median analysis device; 105-pre-judgment module; 110-sub-data set determination module; 120-analysis result acquisition module; 200-electronic device; 201-memory; 202-memory controller; 203-processor; 204-peripheral interface; 205-input/output unit; 206-audio unit; 207-display unit.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. The components of the embodiments, as generally described and illustrated in the drawings, can be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
It should also be noted that similar reference numerals and letters denote similar items in the following drawings, so once an item is defined in one drawing, it does not need to be further defined or explained in subsequent drawings. In the description of the present invention, the terms "first", "second", and the like are used only to distinguish descriptions and are not to be understood as indicating or implying relative importance.
First embodiment
The applicant has found that the traditional way to compute a median is to sort the full data set. With massive data, computer memory is limited, so all the data usually cannot be read and sorted at once, and reading it in multiple passes consumes substantial computing resources and is inefficient. To solve this problem, the first embodiment of the present invention provides a median analysis method. It should be noted that the steps of the median analysis method in the embodiments of the present invention may be executed by a computer, a cloud server, or any other electronic device capable of executing each step of the method.
Please refer to Fig. 1, which is a flow diagram of a median analysis method provided by the first embodiment of the present invention. The specific steps of the median analysis method may be as follows:
Step S20: cut the task data set corresponding to the user query operation into multiple sub-data sets, and send each sub-data set to a corresponding computing node.
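The cutting in step S20 could look like the following minimal Python sketch. It assumes the task data set fits in an in-memory list and that the sub-data sets should be as close to equal in size as possible; the function name `split_into_sub_data_sets` and both assumptions are illustrative and are not specified in the text.

```python
from typing import List, Sequence


def split_into_sub_data_sets(task_data: Sequence[int], node_count: int) -> List[List[int]]:
    """Cut the task data set into node_count contiguous, near-equal sub-data sets."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    base, extra = divmod(len(task_data), node_count)
    sub_data_sets = []
    start = 0
    for node_index in range(node_count):
        # The first `extra` nodes receive one additional record each.
        size = base + (1 if node_index < extra else 0)
        sub_data_sets.append(list(task_data[start:start + size]))
        start += size
    return sub_data_sets
```

In a real deployment each returned list would be shipped to a different node; here they are simply returned to the caller.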
For step S20, the user query operation may be the user, in a web front-end interface, using drag-and-drop to build a median algorithm model over the business data of a designated data warehouse in the system, with the resulting logical model mapped to the data tables of that data warehouse.
Step S40: each computing node processes its sub-data set with a median selection algorithm to obtain a node calculation result, and the node calculation results are merged to obtain the median analysis result.
In the embodiments of the present invention, the median analysis method cuts the task data set corresponding to a user query operation into multiple sub-data sets that are processed separately, which balances the computing load across the computing nodes and greatly improves the speed of median computation. In addition, each sub-data set is processed with a median selection algorithm to obtain a node calculation result, and the node calculation results are merged to obtain the median analysis result, so that distributed computation turns the traditional serial computation into a parallel one, reducing the resource usage of median computation and improving its speed.
As one implementation, in the present embodiment the following sub-steps precede the step in S20 of "cutting the task data set corresponding to the user query operation into multiple sub-data sets":
Step S11: convert the user query operation into a Structured Query Language (SQL) script.
Specifically, before the user query operation is converted into an SQL script, the dimensions and indicators in each median algorithm model are distinguished according to the created logical model, and the user query operation is then automatically converted into an SQL script. Structured Query Language (SQL) is a special-purpose programming language for database querying and programming, used to access data and to query, update, and manage relational database systems; "sql" is also the file extension of database script files.
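The text does not show what the generated SQL script looks like. The sketch below is one hypothetical way to turn a logical model, reduced here to a table name, a list of dimensions, and one indicator, into a query that selects the task data set; `build_sql_script` and the example table and column names are assumptions made for illustration only.

```python
from typing import Sequence


def build_sql_script(table: str, dimensions: Sequence[str], indicator: str) -> str:
    """Build a SELECT statement that fetches the dimensions and the indicator column."""
    columns = ", ".join(list(dimensions) + [indicator])
    # Rows without an indicator value cannot contribute to a median.
    return f"SELECT {columns} FROM {table} WHERE {indicator} IS NOT NULL"
```

For example, `build_sql_script("sales", ["region"], "price")` yields a query over a hypothetical `sales` table. A production version would need to validate or quote identifiers rather than interpolate them directly.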
Step S12: determine, based on the SQL script, the task data set corresponding to the user query operation.
Step S13: determine that no single computing node is able to quicksort the task data set.
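How step S13 decides that no single node can sort the data is not detailed in the text. One plausible check, sketched below, compares an estimated data size with the memory available on a node; `can_sort_on_single_node`, its parameters, and the default 50% headroom are all hypothetical.

```python
def can_sort_on_single_node(record_count: int, bytes_per_record: int,
                            node_memory_bytes: int, headroom: float = 0.5) -> bool:
    """Return True if the estimated data size fits within a fraction of node memory."""
    estimated_bytes = record_count * bytes_per_record
    # Only `headroom` of the node's memory is treated as usable for sorting.
    return estimated_bytes <= node_memory_bytes * headroom
```

When this check returns False for every node, the method would proceed to the distributed selection in steps S20 and S40.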
Quicksort is faster than ordinary bubble sort because its exchanges jump across the data. Each pass picks a pivot, then moves all numbers less than or equal to the pivot to its left and all numbers greater than or equal to the pivot to its right. Unlike bubble sort, which can only swap adjacent numbers, each exchange covers a much larger distance, so the total number of comparisons and exchanges is smaller and sorting is faster.
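The pivot-and-exchange behaviour described above can be illustrated with a standard in-place quicksort. This is a textbook version using a middle-element pivot, offered as an illustration rather than as the specific implementation used by the method.

```python
from typing import List, Optional


def quicksort(values: List[int], lo: int = 0, hi: Optional[int] = None) -> None:
    """Sort values[lo..hi] in place."""
    if hi is None:
        hi = len(values) - 1
    if lo >= hi:
        return
    pivot = values[(lo + hi) // 2]
    i, j = lo, hi
    while i <= j:
        # Skip elements that are already on the correct side of the pivot.
        while values[i] < pivot:
            i += 1
        while values[j] > pivot:
            j -= 1
        if i <= j:
            # A long-distance exchange, unlike bubble sort's adjacent swaps.
            values[i], values[j] = values[j], values[i]
            i += 1
            j -= 1
    quicksort(values, lo, j)
    quicksort(values, i, hi)
```

After sorting, the median is the middle element for an odd count, or the average of the two middle elements for an even count.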
For step S20, cutting the data into sub-data sets and computing on multiple computing nodes can be implemented on Presto. Presto is an open-source distributed SQL query engine for big data, developed in Java and suited to interactive analytical queries. It can run interactive queries over data ranging from gigabytes to petabytes with query speeds comparable to commercial data warehouses, and its performance is said to be more than 10 times that of Hive. Presto can query data stores including Hive, Cassandra, and even some commercial products, and a single Presto query can combine data from multiple data sources for unified analysis. This improves the overall efficiency of median analysis and lowers the threshold for data analysis.
For step S40, the step of "each computing node processing its sub-data set with the median selection algorithm to obtain a node calculation result" may further include the following sub-steps, as shown in Fig. 2:
Step S41: each computing node represents the numbers in its sub-data set in binary.
Step S42: each computing node writes the data whose highest not-yet-compared binary bit is 1 to a first file, and the data whose highest not-yet-compared binary bit is 0 to a second file.
For step S42, in other embodiments the assignment of a highest bit of 1 or 0 to the first file or the second file may be different.
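Steps S41 and S42 on a single node can be sketched as follows. The sketch assumes non-negative integers and uses in-memory lists in place of the first and second files; `partition_by_bit` and the `bit_index` parameter (the position of the highest bit not yet compared, counted from 0 at the least significant bit) are illustrative names.

```python
from typing import List, Sequence, Tuple


def partition_by_bit(sub_data_set: Sequence[int], bit_index: int) -> Tuple[List[int], List[int]]:
    """Split values by the bit at bit_index: 1 goes to the first file, 0 to the second."""
    first_file: List[int] = []
    second_file: List[int] = []
    for value in sub_data_set:
        if (value >> bit_index) & 1:
            first_file.append(value)
        else:
            second_file.append(value)
    return first_file, second_file
```

For instance, with 4-bit values 5 (`0101`), 2 (`0010`), 7 (`0111`), and 1 (`0001`), partitioning on bit 2 places 5 and 7 in the first file and 2 and 1 in the second.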
With continued reference to Fig. 2, which is a flow diagram of the steps of a median selection algorithm provided by the first embodiment of the present invention, Fig. 2 also shows the following sub-steps of the step of "merging the node calculation results to obtain the median analysis result":
Step S43: merge the first files generated by all computing nodes into a first merged file, and merge the second files generated by all computing nodes into a second merged file.
Step S44: when the data volume of the first merged file is greater than that of the second merged file, store the first merged file in a cache table; when the data volume of the first merged file is less than that of the second merged file, store the second merged file in the cache table; when the two data volumes are equal, store both the first merged file and the second merged file in the cache table.
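Steps S43 and S44 can be sketched as below. Lists stand in for files and a dictionary stands in for the cache table; `merge_and_cache` and the keys `"first"` and `"second"` are hypothetical names, and "data volume" is taken to mean the number of records, which is an assumption.

```python
from typing import Dict, List, Sequence, Tuple


def merge_and_cache(node_results: Sequence[Tuple[List[int], List[int]]]) -> Dict[str, List[int]]:
    """Merge per-node (first_file, second_file) pairs and keep the larger merged file."""
    first_merged = [v for first_file, _ in node_results for v in first_file]
    second_merged = [v for _, second_file in node_results for v in second_file]
    cache_table: Dict[str, List[int]] = {}
    if len(first_merged) > len(second_merged):
        cache_table["first"] = first_merged
    elif len(first_merged) < len(second_merged):
        cache_table["second"] = second_merged
    else:
        # Equal volumes: both merged files are kept.
        cache_table["first"] = first_merged
        cache_table["second"] = second_merged
    return cache_table
```

The number of entries in the returned table then tells the caller whether to iterate again (one entry) or to average the two boundary values (two entries).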
Step S45: when the cache table holds only the first file or only the second file, cut that file into multiple sub-data sets, send each resulting sub-data set to a corresponding computing node, and repeat the steps from "each computing node representing the numbers in its sub-data set in binary" (step S41) through the storing of the merged file or files in the cache table (step S44), until it is determined that a single computing node is able to quicksort the data in the first file or the second file; then quicksort that data, determine the median, and take that median as the median analysis result. When the cache table holds both the first file and the second file, take the average of the maximum value in the first file and the minimum value in the second file as the median analysis result.
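The loop formed by steps S41 to S45 can be sketched end to end as follows, under several stated assumptions. The sketch assumes non-negative integers below `2 ** bit_width`, simulates nodes with list slices, and uses Python's built-in `sorted` in place of the final quicksort. It also tracks the target rank explicitly and keeps whichever merged file contains that rank; the text instead describes keeping the larger file, and the two coincide in the first round but not necessarily in later ones, so the rank tracking is an addition made here for correctness. In the equal-size case, with bit 1 mapped to the first file, the two middle values are the largest value of the 0-bit file and the smallest value of the 1-bit file, which matches the wording of step S45 only under the alternate mapping mentioned for step S42. All function and parameter names are hypothetical.

```python
from typing import List, Sequence


def select_rank(values: Sequence[int], rank: int, bit_width: int,
                node_capacity: int, node_count: int) -> int:
    """Return the value at 0-based `rank` in sorted order using bitwise partitioning."""
    candidates = list(values)
    bit = bit_width - 1
    while len(candidates) > node_capacity and bit >= 0:
        # Cut the candidates into one sub-data set per simulated node.
        chunks = [candidates[i::node_count] for i in range(node_count)]
        first_merged: List[int] = []   # values whose current bit is 1
        second_merged: List[int] = []  # values whose current bit is 0
        for chunk in chunks:
            for value in chunk:
                if (value >> bit) & 1:
                    first_merged.append(value)
                else:
                    second_merged.append(value)
        # Every 0-bit value sorts before every 1-bit value at this bit position.
        if rank < len(second_merged):
            candidates = second_merged
        else:
            rank -= len(second_merged)
            candidates = first_merged
        bit -= 1
    # The remainder is small enough for one node to sort.
    return sorted(candidates)[rank]


def median_by_bit_selection(values: Sequence[int], bit_width: int = 32,
                            node_capacity: int = 4, node_count: int = 3) -> float:
    """Median of non-negative integers smaller than 2 ** bit_width."""
    if not values:
        raise ValueError("values must not be empty")
    if any(v < 0 or v >= (1 << bit_width) for v in values):
        raise ValueError("values must be in [0, 2 ** bit_width)")
    n = len(values)
    lower = select_rank(values, (n - 1) // 2, bit_width, node_capacity, node_count)
    if n % 2 == 1:
        return float(lower)
    upper = select_rank(values, n // 2, bit_width, node_capacity, node_count)
    return (lower + upper) / 2
```

Running two selections for an even count keeps the sketch short; a single pass that tracks both middle ranks would avoid the repeated work.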
Further, after the median analysis result is obtained, in order to let the user obtain useful information more clearly and to lower the threshold for data analysis, the present embodiment may also execute the following step: package the median analysis result into a data set and return it, and display data tables and visual charts in the front-end interface based on the data set.
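The format of the returned data set is not specified in the text. As one hypothetical possibility, the result could be packaged as a small JSON document with column names and rows that a front end can render as a table or chart; `package_median_result` and the field names below are assumptions.

```python
import json


def package_median_result(measure: str, median_value: float) -> str:
    """Package a median analysis result as a JSON data set for a front end."""
    data_set = {
        "columns": ["measure", "median"],
        "rows": [[measure, median_value]],
    }
    return json.dumps(data_set)
```

A front end could parse this string and feed `rows` to a table or chart component.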
Second embodiment
To cooperate with the median analysis method provided by the first embodiment of the present invention, the second embodiment of the present invention further provides a median analysis device 100.
Please refer to Fig. 3, which is a module diagram of a median analysis device provided by the second embodiment of the present invention.
The median analysis device 100 includes a sub-data set determination module 110 and an analysis result acquisition module 120.
The sub-data set determination module 110 is configured to cut the task data set corresponding to the user query operation into multiple sub-data sets and to send each sub-data set to a corresponding computing node.
The analysis result acquisition module 120 is configured to have each computing node process its sub-data set with a median selection algorithm to obtain a node calculation result, and to merge the node calculation results to obtain the median analysis result.
Optionally, the analysis result acquisition module 120 includes a binary conversion unit and a classification unit.
The binary conversion unit is configured to have each computing node represent the numbers in its sub-data set in binary.
The classification unit is configured to have each computing node write the data whose highest not-yet-compared binary bit is 1 to a first file, and the data whose highest not-yet-compared binary bit is 0 to a second file.
Further, the analysis result acquisition module 120 also includes a merging unit, a cache unit, and a median analysis result acquisition unit.
The merging unit is configured to merge the first files generated by all computing nodes into a first merged file, and the second files generated by all computing nodes into a second merged file.
The cache unit is configured to store the first merged file in a cache table when its data volume is greater than that of the second merged file; to store the second merged file in the cache table when the data volume of the first merged file is less than that of the second merged file; and to store both the first merged file and the second merged file in the cache table when the two data volumes are equal.
The median analysis result acquisition unit is configured to, when the cache table holds only the first file or only the second file, cut that file into multiple sub-data sets, send each resulting sub-data set to a corresponding computing node, and repeat the steps from "each computing node representing the numbers in its sub-data set in binary" through the storing of the merged file or files in the cache table, until it is determined that a single computing node is able to quicksort the data in the first file or the second file; then quicksort that data, determine the median, and take that median as the median analysis result. When the cache table holds both the first file and the second file, the unit takes the average of the maximum value in the first file and the minimum value in the second file as the median analysis result.
As an optional implementation, the median analysis device 100 in the present embodiment further includes a pre-judgment module 105, which includes a conversion unit, a task data set determination unit, and a quicksort unit.
The conversion unit is configured to convert the user query operation into a Structured Query Language (SQL) script.
The task data set determination unit is configured to determine, based on the SQL script, the task data set corresponding to the user query operation.
The quicksort unit is configured to determine that no single computing node is able to quicksort the task data set.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working process of the device described above may refer to the corresponding process in the foregoing method and is not repeated here.
Third embodiment
Please refer to Fig. 4, which is a structural block diagram of an electronic device applicable to the embodiments of the present application, provided by the third embodiment of the present invention. The electronic device 200 provided in this embodiment may include the median analysis device 100, a memory 201, a memory controller 202, a processor 203, a peripheral interface 204, an input/output unit 205, an audio unit 206, and a display unit 207.
The memory 201, memory controller 202, processor 203, peripheral interface 204, input/output unit 205, audio unit 206, and display unit 207 are electrically connected to one another, directly or indirectly, to transmit or exchange data. For example, these elements may be electrically connected to one another through one or more communication buses or signal lines. The median analysis device 100 includes at least one software function module that can be stored in the memory 201 in the form of software or firmware, or built into the operating system (OS) of the median analysis device 100. The processor 203 is configured to execute the executable modules stored in the memory 201, such as the software function modules or computer programs included in the median analysis device 100.
The memory 201 may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and the like. The memory 201 is used to store programs, and the processor 203 executes a program after receiving an execution instruction. The method performed by the server as defined by the processes disclosed in any of the foregoing embodiments of the present invention may be applied in, or implemented by, the processor 203.
The processor 203 may be an integrated circuit chip with signal processing capability. The processor 203 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor, or the processor 203 may be any conventional processor.
The peripheral interface 204 couples various input/output devices to the processor 203 and the memory 201. In some embodiments, the peripheral interface 204, the processor 203, and the storage controller 202 may be implemented in a single chip. In other examples, they may each be implemented by a separate chip.
The input-output unit 205 allows the user to input data, enabling interaction between the user and the server (or local terminal). The input-output unit 205 may be, but is not limited to, a mouse, a keyboard, or a similar device.
The audio unit 206 provides an audio interface to the user and may include one or more microphones, one or more speakers, and audio circuitry.
The display unit 207 provides an interactive interface (for example, a user interface) between the electronic device 200 and the user, or displays image data for the user's reference. In this embodiment, the display unit 207 may be a liquid crystal display or a touch display. If it is a touch display, it may be a capacitive or resistive touch screen that supports single-point and multi-point touch operations. Supporting single-point and multi-point touch operations means that the touch display can sense touch operations generated simultaneously at one or more positions on the touch display and pass the sensed touch operations to the processor 203 for calculation and processing.
It will be understood that the structure shown in Fig. 4 is only illustrative; the electronic device 200 may include more or fewer components than shown in Fig. 4, or have a configuration different from that shown in Fig. 4. Each component shown in Fig. 4 may be implemented in hardware, software, or a combination thereof.
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the device described above may refer to the corresponding process in the foregoing method and is not repeated here.
In conclusion the embodiment of the invention provides a kind of median analysis method and device, the median analysis side
Method by by user query operate corresponding task data collection be cut into multiple Sub Data Sets carry out respectively operation, make each operation
Computational load between node is more balanced, while greatly improving the speed of median calculating;Meanwhile it being selected by median
It selects and obtains node calculated result after algorithm calculates each Sub Data Set and obtained by merging the node calculated result
Median analysis reduces median as a result, to which traditional serial arithmetic is optimized for concurrent operation by distributed arithmetic
The resources occupation rate of operation, the speed for improving median operation.
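As a rough illustration of the flow summarized above, the following Python sketch simulates one round: the task data set is cut into sub data sets, each simulated operation node splits its sub data set on one binary bit, and the per-node outputs are merged. The function names (`split_dataset`, `node_partition`, `merge_round`), the use of a thread pool to stand in for operation nodes, and the restriction to non-negative integers are all assumptions made for illustration, not details given in this document.

```python
from concurrent.futures import ThreadPoolExecutor


def split_dataset(data, num_nodes):
    """Cut the task data set into num_nodes sub data sets of near-equal size."""
    return [data[i::num_nodes] for i in range(num_nodes)]


def node_partition(sub_dataset, bit):
    """Work done on one (simulated) node: route each value by one binary bit."""
    first_file = [v for v in sub_dataset if (v >> bit) & 1]       # bit is 1
    second_file = [v for v in sub_dataset if not (v >> bit) & 1]  # bit is 0
    return first_file, second_file


def merge_round(data, bit, num_nodes=4):
    """Run one partition round on all nodes in parallel and merge the outputs."""
    subsets = split_dataset(data, num_nodes)
    # Threads stand in for separate operation nodes (an assumption).
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        results = list(pool.map(lambda s: node_partition(s, bit), subsets))
    first_merged = [v for first, _ in results for v in first]
    second_merged = [v for _, second in results for v in second]
    return first_merged, second_merged


first, second = merge_round([5, 12, 3, 9, 14, 1, 8], bit=3)
print(sorted(first), sorted(second))
```

Here `bit=3` separates values of 8 and above from smaller ones. A real deployment would presumably dispatch the sub data sets to separate machines rather than threads.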
In the several embodiments provided in this application, it should be understood that the disclosed device and method may also be implemented in other ways. The device embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the drawings show the possible architecture, functions, and operations of devices, methods, and computer program products according to multiple embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated to form one independent part, each module may exist separately, or two or more modules may be integrated to form one independent part.
If the functions are implemented in the form of software function modules and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes media capable of storing program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit it; those skilled in the art may make various modifications and variations to the invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope. It should also be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.
The above is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited to it. Any change or replacement that a person familiar with the technical field could readily conceive within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
Claims (10)
1. A median analysis method, characterized in that the median analysis method includes:
cutting the task data set corresponding to a user query operation into multiple sub data sets, and sending each sub data set to a corresponding operation node;
each operation node computing its sub data set with a median selection algorithm to obtain a node calculation result, and merging the node calculation results to obtain a median analysis result.
2. The median analysis method according to claim 1, characterized in that, before cutting the task data set corresponding to the user query operation into multiple sub data sets, the method further includes:
converting the user query operation into a Structured Query Language (SQL) script;
determining, based on the SQL script, the task data set corresponding to the user query operation;
determining that no single operation node is able to perform a quicksort on the task data set.
3. The median analysis method according to claim 2, characterized in that each operation node computing its sub data set with the median selection algorithm to obtain a node calculation result includes:
each operation node representing the numbers in its corresponding sub data set in binary;
each operation node writing the data in the node whose most significant uncompared binary bit is 1 to a first file, and writing the data in the node whose most significant uncompared binary bit is 0 to a second file.
4. The median analysis method according to claim 3, characterized in that merging the node calculation results to obtain the median analysis result includes:
merging the first files generated by all operation nodes to obtain a first merged file, and merging the second files generated by all operation nodes to obtain a second merged file;
when the data volume of the first merged file is greater than that of the second merged file, storing the first merged file in a cache table; when the data volume of the first merged file is less than that of the second merged file, storing the second merged file in the cache table; when the data volume of the first merged file equals that of the second merged file, storing both the first merged file and the second merged file in the cache table;
when the cache table contains only the first file or only the second file, cutting the first file or the second file into multiple sub data sets, sending each sub data set obtained by the cutting to a corresponding operation node, and repeating the steps from "each operation node representing the numbers in its corresponding sub data set in binary" to "when the data volume of the first merged file is greater than that of the second merged file, storing the first merged file in a cache table; when the data volume of the first merged file is less than that of the second merged file, storing the second merged file in the cache table; when the data volume of the first merged file equals that of the second merged file, storing both the first merged file and the second merged file in the cache table" until it is determined that a single operation node is able to perform a quicksort on the data in the first file or the second file, performing a quicksort on the data in the first file or the second file to determine the median, and taking that median as the median analysis result; when the cache table contains both the first file and the second file, taking the average of the maximum value in the first file and the minimum value in the second file as the median analysis result.
5. The median analysis method according to claim 1, characterized in that, after merging the node calculation results to obtain the median analysis result, the median analysis method further includes:
packaging the median analysis result into a data set and returning it, and displaying data tables and visualization icons in a front-end interface based on the data set.
6. A median analysis device, characterized in that the median analysis device includes:
a sub data set determining module, configured to cut the task data set corresponding to a user query operation into multiple sub data sets and send each sub data set to a corresponding operation node;
an analysis result obtaining module, configured for each operation node to compute its sub data set with a median selection algorithm to obtain a node calculation result, and to merge the node calculation results to obtain a median analysis result.
7. The median analysis device according to claim 6, characterized in that the median analysis device further includes a pre-judgment module, the pre-judgment module including:
a converting unit, configured to convert the user query operation into a Structured Query Language (SQL) script;
a task data set determining unit, configured to determine, based on the SQL script, the task data set corresponding to the user query operation;
a quicksort unit, configured to determine that no single operation node is able to perform a quicksort on the task data set.
8. The median analysis device according to claim 6, characterized in that the analysis result obtaining module includes:
a binary conversion unit, configured for each operation node to represent the numbers in its corresponding sub data set in binary;
a classification unit, configured for each operation node to write the data in the node whose most significant uncompared binary bit is 1 to a first file, and to write the data in the node whose most significant uncompared binary bit is 0 to a second file.
9. The median analysis device according to claim 7, characterized in that the analysis result obtaining module further includes:
a merging unit, configured to merge the first files generated by all operation nodes to obtain a first merged file, and to merge the second files generated by all operation nodes to obtain a second merged file;
a cache unit, configured to store the first merged file in a cache table when the data volume of the first merged file is greater than that of the second merged file; to store the second merged file in the cache table when the data volume of the first merged file is less than that of the second merged file; and to store both the first merged file and the second merged file in the cache table when the data volume of the first merged file equals that of the second merged file;
a median analysis result obtaining unit, configured to, when the cache table contains only the first file or only the second file, cut the first file or the second file into multiple sub data sets, send each sub data set obtained by the cutting to a corresponding operation node, and repeat the steps from "each operation node representing the numbers in its corresponding sub data set in binary" to "when the data volume of the first merged file is greater than that of the second merged file, storing the first merged file in a cache table; when the data volume of the first merged file is less than that of the second merged file, storing the second merged file in the cache table; when the data volume of the first merged file equals that of the second merged file, storing both the first merged file and the second merged file in the cache table" until it is determined that a single operation node is able to perform a quicksort on the data in the first file or the second file, perform a quicksort on the data in the first file or the second file to determine the median, and take that median as the median analysis result; and, when the cache table contains both the first file and the second file, take the average of the maximum value in the first file and the minimum value in the second file as the median analysis result.
10. A computer-readable storage medium, characterized in that computer program instructions are stored in the computer-readable storage medium, and when the computer program instructions are read and run by a processor, the steps of the method according to any one of claims 1-5 are performed.
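The bit-partitioning procedure recited in claims 3 and 4 can be sketched on a single machine as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: it assumes non-negative integers narrower than `bits` bits, and the names `bitwise_median`, `bits`, and `sort_threshold` are hypothetical. It also differs from the claim wording in one respect: instead of keeping whichever merged file is larger, it tracks the rank of the middle element(s) and keeps the group containing that rank, which selects the same group as the larger-file rule on the first pass. When the middle ranks fall on either side of the split, it averages the two values adjacent to the split.

```python
def bitwise_median(values, bits=32, sort_threshold=4):
    """Median of non-negative integers below 2**bits via bit partitioning."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    n = len(values)
    # 0-based ranks of the middle element(s) in sorted order.
    lo_rank, hi_rank = (n - 1) // 2, n // 2
    # offset = number of values already known to be smaller than all candidates.
    candidates, offset = list(values), 0
    for bit in range(bits - 1, -1, -1):
        if len(candidates) <= sort_threshold:
            break  # small enough to sort directly (stand-in for one node)
        ones = [v for v in candidates if (v >> bit) & 1]        # "first file"
        zeros = [v for v in candidates if not (v >> bit) & 1]   # "second file"
        split = offset + len(zeros)  # rank of the smallest value in `ones`
        if hi_rank < split:
            candidates = zeros
        elif lo_rank >= split:
            candidates, offset = ones, split
        else:
            # Middle ranks straddle the split: average the boundary values.
            return (max(zeros) + min(ones)) / 2
    ordered = sorted(candidates)
    return (ordered[lo_rank - offset] + ordered[hi_rank - offset]) / 2
```

In this sketch `sort_threshold` stands in for the point at which a single node can sort the remaining data directly; in a distributed setting each partition pass would be carried out across the operation nodes and merged before the next bit is examined.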
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810883746.XA CN109189732A (en) | 2018-08-03 | 2018-08-03 | A kind of median analysis method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109189732A true CN109189732A (en) | 2019-01-11 |
Family
ID=64920206
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810883746.XA Pending CN109189732A (en) | 2018-08-03 | 2018-08-03 | A kind of median analysis method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109189732A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101996102A (en) * | 2009-08-31 | 2011-03-30 | 中国移动通信集团公司 | Method and system for mining data association rule |
CN104038543A (en) * | 2013-05-27 | 2014-09-10 | 沈阳东软医疗系统有限公司 | Method, cloud platform and system for cloud reconstruction of medical imaging devices |
CN106611037A (en) * | 2016-09-12 | 2017-05-03 | 星环信息科技(上海)有限公司 | Method and device for distributed diagram calculation |
CN106845536A (en) * | 2017-01-09 | 2017-06-13 | 西北工业大学 | A kind of parallel clustering method based on image scaling |
CN107181682A (en) * | 2016-03-11 | 2017-09-19 | 中国电信股份有限公司 | The method and apparatus of calculating network access capability end to end |
CN107273339A (en) * | 2017-06-21 | 2017-10-20 | 郑州云海信息技术有限公司 | A kind of task processing method and device |
Non-Patent Citations (1)
Title |
---|
HAPJIN: "海量数据查找中位数" (Finding the median in massive data), 《博客园》 (cnblogs) * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102915347B (en) | A kind of distributed traffic clustering method and system | |
CN108376143B (en) | Novel OLAP pre-calculation system and method for generating pre-calculation result | |
CN104281701B (en) | Multiscale Distributed Spatial data query method and system | |
US20180165347A1 (en) | Multi-dimensional analysis using named filters | |
CN111159184B (en) | Metadata tracing method and device and server | |
US8566308B2 (en) | Intelligent adaptive index density in a database management system | |
CN110309110A (en) | A kind of big data log monitoring method and device, storage medium and computer equipment | |
CN108255897A (en) | Visual Chart data conversion treatment method and apparatus | |
CN107273519A (en) | Data analysing method, device, terminal and storage medium | |
CN111460011A (en) | Page data display method and device, server and storage medium | |
Gupta et al. | Faster as well as early measurements from big data predictive analytics model | |
US11550762B2 (en) | Implementation of data access metrics for automated physical database design | |
CN109408502A (en) | A kind of data standard processing method, device and its storage medium | |
CN113535788A (en) | Retrieval method, system, equipment and medium for marine environment data | |
CN108829804A (en) | Based on the high dimensional data similarity join querying method and device apart from partition tree | |
CN103699534A (en) | Display method and device for data object in system directory | |
Fekete et al. | Managing data for visual analytics: Opportunities and challenges. | |
CN112818013A (en) | Time sequence database query optimization method, device, equipment and storage medium | |
CN102024067A (en) | Method for technology transplant of analog circuit | |
Wang et al. | TreeRank: a similarity measure for nearest neighbor searching in phylogenetic databases | |
CN108920516A (en) | Real-time analysis method, system, device and computer readable storage medium | |
CN110874366A (en) | Data processing and query method and device | |
CN113570464B (en) | Digital currency transaction community identification method, system, equipment and storage medium | |
CN110489732A (en) | Method for processing report data and equipment | |
CN109189732A (en) | A kind of median analysis method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190111 |