CN110968835A - Approximate quantile calculation method and device - Google Patents

Info

Publication number
CN110968835A
Authority
CN
China
Prior art keywords
information
data
equal
approximate
depth histogram
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911275488.8A
Other languages
Chinese (zh)
Inventor
宋韶旭
陈之威
王建民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN201911275488.8A
Publication of CN110968835A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/17 Function evaluation by approximation methods, e.g. inter- or extrapolation, smoothing, least mean square method
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Algebra (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Complex Calculations (AREA)

Abstract

The embodiment of the invention provides an approximate quantile calculation method and device. The method comprises: sequentially reading each data information in the internet streaming data set and updating the equal-depth histogram information until all data in the internet streaming data set have been read, to obtain target equal-depth histogram information; and determining, according to the quantile degree information, the interval information of the approximate quantile in the target equal-depth histogram information, to obtain the approximate quantile of the internet streaming data set. By using a dynamic equal-depth histogram, the method dynamically maintains an approximate equal-depth histogram in a streaming computation scenario and finally obtains a target equal-depth histogram; the approximate quantile of the streaming data set is obtained from the maintained target equal-depth histogram, and the properties of the equal-depth histogram allow approximate quantile calculation to be completed efficiently for streaming data of any scale.

Description

Approximate quantile calculation method and device
Technical Field
The invention relates to the technical field of information processing, in particular to an approximate quantile calculation method and device.
Background
Quantiles are among the important statistical indicators in data analysis and can produce descriptive information about the distribution of the original data without any parametric assumptions. They reflect the cumulative distribution function of the data at low cost, from which the probability distribution function can further be derived. Quantiles are also widely and effectively used in practice. For example, data analysis tools and languages such as Excel, MATLAB, and Python provide quantiles as built-in functions. Internet service providers likewise use quantiles as one of the indicators for measuring the running state of a network. In wireless sensor networks, quantiles are applied to the data acquisition process. Quantiles also play an essential role in data quality work: when the data contain outliers, the median and the median absolute deviation reflect the central value and the dispersion of the data set better than the mean and the standard deviation, because quantiles are not affected by outliers.
In the prior art, computing a quantile is equivalent to sorting a data sequence and returning the element at a specified position. For a data set of size $N$, the $\phi$-quantile is computed by sorting all data in the data set in ascending order according to a set rule, computing the rank $r$
[Formula image BDA0002315451700000011: definition of the rank r]
and returning the element with rank $r$ in the sorted sequence. The time and space complexity of this method is $O(N \log N)$. However, for large-scale streaming data, the conventional method is no longer practical because of the limits of computer memory and data sizes at the TB or PB level. Streaming data is characterized in that its size is generally unknown in advance and the data arrive one by one, as a stream, in chronological order.
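The conventional sort-based procedure described above can be sketched as follows. This is a minimal illustration, not the patent's own code: the function name `exact_quantile` is hypothetical, and the rank formula r = ceil(phi * N) is an assumption, since the original formula is only available as an image.

```python
import math


def exact_quantile(values, phi):
    """Return the phi-quantile of `values` by sorting the whole data set."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < phi <= 1:
        raise ValueError("phi must be in (0, 1]")
    ordered = sorted(values)            # needs the entire data set in memory
    r = math.ceil(phi * len(ordered))   # ASSUMED rank formula: r = ceil(phi * N)
    return ordered[r - 1]               # ranks are 1-based


print(exact_quantile([5, 1, 4, 2, 3], 0.5))  # median of five values
```

Because the whole data set has to be held and sorted, this approach does not carry over to a stream whose length is unknown in advance.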
Therefore, how to perform quantile calculation on large-scale streaming data has become an urgent problem to be solved in the industry.
Disclosure of Invention
Embodiments of the present invention provide a method and an apparatus for calculating an approximate quantile, so as to solve the technical problems mentioned in the foregoing background art, or at least partially solve the technical problems mentioned in the foregoing background art.
In a first aspect, an embodiment of the present invention provides an approximate quantile calculation method, including:
sequentially reading each data information in the internet streaming data set, and updating the equal-depth histogram information until all data in the internet streaming data set are read to obtain target equal-depth histogram information;
and determining interval information of the approximate quantiles in the target equal-depth histogram information according to the quantile degree information to obtain the approximate quantile of the Internet streaming data set.
More specifically, before the step of sequentially reading each data information in the data set and updating the equal-depth histogram information, the method further includes:
acquiring preset approximate error information;
and obtaining data interval information of the equal-depth histogram according to the preset approximate error information so as to construct the equal-depth histogram according to the data interval information.
More specifically, the step of sequentially reading each data information in the internet streaming data set and updating the equal-depth histogram information specifically includes:
sequentially reading each data information in the internet streaming data set, and analyzing the equal-depth histogram interval information of the data information to obtain data interval information;
and performing incremental processing on the data statistics number corresponding to the data interval information to update the equal-depth histogram information.
More specifically, after the step of performing increment processing on the statistical number of data corresponding to the data interval information to update the equal-depth histogram information, the method further includes:
acquiring the data statistical number corresponding to each data interval information in the updated equal-depth histogram information;
and carrying out interval boundary adjustment on the data interval information of which the data statistical number does not meet the equal depth condition to obtain adjusted equal depth histogram information.
In a second aspect, an embodiment of the present invention provides an approximate quantile calculation apparatus, including:
the updating module is used for reading all data information in the internet streaming data set in sequence and updating the equal-depth histogram information until all data in the internet streaming data set are read to obtain target equal-depth histogram information;
and the calculation module is used for determining the interval information of the approximate quantile in the target equal-depth histogram information according to the quantile degree information to obtain the approximate quantile of the internet streaming data set.
More specifically, the update module is specifically configured to:
acquiring preset approximate error information;
and obtaining data interval information of the equal-depth histogram according to the preset approximate error information so as to construct the equal-depth histogram according to the data interval information.
More specifically, the update module is further configured to:
sequentially reading each data information in the internet streaming data set, and analyzing the equal-depth histogram interval information of the data information to obtain data interval information;
and performing incremental processing on the data statistics number corresponding to the data interval information to update the equal-depth histogram information.
More specifically, the calculation module is specifically configured to:
acquiring the data statistical number corresponding to each data interval information in the updated equal-depth histogram information;
and carrying out interval boundary adjustment on the data interval information of which the data statistical number does not meet the equal depth condition to obtain adjusted equal depth histogram information.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the approximate quantile calculation method according to the first aspect when executing the program.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the approximate quantile calculation method according to the first aspect.
According to the approximate quantile calculation method and device provided by the embodiments of the invention, a dynamic equal-depth histogram is used to maintain an approximate equal-depth histogram in a streaming computation scenario. The approximate equal-depth histogram is updated once each time a piece of internet streaming data is read, so a data stream of any length can be read, and the target equal-depth histogram is finally obtained. The approximate quantile of the streaming data set is obtained from the maintained target equal-depth histogram, and the properties of the equal-depth histogram allow approximate quantile calculation to be completed efficiently for streaming data of any scale. This solves the problems, found in industrial scenarios, of difficult and inefficient quantile calculation caused by the conflict between excessive data volume and limited computer memory.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart illustrating a method for calculating an approximate quantile according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating an exemplary histogram update according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating an adjustment of the equal-depth histogram interval boundary according to an embodiment of the present invention;
FIG. 4 is a block diagram of an approximate quantile calculation apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The approximate quantile described in the embodiment of the invention introduces the idea of approximation into quantile calculation: in large-scale data scenarios only an approximate quantile is calculated, which greatly improves operating efficiency and reduces the required memory. Moreover, the approximate result remains usable in practice, because the error magnitude is diluted by the huge data volume and large-scale data sets usually contain noisy data anyway.
Fig. 1 is a schematic flow chart of an approximate quantile calculation method described in an embodiment of the present invention, as shown in fig. 1, including:
step S1, sequentially reading each data information in the internet streaming data set, and updating the equal-depth histogram information until all data in the internet streaming data set are read to obtain target equal-depth histogram information;
and step S2, determining the interval information of the approximate quantile in the target equal-depth histogram information according to the quantile degree information, and obtaining the approximate quantile of the Internet streaming data set.
Specifically, the histogram described in the embodiment of the present invention is an accurate graphical representation of the distribution of numerical data: the range of the data is divided into a plurality of intervals, and the number of data items in each interval is counted. An equal-depth histogram is characterized in that every interval contains the same number of data items, while the interval sizes may differ; the corresponding columns of the histogram have uniform height but different widths. Since computing an exact equal-depth histogram requires at least 2 scans of the data, completing the whole calculation in 1 scan requires dynamically maintaining an approximate equal-depth histogram, and the approximate quantile is then obtained by querying that approximate equal-depth histogram.
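To make the "same depth, different width" property concrete, the following minimal sketch builds an exact equal-depth histogram offline by sorting. The function name `equal_depth_boundaries` is hypothetical, and representing each interval by its smallest and largest member is an assumption made for illustration; this is the non-streaming baseline, not the dynamic method of the embodiment.

```python
def equal_depth_boundaries(values, m):
    """Split `values` into m intervals holding the same number of items.

    Returns one (smallest, largest) pair per interval. For simplicity the
    sketch requires len(values) to be a multiple of m.
    """
    ordered = sorted(values)
    n = len(ordered)
    if m <= 0 or n == 0 or n % m != 0:
        raise ValueError("len(values) must be a positive multiple of m")
    d = n // m  # common depth of every interval
    return [(ordered[k * d], ordered[(k + 1) * d - 1]) for k in range(m)]


# Every interval holds 2 items, but the widths differ widely.
print(equal_depth_boundaries([1, 2, 3, 4, 10, 20, 30, 100], 4))
```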
The internet streaming data set described in the embodiments of the present invention refers to a sequence of data that arrives sequentially, in large volume, rapidly, and continuously. Streaming data needs to be processed incrementally, record by record or over a sliding time window, and can be used for a variety of analyses, including association, aggregation, screening, and sampling.
Therefore, in the embodiment of the present invention, sequentially reading the data information in the internet streaming data set means sequentially reading the streaming data in the order of the streaming data itself.
Updating the equal-depth histogram information, as described in the embodiments of the present invention, means that each time a piece of streaming data is read, the interval of the equal-depth histogram into which it falls is determined, and the data count of that interval is incremented, for example by adding one. After the increment, the histogram may no longer satisfy the equal-depth condition, namely that the absolute value of the difference between the depths of any two intervals is less than or equal to 1; interval adjustment is then performed on the intervals that do not satisfy the equal-depth condition, so that the depths of the histogram are unified again.
After the depths of the equal-depth histogram are unified again, the remaining data information in the internet streaming data set continues to be read in sequence until all data have been read; the equal-depth histogram as finally updated is used as the target equal-depth histogram information.
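The equal-depth condition stated above (depths of any two intervals differ by at most 1) can be checked with a one-line test. This is only a sketch; the function name `satisfies_equal_depth` is hypothetical.

```python
def satisfies_equal_depth(depths):
    """True when the depths of any two intervals differ by at most 1."""
    return max(depths) - min(depths) <= 1


print(satisfies_equal_depth([10, 11, 10]))  # a single increment is tolerated
print(satisfies_equal_depth([10, 12, 10]))  # this state calls for boundary adjustment
```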
The interval containing the approximate quantile is then looked up in the target equal-depth histogram information according to the quantile degree information, and the center value of that interval is used as the calculation result, giving the approximate quantile of the internet streaming data set.
For example, to query the $\phi$-quantile, the $j$-th interval is chosen such that it satisfies
[Formula image BDA0002315451700000051: selection condition for the interval index j]
Then the interval center value $c_j = (l_j + r_j)/2$ is calculated, and $c_j$ is returned as the approximate quantile. Since all intervals have equal depth, the number of data items smaller than $l_j$ is $(j-1) \times d$, where $d = N/m$, so the $j$-th interval covers the data ranked in $((j-1) \times d + 1,\ j \times d]$, i.e.
[Formula image BDA0002315451700000061]
Thus the difference between the rank of the center value $c_j$
[Formula image BDA0002315451700000063]
and the rank of the required quantile
[Formula image BDA0002315451700000062]
must be less than $\epsilon N$, so the returned result satisfies the quantile approximation error constraint.
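The query step can be sketched as below. The original selection condition for the interval is only available as a formula image, so the rule j = ceil(phi * m) used here is an assumption, and the function name `approximate_quantile` and the list-of-pairs layout are hypothetical.

```python
import math


def approximate_quantile(boundaries, phi):
    """Return the center value of the interval assumed to hold the phi-quantile.

    `boundaries` lists one (l_j, r_j) pair per equal-depth interval, in
    ascending order.
    """
    m = len(boundaries)
    if m == 0 or not 0 < phi < 1:
        raise ValueError("need at least one interval and phi in (0, 1)")
    j = min(m, max(1, math.ceil(phi * m)))  # ASSUMED rule; 1-based interval index
    left, right = boundaries[j - 1]
    return (left + right) / 2               # c_j = (l_j + r_j) / 2


intervals = [(0, 10), (10, 15), (15, 30), (30, 100)]
print(approximate_quantile(intervals, 0.5))
```

Only the m boundary pairs are consulted, so the cost of a query does not depend on the number of data items read.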
The embodiment of the invention uses a dynamic equal-depth histogram to maintain an approximate equal-depth histogram in a streaming computation scenario. The approximate equal-depth histogram is updated once each time a piece of internet streaming data is read, so a data stream of any length can be read, and the target equal-depth histogram is finally obtained. The approximate quantile of the streaming data set is obtained from the maintained target equal-depth histogram, and the properties of the equal-depth histogram allow approximate quantile calculation to be completed efficiently for streaming data of any scale. This solves the problems, found in industrial scenarios, of difficult and inefficient quantile calculation caused by the conflict between excessive data volume and limited computer memory.
On the basis of the above embodiment, before the step of sequentially reading each data information in the data set and updating the equal-depth histogram information, the method further includes:
acquiring preset approximate error information;
and obtaining data interval information of the equal-depth histogram according to the preset approximate error information so as to construct the equal-depth histogram according to the data interval information.
Specifically, the approximate error information $\epsilon$ described in the embodiment of the present invention is set in advance, and the number of intervals $m$ required by the equal-depth histogram is calculated from it according to the formula
[Formula image BDA0002315451700000064: number of intervals m as a function of the approximate error]
For example, if $\epsilon = 0.1$, the number of intervals is set to 10; thereafter, for any quantile degree $\phi \in (0, 1)$, an interval can be found such that the absolute value of the difference between the rank of the interval's center value and
[Formula image BDA0002315451700000065]
is within $\epsilon N$. After the data interval information is determined, the equal-depth histogram is built in combination with the internet streaming data set.
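As a small illustration of deriving the number of intervals from the preset approximate error, the sketch below assumes the relation m = ceil(1 / epsilon). The exact formula is only available as an image in the original; this assumption is merely consistent with the stated example (an error of 0.1 gives 10 intervals). The function name `interval_count` is hypothetical.

```python
import math


def interval_count(epsilon):
    """Number of equal-depth intervals for a preset approximate error."""
    if not 0 < epsilon < 1:
        raise ValueError("epsilon must be in (0, 1)")
    return math.ceil(1 / epsilon)  # ASSUMED relation: m = ceil(1 / epsilon)


print(interval_count(0.1))  # matches the example in the text: 10 intervals
```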
In the embodiment of the invention, a preliminary equal-depth histogram is obtained by calculating the number of intervals of the equal-depth histogram, which facilitates the subsequent steps.
On the basis of the above embodiment, the step of sequentially reading each piece of data information in the internet streaming data set and updating the equal-depth histogram information specifically includes:
sequentially reading each data information in the internet streaming data set, and analyzing the equal-depth histogram interval information of the data information to obtain data interval information;
and performing incremental processing on the data statistics number corresponding to the data interval information to update the equal-depth histogram information.
Specifically, sequential reading of the data begins. Before the $(i+1)$-th data item $v_{i+1}$ is input, the maximum value of the data is $\max_i$ and the minimum value is $\min_i$; the left and right boundaries of the $j$-th interval are $l_j$ and $r_j$ respectively, and its current depth (i.e. the number of values it contains) is $d_j$, where $j \in [1, m]$. To satisfy the equal-depth histogram condition, the absolute value of the difference between the depths of any two intervals must be less than or equal to 1.
Fig. 2 is a schematic diagram illustrating an equal-depth histogram update according to an embodiment of the present invention. As shown in Fig. 2, $m = 10$, and for $j = 2$, $l_2 = 1$, $r_2 = 3$, $d_2 = 10$. If $v_{i+1}$ is within the range $[\min_i, \max_i)$, find the $j \in [1, m]$ such that $l_j \le v_{i+1} < r_j$, and update $d_j$ to $d_j + 1$. In the left half of Fig. 2, the search finds that $j = 7$ satisfies the range condition for $v_{i+1}$, so $d_7$ is updated to $d_7 + 1 = 11$. If $v_{i+1}$ is not within the range $[\min_i, \max_i)$, assume $v_{i+1} \ge \max_i$ (the case $v_{i+1} < \min_i$ is handled similarly): $\max_i$ is updated to $v_{i+1}$, $d_m$ is updated to $d_m + 1$, and $r_m$ is updated to $v_{i+1}$, which corresponds to extending the rightmost interval. In the right half of Fig. 2, $v_{i+1} = 61 > \max_i = 56$, so $d_{10}$ is updated to $d_{10} + 1 = 11$, and $r_{10}$ is updated to 61. At the beginning, $\max_0$ is set to $-\infty$, $\min_0$ is set to $+\infty$, and all interval depths are set to 0.
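A single update step as described above can be sketched as follows. This is an illustration under stated assumptions: the `Histogram` class and `update` function are hypothetical names, the intervals are assumed contiguous (each right boundary equals the next left boundary), the minimum and maximum are taken to be the outermost boundaries, and the histogram is assumed to already hold valid boundaries, so the initial empty state is not covered.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List


@dataclass
class Histogram:
    left: List[float]   # l_j, left boundaries in ascending order
    right: List[float]  # r_j, right boundaries
    depth: List[int]    # d_j, number of values counted in interval j


def update(hist, v):
    """Count one new value; return the 0-based index of the interval that grew."""
    m = len(hist.depth)
    if v >= hist.right[-1]:      # v >= max_i: extend the rightmost interval
        hist.right[-1] = v
        j = m - 1
    elif v < hist.left[0]:       # v < min_i: extend the leftmost interval
        hist.left[0] = v
        j = 0
    else:                        # find j with l_j <= v < r_j
        j = bisect_right(hist.left, v) - 1
    hist.depth[j] += 1
    return j


h = Histogram(left=[0, 10, 20], right=[10, 20, 30], depth=[5, 5, 5])
update(h, 12)   # falls inside the second interval
update(h, 61)   # beyond the maximum, so the last interval is stretched to 61
print(h)
```

Binary search over the left boundaries is a design choice of this sketch; a linear scan over the m intervals would serve equally well when m is small.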
On the basis of the above embodiment, after the step of performing increment processing on the statistical number of data corresponding to the data interval information to update the equal-depth histogram information, the method further includes:
acquiring the data statistical number corresponding to each data interval information in the updated equal-depth histogram information;
and carrying out interval boundary adjustment on the data interval information of which the data statistical number does not meet the equal depth condition to obtain adjusted equal depth histogram information.
Specifically, after the data count has been incremented, the histogram may no longer satisfy the equal-depth condition, so boundary adjustment must be performed on the intervals that do not satisfy it. Assume that $d_j = d + 2$ and $d_{j+1} = d$. Then the left boundary of the $(j+1)$-th interval is updated to the maximum value in the $j$-th interval, and the right boundary of the $j$-th interval is updated correspondingly; that is, the maximum data item is taken out of the $j$-th interval and transferred to the $(j+1)$-th interval. After the adjustment, $d_j = d_{j+1} = d + 1$, and the equal-depth condition is satisfied again.
FIG. 3 is a schematic diagram illustrating an adjustment of the interval boundary of the equal-depth histogram according to an embodiment of the present invention. As shown in the upper half of FIG. 3, $d_7 = 12 > d_8$, so the maximum value 35 in the 7th interval is shifted to the 8th interval, and after the update $d_7 = d_8 = 11$.
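The boundary adjustment can be sketched as below. Note the assumption made purely for illustration: each interval keeps its values in a list so that the maximum of an interval can be located. The text does not state how the maximum is obtained in a streaming setting, and the function name `shift_max_right` is hypothetical.

```python
def shift_max_right(values, left, right, j):
    """Move the largest value of interval j into interval j + 1 (0-based j).

    `values[k]` holds the items of interval k (illustrative storage only);
    `left` and `right` hold the boundaries l_k and r_k.
    """
    moved = max(values[j])
    values[j].remove(moved)
    values[j + 1].append(moved)
    left[j + 1] = moved   # interval j + 1 now starts at the moved value
    right[j] = moved      # the right boundary of interval j follows it
    return moved


# Depths 4 and 2 (d + 2 and d) become 3 and 3 (d + 1 each).
values = [[30, 31, 33, 35], [36, 40]]
left, right = [30, 36], [36, 41]
shift_max_right(values, left, right, 0)
print([len(v) for v in values], left, right)
```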
Fig. 4 is a schematic structural diagram of an approximate quantile calculating device according to an embodiment of the present invention, as shown in fig. 4, including: an update module 410 and a calculation module 420; the updating module 410 is configured to sequentially read each data information in the internet streaming data set, and update the equal-depth histogram information until all data in the internet streaming data set is read, so as to obtain target equal-depth histogram information; the calculating module 420 is configured to determine, according to the quantile degree information, interval information of an approximate quantile in the target equal-depth histogram information, and obtain an approximate quantile of the internet streaming data set.
More specifically, the update module is specifically configured to:
acquiring preset approximate error information;
and obtaining data interval information of the equal-depth histogram according to the preset approximate error information so as to construct the equal-depth histogram according to the data interval information.
The update module is further to:
sequentially reading each data information in the internet streaming data set, and analyzing the equal-depth histogram interval information of the data information to obtain data interval information;
and performing incremental processing on the data statistics number corresponding to the data interval information to update the equal-depth histogram information.
The calculation module is specifically configured to:
acquiring the data statistical number corresponding to each data interval information in the updated equal-depth histogram information;
and carrying out interval boundary adjustment on the data interval information of which the data statistical number does not meet the equal depth condition to obtain adjusted equal depth histogram information.
The apparatus provided in the embodiment of the present invention is used to execute the above method embodiments; for the specific process and details, refer to the above embodiments, which are not repeated here.
As in the method embodiments, the apparatus uses a dynamic equal-depth histogram to maintain an approximate equal-depth histogram in a streaming computation scenario. The approximate equal-depth histogram is updated once each time a piece of internet streaming data is read, so a data stream of any length can be read, and the target equal-depth histogram is finally obtained. The approximate quantile of the streaming data set is obtained from the maintained target equal-depth histogram, and the properties of the equal-depth histogram allow approximate quantile calculation to be completed efficiently for streaming data of any scale, which solves the problems of difficult and inefficient quantile calculation caused in industrial scenarios by the conflict between excessive data volume and limited computer memory.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, and as shown in fig. 5, the electronic device may include: a processor (processor)510, a communication Interface (Communications Interface)520, a memory (memory)530 and a communication bus 540, wherein the processor 510, the communication Interface 520 and the memory 530 communicate with each other via the communication bus 540. Processor 510 may call logic instructions in memory 530 to perform the following method: sequentially reading each data information in the internet streaming data set, and updating the equal-depth histogram information until all data in the internet streaming data set are read to obtain target equal-depth histogram information; and determining interval information of the approximate quantiles in the target equal-depth histogram information according to the quantile degree information to obtain the approximate quantile of the Internet streaming data set.
Furthermore, the logic instructions in the memory 530 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
An embodiment of the present invention discloses a computer program product, which includes a computer program stored on a non-transitory computer readable storage medium, the computer program including program instructions, when the program instructions are executed by a computer, the computer can execute the methods provided by the above method embodiments, for example, the method includes: sequentially reading each data information in the internet streaming data set, and updating the equal-depth histogram information until all data in the internet streaming data set are read to obtain target equal-depth histogram information; and determining interval information of the approximate quantiles in the target equal-depth histogram information according to the quantile degree information to obtain the approximate quantile of the Internet streaming data set.
Embodiments of the present invention provide a non-transitory computer-readable storage medium storing server instructions, where the server instructions cause a computer to execute the method provided in the foregoing embodiments, for example, the method includes: sequentially reading each data information in the internet streaming data set, and updating the equal-depth histogram information until all data in the internet streaming data set are read to obtain target equal-depth histogram information; and determining interval information of the approximate quantiles in the target equal-depth histogram information according to the quantile degree information to obtain the approximate quantile of the Internet streaming data set.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. An approximate quantile calculation method, comprising:
sequentially reading each data information in the internet streaming data set, and updating the equal-depth histogram information until all data in the internet streaming data set are read to obtain target equal-depth histogram information;
and determining interval information of the approximate quantiles in the target equal-depth histogram information according to the quantile degree information to obtain the approximate quantile of the Internet streaming data set.
2. The approximate quantile calculation method according to claim 1, wherein before the step of sequentially reading each data information in the internet streaming data set and updating the equal-depth histogram information, the method further comprises:
acquiring preset approximate error information;
and obtaining data interval information of the equal-depth histogram according to the preset approximate error information so as to construct the equal-depth histogram according to the data interval information.
3. The approximate quantile calculation method according to claim 1, wherein the step of sequentially reading each data information in the internet streaming data set and updating the equal-depth histogram information specifically comprises:
sequentially reading each data information in the internet streaming data set, and analyzing the equal-depth histogram interval information of the data information to obtain data interval information;
and performing incremental processing on the data statistics number corresponding to the data interval information to update the equal-depth histogram information.
4. The approximate quantile calculation method of claim 3, wherein after the step of performing incremental processing on the statistical number of data corresponding to the data interval information to update the equal-depth histogram information, the method further comprises:
acquiring the data statistical number corresponding to each data interval information in the updated equal-depth histogram information;
and carrying out interval boundary adjustment on the data interval information of which the data statistical number does not meet the equal depth condition to obtain adjusted equal depth histogram information.
5. An approximate quantile calculation apparatus, comprising:
the updating module is used for reading all data information in the internet streaming data set in sequence and updating the equal-depth histogram information until all data in the internet streaming data set are read to obtain target equal-depth histogram information;
and the calculation module is used for determining the interval information of the approximate quantile in the target equal-depth histogram information according to the quantile degree information to obtain the approximate quantile of the internet streaming data set.
6. The approximate quantile calculation device of claim 5, wherein the update module is specifically configured to:
acquiring preset approximate error information;
and obtaining data interval information of the equal-depth histogram according to the preset approximate error information so as to construct the equal-depth histogram according to the data interval information.
7. The approximate quantile calculation device of claim 5, wherein the update module is further configured to:
sequentially reading each data information in the internet streaming data set, and analyzing the equal-depth histogram interval information of the data information to obtain data interval information;
and performing incremental processing on the data statistics number corresponding to the data interval information to update the equal-depth histogram information.
8. The approximate quantile calculation device of claim 7, wherein the calculation module is specifically configured to:
acquiring the data statistical number corresponding to each data interval information in the updated equal-depth histogram information;
and carrying out interval boundary adjustment on the data interval information of which the data statistical number does not meet the equal depth condition to obtain adjusted equal depth histogram information.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the approximate quantile calculation method according to any of claims 1 to 4 when executing the program.
10. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the steps of the approximate quantile calculation method according to any one of claims 1 to 4.
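As a non-limiting illustration of claims 2 to 4, the following Python sketch derives the number of intervals from a preset approximate error, increments the count of the interval into which each data item falls, and adjusts interval boundaries when an interval becomes too full. The claims do not fix these details, so the following are assumptions made only for this example: the number of intervals is `ceil(1 / epsilon)`; the value range `[lo, hi]` is known in advance; the equal-depth condition is taken to be "no interval holds more than twice the average count"; and the adjustment splits the overfull interval at its midpoint and merges the adjacent pair with the smallest combined count. The class name `EqualDepthHistogram` and all of its members are hypothetical.

```python
import math
from bisect import bisect_right


class EqualDepthHistogram:
    def __init__(self, epsilon: float, lo: float, hi: float):
        # Assumption: about 1/epsilon intervals, each ideally holding an
        # epsilon fraction of the data.
        self.num_buckets = max(2, math.ceil(1.0 / epsilon))
        width = (hi - lo) / self.num_buckets
        self.boundaries = [lo + i * width for i in range(self.num_buckets)] + [hi]
        self.counts = [0] * self.num_buckets
        self.total = 0

    def add(self, value: float) -> None:
        # Find the interval of the value (clamping out-of-range values)
        # and increment its count.
        i = bisect_right(self.boundaries, value) - 1
        i = min(max(i, 0), len(self.counts) - 1)
        self.counts[i] += 1
        self.total += 1
        self._adjust(i)

    def _adjust(self, i: int) -> None:
        # Assumed equal-depth condition: at most twice the average count.
        limit = 2.0 * self.total / self.num_buckets
        if self.counts[i] <= limit or self.counts[i] < 2:
            return
        # Split the overfull interval at its midpoint, assuming its values
        # are spread uniformly.
        lo, hi = self.boundaries[i], self.boundaries[i + 1]
        half = self.counts[i] // 2
        self.boundaries.insert(i + 1, (lo + hi) / 2.0)
        self.counts[i:i + 1] = [half, self.counts[i] - half]
        # Merge the lightest adjacent pair (other than the two new halves)
        # so that the number of intervals stays constant.
        pairs = [k for k in range(len(self.counts) - 1) if k != i]
        j = min(pairs, key=lambda k: self.counts[k] + self.counts[k + 1])
        self.counts[j:j + 2] = [self.counts[j] + self.counts[j + 1]]
        del self.boundaries[j + 1]
```

After any sequence of `add` calls the sketch keeps a constant number of intervals, keeps the boundaries sorted, and keeps the sum of the counts equal to the number of items read.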
CN201911275488.8A 2019-12-12 2019-12-12 Approximate quantile calculation method and device Pending CN110968835A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911275488.8A CN110968835A (en) 2019-12-12 2019-12-12 Approximate quantile calculation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911275488.8A CN110968835A (en) 2019-12-12 2019-12-12 Approximate quantile calculation method and device

Publications (1)

Publication Number Publication Date
CN110968835A true CN110968835A (en) 2020-04-07

Family

ID=70033935

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911275488.8A Pending CN110968835A (en) 2019-12-12 2019-12-12 Approximate quantile calculation method and device

Country Status (1)

Country Link
CN (1) CN110968835A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113434907A (en) * 2021-07-09 2021-09-24 四川大学 Safe and efficient quantile aggregation method and device for private data set
WO2024016731A1 (en) * 2022-07-19 2024-01-25 华为云计算技术有限公司 Data point query method and apparatus, device cluster, program product, and storage medium

Similar Documents

Publication Publication Date Title
CN111294819B (en) Network optimization method and device
CN112258093A (en) Risk level data processing method and device, storage medium and electronic equipment
CN109726195B (en) Data enhancement method and device
CN112434188B (en) Data integration method, device and storage medium of heterogeneous database
CN112734494A (en) Sales prediction method and device, terminal equipment and readable storage medium
CN110968835A (en) Approximate quantile calculation method and device
CN112364014B (en) Data query method, device, server and storage medium
CN111122222B (en) Sample point position determining method and system
CN114116828A (en) Association rule analysis method, device and storage medium for multidimensional network index
CN110324352A (en) Identify the method and device of batch registration account group
CN115952426B (en) Distributed noise data clustering method based on random sampling and user classification method
US11663184B2 (en) Information processing method of grouping data, information processing system for grouping data, and non-transitory computer readable storage medium
CN116451081A (en) Data drift detection method, device, terminal and storage medium
CN111177644A (en) Model parameter optimization method, device, equipment and storage medium
CN116070958A (en) Attribution analysis method, attribution analysis device, electronic equipment and storage medium
CN113850523A (en) ESG index determining method based on data completion and related product
CN114881136A (en) Classification method based on pruning convolutional neural network and related equipment
CN110020728B (en) Service model reinforcement learning method and device
CN114398228A (en) Method and device for predicting equipment resource use condition and electronic equipment
CN110704433B (en) Brin index construction method of columnar storage data, data retrieval method and device
CN113052325A (en) Method, device, equipment, storage medium and program product for optimizing online model
CN113190429A (en) Server performance prediction method and device and terminal equipment
CN113360218A (en) Service scheme selection method, device, equipment and storage medium
CN117056663B (en) Data processing method and device, electronic equipment and storage medium
CN110765303A (en) Method and system for updating database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200407