WO2019214303A1

WO2019214303A1 - Method and device for batch selection of data

Info

Publication number: WO2019214303A1
Application number: PCT/CN2019/074777
Authority: WO
Inventors: 毛坤; 张臻; 李翀
Original assignee: 华为技术有限公司
Priority date: 2018-05-07
Filing date: 2019-02-11
Publication date: 2019-11-14
Also published as: CN110457649A; CN110457649B

Abstract

Provided by the present application are a method and device for batch selection of data, which does not require fully sorting candidate data, thus avoiding multiple repeated calculations of candidate data, saving memory and bandwidth, and improving system efficiency. The method comprises: a data analyzer calculates a data interval to which data among candidate data belongs so as to obtain a statistical result, the statistical result comprising the number of data comprised in each data interval among a plurality of data intervals, and the sum of interval ranges of the each data interval being equal to a data distribution interval range of the candidate data; an interval counter adds up the amount of data comprised in the each data interval respectively according to the statistical result so as to obtain an accumulative result, the accumulative result being the sum of the amount of data comprised in the each data interval and the amount of data comprised in all data intervals before the each data interval; and a batch selector determines a target data interval in which target data is located according to the accumulative result, and outputs candidate data belonging to the target data interval.

Description

Method and device for batch selection of data

Technical field

The present application relates to the field of data processing and, more particularly, to a method and apparatus for batch selection of data.

Background technique

Before a computer processes data, it generally needs to determine target data from a huge amount of candidate data and then further process the target data, for example finding a target person or vehicle in massive video footage in the wave of “Safe City” deployments. For example, when the faster region-based convolutional neural network (Faster R-CNN) is used for image object detection, the input image is passed through a series of convolutional layers and fully connected layers to produce a plurality of candidate windows, and the target is detected among those candidate windows. In the prior art, the candidate data is generally sorted to determine the target data. For ultra-large-scale data, it is increasingly difficult to speed up traditional sorting or selection algorithms by raising the processor's clock frequency. Meanwhile, existing distributed parallel algorithms suffer from problems such as repeated calculations, high memory requirements, and poor scalability. As a result, the selection/sorting process becomes a bottleneck that cannot be overcome and that limits system performance.

How to accurately and quickly find target data in massive data is an urgent problem to be solved.

Summary of the invention

The present application provides a method and apparatus for batch selection of data, which does not need to perform full sorting of candidate data, avoids repeated calculation of candidate data multiple times, saves memory and bandwidth, and improves system efficiency.

In a first aspect, a method for batch selection of data is provided, the method comprising: a data analyzer counts the data interval to which each datum in the candidate data belongs, to obtain a statistical result, where the statistical result includes the number of data included in each of a plurality of data intervals, and the sum of the interval ranges of the data intervals is equal to the data distribution interval range of the candidate data; an interval counter accumulates the number of data included in each data interval according to the statistical result, to obtain an accumulated result, where the accumulated result for each data interval is the sum of the number of data included in that data interval and the number of data included in all data intervals before it; and a batch picker determines, according to the accumulated result, the target data interval in which the target data is located, and outputs the candidate data belonging to the target data interval.

Optionally, when accumulating the number of data included in each data interval, the interval counter may perform a prefix sum operation on the numbers of data included in the data intervals to obtain the accumulated result for each data interval.

Therefore, in the embodiments of the present application, the data intervals are ordered, but the data within each data interval are unordered, so the candidate data does not need to be fully sorted. Outputting the target data requires only two fully parallel scans and one parallel accumulation to complete the batch selection, which avoids repeated calculations on the candidate data, saves memory and bandwidth, and improves system efficiency.

With reference to the first aspect, in some implementations of the first aspect, the data analyzer may be a multi-core processor, a plurality of parallel processors, or a multi-threaded processor, or the data analyzer may be a combination of the multi-core processor, the plurality of parallel processors, and the multi-threaded processor.

With reference to the first aspect, in some implementations of the first aspect, the interval configurator may be a multi-core processor, a plurality of parallel processors, or a multi-threaded processor, or the interval configurator may be a combination of the multi-core processor, the plurality of parallel processors, and the multi-threaded processor.

With reference to the first aspect, in some implementations of the first aspect, the batch picker may be a multi-core processor, a plurality of parallel processors, or a multi-threaded processor, or the batch picker may be a combination of the multi-core processor, the plurality of parallel processors, and the multi-threaded processor.

With reference to the first aspect, in some implementations of the first aspect, each data interval corresponds to a counter, the counter is configured to record the number of data in that data interval, and when the data analyzer determines that a datum belongs to the data interval, the counter corresponding to the data interval is incremented by 1.

With reference to the first aspect, in some implementations of the first aspect, before the data analyzer counts the data interval to which the data in the candidate data belongs, the method further includes: an interval configurator determines, according to data information of the candidate data, the number of the plurality of data intervals and the range of each of the plurality of data intervals; and the interval configurator sends the plurality of data intervals and the range of each of the plurality of data intervals to the data analyzer.

At this time, the interval configurator determines the number of the plurality of data intervals and the range of each of the plurality of data intervals according to the data information of the candidate data, so that the result of the subsequent batch selection can be more accurate.

With reference to the first aspect, in some implementations of the first aspect, the interval configurator determining, according to the data information of the candidate data, the number of the plurality of data intervals and the range of each of the plurality of data intervals includes: when the candidate data is uniformly distributed, determining the number of the plurality of data intervals and the range of each of the plurality of data intervals according to a uniform quantization strategy, where the ranges of all data intervals are equal; or when the candidate data is non-uniformly distributed, determining the number of the plurality of data intervals and the range of each of the plurality of data intervals according to a non-uniform quantization strategy, where the ranges of at least two of the plurality of data intervals are not equal.
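As an illustration of the non-uniform case only, the following minimal sketch maps a value to its data interval by binary search over a sorted list of interval boundaries. The function name `interval_index_nonuniform`, the boundary list `edges`, and the convention that the last interval is closed on the right are assumptions made for this example; the application does not prescribe how the unequal ranges are chosen or searched.

```python
from bisect import bisect_right

def interval_index_nonuniform(value, edges):
    """Return the index of the interval that contains value.

    edges = [e0, e1, ..., eM] is sorted; interval i is [e_i, e_(i+1)),
    except the last interval, which is assumed closed: [e_(M-1), e_M].
    """
    i = bisect_right(edges, value) - 1   # rightmost boundary <= value
    return min(i, len(edges) - 2)        # fold value == e_M into the last interval
```

With `edges = [0, 1, 2, 4, 9]` the four intervals have ranges 1, 1, 2, and 5, so narrower intervals can be placed where the candidate data is denser.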

With reference to the first aspect, in some implementations of the first aspect, when the candidate data is uniformly distributed and the range of each data interval is Δ, determining the number of the plurality of data intervals and the range of each of the plurality of data intervals according to the uniform quantization strategy includes:

Determining the number M of the plurality of data intervals according to the formula (1),

M=x/Δ (1)

Where x is the data distribution interval range of the candidate data, and M is the number of the plurality of data intervals.
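Formula (1) can be illustrated with a short sketch. The helper names `uniform_intervals` and `interval_index` are hypothetical, and rounding M up when x is not an exact multiple of Δ is an assumption of this example rather than something stated in the application.

```python
import math

def uniform_intervals(lo, hi, delta):
    """Return (M, edges) for the data distribution interval [lo, hi]."""
    x = hi - lo                     # data distribution interval range
    m = math.ceil(x / delta)        # formula (1); rounding up is an assumption
    edges = [lo + i * delta for i in range(m)] + [hi]
    return m, edges

def interval_index(value, lo, delta, m):
    """Map a value to its interval index; the last interval is closed on the right."""
    return min(int((value - lo) // delta), m - 1)
```

For the range [0, 9] with Δ = 3 this gives M = 3 and the boundaries 0, 3, 6, 9, matching the intervals [0, 3), [3, 6), [6, 9] used in the detailed description.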

In conjunction with the first aspect, in some implementations of the first aspect, the method further comprises:

Determining the number M of the plurality of data intervals according to the number of the candidate data and the number of target data to be output;

Determining the range Δ of each data interval according to equation (1),

M=x/Δ (1)

With reference to the first aspect, in some implementations of the first aspect, the interval counter accumulating the number of data included in the plurality of data intervals according to the statistical result includes:

When the target data is the smallest part of the candidate data, accumulating the number of data included in the plurality of data intervals in ascending order of the plurality of data intervals; or

When the target data is the largest part of the candidate data, accumulating the number of data included in the plurality of data intervals in descending order of the plurality of data intervals.

With reference to the first aspect, in some implementations of the first aspect, the data analyzer, the interval counter, and the batch picker are the same physical entity or partially the same physical entity.

In a second aspect, an apparatus for batch selection of data is provided, the apparatus comprising:

A data analyzer, configured to count the data interval to which each datum in the candidate data belongs, to obtain a statistical result, where the statistical result includes the number of data included in each of a plurality of data intervals, and the sum of the interval ranges of the data intervals is equal to the data distribution interval range of the candidate data;

An interval counter, configured to accumulate the number of data included in each data interval according to the statistical result, to obtain an accumulated result, where the accumulated result for each data interval is the sum of the number of data included in that data interval and the number of data included in all data intervals before it;

A batch picker, configured to determine, according to the accumulated result, the target data interval in which the target data is located, and to output the candidate data belonging to the target data interval.

In conjunction with the second aspect, in some implementations of the second aspect, the apparatus further comprises:

An interval configurator, configured to determine, according to the data information of the candidate data, the number of the plurality of data intervals and the range of each of the plurality of data intervals, and to send the plurality of data intervals and the range of each of the plurality of data intervals to the data analyzer.

With reference to the second aspect, in some implementations of the second aspect, the interval configurator is specifically configured to: when the candidate data is uniformly distributed, determine the number of the plurality of data intervals and the range of each of the plurality of data intervals according to a uniform quantization strategy, where the ranges of all data intervals are equal; or when the candidate data is non-uniformly distributed, determine the number of the plurality of data intervals and the range of each of the plurality of data intervals according to a non-uniform quantization strategy, where the ranges of at least two of the plurality of data intervals are not equal.

With reference to the second aspect, in some implementations of the second aspect, when the candidate data is uniformly distributed and the range of each data interval is Δ, the interval configurator is specifically configured to determine the number M of the plurality of data intervals according to formula (1):

M=x/Δ (1)

In conjunction with the second aspect, in some implementations of the second aspect, the interval configurator is specifically configured to:

Determining the range Δ of each data interval according to equation (1),

M=x/Δ (1)

With reference to the second aspect, in some implementations of the second aspect, the interval counter is specifically configured to: when the target data is the smallest part of the candidate data, perform a prefix sum operation on the numbers of data included in the plurality of data intervals in ascending order of the plurality of data intervals; or when the target data is the largest part of the candidate data, perform a prefix sum operation on the numbers of data included in the plurality of data intervals in descending order of the plurality of data intervals.

With reference to the second aspect, in some implementations of the second aspect, the data analyzer, the interval counter, and the batch picker are the same physical device or parts of the same physical device.

In a third aspect, a computer storage medium is provided, where the computer storage medium stores program instructions, and when the instructions are executed, the method in the first aspect or in any implementation of the first aspect is performed.

In a fourth aspect, a computer program product is provided, the computer program product comprising instructions that, when executed, cause the device for batch selection of data to perform the method in the first aspect or in any implementation of the first aspect.

In a seventh aspect, a chip system is provided, comprising at least one processor configured to execute stored instructions, so that the device for batch selection of data can perform the method in the first aspect or in any optional implementation of the first aspect.

Description of the drawings

FIG. 1 is a schematic block diagram of a system architecture for a method and apparatus for data batch selection in accordance with the present application.

FIG. 2 is a schematic flowchart of a method for data batch selection in the present application.

FIG. 3 is a schematic diagram of accumulating the numbers of data in the data intervals by prefix sum according to the present application.

FIG. 4 is a schematic diagram of accumulating the numbers of data in the data intervals by prefix sum according to the present application.

FIG. 5 is a schematic flowchart of a method for data batch selection according to the present application.

FIG. 6 is a schematic block diagram of an apparatus for data batch selection in accordance with the present application.

FIG. 7 is a schematic architectural diagram of a system for data batch selection in accordance with the present application.

FIG. 8 shows a schematic block diagram of an apparatus for batch selection of data provided by the present application.

Detailed description

The technical solutions in the present application will be described below with reference to the accompanying drawings.

FIG. 1 is a schematic block diagram of the architecture of a system 100 for a method and apparatus for data batch selection in accordance with the present application. As shown in FIG. 1, the architecture of the system 100 includes a front-end collection device 110, a storage management device 120, and an intelligent analysis device 130, which are connected through a network. The front-end collection device 110 is configured to capture objects, for example human bodies, human faces, and vehicle bodies. The front-end collection device 110 transmits the captured information to the storage management device 120, and the storage management device 120 performs feature extraction on the information captured by the front-end collection device 110 and transmits the feature-extracted data to the intelligent analysis device 130. The intelligent analysis device 130 performs batch selection based on the extracted data and outputs a detection target.

It should be noted that FIG. 1 is only an exemplary architecture diagram. The system architecture may include other devices in addition to the devices shown in FIG. 1.

The technical solutions of the embodiments of the present application can be applied to various fields. In the field of deep learning, every detection approach based on candidate regions has to use a sorting algorithm, and the algorithm of the present application can replace that sorting step to increase speed. The solutions are equally applicable to other fields in which results need to be sorted and then selected.

Furthermore, various aspects or features of the present application can be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques. The term "article of manufacture" as used in this application encompasses a computer program accessible from any computer-readable device, carrier, or medium. For example, a computer-readable medium may include, but is not limited to, magnetic storage devices (e.g., a hard disk, a floppy disk, or a magnetic tape), optical discs (e.g., a compact disc (CD) or a digital versatile disc (DVD)), smart cards, and flash memory devices (e.g., erasable programmable read-only memory (EPROM), cards, sticks, or key drives). Additionally, the various storage media described herein can represent one or more devices and/or other machine-readable media for storing information. The term "machine-readable medium" may include, but is not limited to, a variety of media capable of storing, containing, and/or carrying instructions and/or data.

A method for data batch selection provided by the present application is described in detail below with reference to FIG. 2. FIG. 2 is a schematic flowchart of a method 200 for data batch selection according to an embodiment of the present application. The method 200 can be applied to the system architecture shown in FIG. 1, although the embodiments of the present application are not limited thereto.

As shown in FIG. 2, the method 200 includes the following.

Step 210: The data analyzer counts the data interval to which each datum in the candidate data belongs, to obtain a statistical result, where the statistical result includes the number of data included in each of a plurality of data intervals, and the sum of the interval ranges of the data intervals is equal to the data distribution interval range of the candidate data.

Optionally, the data analyzer may be a multi-core processor, a plurality of parallel processors, or a multi-threaded processor, or the data analyzer may be a combination of the multi-core processor, the plurality of parallel processors, and the multi-threaded processor.

Specifically, take the case where the data analyzer is a plurality of parallel processors. To improve the computing speed of the system, the number of data that each processor is responsible for counting is generally equal or approximately equal; that is, the load balancing principle is satisfied. The candidate data is distributed evenly across the plurality of parallel processors, and each processor counts the data intervals to which its allocated data belong, to obtain the statistical result. For example, suppose there are 9 candidate data whose data distribution interval range is [0, 9], the data are 1, 2, 3, 4, 5, 6, 7, 8, and 9, and the data intervals are [0, 3), [3, 6), and [6, 9]. The data analyzer is three parallel processors, and according to the load balancing principle each processor is responsible for counting three data. That is, the first processor counts the data intervals to which the first to third of the nine data belong, the second processor counts the data intervals to which the fourth to sixth data belong, and the third processor counts the data intervals to which the seventh to ninth data belong; alternatively, the first processor counts the data intervals to which the first, fourth, and seventh data belong, the second processor counts the data intervals to which the second, fifth, and eighth data belong, and the third processor counts the data intervals to which the third, sixth, and ninth data belong. According to the statistics of the data analyzer, the number of data included in the data interval [0, 3) is 2, the number of data included in the data interval [3, 6) is 3, and the number of data included in the data interval [6, 9] is 4.

It should be understood that the specific form of how to assign candidate data to the data analyzer under the condition that the load balancing principle is satisfied is not limited in this application.

Optionally, each data interval corresponds to a counter for recording the number of data in that data interval. When the data analyzer determines that a datum belongs to the data interval, the counter corresponding to the data interval is incremented by 1.

It should be understood that each data interval may instead correspond to a memory space that is used to record the number of data in the data interval. When any processor determines that a datum belongs to the data interval, the value in the memory space corresponding to the data interval is incremented by 1.
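Step 210 can be sketched as follows for the nine-value example. This is a minimal illustration, not the claimed implementation: the names `EDGES`, `interval_of`, `count_chunk`, and `count_intervals` are hypothetical, a thread pool stands in for the parallel processors, and each worker keeps private counters that are merged afterwards, which is an assumption made to avoid contention on shared counters.

```python
from concurrent.futures import ThreadPoolExecutor

EDGES = [0, 3, 6, 9]  # intervals [0, 3), [3, 6), [6, 9]

def interval_of(value):
    """Index of the interval containing value (values assumed to lie in [0, 9])."""
    for i in range(len(EDGES) - 1):
        if EDGES[i] <= value < EDGES[i + 1]:
            return i
    return len(EDGES) - 2  # value equals the upper bound of the whole range

def count_chunk(chunk):
    """One worker's scan: increment a private counter per interval."""
    local = [0] * (len(EDGES) - 1)
    for v in chunk:
        local[interval_of(v)] += 1
    return local

def count_intervals(data, workers=3):
    """Split the data evenly, count in parallel, then merge the counters."""
    chunks = [data[w::workers] for w in range(workers)]  # 1st/4th/7th, 2nd/5th/8th, ...
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_chunk, chunks))
    return [sum(column) for column in zip(*partials)]
```

Calling `count_intervals([1, 2, 3, 4, 5, 6, 7, 8, 9])` reproduces the counts 2, 3, and 4 from the example. The strided split corresponds to the second allocation described above; a contiguous split would give the same result.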

Step 220: The interval counter accumulates the number of data included in the plurality of data intervals according to the statistical result, to obtain an accumulated result, where the accumulated result for each data interval is the sum of the number of data included in that data interval and the number of data included in all data intervals before it.

Specifically, for example, the above nine candidate data are assigned to three data intervals, [0, 3), [3, 6), and [6, 9], and the interval counter determines that the number of data included in [0, 3) is 2, the number of data included in [0, 6) is 5, and the number of data included in [0, 9] is 9.

Optionally, the interval counter may be a multi-core processor, a plurality of parallel processors, or a multi-threaded processor, or the interval counter may be a combination of the multi-core processor, the plurality of parallel processors, and the multi-threaded processor.

Optionally, the interval counter and the data analyzer may be the same physical entity or partially the same physical entity, and the physical entity may be a physical device or apparatus. For example, if the data analyzer is three parallel processors, the interval counter may also be those three parallel processors, or the interval counter may be one or two of the three parallel processors.

Step 230: The batch picker determines, according to the accumulated result, a target data interval in which the target data is located, and outputs candidate data belonging to the target data interval.

Specifically, the target data is data that needs to be selected in the candidate data, and the batch picker determines a target data interval in which the target data is located according to the accumulated result of the interval statistic, and outputs candidate data belonging to the target data interval.

Optionally, the batch picker may be a multi-core processor, a plurality of parallel processors, or a multi-threaded processor, or the batch picker may be a combination of the multi-core processor, the plurality of parallel processors, and the multi-threaded processor.

Optionally, each parallel processor in the batch picker may determine, according to the accumulated result, the target data interval in which the target data is located, and output the candidate data belonging to the target data interval; or one of the parallel processors in the batch picker determines the target data interval according to the accumulated result and sends the target data interval to the other parallel processors, and each parallel processor in the batch picker outputs the candidate data belonging to the target data interval.

Specifically, take a plurality of parallel processors as an example. The target is to output the smallest two of the above nine candidate data, so the batch picker determines that the target data interval is [0, 3). Assume that the batch picker is three parallel processors and that, according to the load balancing principle, each processor is responsible for three data: the data processed by the first processor are 1, 2, 3; the data processed by the second processor are 4, 5, 6; and the data processed by the third processor are 7, 8, 9. According to the target data interval, the first processor outputs 1 and 2, and the second processor and the third processor have no output.
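Steps 210 to 230 can be put together in a short sequential sketch. The function name `select_smallest` is hypothetical, and the sketch assumes that every interval up to and including the target data interval is output and that k does not exceed the number of candidates; the two list passes correspond to the two scans and would each be split across processors in a parallel setting.

```python
from itertools import accumulate

def select_smallest(data, edges, k):
    """Return the candidates in all intervals up to the one holding the k-th smallest."""
    m = len(edges) - 1

    def idx(v):
        for i in range(m):
            if edges[i] <= v < edges[i + 1]:
                return i
        return m - 1                      # last interval is closed on the right

    counts = [0] * m
    for v in data:                        # scan 1: count data per interval
        counts[idx(v)] += 1
    cumulative = list(accumulate(counts)) # accumulation (prefix sum)
    target = next(i for i, c in enumerate(cumulative) if c >= k)
    return [v for v in data if idx(v) <= target]   # scan 2: output, unsorted
```

For the data 1 to 9 with intervals [0, 3), [3, 6), [6, 9] and k = 2, the accumulated result is 2, 5, 9, the target data interval is [0, 3), and the output is 1, 2. Because whole intervals are output, the result can contain more than k items (for k = 3 it contains five), which is the trade-off of not fully sorting.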

Optionally, the batch picker, the data analyzer, and the interval counter may be the same physical entity or partially the same physical entity, and the physical entity may be a physical device or apparatus. For example, if the data analyzer is three parallel processors, the batch picker may also be those three parallel processors.

In the embodiments of the present application, apart from the input/output data space N, the only additional space required is M storage locations, or M counters, for the number of data included in each of the M data intervals. Let the number of input data be n and the number of parallel processors be p. The time complexity for the data analyzer to count the data intervals to which the candidate data belong is O(n/p): each parallel processor analyzes n/p inputs to determine which interval's counter should be incremented. The time complexity for the interval counter to accumulate the numbers of the plurality of data intervals according to the statistical result is O(log M) when p ≥ M. The time complexity for the batch picker to determine the target data interval according to the accumulated result and output the data is O(n/p): each parallel processor decides, for each of n/p inputs, whether or not to output it. The present application therefore scales well: the number of parallel processors can be increased up to p = n while maintaining performance. When p = n, according to the performance formula O(n/p) + O(log M) + O(n/p), the time complexity of the process is O(2) + O(log M).
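The performance formula above can be written out as a short derivation. The symbol T for the total running time is introduced here only for illustration, and the constant terms are simplified in the usual asymptotic sense.

```latex
\begin{aligned}
T(n, p, M) &= O\!\left(\tfrac{n}{p}\right) + O(\log M) + O\!\left(\tfrac{n}{p}\right), \qquad p \ge M,\\
T(n, n, M) &= O(1) + O(\log M) + O(1) = O(\log M) \qquad (p = n).
\end{aligned}
```

In other words, once every input has its own processor, the only term that still grows is the logarithmic cost of accumulating the M interval counts.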

Optionally, the interval counter accumulating the number of data included in the plurality of data intervals according to the statistical result includes:

When the target data is the smallest part of the candidate data, accumulating the number of data included in the plurality of data intervals in ascending order of the plurality of data intervals; or

When the target data is the largest part of the candidate data, accumulating the number of data included in the plurality of data intervals in descending order of the plurality of data intervals.

Specifically, when selecting the nth to mth largest data in the candidate data (for example, the 100 largest data, that is, n = 1 and m = 100; or the 50th to 90th largest data, that is, n = 50 and m = 90), the numbers of data included in the plurality of data intervals are accumulated in descending order of the plurality of data intervals; and when selecting the qth to pth smallest data, the numbers of data included in the plurality of data intervals are accumulated in ascending order of the plurality of data intervals.
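The effect of the accumulation direction can be illustrated with a small sketch that locates the interval holding the k-th smallest or k-th largest datum. The function name `target_interval` and the `largest` flag are hypothetical, and the sketch handles a single rank k rather than a rank range such as the nth to mth.

```python
from itertools import accumulate

def target_interval(counts, k, largest=False):
    """Index of the interval that holds the k-th smallest (or k-th largest) datum."""
    if largest:
        order = range(len(counts) - 1, -1, -1)   # descending order of intervals
    else:
        order = range(len(counts))               # ascending order of intervals
    running = accumulate(counts[i] for i in order)
    for i, total in zip(order, running):
        if total >= k:
            return i
    raise ValueError("k exceeds the number of candidates")
```

With the counts 2, 3, 4 from the running example, the 2nd smallest datum lies in interval 0, whereas the 2nd largest lies in interval 2 and the 5th largest in interval 1.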

Specifically, the interval counter may use a prefix sum to calculate the accumulated sum of the number of data included in each data interval. The prefix sum is a cumulative summation algorithm, defined as follows:

Input: $x_0, x_1, x_2, x_3, \ldots, x_n$

Output: $y_0, y_1, y_2, y_3, \ldots, y_n$

Where $y_0 = x_0$,

$y_1 = x_0 + x_1$,

$y_2 = x_0 + x_1 + x_2$,

$y_3 = x_0 + x_1 + x_2 + x_3$,

......

$y_n = x_0 + x_1 + x_2 + x_3 + \ldots + x_n$

That is, each output element is the sum of the inputs from the first element up to the current position.
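As a sequential reference for this definition, the inclusive prefix sum can be computed with the Python standard library; the wrapper name `prefix_sum` is chosen here for illustration only.

```python
from itertools import accumulate

def prefix_sum(xs):
    """Inclusive prefix sum: ys[i] = xs[0] + xs[1] + ... + xs[i]."""
    return list(accumulate(xs))
```

Applied to the interval counts 2, 3, 4 from the earlier example, it yields 2, 5, 9, the accumulated result for the intervals [0, 3), [0, 6), and [0, 9].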

The following describes in detail how the prefix sum algorithm is used to accumulate the numbers of data included in the plurality of data intervals.

When the number of data intervals is less than or equal to twice the number of parallel processors included in the interval counter, the accumulation can be performed according to the following steps:

(1) Each parallel processor calculates the sum of the counts of two consecutive data intervals. (Assume that the number of data intervals is 8, with counts from left to right $x_0, x_1, x_2, x_3, x_4, x_5, x_6, x_7$, and that the number of parallel processors is 20. As shown at d = 0 in Figure 3, processor 1 calculates $x_0 + x_1$, processor 2 calculates $x_2 + x_3$, processor 3 calculates $x_4 + x_5$, and processor 4 calculates $x_6 + x_7$.)

(2) Recursively use processors to calculate the sums of consecutive pairs of the values updated in the previous step. (As shown at d = 1 and d = 2 in Figure 3, processor 5 calculates $\Sigma(x_0, x_1) + \Sigma(x_2, x_3)$, processor 6 calculates $\Sigma(x_4, x_5) + \Sigma(x_6, x_7)$, and processor 7 calculates $\Sigma(x_0 \ldots x_3) + \Sigma(x_4 \ldots x_7)$.) If the number of data intervals is not a power of 2, the last updated result is carried over to a later recursion step.

(3) At the end of the recursion, the last element is the value of $y_n$ (the rightmost value in the top row of Figure 3, $\Sigma(x_0 \ldots x_3) + \Sigma(x_4 \ldots x_7)$). This value is recorded, and its position is then filled with 0 (as in the top row of Figure 4).

(4) Recurse in the reverse order of the recursion above (d = 0, d = 1, d = 2 in Figure 4, from top to bottom): first use one processor to process the values of step $d_2$ of the above recursion, then use two processors to process the values of step $d_1$, and so on, until the recursion ends.

In the reverse-order recursion, processor 8 moves the saved "0" left to the position corresponding to data interval $x_3$ (dashed line in step $d_0$ of Figure 4), and adds the value $\Sigma(x_0 \ldots x_3)$ displaced by the left move to the saved value "0" to form the new value (solid line in step $d_0$ of Figure 4). Processor 9 moves the saved "0" left to the position corresponding to data interval $x_1$ (dashed line in step $d_1$ of Figure 4), and adds the displaced value $\Sigma(x_0, x_1)$ to the saved value "0" to form the new value (solid line in step $d_1$ of Figure 4). Processor 10 moves the saved "$\Sigma(x_0 \ldots x_3)$" left to the position corresponding to data interval $x_5$ (dashed line in step $d_1$ of Figure 4), and adds the displaced value $\Sigma(x_4, x_5)$ to the saved value "$\Sigma(x_0 \ldots x_3)$" to form the new value (solid line in step $d_1$ of Figure 4). And so on, until the values of $y_0, y_1, \ldots, y_{n-1}$ are obtained.

(5) At the end of the recursion, the values of $y_0, y_1, \ldots, y_{n-1}$ are obtained. Combined with the previously recorded value of $y_n$, the prefix sum is complete.
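Steps (1) to (5) resemble the well-known up-sweep/down-sweep (work-efficient) parallel scan. The following sequential sketch mirrors that structure under the assumption that the number of data intervals is a power of two; the function name `blelloch_scan` is hypothetical, and the inner loops, whose iterations are independent of each other, stand in for the work that would be spread across parallel processors.

```python
def blelloch_scan(xs):
    """Inclusive prefix sum via up-sweep and down-sweep (power-of-two length)."""
    n = len(xs)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes a power-of-two length"
    a = list(xs)

    # Steps (1)-(2), up-sweep: combine pairs of partial sums level by level.
    step = 1
    while step < n:
        for i in range(2 * step - 1, n, 2 * step):
            a[i] += a[i - step]
        step *= 2

    # Step (3): record the grand total, then fill its position with 0.
    total = a[-1]
    a[-1] = 0

    # Step (4), down-sweep: walk the levels in reverse order.
    step = n // 2
    while step >= 1:
        for i in range(2 * step - 1, n, 2 * step):
            left = a[i - step]
            a[i - step] = a[i]   # move the saved value to the left position
            a[i] += left         # add the displaced left value to the saved value
        step //= 2

    # Step (5): a now holds the exclusive scan; append the recorded total.
    return a[1:] + [total]
```

For the input 1, 2, ..., 8 the result is 1, 3, 6, 10, 15, 21, 28, 36. A count list whose length is not a power of two could be padded with zeros before calling the function; that padding is a simplification of this sketch, not the carry-over handling described in step (2).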

When the number of data intervals is greater than twice the number of parallel processors included in the accumulator, the accumulation calculation can be performed according to the following steps:

(1) The data intervals are divided into multiple blocks, where the number of data intervals in each block is less than or equal to twice the number of parallel processors.

(2) Each block computes its own prefix sum using the method described above for the case where the number of data intervals is less than or equal to twice the number of parallel processors.

(3) The last value of each block (i.e., the y_n recorded in step (3) of the above method) forms a new auxiliary array (the auxiliary block), and the prefix sum of the auxiliary block is computed using the same method.

(4) Block 0 is left unchanged. Each element of block 1 (y₀ ... y_n of that block) is increased by y₀ of the auxiliary block, each element of block 2 by y₁ of the auxiliary block, each element of block 3 by y₂ of the auxiliary block, ..., and each element of block m by y_(m-1) of the auxiliary block. This completes the prefix sum.
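The four block-wise steps above can be sketched in Python as follows. This is an illustrative sketch only: `blocked_prefix_sum` and `block_size` are hypothetical names, and a simple sequential running sum (`itertools.accumulate`) stands in for the per-block parallel method. The sketch assumes that entry y_(m-1) of the auxiliary block holds the cumulative total of blocks 0 to m-1.

```python
from itertools import accumulate


def blocked_prefix_sum(counts, block_size):
    """Running (inclusive) prefix sum computed block by block."""
    # Step (1): split the interval counts into blocks.
    blocks = [counts[i:i + block_size] for i in range(0, len(counts), block_size)]

    # Step (2): each block computes its own prefix sum independently.
    scanned = [list(accumulate(block)) for block in blocks]

    # Step (3): the last value of each block forms the auxiliary block,
    # which is itself prefix-summed.
    auxiliary = list(accumulate(block[-1] for block in scanned))

    # Step (4): block 0 is unchanged; block m adds auxiliary[m - 1].
    for m in range(1, len(scanned)):
        offset = auxiliary[m - 1]
        scanned[m] = [value + offset for value in scanned[m]]

    return [value for block in scanned for value in block]


print(blocked_prefix_sum([1, 2, 3, 4, 5, 6, 7], 3))  # [1, 3, 6, 10, 15, 21, 28]
```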

It should be understood that, in step 210, when the data analyzer counts the data intervals to which the data in the candidate data belongs, the plurality of data intervals and the range of each of the plurality of data intervals have already been allocated to the data analyzer. Optionally, the plurality of data intervals and the range of each data interval are saved in a shared memory, and the data analyzer obtains them by reading the shared memory; or they are stored in a memory local to the data analyzer.

If the data analyzer has not obtained the plurality of data intervals and the range of each of the plurality of data intervals before it counts the data intervals to which the data in the candidate data belongs, the method 200 further includes step 240 before step 210, as shown in the figure.

In step 240, the interval configurator determines, according to the data information of the candidate data, the number of the plurality of data intervals and the range of each of the plurality of data intervals, and sends the plurality of data intervals and the range of each data interval to the data analyzer.

Optionally, the interval configurator can allocate candidate data to the data analyzer according to a load balancing principle.

It should be understood that the candidate data may also be received by other components in the embodiment of the present application, and then the candidate data is allocated to the data analyzer, which is not limited in this application.

Optionally, the interval configurator determines, according to the data information of the candidate data, the number of the plurality of data intervals and the range of each of the plurality of data intervals, including:

When the candidate data is uniformly distributed, determining the number of the plurality of data intervals and the range of each of the plurality of data intervals according to a uniform quantization strategy, where the ranges of all the data intervals are equal; or

When the candidate data is non-uniformly distributed, determining the number of the plurality of data intervals and the range of each of the plurality of data intervals according to a non-uniform quantization strategy, where the ranges of at least two of the plurality of data intervals are not equal.

Specifically, when the candidate data is uniformly or approximately uniformly distributed, the number of the plurality of data intervals and the range of each of the plurality of data intervals may be determined according to the uniform quantization strategy. When the candidate data is non-uniformly or extremely unevenly distributed (that is, equal-width intervals would cause a serious imbalance in the amount of data between intervals), the number of the plurality of data intervals and the range of each data interval are determined according to the non-uniform quantization strategy.

When the candidate data is uniformly distributed and the range of each data interval is Δ, determining the number of the plurality of data intervals and the range of each of the plurality of data intervals according to the uniform quantization strategy includes determining the number M of data intervals according to formula (1):

M=x/Δ (1)

where x is the data distribution range of the candidate data and M is the number of the plurality of data intervals.

Specifically, when the candidate data is uniformly distributed, the probability distribution information of the candidate data does not need to be known; the number M of the plurality of data intervals can be determined directly from the uniform quantization formula, that is, formula (1).

For example, suppose a set of candidate data 7, 3, 9, 1, 5 is uniformly distributed over the range 0 to 10. When the range of each data interval is 2, formula (1) gives M = 10/2 = 5 data intervals, whose ranges are [0, 2), [2, 4), [4, 6), [6, 8), [8, 10).
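Formula (1) and the interval ranges in this example can be reproduced with a short Python sketch. The helper names `uniform_intervals` and `interval_index` are hypothetical, and half-open intervals [a, b) are assumed, as in the example.

```python
def uniform_intervals(lo, hi, delta):
    """Return the M equal-width intervals as (start, end) pairs."""
    m = round((hi - lo) / delta)          # formula (1): M = x / delta
    return [(lo + i * delta, lo + (i + 1) * delta) for i in range(m)]


def interval_index(value, lo, delta):
    """Index i of the half-open interval [lo + i*delta, lo + (i+1)*delta)."""
    return int((value - lo) // delta)


intervals = uniform_intervals(0, 10, 2)
counts = [0] * len(intervals)
for value in [7, 3, 9, 1, 5]:
    counts[interval_index(value, 0, 2)] += 1

print(intervals)  # [(0, 2), (2, 4), (4, 6), (6, 8), (8, 10)]
print(counts)     # [1, 1, 1, 1, 1]
```

Because the interval index is computed directly from the value, each datum is classified in constant time without comparing it against other data.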

Further, the number M of the plurality of data intervals may first be determined from the number of candidate data and the number of target data to be output, after which the range Δ of each data interval is determined according to formula (1).

Specifically, when the candidate data is uniformly distributed and the range Δ of each data interval has not yet been determined, M can be determined from the number of candidate data and the number of target data to be output, and Δ then follows from formula (1).

For example, if the total number of candidate data is 9 and the target data is the largest three of the candidate data, dividing the total number of candidate data, 9, by the number of data to be selected, 3, gives M = 3; the range Δ of each data interval is then determined according to formula (1).

When the candidate data is non-uniformly distributed, determining the number of the plurality of data intervals and the range of each of the plurality of data intervals according to the non-uniform quantization strategy requires the probability distribution information of the candidate data. The number of data intervals and the range of each data interval are determined from this probability distribution information in combination with the non-uniform quantization strategy, so that more data intervals are allocated to the dense portions of the candidate data and fewer to the sparse portions.

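For example, suppose the probability density function of the candidate data is f(x) and the data is to be divided into M classes. The selected non-uniform quantization strategy uses the Lloyd-Max method to convert the problem into a distortion minimization problem, that is, the distortion to be minimized is

$$\sigma_q^2 = \sum_{i=1}^{M} \int_{b_{i-1}}^{b_i} (x - y_i)^2 f(x)\,dx \qquad (2)$$

In equation (2), given M, the optimal b_i and y_i minimize the mean squared quantization error (MSQE), i.e.

$$\frac{\partial \sigma_q^2}{\partial b_i} = 0, \qquad \frac{\partial \sigma_q^2}{\partial y_i} = 0$$

which gives:

$$y_i = \frac{\int_{b_{i-1}}^{b_i} x\,f(x)\,dx}{\int_{b_{i-1}}^{b_i} f(x)\,dx}, \qquad b_i = \frac{y_i + y_{i+1}}{2}$$

where b_i are the boundary points of the plurality of data intervals and y_i is the representative value of the i-th data interval.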

A specific example is given below to describe the non-uniform quantization strategy in detail. Suppose the candidate data 9, 4, 5, 6, 1 is non-uniformly distributed: the data is concentrated in the middle and sparse on both sides. If the uniform strategy is still used with a data interval range Δ of 2, step 110 yields 1 datum in the [0, 2) interval, 0 in [2, 4), 2 in [4, 6), 1 in [6, 8), and 1 in [8, 10). If the smallest 2 numbers are sought, step 120 yields cumulative counts of 1 for [0, 2), still 1 for [0, 4), a jump to 3 for [0, 6), 4 for [0, 8), and finally 5 for [0, 10). Step 130 must therefore select the [0, 6) range, so the final output is the smallest 3 numbers instead of 2. The uniform strategy is therefore unsuitable here. With a non-uniform quantization strategy, the Lloyd-Max method can produce 5 data intervals of different sizes: [0, 3), [3, 4.5), [4.5, 5.5), [5.5, 7), [7, 10). Step 110 then counts exactly one datum in each data interval, the range selected in step 130 becomes [0, 4.5), and the final output target data is 4 and 1. The "precision" of the batch selection is thus improved without increasing the number of data intervals (which is still 5).
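The comparison in this example can be checked with a small Python sketch. The function `select_smallest` is a hypothetical helper; it assumes half-open intervals [a, b) given by a sorted list of boundary points, and values strictly below the last boundary. The non-uniform boundaries are taken from the example rather than computed by the Lloyd-Max method.

```python
from bisect import bisect_right
from itertools import accumulate


def select_smallest(data, edges, k):
    """Count per interval, accumulate in ascending order, output the target range."""
    counts = [0] * (len(edges) - 1)
    for value in data:
        # Index of the half-open interval [edges[i], edges[i + 1]) holding value.
        counts[bisect_right(edges, value) - 1] += 1

    cumulative = list(accumulate(counts))
    # First interval whose cumulative count reaches k.
    target = next(i for i, c in enumerate(cumulative) if c >= k)
    upper = edges[target + 1]
    return counts, [value for value in data if value < upper]


data = [9, 4, 5, 6, 1]
uniform = select_smallest(data, [0, 2, 4, 6, 8, 10], 2)
non_uniform = select_smallest(data, [0, 3, 4.5, 5.5, 7, 10], 2)
print(uniform)      # ([1, 0, 2, 1, 1], [4, 5, 1])
print(non_uniform)  # ([1, 1, 1, 1, 1], [4, 1])
```

With strictly half-open intervals the value 6 is counted in [6, 8), so the uniform boundaries give the counts 1, 0, 2, 1, 1 and the selection overshoots to three values, whereas the non-uniform boundaries return exactly 4 and 1.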

The method for batch selection of data according to the embodiments of the present application has been described in detail above with reference to FIG. 2 to FIG. 5. The method orders the data intervals while leaving the data within each data interval unordered, so the candidate data does not need to be fully sorted. Outputting the target data requires only 2 full parallel scans and 1 parallel accumulation calculation, which avoids repeated calculations over the candidate data, saves memory and bandwidth, and improves system efficiency. In the present application, determining the number of the plurality of data intervals and the range of each of the plurality of data intervals according to the data information of the candidate data can make the result of the subsequent batch selection more accurate. For a clearer understanding, the method of batch selection of data of the present application is described below with a specific set of candidate data.

The candidate data are 0.66, 0.44, 0.99, 0.33, 0.11, 0.55, 0.22, 0.77, 0.88, nine candidate data in total. The target data is the three largest values among the candidate data. The data analyzer consists of three parallel processors, and the range of the data interval is not restricted in this case. The number M of data intervals should be made as small as possible so as to minimize the value of the performance formula O(n/p)+O(logM)+O(n/p). In this example, the total number of candidate data, 9, is divided by the number of data required, 3, so the number of data intervals is M=9/3=3. According to the uniform quantization formula (1), when the candidate values lie in the range (0.0, 1.0) and the number of data intervals is 3, the range of each data interval is 0.33333..., and the three data intervals handled by the three parallel processors are (0.0, 1/3], (1/3, 2/3], (2/3, 1.0). At this point the count corresponding to each data interval is 0, as shown in Table 1.

Table 1

Data interval	(0.0, 1/3]	(1/3, 2/3]	(2/3, 1.0)
Number	0	0	0

According to the load balancing principle, each of the three parallel processors is responsible for three of the nine candidate data. For example, the first processor is responsible for the data 0.66, 0.44, 0.99; the second processor for the data 0.33, 0.11, 0.55; and the third processor for the data 0.22, 0.77, 0.88.

The three processors count the data they process simultaneously. The counts can either be kept as local subtotals that are summed afterwards, or be added directly to a global total with synchronization. An example of the globally synchronized direct total follows.

For example, the first processor determines that 0.66 belongs to the interval (1/3, 2/3], the second processor determines that 0.33 belongs to the interval (0.0, 1/3], and the third processor determines that 0.22 belongs to the interval (0.0, 1/3]. After the first round of counting, the count of each data interval is as shown in Table 2.

Table 2

Data interval	(0.0, 1/3]	(1/3, 2/3]	(2/3, 1.0)
Number	2	1	0

The first processor determines that 0.44 belongs to the interval (1/3, 2/3], the second processor determines that 0.11 belongs to the interval (0.0, 1/3], and the third processor determines that 0.77 belongs to the interval (2/3, 1.0). After the second round of counting, the count of each data interval is as shown in Table 3.

Table 3

Data interval	(0.0, 1/3]	(1/3, 2/3]	(2/3, 1.0)
Number	3	2	1

The first processor determines that 0.99 belongs to the interval (2/3, 1.0), the second processor determines that 0.55 belongs to the interval (1/3, 2/3], and the third processor determines that 0.88 belongs to the interval (2/3, 1.0). After the third round of counting, the count of each data interval is as shown in Table 4.

Table 4

Data interval	(0.0, 1/3]	(1/3, 2/3]	(2/3, 1.0)
Number	3	3	3

The interval accumulator then accumulates the counts of the three data intervals. The accumulated result for each data interval is the sum of the number of data included in that data interval and the number of data included in all the data intervals before it. In this example the three largest values are to be selected, so the accumulation is performed in descending order of the data intervals, and the cumulative result is shown in Table 5. That is, the one class in the range (2/3, 1.0) contains the largest 3 values, the two classes in the range (1/3, 1.0) contain the largest 6 values, and the three classes in the range (0.0, 1.0) contain the largest 9 values (that is, all the values).

Table 5

Data interval	(0.0, 1.0)	(1/3, 1.0)	(2/3, 1.0)
Number	9	6	3

Finally, the batch picker determines that the data interval in which the target data is located is (2/3, 1.0). It is assumed here that the batch picker is the same three parallel processors. The three parallel processors therefore each output the data they hold that belongs to the data interval (2/3, 1.0): the first processor outputs 0.99, the second processor has no output, and the third processor outputs 0.77 and 0.88.
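The worked example above can be traced end to end in Python. This is a sequential sketch under stated assumptions, not the implementation of the present application: the function name `batch_select_largest` is hypothetical, the three processors are simulated by looping over three lists, M is taken as n/k as in the example, intervals are treated as closed on the right, and 1 ≤ k ≤ n is assumed.

```python
from itertools import accumulate
from math import ceil


def batch_select_largest(chunks, lo, hi, k):
    """Two scans plus one accumulation; returns (counts, cumulative, outputs)."""
    n = sum(len(chunk) for chunk in chunks)
    m = max(1, n // k)                  # M = n / k, as in the example
    delta = (hi - lo) / m               # formula (1): M = x / delta

    def index(value):
        # Right-closed interval (lo + i*delta, lo + (i + 1)*delta].
        return min(m - 1, max(0, ceil((value - lo) / delta) - 1))

    # Scan 1: every processor classifies its own data.
    counts = [0] * m
    for chunk in chunks:
        for value in chunk:
            counts[index(value)] += 1

    # Accumulate in descending interval order (largest values wanted).
    cumulative = list(accumulate(reversed(counts)))
    steps = next(j for j, c in enumerate(cumulative) if c >= k)
    target = m - 1 - steps              # lowest interval that must be output

    # Scan 2: every processor outputs its data in the target range.
    outputs = [[v for v in chunk if index(v) >= target] for chunk in chunks]
    return counts, cumulative, outputs


chunks = [[0.66, 0.44, 0.99], [0.33, 0.11, 0.55], [0.22, 0.77, 0.88]]
counts, cumulative, outputs = batch_select_largest(chunks, 0.0, 1.0, 3)
print(counts)      # [3, 3, 3]
print(cumulative)  # [3, 6, 9]
print(outputs)     # [[0.99], [], [0.77, 0.88]]
```

The cumulative list is ordered from the highest interval downward, so its entries correspond to the ranges (2/3, 1.0), (1/3, 1.0), and (0.0, 1.0).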

FIG. 6 is a schematic block diagram of an apparatus 300 for data batch selection in accordance with the present application. As shown in Figure 6, the device 300 includes the following modules.

The data analyzer 310 is configured to count the data intervals to which the data in the candidate data belongs, to obtain a statistical result, where the statistical result includes the number of data included in each of the plurality of data intervals, and the sum of the interval ranges of the data intervals is equal to the data distribution interval range of the candidate data.

The interval statistic unit 320 is configured to accumulate, according to the statistical result, the number of data included in each of the plurality of data intervals, to obtain an accumulated result, where the accumulated result is, for each data interval, the sum of the number of data included in that data interval and the number of data included in all the data intervals before it.

The batch picker 330 is configured to determine, according to the accumulated result, a target data interval in which the target data is located, and output candidate data belonging to the target data interval.

Optionally, the apparatus 300 further includes an interval configurator 340, configured to determine, according to the data information of the candidate data, the number of the plurality of data intervals and the range of each of the plurality of data intervals, and to transmit the plurality of data intervals and the range of each data interval to the data analyzer 310.

Optionally, the interval configurator 340 is specifically configured to: when the candidate data is uniformly distributed, determine the number of the plurality of data intervals and the range of each of the plurality of data intervals according to the uniform quantization strategy, where the ranges of all the data intervals are equal; or, when the candidate data is non-uniformly distributed, determine the number of the plurality of data intervals and the range of each of the plurality of data intervals according to the non-uniform quantization strategy, where the ranges of at least two of the plurality of data intervals are not equal.

Optionally, when the candidate data is uniformly distributed, and the range of each data interval is Δ, the interval configurator 340 is specifically configured to: determine the number M of the plurality of data intervals according to the formula (1).

Optionally, the interval configurator 340 is specifically configured to: determine the number M of the plurality of data intervals according to the number of the candidate data and the number of target data to be output; and determine the range Δ of each of the plurality of data intervals according to formula (1).

Optionally, the interval statistic 320 is specifically configured to: when the target data is the smallest part of the candidate data, perform the prefix sum calculation on the counts of the plurality of data intervals in ascending order of the data intervals; or, when the target data is the largest part of the candidate data, perform the prefix sum calculation on the counts of the plurality of data intervals in descending order of the data intervals.

Optionally, the data analyzer, the interval statistic, and the batch picker are the same physical device or part of the same physical device.

Optionally, the data analyzer 310, the interval statistic 320, the batch picker 330, and the interval configurator 340 are used to perform the operations of the method 200 for batch selection of data of the present application. Details are not repeated here.

The data analyzer, the interval statistic, the batch picker, and the interval configurator correspond fully to the data analyzer, the interval statistic, the batch picker, and the interval configurator in the method embodiment, and each module performs the corresponding steps. For details, refer to the corresponding method embodiment.

It should be noted that the data analyzer 310, the interval statistic 320, the batch picker 330, and the interval configurator 340 may be separately configured or integrated together and implemented by one processing chip.

In addition, the device of the present application is applicable to the PRAM model and can be deployed on various parallel processors, accelerators, GPUs, FPGAs, ASICs, and cloud and edge platforms.

A cloud system is taken as an example below to describe a system for batch selection of data in the present application. FIG. 7 is a schematic architectural diagram of a system for batch selection of data in accordance with the present application. The system 400 includes a data analyzer 410, an interval statistic 420, a batch picker 430, and an interval configurator 440.

The data analyzer 410 is configured to count the data intervals to which the data in the candidate data belongs, to obtain a statistical result, where the statistical result includes the number of data included in each of the plurality of data intervals, and the sum of the interval ranges of the data intervals is equal to the data distribution interval range of the candidate data.

The interval statistic unit 420 is configured to accumulate, according to the statistical result, the number of data included in each of the plurality of data intervals, to obtain an accumulated result, where the accumulated result is, for each data interval, the sum of the number of data included in that data interval and the number of data included in all the data intervals before it.

The batch picker 430 is configured to determine, according to the accumulated result, a target data interval in which the target data is located, and output candidate data belonging to the target data interval.

Optionally, the interval configurator 440 is configured to determine, according to the data information of the candidate data, a number of the plurality of data intervals and a range of each of the plurality of data intervals;

The interval configurator 440 transmits the plurality of data intervals and the range of each of the plurality of data intervals to the data analyzer 410.

Optionally, the interval configurator is further configured to allocate candidate data to the data analyzer 410 and the batch picker 430.

Specifically, the interval configurator 440 transmits partial data in the candidate data to the data analyzer 410.

The data analyzer 410 counts the data intervals to which the data in the candidate data belongs to obtain a statistical result, and writes the statistical result into the first shared memory, where the statistical result includes the number of data included in each of the plurality of data intervals, and the sum of the interval ranges of the data intervals is equal to the data distribution interval range of the candidate data.

The data analyzer 410 sends a first message to the interval statistic 420, the first message being used to instruct the interval statistic 420 to accumulate the number of the plurality of data intervals according to the statistical result.

In response to the first message, the interval statistic 420 accumulates, according to the statistical result, the number of data included in each of the plurality of data intervals to obtain an accumulated result, and writes the accumulated result into the second shared memory, where the accumulated result is, for each data interval, the sum of the number of data included in that data interval and the number of data included in all the data intervals before it.

The interval statistic 420 sends a second message to the batch picker 430, where the second message is used to instruct the batch picker 430 to determine a target data interval in which the target data is located according to the accumulated result.

The batch picker 430 outputs the target data according to the target data section.
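The hand-off between the three components can be sketched with Python threads. Everything named here is an assumption for illustration: a dictionary stands in for the shared memories, `queue.Queue` objects carry the first and second messages, and the sketch selects the smallest k values using half-open intervals given by a sorted list of boundary points.

```python
import queue
import threading
from itertools import accumulate


def run_pipeline(data, edges, k):
    shared = {}                          # stands in for the shared memories
    first_msg, second_msg = queue.Queue(), queue.Queue()
    result = []

    def data_analyzer():
        counts = [0] * (len(edges) - 1)
        for value in data:
            for i in range(len(counts)):
                if edges[i] <= value < edges[i + 1]:
                    counts[i] += 1
                    break
        shared["statistical_result"] = counts
        first_msg.put("accumulate")      # first message

    def interval_statistic():
        first_msg.get()                  # wait for the first message
        counts = shared["statistical_result"]
        shared["accumulated_result"] = list(accumulate(counts))
        second_msg.put("select")         # second message

    def batch_picker():
        second_msg.get()                 # wait for the second message
        cumulative = shared["accumulated_result"]
        target = next(i for i, c in enumerate(cumulative) if c >= k)
        result.extend(v for v in data if v < edges[target + 1])

    workers = [threading.Thread(target=f)
               for f in (batch_picker, interval_statistic, data_analyzer)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return result


print(run_pipeline([9, 4, 5, 6, 1], [0, 3, 4.5, 5.5, 7, 10], 2))  # [4, 1]
```

The threads are started in reverse order on purpose: each stage blocks on its queue until the preceding stage has written its result, so correctness does not depend on start order.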

Optionally, the data analyzer 410 may include a multi-core processor, multiple parallel processors, or a multi-threaded processor, or may be a combination of the multi-core processor, the multiple parallel processors, and the multi-threaded processor.

Optionally, the interval statistic 420 may include a multi-core processor, multiple parallel processors, or a multi-threaded processor, or may be a combination of the multi-core processor, the multiple parallel processors, and the multi-threaded processor.

Optionally, the batch picker 430 may include a multi-core processor, multiple parallel processors, or a multi-threaded processor, or may be a combination of the multi-core processor, the multiple parallel processors, and the multi-threaded processor.

Optionally, the first shared memory, the second shared memory, and the third shared memory may be the same shared memory.

It should be understood that the cloud system may use distributed storage instead of shared memory, that is, each data interval is delivered to a distributed memory group corresponding to one processor, and the data analyzer, the batch picker, and the interval statistic are distributed in software form.

Optionally, in the cloud system, the data analyzer 410, the interval statistic 420, the batch picker 430, and the interval configurator may communicate with one another through the sub-processors that each of them includes.

Specifically, the communication between the sub-processors of the data analyzer 410 and those of the interval statistic 420 is described as an example. Assume that the data intervals are (0, 3], (3, 6], (6, 9], that the data analyzer 410 includes three distributed processors, and that the interval statistic includes three distributed processors, of which the first is responsible for counting the interval (0, 3], the second for counting the interval (3, 6], and the third for counting the interval (6, 9]. The three distributed processors can be deployed in the same physical location. When any processor in the data analyzer 410 determines the data interval to which a candidate datum belongs, it sends an indication message to the corresponding processor in the interval statistic 420, instructing that processor to update the count of the data interval it is responsible for. For example, if any processor in the data analyzer 410 determines that a candidate datum belongs to the interval (0, 3], it sends an indication message to the first processor, instructing the first processor to increment its count by one.
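The indication-message scheme in this example can be modeled with a few lines of Python. The class `IntervalCounter` and the function `report` are hypothetical names, a direct method call stands in for the indication message, and the sample values are made up for illustration.

```python
class IntervalCounter:
    """One processor of the interval statistic, owning one interval (low, high]."""

    def __init__(self, low, high):
        self.low, self.high, self.count = low, high, 0

    def on_indication(self):
        # Receiving an indication message increments the count by one.
        self.count += 1


def report(value, counters):
    """An analyzer processor classifies value and notifies the owning counter."""
    for counter in counters:
        if counter.low < value <= counter.high:
            counter.on_indication()
            return
    raise ValueError(f"{value} is outside every data interval")


counters = [IntervalCounter(0, 3), IntervalCounter(3, 6), IntervalCounter(6, 9)]
for value in [1, 3, 4, 9, 2]:           # illustrative candidate data
    report(value, counters)

print([c.count for c in counters])      # [3, 1, 1]
```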

It should be understood that the specific process in the system can be understood by referring to the corresponding method 200. To avoid repetition, details are not described herein again.

FIG. 8 is a schematic block diagram of a device 500 for data batch selection provided by the present application, the device 500 including:

The memory 510 is configured to store a program, where the program includes code;

The transceiver 520 is configured to communicate with other devices;

The processor 530 is configured to execute the program code in the memory 510.

Optionally, when the code is executed, the processor 530 can implement various operations of the method 200. For brevity, no further details are provided herein. The transceiver 520 is configured to perform specific signal transceiving under the driving of the processor 530.

It should be understood that FIG. 8 shows only a schematic block diagram of a device for batch selection of data. In FIG. 8, the memory 510, the transceiver 520, and the processor 530 share the same system bus, but the three components may also be directly connected to one another. The connection relationship between the components of the device for batch selection of data is not limited in this application.

It should be understood that, in the embodiment of the present application, the processor 530 may be a central processing unit ("CPU"), or may be another general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like.

Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the various examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the solution. A person skilled in the art can use different methods to implement the described functions for each particular application, but such implementation should not be considered to be beyond the scope of the present application.

A person skilled in the art can clearly understand that for the convenience and brevity of the description, the specific working process of the system, the device and the unit described above can refer to the corresponding process in the foregoing method embodiment, and details are not described herein again.

In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function; in actual implementation there may be other divisions, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device, or unit, and may be in an electrical, mechanical, or other form.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.

The functions may be stored in a computer readable storage medium if implemented in the form of a software functional unit and sold or used as a standalone product. Based on such understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the various embodiments of the present application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

The foregoing is only a specific embodiment of the present application, but the scope of protection of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall fall within the scope of protection of the present application. Therefore, the scope of protection of the present application should be determined by the scope of the claims.

Claims

A method for batch selection of data, characterized in that the method comprises:

The data analyzer counts the data intervals to which the data in the candidate data belongs to obtain a statistical result, where the statistical result includes the number of data included in each of the plurality of data intervals, and the sum of the interval ranges of the data intervals is equal to the data distribution interval range of the candidate data;

The interval statistic accumulates, according to the statistical result, the number of data included in each data interval to obtain an accumulated result, where the accumulated result is, for each data interval, the sum of the number of data included in that data interval and the number of data included in all the data intervals before it;

The batch picker determines a target data section in which the target data is located according to the accumulated result, and outputs candidate data belonging to the target data section.
The method according to claim 1, wherein before the data analyzer counts the data interval to which the data in the candidate data belongs, the method further includes:

The interval configurator determines, according to the data information of the candidate data, the number of the plurality of data intervals and the range of each of the plurality of data intervals;

The interval configurator transmits a range of each of the plurality of data intervals and the plurality of data intervals to the data analyzer.
The method according to claim 2, wherein the interval configurator determining, according to the data information of the candidate data, the number of the plurality of data intervals and the range of each of the plurality of data intervals includes:

When the candidate data is uniformly distributed, determining the number of the plurality of data intervals and the range of each of the plurality of data intervals according to a uniform quantization strategy, the ranges of all the data intervals being equal; or

When the candidate data is non-uniformly distributed, determining the number of the plurality of data intervals and the range of each of the plurality of data intervals according to a non-uniform quantization strategy, the ranges of at least two of the plurality of data intervals being unequal.
The method according to claim 3, wherein when the candidate data is uniformly distributed and the range of each data interval is Δ, determining the number of the plurality of data intervals and the range of each of the plurality of data intervals according to the uniform quantization strategy includes:

Determining the number M of the plurality of data intervals according to the formula (1),

M=x/Δ (1)

Where x is the data interval range of the candidate data, and M is the number of the plurality of data intervals.
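Formula (1) can be illustrated with a short sketch. Rounding up when x is not an exact multiple of Δ is an assumption added here so that the intervals still cover the whole range; the function name `interval_count` is hypothetical.

```python
import math

def interval_count(data_range, width):
    """Number of intervals M = x / Δ, rounded up to cover the whole range.

    data_range is x (the range of the candidate data); width is Δ.
    Rounding up is an assumption for ranges that are not a multiple of Δ.
    """
    if width <= 0:
        raise ValueError("interval width must be positive")
    return math.ceil(data_range / width)
```

For example, a range of 100 with Δ = 25 yields four intervals, and the same range with Δ = 30 also yields four, the last one being only partly used.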
The method of claim 3, wherein the method further comprises:

Determining the number M of the plurality of data intervals according to the number of the candidate data and the number of the output target data;

Determining the range Δ of each of the data intervals according to formula (1),

M=x/Δ (1)

Where x is the data interval range of the candidate data, and M is the number of the plurality of data intervals.
The method according to any one of claims 1 to 5, wherein the interval statistic accumulating the number of data included in each of the data intervals according to the statistical result includes:

When the target data is the smallest part of the candidate data, accumulating the number of data included in each of the data intervals in ascending order of the data intervals; or

When the target data is the largest part of the candidate data, accumulating the number of data included in each of the data intervals in descending order of the data intervals.
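The two accumulation orders can be sketched as below. The function names and the `select_smallest` flag are hypothetical, and `counts` is assumed to list the per-interval counts in ascending order of the intervals.

```python
from itertools import accumulate

def accumulate_counts(counts, select_smallest=True):
    """Running sum of per-interval counts, ascending or descending."""
    ordered = counts if select_smallest else counts[::-1]
    return list(accumulate(ordered))

def target_interval(counts, k, select_smallest=True):
    """Index (in ascending interval order) of the interval holding the k-th target datum."""
    if not 1 <= k <= sum(counts):
        raise ValueError("k must be between 1 and the total number of data")
    cumulative = accumulate_counts(counts, select_smallest)
    # First position at which the accumulated count reaches k.
    position = next(i for i, total in enumerate(cumulative) if total >= k)
    # Map the position back to ascending order when accumulating from the top.
    return position if select_smallest else len(counts) - 1 - position
```

With counts of 3, 2, 1 and 2 data in four intervals, the three smallest data all lie in the first interval, whereas the three largest span the last two intervals, so the target interval differs with the accumulation order.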
The method according to any one of claims 1 to 6, wherein the data analyzer, the interval statistic and the batch picker are the same physical entity or partially identical physical entities.
A device for batch selection of data, characterized in that the device comprises:

a data analyzer, configured to count the data interval to which each datum in the candidate data belongs, to obtain a statistical result, where the statistical result includes the number of data included in each of a plurality of data intervals, and the sum of the interval ranges of the data intervals is equal to the data distribution interval range of the candidate data;

an interval statistic, configured to accumulate the number of data included in each data interval according to the statistical result, to obtain an accumulated result, where the accumulated result is the sum of the number of data included in each data interval and the number of data included in all data intervals before that data interval;

a batch picker, configured to determine, according to the accumulated result, a target data interval in which the target data is located, and to output candidate data belonging to the target data interval.
The device according to claim 8, wherein the device further comprises:

An interval configurator, configured to determine, according to the data information of the candidate data, a number of data intervals and a range of each data interval;

The interval configurator is further configured to transmit the plurality of data intervals and the range of each of the data intervals to the data analyzer.
The device according to claim 9, wherein the interval configurator is specifically configured to:

When the candidate data is uniformly distributed, determine the number of the plurality of data intervals and the range of each of the plurality of data intervals according to the uniform quantization strategy, the ranges of the data intervals being equal; or

When the candidate data is non-uniformly distributed, determine the number of the plurality of data intervals and the range of each of the plurality of data intervals according to the non-uniform quantization strategy, the ranges of at least two of the plurality of data intervals being unequal.
The device according to claim 10, wherein when the candidate data is uniformly distributed and the range of each data interval is Δ, the interval configurator is specifically configured to:

Determining the number M of the plurality of data intervals according to the formula (1),

M=x/Δ (1)

Where x is the data interval range of the candidate data, and M is the number of the plurality of data intervals.
The device according to claim 10, wherein the interval configurator is specifically configured to:

Determining the number M of the plurality of data intervals according to the number of the candidate data and the number of the output target data;

Determining the range Δ of each of the data intervals according to formula (1),

M=x/Δ (1)

Where x is the data interval range of the candidate data, and M is the number of the plurality of data intervals.
The device according to any one of claims 8 to 12, wherein the interval statistic is specifically configured to:

When the target data is the smallest part of the candidate data, perform a prefix-sum operation on the number of data included in each data interval in ascending order of the plurality of data intervals; or

When the target data is the largest part of the candidate data, perform a prefix-sum operation on the number of data included in each data interval in descending order of the plurality of data intervals.
The device according to any one of claims 8 to 13, wherein the data analyzer, the interval statistic, and the batch picker are the same physical entity or partially identical physical entities.
A computer storage medium, characterized in that the computer storage medium stores program instructions, and when the program instructions are executed, the method of any one of claims 1 to 7 is performed.