CN105009142A - System and method for histogram computation using a graphics processing unit - Google Patents

Publication number
CN105009142A
Authority
CN
China
Prior art keywords
value
texel
data set
buffer zone
buffer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201380048176.8A
Other languages
Chinese (zh)
Inventor
威廉姆·L·杰迪
威迪亚·塞朗
斯蒂芬·诺沃克
刘永
奇尔达姆巴拉姆·拉马纳坦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Spinella IP Holdings Inc
Original Assignee
Spinella IP Holdings Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Spinella IP Holdings Inc
Publication of CN105009142A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/60 Memory management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/005 General purpose rendering architectures

Abstract

A method and system for obtaining a histogram and related statistical values from a data set of texels is disclosed. A processing device receives from a first buffer, a data set of texels. The data set has a dimensionality D of at least two and each texel contains a value. The processing device sorts the data set into a point list of coordinates, wherein a point in the point list corresponds to a texel location in the data set. The processing device reduces the dimensionality of the point list by arranging points in the point list according to an N-1 dimensional dominancy. The processing device performs a raster operation on each associated value of the arranged points to obtain at least one value. The processing device is to output the at least one value to a second buffer. The processing device may be a graphics processing unit.

Description

System and method for histogram computation using a graphics processing unit
Technical field
Embodiments of the invention relate to image processing and, more specifically, to the field of histogram computation and other statistical computations.
Background of the invention
Histogram computation performed on a D-dimensional set of values S, and related statistical operations such as min(S), max(S), median(S), standard deviation(S) and mode(S), are common operations in image processing systems. Histogram computation is also used in problems that call for parallel execution, such as large sets, high throughput, or both. For example, the system and method taught in U.S. Patent No. 8,451,384 uses multiple histograms, and combinations of them, to provide one of several means of scene change detection in high-resolution video. Unfortunately, an efficient way to perform these types of computation on massively parallel hardware (which may include graphics processing units (GPUs) and massively multicore SIMD or MIMD vector processing systems) has been lacking.
Earlier attempts at GPU-based histogram computation suffer from the poor performance of recursive reduction operations, as taught, for example, in U.S. Patent No. 7,889,922 (hereinafter the '922 patent). These recursive reduction operations either require many repeated recursions with small data block sizes, or incur cache misses with large data block sizes and fewer recursions. This limits the usability and practical performance of recursive reduction operations on large data sets as taught by the '922 patent.
Other prior art methods avoid recursion by performing the reduction in a single step using a feature of current GPU hardware, namely the reading of texture buffer values in the vertex shader, as disclosed in Scheuermann, T., and Hensley, J., 2007, "Efficient histogram generation using scattering on GPUs", Proceedings of the 2007 Symposium on Interactive 3D Graphics and Games (I3D '07), pp. 33-37 (hereinafter "Scheuermann and Hensley"). Reading texture buffer values in the vertex shader, as taught by Scheuermann and Hensley, permits a "scatter" operation: the destination write position is not fixed but can vary based on decisions that depend on the input texture.
By contrast, the recursive reduction operations taught in the '922 patent permit only a "gather" operation, in which the write position is fixed and the read position is variable. It should be noted that although the method of Scheuermann and Hensley exhibits good parallelism, and the further benefit of scaling only with the size of the input data set rather than with the histogram bin size, it suffers from a performance inversion: larger bin sizes perform better than smaller bin sizes. This unpredictability is due to the serialization of memory write requests to the GPU cache, especially for data sets with a high mode, and it makes these methods and systems entirely unsuitable for real-time stream processing applications in which predictability is essential.
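The distinction between a data-dependent write (scatter) and a fixed write with data-dependent reads (gather) can be illustrated on the CPU. The following is a minimal serial sketch in Python, not GPU code and not taken from either cited work; the function names `histogram_scatter` and `histogram_gather` are hypothetical.

```python
from typing import List


def histogram_scatter(values: List[int], num_bins: int) -> List[int]:
    """Scatter: each input element chooses its own write position (its bin)."""
    bins = [0] * num_bins
    for value in values:
        bins[value] += 1  # the write address depends on the data
    return bins


def histogram_gather(values: List[int], num_bins: int) -> List[int]:
    """Gather: each output bin has a fixed write position and reads every input."""
    return [sum(1 for value in values if value == b) for b in range(num_bins)]


data = [0, 3, 3, 1, 3, 0]
scattered = histogram_scatter(data, 4)
gathered = histogram_gather(data, 4)
```

Both functions produce the same counts. In the scatter form, the three writes to bin 3 all target one location, which is the situation that leads to serialized writes when many parallel threads hit a popular bin.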
Nugteren, Cedric, et al., "High performance predictable histogramming on GPUs: exploring and evaluating algorithm trade-offs", Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units, ACM, 2011 (hereinafter "Nugteren"), discloses two histogram computation methods that address the cache collision problem, but both employ a proprietary API (CUDA) that is available from only a single GPU hardware vendor. In addition, these prior art methods serve a single purpose, namely computing a binned histogram on the GPU, rather than arbitrary associated statistical functions. Furthermore, in image and video processing the histogram function is usually performed outside the GPU, for example on the CPU, which introduces pipeline stalls and wait states. These stalls make such systems and methods unsuitable for real-time image and video processing.
Accordingly, what would be desirable, but has not yet been provided, is a high-throughput, memory-efficient, GPU-vendor-independent and flexible histogram and statistics method and system for computing histograms that exhibits consistent performance.
Summary of the invention
Systems and methods according to the invention perform histogram computation and find one or more of the following for a set: the minimum, the maximum, the standard deviation of the set, and the N modes of the set. Although the preferred embodiments of the invention are implemented on a GPU, it will be apparent to those skilled in the art that the invention has many uses beyond image and video processing. Any problem that requires large-scale statistics or histogram analysis of a D-dimensional data set can benefit. For this additional reason, an efficient GPU histogram computation system and method provides an accompanying benefit to any real-time or otherwise time-sensitive image or video processing system or method that runs on a GPU.
More specifically, the above-described problems are addressed and a technical solution is achieved in the art by providing a method and system for obtaining a histogram and related statistical values from a data set of texels. A processing device receives a data set of texels from a first buffer. The data set has a dimensionality D of at least 2, and each texel contains a value. The processing device sorts the data set into a point list of coordinates, wherein a point in the point list corresponds to a texel location in the data set. The processing device reduces the dimensionality of the point list by arranging the points in the point list according to an N-1 dimensional dominancy. The processing device performs a raster operation on each associated value of the arranged points to obtain at least one value. The processing device outputs the at least one value to a second buffer. The processing device may be a graphics processing unit. The sorting, reducing, performing and outputting steps may be repeated until D is 1.
In an example, sorting the data set may comprise generating a vertex buffer that has an individual vertex for each texel location. Reducing the dimensionality of the point list may comprise executing a vertex shader pass that informs a subsequent pixel shader pass of the destination bin position at which the raster operation is to be performed. Performing the raster operation may comprise using a pixel shader to perform at least one of a replace raster operation, an additive raster operation, a minimum raster operation, or a maximum raster operation.
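The four raster operations named above differ only in how an incoming value is combined with the value already stored at the destination. The following is a minimal CPU sketch in Python of that idea; the table `RASTER_OPS`, the function `apply_writes` and the mode names are hypothetical illustrations, not a GPU API.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Each raster operation combines the value already in the destination texel
# (dst) with the incoming value (src).
RASTER_OPS: Dict[str, Callable[[float, float], float]] = {
    "replace": lambda dst, src: src,
    "add": lambda dst, src: dst + src,
    "min": lambda dst, src: min(dst, src),
    "max": lambda dst, src: max(dst, src),
}


def apply_writes(target: List[float],
                 writes: Iterable[Tuple[int, float]],
                 mode: str) -> List[float]:
    """Apply (position, value) writes to target using the chosen raster operation."""
    op = RASTER_OPS[mode]
    for position, value in writes:
        target[position] = op(target[position], value)
    return target


writes = [(0, 5.0), (1, 2.0), (0, 7.0), (1, 9.0)]
added = apply_writes([0.0, 0.0], writes, "add")
lowest = apply_writes([float("inf")] * 2, writes, "min")
highest = apply_writes([float("-inf")] * 2, writes, "max")
```

The additive, minimum and maximum results do not depend on the order in which the writes arrive, which is why they suit unordered parallel execution. The replace result does depend on order in this serial sketch.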
In an example, the at least one output value may be at least one of a histogram of the data set, a maximum of the data set, a minimum of the data set, a sum of the data set, a mean of the data set, a median or mode of the data set, a standard deviation of the data set, or a location of the minimum or of the maximum of the data set.
In an example, the data set of texels in the first buffer may be received from a two-dimensional or three-dimensional still image or video.
The above-described problems are also addressed and a technical solution is achieved in the art by providing a method and system for obtaining a histogram and related statistical values from a data set of texels. A processing device receives a two-dimensional data set of texels from a first buffer, wherein each texel in the data set is associated with a value. The processing device sorts the data set from the first buffer into a point list of coordinates in a second buffer, wherein a point in the point list corresponds to a texel location in the data set. The processing device reads values from the second buffer and outputs column positions to a third buffer that has a width equal to a first size and a height equal to a second size. The processing device uses an additive raster operation to increment by 1 the values at the column texel positions in the third buffer to obtain at least one value. The processing device outputs the at least one value to a fourth buffer.
In an example, the first size and the second size correspond to the histogram bin size.
In an example, outputting the column positions to the third buffer having a width equal to the first size and a height equal to the second size further comprises transforming the position coordinates of the texels in the second buffer into a new coordinate system by writing to texel positions at column positions in the third buffer, such that the vertical coordinate of the position coordinate of a texel located in the second buffer is transformed into the new coordinate system according to the value of the associated texel located in the first buffer. Incrementing the values may comprise incrementing the texel value of the third buffer by 1 for each texel position that the position coordinates indicate is to be operated on.
In an example, the processing device may output to the fourth buffer a row of bin texel positions having a height of 1 and a width equal to the final histogram bin size. The processing device may use an additive raster operation to increment the values in the fourth buffer by 1 to obtain the histogram.
In an example, the first size corresponds to the width of the first buffer, and the second size corresponds to a height equal to 1. The processing device may output to the fourth buffer a bin texel position having a height of 1 and a width equal to 1.
In an example, performing the raster operation may comprise performing at least one of a replace raster operation, an additive raster operation, a minimum raster operation, or a maximum raster operation.
In an example, the processing device may use a minimum raster operation to replace the value in the fourth buffer to obtain the minimum of the data set. The processing device may use a summing raster operation to replace the value in the fourth buffer to obtain the sum of the data set. Replacing the value in the fourth buffer may further comprise multiplying the value in the fourth buffer by 1 divided by the size of the data set to obtain the mean of the data set.
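As a rough illustration of the reduction described above, the Python sketch below collapses a two-dimensional data set first into a one-row buffer as wide as the input and then into a single value, using a minimum or a summing operation. It is a serial CPU sketch under the assumption that the two stages are applied one after the other; the function name `reduce_two_stage` and the sample data are hypothetical.

```python
from typing import Callable, List


def reduce_two_stage(texels: List[List[float]],
                     op: Callable[[float, float], float],
                     init: float) -> float:
    """Reduce a 2D data set to one value via a width x 1 intermediate buffer."""
    width = len(texels[0])
    # Stage 1: every texel writes to column x of a buffer of height 1.
    row = [init] * width
    for line in texels:
        for x, value in enumerate(line):
            row[x] = op(row[x], value)
    # Stage 2: every texel of the row buffer writes to a single 1 x 1 texel.
    result = init
    for value in row:
        result = op(result, value)
    return result


data = [[4.0, 9.0, 1.0], [7.0, 2.0, 7.0]]
minimum = reduce_two_stage(data, min, float("inf"))
total = reduce_two_stage(data, lambda dst, src: dst + src, 0.0)
# The mean is the sum multiplied by 1 divided by the data set size.
mean = total * (1.0 / (len(data) * len(data[0])))
```

The intermediate row spreads the writes over as many destinations as the input has columns before the final collapse to one texel.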
The above-described problems are also addressed and a technical solution is achieved in the art by providing a method and system for obtaining the location of a minimum or maximum in a data set of texels. A processing device computes the minimum or maximum of a two-dimensional data set of texels. The processing device receives the two-dimensional data set of texels from a first buffer, wherein each texel in the data set is associated with a value. The processing device sorts the data set from the first buffer into a point list of coordinates in a second buffer, wherein a point in the point list corresponds to a texel location in the data set. The processing device reads texel values from the second buffer and, when a texel value equals the minimum, outputs a single texel position together with the x and y values to a third buffer; when the texel value is greater than the minimum, it outputs a single out-of-range texel position. The processing device reads the x and y values from the second buffer and copies them to the x and y values of the third buffer via a replace raster operation to compute the location of the minimum or maximum in the data set.
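The location search can be pictured as a second pass in which only the texels that equal the known minimum produce a valid write. The Python sketch below is a serial CPU illustration under that reading; the function name `locate_minimum` is hypothetical, and discarding non-matching texels stands in for sending them to an out-of-range position.

```python
from typing import List, Optional, Tuple


def locate_minimum(texels: List[List[float]]) -> Tuple[float, Optional[Tuple[int, int]]]:
    """Return the minimum and the (x, y) location of a texel holding it."""
    # Earlier pass, assumed to be available already: the minimum of the data set.
    minimum = min(value for row in texels for value in row)
    location: Optional[Tuple[int, int]] = None  # the single destination texel
    for y, row in enumerate(texels):
        for x, value in enumerate(row):
            if value == minimum:
                # Replace operation: the incoming (x, y) overwrites the texel.
                location = (x, y)
            # Otherwise the write goes out of range and is discarded.
    return minimum, location


minimum, location = locate_minimum([[4.0, 9.0, 1.0], [7.0, 2.0, 7.0]])
```

When several texels share the minimum, this serial loop keeps the last one visited; with unordered parallel writes, which of the tied locations survives would not be predictable.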
Brief description of the drawings
Fig. 1 is a block diagram of an exemplary computer system in which examples of the invention may operate.
Fig. 2 is a flow diagram illustrating an example of a method for obtaining a histogram and related statistical values from a data set of texels.
Fig. 3 is a block diagram of the exemplary computer system of Fig. 1 adapted to compute a histogram of a data set using a scatter-reduce-increment operation.
Figs. 4A to 4B are a flow diagram illustrating an example of a method for computing a histogram using a scatter-reduce-increment operation.
Fig. 5 depicts the spatial layout of data as it passes through a scatter-reduce process that incorporates a column-major offset, according to embodiments of the invention.
Fig. 6 depicts a process and data flow diagram illustrating a first exemplary prior art procedure for computing a histogram of a data set on a GPU.
Figs. 7A to 7C are process and data flow block diagrams of an example corresponding to histogram computation as performed in U.S. Patent No. 7,889,922 (hereinafter the '922 patent).
Fig. 8 is a block diagram of the exemplary computer system of Fig. 1 adapted to compute the minimum of a data set using a scatter-reduce-replace operation.
Figs. 9A to 9B are a flow diagram illustrating an example of a method for computing the minimum of a data set using a scatter-reduce-replace operation.
Fig. 10 is a block diagram of the exemplary computer system of Fig. 1 adapted to compute the maximum of a data set using a scatter-reduce-replace operation.
Figs. 11A to 11B are a flow diagram illustrating an example of a method for computing the maximum of a data set using a scatter-reduce-accumulate operation.
Fig. 12 is a block diagram of the exemplary computer system of Fig. 1 adapted to compute the sum of a data set using a scatter-reduce-accumulate operation.
Figs. 13A to 13B are a flow diagram illustrating an example of a method for computing the sum of a data set using a scatter-reduce-accumulate operation.
Fig. 14 is a block diagram of the exemplary computer system of Fig. 1 adapted to compute the mean of a data set using a scatter-reduce-accumulate operation.
Figs. 15A to 15B are a flow diagram illustrating an example of a method for computing the mean of a data set using a scatter-reduce-accumulate operation.
Figs. 16A to 16C are block diagrams of the exemplary computer system of Fig. 1 adapted to compute the standard deviation of a data set using a scatter-reduce-accumulate operation.
Figs. 17A to 17C are a flow diagram illustrating an example of a method for computing the standard deviation of a data set using a scatter-reduce-accumulate operation.
Fig. 18 is a block diagram of the exemplary computer system of Fig. 1 adapted to extend the minimum computation of Fig. 8 to determine the location of a given minimum in a data set.
Fig. 19 is a flow diagram illustrating an example of a method for computing the location of the minimum in a data set.
Fig. 20 is a block diagram of the exemplary computer system of Fig. 1 adapted to extend the maximum computation of Fig. 10 to determine the location of a given maximum in a data set.
Fig. 21 is a flow diagram illustrating an example of a method for computing the location of the maximum in a data set.
Fig. 22 is a diagrammatic representation of a machine in the example form of a computer system within which a set of instructions may be executed to cause the machine to perform any one or more of the methods discussed herein.
Detailed description
The methods described herein provide a general, efficient system and method for performing the above computations that targets many existing SIMD and MIMD architectures while exhibiting much lower memory bandwidth requirements and lower computational intensity than taught in the prior art.
In the following description, numerous details are set forth. It will be apparent to those skilled in the art, however, that the invention may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form, rather than in detail, to avoid obscuring the invention.
As used herein, a vertex shader refers to a logical function of the GPU that operates on a vertex buffer, which in turn contains one or more coordinates in 2D or 3D space. A vertex buffer refers to a buffer uploaded to the GPU from the host system that contains data about one or more vertices, such as position, normal vector, color, and other user-definable data. A pixel shader refers to a logical kernel function of the GPU that operates on texels in a texture buffer in parallel, with no specific order of execution, as directed by the vertices output by the vertex shader. A texel refers to a texture element in a texture buffer. A texture buffer refers to an array of texels, much as a picture can be represented by an array of pixels. For simplicity and clarity in teaching the invention, a discussion of GPU memory architecture is not included. A good discussion of the memory subsystem architecture of modern GPUs, and of other aspects of modern GPU architecture, can be found in Randima Fernando, 2004, "GPU Gems: Programming Techniques, Tips and Tricks for Real-Time Graphics", which is incorporated herein by reference. In addition, Nugteren describes memory access patterns for GPU histogram computation and is incorporated herein by reference.
Although described with reference to GPUs, embodiments of the invention may be implemented on legacy GPU hardware that does not support geometry shaders and other newer tessellation features, because the embodiments employ no operations other than those involving vertex and pixel (or fragment) shaders and the fixed-function pipeline. The terms pixel shader and fragment shader are interchangeable, but for clarity of description, pixel shader is used herein.
Fig. 1 is a block diagram of an exemplary computer system 100, in which examples of the invention may operate, for obtaining a histogram and related statistical values from a data set of texels. By way of non-limiting example, the computing system 100 receives data from one or more data sources 105 (such as a video camera, an online storage device, or a transmission medium). The computing system 100 may also include a digital video capture system 110 and a computing platform 115. The digital video capture system 110 processes streaming digital video, or converts video to digital video, into a form in which data from the data source 105 can be processed by the computing platform 115. The computing platform 115 comprises a host system 120, which may include, for example, a processing device 125 such as one or more central processing units 130a to 130n. The processing device 125 is coupled to a host memory 135. The processing device may further implement a graphics processing unit (GPU) 140. In an example, the GPU 140 may be implemented on a physical chip separate from the one or more central processing units 130a to 130n. In another example, the GPU 140 may be co-located with the central processing units 130a to 130n on the same physical chip or logical device, known as an accelerated processing unit or APU, as found in mobile phones and tablets. Separate GPU and CPU functions are found on computer server systems, where the GPU is a physical expansion card, and on personal computer systems and laptops. The GPU or APU can provide high-throughput histogram and statistical computation on these and future devices.
The GPU 140 may comprise a GPU memory 141, a vertex processor 142, and a fragment processor 143. In an example, the host memory 135 and the GPU memory 141 may be implemented on separate physical chips or may be co-located on the same physical chip or logical device, such as an APU.
The processing device 125 is configured to implement a histogram manager 145 to receive data from the data source 105 and to create a texel data set 150, which is transferred to GPU memory 137 as texel data set 155. In addition, the histogram manager 145 creates and transfers vertex buffers 160a to 160n to GPU memory 137, configures vertex shaders 163a to 163n in the vertex processor 142, configures pixel shaders 165a to 165n in the fragment processor 143, and maintains the state associated with one or more buffers 167a to 167n for storing to, retrieving from, and manipulating the texel data set 155. The texel data set 155 has a dimensionality D of at least 2, and each texel contains a value. The histogram manager 145 is configured to sort the data set into a point list of coordinates, wherein a point in the point list corresponds to a texel location in the data set. The histogram manager 145 is further configured to execute one or more vertex shaders 163a to 163n to reduce the dimensionality of the point list by arranging the points in the point list according to an N-1 dimensional dominancy. The histogram manager 145 is further configured to execute one or more pixel shaders 165a to 165n to perform a raster operation on each associated value of the arranged points to obtain at least one value. The histogram manager 145 is further configured to output the at least one value to a second texture buffer (e.g., 167b) of the one or more buffers 167a to 167n to produce a result. In an example, the result may be presented on a display 170. The sorting, reducing, performing and outputting steps are repeated by the processing device 125 until D is 1.
In another example, the histogram manager 145 may transmit the result to one or more downstream devices 175 for use in video processing applications. In an example, a downstream device 175 may implement a scene change detector for detecting scene changes in still images or video. As used herein, a machine-detectable "scene change" may be defined as an indication that a given "uninterrupted sequence of images captured by a single camera" has changed, or is changing, to another, different "uninterrupted sequence of images captured by a single camera". Reliable detection and signaling of scene changes in an image sequence is a difficult problem in the art. It has wide application in the field of video signal processing, including cadence detection, deinterlacing, format conversion, compression encoding, and video indexing and retrieval. Scene changes are easily identified by human viewers: such events include the switch from an episodic television program to an inserted advertisement, or a camera change, for example when a live news broadcast cuts from one camera angle to another on the same set.
A reliable system and method for real-time or near-real-time automatic, unattended detection of scene changes in an image sequence with minimal false positives and false negatives is taught in U.S. Patent No. 8,451,384 (hereinafter the '384 patent), which is incorporated herein by reference in its entirety. In the '384 patent, a chroma histogram computation is performed. The chroma histogram computation may be performed on the host CPUs 130a to 130n, which can create a bottleneck before the data is subsequently transferred to the GPU 140 for further processing. In another example, the chroma histogram computation may be performed on the GPU 140 to minimize data and state transfers between the host system 120 and the GPU 140, providing throughput that is stable enough for real-time execution on large-format video (e.g., 1080i/p and 4K).
In other examples, the downstream device 175 may implement other still image or video features, such as, but not limited to, at least one of: image or object segmentation and tracking in video and images, disparity depth estimation in video and images, text detection in video and images, no-reference video quality estimation, passive sonar target localization, sonar image recognition, robot obstacle avoidance via vector field histograms, image classification and annotation, content-based image search and retrieval, network packet classification and inspection, or database query optimization.
Fig. 2 is a flow diagram illustrating an example of a method 200 for obtaining a histogram and related statistical values from a data set of texels. The method 200 may be performed by the computer system 100 of Fig. 1 and may comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions run on a processing device), or a combination thereof. In an example, the method 200 is performed by the histogram manager 145 of the computing system 100 of Fig. 1.
As shown in Fig. 2, to permit the computing system 100 to compute a histogram and related statistical operations, at block 210 the histogram manager 145 receives the texel data set 155 from a first buffer (e.g., 167a), wherein the texel data set 155 has a dimensionality D of at least 2 and each texel contains a value. At block 220, the histogram manager 145 sorts the data set into a point list of coordinates, wherein a point in the point list corresponds to a texel location in the texel data set 155. At block 230, the histogram manager 145 reduces the dimensionality of the point list by arranging the points in the point list according to an N-1 dimensional dominancy. At block 240, the histogram manager performs a raster operation on each associated value of the arranged points to obtain at least one value. At block 250, the histogram manager outputs the at least one value to a second buffer (e.g., 167b).
Fig. 3 is a block diagram of the exemplary computer system 100 of Fig. 1 adapted to compute a histogram of a data set using a scatter-reduce-increment operation. The elements of Fig. 3 are similar to the elements of Fig. 1. The histogram manager 145 is configured to receive a 2D or 3D data set from a data set texture buffer 350 in texture memory. In an example, the data set may be uploaded by the histogram manager 145 from the host memory 135 of the host system 120 to the data set texture buffer 350, or the data set texture buffer 350 may already reside in the texture memory of the GPU 140 (not shown). The histogram manager 145 is configured to generate a first vertex buffer 360 from the data set residing in the data set texture buffer 350. The first vertex buffer 360 may comprise a point list, which is a set of (x, y) or (x, y, z) coordinates. More specifically, this point list may be populated with a list of coordinates that correspond to the respective location of each texel in the data set of the data set texture buffer 350. It should be noted that, in an example, there is no requirement on the size or aspect ratio of the 2D or 3D layout of the data set in the data set texture buffer 350.
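For a two-dimensional data set, the point list described above is simply one coordinate pair per texel. A minimal Python sketch follows; the function name `make_point_list` and the row-by-row ordering are assumptions made for illustration.

```python
from typing import List, Tuple


def make_point_list(width: int, height: int) -> List[Tuple[int, int]]:
    """Return one (x, y) coordinate per texel of a width x height data set."""
    return [(x, y) for y in range(height) for x in range(width)]


points = make_point_list(3, 2)
```

Because each point only names a texel location, the list depends on the dimensions of the data set and not on the values stored in it.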
The histogram manager 145 is further configured to transfer the point list from the first vertex buffer 360 to a first vertex shader 365, which is also configured to read the value of each texel from the data set texture buffer 350. The histogram manager 145 is further configured to execute the first vertex shader 365 to transform the position coordinates of the texels in the first vertex buffer 360 into a new coordinate system by writing to texel positions at column positions in a bin buffer texture buffer 375, such that the vertical coordinate of the position coordinate of a texel located in the bin buffer texture buffer 375 is transformed into the new coordinate system according to the value of the associated texel located in the data set texture buffer 350.
The histogram manager 145 is further configured to transfer the coordinates from the first vertex shader 365 to a first pixel shader 370. The histogram manager 145 is further configured to execute the first pixel shader 370, which increments by 1 the texel value of the bin buffer texture buffer 375 at each texel position that the first vertex shader 365 directs the first pixel shader 370 to operate on. So that these increment operations maintain state across the parallel operations of the first pixel shader 370, the histogram manager 145 is configured to set the raster operation mode in the first pixel shader 370 to "accumulate".
The bin buffer texture buffer 375 written to by the first pixel shader 370 may be configured not as a single row as wide as the bin size, but with both a width and a height equal to the required histogram bin size. By using a large NxN intermediate texture, the number of simultaneous write operations to the bin buffer texture buffer 375 is reduced, in the extreme case of a very large mode, by a factor of up to the data set size divided by the bin size. For example, one such worst case is a data set whose values are all identical, e.g., zero. In this case, if the destination texture size were Nx1, a large fraction of the write requests for the data set would stack up behind one another for the same texel position, namely bin position (0, 0) in the destination texture, greatly increasing the cache miss rate and most likely causing pipeline stalls.
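The effect of widening the destination can be estimated by counting how many writes land on the busiest destination texel. The Python sketch below does this for the all-zero worst case; it assumes, purely for illustration, that points are spread over the columns by point index modulo the column count, and the function name `max_writes_per_texel` is hypothetical.

```python
from collections import Counter
from typing import List


def max_writes_per_texel(values: List[int], columns: int) -> int:
    """Count the writes that land on the busiest destination texel."""
    writes: Counter = Counter()
    for index, value in enumerate(values):
        # Assumed mapping: column from the point index, row from the value.
        writes[(index % columns, value)] += 1
    return max(writes.values())


worst_case = [0] * 4096                                  # every value is identical
single_row = max_writes_per_texel(worst_case, columns=1)
square = max_writes_per_texel(worst_case, columns=64)
```

With a single column, all 4096 writes target one texel; with 64 columns, the busiest texel receives 64 writes, so the contention falls by the number of columns under this assumed mapping.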
After the operation performed by the first pixel shader 370, the bin buffer texture buffer 375 contains, in effect, a histogram for each column. To obtain the final histogram in the destination bin buffer texture buffer 380, the histogram manager 145 creates a second vertex buffer 345 (again a point list) in which each coordinate corresponds to a texel position of the bin buffer texture buffer 375. A second vertex shader 355 and a second pixel shader 360, substantially identical to the first vertex shader 365 and the first pixel shader 370, respectively, perform the same scatter-reduce-increment operation, this time into the destination bin buffer texture buffer 380, which has a height of 1 and a width equal to the bin size. Those skilled in the art will appreciate that the first scatter-reduce-increment operation may be performed in row-major rather than column-major fashion, and the second scatter-reduce-increment operation may be performed in column-major rather than row-major fashion.
Figs. 4A to 4B are a flow diagram illustrating an example of a method 400 for computing a histogram using scatter-reduce-increment operations. The method 400 may be performed by the computer system 100 of Fig. 1 and may comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions run on a processing device), or a combination thereof. In one example, the method 400 is performed by the histogram manager 145 of the computing system 100 of Fig. 1.
As shown in Figs. 4A to 4B, to permit the computing system 100 to compute a histogram, at block 410 the histogram manager 145 receives a data set, either already present on the GPU 140 as a 2D or 3D texture buffer or uploaded to the GPU 140 from the host computer system 120 as a 2D or 3D texture buffer, and uses it to create the first vertex buffer 360. The first vertex buffer 360 comprises a point list, each point of which corresponds to the texel position of a datum in the data set. At block 420, the histogram manager 145 sends data from the first vertex buffer 360 and the data set texture buffer 350 to the first vertex shader 365. At block 430, the histogram manager 145 executes the first vertex shader 365 to read values from the data set texture buffer 350 and to output column positions to the bin buffer texture buffer 375, which has a width and a height equal to the final desired histogram bin size. The first vertex shader 365 further converts the position coordinates of the texels in the first vertex buffer 360 to a new coordinate system by writing column positions to texel positions in the bin buffer texture buffer 375, such that the vertical coordinate of each texel position in the first vertex buffer 360 is converted to the new coordinate system according to the value of the associated texel in the data set texture buffer 350. At block 440, the histogram manager 145 executes the first pixel shader 370 to increment by 1, via an additive raster operation, the values at the column texel positions in the bin buffer texture buffer 375. The first pixel shader 370 increments the texel value of the bin buffer texture buffer 375 by 1 for each texel position that the vertex shader position coordinates direct it to operate on.
At block 450, the histogram manager 145 reads data from the bin buffer texture buffer 375 to create the second vertex buffer 345, wherein the second vertex buffer 345 comprises a point list, each point of which corresponds to the texel position of a datum in the bin buffer texture buffer 375. At block 460, the histogram manager 145 feeds data from the second vertex buffer 345 and the bin buffer texture buffer 375 to the second vertex shader 355. At block 470, the histogram manager 145 executes the second vertex shader 355 to read values from the bin buffer texture buffer 375 and to output the final bin texel positions of a final histogram texture having a height of 1 and a width equal to the final desired histogram bin size. At block 480, the histogram manager 145 executes the second pixel shader 360 to increment by 1, via an additive raster operation, the values in the destination bin buffer texture buffer 380 to obtain the final histogram. The first pixel shader 370 increments the texel value of the bin buffer texture buffer 375 by 1 for each texel position of the position coordinates that the first vertex shader 365 directs the first pixel shader 370 to operate on.
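The two passes of blocks 410 to 480 can be mimicked on the CPU as a minimal sketch. It is a plain-Python stand-in for the shaders, not the GPU implementation. The function name `scatter_reduce_histogram`, the `x % n_bins` column fold for data sets wider than the bin buffer, and the choice to add each intermediate count (rather than 1) in the second pass are assumptions of this sketch, made so that the totals come out as a conventional histogram.

```python
def scatter_reduce_histogram(data, n_bins):
    """Two-pass histogram: per-column counts first, then a final reduction."""
    # Pass 1 (blocks 410-440): N x N bin buffer indexed as [bin][column].
    # The vertical coordinate comes from the data value, the horizontal
    # coordinate from the source column (hypothetical x % n_bins fold).
    bin_buffer = [[0] * n_bins for _ in range(n_bins)]
    for row in data:
        for x, value in enumerate(row):
            bin_buffer[value][x % n_bins] += 1  # additive raster operation

    # Pass 2 (blocks 450-480): collapse each bin row into an N x 1 histogram.
    # Assumption: the intermediate count is what gets accumulated.
    histogram = [0] * n_bins
    for bin_index, counts in enumerate(bin_buffer):
        for count in counts:
            histogram[bin_index] += count
    return bin_buffer, histogram


# Values are assumed to be integers already quantised to [0, n_bins).
data = [
    [0, 1, 1, 3],
    [2, 1, 0, 3],
]
bin_buffer, histogram = scatter_reduce_histogram(data, n_bins=4)
print(histogram)
```

Each texel of the intermediate buffer is written at most once per source row that maps to it, which is the property the NxN layout is meant to provide.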
Fig. 5 illustrates the spatial layout 500 of the data as it passes from the 2D data set texture 350 through the two steps of the scatter-reduce-increment operation. The 2D data set texture buffer 350, for example a 2048x1024 data set, is reduced to an intermediate 256x256 bin buffer texture buffer 375, which is in turn reduced to a final 256x1 histogram in the destination bin buffer texture buffer 380. Fig. 5 illustrates a row-major embodiment. It should be appreciated that, in another embodiment, the first vertex shader 365 may be configured to convert the position coordinates of the texels in the first vertex buffer 360 to the new coordinate system by writing column positions rather than row positions to the texel positions in the first vertex buffer 360. It will also be appreciated that, for a 3D texture, the sequence of operations is similar, except that a first operation may be performed in xy-plane-major fashion (where the choice of xy, zy, or xz plane-major is again arbitrary), a second operation may be performed in row-major or column-major fashion, and a third operation obtains the final histogram of the data set.
Fig. 6 is a block diagram of the process and data flow of an example histogram computation as performed in Scheuermann and Hensley. In Scheuermann and Hensley, there is a single scatter-reduce-increment operation. As noted above, that approach exhibits a large variability in performance, which is mitigated by increasing the bin size. By contrast, a more efficient and preferred way of mitigating the cache-write collision problem, as described in the embodiments, is to increase the dimensionality of the first bin buffer and to continue reducing the dimensionality until D=1. This gives the GPU the advantage of optimized memory buffer efficiency without compromising run time. In addition, particular embodiments of the present invention give the GPU consistent and predictable, non-data-dependent performance and run time, which is critical for systems that must operate in real time or under strict throughput constraints.
Figs. 7A to 7C are block diagrams of the process and data flow of an example histogram computation as performed in U.S. Patent No. 7,889,922 (hereinafter the '922 patent). Although it can run on earlier GPU hardware by relying on vertex shaders without the ability to read texture memory, the example taught in the '922 patent imposes an unbounded iterative reduction technique. By comparison, in certain embodiments of the present invention, at most D scatter-reduce-increment steps are needed for a D-dimensional data set texture buffer; in practice, D=2 for the largest comparable data sets. Based on the example shown in Figs. 7A to 7C, when the data set size is a power of 2, the required number of reduction steps is given by Equation 1.
It should be noted that, for the example of Figs. 7A to 7C, as the size of the data set increases, the initial data block size also increases and the number of reduction operations also increases. Consequently, the performance of the example of Figs. 7A to 7C is suboptimal, particularly for large data sets.
Fig. 8 is a block diagram of the exemplary computer system 100 of Fig. 1 adapted to compute the minimum value of a data set using scatter-reduce-replace operations. The elements of Fig. 8 are similar to the elements of Fig. 1, except that the first pixel shader 870 and the second pixel shader 860 are adapted to place values into the corresponding bin buffer texture buffer 875 and destination bin buffer texture buffer 880, respectively, rather than incrementing by 1 as in the histogram computation, and to employ a "minimum" raster operation rather than an "accumulate" raster operation. The bin buffer texture buffer 875 no longer has the width and height of the bin size, but instead has the width of the original data set and a height of 1. In addition, the destination bin buffer texture buffer 880 is adapted to contain a single minimum value and has a width and a height equal to 1.
Figs. 9A to 9B are a flow diagram illustrating an example of a method 900 for computing the minimum value of a data set using scatter-reduce-replace operations. The method 900 may be performed by the computer system 100 of Fig. 1 and may comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions run on a processing device), or a combination thereof. In one example, the method 900 is performed by the histogram manager 145 of the computing system 100 of Fig. 1.
As shown in Figs. 9A to 9B, to permit the computing system 100 to compute the minimum value of a data set, at block 910 the histogram manager 145 receives a data set, either already present on the GPU 140 as a 2D or 3D texture buffer or uploaded to the GPU 140 from the host computer system 120 as a 2D or 3D texture buffer, and uses it to create the first vertex buffer 860. The first vertex buffer 860 comprises a point list, each point of which corresponds to the texel position of a datum in the data set. At block 920, the histogram manager 145 sends data from the first vertex buffer 860 and the data set texture buffer 850 to the first vertex shader 865. At block 930, the histogram manager 145 executes the first vertex shader 865 to read values from the data set texture buffer 850 and to output column positions to the summation buffer texture buffer 885, which has a width equal to the width of the data set texture buffer 850 and a height equal to 1. The first vertex shader 865 further converts the position coordinates of the texels in the first vertex buffer 860 to a new coordinate system by writing column positions to texel positions in the summation buffer texture buffer 885, such that the vertical coordinate of each texel position in the summation buffer texture buffer 885 is converted to the new coordinate system according to the value of the associated texel in the data set texture buffer 850. At block 940, the histogram manager 145 executes the first pixel shader 870 to read values from the data set texture buffer 850 and to replace, via a minimum raster operation, the values at the column texel positions in the summation buffer texture buffer 885.
At block 950, the histogram manager 145 uses the summation buffer texture buffer 885 to create the second vertex buffer 845, wherein the second vertex buffer 845 comprises a point list, each point of which corresponds to the texel position of a datum in the summation buffer texture buffer 885. At block 960, the histogram manager 145 feeds data from the second vertex buffer 845 and the summation buffer texture buffer 885 to the second vertex shader 855. At block 970, the histogram manager 145 executes the second vertex shader 855 to supply the single texel position of the final summation texture buffer 890, which has a height of 1 and a width of 1. At block 980, the histogram manager 145 executes the second pixel shader 860 to read values from the summation buffer texture buffer 885 and to replace, via a minimum raster operation, the value at the final single texel position in the final summation texture buffer 890, thereby computing the minimum value of the data set.
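The two reduction passes of method 900 can be sketched on the CPU as follows. This is an illustrative stand-in rather than shader code: the function name `scatter_reduce` and the use of an explicit initial value to model a cleared render target are assumptions of the sketch, and the raster operation is passed in as an ordinary Python function.

```python
def scatter_reduce(data, raster_op, initial):
    """Reduce a 2D data set to width x 1, then to 1 x 1, with one raster op."""
    width = len(data[0])

    # Pass 1 (blocks 910-940): every texel in source column x is scattered to
    # texel x of a width x 1 buffer and combined with the raster operation.
    row_buffer = [initial] * width
    for row in data:
        for x, value in enumerate(row):
            row_buffer[x] = raster_op(row_buffer[x], value)

    # Pass 2 (blocks 950-980): every texel of the width x 1 buffer is
    # scattered to a single 1 x 1 texel with the same raster operation.
    result = initial
    for value in row_buffer:
        result = raster_op(result, value)
    return row_buffer, result


data = [
    [7, 3, 9],
    [4, 8, 2],
]
column_minima, minimum = scatter_reduce(data, min, float("inf"))
print(column_minima, minimum)
```

Passing `max` with `float("-inf")` instead would model a "maximum" raster operation in the same structure.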
Fig. 10 is a block diagram of the exemplary computer system 100 of Fig. 1 adapted to compute the maximum value of a data set using scatter-reduce-replace operations. The elements of Fig. 10 are similar to the elements of Fig. 1, except that the first pixel shader 1070 and the second pixel shader 1060 are adapted to place values into the corresponding bin buffer texture buffer 1075 and destination bin buffer texture buffer 1080, respectively, rather than incrementing by 1 as in the histogram computation, and to employ a "maximum" raster operation rather than an "accumulate" raster operation. The bin buffer texture buffer 1075 no longer has the width and height of the bin size, but instead has the width of the original data set and a height of 1. In addition, the destination bin buffer texture buffer 1080 is adapted to contain a single maximum value and has a width and a height equal to 1.
Figs. 11A to 11B are a flow diagram illustrating an example of a method 1100 for computing the maximum value of a data set using scatter-reduce-replace operations. The method 1100 may be performed by the computer system 100 of Fig. 1 and may comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions run on a processing device), or a combination thereof. In one example, the method 1100 is performed by the histogram manager 145 of the computing system 100 of Fig. 1.
As shown in Figs. 11A to 11B, to permit the computing system 100 to compute the maximum value of a data set, at block 1110 the histogram manager 145 receives a data set, either already present on the GPU 140 as a 2D or 3D texture buffer or uploaded to the GPU 140 from the host computer system 120 as a 2D or 3D texture buffer, and uses it to create the first vertex buffer 1060. The first vertex buffer 1060 comprises a point list, each point of which corresponds to the texel position of a datum in the data set. At block 1120, the histogram manager 145 sends data from the first vertex buffer 1060 and the data set texture buffer 1050 to the first vertex shader 1065. At block 1130, the histogram manager 145 executes the first vertex shader 1065 to read values from the data set texture buffer 1050 and to output column positions to the summation buffer texture buffer 1085, which has a width equal to the width of the data set texture buffer 1050 and a height equal to 1. The first vertex shader 1065 further converts the position coordinates of the texels in the first vertex buffer 1060 to a new coordinate system by writing column positions to texel positions in the summation buffer texture buffer 1085, such that the vertical coordinate of each texel position in the summation buffer texture buffer 1085 is converted to the new coordinate system according to the value of the associated texel in the data set texture buffer 1050. At block 1140, the histogram manager 145 executes the first pixel shader 1070 to read values from the data set texture buffer 1050 and to replace, via a maximum raster operation, the values at the column texel positions in the summation buffer texture buffer 1085.
At block 1150, the histogram manager 145 uses the summation buffer texture buffer 1085 to create the second vertex buffer 1045, wherein the second vertex buffer 1045 comprises a point list, each point of which corresponds to the texel position of a datum in the summation buffer texture buffer 1085. At block 1160, the histogram manager 145 feeds data from the second vertex buffer 1045 and the summation buffer texture buffer 1085 to the second vertex shader 1055. At block 1170, the histogram manager 145 executes the second vertex shader 1055 to supply the single texel position of the final summation texture buffer 1090, which has a height of 1 and a width of 1. At block 1180, the histogram manager 145 executes the second pixel shader 1060 to read values from the summation buffer texture buffer 1085 and to replace, via a maximum raster operation, the value at the final single texel position in the final summation texture buffer 1090, thereby computing the maximum value of the data set.
Fig. 12 is a block diagram of the exemplary computer system 100 of Fig. 1 adapted to compute the sum of a data set using scatter-reduce-accumulate operations. The elements of Fig. 12 are similar to the elements of Fig. 1, except that the first pixel shader 1270 and the second pixel shader 1260 are adapted to add values (e.g., by performing an "accumulate" raster operation) rather than incrementing by 1 as in the histogram computation of Figs. 1 and 2. The bin buffer texture buffer 1275 no longer has the width and height of the bin size, but instead has the width of the original data set and a height of 1. In addition, the destination bin buffer texture buffer 1280 is adapted to contain a single sum value and has a width and a height equal to 1.
Figs. 13A to 13B are a flow diagram illustrating an example of a method 1300 for computing the sum of a data set using scatter-reduce-accumulate operations. The method 1300 may be performed by the computer system 100 of Fig. 1 and may comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions run on a processing device), or a combination thereof. In one example, the method 1300 is performed by the histogram manager 145 of the computing system 100 of Fig. 1.
As shown in Figs. 13A to 13B, to permit the computing system 100 to compute the sum of a data set, at block 1310 the histogram manager 145 receives a data set, either already present on the GPU 140 as a 2D or 3D texture buffer or uploaded to the GPU 140 from the host computer system 120 as a 2D or 3D texture buffer, and uses it to create the first vertex buffer 1260. The first vertex buffer 1260 comprises a point list, each point of which corresponds to the texel position of a datum in the data set. At block 1320, the histogram manager 145 sends data from the first vertex buffer 1260 and the data set texture buffer 1250 to the first vertex shader 1265. At block 1330, the histogram manager 145 executes the first vertex shader 1265 to read values from the data set texture buffer 1250 and to output column positions to the summation buffer texture buffer 1285, which has a width equal to the width of the data set texture buffer 1250 and a height equal to 1. The first vertex shader 1265 further converts the position coordinates of the texels in the first vertex buffer 1260 to a new coordinate system by writing column positions to texel positions in the summation buffer texture buffer 1285, such that the vertical coordinate of each texel position in the summation buffer texture buffer 1285 is converted to the new coordinate system according to the value of the associated texel in the data set texture buffer 1250. At block 1340, the histogram manager 145 executes the first pixel shader 1270 to read values from the data set texture buffer 1250 and to add these values, via an additive raster operation, at the column texel positions in the summation buffer texture buffer 1285.
At block 1350, the histogram manager 145 uses the summation buffer texture buffer 1285 to create the second vertex buffer 1245, wherein the second vertex buffer 1245 comprises a point list, each point of which corresponds to the texel position of a datum in the summation buffer texture buffer 1285. At block 1360, the histogram manager 145 feeds data from the second vertex buffer 1245 and the summation buffer texture buffer 1285 to the second vertex shader 1255. At block 1370, the histogram manager 145 executes the second vertex shader 1255 to supply the single texel position of the final summation texture buffer 1290, which has a height of 1 and a width of 1. At block 1380, the histogram manager 145 executes the second pixel shader 1260 to read values from the summation buffer texture buffer 1285 and to add these values, via an additive raster operation, at the final single texel position in the final summation texture buffer 1290, thereby computing the sum of the data set.
Fig. 14 is a block diagram of the exemplary computer system 100 of Fig. 1 adapted to compute the mean of a data set using scatter-reduce-accumulate operations. The elements of Fig. 14 are similar to the elements of Figs. 1 and 10, except that each summed value is divided by the height of the columns of the original data set texture buffer 350 associated with the first pixel shader 370 and by the width, or rows, of the original data set texture buffer 350. This yields the final computation of the mean of the data set.
Figs. 15A to 15B are a flow diagram illustrating an example of a method 1500 for computing the mean of a data set using scatter-reduce-accumulate operations. The method 1500 may be performed by the computer system 100 of Fig. 1 and may comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions run on a processing device), or a combination thereof. In one example, the method 1500 is performed by the histogram manager 145 of the computing system 100 of Fig. 1.
As shown in Figs. 15A to 15B, to permit the computing system 100 to compute the mean of a data set, at block 1510 the histogram manager 145 receives a data set, either already present on the GPU 140 as a 2D or 3D texture buffer or uploaded to the GPU 140 from the host computer system 120 as a 2D or 3D texture buffer, and uses it to create the first vertex buffer 1460. The first vertex buffer 1460 comprises a point list, each point of which corresponds to the texel position of a datum in the data set. At block 1520, the histogram manager 145 sends data from the first vertex buffer 1460 and the data set texture buffer 1450 to the first vertex shader 1465. At block 1530, the histogram manager 145 executes the first vertex shader 1465 to read values from the data set texture buffer 1450 and to output column positions to the summation buffer texture buffer 1485, which has a width equal to the width of the data set texture buffer 1450 and a height equal to 1. The first vertex shader 1465 further converts the position coordinates of the texels in the first vertex buffer 1460 to a new coordinate system by writing column positions to texel positions in the summation buffer texture buffer 1485, such that the vertical coordinate of each texel position in the summation buffer texture buffer 1485 is converted to the new coordinate system according to the value of the associated texel in the data set texture buffer 1450. At block 1540, the histogram manager 145 executes the first pixel shader 1470 to read values from the data set texture buffer 1450 and to add these values, via an additive raster operation, at the column texel positions in the summation buffer texture buffer 1485.
At block 1550, the histogram manager 145 uses the summation buffer texture buffer 1485 to create the second vertex buffer 1445, wherein the second vertex buffer 1445 comprises a point list, each point of which corresponds to the texel position of a datum in the summation buffer texture buffer 1485. At block 1560, the histogram manager 145 feeds data from the second vertex buffer 1445 and the summation buffer texture buffer 1485 to the second vertex shader 1455. At block 1570, the histogram manager 145 executes the second vertex shader 1455 to supply the single texel position of the final summation texture buffer 1490, which has a height of 1 and a width of 1. At block 1580, the histogram manager 145 executes the second pixel shader 1460 to read values from the summation buffer texture buffer 1485 and to add, via an additive raster operation, each value multiplied by 1/(data set size) at the final single texel position in the final summation texture buffer 1490, thereby computing the mean of the data set.
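As a minimal CPU-side sketch of method 1500, the mean can be formed by accumulating column sums in a first pass and then accumulating each column sum scaled by 1/(data set size) in a second pass. The function name `scatter_reduce_mean` and the sample data are illustrative assumptions; the loops stand in for the additive raster operation.

```python
def scatter_reduce_mean(data):
    """Two-pass mean: column sums, then a scaled accumulation into one texel."""
    height = len(data)
    width = len(data[0])
    scale = 1.0 / (width * height)  # 1 / (data set size)

    # Pass 1 (blocks 1510-1540): accumulate each source column into a
    # width x 1 summation buffer.
    column_sums = [0.0] * width
    for row in data:
        for x, value in enumerate(row):
            column_sums[x] += value

    # Pass 2 (blocks 1550-1580): accumulate every column sum, multiplied by
    # 1 / (data set size), into a single 1 x 1 texel.
    mean = 0.0
    for column_sum in column_sums:
        mean += column_sum * scale
    return mean


data = [
    [1, 2, 3],
    [4, 5, 6],
]
mean = scatter_reduce_mean(data)
print(mean)
```

Applying the scale factor during the second accumulation, rather than dividing once at the end, mirrors the wording of block 1580.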
Figs. 16A to 16C are block diagrams of the exemplary computer system 100 of Fig. 1 adapted to compute the standard deviation of a data set using scatter-reduce-accumulate operations. The elements of Figs. 16A to 16C are similar to the elements of Fig. 14, with additional blocks 1650, 1660 for computing the standard deviation from the previously computed mean obtained using the system and method described in Figs. 14 and 15. A pixel shader 1602 is used to compute $(X - \bar{X})^2$ for each datum $X$ of the data set, where $\bar{X}$ is the mean of the data set computed in Fig. 14 and in the left half of Fig. 16A, thereby obtaining the data set texture buffer 1608. The same D-dimensional scatter-reduce-accumulate block 1640 as described in the data set summation embodiment of Fig. 12 is executed on the data set texture buffer 1608. In the final block 1660, a pixel shader 1632 operates on the data to obtain the sum $\sum (X - \bar{X})^2$ and to compute Equation 2, which gives the standard deviation of the data set:

$$\sigma = \sqrt{\frac{1}{N}\sum \left(X - \bar{X}\right)^2} \qquad (2)$$

where $N$ denotes the data set size.
In essence, the standard deviation is obtained efficiently by combining the function that obtains the mean of the data set of Fig. 14, the pixel shader 1602, the block that obtains the sum of the data set of Fig. 12, and the pixel shader 1632.
Figs. 17A to 17C are a flow diagram illustrating an example of a method 1700 for computing the standard deviation of a data set using scatter-reduce-accumulate operations. The method 1700 may be performed by the computer system 100 of Fig. 1 and may comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions run on a processing device), or a combination thereof. In one example, the method 1700 is performed by the histogram manager 145 of the computing system 100 of Fig. 1.
As shown in Figs. 17A to 17C, to permit the computing system 100 to compute the standard deviation of a data set, at block 1705 the histogram manager 145 computes the mean of the data set stored in the first vertex buffer 1602. The first vertex buffer 1602 comprises a single quad that identifies the dimensions of the associated data set texture buffer 1408. At block 1710, the histogram manager 145 sends the contents of the first vertex buffer 1602 to the first vertex shader 1604. The first vertex shader 1604 supplies the texel positions of the temporary cache texture buffer 1606, which has a width and a height equal to those of the data set texture buffer 1608. At block 1715, the histogram manager 145 executes the first pixel shader 1610 to read values from the data set texture buffer 1408, to read a single value from the previously computed mean summation texture buffer 1612, and to compute, for each texel of the temporary cache texture buffer 1606, the square of the difference between each data set value and the mean.
At block 1720, the histogram manager 145 uses the temporary cache texture buffer 1606 to create the second vertex buffer 1614. The second vertex buffer 1614 comprises a point list, each point of which corresponds to the texel position of a datum in the temporary cache texture buffer 1606. At block 1725, the histogram manager 145 feeds the contents of the second vertex buffer 1614 and the temporary cache texture buffer 1606 to the second vertex shader 1616. At block 1730, the histogram manager 145 executes the second vertex shader 1616 to read values from the temporary cache texture 1606 and to output column positions of the summation buffer texture buffer 1618, which has a width equal to that of the temporary cache texture buffer 1606 and a height equal to 1. At block 1735, the histogram manager 145 executes the second pixel shader 1617 to read values from the data set texture buffer 1408 and to add these values, via an additive raster operation, at the column texel positions in the summation buffer texture buffer 1412.
At block 1740, the histogram manager 145 uses the summation buffer texture buffer to create the third vertex buffer 1620. The third vertex buffer 1620 comprises a point list, each point of which corresponds to the texel position of a datum in the summation buffer texture 1612. At block 1745, the histogram manager 145 feeds the contents of the third vertex buffer 1620 to the third vertex shader 1622. At block 1750, the histogram manager 145 executes the third vertex shader 1622 to supply the single texel position of the final summation texture buffer 1624, which has a height of 1 and a width of 1. At block 1755, the histogram manager 145 executes the third pixel shader 1626 to read values from the summation buffer texture buffer 1624 and to add, via an additive raster operation, each value multiplied by 1/(data set size) at the single texel position in the final summation texture buffer 1628.
At block 1760, the histogram manager 145 uses the contents of the summation texture buffer 1628 to create the fourth vertex buffer 1630. The fourth vertex buffer 1630 comprises a point list whose single element corresponds to the single texel position in the summation texture buffer 1628. At block 1765, the histogram manager 145 sends the contents of the fourth vertex buffer 1630 to the fourth vertex shader 1632. At block 1770, the histogram manager 145 executes the fourth vertex shader 1632 to supply the single texel position of the final standard deviation texture buffer 1634, which has a height of 1 and a width of 1. At block 1775, the histogram manager 145 executes the fourth pixel shader 1633 to read the value from the summation texture and to write, via a replace raster operation, the square root of the value multiplied by 1/(data set size) at the single texel position in the standard deviation texture buffer 1634, thereby computing the standard deviation of the data set.
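The sequence of passes in method 1700 can be followed in a short CPU-side sketch. It assumes the population form of the standard deviation, with the factor 1/(data set size) applied once to the sum of squared differences before the square root; the function name `scatter_reduce_std`, the helper `column_reduce_sum`, and the sample data are illustrative and not taken from the source.

```python
import math


def column_reduce_sum(data):
    """Accumulate a 2D buffer into width x 1 column sums, then into one value."""
    column_sums = [0.0] * len(data[0])
    for row in data:
        for x, value in enumerate(row):
            column_sums[x] += value  # additive raster operation
    return sum(column_sums)


def scatter_reduce_std(data):
    """Mean pass, squared-difference pass, sum pass, then a square root."""
    size = len(data) * len(data[0])

    # Mean of the data set, as in the mean computation of Figs. 14 and 15.
    mean = column_reduce_sum(data) / size

    # Block 1715: per-texel squared difference from the mean, written to a
    # temporary buffer with the same dimensions as the data set.
    squared = [[(value - mean) ** 2 for value in row] for row in data]

    # Blocks 1720-1755: reduce the temporary buffer to a single value and
    # scale it by 1 / (data set size).
    variance = column_reduce_sum(squared) / size

    # Block 1775: replace the single texel with the square root.
    return math.sqrt(variance)


data = [
    [2, 4, 4, 4],
    [5, 5, 7, 9],
]
std = scatter_reduce_std(data)
print(std)
```

For this sample the mean is 5 and the squared differences sum to 32, so the variance is 4 and the standard deviation is 2.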
Fig. 18 is a block diagram of the exemplary computer system 100 of Fig. 1 adapted to extend the minimum value computation of Fig. 8 to determine the position of a given minimum value in the data set. Given a previously computed 1x1 minimum value texture buffer 1802, the histogram manager 145 is configured to produce a first vertex buffer 1804 comprising a point list, which is a set of (x, y) or (x, y, z) coordinates retrieved by the histogram manager 145 from the data set texture buffer 1806. Next, the histogram manager 145 executes a vertex shader 1808, which outputs a valid destination coordinate and the (x, y) or (x, y, z) position to a single pixel (width and height equal to 1) if and only if the texel at that coordinate in the data set texture buffer 1806 equals the minimum value, and otherwise outputs a negative position, which ensures that the subsequent pixel shader 1810 operates only on minimum values. In the case of multiple equal minimum values, the position returned is non-deterministic. However, those skilled in the art will appreciate that one way of reliably determining all positions of multiple equal minimum values is to apply the function described in Fig. 18 recursively, replacing the minimum value found in the original data set with a NaN (not-a-number) value and running the function again until the minimum value changes.
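One reading of the repeated-application idea for tied minima is sketched below on the CPU. It is an interpretation under stated assumptions, not the patented implementation: the function names `find_one_minimum` and `find_all_minimum_positions` are hypothetical, NaN texels are skipped explicitly because ordinary comparisons against NaN are always false, and the loop stops once the minimum of the remaining data differs from the first minimum found.

```python
import math


def find_one_minimum(data):
    """Return (value, (x, y)) for one minimum, ignoring NaN, or None."""
    best = None
    for y, row in enumerate(data):
        for x, value in enumerate(row):
            if math.isnan(value):
                continue  # texels already masked out
            if best is None or value < best[0]:
                best = (value, (x, y))
    return best


def find_all_minimum_positions(data):
    """Collect every position holding the minimum by masking hits with NaN."""
    work = [[float(value) for value in row] for row in data]  # leave input intact
    first = find_one_minimum(work)
    if first is None:
        return []
    target = first[0]

    positions = []
    while True:
        found = find_one_minimum(work)
        if found is None or found[0] != target:
            break  # the minimum changed, so all ties have been found
        _, (x, y) = found
        positions.append((x, y))
        work[y][x] = math.nan  # replace the located minimum with NaN
    return positions


data = [
    [5, 3, 8],
    [6, 3, 7],
]
positions = find_all_minimum_positions(data)
print(positions)
```

Each iteration locates one tied position, so the number of passes equals the number of ties plus one.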
Fig. 19 is a flow diagram illustrating an example of a method 1900 for computing the position of the minimum value in a data set. The method 1900 may be performed by the computer system 100 of Fig. 1 and may comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions run on a processing device), or a combination thereof. In one example, the method 1900 is performed by the histogram manager 145 of the computing system 100 of Fig. 1.
As shown in Fig. 19, to permit the computing system 100 to compute the position of the minimum value in a data set, at block 1910 the histogram manager 145 computes the minimum value of the data set. At block 1920, the histogram manager 145 uses the data set texture buffer 1806 to create the first vertex buffer 1804. The first vertex buffer 1804 comprises a point list, each point of which corresponds to the texel position of a datum in the data set. At block 1930, the histogram manager 145 inputs the contents of the first vertex buffer 1804 and the data set texture buffer 1806 to the first vertex shader 1808. At block 1940, the histogram manager 145 executes the first vertex shader 1808 to read values from the data set texture buffer 1806 and compare them with the value in the minimum value position texture buffer 1802, which has a width and a height equal to 1. When a value in the data set equals the minimum value, the first vertex shader 1808 outputs the single texel position together with the x position and y position; when the value is greater than the minimum value, it outputs a single out-of-range texel position. At block 1950, the histogram manager 145 executes the first pixel shader 1810 to read the x and y values from the first vertex buffer 1804 and to copy these values, via a replace raster operation, into the x and y values of the minimum value position texture buffer 1812, thereby computing the position of the minimum value in the data set.
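Blocks 1910 to 1950 can be sketched on the CPU as follows. The sketch is illustrative only: the function name `locate_minimum` is hypothetical, an index of -1 stands in for the out-of-range texel position, and the sequential loop makes the last matching texel win, whereas on a GPU the surviving write among ties is not determined.

```python
def locate_minimum(data):
    """Return the minimum and one (x, y) position at which it occurs."""
    # Block 1910: the previously computed minimum (stands in for the 1 x 1
    # minimum texture).
    minimum = min(min(row) for row in data)

    location = None  # stands in for the 1 x 1 position texture
    for y, row in enumerate(data):
        for x, value in enumerate(row):
            # Block 1940: texels equal to the minimum are routed to texel 0;
            # all others get an out-of-range position and are discarded.
            target = 0 if value == minimum else -1
            if target < 0:
                continue
            # Block 1950: the replace raster operation overwrites the stored
            # position with this texel's coordinates.
            location = (x, y)
    return minimum, location


data = [
    [5, 3, 8],
    [6, 3, 7],
]
minimum, location = locate_minimum(data)
print(minimum, location)
```

Because the data above holds the minimum at two positions, either one is a valid result; a caller should not rely on which tie is returned.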
Figure 20 is a block diagram of the exemplary computing system 100 of Fig. 1, adapted to extend the maximum value computation for a data set of Figure 10 in order to determine the location of a given maximum value within the data set. Given a previously computed 1x1 maximum value texture buffer 2002, the histogram manager 145 is configured to generate a first vertex buffer 2004 comprising a point list, which is a set of (x, y) or (x, y, z) coordinates retrieved from the data set texture buffer 2006 by the histogram manager 145. Next, the histogram manager 145 is configured to execute a vertex shader 2008, which outputs a valid target coordinate and the (x, y) or (x, y, z) position to a single pixel (width and height equal to 1) if and only if the texel at that coordinate in the data set texture buffer 2006 equals the maximum value. Otherwise it outputs a negative position, which ensures that the subsequent pixel shader 2010 operates only on the maximum value. When there are multiple equal maximum values, the location returned is non-deterministic. However, one skilled in the art may surmise that one way to reliably determine all locations of multiple equal maximum values is to apply the function described in Figure 20 recursively: replace the maximum value found in the original data set with a NaN (not-a-number) value and re-run the function until the maximum value changes.
Figure 21 is a flow diagram illustrating an example of a method 2100 for computing the location of a maximum value in a data set. The method 2100 may be performed by the computer system 100 of Fig. 1 and may comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions run on a processing device), or a combination thereof. In one example, the method 2100 is performed by the histogram manager 145 of the computing system 100 of Fig. 1.
As shown in Figure 21, to permit the computing system 100 to compute the location of a maximum value in a data set, at block 2110 the histogram manager 145 computes the maximum value of the data set. At block 2120, the histogram manager 145 creates the first vertex buffer 2004 using the data set texture buffer 2006. The first vertex buffer 2004 comprises a point list, each point of which corresponds to the texel location of one datum in the data set. At block 2130, the histogram manager 145 inputs the contents of the first vertex buffer 2004 and the data set texture buffer 2006 to the first vertex shader 2008. At block 2140, the histogram manager 145 executes the first vertex shader 2008 to read values from the data set texture buffer 2006 and compare them with the value in the maximum value texture buffer 2002, which has a width and height equal to 1. When a value in the data set equals the maximum value, the shader outputs a single texel location together with the x position and y position; when the value is less than the maximum value, it outputs a single out-of-range texel location. At block 2150, the histogram manager 145 executes the first pixel shader 2010 to read the x and y values from the first vertex buffer 2004 and copy them to the x and y values of the maximum value location texture buffer 2012 via a replace raster operation, thereby computing the location of the maximum value in the data set.
The present invention has several advantages over prior art methods of computing histograms and related statistical functions. The scatter-reduce framework, arranged to reduce the dimensionality of the data set, is congruent with the cache behavior of modern and earlier GPUs, which permits greatly increased performance. The scatter-reduce framework is generalized to perform functions ranging from histogram computation to finding the median and mode of a data set, with high efficiency and precisely deterministic behavior (even when processing highly modal data sets). The process is efficient enough to run faster than real time on 4K-resolution video at 30 fps on contemporary, commercially available, mass-market computer hardware, which enables new applications. These applications include, but are not limited to, color processing, improved video coding efficiency, scene change detection, motion-compensated de-interlacing and frame rate conversion, and object segmentation for real-time scene analysis, photogrammetry, and metrography.
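As an illustration of the scatter-reduce idea, the following Python sketch builds a histogram of a two-dimensional data set in two additive passes. It is one plausible CPU reading of the framework, not the disclosed shader implementation: the function name `histogram_scatter_reduce` is hypothetical, texel values are assumed to be integers that index bins directly, and the intermediate buffer is assumed to hold one partial histogram per data set column.

```python
def histogram_scatter_reduce(data, bins):
    """Two additive passes: per-column partial histograms, then one 1-D histogram."""
    height, width = len(data), len(data[0])

    # Pass 1: scatter the texel at (x, y) with value v to location (x, v) of an
    # intermediate buffer (width columns by `bins` rows) and add 1 there.
    partial = [[0] * width for _ in range(bins)]
    for y in range(height):
        for x in range(width):
            v = data[y][x]
            if not 0 <= v < bins:
                raise ValueError("texel value outside the histogram range")
            partial[v][x] += 1

    # Pass 2: scatter each intermediate texel (x, b) to bin b of a buffer that is
    # `bins` wide and 1 high, accumulating the partial counts additively.
    hist = [0] * bins
    for b in range(bins):
        for x in range(width):
            hist[b] += partial[b][x]
    return hist
```

For the 3x2 data set `[[0, 1, 1], [2, 1, 0]]` with four bins, the result is `[2, 3, 1, 0]`. Splitting the work this way means that, in the first pass, writes from different columns never target the same location, which is one way such a scheme could limit contention on heavily populated bins.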
Figure 22 illustrates a diagrammatic representation of a machine in the example form of a computer system 2200 within which a set of instructions, for causing the machine to perform any one or more of the methods discussed herein, may be executed. In some examples, the machine may be connected (e.g., networked) to other machines in a LAN, an intranet, an extranet, or the Internet. The machine may operate in the capacity of a server machine in a client-server network environment. The machine may be a personal computer (PC), a set-top box (STB), a server, a network router, a switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term "machine" shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methods discussed herein.
The exemplary computer system 2200 includes a processing device (processor) 2202, a main memory 2204 (e.g., read-only memory (ROM), flash memory, or dynamic random access memory (DRAM) such as synchronous DRAM (SDRAM)), a static memory 2206 (e.g., flash memory or static random access memory (SRAM)), and a data storage device 2216, which communicate with each other via a bus 2208.
The processor 2202 represents one or more general-purpose processing devices such as a microprocessor, a central processing unit, or the like. More particularly, the processor 2202 may be a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, a processor implementing other instruction sets, or a processor implementing a combination of instruction sets. The processor 2202 may also be one or more special-purpose processing devices such as an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), a network processor, or the like. The histogram manager 145 shown in Fig. 1 may be executed by the processor 2202 configured to perform the operations and steps discussed herein.
The computer system 2200 may further include a network interface device 2222. The computer system 2200 may also include a video display unit 2210 (e.g., a liquid crystal display (LCD) or a cathode ray tube (CRT)), an alphanumeric input device 2212 (e.g., a keyboard), a cursor control device 2214 (e.g., a mouse), and a signal generation device 2220 (e.g., a speaker).
The drive unit 2216 may include a computer-readable medium 2224 on which is stored one or more sets of instructions (e.g., instructions of the histogram manager 145) embodying any one or more of the methods or functions described herein. The instructions of the histogram manager 145 may also reside, completely or at least partially, within the main memory 2204 and/or within the processor 2202 during execution thereof by the computer system 2200, the main memory 2204 and the processor 2202 also constituting computer-readable media. The instructions of the histogram manager 145 may further be transmitted or received over a network via the network interface device 2222.
While the computer-readable storage medium 2224 is shown in an example to be a single medium, the term "computer-readable storage medium" should be taken to include a single non-transitory medium or multiple non-transitory media (e.g., a centralized or distributed database and/or associated caches and servers) that store the one or more sets of instructions. The term "computer-readable storage medium" shall also be taken to include any medium that is capable of storing, encoding, or carrying a set of instructions for execution by the machine and that causes the machine to perform any one or more of the methods of the present invention. The term "computer-readable storage medium" shall accordingly be taken to include, but not be limited to, solid-state memories, optical media, and magnetic media.
In the above description, numerous details are set forth. It is apparent, however, to one of ordinary skill in the art having the benefit of this disclosure, that examples of the present invention may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the description.
Some portions of the detailed description are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise, as apparent from the above discussion, it is appreciated that throughout this description, discussions utilizing terms such as "receiving", "writing", "maintaining", or the like refer to the actions and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (e.g., electronic) quantities within the computer system's registers and memories to a new coordinate system, into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission, or display devices.
Examples of the present invention also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. The high-throughput histogram and statistics computation disclosed herein solves problems in many fields, such as the scene change detection system and method taught in the '384 patent; color balance and contrast enhancement of real-time video on mobile devices that process on a GPU or APU; finding the maxima of a Hough transform as utilized by contemporary MRI and other 3D scanning systems, where the histogram maxima are used, for example, to identify prominent line segments in a 3D volume; and character pattern and frequency analysis in high-throughput cryptanalysis systems. Such a computer program may be stored in a computer-readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions.
The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct a more specialized apparatus to perform the required method steps. Example structures for a variety of these systems appear from the description herein. In addition, the present invention is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the invention as described herein.
It is to be understood that the above description is intended to be illustrative and not restrictive. Many other examples will be apparent to those skilled in the art upon reading and understanding the above description. The scope of the invention should, therefore, be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.

Claims (27)

1. A method, comprising:
receiving, at a processing device from a first buffer, a data set of texels, wherein the data set has a dimensionality D of at least 2 and wherein each texel contains a value;
sorting, using the processing device, the data set into a point list of coordinates, wherein a point in the point list corresponds to a texel location in the data set;
reducing the dimensionality of the point list by arranging points in the point list according to an N-1 dimensional dominancy;
performing a raster operation on each associated value of the arranged points to obtain at least one value; and
outputting the at least one value to a second buffer.
2. The method of claim 1, further comprising repeating the sorting, reducing, performing, and outputting until D is 1.
3. The method of claim 1, wherein sorting the data set comprises generating a vertex buffer having an individual vertex for each texel location.
4. The method of claim 1, wherein reducing the dimensionality of the point list comprises executing a vertex shader pass to inform a subsequent pixel shader pass of a target bin location for performing the raster operation.
5. The method of claim 1, wherein performing the raster operation comprises performing at least one of a replace raster operation, an additive raster operation, a minimum raster operation, or a maximum raster operation.
6. The method of claim 1, wherein the at least one value output is at least one of a histogram of the data set, a maximum value of the data set, a minimum value of the data set, a summed value of the data set, a mean value of the data set, a median value or a mode value, a standard deviation value of the data set, a location of the minimum value of the data set, or a location of the maximum value of the data set.
7. The method of claim 1, wherein the data set of texels is received in the first buffer from a two-dimensional or three-dimensional still image or video.
8. The method of claim 1, wherein D is 2 or 3.
9. The method of claim 1, further comprising transmitting the at least one value to one or more downstream devices in a video processing application.
10. A method, comprising:
receiving, at a processing device from a first buffer, a two-dimensional data set of texels, wherein each texel in the data set contains a value;
sorting, using the processing device, the data set from the first buffer into a point list of coordinates in a second buffer, wherein a point in the point list corresponds to a texel location in the data set;
reading values from the second buffer and outputting column locations to a third buffer having a width equal to a first size and a height equal to a second size;
incrementing by 1, using an additive raster operation, the values in the column texel locations in the third buffer to obtain at least one value; and
outputting the at least one value to a fourth buffer.
11. The method of claim 10, wherein the first size and the second size correspond to a histogram bin size.
12. The method of claim 10, wherein outputting column locations to the third buffer having a width equal to the first size and a height equal to the second size further comprises converting the position coordinates of the texels in the second buffer to a new coordinate system by writing to the texel locations of the column locations in the third buffer, such that the vertical coordinate of the position coordinate of a texel in the second buffer is converted to the new coordinate system according to the value of the associated texel texture in the first buffer.
13. The method of claim 12, wherein incrementing the values comprises incrementing the texel value of the third buffer by 1 for each texel location on which the position coordinates direct the operation.
14. The method of claim 10, further comprising outputting a bin texel location having a height of 1 and a width equal to a final histogram bin size to the fourth buffer.
15. The method of claim 14, further comprising incrementing by 1, using the additive raster operation, the values in the fourth buffer to obtain a histogram.
16. The method of claim 10, wherein the first size corresponds to the width of the first buffer and the second size corresponds to a height equal to 1.
17. The method of claim 10, wherein performing the raster operation comprises performing at least one of a replace raster operation, an additive raster operation, a minimum raster operation, or a maximum raster operation.
18. The method of claim 10, further comprising outputting a bin texel location having a height of 1 and a width equal to 1 to the fourth buffer.
19. The method of claim 10, further comprising replacing the value in the fourth buffer using a minimum raster operation to obtain the minimum value of the data set.
20. The method of claim 10, further comprising replacing the value in the fourth buffer using a summation raster operation to obtain the summed value of the data set.
21. The method of claim 10, wherein replacing the value in the fourth buffer further comprises multiplying the value in the fourth buffer by 1 divided by the size of the data set to obtain the mean value of the data set.
22. The method of claim 10, further comprising transmitting the at least one value to one or more downstream devices in a video processing application.
23. A method, comprising:
computing a minimum value or a maximum value of a two-dimensional data set of texels;
receiving, at a processing device from a first buffer, the two-dimensional data set of texels, wherein each texel in the data set is associated with a value;
sorting, using the processing device, the data set from the first buffer into a point list of coordinates in a second buffer, wherein a point in the point list corresponds to a texel location in the data set;
reading texel values from the second buffer and, when a texel value equals the minimum value, outputting a single texel location and an x value and a y value to a third buffer, and, when the texel value is greater than the minimum value, outputting a single out-of-range texel location; and
reading the x value and the y value from the second buffer and copying the x value and the y value to the x value and the y value of the third buffer via a replace raster operation to compute the location of the minimum value or the maximum value in the data set.
24. A computer system, comprising:
a memory; and
a processing device coupled to the memory, wherein the processing device is to:
receive a data set of texels from a first buffer, wherein the data set has a dimensionality of at least 2 and wherein each texel contains a value;
sort the data set into a point list of coordinates, wherein a point in the point list corresponds to a texel location in the data set;
reduce the dimensionality of the point list by arranging points in the point list according to an N-1 dimensional dominancy;
perform a raster operation on each associated value of the arranged points to obtain at least one value; and
output the at least one value to a second buffer.
25. The system of claim 24, wherein the processing device is further to repeat the sorting, reducing, performing, and outputting until D is 1.
26. The system of claim 24, wherein the processing device is a graphics processing unit.
27. The system of claim 24, wherein the processing device is further to transmit the at least one value to one or more downstream devices in a video processing application.
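
The raster operations named in the claims (replace, additive, minimum, maximum) can be pictured as interchangeable accumulation rules applied while the data set is reduced from two dimensions to a single row and then to a single texel. The Python sketch below is only an explanatory aid under stated assumptions and forms no part of the claims: the names `RASTER_OPS`, `reduce_with_raster_op`, and `mean_value` are hypothetical, and the caller supplies the initial buffer contents.

```python
RASTER_OPS = {
    "replace": lambda dst, src: src,        # keep the incoming value
    "add": lambda dst, src: dst + src,      # additive (summation)
    "min": lambda dst, src: min(dst, src),  # minimum
    "max": lambda dst, src: max(dst, src),  # maximum
}

def reduce_with_raster_op(data, op_name, initial):
    """Reduce a 2-D data set to one value in two steps using one raster operation."""
    op = RASTER_OPS[op_name]
    width = len(data[0])

    # Step 1: collapse every column into a buffer `width` wide and 1 high.
    row_buffer = [initial] * width
    for row in data:
        for x, value in enumerate(row):
            row_buffer[x] = op(row_buffer[x], value)

    # Step 2: collapse that row into a buffer 1 wide and 1 high.
    result = initial
    for value in row_buffer:
        result = op(result, value)
    return result

def mean_value(data):
    """Mean as the summed value multiplied by 1 divided by the data set size."""
    size = len(data) * len(data[0])
    return reduce_with_raster_op(data, "add", 0) * (1.0 / size)
```

With `data = [[3, 1, 4], [1, 5, 9]]`, the "min" operation started from `float("inf")` gives 1, the "max" operation started from `float("-inf")` gives 9, and the "add" operation started from 0 gives 23.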
CN201380048176.8A 2013-07-17 2013-07-17 System and method for histogram computation using a graphics processing unit Pending CN105009142A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2013/050870 WO2015009294A2 (en) 2013-07-17 2013-07-17 System and method for histogram computation using a graphics processing unit

Publications (1)

Publication Number Publication Date
CN105009142A true CN105009142A (en) 2015-10-28

Family

ID=52346814

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201380048176.8A Pending CN105009142A (en) 2013-07-17 2013-07-17 System and method for histogram computation using a graphics processing unit

Country Status (9)

Country Link
EP (1) EP3022682A4 (en)
JP (1) JP2016527631A (en)
KR (1) KR20160030871A (en)
CN (1) CN105009142A (en)
BR (1) BR112015008904A2 (en)
CA (1) CA2868297A1 (en)
HK (1) HK1216934A1 (en)
SG (1) SG11201501622UA (en)
WO (1) WO2015009294A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111261088A (en) * 2020-02-25 2020-06-09 京东方科技集团股份有限公司 Image drawing method and device and display device

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113450737B (en) * 2020-03-27 2022-11-01 京东方科技集团股份有限公司 Image drawing method, display device and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101689306A (en) * 2007-02-16 2010-03-31 高通股份有限公司 Efficient 2-d and 3-d graphics processing
US20100091018A1 (en) * 2008-07-11 2010-04-15 Advanced Micro Devices, Inc. Rendering Detailed Animated Three Dimensional Characters with Coarse Mesh Instancing and Determining Tesselation Levels for Varying Character Crowd Density
CN101819676A (en) * 2004-05-14 2010-09-01 辉达公司 Method and system for a general instruction raster stage that generates programmable pixel packets

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6700583B2 (en) * 2001-05-14 2004-03-02 Ati Technologies, Inc. Configurable buffer for multipass applications
US7564460B2 (en) * 2001-07-16 2009-07-21 Microsoft Corporation Systems and methods for providing intermediate targets in a graphics system
US6753870B2 (en) * 2002-01-30 2004-06-22 Sun Microsystems, Inc. Graphics system configured to switch between multiple sample buffer contexts
US6819324B2 (en) * 2002-03-11 2004-11-16 Sun Microsystems, Inc. Memory interleaving technique for texture mapping in a graphics system
US7889922B2 (en) * 2005-11-14 2011-02-15 Siemens Medical Solutions Usa, Inc. Method and system for histogram calculation using a graphics processing unit
US7710417B2 (en) * 2007-01-15 2010-05-04 Microsoft Corporation Spatial binning of particles on a GPU
EP2128822B1 (en) * 2008-05-27 2012-01-04 TELEFONAKTIEBOLAGET LM ERICSSON (publ) Index-based pixel block processing
JP2013008270A (en) * 2011-06-27 2013-01-10 Renesas Electronics Corp Parallel arithmetic unit and microcomputer

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111261088A (en) * 2020-02-25 2020-06-09 京东方科技集团股份有限公司 Image drawing method and device and display device
CN111261088B (en) * 2020-02-25 2023-12-12 京东方科技集团股份有限公司 Image drawing method and device and display device

Also Published As

Publication number Publication date
EP3022682A4 (en) 2017-02-22
KR20160030871A (en) 2016-03-21
CA2868297A1 (en) 2015-01-17
WO2015009294A2 (en) 2015-01-22
BR112015008904A2 (en) 2017-07-04
SG11201501622UA (en) 2015-04-29
JP2016527631A (en) 2016-09-08
EP3022682A2 (en) 2016-05-25
HK1216934A1 (en) 2016-12-09
WO2015009294A3 (en) 2015-07-16

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1216934

Country of ref document: HK

WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20151028

WD01 Invention patent application deemed withdrawn after publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: WD

Ref document number: 1216934

Country of ref document: HK